Scoring Protein Relationships in Functional Interaction Networks Predicted from Sequence Data
Gaston K. Mazandu¹ and Nicola J. Mulder*¹

(1) Computational Biology Group, Department of Clinical Laboratory Sciences
     Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory 7925, South Africa
* Corresponding author

Email: Gaston K. Mazandu - <gmazandu at cbio.uct.ac.za>; Nicola J. Mulder - <Nicola.Mulder at uct.ac.za>

Abstract

Motivation: The abundance of diverse biological data from various sources is a rich source of knowledge with the potential to advance our understanding of organisms. Exploiting these data effectively requires computational methods that integrate them to elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These data are primarily in the form of sequences, and the functional properties of a protein can often be predicted from the characteristics of some of its sub-sequences alone. These sub-sequences, or domains, are considered features related to functional aspects of a protein and, together with knowledge of shared evolutionary history, can be used to predict pairwise functional relationships between proteins. This helps extract useful information from genomes through comparative, functional or structural genomics, and thus contributes to function prediction for uncharacterized proteins, capitalizing on the knowledge gained through sequencing efforts.

Results: In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved signature patterns. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a high-confidence homology functional network of the organism.

Availability: Protein pairwise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data, and the Python scripts used to compute these scores, are available here.

Acknowledgements

This work was supported by the National Bioinformatics Network in South Africa and the Computational Biology (CBIO) research group at the Institute of Infectious Disease and Molecular Medicine, University of Cape Town.