1
|
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
At the heart of the cellular machinery through the regulation of cellular functions, protein-protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.
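The filtering step mentioned in this abstract (confidence scores used to prune the network) can be illustrated with a minimal sketch. The `Interaction` record, the 0-to-1 score scale, and the 0.7 threshold are assumptions made for illustration; the review itself does not prescribe them.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """One undirected PPI edge with a hypothetical confidence score in [0, 1]."""
    protein_a: str
    protein_b: str
    confidence: float


def filter_by_confidence(edges, threshold):
    """Keep only the interactions whose confidence is at or above the threshold."""
    return [edge for edge in edges if edge.confidence >= threshold]


# Placeholder identifiers and scores, not real data.
edges = [
    Interaction("P1", "P2", 0.95),
    Interaction("P1", "P3", 0.40),
    Interaction("P2", "P4", 0.72),
]
kept = filter_by_confidence(edges, threshold=0.7)
```

A stricter threshold yields a smaller, easier-to-draw network, at the cost of discarding true interactions that happen to have little supporting evidence.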
Affiliation(s)
- Vivian Robin
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
2
|
Gupta M, Chauhan R, Prasad Y, Wadhwa G, Jain CK. Protein-protein interaction and molecular dynamics analysis for identification of novel inhibitors in Burkholderia cepacia GG4. Comput Biol Chem 2016; 65:80-90. [PMID: 27776248 DOI: 10.1016/j.compbiolchem.2016.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/09/2016] [Revised: 09/24/2016] [Accepted: 10/06/2016] [Indexed: 11/25/2022]
Abstract
The lack of complete treatments and the appearance of multidrug-resistant strains of the Burkholderia cepacia complex (Bcc) are causing an increased risk of lung infections in cystic fibrosis patients. Bcc infection is a serious risk to human health, and new therapeutics against these bacteria are urgently needed. Network biology has emerged as one of the prospective hopes for identifying novel drug targets and hits. We have applied protein-protein interaction methodology to identify new drug-target candidates (orthologs) in Burkholderia cepacia GG4, which is an important strain for studying the quorum-sensing phenomenon. An evolution-based ortholog mapping approach has been applied to generate large-scale protein-protein interactions in B. cepacia. As a case study, one of the identified drug targets, GEM_3202, an NH(3)-dependent NAD synthetase protein, has been studied, and potential ligand molecules were screened using the ZINC database. The three-dimensional structure of the NH(3)-dependent NAD synthetase protein has been predicted with the MODELLER v9.11 tool using multiple PDB templates (3DPI, 2PZ8 and 1NSY, with sequence identities of 76%, 50% and 50%, respectively). The structure has been validated with a Ramachandran plot having 100% of the NadE residues in the allowed region and an overall quality factor of 81.75 using the ERRAT tool. High-throughput screening and Vina resulted in two potential hits against NadE, ZINC83103551 and ZINC38008121. These molecules showed the lowest binding energy of -5.7 kcal mol-1 and high stability in the binding pockets during molecular dynamics simulation analysis. A similar approach for target identification could be applied to clinical strains of other pathogenic microbes.
Affiliation(s)
- Money Gupta
- Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
- Rashi Chauhan
- Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
- Yamuna Prasad
- Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, 110016, India
- Gulshan Wadhwa
- Department of Biotechnology (DBT), Ministry of Science & Technology, New Delhi-110003, India
- Chakresh Kumar Jain
- Department of Biotechnology, Jaypee Institute of Information Technology, A-10, Sector-62, Noida, Uttar Pradesh, 201307, India
|
3
|
Rid R, Strasser W, Siegl D, Frech C, Kommenda M, Kern T, Hintner H, Bauer JW, Önder K. PRIMOS: an integrated database of reassessed protein-protein interactions providing web-based access to in silico validation of experimentally derived data. Assay Drug Dev Technol 2014; 11:333-46. [PMID: 23772554 DOI: 10.1089/adt.2013.506] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022] Open
Abstract
Steady improvements in proteomics present a bioinformatic challenge to retrieve, store, and process the accumulating and often redundant amount of information. In particular, a large-scale comparison and analysis of protein-protein interaction (PPI) data requires tools for data interpretation as well as validation. At this juncture, the Protein Interaction and Molecule Search (PRIMOS) platform represents a novel web portal that unifies six primary PPI databases (BIND, Biomolecular Interaction Network Database; DIP, Database of Interacting Proteins; HPRD, Human Protein Reference Database; IntAct; MINT, Molecular Interaction Database; and MIPS, Munich Information Center for Protein Sequences) into a single consistent repository, which currently includes more than 196,700 redundancy-removed PPIs. PRIMOS supports three advanced search strategies centering on disease-relevant PPIs, on inter- and intra-organismal crosstalk relations (e.g., pathogen-host interactions), and on highly connected protein nodes analysis ("hub" identification). The main novelties distinguishing PRIMOS from other secondary PPI databases are the reassessment of known PPIs, and the capacity to validate personal experimental data by our peer-reviewed, homology-based validation. This article focuses on definite PRIMOS use cases (presentation of embedded biological concepts, example applications) to demonstrate its broad functionality and practical value. PRIMOS is publicly available at http://primos.fh-hagenberg.at.
Affiliation(s)
- Raphaela Rid
- Division of Molecular Dermatology, Department of Dermatology, Paracelsus Medical University Salzburg, Salzburg, Austria
|
4
|
Ahmed MH, Habtemariam M, Safo MK, Scarsdale JN, Spyrakis F, Cozzini P, Mozzarelli A, Kellogg GE. Unintended consequences? Water molecules at biological and crystallographic protein–protein interfaces. Comput Biol Chem 2013; 47:126-41. [DOI: 10.1016/j.compbiolchem.2013.08.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/21/2013] [Revised: 08/27/2013] [Accepted: 08/27/2013] [Indexed: 01/31/2023]
|
5
|
Probability weighted ensemble transfer learning for predicting interactions between HIV-1 and human proteins. PLoS One 2013; 8:e79606. [PMID: 24260261 PMCID: PMC3832534 DOI: 10.1371/journal.pone.0079606] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 05/03/2013] [Accepted: 09/24/2013] [Indexed: 11/20/2022] Open
Abstract
Reconstruction of host-pathogen protein interaction networks is of great significance to reveal the underlying microbic pathogenesis. However, the current experimentally-derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From the point of view of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems for host-pathogen protein interaction network reconstruction. In this work, we are motivated to address the three concerns and propose a probability weighted ensemble transfer learning model for HIV-human protein interaction prediction (PWEN-TLM), where a support vector machine (SVM) is adopted as the individual classifier of the ensemble model. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we further validate the assumption that the homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. Thus the model is more robust against data unavailability, with less demanding data constraints. As regards negative data construction, experiments show that exclusiveness of subcellular co-localized proteins is unbiased and more reliable than random sampling. Last, we conduct an analysis of overlapping predictions between our model and the existing models, and apply the model to the recognition of novel host-pathogen PPIs for further biological research.
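The weighting idea described in this abstract (individual classifier outputs combined according to their ROC-AUC) can be sketched as follows. The exact combination rule of PWEN-TLM is not given here, so the AUC-proportional weighted average below is an assumption, and the function name and all numbers are hypothetical.

```python
def weighted_ensemble_probability(probabilities, aucs):
    """Combine per-classifier probabilities using weights proportional to ROC-AUC.

    Assumption: weights are simply AUC / sum(AUC); the published model may
    use a different normalisation.
    """
    if len(probabilities) != len(aucs):
        raise ValueError("one AUC is needed per classifier")
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC values must sum to a positive number")
    return sum(p * a for p, a in zip(probabilities, aucs)) / total


# Hypothetical outputs of three classifiers for one candidate protein pair.
probabilities = [0.9, 0.6, 0.2]
aucs = [0.9, 0.8, 0.6]
combined = weighted_ensemble_probability(probabilities, aucs)
is_predicted_interaction = combined >= 0.5  # 0.5 cut-off is also an assumption
```

Classifiers with a higher AUC pull the combined probability toward their own output, so a weak classifier cannot overturn two strong ones on its own.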
|
6
|
Diversity in genetic in vivo methods for protein-protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiol Mol Biol Rev 2012; 76:331-82. [PMID: 22688816 DOI: 10.1128/mmbr.05021-11] [Citation(s) in RCA: 135] [Impact Index Per Article: 11.3] [Indexed: 12/14/2022] Open
Abstract
The yeast two-hybrid system pioneered the field of in vivo protein-protein interaction methods and undisputedly gave rise to a palette of ingenious techniques that are constantly pushing further the limits of the original method. Sensitivity and selectivity have improved because of various technical tricks and experimental designs. Here we present an exhaustive overview of the genetic approaches available to study in vivo binary protein interactions, based on two-hybrid and protein fragment complementation assays. These methods have been engineered and employed successfully in microorganisms such as Saccharomyces cerevisiae and Escherichia coli, but also in higher eukaryotes. From single binary pairwise interactions to whole-genome interactome mapping, the self-reassembly concept has been employed widely. Innovative studies report the use of proteins such as ubiquitin, dihydrofolate reductase, and adenylate cyclase as reconstituted reporters. Protein fragment complementation assays have extended the possibilities in protein-protein interaction studies, with technologies that enable spatial and temporal analyses of protein complexes. In addition, one-hybrid and three-hybrid systems have broadened the types of interactions that can be studied and the findings that can be obtained. Applications of these technologies are discussed, together with the advantages and limitations of the available assays.
|
7
|
Som A, Luštrek M, Singh NK, Fuellen G. Derivation of an interaction/regulation network describing pluripotency in human. Gene 2012; 502:99-107. [PMID: 22548825 DOI: 10.1016/j.gene.2012.04.025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/22/2012] [Revised: 03/21/2012] [Accepted: 04/09/2012] [Indexed: 01/08/2023]
Abstract
Identification of the key genes/proteins of pluripotency and their interrelationships is an important step in understanding the induction and maintenance of pluripotency. Experimental approaches have accumulated large amounts of interaction/regulation data in mouse. We investigate how far such information can be transferred to human, the species of maximum interest, for which experimental data are much more limited. To address this issue, we mapped an existing mouse pluripotency network (the PluriNetWork) to human. We transferred interaction and regulation links between genes/proteins from mouse to human on the basis of orthologous relationship of the genes/proteins (called interolog mapping). To reduce the number of false positives, we used four different methods: phylogenetic profiling, Gene Ontology semantic similarity, gene co-expression, and RNA interference (RNAi) data. The methods and the resulting networks were evaluated by a novel approach using the information about the genes known to be involved in pluripotency from the literature. The RNAi method proved best for filtering out unlikely interactions, so it was used to construct the final human pluripotency network. The RNAi data are based on human embryonic stem cells (hESCs) that are generally considered to be in a (primed) epiblast stem cell state. Therefore, we assume that the final human network may reflect the (primed) epiblast stem cell state more closely, while the mouse network reflects the (unprimed/naïve) embryonic stem cell state more closely.
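Interolog mapping as described in this abstract (transferring links between species through orthology) can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: links are treated as undirected, the ortholog table is a plain dictionary, and all identifiers are placeholders.

```python
def interolog_map(source_edges, orthologs):
    """Transfer edges from a source species to a target species.

    source_edges: iterable of (gene_a, gene_b) pairs in the source species.
    orthologs: dict mapping a source gene to a list of target-species orthologs.
    An edge is transferred only when both partners have at least one ortholog;
    one-to-many orthology produces one target edge per ortholog combination.
    """
    mapped = set()
    for gene_a, gene_b in source_edges:
        for target_a in orthologs.get(gene_a, ()):
            for target_b in orthologs.get(gene_b, ()):
                # Sort so that (x, y) and (y, x) count as the same undirected edge.
                mapped.add(tuple(sorted((target_a, target_b))))
    return mapped


# Placeholder identifiers: "m*" for source-species genes, "h*" for target-species genes.
source_edges = [("mA", "mB"), ("mB", "mC"), ("mC", "mD")]
orthologs = {"mA": ["hA"], "mB": ["hB1", "hB2"], "mC": ["hC"]}
target_edges = interolog_map(source_edges, orthologs)
```

The edge involving `mD` is dropped because it has no ortholog in the table. The transferred edges are only candidates, which is why the abstract follows the mapping with filtering methods to reduce false positives.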
Affiliation(s)
- Anup Som
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Ernst-Heydemann-Str. 8, 18057, Rostock, Germany
|
8
|
Acuner Ozbabacan SE, Engin HB, Gursoy A, Keskin O. Transient protein-protein interactions. Protein Eng Des Sel 2011; 24:635-48. [DOI: 10.1093/protein/gzr025] [Citation(s) in RCA: 170] [Impact Index Per Article: 13.1] [Indexed: 01/05/2023] Open
|
9
|
Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics 2009; 10 Suppl 1:S16. [PMID: 19594875 PMCID: PMC2709259 DOI: 10.1186/1471-2164-10-s1-s16] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 01/01/2023] Open
Abstract
Background Human protein-protein interaction (PPIs) data are the foundation for understanding molecular signalling networks and the functional roles of biomolecules. Several human PPI databases have become available; however, comparisons of these datasets have suggested limited data coverage and poor data quality. Ongoing collection and integration of human PPIs from different sources, both experimentally and computationally, can enable disease-specific network biology modelling in translational bioinformatics studies. Results We developed a new web-based resource, the Human Annotated and Predicted Protein Interaction (HAPPI) database, located at . The HAPPI database was created by extracting and integrating publicly available protein interaction databases, including HPRD, BIND, MINT, STRING, and OPHID, using database integration techniques. We designed a unified entity-relationship data model to resolve semantic level differences of diverse concepts involved in PPI data integration. We applied a unified scoring model to give each PPI a measure of its reliability that can place each PPI at one of the five star rank levels from 1 to 5. We assessed the quality of PPIs contained in the new HAPPI database, using evolutionary conserved co-expression pairs called "MetaGene" pairs to measure the extent of MetaGene pair and PPI pair overlaps. While the overall quality of the HAPPI database across all star ranks is comparable to the overall qualities of HPRD or IntNetDB, the subset of the HAPPI database with star ranks between 3 and 5 has a much higher average quality than all other human PPI databases. As of summer 2008, the database contains 142,956 non-redundant, medium to high-confidence level human protein interaction pairs among 10,592 human proteins. 
The HAPPI database web application also provides hyperlinked information of genes, pathways, protein domains, protein structure displays, and sequence feature maps for interactive exploration of PPI data in the database. Conclusion HAPPI is by far the most comprehensive public compilation of human protein interaction information. It enables its users to fully explore PPI data with quality measures and annotated information necessary for emerging network biology studies.
Affiliation(s)
- Jake Yue Chen
- School of Informatics, Indiana University - Purdue University, Indianapolis, IN, USA.
|
10
|
De Bodt S, Proost S, Vandepoele K, Rouzé P, Van de Peer Y. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 2009; 10:288. [PMID: 19563678 PMCID: PMC2719670 DOI: 10.1186/1471-2164-10-288] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Received: 02/11/2009] [Accepted: 06/29/2009] [Indexed: 12/31/2022] Open
Abstract
Background Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
Affiliation(s)
- Stefanie De Bodt
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Technologiepark 927, B-9052 Gent, Belgium.
|
11
|
|
12
|
Bhardwaj N, Lu H. Co-expression among constituents of a motif in the protein-protein interaction network. J Bioinform Comput Biol 2009; 7:1-17. [PMID: 19226657 DOI: 10.1142/s0219720009003959] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 05/11/2008] [Revised: 09/19/2008] [Accepted: 09/22/2008] [Indexed: 11/18/2022]
Abstract
Almost all cellular functions are the results of well-coordinated interactions between various proteins. A more connected hub or motif in the interaction network is expected to be more important, and any perturbation in this motif would be more damaging to the smooth performance of the related functions. Thus, some coherent robustness of these hubs has to be derived. Here, we provide the global evidence that interaction hubs obtain their robustness against uneven protein concentrations through co-expression of the constituents, and that the degree of co-expression correlates strongly with the complexity of the embedded motif. We calculated the gene expression correlations between the proteins embedded in 3-, 4-, 5-, and 6-node interaction motifs of increasing complexities, and compared them to those between proteins from random motifs of similar complexities. We find that as the connectedness of these motifs increases, there is higher co-expression between the constituent proteins. For example, when the expression correlation is 0.7, the kernel density of the correlation increases from 0.152 for 4-node motifs with three edges to 0.403 for 4-node cliques. This implies that the robustness of the interaction system emerges from a proportionate synchronicity among the constituents of the motif via co-expression. We further show that such biological coherence via co-expression of component proteins can be reinforced by integrating conservation data in the analysis. For example, with addition of evolutionary information from other genomes, the ratio of kernel density for interaction and random data in the case of 5- and 6-node cliques in yeast increases from 37.8 to 123 and 98.4 to 1300, respectively, given that the expression correlation is 0.8. Our results show that genes whose products are involved in motifs have transcription and translation properties that minimize the noise in final protein concentrations, compared to random sets of genes.
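The quantity underlying this abstract, the expression correlation between proteins embedded in the same motif, can be illustrated with a small sketch. It computes a plain mean pairwise Pearson correlation, which is a simplification: the kernel density analysis and the comparison against random motifs reported by the authors are not reproduced, and the expression values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_coexpression(expression, motif):
    """Average Pearson correlation over all gene pairs in a motif (2 or more genes)."""
    pairs = list(combinations(motif, 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Made-up expression profiles across four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
tight_motif = mean_pairwise_coexpression(expression, ["g1", "g2"])
mixed_motif = mean_pairwise_coexpression(expression, ["g1", "g2", "g3"])
```

Under the abstract's hypothesis, real motifs with more edges would score higher on a measure like this than random gene sets of the same size.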
Affiliation(s)
- Nitin Bhardwaj
- Bioinformatics Program, University of Illinois at Chicago, 820 S. Woods Street, Room 103, Chicago, IL 60607, USA.
|
13
|
Frech C, Kommenda M, Dorfer V, Kern T, Hintner H, Bauer JW, Onder K. Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis. BMC Bioinformatics 2009; 10:21. [PMID: 19152684 PMCID: PMC2637843 DOI: 10.1186/1471-2105-10-21] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2008] [Accepted: 01/19/2009] [Indexed: 11/10/2022] Open
Abstract
Background Protein-protein interaction (PPI) data sets generated by high-throughput experiments are contaminated by large numbers of erroneous PPIs. Therefore, computational methods for PPI validation are necessary to improve the quality of such data sets. Against the background of the theory that most extant PPIs arose as a consequence of gene duplication, the sensitive search for homologous PPIs, i.e. for PPIs descending from a common ancestral PPI, should be a successful strategy for PPI validation. Results To validate an experimentally observed PPI, we combine FASTA and PSI-BLAST to perform a sensitive sequence-based search for pairs of interacting homologous proteins within a large, integrated PPI database. A novel scoring scheme that incorporates both quality and quantity of all observed matches allows us (1) to consider also tentative paralogs and orthologs in this analysis and (2) to combine search results from more than one homology detection method. ROC curves illustrate the high efficacy of this approach and its improvement over other homology-based validation methods. Conclusion New PPIs are primarily derived from preexisting PPIs and not invented de novo. Thus, the hallmark of true PPIs is the existence of homologous PPIs. The sensitive search for homologous PPIs within a large body of known PPIs is an efficient strategy to separate biologically relevant PPIs from the many spurious PPIs reported by high-throughput experiments.
Affiliation(s)
- Christian Frech
- Upper Austria University of Applied Sciences, Hagenberg, Austria.
|
14
|
Sadanandam A, Varney ML, Singh RK. Identification of semaphorin 5A interacting protein by applying apriori knowledge and peptide complementarity related to protein evolution and structure. Genomics Proteomics Bioinformatics 2008; 6:163-74. [PMID: 19329067 PMCID: PMC5054137 DOI: 10.1016/s1672-0229(09)60004-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
In the post-genomic era, various computational methods that predict protein-protein interactions at the genome level are available; however, each method has its own advantages and disadvantages, resulting in false predictions. Here we developed a unique integrated approach to identify interacting partner(s) of Semaphorin 5A (SEMA5A), beginning with seven proteins sharing similar ligand interacting residues as putative binding partners. The methods include Dwyer and Root-Bernstein/Dillon theories of protein evolution, hydropathic complementarity of protein structure, pattern of protein functions among molecules, information on domain-domain interactions, co-expression of genes and protein evolution. Among the set of seven proteins selected as putative SEMA5A interacting partners, we found the functions of Plexin B3 and Neuropilin-2 to be associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it shares similarity with SEMA5A. Moreover, a virtual expression database search and RT-PCR analysis showed co-expression of SEMA5A and Plexin B3 and these proteins were found to have co-evolved. In addition, we confirmed the interaction of SEMA5A with Plexin B3 in co-immunoprecipitation studies. Overall, these studies demonstrate that an integrated method of prediction can be used at the genome level for discovering many unknown protein binding partners with known ligand binding domains.
Affiliation(s)
- Rakesh K. Singh
- Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198-5845, USA
|
15
|
Riley R, Pellegrini M, Eisenberg D. Identifying cognate binding pairs among a large set of paralogs: the case of PE/PPE proteins of Mycobacterium tuberculosis. PLoS Comput Biol 2008; 4:e1000174. [PMID: 18787688 PMCID: PMC2519833 DOI: 10.1371/journal.pcbi.1000174] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 12/10/2007] [Accepted: 08/01/2008] [Indexed: 12/19/2022] Open
Abstract
We consider the problem of how to detect cognate pairs of proteins that bind when each belongs to a large family of paralogs. To illustrate the problem, we have undertaken a genomewide analysis of interactions of members of the PE and PPE protein families of Mycobacterium tuberculosis. Our computational method uses structural information, operon organization, and protein coevolution to infer the interaction of PE and PPE proteins. Some 289 PE/PPE complexes were predicted out of a possible 5,590 PE/PPE pairs genomewide. Thirty-five of these predicted complexes were also found to have correlated mRNA expression, providing additional evidence for these interactions. We show that our method is applicable to other protein families, by analyzing interactions of the Esx family of proteins. Our resulting set of predictions is a starting point for genomewide experimental interaction screens of the PE and PPE families, and our method may be generally useful for detecting interactions of proteins within families having many paralogs. We consider the problem of detecting protein interactions from genome sequences when the potential interacting partners belong to large families of similar (homologous) proteins. Many computational methods for predicting protein interactions rely on similarity to a pair of known interacting proteins. When the proteins in question are members of large groups of similar proteins within the same organism (paralogs), the problem of inferring the correct interactions becomes difficult. To illustrate the problem, we undertook prediction of interactions of some highly expanded protein families of Mycobacterium tuberculosis (Mtb), which are believed to contribute to the bacterium's ability to infect human beings. To generate predictions, we analyzed patterns of coevolution in a small subset of likely interacting proteins, and extended these patterns to predict additional interactions throughout the genome. 
Our results provide a map for experimental probes of the Mtb interaction network, for the benefit of drug and vaccine discovery. More generally, our procedure is applicable to detecting interactions of proteins that belong to large families of paralogs in any organism with a sequenced genome.
Affiliation(s)
- Robert Riley
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Howard Hughes Medical Institute, University of California Los Angeles, Los Angeles, California, United States of America
- UCLA–DOE Institute of Genomics and Proteomics, University of California Los Angeles, Los Angeles, California, United States of America
- Genome Biology Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Matteo Pellegrini
- Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, Los Angeles, California, United States of America
- David Eisenberg
- Howard Hughes Medical Institute, University of California Los Angeles, Los Angeles, California, United States of America
- UCLA–DOE Institute of Genomics and Proteomics, University of California Los Angeles, Los Angeles, California, United States of America
|
16
|
Tirosh I, Bilu Y, Barkai N. Comparative biology: beyond sequence analysis. Curr Opin Biotechnol 2007; 18:371-7. [PMID: 17693073 DOI: 10.1016/j.copbio.2007.07.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 04/19/2007] [Accepted: 07/12/2007] [Indexed: 12/18/2022]
Abstract
Comparative analysis is a fundamental tool in biology. Conservation among species greatly assists the detection and characterization of functional elements, whereas inter-species differences are probably the best indicators of biological adaptation. Traditionally, comparative approaches were applied to the analysis of genomic sequences. With the growing availability of functional genomic data, comparative paradigms are now being extended also to the study of other functional attributes, most notably the gene expression. Here we review recent works applying comparative analysis to large-scale gene expression datasets and discuss the central principles and challenges of such approaches.
Affiliation(s)
- Itay Tirosh
- Department of Molecular Genetics, Weizmann Institute of Science, 76100 Rehovot, Israel
|
17
|
Jothi R, Przytycka TM, Aravind L. Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics 2007; 8:173. [PMID: 17521444 PMCID: PMC1904249 DOI: 10.1186/1471-2105-8-173] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 02/28/2007] [Accepted: 05/23/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution.
RESULTS: Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
CONCLUSION: Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. The differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised when interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
18
|
Jothi R, Cherukuri PF, Tasneem A, Przytycka TM. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol 2006; 362:861-75. [PMID: 16949097 PMCID: PMC1618801 DOI: 10.1016/j.jmb.2006.07.072] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Received: 04/25/2006] [Revised: 06/19/2006] [Accepted: 07/14/2006] [Indexed: 11/28/2022]
Abstract
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable for a molecular-level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain-domain interactions. Given a protein-protein interaction, the proposed method predicts the domain pair(s) most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- *Corresponding author
| | - Praveen F. Cherukuri
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bioinformatics Program Boston University, Boston, MA 02215, USA
| | - Asba Tasneem
- Booz Allen Hamilton Inc., Rockville, MD 20852, USA
| | - Teresa M. Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- *Corresponding author
|
19
|
Davis FP, Braberg H, Shen MY, Pieper U, Sali A, Madhusudhan M. Protein complex compositions predicted by structural similarity. Nucleic Acids Res 2006; 34:2943-52. [PMID: 16738133 PMCID: PMC1474056 DOI: 10.1093/nar/gkl353] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022] Open
Abstract
Proteins function through interactions with other molecules. Thus, the network of physical interactions among proteins is of great interest to both experimental and computational biologists. Here we present structure-based predictions of 3387 binary and 1234 higher order protein complexes in Saccharomyces cerevisiae involving 924 and 195 proteins, respectively. To generate candidate complexes, comparative models of individual proteins were built and combined using complexes of known structure as templates. These candidate complexes were then assessed using a statistical potential derived from binary domain interfaces in PIBASE. The statistical potential discriminated a benchmark set of 100 interface structures from a set of sequence-randomized negative examples with a false positive rate of 3% and a true positive rate of 97%. Moreover, the predicted complexes were also filtered using functional annotation and sub-cellular localization data. The ability of the method to select the correct binding mode among alternates is demonstrated for three camelid VHH domain/porcine α-amylase interactions. We also highlight the prediction of co-complexed domain superfamilies that are not present in template complexes. Through integration with MODBASE, the application of the method to proteomes that are less well characterized than that of S. cerevisiae will contribute to expansion of the structural and functional coverage of protein interaction space. The predicted complexes are deposited in MODBASE.
Affiliation(s)
- Fred P. Davis
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
| | - Hannes Braberg
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
| | - Min-Yi Shen
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
| | - Ursula Pieper
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
| | - Andrej Sali
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Correspondence may also be addressed to A. Sali. Tel: +1 415 514 4227; Fax: +1 415 514 4231
| | - M.S. Madhusudhan
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, University of California San Francisco, 1700 4th Street, Byers Hall, San Francisco, CA 94143-2552, USA
- To whom correspondence should be addressed. Tel: +1 415 514 4232; Fax: +1 415 514 4231
|
20
|
Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2006; 2:e79. [PMID: 16854211 PMCID: PMC1513270 DOI: 10.1371/journal.pcbi.0020079] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Received: 11/18/2005] [Indexed: 11/21/2022] Open
Abstract
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. About 55,000 further pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for the two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions.
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions.
http://www.rostlab.org/results/2006/ppi_homology/
The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These organisms are often perceived as model organisms in the sense that one can infer that two mouse proteins interact if one experimentally observes the two corresponding proteins in worm to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for two protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.
Affiliation(s)
- Sven Mika
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
21
|
Dolinski K, Botstein D. Changing perspectives in yeast research nearly a decade after the genome sequence. Genome Res 2005; 15:1611-9. [PMID: 16339358 DOI: 10.1101/gr.3727505] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
Research with budding yeast (Saccharomyces cerevisiae) has been transformed by the publication, nearly a decade ago, of the entire genome DNA sequence. The introduction of this first eukaryotic genomic sequence changed the yeast research environment significantly, not just because of dramatic progress in technical means but also because the sequence made accessible a new class of scientific questions. A central goal of yeast research remains the determination of the biological role of every sequence feature in the yeast genome. The most remarkable change has been the shift in perspective from focus on individual genes and functionalities to a more global view of how the cellular networks and systems interact and function together to produce the highly evolved organism we see today.
Affiliation(s)
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544 USA
|