1
Chung HC, Friedberg I, Bromberg Y. Assembling bacterial puzzles: piecing together functions into microbial pathways. NAR Genom Bioinform 2024; 6:lqae109. [PMID: 39184378 PMCID: PMC11344244 DOI: 10.1093/nargab/lqae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/24/2024] [Accepted: 08/07/2024] [Indexed: 08/27/2024]
Abstract
Functional metagenomics enables the study of unexplored bacterial diversity, gene families, and pathways essential to microbial communities. However, discovering biological insights with these data is impeded by the scarcity of quality annotations. Here, we use a co-occurrence-based analysis of predicted microbial protein functions to uncover pathways in genomic and metagenomic biological systems. Our approach, based on phylogenetic profiles, improves the identification of functional relationships, or participation in the same biochemical pathway, between enzymes over a comparable homology-based approach. We optimized the design of our profiles to identify potential pathways using minimal data, clustered functionally related enzyme pairs into multi-enzymatic pathways, and evaluated our predictions against reference pathways in the KEGG database. We then demonstrated a novel extension of this approach to predict inter-bacterial protein interactions amongst members of a marine microbiome. Most significantly, we show our method predicts emergent biochemical pathways between known and unknown functions. Thus, our work establishes a basis for identifying the potential functional capacities of the entire metagenome, capturing previously unknown and abstract functions into discrete putative pathways.
Affiliation(s)
- Henri C Chung
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Yana Bromberg
  - Department of Computer Science, Emory University, Atlanta, GA 30307, USA
  - Department of Biology, Emory University, Atlanta, GA 30322, USA
2
Fang Y, Yang Y, Liu C. Evolutionary Relationships Between Dysregulated Genes in Oral Squamous Cell Carcinoma and Oral Microbiota. Front Cell Infect Microbiol 2022; 12:931011. [PMID: 35909962 PMCID: PMC9328420 DOI: 10.3389/fcimb.2022.931011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 06/20/2022] [Indexed: 11/30/2022]
Abstract
Oral squamous cell carcinoma (OSCC) is one of the most prevalent cancers in the world. Changes in the composition and abundance of oral microbiota are associated with the development and metastasis of OSCC. To elucidate the exact roles of the oral microbiota in OSCC, it is essential to reveal the evolutionary relationships between the dysregulated genes in OSCC progression and the oral microbiota. Thus, we interrogated the microarray and high-throughput sequencing datasets to obtain the transcriptional landscape of OSCC. After identifying differentially expressed genes (DEGs) with three different methods, pathway and functional analyses were also performed. A total of 127 genes were identified as common DEGs, which were enriched in extracellular matrix organization and cytokine related pathways. Furthermore, we established a predictive pipeline for detecting the coevolution of dysregulated host genes and microbial proteomes based on the homology method, and this pipeline was employed to analyze the evolutionary relations between the seven most dysregulated genes (MMP13, MMP7, MMP1, CXCL13, CRISPO3, CYP3A4, and CRNN) and microbiota obtained from the eHOMD database. We found that cytochrome P450 3A4 (CYP3A4), a member of the cytochrome P450 family of oxidizing enzymes, was associated with 45 microbes from the eHOMD database and involved in the oral habitat of Comamonas testosteroni and Arachnia rubra. The peptidase M10 family of matrix metalloproteinases (MMP13, MMP7, and MMP1) was associated with Lacticaseibacillus paracasei, Lacticaseibacillus rhamnosus, Streptococcus salivarius, Tannerella sp._HMT_286, and Streptococcus infantis in the oral cavity. Overall, this study revealed the dysregulated genes in OSCC and explored their evolutionary relationship with oral microbiota, which provides new insight for exploring the microbiota–host interactions in diseases.
Affiliation(s)
- Yang Fang
  - Department of Laboratory Medicine, Third Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yi Yang
  - State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Department of Periodontics, West China School and Hospital of Stomatology, Sichuan University, Chengdu, China
- Chengcheng Liu
  - State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Department of Periodontics, West China School and Hospital of Stomatology, Sichuan University, Chengdu, China
  - *Correspondence: Chengcheng Liu
3
Pasternak Z, Chapnik N, Yosef R, Kopelman NM, Jurkevitch E, Segev E. Identifying protein function and functional links based on large-scale co-occurrence patterns. PLoS One 2022; 17:e0264765. [PMID: 35239724 PMCID: PMC8893610 DOI: 10.1371/journal.pone.0264765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 02/16/2022] [Indexed: 11/23/2022]
Abstract
Objective: The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a software tool based on the exploration of co-occurrence patterns. Computational model: Using a set of more than 23 million proteins divided into 404,947 orthologous clusters, we explored the co-occurrence graph of 4,742 fully sequenced genomes from the three domains of life. Edge weights in this graph represent co-occurrence probabilities. We use the Bron–Kerbosch algorithm to detect maximal cliques in this graph, fully-connected subgraphs that represent meaningful biological networks from different functional categories. Main results: We demonstrate that Cliquely can successfully identify known networks from various pathways, including nitrogen fixation, glycolysis, methanogenesis, mevalonate and ribosome proteins. Identifying the virulence-associated type III secretion system (T3SS) network, Cliquely also added 13 previously uncharacterized proteins to the T3SS network, demonstrating the strength of this approach. Cliquely is freely available and open source. Users can employ the tool to explore co-occurrence networks using a protein of interest and a customizable level of stringency, either for the entire dataset or for one of the three domains: Archaea, Bacteria, or Eukarya.
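The clique-detection step described in this abstract can be illustrated with a minimal Python sketch. It is not Cliquely's code: the function names, the thresholding of edge weights, and the toy weights are all assumptions made for illustration. It keeps only edges at or above a chosen stringency and then enumerates maximal cliques with the Bron–Kerbosch algorithm (pivoting variant).

```python
def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique that extends r to `out` (pivoting variant)."""
    if not p and not x:
        out.append(frozenset(r))
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(weights, threshold):
    """Keep edges with weight >= threshold, then list all maximal cliques."""
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if w >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    # Largest cliques first; ties broken alphabetically for stable output.
    return sorted(out, key=lambda c: (-len(c), sorted(c)))


# Made-up co-occurrence probabilities between hypothetical clusters A-D.
toy_weights = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.85,
    ("C", "D"): 0.7, ("A", "D"): 0.2,
}
cliques = maximal_cliques(toy_weights, threshold=0.6)
```

With this toy input the weak A-D edge is dropped, so the sketch reports the cliques {A, B, C} and {C, D}. Raising the threshold corresponds to a stricter stringency setting and yields smaller cliques.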
Affiliation(s)
- Zohar Pasternak
  - Division of Identification and Forensic Science, Israel Police, Jerusalem, Israel
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Noam Chapnik
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Roy Yosef
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Naama M. Kopelman
  - Faculty of Science, Holon Institute of Technology, Holon, Israel
- Edouard Jurkevitch
  - Department of Plant Pathology and Microbiology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Elad Segev
  - Faculty of Science, Holon Institute of Technology, Holon, Israel
4
Alborzi SZ, Ahmed Nacer A, Najjar H, Ritchie DW, Devignes MD. PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions. PLoS Comput Biol 2021; 17:e1008844. [PMID: 34370723 PMCID: PMC8376228 DOI: 10.1371/journal.pcbi.1008844] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/25/2021] [Revised: 08/19/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called “PPIDM” (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described “CODAC” (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as “Gold-Standard” a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/. 
We revisit at a large scale the question of inferring DDIs from PPIs. Compared to previous studies, we take a unified approach across multiple sources of PPIs. This approach is a method for inferring new edges in a tripartite graph setting and can be compared to link prediction approaches in knowledge graphs. Aggregation of several sources is performed using an optimized weighted average of the individual scores calculated in each source. A dataset of over 84K DDIs is produced, which far exceeds previous datasets. We show that a significant portion of the PPIDM dataset covers a large number of PPIs from curated (IMEx) or non-curated (STRING) databases. Such a reservoir of DDIs deserves further exploration and can be combined with high-throughput methods such as cross-linking mass spectrometry to identify plausible protein partners of proteins of interest.
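The aggregation step mentioned above, a weighted average of per-source scores, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PPIDM implementation: the function name, the source names, the weights, and the choice to treat a source that does not report a pair as scoring 0 are all hypothetical, and the abstract does not say how the weights are optimized.

```python
def aggregate_score(source_scores, source_weights):
    """Weighted average of one candidate pair's scores over all weighted sources.

    Assumption: a source that does not report the pair contributes a score of 0.
    """
    total_weight = sum(source_weights.values())
    if total_weight <= 0:
        raise ValueError("source weights must sum to a positive number")
    weighted_sum = sum(
        weight * source_scores.get(source, 0.0)
        for source, weight in source_weights.items()
    )
    return weighted_sum / total_weight


# Hypothetical weights and per-source scores, made up for illustration.
weights = {"source_a": 0.5, "source_b": 0.3, "source_c": 0.2}
pair_scores = {"source_a": 0.8, "source_b": 0.4}
combined = aggregate_score(pair_scores, weights)  # approximately 0.52
```

Dividing by the total weight keeps the combined score on the same scale as the individual scores, so a single cut-off can be applied whichever sources report a pair.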
5
Kurt F. An Insight into Oligopeptide Transporter 3 (OPT3) Family Proteins. Protein Pept Lett 2021; 28:43-54. [PMID: 32586240 DOI: 10.2174/0929866527666200625202028] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/04/2020] [Revised: 03/11/2020] [Accepted: 04/21/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: OPT3s are involved in the transport of Fe from xylem to phloem, in loading Fe into phloem, and in the transmission of shoot-to-root iron signaling. Yet, apart from Arabidopsis, little is known about these transporters' functions in other plant species. OBJECTIVE: OPT3 proteins of several plant species were characterized using bioinformatical tools. Also, a probable Fe chelating protein, GSH, was used in docking analyses to shed light on the interactions of ligand binding sites of OPT3s. METHODS: The multiple sequence alignment (MSA) analysis, protein secondary and tertiary structure analyses, molecular phylogeny analysis, transcription factor binding site analyses, co-expression and docking analyses were performed using up-to-date bioinformatical tools. RESULTS: All OPT3s in this study appear to be transmembrane proteins. They appear to have broad roles and substrate specificities in different metabolic processes. OPT3 gene structures were highly conserved. Promoter analysis showed that bZIP, WRKY, Dof and AT-Hook transcription factors (TFs) may regulate the expression of OPT3 genes. Consequently, they seemed to be taking part in both biotic and abiotic stress responses as well as growth and developmental processes. CONCLUSION: The results showed that OPT3 proteins are involved in ROS regulation, plant stress responses, and basal pathogen resistance. They have species-specific roles in biological processes. Lastly, the transport of iron through OPT3s may occur with GSH according to the binding affinity results of the docking analyses.
Affiliation(s)
- Fırat Kurt
  - Department of Plant Production and Technologies, Faculty of Applied Sciences, Mus Alparslan University, Mus, Turkey
6
An Integrative Computational Approach for the Prediction of Human-Plasmodium Protein-Protein Interactions. BIOMED RESEARCH INTERNATIONAL 2021; 2020:2082540. [PMID: 33426052 PMCID: PMC7771252 DOI: 10.1155/2020/2082540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2020] [Revised: 11/08/2020] [Accepted: 12/04/2020] [Indexed: 12/27/2022]
Abstract
Host-pathogen molecular cross-talks are critical in determining the pathophysiology of a specific infection. Most of these cross-talks are mediated via protein-protein interactions between the host and the pathogen (HP-PPI). Thus, it is essential to know how some pathogens interact with their hosts to understand the mechanism of infections. Malaria is a life-threatening disease caused by an obligate intracellular parasite belonging to the Plasmodium genus, of which P. falciparum is the most prevalent. Several previous studies that predicted human-Plasmodium protein-protein interactions using computational methods have demonstrated the utility, accuracy, and efficiency of these methods in identifying interacting partners, thereby complementing experimental efforts to characterize host-pathogen interaction networks. To predict putative HP-PPIs, we use an integrative computational approach based on the combination of multiple OMICS-based methods including human red blood cells (RBC) and Plasmodium falciparum 3D7 strain expressed proteins, domain-domain based PPI, similarity of gene ontology terms, structure similarity method homology identification, and machine learning prediction. Our results reported a set of 716 protein interactions involving 302 human proteins and 130 Plasmodium proteins. This work provides a list of potential human-Plasmodium interacting proteins. These findings will contribute to a better understanding of the mechanisms underlying the molecular determinism of malaria disease and may help identify candidate pharmacological targets.
7
Tremblay BJM, Lobb B, Doxey AC. PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling. Bioinformatics 2021; 37:17-22. [PMID: 33416870 DOI: 10.1093/bioinformatics/btaa1105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2020] [Revised: 12/26/2020] [Accepted: 12/29/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Statistical detection of co-occurring genes across genomes, known as "phylogenetic profiling", is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation, and substantial computational requirements. RESULTS: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154,217,052 comparisons for 28,315 genes across 27,372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM, or KEGG query. In total, PhyloCorrelate detected 29,762 high confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function. AVAILABILITY: PhyloCorrelate is available as a web-server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Briallen Lobb
  - Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
- Andrew C Doxey
  - Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
8
Defosset A, Kress A, Nevers Y, Ripp R, Thompson JD, Poch O, Lecompte O. Proteome-Scale Detection of Differential Conservation Patterns at Protein and Subprotein Levels with BLUR. Genome Biol Evol 2020; 13:5991441. [PMID: 33211099 PMCID: PMC7851591 DOI: 10.1093/gbe/evaa248] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 11/18/2020] [Indexed: 11/23/2022]
Abstract
In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype–phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein–protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.
Affiliation(s)
- Audrey Defosset
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
- Arnaud Kress
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
- Yannis Nevers
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Raymond Ripp
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
- Julie D Thompson
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
- Olivier Poch
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
- Odile Lecompte
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, France
9
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022]
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
Affiliation(s)
- Giancarlo Croce
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Matteo Figliuzzi
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
10
Rasti S, Vogiatzis C. A survey of computational methods in protein–protein interaction networks. ANNALS OF OPERATIONS RESEARCH 2019; 276:35-87. [DOI: 10.1007/s10479-018-2956-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/03/2025]
11
Soyemi J, Isewon I, Oyelade J, Adebiyi E. Inter-Species/Host-Parasite Protein Interaction Predictions Reviewed. Curr Bioinform 2018; 13:396-406. [PMID: 31496926 PMCID: PMC6691774 DOI: 10.2174/1574893613666180108155851] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/13/2017] [Revised: 12/31/2017] [Accepted: 01/02/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND: Host-parasite protein interactions (HPPI) are those interactions occurring between a parasite and its host. Studying host-parasite protein interactions enhances the understanding of how a parasite can infect its host. These interactions play an important role in initiating infections, although not all host-parasite interactions result in infection. Identifying the protein-protein interactions (PPIs) that allow a parasite to infect its host has a lot to do with discovering possible drug targets. Such PPIs, when altered, would prevent the host from being infected by the parasite and, in some cases, leave the parasite unable to complete specific stages of its life cycle, invariably leading to its death. It therefore becomes important to understand the workings of host-parasite interactions, which are the major causes of most infectious diseases. OBJECTIVE: Many studies in the literature have predicted HPPI, mostly using computational methods, with few using experimental methods. Computational methods have proved to be faster and more efficient in manipulating and analyzing real-life data. This study looks at the various computational methods used in the literature for host-parasite/inter-species protein-protein interaction prediction, with the hope of gaining better insight into the methods used and identifying whether machine learning approaches have been extensively used for the same purpose. METHODS: The various methods involved in host-parasite protein interactions were reviewed with their individual strengths. Studies that carried out host-parasite/inter-species protein interaction predictions were tabulated, analyzing their predictive methods, filters used, potential protein-protein interactions discovered, and the validation measurements used, as the case may be. The commonly used measurement indexes for such studies were highlighted, displaying the various formulas.
Finally, future prospects of studies specific to human-Plasmodium falciparum PPI predictions were proposed. RESULT: We discovered that quite a few of the studies reviewed implemented a machine learning approach for HPPI predictions when compared with methods such as sequence homology search and protein structure and domain-motif. The key challenge well noted in HPPI predictions is getting relevant information. CONCLUSION: This review presents useful knowledge and future directions on the subject matter.
Affiliation(s)
- Jumoke Soyemi
  - Department of Computer Science, The Federal Polytechnic, Ilaro, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Itunnuoluwa Isewon
  - Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
12
Nourani E, Khunjush F, Durmuş S. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data. MOLECULAR BIOSYSTEMS 2017; 12:1976-86. [PMID: 27072625 DOI: 10.1039/c6mb00065g] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is in increasing demand because of the scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results, leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvements for accuracy and AUC (area under the receiver operating characteristic curve) results in comparison with state-of-the-art methods.
Affiliation(s)
- Esmaeil Nourani
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Zand Avenue, Shiraz 71348-51154, Iran
- Farshad Khunjush
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Zand Avenue, Shiraz 71348-51154, Iran
  - School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Saliha Durmuş
  - Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
13
Weißenborn S, Walther D. Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling-A Feasibility Study. FRONTIERS IN PLANT SCIENCE 2017; 8:1831. [PMID: 29163570 PMCID: PMC5664361 DOI: 10.3389/fpls.2017.01831] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/20/2017] [Accepted: 10/10/2017] [Indexed: 05/19/2023]
Abstract
Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes.
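As a rough illustration of the profile comparison this abstract describes, the sketch below scores two gene families by the Jaccard similarity of their presence/absence patterns across species. The family names, species labels, and the choice of Jaccard similarity are assumptions made for illustration; the study's own similarity measure and data are not reproduced here.

```python
from typing import Dict, FrozenSet


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two presence sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical presence calls: gene family -> species in which it was detected.
profiles: Dict[str, FrozenSet[str]] = {
    "famA": frozenset({"sp1", "sp2", "sp3", "sp5"}),
    "famB": frozenset({"sp1", "sp2", "sp3", "sp4"}),
    "famC": frozenset({"sp6"}),
}

# Families that co-occur in most species get a high score.
sim_ab = jaccard(profiles["famA"], profiles["famB"])
# Families that never co-occur get a score of zero.
sim_ac = jaccard(profiles["famA"], profiles["famC"])
```

Under this toy scoring, famA and famB would be candidates for a shared pathway while famA and famC would not; any real threshold would have to be calibrated against annotated pathways.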
|
14
|
Abstract
Functional constraints between genes display similar patterns of gain or loss during speciation. Similar phylogenetic profiles, therefore, can be an indication of a functional association between genes. The phylogenetic profiling method has been applied successfully to the reconstruction of gene pathways and the inference of unknown gene functions. This method requires only sequence data to generate phylogenetic profiles. This method therefore has the potential to take advantage of the recent explosion in available sequence data to reveal a significant number of functional associations between genes. Since the initial development of phylogenetic profiling, many modifications to improve this method have been proposed, including improvements in the measurement of profile similarity and the selection of reference species. Here, we describe the existing methods of phylogenetic profiling for the inference of functional associations and discuss their technical limitations and caveats.
Affiliation(s)
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 120-749, South Korea.
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 120-749, South Korea.
|
15
|
Nourani E, Khunjush F, Durmuş S. Computational approaches for prediction of pathogen-host protein-protein interactions. Front Microbiol 2015; 6:94. [PMID: 25759684 PMCID: PMC4338785 DOI: 10.3389/fmicb.2015.00094] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 12/01/2014] [Accepted: 01/26/2015] [Indexed: 12/25/2022]
Abstract
Infectious diseases are still among the major and prevalent health problems, mostly because of the drug resistance of novel variants of pathogens. Molecular interactions between pathogens and their hosts are key parts of infection mechanisms. Novel antimicrobial therapeutics to fight drug resistance are only possible with a thorough understanding of pathogen-host interaction (PHI) systems. Existing databases of experimentally verified PHI data suffer from a scarcity of reported interactions because the experiments are technically challenging and time-consuming. This has motivated many researchers to propose computational approaches for the analysis and prediction of PHIs. The computational methods primarily use sequence information, protein structure and known interactions. Classic machine learning techniques are used when there are sufficient known interactions to serve as training data; otherwise, transfer and multitask learning methods are preferred. Here, we present an overview of these computational approaches for predicting PHI systems, discussing their strengths and weaknesses, along with future directions.
Affiliation(s)
- Esmaeil Nourani
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University Shiraz, Iran
- Farshad Khunjush
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University Shiraz, Iran ; School of Computer Science, Institute for Research in Fundamental Sciences (IPM) Tehran, Iran
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University Kocaeli, Turkey
|
16
|
Ochoa D, Pazos F. Practical aspects of protein co-evolution. Front Cell Dev Biol 2014; 2:14. [PMID: 25364721 PMCID: PMC4207036 DOI: 10.3389/fcell.2014.00014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/21/2014] [Accepted: 04/02/2014] [Indexed: 11/15/2022]
Abstract
Co-evolution is a fundamental aspect of evolutionary theory. At the molecular level, co-evolutionary linkages between protein families have long been used as indicators of protein interactions and functional relationships. Due to the complexity of the problem and the amount of genomic data required for these approaches to perform well, it took a relatively long time from the appearance of the first ideas and concepts to the routine application of these approaches and their incorporation into the standard toolboxes of bioinformaticians and molecular biologists. Today, these methodologies are mature (both in terms of performance and usability/implementation), and the genomic information that feeds them is large enough to allow their general application. This review summarizes the current landscape of co-evolution-based methodologies, with a strong emphasis on interesting cases where their application to important biological systems, alone or in combination with other computational and experimental approaches, provided new insight into those systems.
Affiliation(s)
- David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) Hinxton, UK
- Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC) Madrid, Spain
|
17
|
Wang H, Huang H, Ding C, Nie F. Predicting Protein–Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization. J Comput Biol 2013; 20:344-58. [DOI: 10.1089/cmb.2012.0273] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 12/23/2022]
Affiliation(s)
- Hua Wang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Heng Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Chris Ding
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Feiping Nie
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
|
18
|
Hooda Y, Kim PM. Computational structural analysis of protein interactions and networks. Proteomics 2012; 12:1697-705. [PMID: 22593000 DOI: 10.1002/pmic.201100597] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Abstract
Protein interactions have been at the focus of computational biology in recent years. In particular, interest has come from two different communities--structural and systems biology. Here, we will discuss key systems and structural biology methods that have been used for analysis and prediction of protein-protein interactions and the insight these approaches have provided on the nature and organization of protein-protein interactions inside cells.
Affiliation(s)
- Yogesh Hooda
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
|
19
|
Gomes M, Hamer R, Reinert G, Deane CM. Mutual information and variants for protein domain-domain contact prediction. BMC Res Notes 2012; 5:472. [PMID: 23244412 PMCID: PMC3532072 DOI: 10.1186/1756-0500-5-472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/15/2012] [Accepted: 08/10/2012] [Indexed: 01/20/2023]
Abstract
BACKGROUND: Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein). METHODS: Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physicochemical properties of the amino acids, respectively. RESULTS: We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific "successful" case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random. CONCLUSIONS: All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
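For readers unfamiliar with the underlying measure, the following is a minimal sketch of plain mutual information between two alignment columns, the quantity that the variants named in this abstract build on. The toy columns are invented for illustration, and the corrected variants (MIp, MIc, ZNMI) are not implemented here.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(col_x: Sequence[str], col_y: Sequence[str]) -> float:
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_x) != len(col_y) or len(col_x) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x = Counter(col_x)             # residue counts in column x
    count_y = Counter(col_y)             # residue counts in column y
    count_xy = Counter(zip(col_x, col_y))  # joint counts of residue pairs
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Toy columns (one character per aligned sequence), invented for illustration.
covarying = mutual_information("AAVV", "DDKK")    # residues change together
independent = mutual_information("AAVV", "DKDK")  # no association
```

Perfectly covarying columns reach the entropy of either column (one bit in this toy case), while unrelated columns score zero; real alignments fall in between and are affected by phylogenetic and sampling noise, which is what the corrected variants try to address.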
Affiliation(s)
- Mireille Gomes
- Department of Statistics, University of Oxford, Oxford, UK
|
20
|
Arnold R, Boonen K, Sun MG, Kim PM. Computational analysis of interactomes: current and future perspectives for bioinformatics approaches to model the host-pathogen interaction space. Methods 2012; 57:508-18. [PMID: 22750305 PMCID: PMC7128575 DOI: 10.1016/j.ymeth.2012.06.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 03/13/2012] [Revised: 06/20/2012] [Accepted: 06/21/2012] [Indexed: 11/05/2022]
Abstract
Bacterial and viral pathogens affect their eukaryotic host partly by interacting with proteins of the host cell. Hence, to investigate infection from a systems' perspective we need to construct complete and accurate host-pathogen protein-protein interaction networks. Because of the paucity of available data and the cost associated with experimental approaches, any construction and analysis of such a network in the near future has to rely on computational predictions. Specifically, this challenge consists of a number of sub-problems: First, prediction of possible pathogen interactors (e.g. effector proteins) is necessary for bacteria and protozoa. Second, the prospective host binding partners have to be determined and finally, the impact on the host cell analyzed. This review gives an overview of current bioinformatics approaches to obtain and understand host-pathogen interactions. As an application example of the methods covered, we predict host-pathogen interactions of Salmonella and discuss the value of these predictions as a prospective for further research.
Affiliation(s)
- Roland Arnold
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Kurt Boonen
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Mark G.F. Sun
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada M5S 3E1
- Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3E1
|
21
|
Basu MK, Selengut JD, Haft DH. ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process. BMC Bioinformatics 2011; 12:434. [PMID: 22070167 PMCID: PMC3226654 DOI: 10.1186/1471-2105-12-434] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/21/2011] [Accepted: 11/09/2011] [Indexed: 12/02/2022]
Abstract
Background: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies. Results: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries. Conclusion: ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.
Affiliation(s)
- Malay K Basu
- J. Craig Venter Institute, Rockville, MD 20850, USA.
|
22
|
Zhang YN, Pan XY, Huang Y, Shen HB. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence. J Theor Biol 2011; 283:44-52. [PMID: 21635901 DOI: 10.1016/j.jtbi.2011.05.023] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 11/11/2010] [Revised: 04/20/2011] [Accepted: 05/16/2011] [Indexed: 12/11/2022]
Abstract
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, there are still many difficulties because of lacking enough protein structural and functional information. It is highly desired to develop methods based only on amino acid sequences for predicting PPIs. However, sequence-based predictors are often struggling with the high-dimensionality causing over-fitting and high computational complexity problems, as well as the redundancy of sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower but more condensed space taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is its compressed signal can be reconstructed from far fewer measurements than what is usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy of dealing with high-dimensional protein discrete model and has great potentiality to be extended to deal with many other complicated biological systems.
Affiliation(s)
- Ya-Nan Zhang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
23
|
Nguyen CD, Gardiner KJ, Cios KJ. Protein annotation from protein interaction networks and Gene Ontology. J Biomed Inform 2011; 44:824-9. [PMID: 21571095 DOI: 10.1016/j.jbi.2011.04.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 08/09/2010] [Revised: 04/17/2011] [Accepted: 04/26/2011] [Indexed: 01/12/2023]
Abstract
We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively.
Affiliation(s)
- Cao D Nguyen
- Centre for Diabetes Research, The Western Australian Institute for Medical Research, Australia.
|
24
|
Andreini C, Bertini I, Cavallaro G, Decaria L, Rosato A. A Simple Protocol for the Comparative Analysis of the Structure and Occurrence of Biochemical Pathways Across Superkingdoms. J Chem Inf Model 2011; 51:730-8. [DOI: 10.1021/ci100392q] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Affiliation(s)
- Claudia Andreini
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Ivano Bertini
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Gabriele Cavallaro
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
- Leonardo Decaria
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
|
25
|
Yellaboina S, Tasneem A, Zaykin DV, Raghavachari B, Jothi R. DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res 2010; 39:D730-5. [PMID: 21113022 PMCID: PMC3013741 DOI: 10.1093/nar/gkq1229] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Indexed: 11/20/2022]
Abstract
DOMINE is a comprehensive collection of known and predicted domain–domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new DDIs inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26 219 among 5140 unique Pfam domains, a 23% increase compared to 20 513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will serve as an excellent reference not only to bench scientists testing for new interactions but also to bioinformaticians seeking to predict novel protein–protein interactions based on the DDIs. The contents of DOMINE are available at http://domine.utdallas.edu.
Affiliation(s)
- Sailu Yellaboina
- Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
|
26
|
Abstract
Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46 900 domain interactions predicted by four computational methods: domain phylogenetic profiling, the domain pair exclusion algorithm, correlated mutations and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
Affiliation(s)
- Qibin Luo
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
27
|
Gudimella R, Nallapeta S, Varadwaj P, Suravajhala P. Fungome: Annotating proteins implicated in fungal pathogenesis. Bioinformation 2010; 5:202-7. [PMID: 21364798 PMCID: PMC3040500 DOI: 10.6026/97320630005202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/29/2010] [Accepted: 08/25/2010] [Indexed: 12/03/2022]
Abstract
Sequencing the genomes of different pathogenic fungi has produced a plethora of genetic information. These "omics" data might be of great interest for probing strain diversity, identifying virulence factors and complementary genes in other fungal species, and, importantly, predicting the role of proteins specific to pathogenesis in humans. We propose a component called "fungome" for those fungal proteins implicated in pathogenesis which, we believe, will allow researchers to improve the annotation of fungal proteins.
Affiliation(s)
- Pritish Varadwaj
- Bioinformatics division, Indian Institute of Information Technology, Allahabad 211012, UP, India
- Prashanth Suravajhala
- Department of Science, Systems and Models, Roskilde University, 4000 Roskilde, Denmark
|
28
|
Pan XY, Zhang YN, Shen HB. Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features. J Proteome Res 2010; 9:4992-5001. [PMID: 20698572 DOI: 10.1021/pr100618t] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Indexed: 11/28/2022]
Abstract
Protein-protein interaction (PPI) is at the core of the entire interactomic system of any living organism. Although there are many human protein-protein interaction links being experimentally determined, the number is still relatively very few compared to the estimation that there are ∼300,000 protein-protein interactions in human beings. Hence, it is still urgent and challenging to develop automated computational methods to accurately and efficiently predict protein-protein interactions. In this paper, we propose a novel hierarchical LDA-RF (latent dirichlet allocation-random forest) model to predict human protein-protein interactions from protein primary sequences directly, which is featured by a high success rate and strong ability for handling large-scale data sets by digging the hidden internal structures buried into the noisy amino acid sequences in low dimensional latent semantic space. First, the local sequential features represented by conjoint triads are constructed from sequences. Then the generative LDA model is used to project the original feature space into the latent semantic space to obtain low dimensional latent topic features, which reflect the hidden structures between proteins. Finally, the powerful random forest model is used to predict the probability for interaction of two proteins. Our results show that the proposed latent topic feature is very promising for PPI prediction and could also become a powerful strategy to deal with many other bioinformatics problems. As a web server, LDA-RF is freely available at http://www.csbio.sjtu.edu.cn/bioinf/LR_PPI for academic use.
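The first step named in this abstract, building conjoint-triad features from a sequence, can be sketched as follows. The seven-class amino acid grouping is an assumption taken from how the conjoint triad method is commonly described, the example sequence is invented, and raw counts are returned without the normalization a real pipeline would likely apply. The LDA projection and random forest steps are not shown.

```python
from itertools import product
from typing import List

# Assumed seven-class grouping of the 20 standard amino acids (illustrative).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible triads of class labels, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))


def conjoint_triad_counts(seq: str) -> List[int]:
    """Count each class triad over all windows of three consecutive residues."""
    counts = dict.fromkeys(TRIADS, 0)
    classes = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    for triad in zip(classes, classes[1:], classes[2:]):
        if None not in triad:  # skip windows containing unknown residues
            counts[triad] += 1
    return [counts[t] for t in TRIADS]


# Invented six-residue sequence: yields four overlapping windows.
features = conjoint_triad_counts("MKVLAD")
```

A pair of proteins would then typically be represented by combining the two 343-dimensional vectors before any topic model or classifier is applied; how the original work combines them is not stated in the abstract.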
Affiliation(s)
- Xiao-Yong Pan
- Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, Shanghai, China
|
29
|
Smialowski P, Pagel P, Wong P, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Rattei T, Frishman D, Ruepp A. The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Res 2010; 38:D540-4. [PMID: 19920129 PMCID: PMC2808923 DOI: 10.1093/nar/gkp1026] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.7] [Received: 10/14/2009] [Revised: 10/19/2009] [Accepted: 10/20/2009] [Indexed: 12/25/2022]
Abstract
The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.
Affiliation(s)
- Pawel Smialowski
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Philipp Pagel
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Philip Wong
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Barbara Brauner
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Irmtraud Dunger
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Gisela Fobo
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Goar Frishman
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Corinna Montrone
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Thomas Rattei
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Dmitrij Frishman
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Andreas Ruepp
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU—German Research Center for Environmental Health Ingolstaedter Landstrasse 1, 85764 Neuherberg and Germany Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
30
|
Ouyang YM. The Birth, Development and Applications of Domain-Domain Interaction Databases. Prog Biochem Biophys 2009. [DOI: 10.3724/sp.j.1206.2008.00437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
31
|
Stubben CJ, Duffield ML, Cooper IA, Ford DC, Gans JD, Karlyshev AV, Lingard B, Oyston PCF, de Rochefort A, Song J, Wren BW, Titball RW, Wolinsky M. Steps toward broad-spectrum therapeutics: discovering virulence-associated genes present in diverse human pathogens. BMC Genomics 2009; 10:501. [PMID: 19874620 PMCID: PMC2774872 DOI: 10.1186/1471-2164-10-501] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2009] [Accepted: 10/29/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND New and improved antimicrobial countermeasures are urgently needed to counteract increased resistance to existing antimicrobial treatments and to combat currently untreatable or new emerging infectious diseases. We demonstrate that computational comparative genomics, together with experimental screening, can identify potential generic (i.e., conserved across multiple pathogen species) and novel virulence-associated genes that may serve as targets for broad-spectrum countermeasures. RESULTS Using phylogenetic profiles of protein clusters from completed microbial genome sequences, we identified seventeen protein candidates that are common to diverse human pathogens and absent or uncommon in non-pathogens. Mutants of 13 of these candidates were successfully generated in Yersinia pseudotuberculosis and the potential role of the proteins in virulence was assayed in an animal model. Six candidate proteins are suggested to be involved in the virulence of Y. pseudotuberculosis, none of which have previously been implicated in the virulence of Y. pseudotuberculosis and three have no record of involvement in the virulence of any bacteria. CONCLUSION This work demonstrates a strategy for the identification of potential virulence factors that are conserved across a number of human pathogenic bacterial species, confirming the usefulness of this tool.
Affiliation(s)
- Chris J Stubben
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
|
32
|
Lewis ACF, Saeed R, Deane CM. Predicting protein-protein interactions in the context of protein evolution. MOLECULAR BIOSYSTEMS 2009; 6:55-64. [PMID: 20024067 DOI: 10.1039/b916371a] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
Here we review the methods for the prediction of protein interactions and the ideas in protein evolution that relate to them. The evolutionary assumptions implicit in many of the protein interaction prediction methods are elucidated. We draw attention to the caution needed in deploying certain evolutionary assumptions, in particular cross-organism transfer of interactions by sequence homology, and discuss the known issues in deriving interaction predictions from evidence of co-evolution. We also conject that there is evolutionary knowledge yet to be exploited in the prediction of interactions, in particular the heterogeneity of interactions, the increasing availability of interaction data from multiple species, and the models of protein interaction network growth.
Affiliation(s)
- Anna C F Lewis
- Department of Statistics and Systems Biology DTC, University of Oxford, UK
|
33
|
Björkholm P, Sonnhammer ELL. Comparative analysis and unification of domain–domain interaction networks. Bioinformatics 2009; 25:3020-5. [DOI: 10.1093/bioinformatics/btp522] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
|
34
|
Liu M, Chen XW, Jothi R. Knowledge-guided inference of domain-domain interactions from incomplete protein-protein interaction networks. Bioinformatics 2009; 25:2492-9. [PMID: 19667081 PMCID: PMC2752622 DOI: 10.1093/bioinformatics/btp480] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Motivation: Protein-protein interactions (PPIs), though extremely valuable towards a better understanding of protein functions and cellular processes, do not provide any direct information about the regions/domains within the proteins that mediate the interaction. Most often, it is only a fraction of a protein that directly interacts with its biological partners. Thus, understanding interaction at the domain level is a critical step towards (i) thorough understanding of PPI networks; (ii) precise identification of binding sites; (iii) acquisition of insights into the causes of deleterious mutations at interaction sites; and (iv) most importantly, development of drugs to inhibit pathological protein interactions. In addition, knowledge derived from known domain–domain interactions (DDIs) can be used to understand binding interfaces, which in turn can help discover unknown PPIs. Results: Here, we describe a novel method called K-GIDDI (knowledge-guided inference of DDIs) to narrow down the PPI sites to smaller regions/domains. K-GIDDI constructs an initial DDI network from cross-species PPI networks, and then expands the DDI network by inferring additional DDIs using a divide-and-conquer biclustering algorithm guided by Gene Ontology (GO) information, which identifies partial-complete bipartite sub-networks in the DDI network and makes them complete bipartite sub-networks by adding edges. Our results indicate that K-GIDDI can reliably predict DDIs. Most importantly, K-GIDDI's novel network expansion procedure allows prediction of DDIs that are otherwise not identifiable by methods that rely only on PPI data. Contact: xwchen@ku.edu Availability: http://www.ittc.ku.edu/~xwchen/domainNetwork/ddinet.html Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mei Liu
- Bioinformatics and Computational Life-Sciences Laboratory, ITTC, Department of Electrical Engineering and Computer Science, University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA
|
35
|
Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A. Protein function annotation by homology-based inference. Genome Biol 2009; 10:207. [PMID: 19226439 PMCID: PMC2688287 DOI: 10.1186/gb-2009-10-2-207] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure. With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
Affiliation(s)
- Yaniv Loewenstein
- Department of Biological Chemistry, The Hebrew University of Jerusalem, Sudarsky Center, Jerusalem 91904, Israel
|
36
|
Yellaboina S, Dudekula DB, Ko MS. Prediction of evolutionarily conserved interologs in Mus musculus. BMC Genomics 2008; 9:465. [PMID: 18842131 PMCID: PMC2571111 DOI: 10.1186/1471-2164-9-465] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2008] [Accepted: 10/08/2008] [Indexed: 12/03/2022] Open
Abstract
Background Identification of protein-protein interactions is an important first step toward understanding living systems. High-throughput experimental approaches have accumulated a large amount of information on protein-protein interactions in human and other model organisms. Such interaction information has been successfully transferred to other species in which the experimental data are limited. However, the annotation transfer method could yield false positive interologs due to the lack of conservation of interactions when applied to phylogenetically distant organisms. Results To address this issue, we used the phylogenetic profile method to filter false positives in interologs, based on the notion that evolutionarily conserved interactions show similar patterns of occurrence along the genomes. The approach was applied to Mus musculus, in which the experimentally identified interactions are limited. We first inferred the protein-protein interactions in Mus musculus by using two approaches: i) identifying mouse orthologs of interacting proteins (interologs) based on the experimental protein-protein interaction data from other organisms; and ii) analyzing the frequency of mouse ortholog co-occurrence in predicted operons of bacteria. We then filtered possible false positives in the predicted interactions using the phylogenetic profiles. We found that this filtering method significantly increased the frequency of interacting protein pairs coexpressed in the same cells/tissues in the Gene Expression Omnibus (GEO) database, as well as the frequency of interacting protein pairs sharing similar Gene Ontology (GO) terms for biological processes and cellular localizations. The data support the notion that phylogenetic profiles help to reduce the number of false positives in interologs. Conclusion We have developed a protein-protein interaction database for mouse, which contains 41109 interologs. We have also developed a web interface to facilitate the use of the database.
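The filtering step summarised in this abstract can be illustrated with a minimal Python sketch. It assumes binary presence/absence profiles over a shared, ordered set of genomes; the Jaccard similarity, the 0.6 threshold, the function names, and the toy gene identifiers are all illustrative assumptions and are not taken from the paper.

```python
def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0

def filter_interologs(pairs, profiles, threshold=0.6):
    """Keep predicted pairs whose profiles are at least `threshold` similar.

    The threshold is a hypothetical cut-off chosen for illustration only.
    """
    kept = []
    for a, b in pairs:
        if a in profiles and b in profiles:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                kept.append((a, b))
    return kept

# Hypothetical profiles: 1 = ortholog present in that genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}
candidates = [("geneA", "geneB"), ("geneA", "geneC")]
print(filter_interologs(candidates, profiles))
```

In this toy input, the pair with similar occurrence patterns is retained and the pair with complementary patterns is discarded; a real analysis would need to choose the similarity measure and cut-off against a benchmark.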
Affiliation(s)
- Sailu Yellaboina
- Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA.
|
37
|
Banks E, Nabieva E, Chazelle B, Singh M. Organization of physical interactomes as uncovered by network schemas. PLoS Comput Biol 2008; 4:e1000203. [PMID: 18949022 PMCID: PMC2561054 DOI: 10.1371/journal.pcbi.1000203] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2008] [Accepted: 09/09/2008] [Indexed: 11/18/2022] Open
Abstract
Large-scale protein-protein interaction networks provide new opportunities for understanding cellular organization and functioning. We introduce network schemas to elucidate shared mechanisms within interactomes. Network schemas specify descriptions of proteins and the topology of interactions among them. We develop algorithms for systematically uncovering recurring, over-represented schemas in physical interaction networks. We apply our methods to the S. cerevisiae interactome, focusing on schemas consisting of proteins described via sequence motifs and molecular function annotations and interacting with one another in one of four basic network topologies. We identify hundreds of recurring and over-represented network schemas of various complexity, and demonstrate via graph-theoretic representations how more complex schemas are organized in terms of their lower-order constituents. The uncovered schemas span a wide range of cellular activities, with many signaling and transport related higher-order schemas. We establish the functional importance of the schemas by showing that they correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the H. sapiens interactome, and are useful for predicting protein function. Our findings suggest that network schemas are a powerful paradigm for organizing, interrogating, and annotating cellular networks.
Affiliation(s)
- Eric Banks
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Elena Nabieva
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Bernard Chazelle
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Mona Singh
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
|
38
|
Pazos F, Valencia A. Protein co-evolution, co-adaptation and interactions. EMBO J 2008; 27:2648-55. [PMID: 18818697 PMCID: PMC2556093 DOI: 10.1038/emboj.2008.189] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2008] [Accepted: 08/28/2008] [Indexed: 01/28/2023] Open
Abstract
Co-evolution has an important function in the evolution of species and it is clearly manifested in certain scenarios such as host–parasite and predator–prey interactions, symbiosis and mutualism. The extrapolation of the concepts and methodologies developed for the study of species co-evolution at the molecular level has prompted the development of a variety of computational methods able to predict protein interactions through the characteristics of co-evolution. Particularly successful have been those methods that predict interactions at the genomic level based on the detection of pairs of protein families with similar evolutionary histories (similarity of phylogenetic trees: mirrortree). Future advances in this field will require a better understanding of the molecular basis of the co-evolution of protein families. Thus, it will be important to decipher the molecular mechanisms underlying the similarity observed in phylogenetic trees of interacting proteins, distinguishing direct specific molecular interactions from other general functional constraints. In particular, it will be important to separate the effects of physical interactions within protein complexes (‘co-adaptation') from other forces that, in a less specific way, can also create general patterns of co-evolution.
Affiliation(s)
- Florencio Pazos
- Structure of Macromolecules, Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
|
39
|
Banks E, Nabieva E, Peterson R, Singh M. NetGrep: fast network schema searches in interactomes. Genome Biol 2008; 9:R138. [PMID: 18801179 PMCID: PMC2592716 DOI: 10.1186/gb-2008-9-9-r138] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2008] [Revised: 08/22/2008] [Accepted: 09/18/2008] [Indexed: 11/10/2022] Open
Abstract
NetGrep (http://genomics.princeton.edu/singhlab/netgrep/) is a system for searching protein interaction networks for matches to user-supplied 'network schemas'. Each schema consists of descriptions of proteins (for example, their molecular functions or putative domains) along with the desired topology and types of interactions among them. Schemas can thus describe domain-domain interactions, signaling and regulatory pathways, or more complex network patterns. NetGrep provides an advanced graphical interface for specifying schemas and fast algorithms for extracting their matches.
Affiliation(s)
- Eric Banks
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
- Elena Nabieva
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
- Ryan Peterson
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Current address: Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853, USA
- Mona Singh
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
|
40
|
Karimpour-Fard A, Detweiler CS, Erickson KD, Hunter L, Gill RT. Cross-species cluster co-conservation: a new method for generating protein interaction networks. Genome Biol 2008; 8:R185. [PMID: 17803817 PMCID: PMC2375023 DOI: 10.1186/gb-2007-8-9-r185] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2007] [Revised: 08/30/2007] [Accepted: 09/05/2007] [Indexed: 01/26/2023] Open
Abstract
Cluster Co-Conservation (CCC) has been extended to a method for developing protein interaction networks based on co-conservation between protein pairs across multiple species, Cross-Species Cluster Co-Conservation (CS-CCC). Co-conservation (phylogenetic profiles) is a well-established method for predicting functional relationships between proteins. Several publicly available databases use this method and additional clustering strategies to develop networks of protein interactions (cluster co-conservation (CCC)). CCC has previously been limited to interactions within a single target species. We have extended CCC to develop protein interaction networks based on co-conservation between protein pairs across multiple species, cross-species cluster co-conservation.
Affiliation(s)
- Anis Karimpour-Fard
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Ryan T Gill
- Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO 80309, USA
|
41
|
Abstract
Conserved domains carry many of the functional features found in the proteins of an organism. This includes not only catalytic activity, substrate binding, and structural features but also molecular adapters, which mediate the physical interactions between proteins or proteins with other molecules. In addition, two conserved domains can be linked not by physical contact but by a common function like forming a binding pocket. Although a wealth of experimental data has been collected and carefully curated for protein-protein interactions, as of today little useful data is available from major databases with respect to relations on the domain level. This lack of data makes computational prediction of domain-domain interactions a very important endeavor. In this chapter, we discuss the available experimental data (iPfam) and describe some important approaches to the problem of identifying interacting and/or functionally linked domain pairs from different kinds of input data. Specifically, we will discuss phylogenetic profiling on the level of conserved protein domains on one hand and inference of domain-interactions from observed or predicted protein-protein interactions datasets on the other. We explore the predictive power of these predictions and point out the importance of deploying as many different methods as possible for the best results.
|
42
|
Kensche PR, van Noort V, Dutilh BE, Huynen MA. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J R Soc Interface 2008; 5:151-70. [PMID: 17535793 PMCID: PMC2405902 DOI: 10.1098/rsif.2007.1047] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2007] [Revised: 05/05/2007] [Accepted: 05/05/2007] [Indexed: 11/12/2022] Open
Abstract
The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
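As one example of the kind of profile similarity measure this review compares, mutual information between two binary phylogenetic profiles can be sketched as follows. The abstract does not list the measures evaluated, so the choice of mutual information here, along with the function name and the toy profiles, is an assumption made for illustration.

```python
import math
from collections import Counter

def mutual_information(p, q):
    """Mutual information (in bits) between two equal-length binary profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    n = len(p)
    joint = Counter(zip(p, q))   # counts of (x, y) state pairs
    px = Counter(p)              # marginal counts for the first profile
    qy = Counter(q)              # marginal counts for the second profile
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (qy[y] / n)))
    return mi

# Hypothetical profiles over four genomes (1 = gene present, 0 = absent).
identical = mutual_information([1, 0, 1, 0], [1, 0, 1, 0])
unrelated = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
print(identical, unrelated)
```

Unlike a simple match count, mutual information also scores perfectly complementary profiles highly, which is one reason such measures are usually interpreted alongside other evidence.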
Affiliation(s)
- Philip R. Kensche
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Vera van Noort
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bas E. Dutilh
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Martijn A. Huynen
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
|
43
|
Protein-protein interactions: analysis and prediction. MODERN GENOME ANNOTATION 2008. [PMCID: PMC7120725 DOI: 10.1007/978-3-211-75123-7_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Proteins represent the tools and appliances of the cell — they assemble into larger structural elements, catalyze the biochemical reactions of metabolism, transmit signals, move cargo across membrane boundaries and carry out many other tasks. For most of these functions proteins cannot act in isolation but require close cooperation with other proteins to accomplish their task. Often, this collaborative action implies physical interaction of the proteins involved. Accordingly, experimental detection, in silico prediction and computational analysis of protein-protein interactions (PPI) have attracted great attention in the quest for discovering functional links among proteins and deciphering the complex networks of the cell.
|
44
|
Nguyen CD, Gardiner KJ, Nguyen D, Cios KJ. Prediction of Protein Functions from Protein Interaction Networks: A Naïve Bayes Approach. PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE 2008. [DOI: 10.1007/978-3-540-89197-0_73] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
45
|
Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007; 8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 361] [Impact Index Per Article: 20.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
46
|
Pagel P, Oesterheld M, Tovstukhina O, Strack N, Stümpflen V, Frishman D. DIMA 2.0--predicted and known domain interactions. Nucleic Acids Res 2007; 36:D651-5. [PMID: 17999995 PMCID: PMC2238836 DOI: 10.1093/nar/gkm996] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
DIMA, the domain interaction map, has evolved from a simple web server for domain phylogenetic profiling into an integrative prediction resource combining both experimental data on domain–domain interactions and predictions from two different algorithms. With this update, DIMA obtains greatly improved coverage at the level of genomes and domains as well as with respect to available prediction approaches. The domain phylogenetic profiling method now uses SIMAP as its backend for exhaustive domain hit coverage: 7038 Pfam domains were profiled over 460 completely sequenced genomes. Domain pair exclusion predictions were produced from 83 969 distinct protein–protein interactions obtained from IntAct resulting in 21 513 domain pairs with significant domain pair exclusion algorithm scores. Additional predictions applying the same algorithm to predicted protein interactions from STRING yielded 2378 high-confidence pairs. Experimental data comes from iPfam (3074) and 3did (3034 pairs), two databases identifying domain contacts in solved protein structures. Taken together, these two resources yielded 3653 distinct interacting domain pairs. DIMA is available at http://mips.gsf.de/genre/proj/dima.
Affiliation(s)
- Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Wissenschaftszentrum Weihenstephan, Technische Universität München, Am Forum 1, D-85350 Freising, Germany.
|
47
|
Ranea JAG, Yeats C, Grant A, Orengo CA. Predicting protein function with hierarchical phylogenetic profiles: the Gene3D Phylo-Tuner method applied to eukaryotic genomes. PLoS Comput Biol 2007; 3:e237. [PMID: 18052542 PMCID: PMC2098864 DOI: 10.1371/journal.pcbi.0030237] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2007] [Accepted: 10/17/2007] [Indexed: 11/17/2022] Open
Abstract
“Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity—from 30% to 100%—and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes. 
The vast number of protein sequences being determined by the international genomics projects means that it is not possible to functionally characterise all the proteins through direct experimentation. One of the more successful electronic methods for detecting functionally associated genes has been through the comparison of genes' phylogenetic profiles. This method is based on the hypothesis that two functionally related genes will show very similar presence–absence profile patterns throughout different organisms. Whilst these methods have grown increasingly sophisticated, they have largely been based on detecting functionally homologous genes in different species (technically known as orthologous genes) and thus better suited to prokaryotic genomes, where this can be done more easily. We have developed a new type of hierarchical phylogenetic profile by subdividing protein families into subclusters in different sequence identity levels. This new approach encapsulates a more realistic model of the functional variation that uneven natural selection pressure produces on different protein families and organisms, and it can detect functional relationships between protein families without the initial application of rigid sequence similarity thresholds or complex protocols for orthology assignment. These advantages are especially useful in eukaryotes since the larger average size of eukaryotic multigene families makes them more prone to orthology mis-assignment than in prokaryotes.
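The comparison of copy-number profiles by normalised Euclidean distance described in this abstract can be illustrated with a minimal Python sketch. It assumes each superfamily subcluster is represented by a vector of domain copy numbers over the same ordered genomes; the z-score normalisation, the function names, and the toy counts are illustrative assumptions and do not reproduce the Gene3D Phylo-Tuner implementation.

```python
import math

def normalise(profile):
    """Scale a copy-number profile to zero mean and unit variance (assumed scheme)."""
    n = len(profile)
    mean = sum(profile) / n
    var = sum((x - mean) ** 2 for x in profile) / n
    if var == 0:
        # A flat profile carries no copy-number signal to compare.
        return [0.0] * n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in profile]

def profile_distance(a, b):
    """Euclidean distance between two normalised profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))

# Hypothetical copy numbers of three superfamilies in five genomes.
fam_a = [1, 2, 4, 8, 3]
fam_b = [2, 4, 8, 16, 6]   # proportional to fam_a: correlated copy-number changes
fam_c = [5, 5, 1, 0, 7]    # changes in the opposite direction

print(profile_distance(fam_a, fam_b))
print(profile_distance(fam_a, fam_c))
```

Normalising before measuring distance means that two families whose copy numbers rise and fall together score as close even when their absolute counts differ; in the hierarchical scheme the abstract describes, such a comparison would be repeated at each sequence identity level.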
Affiliation(s)
- Juan A G Ranea
- Department of Biochemistry and Molecular Biology, University College London, London, United Kingdom.
|
48
|
Raghavachari B, Tasneem A, Przytycka TM, Jothi R. DOMINE: a database of protein domain interactions. Nucleic Acids Res 2007; 36:D656-61. [PMID: 17913741 PMCID: PMC2238965 DOI: 10.1093/nar/gkm761] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
DOMINE is a database of known and predicted protein domain interactions compiled from a variety of sources. The database contains domain–domain interactions observed in PDB entries, and those that were predicted by eight different computational approaches. DOMINE contains a total of 20 513 unique domain–domain interactions among 4036 Pfam domains, out of which 4349 are inferred from PDB entries and 17 781 were predicted by at least one computational approach. This database will serve as a valuable resource to those working in the field of protein and domain interactions. DOMINE may not only serve as a reference to experimentalists who test for new protein and domain interactions, but also offers a consolidated dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that control the topological structure of interaction networks. DOMINE is freely available at http://domine.utdallas.edu.
Affiliation(s)
- Balaji Raghavachari
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA, 10401 Grosvenor Pl, Rockville Pike, MD 20852, USA and National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Asba Tasneem
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA, 10401 Grosvenor Pl, Rockville Pike, MD 20852, USA and National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Teresa M. Przytycka
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA, 10401 Grosvenor Pl, Rockville Pike, MD 20852, USA and National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Raja Jothi
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA, 10401 Grosvenor Pl, Rockville Pike, MD 20852, USA and National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- *To whom correspondence should be addressed. +1 301 402 8221; +1 301 480 4637
|
49
|
Computational prediction of protein-protein interactions. Mol Biotechnol 2007; 38:1-17. [PMID: 18095187 DOI: 10.1007/s12033-007-0069-2] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.0] [Received: 07/04/2007] [Accepted: 07/16/2007] [Indexed: 01/19/2023]
Abstract
Recently a number of computational approaches have been developed for the prediction of protein-protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
|
50
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|