1
|
Chung HC, Friedberg I, Bromberg Y. Assembling bacterial puzzles: piecing together functions into microbial pathways. NAR Genom Bioinform 2024; 6:lqae109. [PMID: 39184378 PMCID: PMC11344244 DOI: 10.1093/nargab/lqae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2024] [Revised: 07/24/2024] [Accepted: 08/07/2024] [Indexed: 08/27/2024] Open
Abstract
Functional metagenomics enables the study of unexplored bacterial diversity, gene families, and pathways essential to microbial communities. However, discovering biological insights with these data is impeded by the scarcity of quality annotations. Here, we use a co-occurrence-based analysis of predicted microbial protein functions to uncover pathways in genomic and metagenomic biological systems. Our approach, based on phylogenetic profiles, improves the identification of functional relationships, or participation in the same biochemical pathway, between enzymes over a comparable homology-based approach. We optimized the design of our profiles to identify potential pathways using minimal data, clustered functionally related enzyme pairs into multi-enzymatic pathways, and evaluated our predictions against reference pathways in the KEGG database. We then demonstrated a novel extension of this approach to predict inter-bacterial protein interactions amongst members of a marine microbiome. Most significantly, we show our method predicts emergent biochemical pathways between known and unknown functions. Thus, our work establishes a basis for identifying the potential functional capacities of the entire metagenome, capturing previously unknown and abstract functions into discrete putative pathways.
Collapse
Affiliation(s)
- Henri C Chung
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011 , USA
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
| | - Yana Bromberg
- Department of Computer Science, Emory University, Atlanta, GA 30307, USA
- Department of Biology, Emory University, Atlanta, GA 30322, USA
| |
Collapse
|
2
|
Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, Moore LR, Subhraveti P, Gama-Castro S, Tierrafria VH, Lara P, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Sun G, Ahn-Horst TA, Choi H, Covert MW, Collado-Vides J, Paulsen I. The EcoCyc Database (2023). EcoSal Plus 2023; 11:eesp00022023. [PMID: 37220074 PMCID: PMC10729931 DOI: 10.1128/ecosalplus.esp-0002-2023] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 04/04/2023] [Indexed: 01/28/2024]
Abstract
EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Collapse
Affiliation(s)
- Peter D. Karp
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Suzanne Paley
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Ron Caspi
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Anamika Kothari
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Markus Krummenacker
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Peter E. Midford
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Lisa R. Moore
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Pallavi Subhraveti
- Bioinformatics Research Group, SRI International, Menlo Park, California, USA
| | - Socorro Gama-Castro
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Victor H. Tierrafria
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Paloma Lara
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Luis Muñiz-Rascado
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - César Bonavides-Martinez
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Alberto Santos-Zavaleta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Amanda Mackie
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Gwanggyu Sun
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Travis A. Ahn-Horst
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Heejo Choi
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Markus W. Covert
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
| | - Ian Paulsen
- School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
| |
Collapse
|
3
|
Muley VY. Search, Retrieve, Visualize, and Analyze Protein-Protein Interactions from Multiple Databases: A Guide for Experimental Biologists. Methods Mol Biol 2023; 2690:429-443. [PMID: 37450164 DOI: 10.1007/978-1-0716-3327-4_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/18/2023]
Abstract
Functional annotation is lacking for over half of the proteins encoded in genomes and model or representative organisms are not an exception to this trend. One of the popular ways of assigning putative functions to uncharacterized proteins is based on the functions of well-characterized proteins that physically interact with them, i.e., guilt-by-association or functional context approach. In the last two decades, several powerful experimental and computational techniques have been used to determine protein-protein interactions (PPIs) at genome level and are made available through many public databases. The PPI data are often complex and heterogeneously represented across databases posing unique challenges in retrieving, integrating, and analyzing the data even for trained computational biologists, the end users-experimental biologists often struggle to work around the data for the protein of their interests. This chapter provides stepwise protocols to import interaction network of the protein of interest in Cytoscape using PSICQUIC, stringApp, and IntAct App. These are next-generation applications that import PPI from multiple databases/resources and provide seamless functions to study the protein of interest and its functional context directly in Cytoscape.
Collapse
Affiliation(s)
- Vijaykumar Yogesh Muley
- Independent Researcher, Jijamata Nagar, Hingoli, India.
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México.
| |
Collapse
|
4
|
Muley VY, König R. Human transcriptional gene regulatory network compiled from 14 data resources. Biochimie 2021; 193:115-125. [PMID: 34740743 DOI: 10.1016/j.biochi.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2021] [Revised: 10/28/2021] [Accepted: 10/29/2021] [Indexed: 11/02/2022]
Abstract
The transcriptional regulatory network (TRN) in a cell orchestrates spatio-temporal expression of genes to generate cellular responses for maintenance, reproduction, development and survival of the cell and its hosting organism. Transcription factors (TF) regulate the expression of their target genes (TG) and are the fundamental units of TRN. Several databases have been developed to catalogue human TRN based on low- and high-throughput experimental and computational studies considering their importance in understanding cellular physiology. However, literature lacks their comparative assessment on the strengths and weaknesses. We compared over 2.2 million regulatory pairs between 1379 TF and 22,518 TG from 14 resources. Our study reveals that the TF and TG were common across data resources but not their regulatory pairs. TF and TG of the regulatory pairs showed weak expression correlation, significant gene ontology overlap, co-citations in PubMed and low numbers of TF-TG pairs representing transcriptional repression relationships. We assigned each TF-TG regulatory pair a combined confidence score reflecting its reliability based on its presence in multiple databases. The assembled TRN contains 2,246,598 TF-TG pairs, of which, 44,284 with information on TF's activating or repressing effects on their TG and is available upon request. This study brings the information about transcriptional regulation scattered over the literature and databases at one place in the form of one of the most comprehensive and complete human TRN assembled to date. It will be a valuable resource for benchmarking TRN prediction tools, and to the scientific community working in functional genomics, gene expression and regulation analysis.
Collapse
Affiliation(s)
| | - Rainer König
- Institute for Infectious Diseases and Infection Control, Jena University Hospital, Jena, Germany; Integrated Research and Treatment Center, Center for Sepsis Control and Care, Jena University Hospital, Jena, Germany.
| |
Collapse
|
5
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By the efficient way of integrating experimental findings in discovering PPIs and computational techniques for prediction, the researchers have been able to gain many valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable the researchers to establish, annotate, and analyze biological networks. We here review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Collapse
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| |
Collapse
|
6
|
Li F, Zhu F, Ling X, Liu Q. Protein Interaction Network Reconstruction Through Ensemble Deep Learning With Attention Mechanism. Front Bioeng Biotechnol 2020; 8:390. [PMID: 32432096 PMCID: PMC7215070 DOI: 10.3389/fbioe.2020.00390] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 04/07/2020] [Indexed: 11/13/2022] Open
Abstract
Protein interactions play an essential role in studying living systems and life phenomena. A considerable amount of literature has been published on analyzing and predicting protein interactions, such as support vector machine method, homology-based method and similarity-based method, each has its pros and cons. Most existing methods for predicting protein interactions require prior domain knowledge, making it difficult to effectively extract protein features. Single method is dissatisfactory in predicting protein interactions, declaring the need for a comprehensive method that combines the advantages of various methods. On this basis, a deep ensemble learning method called EnAmDNN (Ensemble Deep Neural Networks with Attention Mechanism) is proposed to predict protein interactions which is an appropriate candidate for comprehensive learning, combining multiple models, and considering the advantages of various methods. Particularly, it encode protein sequences by the local descriptor, auto covariance, conjoint triad, pseudo amino acid composition and combine the vector representation of each protein in the protein interaction network. Then it takes advantage of the multi-layer convolutional neural networks to automatically extract protein features and construct an attention mechanism to analyze deep-seated relationships between proteins. We set up four different structures of deep learning models. In the ensemble learning model, second layer data sets are generated with five-fold cross validation from basic learners, then predict the protein interaction network by combining 16 models. Results on five independent PPI data sets demonstrate that EnAmDNN achieves superior prediction performance than other comparing methods.
Collapse
Affiliation(s)
- Feifei Li
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, China.,Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
| | - Xinghong Ling
- School of Computer Science and Technology, Soochow University, Suzhou, China
| | - Quan Liu
- School of Computer Science and Technology, Soochow University, Suzhou, China
| |
Collapse
|
7
|
Muley VY, Akhter Y, Galande S. PDZ Domains Across the Microbial World: Molecular Link to the Proteases, Stress Response, and Protein Synthesis. Genome Biol Evol 2019; 11:644-659. [PMID: 30698789 PMCID: PMC6411480 DOI: 10.1093/gbe/evz023] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/25/2019] [Indexed: 02/07/2023] Open
Abstract
The PSD-95/Dlg-A/ZO-1 (PDZ) domain is highly expanded, diversified, and well distributed across metazoa where it assembles diverse signaling components by virtue of interactions with other proteins in a sequence-specific manner. In contrast, in the microbial world they are reported to be involved in protein quality control during stress response. The distribution, functions, and origins of PDZ domain-containing proteins in the prokaryotic organisms remain largely unexplored. We analyzed 7,852 PDZ domain-containing proteins in 1,474 microbial genomes in this context. PDZ domain-containing proteins from planctomycetes, myxobacteria, and other eubacteria occupying terrestrial and aquatic niches are found to be in multiple copies within their genomes. Over 93% of the 7,852 PDZ domain-containing proteins were classified into 12 families including six novel families based on additional structural and functional domains present in these proteins. The higher PDZ domain encoding capacity of the investigated organisms was observed to be associated with adaptation to the ecological niche where multicellular life might have originated and flourished. Predicted subcellular localization of PDZ domain-containing proteins and their genomic context argue in favor of crucial roles in translation and membrane remodeling during stress response. Based on rigorous sequence, structure, and phylogenetic analyses, we propose that the highly diverse PDZ domain of the uncharacterized Fe-S oxidoreductase superfamily, exclusively found in gladobacteria and several anaerobes and acetogens, might represent the most ancient form among all the existing PDZ domains.
Collapse
Affiliation(s)
- Vijaykumar Yogesh Muley
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México
- Department of Biology, Indian Institute of Science Education and Research, Pune, India
| | - Yusuf Akhter
- Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Lucknow, India
| | - Sanjeev Galande
- Department of Biology, Indian Institute of Science Education and Research, Pune, India
| |
Collapse
|
8
|
Karp PD, Ong WK, Paley S, Billington R, Caspi R, Fulcher C, Kothari A, Krummenacker M, Latendresse M, Midford PE, Subhraveti P, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2018; 8:10.1128/ecosalplus.ESP-0006-2018. [PMID: 30406744 PMCID: PMC6504970 DOI: 10.1128/ecosalplus.esp-0006-2018] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2018] [Indexed: 01/28/2023]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Collapse
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Wai Kit Ong
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Suzanne Paley
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | | | - Ron Caspi
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Carol Fulcher
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Anamika Kothari
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | | | - Mario Latendresse
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Peter E Midford
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | | | - Socorro Gama-Castro
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Luis Muñiz-Rascado
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - César Bonavides-Martinez
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Alberto Santos-Zavaleta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Amanda Mackie
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| | - Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Ingrid M Keseler
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Ian Paulsen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| |
Collapse
|
9
|
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. MICROBIOME 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. RESULTS We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models. CONCLUSIONS In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Collapse
Affiliation(s)
- Vedrana Vidulin
- Faculty of Information Studies, 8000 Novo Mesto, Slovenia
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
| | - Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
| | - Sašo Džeroski
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
| | - Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
| |
Collapse
|
10
|
Computational Approaches for Predicting Binding Partners, Interface Residues, and Binding Affinity of Protein-Protein Complexes. Methods Mol Biol 2017; 1484:237-253. [PMID: 27787830 DOI: 10.1007/978-1-4939-6406-2_16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Studying protein-protein interactions leads to a better understanding of the underlying principles of several biological pathways. Cost and labor-intensive experimental techniques suggest the need for computational methods to complement them. Several such state-of-the-art methods have been reported for analyzing diverse aspects such as predicting binding partners, interface residues, and binding affinity for protein-protein complexes with reliable performance. However, there are specific drawbacks for different methods that indicate the need for their improvement. This review highlights various available computational algorithms for analyzing diverse aspects of protein-protein interactions and endorses the necessity for developing new robust methods for gaining deep insights about protein-protein interactions.
Collapse
|
11
|
Kara A, Vickers M, Swain M, Whitworth DE, Fernandez-Fuentes N. Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor. BMC Bioinformatics 2015; 16:297. [PMID: 26384938 PMCID: PMC4575426 DOI: 10.1186/s12859-015-0741-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2015] [Accepted: 09/16/2015] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Two component systems (TCS) are signalling complexes manifested by a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, often requiring costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information. RESULTS We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross validation, to act as a gold standard for TCS partnership predictions. Combining individual predictions using MetaPred2CS improved performance when compared to the individual methods and in comparison with a current state-of-the-art meta-predictor. CONCLUSION We have developed MetaPred2CS, a support vector machine-based metapredictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is a strategy of integrating individual predictors that improves the overall prediction accuracy, with the in-silico two-hybrid method contributing most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs.
Collapse
Affiliation(s)
- Altan Kara
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
| | - Martin Vickers
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
| | - Martin Swain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
| | - David E Whitworth
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
| | - Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK.
| |
Collapse
|
12
|
Ochoa D, Juan D, Valencia A, Pazos F. Detection of significant protein coevolution. ACTA ACUST UNITED AC 2015; 31:2166-73. [PMID: 25717190 DOI: 10.1093/bioinformatics/btv102] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2014] [Accepted: 02/11/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals. RESULTS In this work, we developed a method for associating confidence estimators (P values) to the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach largely improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution in all the stages of the mirrortree workflow, independently of the starting genomic information. This not only leads to a better understanding of protein coevolution and its biological implications, but also to obtain a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes using only genomic information. AVAILABILITY AND IMPLEMENTATION The software and datasets used in this work are freely available at: http://csbg.cnb.csic.es/pMT/. CONTACT pazos@cnb.csic.es SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- David Ochoa
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
| | - David Juan
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
| | - Alfonso Valencia
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
| | - Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
| |
Collapse
|
13
|
Škunca N, Dessimoz C. Phylogenetic profiling: how much input data is enough? PLoS One 2015; 10:e0114701. [PMID: 25679783 PMCID: PMC4332489 DOI: 10.1371/journal.pone.0114701] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2014] [Accepted: 11/10/2014] [Indexed: 12/04/2022] Open
Abstract
Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Much of the recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy in terms of the contribution of additional genomes and of additional annotations, we observed diminishing returns in adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings are supported in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
Collapse
Affiliation(s)
- Nives Škunca
- ETH Zürich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
- University College London, Gower St, London WC1E 6BT, UK
- * E-mail: (NS), (CD)
| | - Christophe Dessimoz
- Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
- University College London, Gower St, London WC1E 6BT, UK
- * E-mail: (NS), (CD)
| |
Collapse
|
14
|
Keseler IM, Skrzypek M, Weerasinghe D, Chen AY, Fulcher C, Li GW, Lemmer KC, Mladinich KM, Chow ED, Sherlock G, Karp PD. Curation accuracy of model organism databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau058. [PMID: 24923819 PMCID: PMC4207230 DOI: 10.1093/database/bau058] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Manual extraction of information from the biomedical literature-or biocuration-is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as golden standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org//
Collapse
Affiliation(s)
- Ingrid M Keseler
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Marek Skrzypek
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Deepika Weerasinghe
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Albert Y Chen
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Carol Fulcher
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Gene-Wei Li
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Kimberly C Lemmer
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Katherine M Mladinich
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Edmond D Chow
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Gavin Sherlock
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| | - Peter D Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
| |
Collapse
|
15
|
Karp PD, Weaver D, Paley S, Fulcher C, Kubo A, Kothari A, Krummenacker M, Subhraveti P, Weerasinghe D, Gama-Castro S, Huerta AM, Muñiz-Rascado L, Bonavides-Martinez C, Weiss V, Peralta-Gil M, Santos-Zavaleta A, Schröder I, Mackie A, Gunsalus R, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2014; 6:10.1128/ecosalplus.ESP-0009-2013. [PMID: 26442933 PMCID: PMC4243172 DOI: 10.1128/ecosalplus.esp-0009-2013] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2014] [Indexed: 11/20/2022]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.
Collapse
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Daniel Weaver
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Suzanne Paley
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Carol Fulcher
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Aya Kubo
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Anamika Kothari
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | | | | | | | - Socorro Gama-Castro
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Araceli M Huerta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Luis Muñiz-Rascado
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - César Bonavides-Martinez
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Verena Weiss
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Martin Peralta-Gil
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Alberto Santos-Zavaleta
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Imke Schröder
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
- UCLA Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095
| | - Amanda Mackie
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| | - Robert Gunsalus
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
| | - Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
| | - Ingrid M Keseler
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
| | - Ian Paulsen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| |
Collapse
|
16
|
Ochoa D, Pazos F. Practical aspects of protein co-evolution. Front Cell Dev Biol 2014; 2:14. [PMID: 25364721 PMCID: PMC4207036 DOI: 10.3389/fcell.2014.00014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2014] [Accepted: 04/02/2014] [Indexed: 11/15/2022] Open
Abstract
Co-evolution is a fundamental aspect of Evolutionary Theory. At the molecular level, co-evolutionary linkages between protein families have been used as indicators of protein interactions and functional relationships from long ago. Due to the complexity of the problem and the amount of genomic data required for these approaches to achieve good performances, it took a relatively long time from the appearance of the first ideas and concepts to the quotidian application of these approaches and their incorporation to the standard toolboxes of bioinformaticians and molecular biologists. Today, these methodologies are mature (both in terms of performance and usability/implementation), and the genomic information that feeds them large enough to allow their general application. This review tries to summarize the current landscape of co-evolution-based methodologies, with a strong emphasis on describing interesting cases where their application to important biological systems, alone or in combination with other computational and experimental approaches, allowed getting new insight into these.
Collapse
Affiliation(s)
- David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) Hinxton, UK
| | - Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC) Madrid, Spain
| |
Collapse
|
17
|
Zahiri J, Bozorgmehr JH, Masoudi-Nejad A. Computational Prediction of Protein-Protein Interaction Networks: Algo-rithms and Resources. Curr Genomics 2014; 14:397-414. [PMID: 24396273 PMCID: PMC3861891 DOI: 10.2174/1389202911314060004] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2013] [Revised: 08/07/2013] [Accepted: 08/26/2013] [Indexed: 01/15/2023] Open
Abstract
Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true in case of the diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental results in finding protein-protein interactions, however, is rather dubious and high throughput experimental results have shown both high false positive beside false negative information for protein interaction. Computational methods have attracted tremendous attention among biologists because of the ability to predict protein-protein interactions and validate the obtained experimental results. In this study, we have reviewed several computational methods for protein-protein interaction prediction as well as describing major databases, which store both predicted and detected protein-protein interactions, and the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability.
Collapse
Affiliation(s)
- Javad Zahiri
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
| | - Joseph Hannon Bozorgmehr
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
| |
Collapse
|
18
|
Folador EL, Hassan SS, Lemke N, Barh D, Silva A, Ferreira RS, Azevedo V. An improved interolog mapping-based computational prediction of protein–protein interactions with increased network coverage. Integr Biol (Camb) 2014; 6:1080-7. [DOI: 10.1039/c4ib00136b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Automated and efficient methods that map ortholog interactions from several organisms and public databases (pDB) are needed to identify new interactions in an organism of interest (interolog mapping).
Collapse
Affiliation(s)
- Edson Luiz Folador
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Syed Shah Hassan
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Ney Lemke
- Laboratory of Bioinformatic and Computational Biofisic
- Instituto de Biociência
- Universidade Estadual de São Paulo (UNESP)
- Botucatu, Brazil
| | - Debmalya Barh
- Centre for Genomics and Applied Gene Technology
- Institute of Integrative Omics and Applied Biotechnology (IIOAB)
- Purba Medinipur, India
| | - Artur Silva
- Instituto de Ciências Biológicas
- Universidade Federal do Para
- Belém, Brazil
| | - Rafaela Salgado Ferreira
- Department of Biochemistry and Immunology
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| | - Vasco Azevedo
- Department of General Biology
- Instituto de Ciências Biológicas (ICB)
- Federal University of Minas Gerais (UFMG)
- Belo Horizonte, Brazil
| |
Collapse
|
19
|
Zhou H, Jakobsson E. Predicting protein-protein interaction by the mirrortree method: possibilities and limitations. PLoS One 2013; 8:e81100. [PMID: 24349035 PMCID: PMC3862474 DOI: 10.1371/journal.pone.0081100] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2013] [Accepted: 10/11/2013] [Indexed: 12/02/2022] Open
Abstract
Molecular co-evolution analysis as a sequence-only based method has been used to predict protein-protein interactions. In co-evolution analysis, Pearson's correlation within the mirrortree method is a well-known way of quantifying the correlation between protein pairs. Here we studied the mirrortree method on both known interacting protein pairs and sets of presumed non-interacting protein pairs, to evaluate the utility of this correlation analysis method for predicting protein-protein interactions within eukaryotes. We varied metrics for computing evolutionary distance and evolutionary span of the species analyzed. We found the differences between co-evolutionary correlation scores of the interacting and non-interacting proteins, normalized for evolutionary span, to be significantly predictive for proteins conserved over a wide range of eukaryotic clades (from mammals to fungi). On the other hand, for narrower ranges of evolutionary span, the predictive power was much weaker.
Collapse
Affiliation(s)
- Hua Zhou
- Department of Biochemistry, University of Illinois, Urbana-Champaign, Illinois, United States of America
| | - Eric Jakobsson
- Department of Biochemistry, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Beckman Institute, National Center for Supercomputing Applications, Program in Biophysics and Computational Biology, Department of Molecular and Integrative Physiology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- * E-mail:
| |
Collapse
|
20
|
Muley VY, Ranjan A. Evaluation of physical and functional protein-protein interaction prediction methods for detecting biological pathways. PLoS One 2013; 8:e54325. [PMID: 23349851 PMCID: PMC3547882 DOI: 10.1371/journal.pone.0054325] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2012] [Accepted: 12/11/2012] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Cellular activities are governed by the physical and the functional interactions among several proteins involved in various biological pathways. With the availability of sequenced genomes and high-throughput experimental data one can identify genome-wide protein-protein interactions using various computational techniques. Comparative assessments of these techniques in predicting protein interactions have been frequently reported in the literature but not their ability to elucidate a particular biological pathway. METHODS Towards the goal of understanding the prediction capabilities of interactions among the specific biological pathway proteins, we report the analyses of 14 biological pathways of Escherichia coli catalogued in KEGG database using five protein-protein functional linkage prediction methods. These methods are phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity. CONCLUSIONS Our results reveal that the prediction of metabolic pathway protein interactions continues to be a challenging task for all methods which possibly reflect flexible/independent evolutionary histories of these proteins. These methods have predicted functional associations of proteins involved in amino acids, nucleotide, glycans and vitamins & co-factors pathways slightly better than the random performance on carbohydrate, lipid and energy metabolism. We also make similar observations for interactions involved among the environmental information processing proteins. On the contrary, genetic information processing or specialized processes such as motility related protein-protein linkages that occur in the subset of organisms are predicted with comparable accuracy. Metabolic pathways are best predicted by using neighborhood of orthologous genes whereas phyletic pattern is good enough to reconstruct central dogma pathway protein interactions. We have also shown that the effective use of a particular prediction method depends on the pathway under investigation. In case one is not focused on specific pathway, gene expression similarity method is the best option.
Collapse
Affiliation(s)
- Vijaykumar Yogesh Muley
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
| | - Akash Ranjan
- Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
- * E-mail:
| |
Collapse
|
21
|
Rezende AM, Folador EL, Resende DDM, Ruiz JC. Computational prediction of protein-protein interactions in Leishmania predicted proteomes. PLoS One 2012; 7:e51304. [PMID: 23251492 PMCID: PMC3519578 DOI: 10.1371/journal.pone.0051304] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2012] [Accepted: 10/31/2012] [Indexed: 11/18/2022] Open
Abstract
The Trypanosomatids parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite of years of study and genome availability, effective vaccine has not been developed yet, and the chemotherapy is highly toxic. Therefore, it is clear just interdisciplinary integrated studies will have success in trying to search new targets for developing of vaccines and drugs. An essential part of this rationale is related to protein-protein interaction network (PPI) study which can provide a better understanding of complex protein interactions in biological system. Thus, we modeled PPIs for Trypanosomatids through computational methods using sequence comparison against public database of protein or domain interaction for interaction prediction (Interolog Mapping) and developed a dedicated combined system score to address the predictions robustness. The confidence evaluation of network prediction approach was addressed using gold standard positive and negative datasets and the AUC value obtained was 0.94. As result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum respectively. For each predicted network the top 20 proteins were ranked by MCC topological index. In addition, information related with immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity which is a key when one is interested in destabilizing the PPIs for drug or vaccine purposes along with multiple alignments of the predicted PPIs were performed revealing patterns associated with protein turnover. In addition, around 50% of hypothetical protein present in the networks received some degree of functional annotation which represents an important contribution since approximately 60% of Leishmania predicted proteomes has no predicted function.
Collapse
Affiliation(s)
- Antonio M. Rezende
- Laboratório de Parasitologia Celular e Molecular, Centro de Pesquisa René Rachou – FIOCRUZ, Belo Horizonte, Minas Gerais, Brazil
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (AMR); (JCR)
| | - Edson L. Folador
- Laboratório de Parasitologia Celular e Molecular, Centro de Pesquisa René Rachou – FIOCRUZ, Belo Horizonte, Minas Gerais, Brazil
- Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Rio de Janeiro, Brazil
| | - Daniela de M. Resende
- Laboratório de Parasitologia Celular e Molecular, Centro de Pesquisa René Rachou – FIOCRUZ, Belo Horizonte, Minas Gerais, Brazil
- Laboratório de Pesquisas Clínicas, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil
| | - Jeronimo C. Ruiz
- Laboratório de Parasitologia Celular e Molecular, Centro de Pesquisa René Rachou – FIOCRUZ, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (AMR); (JCR)
| |
Collapse
|