1
Sahoo A, Pechmann S. Functional network motifs defined through integration of protein-protein and genetic interactions. PeerJ 2022; 10:e13016. [PMID: 35223214 PMCID: PMC8877332 DOI: 10.7717/peerj.13016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Accepted: 02/06/2022] [Indexed: 01/11/2023]
Abstract
Cells are enticingly complex systems. The identification of feedback regulation is critically important for understanding this complexity. Network motifs defined as small graphlets that occur more frequently than expected by chance have revolutionized our understanding of feedback circuits in cellular networks. However, with their definition solely based on statistical over-representation, network motifs often lack biological context, which limits their usefulness. Here, we define functional network motifs (FNMs) through the systematic integration of genetic interaction data that directly inform on functional relationships between genes and encoded proteins. Occurring two orders of magnitude less frequently than conventional network motifs, we found FNMs significantly enriched in genes known to be functionally related. Moreover, our comprehensive analyses of FNMs in yeast showed that they are powerful at capturing both known and putative novel regulatory interactions, thus suggesting a promising strategy towards the systematic identification of feedback regulation in biological networks. Many FNMs appeared as excellent candidates for the prioritization of follow-up biochemical characterization, which is a recurring bottleneck in the targeting of complex diseases. More generally, our work highlights a fruitful avenue for integrating and harnessing genomic network data.
Affiliation(s)
- Amruta Sahoo
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
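The abstract above describes conventional network motifs as small subgraphs that occur more often than expected by chance. The following is a minimal Python sketch of that over-representation test only; it is not the authors' FNM method and does not integrate genetic interaction data. It assumes an undirected network, uses triangles as the only motif, and takes degree-preserving edge swaps as the null model. The function names (`count_triangles`, `rewire`, `motif_zscore`) and the toy edge list are hypothetical.

```python
import random
from statistics import mean, pstdev


def normalize(edges):
    """Return undirected edges as a set of sorted (u, v) tuples."""
    return {tuple(sorted(e)) for e in edges}


def adjacency(edges):
    """Build a node -> set-of-neighbours map from an edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(edges):
    """Count triangles; each one is seen once per edge, hence // 3."""
    adj = adjacency(edges)
    shared = sum(len(adj[u] & adj[v]) for u, v in edges)
    return shared // 3


def rewire(edges, n_swaps, rng):
    """Degree-preserving null model: swap endpoints of random edge pairs."""
    edges = set(edges)
    edge_list = list(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edges or new2 in edges:
            continue  # swap would create a duplicate edge
        edges -= {edge_list[i], edge_list[j]}
        edges |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edges


def motif_zscore(edges, n_random=200, seed=0):
    """Compare the observed triangle count with rewired networks."""
    rng = random.Random(seed)
    edges = normalize(edges)
    observed = count_triangles(edges)
    null = [count_triangles(rewire(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, z


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by a short chain.
    toy = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5),
           (5, 6), (6, 7), (5, 7), (7, 8), (8, 1)]
    print(motif_zscore(toy))
```

A large positive z-score would mark triangles as over-represented relative to the null model; the choice of null model strongly affects the result, which is part of why purely statistical motifs can lack biological context.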
2
Saul M, Dinu V. Family Rank: A graphical domain knowledge informed feature ranking algorithm. Bioinformatics 2021; 37:3626-3631. [PMID: 34009295 DOI: 10.1093/bioinformatics/btab387] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/13/2021] [Revised: 04/11/2021] [Accepted: 05/18/2021] [Indexed: 11/12/2022]
Abstract
Motivation: When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data. Results: A simulated data set is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank. Availability: An implementation of Family Rank is freely available at https://cran.r-project.org/package=FamilyRank. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michelle Saul
- College of Health Solutions, Arizona State University, Tempe AZ 85287-9020; Caris Life Sciences, Tempe, AZ 85281
- Valentin Dinu
- College of Health Solutions, Arizona State University, Tempe AZ 85287-9020
3
Sangphukieo A, Laomettachit T, Ruengjitchatchawalya M. PhotoModPlus: A web server for photosynthetic protein prediction from genome neighborhood features. PLoS One 2021; 16:e0248682. [PMID: 33730083 PMCID: PMC7968678 DOI: 10.1371/journal.pone.0248682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2020] [Accepted: 03/03/2021] [Indexed: 11/20/2022]
Abstract
A new web server called PhotoModPlus is presented as a platform for predicting photosynthetic proteins via genome neighborhood networks (GNN) and genome neighborhood-based machine learning. GNN enables users to visualize the overview of the conserved neighboring genes from multiple photosynthetic prokaryotic genomes and provides functional guidance on the query input. In the platform, we also present a new machine learning model utilizing genome neighborhood features for predicting photosynthesis-specific functions based on 24 prokaryotic photosynthesis-related GO terms, namely PhotoModGO. The new model performed better than the sequence-based approaches with an F1 measure of 0.872, based on nested five-fold cross-validation. Finally, we demonstrated the applications of the webserver and the new model in the identification of novel photosynthetic proteins. The server is user-friendly, compatible with all devices, and available at bicep.kmutt.ac.th/photomod.
Affiliation(s)
- Apiwat Sangphukieo
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
- School of Information Technology, KMUTT, Thung Khru, Bangkok, Thailand
- Teeraphan Laomettachit
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
- Marasri Ruengjitchatchawalya
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
- Biotechnology Program, School of Bioresources and Technology, KMUTT, Bang Khun Thian, Bangkok, Thailand
- Algal Biotechnology Research Group, Pilot Plant Development and Training Institute, KMUTT, Bang Khun Thian, Bangkok, Thailand
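The evaluation reported in the abstract above (an F1 measure based on nested five-fold cross-validation) can be illustrated with a short Python sketch. This is a generic outline under stated assumptions, not the PhotoModPlus code: it assumes binary labels coded as 0/1, and the names `f1_score`, `k_folds` and `nested_cv_splits` are hypothetical. The sketch only produces the index splits and the metric; model fitting is left out.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=5, k_inner=5, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    The outer test fold is held out for the final score only; the
    inner (fit, validation) splits are drawn from the outer training
    set and would be used for model or hyperparameter selection.
    """
    rng = random.Random(seed)
    outer = k_folds(range(n_samples), k_outer, rng)
    for i, test in enumerate(outer):
        train = [s for j, fold in enumerate(outer) if j != i for s in fold]
        inner = k_folds(train, k_inner, rng)
        inner_splits = []
        for m, val in enumerate(inner):
            fit = [s for n, fold in enumerate(inner) if n != m for s in fold]
            inner_splits.append((fit, val))
        yield train, test, inner_splits


if __name__ == "__main__":
    for train, test, inner in nested_cv_splits(50):
        print(len(train), len(test), len(inner))
```

Keeping the outer test fold out of every inner split is the point of nesting: the score reported on the outer folds is then not biased by the model selection done in the inner loop.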
4
Xiong E, Li Z, Zhang C, Zhang J, Liu Y, Peng T, Chen Z, Zhao Q. A study of leaf-senescence genes in rice based on a combination of genomics, proteomics and bioinformatics. Brief Bioinform 2020; 22:5998850. [PMID: 33257942 DOI: 10.1093/bib/bbaa305] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/12/2020] [Revised: 09/15/2020] [Accepted: 10/10/2020] [Indexed: 12/14/2022]
Abstract
Leaf senescence is a highly complex, genetically regulated and well-ordered process with multiple layers and pathways. Delaying leaf senescence would help increase grain yields in rice. Over the past 15 years, more than 100 rice leaf-senescence genes have been cloned, greatly improving the understanding of leaf senescence in rice. Systematically elucidating the molecular mechanisms underlying leaf senescence will provide breeders with new tools/options for improving many important agronomic traits. In this study, we summarized recent reports on 125 rice leaf-senescence genes, providing an overview of the research progress in this field by analyzing their subcellular localizations, molecular functions and the relationships among them. These data showed that chlorophyll synthesis and degradation, chloroplast development, the abscisic acid pathway, the jasmonic acid pathway, nitrogen assimilation and ROS play important roles in regulating leaf senescence in rice. Furthermore, we predicted and analyzed the proteins that interact with leaf-senescence proteins and achieved a more profound understanding of the molecular principles underlying the regulatory mechanisms by which leaf senescence occurs, thus providing new insights for future investigations of leaf senescence in rice.
Affiliation(s)
- Erhui Xiong
- College of Agriculture, Henan Agricultural University (HAU), China
- Zhiyong Li
- Academy for Advanced Interdisciplinary Studies, South University of Science and Technology, Shenzhen, China
- Chen Zhang
- College of Life Sciences, Nanjing Agricultural University, Nanjing, China
- Ye Liu
- College of Agriculture, HAU
5
Kaushik AC, Mehmood A, Dai X, Wei DQ. WeiBI (web-based platform): Enriching integrated interaction network with increased coverage and functional proteins from genome-wide experimental OMICS data. Sci Rep 2020; 10:5618. [PMID: 32221380 PMCID: PMC7101429 DOI: 10.1038/s41598-020-62508-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/17/2019] [Accepted: 03/10/2020] [Indexed: 12/27/2022]
Abstract
Many molecular system biology approaches recognize various interactions and functional associations of proteins that occur in cellular processing. Further understanding of the characterization technique reveals noteworthy information. These types of known and predicted interactions, gained through multiple resources, are thought to be important for experimental data to satisfy comprehensive and quality needs. The current work proposes the “WeiBI (WeiBiologicalInteractions)” database that clarifies direct and indirect partnerships associated with biological interactions. This database contains information concerning protein’s functional partnerships and interactions along with their integration into a statistical model that can be computationally predicted for humans. This novel approach in WeiBI version 1.0 collects information using an improved algorithm by transferring interactions between more than 115570 entries, allowing statistical analysis with the automated background for the given inputs for functional enrichment. This approach also allows the input of an entity’s list from a database along with the visualization of subsets as an interaction network and successful performance of the enrichment analysis for a gene set. This wisely improved algorithm is user-friendly, and its accessibility and higher accuracy make it the best database for exploring interactions among genomes’ network and reflects the importance of this study. The proposed server “WeiBI” is accessible at http://weislab.com/WeiDOCK/?page=PKPD.
Affiliation(s)
- Aman Chandra Kaushik
- Wuxi School of Medicine, Jiangnan University, Wuxi, China; State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Aamir Mehmood
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaofeng Dai
- Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
6
Spark’s GraphX-based link prediction for social communication using triangle counting. SOCIAL NETWORK ANALYSIS AND MINING 2019. [DOI: 10.1007/s13278-019-0573-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
7
Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2018; 93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/24/2022]
Abstract
Understanding protein-protein interactions (PPIs) in a cell is essential for learning protein functions, pathways, and mechanism of diseases. PPIs are also important targets for developing drugs. Experimental methods, both small-scale and large-scale, have identified PPIs in several model organisms. However, results cover only a part of PPIs of organisms; moreover, there are many organisms whose PPIs have not yet been investigated. To complement experimental methods, many computational methods have been developed that predict PPIs from various characteristics of proteins. Here we provide an overview of literature reports to classify computational PPI prediction methods that consider different features of proteins, including protein sequence, genomes, protein structure, function, PPI network topology, and those which integrate multiple methods. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ziyun Ding
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA
- Corresponding author: DK; Phone: 1-765-496-2284 (DK)
8
Solar-panel and parasol strategies shape the proteorhodopsin distribution pattern in marine Flavobacteriia. ISME JOURNAL 2018; 12:1329-1343. [PMID: 29410487 PMCID: PMC5932025 DOI: 10.1038/s41396-018-0058-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/29/2017] [Revised: 12/17/2017] [Accepted: 01/02/2018] [Indexed: 12/30/2022]
Abstract
Proteorhodopsin (PR) is a light-driven proton pump that is found in diverse bacteria and archaea species, and is widespread in marine microbial ecosystems. To date, many studies have suggested the advantage of PR for microorganisms in sunlit environments. The ecophysiological significance of PR is still not fully understood however, including the drivers of PR gene gain, retention, and loss in different marine microbial species. To explore this question we sequenced 21 marine Flavobacteriia genomes of polyphyletic origin, which encompassed both PR-possessing as well as PR-lacking strains. Here, we show that the possession or alternatively the lack of PR genes reflects one of two fundamental adaptive strategies in marine bacteria. Specifically, while PR-possessing bacteria utilize light energy ("solar-panel strategy"), PR-lacking bacteria exclusively possess UV-screening pigment synthesis genes to avoid UV damage and would adapt to microaerobic environment ("parasol strategy"), which also helps explain why PR-possessing bacteria have smaller genomes than those of PR-lacking bacteria. Collectively, our results highlight the different strategies of dealing with light, DNA repair, and oxygen availability that relate to the presence or absence of PR phototrophy.
9
Mitsopoulos C, Schierz AC, Workman P, Al-Lazikani B. Distinctive Behaviors of Druggable Proteins in Cellular Networks. PLoS Comput Biol 2015; 11:e1004597. [PMID: 26699810 PMCID: PMC4689399 DOI: 10.1371/journal.pcbi.1004597] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/12/2015] [Accepted: 10/13/2015] [Indexed: 01/12/2023]
Abstract
The interaction environment of a protein in a cellular network is important in defining the role that the protein plays in the system as a whole, and thus its potential suitability as a drug target. Despite the importance of the network environment, it is neglected during target selection for drug discovery. Here, we present the first systematic, comprehensive computational analysis of topological, community and graphical network parameters of the human interactome and identify discriminatory network patterns that strongly distinguish drug targets from the interactome as a whole. Importantly, we identify striking differences in the network behavior of targets of cancer drugs versus targets from other therapeutic areas and explore how they may relate to successful drug combinations to overcome acquired resistance to cancer drugs. We develop, computationally validate and provide the first public domain predictive algorithm for identifying druggable neighborhoods based on network parameters. We also make available full predictions for 13,345 proteins to aid target selection for drug discovery. All target predictions are available through canSAR.icr.ac.uk. Underlying data and tools are available at https://cansar.icr.ac.uk/cansar/publications/druggable_network_neighbourhoods/.
The need for well-validated targets for drug discovery is more pressing than ever, especially in cancer in view of resistance to current therapeutics coupled with late stage drug failures. Target prioritization and selection methodologies have typically not taken the protein interaction environment into account. Here we analyze a large representation of the human interactome comprising almost 90,000 interactions between 13,345 proteins. We assess these interactions using an extensive set of topological, graphical and community parameters, and we identify behaviors that distinguish the protein interaction environments of drug targets from the general interactome. Moreover, we identify clear distinctions between the network environment of cancer-drug targets and targets from other therapeutic areas. We use these distinguishing properties to build a predictive methodology to prioritize potential drug targets based on network parameters alone and we validate our predictive models using current FDA-approved drug targets. Our models provide an objective, interactome-based target prioritization methodology to complement existing structure-based and ligand-based prioritization methods. We provide our interactome-based predictions alongside other druggability predictors within the public canSAR resource (cansar.icr.ac.uk).
Affiliation(s)
- Costas Mitsopoulos
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Amanda C. Schierz
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Paul Workman
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Bissan Al-Lazikani
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
10
Efficient Generation of Mice with Consistent Transgene Expression by FEEST. Sci Rep 2015; 5:16284. [PMID: 26573149 PMCID: PMC4648098 DOI: 10.1038/srep16284] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/26/2015] [Accepted: 10/07/2015] [Indexed: 12/21/2022]
Abstract
Transgenic mouse models are widely used in biomedical research; however, current techniques for producing transgenic mice are limited due to the unpredictable nature of transgene expression. Here, we report a novel, highly efficient technique for the generation of transgenic mice with single-copy integration of the transgene and guaranteed expression of the gene-of-interest (GOI). We refer to this technique as functionally enriched ES cell transgenics, or FEEST. ES cells harboring an inducible Cre gene enabled the efficient selection of transgenic ES cell clones using hygromycin before Cre-mediated recombination. Expression of the GOI was confirmed by assaying for the GFP after Cre recombination. As a proof-of-principle, we produced a transgenic mouse line containing Cre-activatable tTA (cl-tTA6). This tTA mouse model was able to induce tumor formation when crossed with a transgenic mouse line containing a doxycycline-inducible oncogene. We also showed that the cl-tTA6 mouse is a valuable tool for faithfully recapitulating the clinical course of tumor development. We showed that FEEST can be easily adapted for other genes by preparing a transgenic mouse model of conditionally activatable EGFR L858R. Thus, FEEST is a technique with the potential to generate transgenic mouse models at a genome-wide scale.
11
César-Razquin A, Snijder B, Frappier-Brinton T, Isserlin R, Gyimesi G, Bai X, Reithmeier RA, Hepworth D, Hediger MA, Edwards AM, Superti-Furga G. A Call for Systematic Research on Solute Carriers. Cell 2015; 162:478-87. [PMID: 26232220 DOI: 10.1016/j.cell.2015.07.022] [Citation(s) in RCA: 381] [Impact Index Per Article: 42.3] [Received: 03/18/2015] [Indexed: 01/10/2023]
Abstract
Solute carrier (SLC) membrane transport proteins control essential physiological functions, including nutrient uptake, ion transport, and waste removal. SLCs interact with several important drugs, and a quarter of the more than 400 SLC genes are associated with human diseases. Yet, compared to other gene families of similar stature, SLCs are relatively understudied. The time is right for a systematic attack on SLC structure, specificity, and function, taking into account kinship and expression, as well as the dependencies that arise from the common metabolic space.
Affiliation(s)
- Adrián César-Razquin
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Berend Snijder
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ruth Isserlin
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Gergely Gyimesi
- Institute of Biochemistry and Molecular Medicine and Swiss National Center of Competence in Research, NCCR TransCure, University of Bern, 3012 Bern, Switzerland
- Xiaoyun Bai
- Department of Biochemistry, University of Toronto, Toronto, Ontario, M5S 1A8 Canada
- David Hepworth
- Worldwide Medicinal Chemistry, Pfizer Worldwide Research and Development, Cambridge, MA 02139, USA
- Matthias A Hediger
- Institute of Biochemistry and Molecular Medicine and Swiss National Center of Competence in Research, NCCR TransCure, University of Bern, 3012 Bern, Switzerland
- Aled M Edwards
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Giulio Superti-Furga
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Center for Physiology and Pharmacology, Medical University of Vienna, 1090 Vienna, Austria.
12
Maheshwari S, Brylinski M. Predicting protein interface residues using easily accessible on-line resources. Brief Bioinform 2015; 16:1025-34. [PMID: 25797794 DOI: 10.1093/bib/bbv009] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/01/2014] [Indexed: 01/20/2023]
Abstract
It has been more than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we review 10 methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experimental structures and high-quality homology models, structure-based methods outperform those using only protein sequences, with global template-based approaches providing the best performance. For moderate-quality models, sequence-based methods often perform better than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in several methods quantitatively improve the results only for experimental structures, suggesting that these procedures should be tuned up for computer-generated models. Finally, we anticipate that advanced meta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improvements, easily accessible web servers already provide the scientific community with convenient resources for the identification of protein-protein interaction sites.
13
Whidden CE, DeZeeuw KG, Zorz JK, Joy AP, Barnett DA, Johnson MS, Zhaxybayeva O, Cockshutt AM. Quantitative and functional characterization of the hyper-conserved protein of Prochlorococcus and marine Synechococcus. PLoS One 2014; 9:e109327. [PMID: 25360678 PMCID: PMC4215834 DOI: 10.1371/journal.pone.0109327] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/11/2014] [Accepted: 09/11/2014] [Indexed: 11/26/2022]
Abstract
A large fraction of any bacterial genome consists of hypothetical protein-coding open reading frames (ORFs). While most of these ORFs are present only in one or a few sequenced genomes, a few are conserved, often across large phylogenetic distances. Such conservation provides clues to likely uncharacterized cellular functions that need to be elucidated. Marine cyanobacteria from the Prochlorococcus/marine Synechococcus clade are dominant bacteria in oceanic waters and are significant contributors to global primary production. A Hyper Conserved Protein (PSHCP) of unknown function is 100% conserved at the amino acid level in genomes of Prochlorococcus/marine Synechococcus, but lacks homologs outside of this clade. In this study we investigated Prochlorococcus marinus strains MED4 and MIT 9313 and Synechococcus sp. strain WH 8102 for the transcription of the PSHCP gene using RT-Q-PCR, for the presence of the protein product through quantitative immunoblotting, and for the protein's binding partners in a pull down assay. Significant transcription of the gene was detected in all strains. The PSHCP protein content varied between 8±1 fmol and 26±9 fmol per µg total protein, depending on the strain. The 50S ribosomal protein L2, the Photosystem I protein PsaD and the Ycf48-like protein were found associated with the PSHCP protein in all strains and not appreciably or at all in control experiments. We hypothesize that PSHCP is a protein associated with the ribosome, and is possibly involved in photosystem assembly.
Affiliation(s)
- Caroline E. Whidden
- Department of Chemistry & Biochemistry, Mount Allison University, Sackville, NB, Canada
- Katrina G. DeZeeuw
- Department of Chemistry & Biochemistry, Mount Allison University, Sackville, NB, Canada
- Jackie K. Zorz
- Department of Chemistry & Biochemistry, Mount Allison University, Sackville, NB, Canada
- Andrew P. Joy
- Atlantic Cancer Research Institute, Moncton, NB, Canada
- Milo S. Johnson
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Olga Zhaxybayeva
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, United States of America
- Amanda M. Cockshutt
- Department of Chemistry & Biochemistry, Mount Allison University, Sackville, NB, Canada
14
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2014; 43:D447-52. [PMID: 25352553 PMCID: PMC4383874 DOI: 10.1093/nar/gku1003] [Citation(s) in RCA: 7123] [Impact Index Per Article: 712.3] [Indexed: 02/06/2023]
Abstract
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein–protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein–protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
Affiliation(s)
- Damian Szklarczyk
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Andrea Franceschini
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Stefan Wyder
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Davide Heller
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Milan Simonovic
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Alexander Roth
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Alberto Santos
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Kalliopi P Tsafou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Michael Kuhn
- Biotechnology Center, Technische Universität Dresden, 01062 Dresden, Germany; Max Planck Institute of Molecular Cell Biology and Genetics, 01062 Dresden, Germany
- Peer Bork
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christian von Mering
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
15
Zybailov BL, Glazko GV, Jaiswal M, Raney KD. Large Scale Chemical Cross-linking Mass Spectrometry Perspectives. ACTA ACUST UNITED AC 2013; 6:001. [PMID: 25045217 PMCID: PMC4101816 DOI: 10.4172/jpb.s2-001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022]
Abstract
The spectacular heterogeneity of a complex protein mixture from biological samples becomes even more difficult to tackle when one’s attention is shifted towards different protein complex topologies, transient interactions, or localization of PPIs. Meticulous protein-by-protein affinity pull-downs and yeast-two-hybrid screens are the two approaches currently used to decipher proteome-wide interaction networks. Another method is to employ chemical cross-linking, which gives not only identities of interactors, but could also provide information on the sites of interactions and interaction interfaces. Despite significant advances in mass spectrometry instrumentation over the last decade, mapping Protein-Protein Interactions (PPIs) using chemical cross-linking remains time consuming and requires substantial expertise, even in the simplest of systems. While robust methodologies and software exist for the analysis of binary PPIs and also for the single protein structure refinement using cross-linking-derived constraints, undertaking a proteome-wide cross-linking study is highly complex. Difficulties include i) identifying cross-linkers of the right length and selectivity that could capture interactions of interest; ii) enrichment of the cross-linked species; iii) identification and validation of the cross-linked peptides and cross-linked sites. In this review we examine existing literature aimed at the large-scale protein cross-linking and discuss possible paths for improvement. We also discuss short-length cross-linkers of broad specificity such as formaldehyde and diazirine-based photo-cross-linkers. These cross-linkers could potentially capture many types of interactions, without strict requirement for a particular amino-acid to be present at a given protein-protein interface. How can these short-length, broad-specificity cross-linkers be applied to proteome-wide studies? We will suggest specific advances in methodology, instrumentation and software that are needed to make such a leap.
Affiliation(s)
- Boris L Zybailov
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Galina V Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Mihir Jaiswal
- UALR/UAMS Joint Bioinformatics Program, University of Arkansas Little Rock, Little Rock, AR, USA
- Kevin D Raney
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
|
16
|
Armean IM, Lilley KS, Trotter MWB. Popular computational methods to assess multiprotein complexes derived from label-free affinity purification and mass spectrometry (AP-MS) experiments. Mol Cell Proteomics 2012; 12:1-13. [PMID: 23071097 DOI: 10.1074/mcp.r112.019554] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022] Open
Abstract
Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway or subcellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interactions, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interaction evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. References to related work are provided throughout in order to provide a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.
Affiliation(s)
- Irina M Armean
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, UK
|
17
|
Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours. Mol Syst Biol 2012; 8:581. [PMID: 22569339 PMCID: PMC3377989 DOI: 10.1038/msb.2012.13] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 12/21/2011] [Accepted: 03/24/2012] [Indexed: 11/09/2022] Open
Abstract
Many characterized metabolic enzymes currently lack associated gene and protein sequences. Here, pathway and genomic neighbour data are used to assign genes to these ‘orphan enzymes’, and the predictions are validated with experimental assays and genome-scale metabolic modelling.
A computational method is developed for assigning candidate sequences to orphan enzymes. The method uses metabolic pathway, genomic neighbourhood, genomic co-occurrence, and protein domain information to predict genes that are likely to perform a particular enzymatic function. Benchmarking of the scoring scheme based on the 4 features above revealed that some combinations of parameters yielded greater than 70% accuracy, and that high-confidence predictions could be generated for 131 orphan enzymes. Enzyme assay experiments confirmed the predicted enzymatic activity for two of the high-confidence candidate sequences. Predicted functions can improve the annotation of genomic and metagenomic data, and can reveal putative genes for enzymes with potential biotechnological applications. Incorporating the predicted enzymatic reactions into genome-scale metabolic models changed the flux connectivity and improved their ability to correctly predict gene essentiality, supporting the biological relevance of these predictions.
Despite the current wealth of sequencing data, one-third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome-scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction.
|
18
|
Raftery AE, Niu X, Hoff PD, Yeung KY. Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood. J Comput Graph Stat 2012; 21:901-919. [PMID: 27570438 DOI: 10.1080/10618600.2012.679240] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 10/28/2022]
Abstract
Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N²), where N is the number of nodes. This makes it infeasible for large networks. In this paper, we propose an approximation of the log likelihood function. We adopt the case-control idea from epidemiology and construct a case-control likelihood which is an unbiased estimator of the full likelihood. Replacing the full likelihood by the case-control likelihood in the MCMC estimation of the latent space model reduces the computational time from O(N²) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction data set using the case-control likelihood and use the model-fitted link probabilities to identify false positive links.
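The case-control idea in this abstract can be illustrated with a minimal sketch. It assumes the distance form of the latent space model (log-odds of a link equal to an intercept `beta` minus the Euclidean distance between latent positions) and uniform sampling of non-links as controls; the abstract does not specify the sampling scheme, so that choice, along with the names `eta`, `full_loglik_node`, `case_control_loglik_node` and `n_controls`, is hypothetical and not taken from the paper.

```python
import math
import random


def eta(beta, zi, zj):
    """Log-odds of a link: intercept minus Euclidean distance (assumed model form)."""
    return beta - math.dist(zi, zj)


def full_loglik_node(i, adj, z, beta):
    """Exact log-likelihood contribution of node i, summed over all other nodes."""
    total = 0.0
    for j in range(len(z)):
        if j == i:
            continue
        e = eta(beta, z[i], z[j])
        total += adj[i][j] * e - math.log1p(math.exp(e))
    return total


def case_control_loglik_node(i, adj, z, beta, n_controls, rng):
    """Approximation: keep every link (case), subsample non-links (controls), reweight.

    For clarity this sketch enumerates all non-links before sampling, which is
    itself linear in N per node; an efficient implementation would avoid that.
    """
    others = [j for j in range(len(z)) if j != i]
    cases = [j for j in others if adj[i][j] == 1]
    nonlinks = [j for j in others if adj[i][j] == 0]

    total = 0.0
    for j in cases:
        e = eta(beta, z[i], z[j])
        total += e - math.log1p(math.exp(e))

    if nonlinks:
        k = min(n_controls, len(nonlinks))
        sampled = rng.sample(nonlinks, k)
        weight = len(nonlinks) / k  # scales the sampled sum up to the full non-link sum
        for j in sampled:
            total -= weight * math.log1p(math.exp(eta(beta, z[i], z[j])))
    return total
```

Because each non-link is sampled with equal probability and the sampled sum is scaled by the inverse sampling fraction, the expected value of the approximation equals the exact node contribution under these assumptions. When `n_controls` is at least the number of non-links, the two functions agree up to floating-point rounding.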
Affiliation(s)
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, Wash., USA
- Xiaoyue Niu
- Department of Statistics, University of Washington, Seattle, Wash., USA
- Peter D Hoff
- Department of Statistics, University of Washington, Seattle, Wash., USA
- Ka Yee Yeung
- Department of Statistics, University of Washington, Seattle, Wash., USA
|
19
|
Doerks T, van Noort V, Minguez P, Bork P. Annotation of the M. tuberculosis hypothetical orfeome: adding functional information to more than half of the uncharacterized proteins. PLoS One 2012; 7:e34302. [PMID: 22485162 PMCID: PMC3317503 DOI: 10.1371/journal.pone.0034302] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/06/2011] [Accepted: 02/26/2012] [Indexed: 11/18/2022] Open
Abstract
The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as ‘hypothetical’, implying that for these not even weak functional associations could be identified so far. Here we predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.
Affiliation(s)
- Tobias Doerks
- European Molecular Biology Laboratory, Heidelberg, Germany.
|
20
|
Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, Arnold R, Rattei T, Letunic I, Doerks T, Jensen LJ, von Mering C, Bork P. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res 2012; 40:D284-9. [PMID: 22096231 PMCID: PMC3245133 DOI: 10.1093/nar/gkr1060] [Citation(s) in RCA: 386] [Impact Index Per Article: 32.2] [Received: 09/15/2011] [Revised: 10/26/2011] [Accepted: 10/26/2011] [Indexed: 11/28/2022] Open
Abstract
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721,801 orthologous groups, encompassing a total of 4,396,591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101,208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450,904 orthologous groups (62.5%).
Affiliation(s)
- Sean Powell, Damian Szklarczyk, Kalliopi Trachana, Alexander Roth, Michael Kuhn, Jean Muller, Roland Arnold, Thomas Rattei, Ivica Letunic, Tobias Doerks, Lars J. Jensen, Christian von Mering, Peer Bork
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
|
21
|
Ng SK, Tan SH. DISCOVERING PROTEIN–PROTEIN INTERACTIONS. J Bioinform Comput Biol 2011; 1:711-41. [PMID: 15290761 DOI: 10.1142/s0219720004000600] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/10/2003] [Revised: 12/12/2003] [Accepted: 12/13/2003] [Indexed: 11/18/2022]
Abstract
The ongoing genomics and proteomics efforts have helped identify many new genes and proteins in living organisms. However, simply knowing the existence of genes and proteins does not tell us much about the biological processes in which they participate. Many major biological processes are controlled by protein interaction networks. A comprehensive description of protein–protein interactions is therefore necessary to understand the genetic program of life. In this tutorial, we provide an overview of the various current high-throughput methods for discovering protein–protein interactions, covering both the conventional experimental methods and new computational approaches.
Affiliation(s)
- See-Kiong Ng
- Knowledge Discovery Department, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore.
|
22
|
De Las Rivas J, de Luis A. Interactome data and databases: different types of protein interaction. Comp Funct Genomics 2011; 5:173-8. [PMID: 18629062 PMCID: PMC2447346 DOI: 10.1002/cfg.377] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 11/12/2003] [Revised: 12/10/2003] [Accepted: 12/18/2003] [Indexed: 11/29/2022] Open
Abstract
In recent years, the biomolecular sciences have been driven forward by overwhelming advances in new biotechnological high-throughput experimental methods and bioinformatic genome-wide computational methods. Such breakthroughs are producing huge amounts of new data that need to be carefully analysed to obtain correct and useful scientific knowledge. One of the fields where this advance has been most intense is the study of the network of ‘protein–protein interactions’, i.e. the ‘interactome’. In this short review we comment on the main data and databases produced in this field in the last 5 years. We also present a rationalized scheme of biological definitions that will be useful for a better understanding and interpretation of ‘what a protein–protein interaction is’ and ‘which types of protein–protein interactions are found in a living cell’. Finally, we comment on some assignments of interactome data to defined types of protein interaction and we present a new bioinformatic tool called APIN (Agile Protein Interaction Network browser), which is in development and will be applied to browsing protein interaction databases.
Affiliation(s)
- Javier De Las Rivas
- Cancer Research Center (CIC, USAL-CSIC), University of Salamanca and CSIC, Campus Miguel de Unamuno, Salamanca E37007, Spain.
|
23
|
Assessing the biological significance of gene expression signatures and co-expression modules by studying their network properties. PLoS One 2011; 6:e17474. [PMID: 21408226 PMCID: PMC3049771 DOI: 10.1371/journal.pone.0017474] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 10/13/2010] [Accepted: 02/03/2011] [Indexed: 12/23/2022] Open
Abstract
Microarray experiments have been extensively used to define signatures, which are sets of genes that can be considered markers of experimental conditions (typically diseases). Paradoxically, in spite of the apparent functional role that might be attributed to such gene sets, signatures do not seem to be reproducible across experiments. Given the close relationship between function and protein interaction, network properties can be used to study to what extent signatures are composed of genes whose resulting proteins show a considerable level of interaction (and consequently a putative common functional role). We have analysed 618 signatures and 507 modules of co-expression in cancer looking for significant values of four main protein-protein interaction (PPI) network parameters: connection degree, cluster coefficient, betweenness and number of components. A total of 3904 gene ontology (GO) modules, 146 KEGG pathways, and 263 Biocarta pathways have been used as functional modules of reference. Co-expression modules found in microarray experiments display a high level of connectivity, similar to the one shown by conventional modules based on functional definitions (GO, KEGG and Biocarta). A general observation for all the classes studied is that the networks formed by the modules improve their topological parameters when an external protein is allowed to be introduced within the paths (up to 70% of GO modules show network parameters beyond the random expectation). This fact suggests that functional definitions are incomplete and some genes might still be missing. Conversely, signatures are clearly not capturing the altered functions in the corresponding studies. This is probably because the way in which the genes have been selected in the signatures is too conservative. These results suggest that gene selection methods which take into account relationships among genes should be superior to methods that assume independence among genes outside their functional contexts.
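Three of the four network parameters named in this abstract (connection degree, cluster coefficient and number of components) can be computed on an undirected PPI graph as in the sketch below; betweenness is omitted to keep it short. The function names and the adjacency-set representation are assumptions made for illustration and are not taken from the paper, which does not describe its implementation here.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to its neighbour set."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Connection degree: number of direct interaction partners."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for x in range(k)
        for y in range(x + 1, k)
        if nbrs[y] in graph[nbrs[x]]
    )
    return 2.0 * links / (k * (k - 1))


def count_components(graph):
    """Number of connected components, found by breadth-first search."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components
```

For a gene set such as a signature or co-expression module, one would restrict the edge list to interactions among the set's members before calling these functions; a set whose proteins interact heavily yields few components and high degrees.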
|
24
|
Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, Jensen LJ, von Mering C. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011; 39:D561-8. [PMID: 21045058 PMCID: PMC3013807 DOI: 10.1093/nar/gkq973] [Citation(s) in RCA: 2547] [Impact Index Per Article: 195.9] [Received: 09/07/2010] [Accepted: 10/03/2010] [Indexed: 12/12/2022] Open
Abstract
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.
Affiliation(s)
- Damian Szklarczyk, Andrea Franceschini, Michael Kuhn, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Jean Muller, Peer Bork, Lars J. Jensen, Christian von Mering
- Faculty of Health Sciences, Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Denmark, Faculty of Science, Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Biotechnology Center, Technical University Dresden, Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
25
Jaeger S, Sers CT, Leser U. Combining modularity, conservation, and interactions of proteins significantly increases precision and coverage of protein function prediction. BMC Genomics 2010; 11:717. [PMID: 21171995 PMCID: PMC3017542 DOI: 10.1186/1471-2164-11-717] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/06/2010] [Accepted: 12/20/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND While the number of newly sequenced genomes and genes is constantly increasing, elucidation of their function still is a laborious and time-consuming task. This has led to the development of a wide range of methods for predicting protein functions in silico. We report on a new method that predicts function based on a combination of information about protein interactions, orthology, and the conservation of protein networks in different species. RESULTS We show that aggregation of these independent sources of evidence leads to a drastic increase in number and quality of predictions when compared to baselines and other methods reported in the literature. For instance, our method generates more than 12,000 novel protein functions for human with an estimated precision of ~76%, among which are 7,500 new functional annotations for 1,973 human proteins that previously had zero or only one function annotated. We also verified our predictions on a set of genes that play an important role in colorectal cancer (MLH1, PMS2, EPHB4) and could confirm more than 73% of them based on evidence in the literature. CONCLUSIONS The combination of different methods into a single, comprehensive prediction method infers thousands of protein functions for every species included in the analysis at varying, yet always high levels of precision and very good coverage.
Affiliation(s)
- Samira Jaeger
- Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany.
26
Schliep K, Lopez P, Lapointe FJ, Bapteste E. Harvesting evolutionary signals in a forest of prokaryotic gene trees. Mol Biol Evol 2010; 28:1393-405. [PMID: 21172835 DOI: 10.1093/molbev/msq323] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
Phylogenomic studies produce increasingly large phylogenetic forests of trees with patchy taxonomical sampling. Typically, prokaryotic data generate thousands of gene trees of all sizes that are difficult, if not impossible, to root. Their topologies do not match the genealogy of lineages, as they are influenced not only by duplication, losses, and vertical descent but also by lateral gene transfer (LGT) and recombination. Because this complexity in part reflects the diversity of evolutionary processes, the study of phylogenetic forests is thus a great opportunity to improve our understanding of prokaryotic evolution. Here, we show how the rich evolutionary content of such novel phylogenetic objects can be exploited through the development of new approaches designed specifically for extracting the multiple evolutionary signals present in the forest of life, that is, by slicing up trees into remarkable bits and pieces: clans, slices, and clips. We harvested a forest of 6,901 unrooted gene trees comprising up to 100 prokaryotic genomes (41 archaea and 59 bacteria) to search for evolutionary events that a species tree would not account for. We identified 1) trees and partitions of trees that reflected the lifestyle of organisms rather than their taxonomy, 2) candidate lifestyle-specific genetic modules, used by distinct unrelated organisms to adapt to the same environment, 3) gene families, nonrandomly distributed in the functional space, that were frequently exchanged between archaea and bacteria, sometimes without major changes in their sequences. Finally, 4) we reconstructed polarized networks of genetic partnerships between archaea and bacteria to describe some of the rules affecting LGT between these two Domains.
Affiliation(s)
- Klaus Schliep
- UMR CNRS 7138 Systématique, Adaptation, Evolution, Muséum National d'Histoire Naturelle, Paris, France
27
Hawkins T, Chitale M, Kihara D. Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP. BMC Bioinformatics 2010; 11:265. [PMID: 20482861 PMCID: PMC2882935 DOI: 10.1186/1471-2105-11-265] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/15/2009] [Accepted: 05/19/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A new paradigm of biological investigation takes advantage of technologies that produce large high throughput datasets, including genome sequences, interactions of proteins, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins can lack the functional evidence necessary to infer their biological relevance. RESULTS Here we have applied high confidence function predictions from our automated prediction system, PFP, to three genome sequences, Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). The number of annotated genes is increased by PFP to over 90% for all of the genomes. Using the large coverage of the function annotation, we introduced the functional similarity networks which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome, one each by considering similarity in a single Gene Ontology (GO) category, i.e. Biological Process, Cellular Component, and Molecular Function, and another one by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks by showing a higher clustering degree exponent value and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and local regions of genomes has identified numerous cases where subnetworks or local regions have functionally coherent proteins. These results will help interpreting interactions of proteins and gene orders in a genome. Several examples of both analyses are highlighted. 
CONCLUSION The analyses demonstrate that applying high confidence predictions from PFP can have a significant impact on a researcher's ability to interpret the immense biological data that are being generated today. The newly introduced functional similarity networks of the three organisms show different network properties as compared with the protein-protein interaction networks.
Affiliation(s)
- Troy Hawkins
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
28
Rodriguez-Soca Y, Munteanu CR, Dorado J, Pazos A, Prado-Prado FJ, González-Díaz H. Trypano-PPI: A Web Server for Prediction of Unique Targets in Trypanosome Proteome by using Electrostatic Parameters of Protein−protein Interactions. J Proteome Res 2009; 9:1182-90. [DOI: 10.1021/pr900827b] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023]
Affiliation(s)
- Yamilet Rodriguez-Soca, Cristian R. Munteanu, Julián Dorado, Alejandro Pazos, Francisco J. Prado-Prado, Humberto González-Díaz
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain, and Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, Spain
29
Abstract
Bioinformatics is a central discipline in modern life sciences aimed at describing the complex properties of living organisms starting from large-scale data sets of cellular constituents such as genes and proteins. In order for this wealth of information to provide useful biological knowledge, databases and software tools for data collection, analysis and interpretation need to be developed. In this paper, we review recent advances in the design and implementation of bioinformatics resources devoted to the study of metals in biological systems, a research field traditionally at the heart of bioinorganic chemistry. We show how metalloproteomes can be extracted from genome sequences, how structural properties can be related to function, how databases can be implemented, and how hints on interactions can be obtained from bioinformatics.
Affiliation(s)
- Ivano Bertini
- Magnetic Resonance Center (CERM)-University of Florence, Via L. Sacconi 6, Sesto Fiorentino, Italy.
30
Li X, Chen H, Li J, Zhang Z. Gene function prediction with gene interaction networks: a context graph kernel approach. IEEE Trans Inf Technol Biomed 2009; 14:119-28. [PMID: 19789115 DOI: 10.1109/titb.2009.2033116] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.
Affiliation(s)
- Xin Li
- Department of Information Systems, City University of Hong Kong, Kowloon Tong, Hong Kong.
31
Babu M, Musso G, Díaz-Mejía JJ, Butland G, Greenblatt JF, Emili A. Systems-level approaches for identifying and analyzing genetic interaction networks in Escherichia coli and extensions to other prokaryotes. Mol Biosyst 2009; 5:1439-55. [PMID: 19763343 DOI: 10.1039/b907407d] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
Abstract
Molecular interactions define the functional organization of the cell. Epistatic (genetic, or gene-gene) interactions, one of the most informative and commonly encountered forms of functional relationships, are increasingly being used to map process architecture in model eukaryotic organisms. In particular, 'systems-level' screens in yeast and worm aimed at elucidating genetic interaction networks have led to the generation of models describing the global modular organization of gene products and protein complexes within a cell. However, comparable data for prokaryotic organisms have not been available. Given its ease of growth and genetic manipulation, the Gram-negative bacterium Escherichia coli appears to be an ideal model system for performing comprehensive genome-scale examinations of genetic redundancy in bacteria. In this review, we highlight emerging experimental and computational techniques that have been developed recently to examine functional relationships and redundancy in E. coli at a systems-level, and their potential application to prokaryotes in general. Additionally, we have scanned PubMed abstracts and full-text published articles to manually curate a list of approximately 200 previously reported synthetic sick or lethal genetic interactions in E. coli derived from small-scale experimental studies.
Affiliation(s)
- Mohan Babu
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1
32
Kushwaha SK, Shakya M. PINAT1.0: protein interaction network analysis tool. Bioinformation 2009; 3:419-21. [PMID: 19759862 PMCID: PMC2737494 DOI: 10.6026/97320630003419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/01/2009] [Revised: 04/01/2009] [Accepted: 04/08/2009] [Indexed: 11/28/2022] Open
Abstract
Cellular processes are regulated by interactions among proteins, i.e., multiprotein complexes, and the absence of these interactions is often the cause of disorder or disease. Such protein interactions are of great interest for drug design. In host-parasite diseases like tuberculosis, non-homologous proteins are the first preference as drug targets. The most potent drug target can be identified among a large number of non-homologous proteins through protein interaction network analysis. The drug target should be the non-homologous protein that is associated with the maximum number of functional proteins, i.e., has the highest number of interactants, so that maximum harm is caused to the pathogen only. In the present work, the Protein Interaction Network Analysis Tool (PINAT) has been developed to identify potential protein interactions for drug target identification. PINAT is a standalone GUI application for protein-protein interaction (PPI) analysis and network building using coevolutionary profiles. PINAT is very useful for large-scale PPI studies and offers the easiest handling among available software. PINAT provides excellent facilities for assembling data for network building, with visual presentation of the results and interaction scores. The software is written in Java and provides reliability through transparency with the user.
33
Dotan-Cohen D, Letovsky S, Melkman AA, Kasif S. Biological process linkage networks. PLoS One 2009; 4:e5313. [PMID: 19390589 PMCID: PMC2669181 DOI: 10.1371/journal.pone.0005313] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 08/04/2008] [Accepted: 03/24/2009] [Indexed: 12/21/2022] Open
Abstract
Background The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher order biological systems, such as biological pathways and processes. We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN). Results We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this aspect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent. As a practical application of the linkage of processes we demonstrate that it can be effectively used in protein function prediction, having the power to increase both the coverage and the accuracy of predictions, when carefully integrated into prediction methods. Conclusions Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes.
Affiliation(s)
- Dikla Dotan-Cohen
- Department of Computer Science, Ben-Gurion University, Beer Sheva, Israel.
34
Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009; 74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]
Abstract
Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
35
Bernthaler A, Mühlberger I, Fechete R, Perco P, Lukas A, Mayer B. A dependency graph approach for the analysis of differential gene expression profiles. Mol Biosyst 2009; 5:1720-31. [PMID: 19585005 DOI: 10.1039/b903109j] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Affiliation(s)
- Andreas Bernthaler
- Theory and Logics Group, Institute of Computer Languages, Vienna University of Technology, Favoritenstrasse 9-11, A-1040 Vienna, Austria.
36
Jaeger S, Gaudan S, Leser U, Rebholz-Schuhmann D. Integrating protein-protein interactions and text mining for protein function prediction. BMC Bioinformatics 2008; 9 Suppl 8:S2. [PMID: 18673526 PMCID: PMC2500093 DOI: 10.1186/1471-2105-9-s8-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022] Open
Abstract
Background Functional annotation of proteins remains a challenging task. Currently the scientific literature serves as the main source for yet uncurated functional annotations, but curation work is slow and expensive. Automatic techniques that support this work are still lacking reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we furthermore implemented a procedure that validates all predictions based on findings reported in the literature. Results Using this procedure, more than 80% of the GO annotations for proteins with highly conserved orthologs that are available in UniProtKb/Swiss-Prot could be verified automatically. For a subset of proteins we predicted new GO annotations that were not available in UniProtKb/Swiss-Prot. All predictions were correct (100% precision) according to the verifications from a trained curator. Conclusion Our method of integrating CCSs and literature mining is thus a highly reliable approach to predict GO annotations for weakly characterized proteins with orthologs.
Affiliation(s)
- Samira Jaeger
- Knowledge Management in Bioinformatics, Humboldt-University Berlin, Unter den Linden 6, 10099 Berlin, Germany.
37
Dea-Ayuela MA, Pérez-Castillo Y, Meneses-Marcel A, Ubeira FM, Bolas-Fernández F, Chou KC, González-Díaz H. HP-Lattice QSAR for dynein proteins: experimental proteomics (2D-electrophoresis, mass spectrometry) and theoretic study of a Leishmania infantum sequence. Bioorg Med Chem 2008; 16:7770-6. [PMID: 18662882 DOI: 10.1016/j.bmc.2008.07.023] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 01/25/2008] [Revised: 06/23/2008] [Accepted: 07/02/2008] [Indexed: 10/21/2022]
Abstract
The toxicity and inefficacy of actual organic drugs against Leishmaniosis justify research projects to find new molecular targets in Leishmania species including Leishmania infantum (L. infantum) and Leishmania major (L. major), both important pathogens. In this sense, quantitative structure-activity relationship (QSAR) methods, which are very useful in Bioorganic and Medicinal Chemistry to discover small-sized drugs, may help to identify not only new drugs but also new drug targets, if we apply them to proteins. Dyneins are important proteins of these parasites governing fundamental processes such as cilia and flagella motion, nuclear migration, organization of the mitotic spindle, and chromosome separation during mitosis. However, despite the interest for them as potential drug targets, so far there has been no report whatsoever on dyneins with QSAR techniques. To the best of our knowledge, we report here the first QSAR for dynein proteins. We used as input the Spectral Moments of a Markov matrix associated to the HP-Lattice Network of the protein sequence. The data contain 411 protein sequences of different species selected by ClustalX to develop a QSAR that correctly discriminates on average between 92.75% and 92.51% of dyneins and other proteins in four different train and cross-validation datasets. We also report a combined experimental and theoretic study of a new dynein sequence in order to illustrate the utility of the model to search for potential drug targets with a practical example. First, we carried out a 2D-electrophoresis analysis of L. infantum biological samples. Next, we excised from 2D-E gels one spot of interest belonging to an unknown protein or protein fragment in the region M<20,200 and pI<4. We used MASCOT search engine to find proteins in the L. major data base with the highest similarity score to the MS of the protein isolated from L. infantum.
We used the QSAR model to predict the new sequence as dynein with probability of 99.99% without relying upon alignment. In order to confirm the previous function annotation we predicted the sequences as dynein with BLAST and the omniBLAST tools (96% alignment similarity to dyneins of other species). Using this combined strategy, we have successfully identified L. infantum protein containing dynein heavy chain, and illustrated the potential use of the QSAR model as a complement to alignment tools.
|
38
|
Aguilar D, Oliva B. Topological comparison of methods for predicting transcriptional cooperativity in yeast. BMC Genomics 2008; 9:137. [PMID: 18366726 PMCID: PMC2315657 DOI: 10.1186/1471-2164-9-137] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/01/2007] [Accepted: 03/25/2008] [Indexed: 11/10/2022] Open
Abstract
Background The cooperative interaction between transcription factors has a decisive role in the control of the fate of the eukaryotic cell. Computational approaches for characterizing cooperative transcription factors in yeast, however, are based on different rationales and provide a low overlap between their results. Because the wealth of information contained in protein interaction networks and regulatory networks has proven highly effective in elucidating functional relationships between proteins, we compared different sets of cooperative transcription factor pairs (predicted by four different computational methods) within the frame of those networks. Results Our results show that the overlap between the sets of cooperative transcription factors predicted by the different methods is low yet significant. Cooperative transcription factors predicted by all methods are closer and more clustered in the protein interaction network than expected by chance. On the other hand, members of a cooperative transcription factor pair neither seemed to regulate each other nor shared similar regulatory inputs, although they do regulate similar groups of target genes. Conclusion Despite the different definitions of transcriptional cooperativity and the different computational approaches used to characterize cooperativity between transcription factors, the analysis of their roles in the framework of the protein interaction network and the regulatory network indicates a common denominator for the predictions under study. The knowledge of the shared topological properties of cooperative transcription factor pairs in both networks can be useful not only for designing better prediction methods but also for better understanding the complexities of transcriptional control in eukaryotes.
Affiliation(s)
- Daniel Aguilar
- Structural Bioinformatics Group (GRIB), IMIM-Universitat Pompeu Fabra, C/Doctor Aiguader, 88, Barcelona 08003, Spain.
|
39
|
Groth P, Weiss B, Pohlenz HD, Leser U. Mining phenotypes for gene function prediction. BMC Bioinformatics 2008; 9:136. [PMID: 18315868 PMCID: PMC2311305 DOI: 10.1186/1471-2105-9-136] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 11/08/2007] [Accepted: 03/03/2008] [Indexed: 01/29/2023] Open
Abstract
Background Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In the past years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships. Results We present results on a study where we use a large set of phenotype data – in textual form – to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters for predicting gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO-annotations. Conclusion The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.
Affiliation(s)
- Philip Groth
- Research Laboratories of Bayer Schering Pharma AG, Berlin, Germany.
|
40
|
González-Díaz H, González-Díaz Y, Santana L, Ubeira FM, Uriarte E. Proteomics, networks and connectivity indices. Proteomics 2008; 8:750-78. [DOI: 10.1002/pmic.200700638] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.6] [Indexed: 12/19/2022]
|
41
|
Kensche PR, van Noort V, Dutilh BE, Huynen MA. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J R Soc Interface 2008; 5:151-70. [PMID: 17535793 PMCID: PMC2405902 DOI: 10.1098/rsif.2007.1047] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 11/12/2022] Open
Abstract
The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
Affiliation(s)
- Philip R. Kensche
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Vera van Noort
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bas E. Dutilh
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Martijn A. Huynen
- Centre for Molecular and Biomolecular Informatics/Nijmegen, Centre for Molecular Life Sciences, Radboud University Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands
|
42
|
Gimona M. Protein Linguistics and the Modular Code of the Cytoskeleton. BIOSEMIOTICS 2008:189-206. [DOI: 10.1007/978-1-4020-6340-4_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
|
43
|
Sun J, Sun Y, Ding G, Liu Q, Wang C, He Y, Shi T, Li Y, Zhao Z. InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes. BMC Bioinformatics 2007; 8:414. [PMID: 17963500 PMCID: PMC2238723 DOI: 10.1186/1471-2105-8-414] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 03/30/2007] [Accepted: 10/26/2007] [Indexed: 01/04/2023] Open
Abstract
Background Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there have been few integration studies for PPI prediction; one failed to yield appreciable improvement of prediction and the others did not conduct performance comparison. It remains unclear whether an integration of multiple genomic features can improve the PPI prediction and, if it can, how to integrate these features. Results In this study, we first performed a systematic evaluation on the PPI prediction in Escherichia coli (E. coli) by four genomic context based methods: the phylogenetic profile method, the gene cluster method, the gene fusion method, and the gene neighbor method. The number of predicted PPIs and the average degree in the predicted PPI networks varied greatly among the four methods. Further, no method outperformed the others when we tested using three well-defined positive datasets from the KEGG, EcoCyc, and DIP databases. Based on these comparisons, we developed a novel integrated method, named InPrePPI. InPrePPI first normalizes the AC value (an integrated value of the accuracy and coverage) of each method using three positive datasets, then calculates a weight for each method, and finally uses the weight to calculate an integrated score for each protein pair predicted by the four genomic context based methods. We demonstrate that InPrePPI outperforms each of the four individual methods and, in general, the other two existing integrated methods: the joint observation method and the integrated prediction method in STRING. These four methods and InPrePPI are implemented in a user-friendly web interface. 
Conclusion This study evaluated the PPI prediction by four genomic context based methods, and presents an integrated evaluation method that shows better performance in E. coli.
Affiliation(s)
- Jingchun Sun
- Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
|
44
|
Raes J, Foerstner KU, Bork P. Get the most out of your metagenome: computational analysis of environmental sequence data. Curr Opin Microbiol 2007; 10:490-8. [DOI: 10.1016/j.mib.2007.09.001] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.6] [Received: 07/02/2007] [Revised: 08/27/2007] [Accepted: 09/03/2007] [Indexed: 11/28/2022]
|
45
|
Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007; 104:13913-8. [PMID: 17717083 PMCID: PMC1955820 DOI: 10.1073/pnas.0702636104] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Abstract
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
Affiliation(s)
- E. D. Harrington
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- A. H. Singh
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- T. Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- I. Letunic
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- C. von Mering
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- L. J. Jensen
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- J. Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- P. Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, D-13092 Berlin, Germany
|
46
|
Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim PM, Montelione GT, Gerstein MB. Comparing classical pathways and modern networks: towards the development of an edge ontology. Trends Biochem Sci 2007; 32:320-31. [PMID: 17583513 DOI: 10.1016/j.tibs.2007.06.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 11/23/2006] [Revised: 05/02/2007] [Accepted: 06/06/2007] [Indexed: 02/04/2023]
Abstract
Pathways are integral to systems biology. Their classical representation has proven useful but is inconsistent in the meaning assigned to each arrow (or edge) and inadvertently implies the isolation of one pathway from another. Conversely, modern high-throughput (HTP) experiments offer standardized networks that facilitate topological calculations. Combining these perspectives, classical pathways can be embedded within large-scale networks and thus demonstrate the crosstalk between them. As more diverse types of HTP data become available, both perspectives can be effectively merged, embedding pathways simultaneously in multiple networks. However, the original problem still remains - the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized and well-defined edge ontology is necessary and propose a prototype as a starting point for reaching this goal.
Affiliation(s)
- Long J Lu
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
|
47
|
Raes J, Harrington ED, Singh AH, Bork P. Protein function space: viewing the limits or limited by our view? Curr Opin Struct Biol 2007; 17:362-9. [PMID: 17574832 DOI: 10.1016/j.sbi.2007.05.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 04/25/2007] [Revised: 04/25/2007] [Accepted: 05/31/2007] [Indexed: 12/13/2022]
Abstract
Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%. Although this fraction is more uncertain in metagenomics surveys, because of environmental complexities and differences in analysis protocols, our global knowledge of protein functions still appears to be considerable. However, when we consider protein families, continued sequencing seems to yield an ever-increasing number of novel families. Until we reconcile these two views, the limits of protein space will remain obscured.
Affiliation(s)
- Jeroen Raes
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
48
|
Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages. BMC Bioinformatics 2007; 8 Suppl 4:S6. [PMID: 17570149 PMCID: PMC1892085 DOI: 10.1186/1471-2105-8-s4-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function. RESULTS We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/. CONCLUSION The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
|
49
|
Abstract
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
Affiliation(s)
- András Szilágyi
- Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington St, Buffalo, NY 14203, USA
|
50
|
Pachkov M, Dandekar T, Korbel J, Bork P, Schuster S. Use of pathway analysis and genome context methods for functional genomics of Mycoplasma pneumoniae nucleotide metabolism. Gene 2007; 396:215-25. [PMID: 17467928 DOI: 10.1016/j.gene.2007.02.033] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/08/2005] [Revised: 11/26/2006] [Accepted: 02/21/2007] [Indexed: 11/27/2022]
Abstract
Elementary modes analysis allows one to reveal whether a set of known enzymes is sufficient to sustain functionality of the cell. Moreover, it is helpful in detecting missing reactions and predicting which enzymes could fill these gaps. Here, we perform a comprehensive elementary modes analysis and a genomic context analysis of Mycoplasma pneumoniae nucleotide metabolism, and search for new enzyme activities. The purine and pyrimidine networks are reconstructed by assembling enzymes annotated in the genome or found experimentally. We show that these reaction sets are sufficient for enabling synthesis of DNA and RNA in M. pneumoniae. Special focus is on the key modes for growth. Moreover, we make an educated guess on the nutritional requirements of this micro-organism. For the case that M. pneumoniae does not require adenine as a substrate, we suggest adenylosuccinate synthetase (EC 6.3.4.4), adenylosuccinate lyase (EC 4.3.2.2) and GMP reductase (EC 1.7.1.7) to be operative. GMP reductase activity is putatively assigned to the NRDI_MYCPN gene on the basis of the genomic context analysis. For the pyrimidine network, we suggest CTP synthase (EC 6.3.4.2) to be active. Further experiments on the nutritional requirements are needed to make a decision. Pyrimidine metabolism appears to be more appropriate as a drug target than purine metabolism since it shows lower plasticity.
Affiliation(s)
- Mikhail Pachkov
- Department of Bioinformatics, Faculty of Biology and Pharmaceutics, Friedrich-Schiller University Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany.
|