1
Lupo U, Sgarbossa D, Milighetti M, Bitbol AF. DiffPaSS-high-performance differentiable pairing of protein sequences using soft scores. Bioinformatics (Oxford, England) 2024; 41:btae738. [PMID: 39672677 DOI: 10.1093/bioinformatics/btae738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 12/05/2024] [Accepted: 12/11/2024] [Indexed: 12/15/2024]
Abstract
MOTIVATION Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and feature correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores. RESULTS We introduce Differentiable Pairing using Soft Scores (DiffPaSS), a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual information and neighbor graph alignment scores. DiffPaSS outperforms existing algorithms for optimizing the same scores. We demonstrate the usefulness of our paired alignments for the prediction of protein complex structure. DiffPaSS does not require sequences to be aligned, and we also apply it to nonaligned sequences from T-cell receptors. AVAILABILITY AND IMPLEMENTATION A PyTorch implementation and installable Python package are available at https://github.com/Bitbol-Lab/DiffPaSS.
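The formulation in this abstract, searching for the pairing that maximizes a score, can be illustrated with a minimal sketch. It is not the DiffPaSS implementation: it assumes a hypothetical additive score matrix for a single species (real mutual information scores are not additive over pairs) and uses exhaustive search instead of differentiable optimization. The names `best_pairing` and `toy_score` are illustrative only.

```python
from itertools import permutations


def best_pairing(score):
    """Exhaustively find the permutation that maximizes the total pairing score.

    score[i][j] is a hypothetical score for pairing sequence i of family A
    with sequence j of family B within one species (an assumption made for
    illustration, not a quantity defined in the abstract).
    """
    n = len(score)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Toy 3x3 score matrix with made-up values.
toy_score = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.3],
]
pairing, total = best_pairing(toy_score)
print(pairing, round(total, 3))
```

Exhaustive search grows factorially with the number of sequences per species, which is one reason approximate or differentiable optimization is attractive for this problem.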
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Martina Milighetti
  - Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
  - Cancer Institute, University College London, London WC1E 6DD, United Kingdom
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
2
Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024] Open
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves competitive performance with using orthology-based pairing.
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
3
Gandarilla-Pérez CA, Pinilla S, Bitbol AF, Weigt M. Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins. PLoS Comput Biol 2023; 19:e1011010. [PMID: 36996234 PMCID: PMC10089317 DOI: 10.1371/journal.pcbi.1011010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/11/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023] Open
Abstract
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
Affiliation(s)
- Carlos A Gandarilla-Pérez
  - Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana, Cuba
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sergio Pinilla
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (UMR 8237), Paris, France
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
4
Wang L, Li FL, Ma XY, Cang Y, Bai F. PPI-Miner: A Structure and Sequence Motif Co-Driven Protein-Protein Interaction Mining and Modeling Computational Method. J Chem Inf Model 2022; 62:6160-6171. [PMID: 36448715 DOI: 10.1021/acs.jcim.2c01033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022]
Abstract
Protein-protein interactions (PPIs) play important roles in biological processes of life, and predicting PPIs becomes a critical scientific issue of concern. Most PPIs occur through small domains or motifs (fragments), which are challenging and laborious to map by standard biochemical approaches because they generally require the cloning of several truncation mutants. Here, we present a computational method, named as PPI-Miner, to fish potential protein interacting partners utilizing protein motifs as queries. In brief, this work first developed a motif-matching algorithm designed to identify the proteins that contain sequential or structural similar motifs with the given query motif. Being aligned to the query motif, the binding mode of the discovered motif and its receptor protein will be initially determined to be used to build PPI complexes accordingly. Eventually, a PPI complex structure could be built and optimized with a designed automatic protocol. Besides discovering PPIs, PPI-Miner can also be applied to other areas, i.e., the rational design of molecular glues and protein vaccines. In this work, PPI-Miner was employed to mine the potential cereblon (CRBN) substrates from human proteome. As a result, 1,739 candidates were predicted, and 16 of them have been experimentally validated in previous studies. The source code of PPI-Miner can be obtained from the GitHub repository (https://github.com/Wang-Lin-boop/PPI-Miner), the webserver is freely available for users (https://bailab.siais.shanghaitech.edu.cn/services/ppi-miner), and the database of predicted CRBN substrates is accessible at https://bailab.siais.shanghaitech.edu.cn/services/crbn-subslib.
Affiliation(s)
- Fang Bai
  - Shanghai Clinical Research and Trial Center, Shanghai 201210, China
5
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
At the heart of the cellular machinery through the regulation of cellular functions, protein-protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.
Affiliation(s)
- Vivian Robin
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
6
Hong Y, Lee J, Ko J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 2022; 23:93. [PMID: 35296230 PMCID: PMC8925138 DOI: 10.1186/s12859-022-04628-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2021] [Accepted: 03/03/2022] [Indexed: 11/18/2022] Open
Abstract
Background The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, Deepmind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models almost close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than most top server groups of CASP14. Conclusion These results imply that A-Prot accurately captures the evolutionary and structural information of proteins with relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04628-8.
Affiliation(s)
- Yiyu Hong
  - Arontier Co, Seoul, Republic of Korea
- Juyong Lee
  - Arontier Co, Seoul, Republic of Korea
  - Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, Republic of Korea
- Junsu Ko
  - Arontier Co, Seoul, Republic of Korea
7
Vaglietti S, Fiumara F. PolyQ length co-evolution in neural proteins. NAR Genom Bioinform 2021; 3:lqab032. [PMID: 34017944 PMCID: PMC8121095 DOI: 10.1093/nargab/lqab032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Revised: 02/10/2021] [Accepted: 03/31/2021] [Indexed: 12/29/2022] Open
Abstract
Intermolecular co-evolution optimizes physiological performance in functionally related proteins, ultimately increasing molecular co-adaptation and evolutionary fitness. Polyglutamine (polyQ) repeats, which are over-represented in nervous system-related proteins, are increasingly recognized as length-dependent regulators of protein function and interactions, and their length variation contributes to intraspecific phenotypic variability and interspecific divergence. However, it is unclear whether polyQ repeat lengths evolve independently in each protein or rather co-evolve across functionally related protein pairs and networks, as in an integrated regulatory system. To address this issue, we investigated here the length evolution and co-evolution of polyQ repeats in clusters of functionally related and physically interacting neural proteins in Primates. We observed function-/disease-related polyQ repeat enrichment and evolutionary hypervariability in specific neural protein clusters, particularly in the neurocognitive and neuropsychiatric domains. Notably, these analyses detected extensive patterns of intermolecular polyQ length co-evolution in pairs and clusters of functionally related, physically interacting proteins. Moreover, they revealed both direct and inverse polyQ length co-variation in protein pairs, together with complex patterns of coordinated repeat variation in entire polyQ protein sets. These findings uncover a whole system of co-evolving polyQ repeats in neural proteins with direct implications for understanding polyQ-dependent phenotypic variability, neurocognitive evolution and neuropsychiatric disease pathogenesis.
Affiliation(s)
- Serena Vaglietti
  - Rita Levi Montalcini Department of Neuroscience, University of Torino, Torino 10125, Italy
- Ferdinando Fiumara
  - Rita Levi Montalcini Department of Neuroscience, University of Torino, Torino 10125, Italy
  - National Institute of Neuroscience (INN), University of Torino, Torino 10125, Italy
8
Patil S, Kondabagil K. Coevolutionary and Phylogenetic Analysis of Mimiviral Replication Machinery Suggest the Cellular Origin of Mimiviruses. Mol Biol Evol 2021; 38:2014-2029. [PMID: 33570580 PMCID: PMC8097291 DOI: 10.1093/molbev/msab003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022] Open
Abstract
Mimivirus is one of the most complex and largest viruses known. The origin and evolution of Mimivirus and other giant viruses have been a subject of intense study in the last two decades. The two prevailing hypotheses on the origin of Mimivirus and other viruses are the reduction hypothesis, which posits that viruses emerged from modern unicellular organisms; whereas the virus-first hypothesis proposes viruses as relics of precellular forms of life. In this study, to gain insights into the origin of Mimivirus, we have carried out extensive phylogenetic, correlation, and multidimensional scaling analyses of the putative proteins involved in the replication of its 1.2-Mb large genome. Correlation analysis and multidimensional scaling methods were validated using bacteriophage, bacteria, archaea, and eukaryotic replication proteins before applying to Mimivirus. We show that a large fraction of mimiviral replication proteins, including polymerase B, clamp, and clamp loaders are of eukaryotic origin and are coevolving. Although phylogenetic analysis places some components along the lineages of phage and bacteria, we show that all the replication-related genes have been homogenized and are under purifying selection. Collectively our analysis supports the idea that Mimivirus originated from a complex cellular ancestor. We hypothesize that Mimivirus has largely retained complex replication machinery reminiscent of its progenitor while losing most of the other genes related to processes such as metabolism and translation.
Affiliation(s)
- Supriya Patil
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
- Kiran Kondabagil
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, India
9
Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There are considerable various protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most of co-evolutionary methods discover a combination of physical interplays and functional associations. However, there are only a handful of approaches which specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physical interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS In the present analysis, pairs of multiple sequence alignments are constructed for each dimer and the covariation between residues in those pairs are calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, whole residue couplings between proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers in different accessible surface area thresholds are fed into support vector machine as single or multiple feature models. The results of training the classifiers by single features show no apparent different accuracies in distinct methods for different accessible surface area thresholds. Nevertheless, mutual information product and context likelihood of relatedness procedures may roughly have an overall higher and lower performances than other two methods for different accessible surface area cut-offs, respectively. The results also demonstrate that training support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement of prediction performance. 
In this context, CCMpred roughly achieves an overall better performance than mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS In this paper, by feeding norm values of protein dimers into support vector machines in different accessible surface area thresholds, we demonstrate that even small number of proteins in pairs of multiple alignments could allow one to accurately discriminate between positive and negative dimers.
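The step in which all residue couplings between the two proteins of a dimer are unified into a single Frobenius norm value can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `frobenius_norm` and `toy_couplings` are hypothetical names, and the matrix values are made up rather than produced by CCMpred or a mutual information method.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2D list."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


# Toy inter-protein coupling block: rows index residues of one protein,
# columns index residues of its partner (values are invented).
toy_couplings = [
    [3.0, 0.0],
    [0.0, 4.0],
]
dimer_feature = frobenius_norm(toy_couplings)
print(dimer_feature)
```

In the setting the abstract describes, one such scalar per accessible surface area threshold would serve as a feature for the support vector machine.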
Affiliation(s)
- Sara Salmanian
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Hamid Pezeshk
  - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
  - Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
  - School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Mehdi Sadeghi
  - National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
10
Savel D, Koyutürk M. Characterizing human genomic coevolution in locus-gene regulatory interactions. BioData Min 2019; 12:8. [PMID: 30923571 PMCID: PMC6419833 DOI: 10.1186/s13040-019-0195-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2018] [Accepted: 02/19/2019] [Indexed: 11/10/2022] Open
Abstract
Background Coevolution has been used to identify and predict interactions and functional relationships between proteins of many different organisms including humans. Current efforts in annotating the human genome increasingly show that non-coding DNA sequence has important functional and regulatory interactions. Furthermore, regulatory elements do not necessarily reside in close proximity of the coding region for their target genes. Results We characterize coevolution as it appears in locus-gene interactions in the human genome, focusing on expression Quantitative Trait - Locus (eQTL) interactions. Our results show that in these interactions the conservation status of the loci is predictive of the conservation status of their target genes. Furthermore, comparing the phylogenetic histories of intra-chromosomal pairs of loci and transcription start sites, we find that pairs that appear coevolved are enriched for cis-eQTL interactions. Exploring this property we found that coevolution might be useful in prioritizing association tests in cis-eQTL detection. Conclusions The relationship between the conservation status of pairs of loci and protein coding transcription start sites reveal correlations with regulatory interactions. Pairs that appear coevolved are enriched for intra-chromosomal regulatory interactions, thus our results suggest that measures of coevolution can be useful for prediction and detection of new interactions. Measures of coevolution are genome-wide and could potentially be used to prioritize the detection of distant or inter-chromosomal interactions such as trans-eQTL interactions in the human genome.
Affiliation(s)
- Daniel Savel
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
- Mehmet Koyutürk
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
11
Kwon N, Baek K, Kim D, Yun H. Leucine-rich glioma inactivated 3: Integrative analyses reveal its potential prognostic role in cancer. Mol Med Rep 2017; 17:3993-4002. [DOI: 10.3892/mmr.2017.8279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/19/2016] [Accepted: 07/25/2017] [Indexed: 11/06/2022] Open
Affiliation(s)
- Nyoun Kwon
  - Department of Biochemistry, Chung‑Ang University, College of Medicine, Seoul 06974, Republic of Korea
- Kwang Baek
  - Department of Biochemistry, Chung‑Ang University, College of Medicine, Seoul 06974, Republic of Korea
- Dong‑Seok Kim
  - Department of Biochemistry, Chung‑Ang University, College of Medicine, Seoul 06974, Republic of Korea
- Hye‑Young Yun
  - Department of Biochemistry, Chung‑Ang University, College of Medicine, Seoul 06974, Republic of Korea
12
Niu Y, Liu C, Moghimyfiroozabad S, Yang Y, Alavian KN. PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages. PeerJ 2017; 5:e3712. [PMID: 28875072 PMCID: PMC5578374 DOI: 10.7717/peerj.3712] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/13/2017] [Accepted: 07/28/2017] [Indexed: 02/05/2023] Open
Abstract
Direct and indirect functional links between proteins as well as their interactions as part of larger protein complexes or common signaling pathways may be predicted by analyzing the correlation of their evolutionary patterns. Based on phylogenetic profiling, here we present a highly scalable and time-efficient computational framework for predicting linkages within the whole human proteome. We have validated this method through analysis of 3,697 human pathways and molecular complexes and a comparison of our results with the prediction outcomes of previously published co-occurrency model-based and normalization methods. Here we also introduce PrePhyloPro, a web-based software that uses our method for accurately predicting proteome-wide linkages. We present data on interactions of human mitochondrial proteins, verifying the performance of this software. PrePhyloPro is freely available at http://prephylopro.org/phyloprofile/.
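The idea of predicting linkages from correlated evolutionary patterns can be illustrated with phylogenetic profiles, here taken to be presence/absence vectors of each protein across a set of genomes. The sketch below uses plain Pearson correlation as a stand-in; the abstract does not specify PrePhyloPro's actual scoring, and the profiles and the `pearson` helper are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles.

    Returns 0.0 when either profile is constant (correlation undefined).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Invented presence (1) / absence (0) profiles across six genomes.
profile_a = [1, 1, 0, 0, 1, 0]
profile_b = [1, 1, 0, 0, 1, 0]  # same pattern as profile_a
profile_c = [0, 0, 1, 1, 0, 1]  # complementary pattern

same = pearson(profile_a, profile_b)
opposite = pearson(profile_a, profile_c)
print(same, opposite)
```

Protein pairs whose profiles correlate strongly would be candidate linkages under this simplified scheme.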
Affiliation(s)
- Yulong Niu
  - Department of Medicine, Division of Brain Sciences, Imperial College London, London, United Kingdom
  - Key Lab of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
  - School of Medicine, Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, United States of America
- Chengcheng Liu
  - Department of Periodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Yi Yang
  - Key Lab of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Kambiz N Alavian
  - Department of Medicine, Division of Brain Sciences, Imperial College London, London, United Kingdom
  - School of Medicine, Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, United States of America
  - Department of Biology, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran
13
Leung MCK, Procter AC, Goldstone JV, Foox J, DeSalle R, Mattingly CJ, Siddall ME, Timme-Laragy AR. Applying evolutionary genetics to developmental toxicology and risk assessment. Reprod Toxicol 2017; 69:174-186. [PMID: 28267574 PMCID: PMC5829367 DOI: 10.1016/j.reprotox.2017.03.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/05/2016] [Revised: 02/27/2017] [Accepted: 03/02/2017] [Indexed: 12/26/2022]
Abstract
Evolutionary thinking continues to challenge our views on health and disease. Yet, there is a communication gap between evolutionary biologists and toxicologists in recognizing the connections among developmental pathways, high-throughput screening, and birth defects in humans. To increase our capability in identifying potential developmental toxicants in humans, we propose to apply evolutionary genetics to improve the experimental design and data interpretation with various in vitro and whole-organism models. We review five molecular systems of stress response and update 18 consensual cell-cell signaling pathways that are the hallmark for early development, organogenesis, and differentiation; and revisit the principles of teratology in light of recent advances in high-throughput screening, big data techniques, and systems toxicology. Multiscale systems modeling plays an integral role in the evolutionary approach to cross-species extrapolation. Phylogenetic analysis and comparative bioinformatics are both valuable tools in identifying and validating the molecular initiating events that account for adverse developmental outcomes in humans. The discordance of susceptibility between test species and humans (ontogeny) reflects their differences in evolutionary history (phylogeny). This synthesis not only can lead to novel applications in developmental toxicity and risk assessment, but also can pave the way for applying an evo-devo perspective to the study of developmental origins of health and disease.
Affiliation(s)
- Maxwell C K Leung
- Nicholas School of the Environment, Duke University, Durham, NC, United States
- Andrew C Procter
- Institute for Advanced Analytics, North Carolina State University, Raleigh, NC, United States
- Jared V Goldstone
- Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, United States
- Jonathan Foox
- Department of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States
- Robert DeSalle
- Department of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States
- Carolyn J Mattingly
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, United States
- Mark E Siddall
- Department of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States
- Alicia R Timme-Laragy
- Department of Environmental Health Sciences, University of Massachusetts, Amherst, MA, United States
|
14
|
Fares MA. Coevolution Analysis Illuminates the Evolutionary Plasticity of the Chaperonin System GroES/L. STRESS AND ENVIRONMENTAL REGULATION OF GENE EXPRESSION AND ADAPTATION IN BACTERIA 2016:796-811. [DOI: 10.1002/9781119004813.ch77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
15
|
Zhang Q, Nogales-Cadenas R, Lin JR, Zhang W, Cai Y, Vijg J, Zhang ZD. Systems-level analysis of human aging genes shed new light on mechanisms of aging. Hum Mol Genet 2016; 25:2934-2947. [PMID: 27179790 DOI: 10.1093/hmg/ddw145] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/07/2016] [Revised: 04/07/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022] Open
Abstract
Although studies over the last decades have firmly connected a number of genes and molecular pathways to aging, the aging process as a whole still remains poorly understood. To gain novel insights into the mechanisms underlying aging, instead of considering aging genes individually, we studied their characteristics at the systems level in the context of biological networks. We calculated a comprehensive set of network characteristics for human aging-related genes from the GenAge database. By comparing them with other functional groups of genes, we identified a robust group of aging-specific network characteristics. To find the structural basis and the molecular mechanisms underlying this aging-related network specificity, we also analyzed protein domain interactions and gene expression patterns across different tissues. Our study revealed that aging genes not only tend to be network hubs, playing important roles in communication among different functional modules or pathways, but also are more likely to physically interact and be co-expressed with essential genes. The high expression of aging genes across a large number of tissue types also points to a high level of connectivity among aging genes. Unexpectedly, contrary to the depletion of interactions among hub genes in biological networks, we observed close interactions among aging hubs, which renders the aging subnetworks vulnerable to random attacks and thus may contribute to the aging process. Comparison across species reveals the evolution process of the aging subnetwork. As the organisms become more complex, the complexity of its aging mechanisms increases and their aging hub genes are more functionally connected.
Affiliation(s)
- Jan Vijg
- Department of Genetics; Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY, USA
|
16
|
Tine M, Kuhl H, Teske PR, Tschöp MH, Jastroch M. Diversification and coevolution of the ghrelin/growth hormone secretagogue receptor system in vertebrates. Ecol Evol 2016; 6:2516-35. [PMID: 27066235 PMCID: PMC4797157 DOI: 10.1002/ece3.2057] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/21/2015] [Revised: 02/08/2016] [Accepted: 02/09/2016] [Indexed: 12/13/2022] Open
Abstract
The gut hormone ghrelin is involved in numerous metabolic functions, such as the stimulation of growth hormone secretion, gastric motility, and food intake. Ghrelin is modified by ghrelin O-acyltransferase (GOAT) or membrane-bound O-acyltransferase domain-containing 4 (MBOAT4) enabling action through the growth hormone secretagogue receptors (GHS-R). During the course of evolution, initially strong ligand/receptor specificities can be disrupted by genomic changes, potentially modifying physiological roles of the ligand/receptor system. Here, we investigated the coevolution of ghrelin, GOAT, and GHS-R in vertebrates. We combined similarity search, conserved synteny analyses, phylogenetic reconstructions, and protein structure comparisons to reconstruct the evolutionary history of the ghrelin system. Ghrelin remained a single-gene locus in all vertebrate species, and accordingly, a single GHS-R isoform was identified in all tetrapods. Similar patterns of the nonsynonymous (dN) and synonymous (dS) ratio (dN/dS) in the vertebrate lineage strongly suggest coevolution of the ghrelin and GHS-R genes, supporting specific functional interactions and common physiological pathways. The selection profiles do not allow confirmation as to whether ghrelin binds specifically to GOAT, but the ghrelin dN/dS patterns are more similar to those of GOAT compared to MBOAT1 and MBOAT2 isoforms. Four GHS-R isoforms were identified in teleost genomes. This diversification of GHS-R resulted from successive rounds of duplications, some of which remained specific to the teleost lineage. Coevolution signals are lost in teleosts, presumably due to the diversification of GHS-R but not the ghrelin gene. 
The identification of the GHS-R diversity in teleosts provides a molecular basis for comparative studies on ghrelin's physiological roles and regulation, while the comparative sequence and structure analyses will assist translational medicine to determine structure-function relationships of the ghrelin/GHS-R system.
Affiliation(s)
- Mbaye Tine
- Genome Centre at Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, D-50829 Köln, Germany; Molecular Zoology Laboratory, Department of Zoology, University of Johannesburg, Kingsway Campus, Auckland Park 2006, South Africa
- Heiner Kuhl
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Peter R Teske
- Molecular Zoology Laboratory, Department of Zoology, University of Johannesburg, Kingsway Campus, Auckland Park 2006, South Africa
- Matthias H Tschöp
- Helmholtz Diabetes Center & German Diabetes Center (DZD), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Division of Metabolic Diseases, Technische Universität München, 80333 Munich, Germany
- Martin Jastroch
- Helmholtz Diabetes Center & German Diabetes Center (DZD), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Division of Metabolic Diseases, Technische Universität München, 80333 Munich, Germany
|
17
|
Quantitative and Systems-Based Approaches for Deciphering Bacterial Membrane Interactome and Gene Function. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2015; 883:135-54. [PMID: 26621466 DOI: 10.1007/978-3-319-23603-2_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
High-throughput genomic and proteomic methods provide a concise description of the molecular constituents of a cell, whereas systems biology strives to understand the way these components function as a whole. Recent developments, such as genome editing technologies and protein epitope-tagging coupled with high-sensitivity mass-spectrometry, allow systemic studies to be performed at an unprecedented scale. Available methods can be successfully applied to various goals, both expanding fundamental knowledge and solving applied problems. In this review, we discuss the present state and future of bacterial cell envelope interactomics, with a specific focus on host-pathogen interactions and drug target discovery. Both experimental and computational methods will be outlined together with examples of their practical implementation.
|
18
|
Qian W, Zhou H, Tang K. Recent coselection in human populations revealed by protein-protein interaction network. Genome Biol Evol 2014; 7:136-53. [PMID: 25532814 PMCID: PMC4316623 DOI: 10.1093/gbe/evu270] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] Open
Abstract
Genome-wide scans for signals of natural selection in human populations have identified a large number of candidate loci that underlie local adaptations. This is surprising given the relatively short evolutionary time since the divergence of the human population. One hypothesis that has not been formally examined is whether and how the recent human evolution may have been shaped by coselection in the context of complex molecular interactome. In this study, genome-wide signals of selection were scanned in East Asians, Europeans, and Africans using 1000 Genome data, and subsequently mapped onto the protein-protein interaction (PPI) network. We found that the candidate genes of recent positive selection localized significantly closer to each other on the PPI network than expected, revealing substantial clustering of selected genes. Furthermore, gene pairs of shorter PPI network distances showed higher similarities of their recent evolutionary paths than those further apart. Last, subnetworks enriched with recent coselection signals were identified, which are substantially overrepresented in biological pathways related to signal transduction, neurogenesis, and immune function. These results provide the first genome-wide evidence for association of recent selection signals with the PPI network, shedding light on the potential mechanisms of recent coselection in the human genome.
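The network-clustering claim in this abstract (selected genes sit closer together on the PPI network than expected) can be illustrated with a small permutation test. The sketch below is only an assumption about how such a comparison might be set up, not the authors' procedure: the toy graph, the gene names, and the helper names `bfs_distances`, `mean_pairwise_distance`, and `clustering_p_value` are all hypothetical, and the graph is assumed to be connected.

```python
import random
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path length over all pairs in `genes` (connected graph assumed)."""
    pairs = list(combinations(sorted(genes), 2))
    return sum(bfs_distances(graph, a)[b] for a, b in pairs) / len(pairs)


def clustering_p_value(graph, selected, n_perm=1000, seed=0):
    """Fraction of random gene sets that are at least as tightly clustered as `selected`."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, selected)
    nodes = sorted(graph)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(selected))) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a path g0 - g1 - ... - g7.
names = [f"g{i}" for i in range(8)]
toy_graph = {n: [] for n in names}
for a, b in zip(names, names[1:]):
    toy_graph[a].append(b)
    toy_graph[b].append(a)

observed, p_value = clustering_p_value(toy_graph, ["g0", "g1", "g2"], n_perm=200, seed=1)
print(observed, p_value)
```

A real analysis would also need to handle disconnected components and control for node degree when drawing random sets; this sketch does neither.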
Affiliation(s)
- Wei Qian
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Hang Zhou
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Kun Tang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
19
|
Cheng F, Jia P, Wang Q, Lin CC, Li WH, Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol Biol Evol 2014; 31:2156-69. [PMID: 24881052 DOI: 10.1093/molbev/msu167] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 12/14/2022] Open
Abstract
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found the unique network centrality of cancer proteins, which is largely independent of gene essentiality. Cancer genes likely have experienced a lower evolutionary rate and stronger purifying selection than those of noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originated in early metazoan, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that the protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, whereas a weaker or insignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Quan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago; Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine; Department of Cancer Biology, Vanderbilt University School of Medicine; Department of Psychiatry, Vanderbilt University School of Medicine; Center for Quantitative Sciences, Vanderbilt University Medical Center
|
20
|
Ochoa D, Pazos F. Practical aspects of protein co-evolution. Front Cell Dev Biol 2014; 2:14. [PMID: 25364721 PMCID: PMC4207036 DOI: 10.3389/fcell.2014.00014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/21/2014] [Accepted: 04/02/2014] [Indexed: 11/15/2022] Open
Abstract
Co-evolution is a fundamental aspect of Evolutionary Theory. At the molecular level, co-evolutionary linkages between protein families have been used as indicators of protein interactions and functional relationships from long ago. Due to the complexity of the problem and the amount of genomic data required for these approaches to achieve good performances, it took a relatively long time from the appearance of the first ideas and concepts to the quotidian application of these approaches and their incorporation to the standard toolboxes of bioinformaticians and molecular biologists. Today, these methodologies are mature (both in terms of performance and usability/implementation), and the genomic information that feeds them large enough to allow their general application. This review tries to summarize the current landscape of co-evolution-based methodologies, with a strong emphasis on describing interesting cases where their application to important biological systems, alone or in combination with other computational and experimental approaches, allowed getting new insight into these.
Affiliation(s)
- David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
|
21
|
El-Kebir M, Marschall T, Wohlers I, Patterson M, Heringa J, Schönhuth A, Klau GW. Mapping proteins in the presence of paralogs using units of coevolution. BMC Bioinformatics 2014; 14 Suppl 15:S18. [PMID: 24564758 PMCID: PMC3852051 DOI: 10.1186/1471-2105-14-s15-s18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022] Open
Abstract
Background We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible.
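The idea that a mapping is scored through units made of two mapped protein pairs makes the objective quadratic in the pairing variables. The toy sketch below shows that structure only. It is not CUPID: the unit score used here (negative absolute difference of within-family distances) is a hypothetical stand-in for the paper's likelihood-based score, and exhaustive search over permutations replaces the Lagrangian relaxation, so it is usable only for a handful of proteins.

```python
from itertools import combinations, permutations


def unit_score(dist_a, dist_b, i, j, pi, pj):
    """Hypothetical score for the unit {(i -> pi), (j -> pj)}: 0 is a perfect match."""
    return -abs(dist_a[i][j] - dist_b[pi][pj])


def mapping_score(dist_a, dist_b, perm):
    """Sum of unit scores over all pairs of mapped pairs (quadratic in the mapping)."""
    return sum(
        unit_score(dist_a, dist_b, i, j, perm[i], perm[j])
        for i, j in combinations(range(len(perm)), 2)
    )


def best_mapping(dist_a, dist_b):
    """Exhaustive search over one-to-one mappings; feasible only for tiny families."""
    n = len(dist_a)
    return max(permutations(range(n)), key=lambda p: mapping_score(dist_a, dist_b, p))


# Toy evolutionary distances within family A (all pairwise values distinct).
dist_a = [
    [0.0, 1.0, 2.0, 4.0],
    [1.0, 0.0, 3.0, 5.0],
    [2.0, 3.0, 0.0, 7.0],
    [4.0, 5.0, 7.0, 0.0],
]

# Family B carries the same distances under a hidden relabelling.
true_perm = (2, 0, 3, 1)
n = len(dist_a)
dist_b = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        dist_b[true_perm[i]][true_perm[j]] = dist_a[i][j]

recovered = best_mapping(dist_a, dist_b)
print(recovered, mapping_score(dist_a, dist_b, recovered))
```

Because the number of permutations grows factorially, any realistic instance needs a relaxation or heuristic, which is the gap the abstract says the Lagrangian approach fills.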
|
22
|
Desalle R, Chicote JU, Sun TT, Garcia-España A. Generation of divergent uroplakin tetraspanins and their partners during vertebrate evolution: identification of novel uroplakins. BMC Evol Biol 2014; 14:13. [PMID: 24450554 PMCID: PMC3922775 DOI: 10.1186/1471-2148-14-13] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/19/2013] [Accepted: 01/02/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The recent availability of sequenced genomes from a broad array of chordates (cephalochordates, urochordates and vertebrates) has allowed us to systematically analyze the evolution of uroplakins: tetraspanins (UPK1a and UPK1b families) and their respective partner proteins (UPK2 and UPK3 families). RESULTS We report here: (1) the origin of uroplakins in the common ancestor of vertebrates, (2) the appearance of several residues that have statistically significantly positive dN/dS ratios in the duplicated paralogs of uroplakin genes, and (3) the existence of strong coevolutionary relationships between UPK1a/1b tetraspanins and their respective UPK2/UPK3-related partner proteins. Moreover, we report the existence of three new UPK2/3 family members we named UPK2b, 3c and 3d, which will help clarify the evolutionary relationships between fish, amphibian and mammalian uroplakins that may perform divergent functions specific to these different and physiologically distinct groups of vertebrates. CONCLUSIONS Since our analyses cover species of all major chordate groups this work provides an extremely clear overall picture of how the uroplakin families and their partner proteins have evolved in parallel. We also highlight several novel features of uroplakin evolution including the appearance of UPK2b and 3d in fish and UPK3c in the common ancestor of reptiles and mammals. Additional studies of these novel uroplakins should lead to new insights into uroplakin structure and function.
Affiliation(s)
- Rob Desalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
|
23
|
Atanur S, Diaz A, Maratou K, Sarkis A, Rotival M, Game L, Tschannen M, Kaisaki P, Otto G, Ma M, Keane T, Hummel O, Saar K, Chen W, Guryev V, Gopalakrishnan K, Garrett M, Joe B, Citterio L, Bianchi G, McBride M, Dominiczak A, Adams D, Serikawa T, Flicek P, Cuppen E, Hubner N, Petretto E, Gauguier D, Kwitek A, Jacob H, Aitman T. Genome sequencing reveals loci under artificial selection that underlie disease phenotypes in the laboratory rat. Cell 2013; 154:691-703. [PMID: 23890820 PMCID: PMC3732391 DOI: 10.1016/j.cell.2013.06.040] [Citation(s) in RCA: 115] [Impact Index Per Article: 9.6] [Received: 01/25/2013] [Revised: 04/30/2013] [Accepted: 06/21/2013] [Indexed: 12/24/2022]
Abstract
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models.
Affiliation(s)
- Santosh S. Atanur
- Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- National Heart and Lung Institute, Imperial College London, London W12 0NN, UK
- Ana Garcia Diaz
- Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Klio Maratou
- Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Allison Sarkis
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Maxime Rotival
- Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Laurence Game
- Genomics Core Laboratory, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Michael R. Tschannen
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Pamela J. Kaisaki
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Georg W. Otto
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Man Chun John Ma
- Department of Pharmacology, University of Iowa, Iowa City, IA 52242, USA
- Thomas M. Keane
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Oliver Hummel
- Max Delbrück Center for Molecular Medicine, Berlin 13092, Germany
- Kathrin Saar
- Max Delbrück Center for Molecular Medicine, Berlin 13092, Germany
- Wei Chen
- Max Delbrück Center for Molecular Medicine, Berlin 13092, Germany
- Victor Guryev
- Hubrecht Institute KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584 Utrecht, the Netherlands
- European Research Institute for the Biology of Ageing, University Medical Center, 9700 AD Groningen, the Netherlands
- Kathirvel Gopalakrishnan
- Center for Hypertension and Personalized Medicine, Department of Physiology and Pharmacology, University of Toledo College of Medicine, Toledo, OH 43606-3390, USA
- Michael R. Garrett
- Department of Pharmacology and Toxicology, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Bina Joe
- Center for Hypertension and Personalized Medicine, Department of Physiology and Pharmacology, University of Toledo College of Medicine, Toledo, OH 43606-3390, USA
- Lorena Citterio
- San Raffaele Scientific Institute, OU Nephrology, University Vita Salute San Raffaele, Chair of Nephrology, 58, 20132 Milan, Italy
- Giuseppe Bianchi
- San Raffaele Scientific Institute, OU Nephrology, University Vita Salute San Raffaele, Chair of Nephrology, 58, 20132 Milan, Italy
- Martin McBride
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Anna Dominiczak
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- David J. Adams
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Tadao Serikawa
- Institute of Laboratory Animals, Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Edwin Cuppen
- Hubrecht Institute KNAW and University Medical Center Utrecht, Uppsalalaan 8, 3584 Utrecht, the Netherlands
- Norbert Hubner
- Max Delbrück Center for Molecular Medicine, Berlin 13092, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin 13092, Germany
- Enrico Petretto
- Integrative Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Dominique Gauguier
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- INSERM UMR-S872, Cordeliers Research Centre, 75006 Paris, France
- Anne Kwitek
- Department of Pharmacology, University of Iowa, Iowa City, IA 52242, USA
- Howard Jacob
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Timothy J. Aitman
- Physiological Genomics and Medicine Group, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
|
24
|
Ruiz-González MX, Fares MA. Coevolution analyses illuminate the dependencies between amino acid sites in the chaperonin system GroES-L. BMC Evol Biol 2013; 13:156. [PMID: 23875653 PMCID: PMC3728108 DOI: 10.1186/1471-2148-13-156] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 03/25/2013] [Accepted: 07/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND GroESL is a heat-shock protein ubiquitous in bacteria and eukaryotic organelles. This evolutionarily conserved protein is involved in the folding of a wide variety of other proteins in the cytosol, being essential to the cell. The folding activity proceeds through strong conformational changes mediated by the co-chaperonin GroES and ATP. Functions alternative to folding have been previously described for GroEL in different bacterial groups, supporting enormous functional and structural plasticity for this molecule and the existence of a hidden combinatorial code in the protein sequence enabling such functions. Describing this plasticity can shed light on the functional diversity of GroEL. We hypothesize that different overlapping sets of amino acids coevolve within GroEL, GroES and between both these proteins. Shifts in these coevolutionary relationships may inevitably lead to evolution of alternative functions. RESULTS We conducted the first coevolution analyses in an extensive bacterial phylogeny, revealing complex networks of evolutionary dependencies between residues in GroESL. These networks differed among bacterial groups and involved amino acid sites with functional importance and others with previously unsuspected functional potential. Coevolutionary networks formed statistically independent units among bacterial groups and map to structurally continuous regions in the protein, suggesting their functional link. Sites involved in coevolution fell within narrow structural regions, supporting dynamic combinatorial functional links involving similar protein domains. Moreover, coevolving sites within a bacterial group mapped to regions previously identified as involved in folding-unrelated functions, and thus, coevolution may mediate alternative functions. CONCLUSIONS Our results highlight the evolutionary plasticity of GroEL across the entire bacterial phylogeny. 
Evidence on the functional importance of coevolving sites illuminates the as yet unappreciated functional diversity of proteins.
Affiliation(s)
- Mario X Ruiz-González
- Integrative and Systems Biology Group, Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas (CSIC-UPV), Valencia, Spain
|
25
|
Abstract
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
Affiliation(s)
- David de Juan
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
26
|
Tiwary BK. Correlated evolution of gonadotropin-releasing hormone and gonadotropin-inhibitory hormone and their receptors in mammals. Neuroendocrinology 2013; 97:242-51. [PMID: 22948085 DOI: 10.1159/000342694] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/20/2012] [Accepted: 08/09/2012] [Indexed: 11/19/2022]
Abstract
BACKGROUND Evolutionary rate variation in genes (proteins) is manifested both within the species (genome) and between the species (genomes). However, the interdependent components of a biological system in form of a gene or a protein are expected to evolve in a correlated manner under a common functional constraint. METHODS The phylogenetic analysis and correlation analysis of gonadotropin-releasing hormone (GnRH) and gonadotropin-inhibitory hormone (GnIH) and their receptors (GnRHR and GnIHR) was conducted along with other control neuropeptides. RESULTS Both neuropeptides and their receptors regulating the reproductive neuroendocrine axis in vertebrates exhibited a correlated evolution under a common physiological constraint to regulate the release of gonadotropin. This result supports a coordinated substitution of amino acids in the GnRH and the GnIH neuropeptides along with their receptors in terms of similar evolutionary rates and distances with similar nucleotide composition of genes. CONCLUSION This is the first demonstration of the correlated evolution of two components of an endocrine system regulating the release of gonadotropin which are acting in concert for successful reproduction.
Affiliation(s)
- Basant K Tiwary
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India
|
27
|
Abstract
Proteins do not function in isolation; it is their interactions with one another and also with other molecules (e.g. DNA, RNA) that mediate metabolic and signaling pathways, cellular processes, and organismal systems. Due to their central role in biological function, protein interactions also control the mechanisms leading to healthy and diseased states in organisms. Diseases are often caused by mutations affecting the binding interface or leading to biochemically dysfunctional allosteric changes in proteins. Therefore, protein interaction networks can elucidate the molecular basis of disease, which in turn can inform methods for prevention, diagnosis, and treatment. In this chapter, we will describe the computational approaches to predict and map networks of protein interactions and briefly review the experimental methods to detect protein interactions. We will describe the application of protein interaction networks as a translational approach to the study of human disease and evaluate the challenges faced by these approaches.
Affiliation(s)
- Mileidy W. Gonzalez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Maricel G. Kann
- Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
|
28
|
Swapna LS, Srinivasan N, Robertson DL, Lovell SC. The origins of the evolutionary signal used to predict protein-protein interactions. BMC Evol Biol 2012; 12:238. [PMID: 23217198 PMCID: PMC3537733 DOI: 10.1186/1471-2148-12-238] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/08/2011] [Accepted: 11/17/2012] [Indexed: 12/02/2022] Open
Abstract
Background The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history and so it would be expected that there would be topological similarities in their phylogenetic trees, whether they are interacting or not. For this reason the underlying species tree is often corrected for. Moreover, processes such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis. Results In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence. Conclusions Since interacting proteins do not have tree topologies that are more similar than the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.
|
29
|
Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang PI, Boutz DR, Fong V, Phanse S, Babu M, Craig SA, Hu P, Wan C, Vlasblom J, Dar VUN, Bezginov A, Clark GW, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A. A census of human soluble protein complexes. Cell 2012; 150:1068-81. [PMID: 22939629 DOI: 10.1016/j.cell.2012.08.011] [Citation(s) in RCA: 655] [Impact Index Per Article: 50.4] [Received: 05/26/2012] [Revised: 07/30/2012] [Accepted: 08/10/2012] [Indexed: 12/19/2022]
Abstract
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Affiliation(s)
- Pierre C Havugimana
- Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
|
30
|
Ochoa D, García-Gutiérrez P, Juan D, Valencia A, Pazos F. Incorporating information on predicted solvent accessibility to the co-evolution-based study of protein interactions. MOLECULAR BIOSYSTEMS 2012; 9:70-6. [PMID: 23104128 DOI: 10.1039/c2mb25325a] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.
Affiliation(s)
- David Ochoa
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain
|
31
|
Bezginov A, Clark GW, Charlebois RL, Dar VUN, Tillier ERM. Coevolution reveals a network of human proteins originating with multicellularity. Mol Biol Evol 2012; 30:332-46. [PMID: 22977115 PMCID: PMC3548307 DOI: 10.1093/molbev/mss218] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022] Open
Abstract
Protein interaction networks play central roles in biological systems, from simple metabolic pathways through complex programs permitting the development of organisms. Multicellularity could only have arisen from a careful orchestration of cellular and molecular roles and responsibilities, all properly controlled and regulated. Disease reflects a breakdown of this organismal homeostasis. To better understand the evolution of interactions whose dysfunction may be contributing factors to disease, we derived the human protein coevolution network using our MatrixMatchMaker algorithm and using the Orthologous MAtrix project (OMA) database as a source for protein orthologs from 103 eukaryotic genomes. We annotated the coevolution network using protein–protein interaction data, many functional data sources, and we explored the evolutionary rates and dates of emergence of the proteins in our data set. Strikingly, clustering based only on the topology of the coevolution network partitions it into two subnetworks, one generally representing ancient eukaryotic functions and the other functions more recently acquired during animal evolution. That latter subnetwork is enriched for proteins with roles in cell–cell communication, the control of cell division, and related multicellular functions. Further annotation using data from genetic disease databases and cancer genome sequences strongly implicates these proteins in both ciliopathies and cancer. The enrichment for such disease markers in the animal network suggests a functional link between these coevolving proteins. Genetic validation corroborates the recruitment of ancient cilia in the evolution of multicellularity.
Affiliation(s)
- Alexandr Bezginov
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
32
|
Wang MC, Chen FC, Chen YZ, Huang YT, Chuang TJ. LDGIdb: a database of gene interactions inferred from long-range strong linkage disequilibrium between pairs of SNPs. BMC Res Notes 2012; 5:212. [PMID: 22551073 PMCID: PMC3441865 DOI: 10.1186/1756-0500-5-212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/28/2011] [Accepted: 04/26/2012] [Indexed: 12/22/2022] Open
Abstract
Background Complex human diseases may be associated with many gene interactions. Gene interactions take several different forms and it is difficult to identify all of the interactions that are potentially associated with human diseases. One approach that may fill this knowledge gap is to infer previously unknown gene interactions via identification of non-physical linkages between different mutations (or single nucleotide polymorphisms, SNPs) to avoid hitchhiking effect or lack of recombination. Strong non-physical SNP linkages are considered to be an indication of biological (gene) interactions. These interactions can be physical protein interactions, regulatory interactions, functional compensation/antagonization or many other forms of interactions. Previous studies have shown that mutations in different genes can be linked to the same disorders. Therefore, non-physical SNP linkages, coupled with knowledge of SNP-disease associations may shed more light on the role of gene interactions in human disorders. A user-friendly web resource that integrates information about non-physical SNP linkages, gene annotations, SNP information, and SNP-disease associations may thus be a good reference for biomedical research. Findings Here we extracted the SNPs located within the promoter or exonic regions of protein-coding genes from the HapMap database to construct a database named the Linkage-Disequilibrium-based Gene Interaction database (LDGIdb). The database stores 646,203 potential human gene interactions, which are potential interactions inferred from SNP pairs that are subject to long-range strong linkage disequilibrium (LD), or non-physical linkages. To minimize the possibility of hitchhiking, SNP pairs inferred to be non-physically linked were required to be located in different chromosomes or in different LD blocks of the same chromosomes. 
According to the genomic locations of the involved SNPs (i.e., promoter, untranslated region (UTR) and coding region (CDS)), the SNP linkages inferred were categorized into promoter-promoter, promoter-UTR, promoter-CDS, CDS-CDS, CDS-UTR and UTR-UTR linkages. For the CDS-related linkages, the coding SNPs were further classified into nonsynonymous and synonymous variations, which represent potential gene interactions at the protein and RNA level, respectively. The LDGIdb also incorporates human disease-association databases such as Genome-Wide Association Studies (GWAS) and Online Mendelian Inheritance in Man (OMIM), so that the user can search for potential disease-associated SNP linkages. The inferred SNP linkages are also classified in the context of population stratification to provide a resource for investigating potential population-specific gene interactions. Conclusion The LDGIdb is a user-friendly resource that integrates non-physical SNP linkages and SNP-disease associations for studies of gene interactions in human diseases. With the help of the LDGIdb, it is plausible to infer population-specific SNP linkages for more focused studies, an avenue that is potentially important for pharmacogenetics. Moreover, by referring to disease-association information such as the GWAS data, the LDGIdb may help identify previously uncharacterized disease-associated gene interactions and potentially lead to new discoveries in studies of human diseases. Keywords Gene interaction, SNP, Linkage disequilibrium, Systems biology, Bioinformatics
Affiliation(s)
- Ming-Chih Wang
- Genomics Research Center, Academia Sinica, Taipei, 11529, Taiwan
|
33
|
Raj T, Shulman JM, Keenan BT, Chibnik LB, Evans DA, Bennett DA, Stranger BE, De Jager PL. Alzheimer disease susceptibility loci: evidence for a protein network under natural selection. Am J Hum Genet 2012; 90:720-6. [PMID: 22482808 DOI: 10.1016/j.ajhg.2012.02.022] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 10/18/2011] [Revised: 02/03/2012] [Accepted: 02/22/2012] [Indexed: 11/30/2022] Open
Abstract
Recent genome-wide association studies have identified a number of susceptibility loci for Alzheimer disease (AD). To understand the functional consequences and potential interactions of the associated loci, we explored large-scale data sets interrogating the human genome for evidence of positive natural selection. Our findings provide significant evidence for signatures of recent positive selection acting on several haplotypes carrying AD susceptibility alleles; interestingly, the genes found in these selected haplotypes can be assembled, independently, into a molecular complex via a protein-protein interaction (PPI) network approach. These results suggest a possible coevolution of genes encoding physically-interacting proteins that underlie AD susceptibility and are coexpressed in different tissues. In particular, PICALM, BIN1, CD2AP, and EPHA1 are interconnected through multiple interacting proteins and appear to have coordinated evidence of selection in the same human population, suggesting that they may be involved in the execution of a shared molecular function. This observation may be AD-specific, as the 12 loci associated with Parkinson disease do not demonstrate excess evidence of natural selection. The context for selection is probably unrelated to AD itself; it is likely that these genes interact in another context, such as in immune cells, where we observe cis-regulatory effects at several of the selected AD loci.
Affiliation(s)
- Towfique Raj
- Program in Translational NeuroPsychiatric Genomics, Institute for the Neurosciences Department of Neurology, Brigham and Women's Hospital, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
|
34
|
Hajirasouliha I, Schönhuth A, de Juan D, Valencia A, Sahinalp SC. Mirroring co-evolving trees in the light of their topologies. Bioinformatics 2012; 28:1202-8. [PMID: 22399677 DOI: 10.1093/bioinformatics/bts109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. RESULTS Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. AVAILABILITY A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
Affiliation(s)
- Iman Hajirasouliha
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
|
35
|
Fares MA, Ruiz-González MX, Labrador JP. Protein coadaptation and the design of novel approaches to identify protein-protein interactions. IUBMB Life 2011; 63:264-71. [PMID: 21488148 DOI: 10.1002/iub.455] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
Proteins rarely function in isolation but they form part of complex networks of interactions with other proteins within or among cells. The importance of a particular protein for cell viability is directly dependent upon the number of interactions where it participates and the function it performs: the larger the number of interactions of a protein the greater its functional importance is for the cell. With the advent of genome sequencing and "omics" technologies it became feasible to conduct large-scale searches for protein interacting partners. Unfortunately, the accuracy of such analyses has been underwhelming owing to methodological limitations and to the inherent complexity of protein interactions. In addition to these experimental approaches, many computational methods have been developed to identify protein-protein interactions by assuming that interacting proteins coevolve resulting from the coadaptation dynamics between the amino acids of their interacting faces. We review the main technological advances made in the field of interactomics and discuss the feasibility of computational methods to identify protein-protein interactions based on the estimation of coevolution. As proof-of-concept, we present a classical case study: the interactions of cell surface proteins (receptors) and their ligands. Finally, we take this discussion one step forward to include interactions between organisms and species to understand the generation of biological complexity. Development of technologies for accurate detection of protein-protein interactions may shed light on processes that go from the fine-tuning of pathways and metabolic networks to the emergence of biological complexity.
Affiliation(s)
- Mario A Fares
- Department of Abiotic Stress, Group of Integrative and Systems Biology, Instituto de Biología Molecular y Celular de Plantas (CSIC-Universidad Politécnica de Valencia), Valencia, Spain.
|
36
|
Rodionov A, Bezginov A, Rose J, Tillier ERM. A new, fast algorithm for detecting protein coevolution using maximum compatible cliques. Algorithms Mol Biol 2011; 6:17. [PMID: 21672226 PMCID: PMC3130660 DOI: 10.1186/1748-7188-6-17] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 09/12/2010] [Accepted: 06/14/2011] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. RESULTS In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. CONCLUSIONS MMMvII will thus allow for more extensive and intricate analyses of coevolution. AVAILABILITY An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php.
Affiliation(s)
- Alex Rodionov
- The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Alexandr Bezginov
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Jonathan Rose
- The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Elisabeth RM Tillier
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Ontario Cancer Institute, University Health Network, 101 College Street, Toronto, M5G 1L7, Canada
|
37
|
Wang GZ, Lercher MJ. The effects of network neighbours on protein evolution. PLoS One 2011; 6:e18288. [PMID: 21532755 PMCID: PMC3075247 DOI: 10.1371/journal.pone.0018288] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/07/2010] [Accepted: 03/02/2011] [Indexed: 11/19/2022] Open
Abstract
Interacting proteins may often experience similar selection pressures. Thus, we may expect that neighbouring proteins in biological interaction networks evolve at similar rates. This has been previously shown for protein-protein interaction networks. Similarly, we find correlated rates of evolution of neighbours in networks based on co-expression, metabolism, and synthetic lethal genetic interactions. While the correlations are statistically significant, their magnitude is small, with network effects explaining only between 2% and 7% of the variation. The strongest known predictor of the rate of protein evolution remains expression level. We confirmed the previous observation that similar expression levels of neighbours indeed explain their similar evolution rates in protein-protein networks, and showed that the same is true for metabolic networks. In co-expression and synthetic lethal genetic interaction networks, however, neighbouring genes still show somewhat similar evolutionary rates even after simultaneously controlling for expression level, gene essentiality and gene length. Thus, similar expression levels and related functions (as inferred from co-expression and synthetic lethal interactions) seem to explain correlated evolutionary rates of network neighbours across all currently available types of biological networks.
Affiliation(s)
- Martin J. Lercher
- Institute for Computer Science, Heinrich-Heine-University, Düsseldorf, Germany
|
38
|
Abstract
Bioinformatic methods to predict protein-protein interactions (PPI) via coevolutionary analysis have positioned themselves to compete alongside established in vitro methods, despite a lack of understanding of the underlying molecular mechanisms of the coevolutionary process. Investigating the alignment of coevolutionary predictions of PPI with experimental data can focus the effective scope of prediction and lead to better accuracies. A new rate-based coevolutionary method, MMM, preferentially finds obligate interacting proteins that form complexes, conforming to results from studies based on coimmunoprecipitation coupled with mass spectrometry. Using gold-standard databases as a benchmark for accuracy, MMM surpasses methods based on abundance ratios, suggesting that correlated evolutionary rates may yet be better than coexpression at predicting interacting proteins. At the level of protein domains, coevolution is difficult to detect, even with MMM, except when considering small-scale experimental data involving proteins with multiple domains. Overall, these findings confirm that coevolutionary methods can be confidently used in predicting PPI, either independently or as drivers of coimmunoprecipitation experiments.
|
39
|
Liang Z, Xu M, Teng M, Niu L, Wu J. Coevolution is a short-distance force at the protein interaction level and correlates with the modular organization of protein networks. FEBS Lett 2010; 584:4237-40. [DOI: 10.1016/j.febslet.2010.09.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/19/2010] [Revised: 09/04/2010] [Accepted: 09/08/2010] [Indexed: 11/17/2022]
|