1. Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024]
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with orthology-based pairing.
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
2. Little J, Chikina M, Clark NL. Evolutionary rate covariation is a reliable predictor of co-functional interactions but not necessarily physical interactions. eLife 2024; 12:RP93333. [PMID: 38415754 PMCID: PMC10942632 DOI: 10.7554/elife.93333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Co-functional proteins tend to have rates of evolution that covary over time. This correlation between evolutionary rates can be measured over the branches of a phylogenetic tree through methods such as evolutionary rate covariation (ERC), and then used to construct gene networks by the identification of proteins with functional interactions. The cause of this correlation has been hypothesized to result from both compensatory coevolution at physical interfaces and nonphysical forces such as shared changes in selective pressure. This study explores whether coevolution due to compensatory mutations has a measurable effect on the ERC signal. We examined the difference in ERC signal between physically interacting protein domains within complexes compared to domains of the same proteins that do not physically interact. We found no generalizable relationship between physical interaction and high ERC, although a few complexes ranked physical interactions higher than nonphysical interactions. Therefore, we conclude that coevolution due to physical interaction is weak, but present in the signal captured by ERC, and we hypothesize that the stronger signal instead comes from selective pressures on the protein as a whole and maintenance of the general function.
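The core quantity described in this abstract, a correlation between the evolutionary rates of two proteins measured over the same branches of a tree, can be illustrated with a minimal sketch. It assumes the branch-wise relative rates are already available as plain lists; the rate values and the names `rates_a`, `rates_b`, and `rates_c` are hypothetical, and a plain Pearson correlation stands in for whatever normalization and statistics the published ERC method applies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical relative evolutionary rates of three proteins,
# each measured on the same five branches of a species tree.
rates_a = [0.8, 1.2, 1.0, 1.5, 0.5]
rates_b = [0.7, 1.3, 0.9, 1.6, 0.6]  # speeds up and slows down with protein A
rates_c = [1.4, 0.6, 1.1, 0.5, 1.3]  # tends to do the opposite

erc_ab = pearson(rates_a, rates_b)
erc_ac = pearson(rates_a, rates_c)
print(f"A vs B: {erc_ab:.2f}, A vs C: {erc_ac:.2f}")
```

In this toy setting, a high positive value for a pair would be read as covarying rates; as the abstract notes, such covariation need not reflect a physical interface.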
Affiliation(s)
- Jordan Little
- Department of Human Genetics, University of Utah, Salt Lake City, United States
- Maria Chikina
- Department of Computational Biology, University of Pittsburgh, Pittsburgh, United States
- Nathan L Clark
- Department of Human Genetics, University of Utah, Salt Lake City, United States
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, United States
3. Nithya C, Kiran M, Nagarajaram HA. Hubs and Bottlenecks in Protein-Protein Interaction Networks. Methods Mol Biol 2024; 2719:227-248. [PMID: 37803121 DOI: 10.1007/978-1-0716-3461-5_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
Protein-protein interaction networks (PPINs) represent the physical interactions among proteins in a cell. These interactions are critical in all cellular processes, including signal transduction, metabolic regulation, and gene expression. In PPINs, centrality measures are widely used to identify the most critical nodes. The two most commonly used centrality measures in networks are degree and betweenness centralities. Degree centrality is the number of connections a node has in the network, and betweenness centrality is the measure of the extent to which a node lies on the shortest paths between pairs of other nodes in the network. In PPINs, proteins with high degree and betweenness centrality are referred to as hubs and bottlenecks respectively. Hubs and bottlenecks are topologically and functionally essential proteins that play crucial roles in maintaining the network's structure and function. This article comprehensively reviews essential literature on hubs and bottlenecks, including their properties and functions.
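The two centrality measures defined in this abstract can be made concrete with a short sketch. It assumes an undirected, unweighted network given as an edge list; the protein names `A` to `E` are hypothetical, degree is left unnormalised, and betweenness is computed with Brandes' algorithm, which is one standard way to count shortest paths and is not taken from the chapter itself.

```python
from collections import deque


def degree_centrality(adj):
    """Unnormalised degree: the number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness of an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, visiting the farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: A has the most partners, D links E to the rest.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

deg = degree_centrality(adj)
btw = betweenness_centrality(adj)
print(deg)
print(btw)
```

In this toy graph, `D` has a modest degree but lies on every shortest path to `E`, which is the kind of node the abstract calls a bottleneck.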
Affiliation(s)
- Chandramohan Nithya
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Manjari Kiran
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
4. Doran BA, Chen RY, Giba H, Behera V, Barat B, Sundararajan A, Lin H, Sidebottom A, Pamer EG, Raman AS. An evolution-based framework for describing human gut bacteria. bioRxiv 2023:2023.12.04.569969. [PMID: 38105970 PMCID: PMC10723311 DOI: 10.1101/2023.12.04.569969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The human gut microbiome contains many bacterial strains of the same species ('strain-level variants'). Describing strains in a biologically meaningful way rather than purely taxonomically is an important goal but challenging due to the genetic complexity of strain-level variation. Here, we measured patterns of co-evolution across >7,000 strains spanning the bacterial tree-of-life. Using these patterns as a prior for studying hundreds of gut commensal strains that we isolated, sequenced, and metabolically profiled revealed widespread structure beneath the phylogenetic level of species. Defining strains by their co-evolutionary signatures enabled predicting their metabolic phenotypes and engineering consortia from strain genome content alone. Our findings demonstrate a biologically relevant organization to strain-level variation and motivate a new schema for describing bacterial strains based on their evolutionary history.
Affiliation(s)
- Benjamin A. Doran
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, 60637
- Robert Y. Chen
- Department of Psychiatry, University of Washington, Seattle, WA, 98195
- Hannah Giba
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Department of Pathology, University of Chicago, Chicago, IL, 60637
- Vivek Behera
- Department of Medicine, University of Chicago, Chicago, IL, 60637
- Bidisha Barat
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Huaiying Lin
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Ashley Sidebottom
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Eric G. Pamer
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Department of Medicine, University of Chicago, Chicago, IL, 60637
- Arjun S. Raman
- Duchossois Family Institute, University of Chicago, Chicago, IL, 60637
- Department of Pathology, University of Chicago, Chicago, IL, 60637
- Center for the Physics of Evolving Systems, University of Chicago, Chicago, IL, 60637
5. Santos TG, Silva KS, Lima RM, Silva LC, Pereira M. State of the art in protein-protein interactions within the fungi kingdom. Future Microbiol 2023; 18:1119-1131. [PMID: 37540069 DOI: 10.2217/fmb-2022-0274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Proteins rarely exert their function by themselves. Protein-protein interactions (PPIs) regulate virtually every biological process that takes place in a cell. Such interactions are targets for new therapeutic agents against all sorts of diseases, through the screening and design of a variety of inhibitors. Here we discuss several aspects of PPIs that contribute to prediction of protein function and drug discovery. As the high-throughput techniques continue to release biological data, targets for fungal therapeutics that rely on PPIs are being proposed worldwide. Computational approaches have reduced the time taken to develop new therapeutic approaches. The near future brings the possibility of developing new PPI and interaction network inhibitors and a revolution in the way we treat fungal diseases.
Affiliation(s)
- Thaynara G Santos
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
- Kleber Sf Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
- Raisa M Lima
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
- Lívia C Silva
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
- Maristela Pereira
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, 74 000, Brazil
6. Wang S, Wu R, Lu J, Jiang Y, Huang T, Cai YD. Protein-protein interaction networks as miners of biological discovery. Proteomics 2022; 22:e2100190. [PMID: 35567424 DOI: 10.1002/pmic.202100190] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/28/2021] [Revised: 03/28/2022] [Accepted: 04/29/2022] [Indexed: 11/12/2022]
Abstract
Protein-protein interactions (PPIs) form the basis of a myriad of biological pathways and mechanisms, such as the formation of protein complexes or the components of signaling cascades. Here, we reviewed experimental methods for identifying PPI pairs, including yeast two-hybrid, mass spectrometry, co-localization, and co-immunoprecipitation. Furthermore, a range of computational methods leveraging biochemical properties, evolutionary history, protein structures and more have enabled identification of additional PPIs. Given the wealth of known PPIs, we reviewed important network methods to construct and analyze networks of PPIs. These methods aid biological discovery through identifying hub genes and dynamic changes in the network, and have been thoroughly applied in various fields of biological research. Lastly, we discussed the challenges and future direction of research utilizing the power of PPI networks.
Affiliation(s)
- Steven Wang
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Runxin Wu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jiaqi Lu
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, USA
- Yijia Jiang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Tao Huang
- Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
7.
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
8. Li S, Wu S, Wang L, Li F, Jiang H, Bai F. Recent advances in predicting protein-protein interactions with the aid of artificial intelligence algorithms. Curr Opin Struct Biol 2022; 73:102344. [PMID: 35219216 DOI: 10.1016/j.sbi.2022.102344] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/20/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 12/15/2022]
Abstract
Protein-protein interactions (PPIs) are essential in the regulation of biological functions and cell events; understanding PPIs has therefore become a key issue in understanding molecular mechanisms and in drug design. Here we highlight the major developments in computational methods for predicting PPIs with artificial intelligence algorithms. The first part introduces the sources of experimental PPI data. The second part is devoted to PPI prediction methods based on sequence information. The third part covers representative methods using structural information as the input feature. The last part covers methods that combine different types of features. For each part, the state-of-the-art computational PPI prediction methods are reviewed comprehensively. Finally, we discuss the current limitations in this area and future directions for next-generation algorithms.
Affiliation(s)
- Shiwei Li
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Sanan Wu
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Lin Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Fenglei Li
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; School of Information Science and Technology, ShanghaiTech University, Shanghai, China
- Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Pudong, Shanghai, 201203, China
- Fang Bai
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, China; School of Information Science and Technology, ShanghaiTech University, Shanghai, China
9. OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
10. Davies JS, Currie MJ, Wright JD, Newton-Vesty MC, North RA, Mace PD, Allison JR, Dobson RCJ. Selective Nutrient Transport in Bacteria: Multicomponent Transporter Systems Reign Supreme. Front Mol Biosci 2021; 8:699222. [PMID: 34268334 PMCID: PMC8276074 DOI: 10.3389/fmolb.2021.699222] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/23/2021] [Accepted: 06/02/2021] [Indexed: 11/24/2022]
Abstract
Multicomponent transporters are used by bacteria to transport a wide range of nutrients. These systems use a substrate-binding protein to bind the nutrient with high affinity and then deliver it to a membrane-bound transporter for uptake. Nutrient uptake pathways are linked to the colonisation potential and pathogenicity of bacteria in humans and may be candidates for antimicrobial targeting. Here we review current research into bacterial multicomponent transport systems, with an emphasis on the interaction at the membrane, as well as new perspectives on the role of lipids and higher oligomers in these complex systems.
Affiliation(s)
- James S Davies
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Michael J Currie
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Joshua D Wright
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Michael C Newton-Vesty
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Rachel A North
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Peter D Mace
- Biochemistry Department, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Jane R Allison
- Maurice Wilkins Centre for Molecular Biodiscovery and School of Biological Sciences, Digital Life Institute, University of Auckland, Auckland, New Zealand
- Renwick C J Dobson
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand; Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC, Australia
11. Saputra E, Kowalczyk A, Cusick L, Clark N, Chikina M. Phylogenetic Permulations: A Statistically Rigorous Approach to Measure Confidence in Associations in a Phylogenetic Context. Mol Biol Evol 2021; 38:3004-3021. [PMID: 33739420 PMCID: PMC8233500 DOI: 10.1093/molbev/msab068] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/21/2023]
Abstract
Many evolutionary comparative methods seek to identify associations between phenotypic traits or between traits and genotypes, often with the goal of inferring potential functional relationships between them. Comparative genomics methods aimed at this goal measure the association between evolutionary changes at the genetic level with traits evolving convergently across phylogenetic lineages. However, these methods have complex statistical behaviors that are influenced by nontrivial and oftentimes unknown confounding factors. Consequently, using standard statistical analyses in interpreting the outputs of these methods leads to potentially inaccurate conclusions. Here, we introduce phylogenetic permulations, a novel statistical strategy that combines phylogenetic simulations and permutations to calculate accurate, unbiased P values from phylogenetic methods. Permulations construct the null expectation for P values from a given phylogenetic method by empirically generating null phenotypes. Subsequently, empirical P values that capture the true statistical confidence given the correlation structure in the data are directly calculated based on the empirical null expectation. We examine the performance of permulation methods by analyzing both binary and continuous phenotypes, including marine, subterranean, and long-lived large-bodied mammal phenotypes. Our results reveal that permulations improve the statistical power of phylogenetic analyses and correctly calibrate statements of confidence in rejecting complex null distributions while maintaining or improving the enrichment of known functions related to the phenotype. We also find that permulations refine pathway enrichment analyses by correcting for nonindependence in gene ranks. Our results demonstrate that permulations are a powerful tool for improving statistical confidence in the conclusions of phylogenetic analysis when the parametric null is unknown.
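The step of computing an empirical P value from an empirically generated null can be sketched as follows. This is a simplified illustration under stated assumptions: the null phenotypes here come from plain label shuffling, whereas the permulation strategy in the abstract combines permutations with phylogenetic simulations; the function names, the rate values, and the binary phenotype labels are all hypothetical.

```python
import random


def mean_difference(values, labels):
    """Absolute difference in mean value between phenotype groups 1 and 0."""
    fg = [v for v, lab in zip(values, labels) if lab == 1]
    bg = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(fg) / len(fg) - sum(bg) / len(bg))


def shuffled_null(stat_fn, values, labels, n_null=999, seed=0):
    """Null statistics from shuffled labels (not phylogeny-aware)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_stats = []
    for _ in range(n_null):
        rng.shuffle(shuffled)
        null_stats.append(stat_fn(values, shuffled))
    return null_stats


def empirical_p_value(observed, null_stats):
    """One-sided empirical P value; the +1 terms keep it above zero."""
    at_least = sum(1 for s in null_stats if s >= observed)
    return (at_least + 1) / (len(null_stats) + 1)


# Hypothetical evolutionary rates of one gene in eight species,
# and a binary phenotype (1 = trait present) for the same species.
rates = [1.9, 2.1, 2.0, 0.9, 1.0, 1.1, 1.0, 0.8]
phenotype = [1, 1, 1, 0, 0, 0, 0, 0]

observed = mean_difference(rates, phenotype)
null_stats = shuffled_null(mean_difference, rates, phenotype)
p_value = empirical_p_value(observed, null_stats)
print(f"observed = {observed:.2f}, empirical P = {p_value:.3f}")
```

Replacing `shuffled_null` with a generator of phylogeny-aware null phenotypes would bring the sketch closer to the approach the abstract describes, while `empirical_p_value` would stay the same.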
Affiliation(s)
- Elysia Saputra
- Joint Carnegie Mellon University - University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Amanda Kowalczyk
- Joint Carnegie Mellon University - University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Luisa Cusick
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Nathan Clark
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
12. Bloch I, Sherill-Rofe D, Stupp D, Unterman I, Beer H, Sharon E, Tabach Y. Optimization of co-evolution analysis through phylogenetic profiling reveals pathway-specific signals. Bioinformatics 2021; 36:4116-4125. [PMID: 32353123 DOI: 10.1093/bioinformatics/btaa281] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/04/2019] [Revised: 04/17/2020] [Accepted: 04/23/2020] [Indexed: 12/11/2022]
Abstract
Summary: The exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context.
Availability and implementation: Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs.
Contact: yuvaltab@ekmd.huji.ac.il.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Dana Sherill-Rofe
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Doron Stupp
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Irene Unterman
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Hodaya Beer
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Elad Sharon
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
13. Rossi A, Treu L, Toppo S, Zschach H, Campanaro S, Dutilh BE. Evolutionary Study of the Crassphage Virus at Gene Level. Viruses 2020; 12:v12091035. [PMID: 32957679 PMCID: PMC7551546 DOI: 10.3390/v12091035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Revised: 09/03/2020] [Accepted: 09/14/2020] [Indexed: 12/15/2022]
Abstract
crAss-like viruses are a putative family of bacteriophages recently discovered. The eponym of the clade, crAssphage, is an enteric bacteriophage estimated to be present in at least half of the human population and it constitutes up to 90% of the sequences in some human fecal viral metagenomic datasets. We focused on the evolutionary dynamics of the genes encoded on the crAssphage genome. By investigating the conservation of the genes, a consistent variation in the evolutionary rates across the different functional groups was found. Gene duplications in crAss-like genomes were detected. By exploring the differences among the functional categories of the genes, we confirmed that the genes encoding capsid proteins were the most ubiquitous, despite their overall low sequence conservation. It was possible to identify a core of proteins whose evolutionary trees strongly correlate with each other, suggesting their genetic interaction. This group includes the capsid proteins, which are thus established as extremely suitable for rebuilding the phylogenetic tree of this viral clade. A negative correlation between the ubiquity and the conservation of viral protein sequences was shown. Together, this study provides an in-depth picture of the evolution of different genes in crAss-like viruses.
Affiliation(s)
- Alessandro Rossi
- Department of Biology, University of Padova, 35131 Padova, Italy
- Laura Treu
- Department of Biology, University of Padova, 35131 Padova, Italy
- Correspondence: Tel.: +39-049-827-6165
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, 35131 Padova, Italy
- Henrike Zschach
- Department of Biology, University of Copenhagen, 1017 Copenhagen, Denmark
- Stefano Campanaro
- Department of Biology, University of Padova, 35131 Padova, Italy
- CRIBI Biotechnology Center, University of Padua, 35131 Padova, Italy
- Bas E. Dutilh
- Institute of Biodynamics and Biocomplexity, University of Utrecht, 3508 Utrecht, The Netherlands
14. Gueudré T, Baldassi C, Pagnani A, Weigt M. Predicting Interacting Protein Pairs by Coevolutionary Paralog Matching. Methods Mol Biol 2020; 2074:57-65. [PMID: 31583630 DOI: 10.1007/978-1-4939-9873-9_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Even if we know that two families of homologous proteins interact, we do not necessarily know which specific proteins interact inside each species. The reason is that most families contain paralogs, i.e., more than one homologous sequence per species. We have developed a tool to predict interacting paralogs between the two protein families, which is based on the idea of inter-protein coevolution: our algorithm matches those members of the two protein families that belong to the same species and collectively maximize the detectable coevolutionary signal. It is applicable even in cases where simpler methods, based e.g. on genomic co-localization of genes coding for interacting proteins or on orthology, fail. In this methods paper, we present an efficient implementation of this idea based on freely available software.
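The matching idea in this abstract, pairing paralogs within one species so that the total coevolutionary signal is maximal, can be illustrated with a toy sketch. It assumes a precomputed square matrix of pairwise scores for a single species; the scores and the function name `best_pairing` are hypothetical, and the brute-force search over permutations is chosen only for clarity. It is not the authors' algorithm and does not scale beyond a handful of paralogs.

```python
from itertools import permutations


def best_pairing(scores):
    """Exhaustively find the one-to-one pairing with the highest total score.

    scores[i][j] is a hypothetical coevolution score between paralog i of
    family A and paralog j of family B, both taken from the same species.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total


# Hypothetical scores for three paralogs per family in one species.
scores = [
    [0.9, 0.1, 0.2],
    [0.2, 0.1, 0.8],
    [0.3, 0.7, 0.1],
]

pairs, total = best_pairing(scores)
print(pairs, round(total, 2))
```

For larger families, an assignment solver or an iterative scheme would replace the exhaustive loop; the objective being maximised stays the same.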
Affiliation(s)
- Carlo Baldassi
- Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy
- INFN, Sezione di Torino, Torino, Italy
- Andrea Pagnani
- Italian Institute for Genomic Medicine, Turin, Italy
- INFN, Sezione di Torino, Torino, Italy
- Dipartimento di Scienza Applicata e Tecnologia, Politecnico di Torino, Torino, Italy
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, Paris, France
15. Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022]
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
Affiliation(s)
- Giancarlo Croce
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Matteo Figliuzzi
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
|
16
|
Hillier C, Pardo M, Yu L, Bushell E, Sanderson T, Metcalf T, Herd C, Anar B, Rayner JC, Billker O, Choudhary JS. Landscape of the Plasmodium Interactome Reveals Both Conserved and Species-Specific Functionality. Cell Rep 2019; 28:1635-1647.e5. [PMID: 31390575 PMCID: PMC6693557 DOI: 10.1016/j.celrep.2019.07.019] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 12/13/2018] [Revised: 05/28/2019] [Accepted: 07/08/2019] [Indexed: 11/16/2022] Open
Abstract
Malaria represents a major global health issue, and the identification of new intervention targets remains an urgent priority. This search is hampered by more than one-third of the genes of malaria-causing Plasmodium parasites being uncharacterized. We report a large-scale protein interaction network in Plasmodium schizonts, generated by combining blue native-polyacrylamide electrophoresis with quantitative mass spectrometry and machine learning. This integrative approach, spanning 3 species, identifies >20,000 putative protein interactions, organized into 600 protein clusters. We validate selected interactions, assigning functions in chromatin regulation to previously unannotated proteins and suggesting a role for an EELM2 domain-containing protein and a putative microrchidia protein as mechanistic links between AP2-domain transcription factors and epigenetic regulation. Our interactome represents a high-confidence map of the native organization of core cellular processes in Plasmodium parasites. The network reveals putative functions for uncharacterized proteins, provides mechanistic and structural insight, and uncovers potential alternative therapeutic targets.
Affiliation(s)
- Charles Hillier
  - Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Mercedes Pardo
  - Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK
- Lu Yu
  - Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK
- Ellen Bushell
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden, Umeå University, 901 87 Umeå, Sweden
- Theo Sanderson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Tom Metcalf
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Colin Herd
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Burcu Anar
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Julian C Rayner
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Oliver Billker
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden, Umeå University, 901 87 Umeå, Sweden
- Jyoti S Choudhary
  - Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK
|
17
|
Ding Z, Kihara D. Computational identification of protein-protein interactions in model plant proteomes. Sci Rep 2019; 9:8740. [PMID: 31217453 PMCID: PMC6584649 DOI: 10.1038/s41598-019-45072-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/02/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms were determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and not easy to perform on new target organisms. Large-scale PPI data is particularly sparse in plant organisms. Here, we developed a computational approach for detecting PPIs, trained and tested on known PPIs of Arabidopsis thaliana and applied to three plants, Arabidopsis thaliana, Glycine max (soybean), and Zea mays (maize), to discover new PPIs on a genome scale. Our method considers a variety of features including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first work where a PPI prediction method was developed for and applied on benchmark datasets of Arabidopsis. The method showed a high prediction accuracy of over 90% and very high precision of close to 1.0. We predicted 50,220 PPIs in Arabidopsis thaliana, 13,175,414 PPIs in corn, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence and discussed. Predicted PPIs in the three plant genomes are made available for future reference.
Affiliation(s)
- Ziyun Ding
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Pediatrics, University of Cincinnati, Cincinnati, OH, 45229, USA
|
18
|
Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. Curr Protoc Protein Sci 2018; 93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/24/2022]
Abstract
Understanding protein-protein interactions (PPIs) in a cell is essential for learning protein functions, pathways, and mechanism of diseases. PPIs are also important targets for developing drugs. Experimental methods, both small-scale and large-scale, have identified PPIs in several model organisms. However, results cover only a part of PPIs of organisms; moreover, there are many organisms whose PPIs have not yet been investigated. To complement experimental methods, many computational methods have been developed that predict PPIs from various characteristics of proteins. Here we provide an overview of literature reports to classify computational PPI prediction methods that consider different features of proteins, including protein sequence, genomes, protein structure, function, PPI network topology, and those which integrate multiple methods. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ziyun Ding
  - Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Daisuke Kihara
  - Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA
  - Corresponding author: DK; Phone: 1-765-496-2284
|
19
|
Dos Santos Vasconcelos CR, de Lima Campos T, Rezende AM. Building protein-protein interaction networks for Leishmania species through protein structural information. BMC Bioinformatics 2018; 19:85. [PMID: 29510668 PMCID: PMC5840830 DOI: 10.1186/s12859-018-2105-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/10/2017] [Accepted: 03/01/2018] [Indexed: 12/21/2022] Open
Abstract
Background: Systematic analysis of a parasite interactome is a key approach to understanding different biological processes. It makes it possible to elucidate disease mechanisms, to predict protein functions, and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause leishmaniasis, a neglected disease distributed worldwide, with limited treatment options using currently available drugs. Results: The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated with protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. Conclusions: The present study demonstrates the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols; in addition, it made available protein structures and interactions not previously reported.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2105-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Crhisllane Rafaele Dos Santos Vasconcelos
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Genetics Department of Universidade Federal de Pernambuco, Recife, PE, Brazil
- Túlio de Lima Campos
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Bioinformatics Platform of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
- Antonio Mauro Rezende
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Bioinformatics Platform of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Genetics Department of Universidade Federal de Pernambuco, Recife, PE, Brazil
|
20
|
Malik S, Sharma D, Khatri SK. Reconstructing phylogenetic tree using a protein-protein interaction technique. IET Nanobiotechnol 2017; 11:1005-1016. [PMID: 29155401 DOI: 10.1049/iet-nbt.2016.0177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022] Open
Abstract
In this study, a novel substitution method for finding potential protein-protein interactions (PPIs) is discussed. This newly designed method for analyzing PPIs also aids in the comparison of evolutionary distances. The method handles various data sets and additionally performs a measurable assessment to determine PPIs. PPIs are biologically relevant and aid in building a better conceptual framework for phylogenetic profiling. The newly designed framework makes it possible to relate the topological properties of the system to the evolutionary behavior of datasets. First, this study found that the most conserved protein motifs exist at the roots of the system, whereas newer motifs with mutations have a tendency to dwell on the branches. In-depth functional analysis revealed that the most conserved motifs have high specificity for improved structural procedures and pathway engagements, which may help identify their formative parts in cells. In conclusion, this study demonstrates several important aspects for future studies focused on enhancing phylogenetic profiling systems. Such strategies can also be used to develop new biological insights, which will further lead to an understanding of disease mechanisms.
Affiliation(s)
- Shamita Malik
  - Amity School of Engineering and Technology, Amity University, Uttar Pradesh, India
- Dolly Sharma
  - Computer Science and Engineering Department, Shiv Nadar University, Uttar Pradesh, India
- Sunil Kumar Khatri
  - Amity Institute of Information Technology, Amity University, Uttar Pradesh, India
|
21
|
Frenkel-Morgenstern M, Gorohovski A, Tagore S, Sekar V, Vazquez M, Valencia A. ChiPPI: a novel method for mapping chimeric protein-protein interactions uncovers selection principles of protein fusion events in cancer. Nucleic Acids Res 2017; 45:7094-7105. [PMID: 28549153 PMCID: PMC5499553 DOI: 10.1093/nar/gkx423] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/19/2016] [Accepted: 05/07/2017] [Indexed: 12/20/2022] Open
Abstract
Fusion proteins, comprising peptides deriving from the translation of two parental genes, are produced in cancer by chromosomal aberrations. The expressed fusion protein incorporates domains of both parental proteins. Using a methodology that treats discrete protein domains as binding sites for specific domains of interacting proteins, we have cataloged the protein interaction networks for 11 528 cancer fusions (ChiTaRS-3.1). Here, we present our novel method, chimeric protein–protein interactions (ChiPPI) that uses the domain–domain co-occurrence scores in order to identify preserved interactors of chimeric proteins. Mapping the influence of fusion proteins on cell metabolism and pathways reveals that ChiPPI networks often lose tumor suppressor proteins and gain oncoproteins. Furthermore, fusions often induce novel connections between non-interactors skewing interaction networks and signaling pathways. We compared fusion protein PPI networks in leukemia/lymphoma, sarcoma and solid tumors finding distinct enrichment patterns for each disease type. While certain pathways are enriched in all three diseases (Wnt, Notch and TGF β), there are distinct patterns for leukemia (EGFR signaling, DNA replication and CCKR signaling), for sarcoma (p53 pathway and CCKR signaling) and solid tumors (FGFR and EGFR signaling). Thus, the ChiPPI method represents a comprehensive tool for studying the anomaly of skewed cellular networks produced by fusion proteins in cancer.
Affiliation(s)
- Somnath Tagore
  - Faculty of Medicine, Bar-Ilan-University, Henrietta Szold 8, Safed 1311502, Israel
- Vaishnovi Sekar
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
- Miguel Vazquez
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
- Alfonso Valencia
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), M.F.Almagro 3, 28029 Madrid, Spain
|
22
|
Sharma A, Wai CM, Ming R, Yu Q. Diurnal Cycling Transcription Factors of Pineapple Revealed by Genome-Wide Annotation and Global Transcriptomic Analysis. Genome Biol Evol 2017; 9:2170-2190. [PMID: 28922793 PMCID: PMC5737478 DOI: 10.1093/gbe/evx161] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Accepted: 08/22/2017] [Indexed: 12/22/2022] Open
Abstract
Circadian clock provides fitness advantage by coordinating internal metabolic and physiological processes to external cyclic environments. Core clock components exhibit daily rhythmic changes in gene expression, and the majority of them are transcription factors (TFs) and transcription coregulators (TCs). We annotated 1,398 TFs from 67 TF families and 80 TCs from 20 TC families in pineapple, and analyzed their tissue-specific and diurnal expression patterns. Approximately 42% of TFs and 45% of TCs displayed diel rhythmic expression, including 177 TF/TCs cycling only in the nonphotosynthetic leaf tissue, 247 cycling only in the photosynthetic leaf tissue, and 201 cycling in both. We identified 68 TF/TCs whose cycling expression was tightly coupled between the photosynthetic and nonphotosynthetic leaf tissues. These TF/TCs likely coordinate key biological processes in pineapple as we demonstrated that this group is enriched in homologous genes that form the core circadian clock in Arabidopsis and includes a STOP1 homolog. Two lines of evidence support the important role of the STOP1 homolog in regulating CAM photosynthesis in pineapple. First, STOP1 responds to acidic pH and regulates a malate channel in multiple plant species. Second, the cycling expression pattern of the pineapple STOP1 and the diurnal pattern of malate accumulation in pineapple leaf are correlated. We further examined duplicate-gene retention and loss in major known circadian genes and refined their evolutionary relationships between pineapple and other plants. Significant variations in duplicate-gene retention and loss were observed for most clock genes in both monocots and dicots.
Affiliation(s)
- Anupma Sharma
  - Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas
- Ching Man Wai
  - Department of Plant Biology, University of Illinois at Urbana-Champaign
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- Ray Ming
  - Department of Plant Biology, University of Illinois at Urbana-Champaign
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- Qingyi Yu
  - Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas
  - Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
  - Department of Plant Pathology and Microbiology, Texas A&M University
|
23
|
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrom Rev 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Kevin Titeca
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Sven Eyckerman
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Jan Tavernier
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Lennart Martens
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Dirk Valkenborg
  - Flemish Institute for Technological Research (VITO), Mol, Belgium
  - IBioStat, Hasselt University, Hasselt, Belgium
  - CFP-CeProMa, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
24
|
Vicens A, Andrade‐López K, Cortez D, Gutiérrez RM, Treviño CL. Premammalian origin of the sperm-specific Slo3 channel. FEBS Open Bio 2017; 7:382-390. [PMID: 28286733 PMCID: PMC5337896 DOI: 10.1002/2211-5463.12186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/07/2016] [Revised: 12/01/2016] [Accepted: 12/16/2016] [Indexed: 01/05/2023] Open
Abstract
Slo3 is a sperm-specific potassium (K+) channel essential for male fertility. Slo3 channels have so far been considered to be specific to mammals. Through exploratory genomics, we identified the Slo3 gene in the genome of terrestrial (birds and reptiles) and aquatic (fish) vertebrates. In the case of fish, Slo3 has undergone several episodes of gene loss. Transcriptomic analysis showed that vertebrate Slo3 transcript orthologues are predominantly expressed in testis, in concordance with the mammalian Slo3. We conclude that the Slo3 gene arose during the radiation of early vertebrates, much earlier than previously thought. Our findings add to the growing evidence indicating that the phylogenetic profiles of sperm-specific channels are intermittent throughout metazoan evolution, which probably reflects the adaptation of sperm to different ionic milieus and fertilization environments.
Affiliation(s)
- Alberto Vicens
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Karla Andrade‐López
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Diego Cortez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Rosa María Gutiérrez
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Claudia L. Treviño
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
|
25
|
Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci U S A 2016; 113:12186-12191. [PMID: 27729520 DOI: 10.1073/pnas.1607570113] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Indexed: 11/18/2022] Open
Abstract
Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue-residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.
|
26
|
Ding Y, Tang J, Guo F. Predicting protein-protein interactions via multivariate mutual information of protein sequences. BMC Bioinformatics 2016; 17:398. [PMID: 27677692 PMCID: PMC5039908 DOI: 10.1186/s12859-016-1253-9] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 01/21/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, the application of most existing methods is limited since they are difficult to compute and rely on a large number of homologous proteins and interaction marks of protein partners. In this paper, we propose a novel sequence-based approach with multivariate mutual information (MMI) of protein feature representation, for predicting PPIs via Random Forest (RF). Methods: Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven function groups and transform protein sequences into encoding sequences. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto Autocorrelation, to extract features from protein sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interaction pairs from non-interaction pairs. Results: To evaluate the performance of our new method, we conduct several comprehensive tests for predicting PPIs. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. Our method is applied to the S. cerevisiae PPI dataset, and achieves 95.01 % accuracy and 92.67 % sensitivity, respectively. For the H. pylori PPI dataset, our method achieves 87.59 % accuracy and 86.81 % sensitivity, respectively. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network. Conclusions: Compared to the Conjoint Triad method, accuracies of our method are increased by 6.25, 2.06, and 18.75 %, respectively. Our proposed method is a useful tool for future proteomics studies.
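The first step of the Methods, mapping each residue to one of seven groups, can be sketched as follows. The abstract does not list the groups, so this sketch assumes the seven-class grouping commonly used with the Conjoint Triad method. The names `GROUPS`, `encode` and `group_frequencies` are illustrative, and the frequency vector is only a simple stand-in for the MMI and autocorrelation features, which are not reproduced here.

```python
# Assumed seven-class grouping (as commonly used with the Conjoint Triad
# method); the abstract itself does not specify the groups.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {
    aa: str(idx) for idx, members in enumerate(GROUPS, start=1) for aa in members
}


def encode(seq):
    """Map each residue to its group label '1'..'7'; unknown residues become '0'."""
    return "".join(AA_TO_GROUP.get(aa, "0") for aa in seq.upper())


def group_frequencies(seq):
    """Relative frequency of each of the seven groups, ignoring unknown residues."""
    counts = {str(k): 0 for k in range(1, 8)}
    for label in encode(seq):
        if label in counts:
            counts[label] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


encoded = encode("MKVLC")
freqs = group_frequencies("MKVLC")
```

Reducing the alphabet from twenty letters to seven keeps the feature space small enough for fixed-length vectors, which is what allows a classifier such as a Random Forest to be trained on pairs of sequences of different lengths.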
Affiliation(s)
- Yijie Ding
  - School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China
- Jijun Tang
  - School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China
|
27
|
Identification of Protein-Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information. Int J Mol Sci 2016; 17:ijms17101623. [PMID: 27669239 PMCID: PMC5085656 DOI: 10.3390/ijms17101623] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 07/15/2016] [Revised: 09/07/2016] [Accepted: 09/07/2016] [Indexed: 12/20/2022] Open
Abstract
Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for identifying PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations to overcome: they require a large number of homologous proteins or literature data. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the matrix of Amino Acid Contact (AAC), based on the statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to distinguish interaction pairs from non-interaction pairs. Our method is applied to several PPI datasets to evaluate its performance. On the S. cerevisiae dataset, our method achieves 94.83% accuracy and 92.40% sensitivity. Compared with existing methods, the accuracy of our method is increased by 0.11 percentage points. On the H. pylori dataset, our method achieves 89.06% accuracy and 88.15% sensitivity; the accuracy of our method is increased by 0.76%. On the Human PPI dataset, our method achieves 97.60% accuracy and 96.37% sensitivity, and the accuracy of our method is increased by 1.30%. In addition, we test our method on a very important PPI network, and it achieves 92.71% accuracy. In the Wnt-related network, the accuracy of our method is increased by 16.67%. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53.
28
Vamparys L, Laurent B, Carbone A, Sacquin-Mora S. Great interactions: How binding incorrect partners can teach us about protein recognition and function. Proteins 2016; 84:1408-21. [PMID: 27287388 PMCID: PMC5516155 DOI: 10.1002/prot.25086] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/29/2016] [Revised: 06/01/2016] [Accepted: 06/02/2016] [Indexed: 12/29/2022]
Abstract
Protein–protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Among other methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross‐docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross‐docking predictions using the area under the specificity‐sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface shows that their distribution and growth depend on the protein's functional group. Finally, in several cases, the binding‐site predictions resulting from the cross‐docking simulations lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark.
Affiliation(s)
- Lydie Vamparys
- Laboratoire De Biochimie Théorique, CNRS UPR 9080, Institut De Biologie Physico-Chimique, 13 Rue Pierre Et Marie Curie, Paris, 75005, France
- Benoist Laurent
- Laboratoire De Biochimie Théorique, CNRS UPR 9080, Institut De Biologie Physico-Chimique, 13 Rue Pierre Et Marie Curie, Paris, 75005, France
- Alessandra Carbone
- Sorbonne Universités, UPMC Univ-Paris 6, CNRS UMR7238, Laboratoire De Biologie Computationnelle Et Quantitative, 15 Rue De L'Ecole De Médecine, Paris, 75006, France; Institut Universitaire De France, Paris, 75005, France
- Sophie Sacquin-Mora
- Laboratoire De Biochimie Théorique, CNRS UPR 9080, Institut De Biologie Physico-Chimique, 13 Rue Pierre Et Marie Curie, Paris, 75005, France
29
Reconstruction and Application of Protein-Protein Interaction Network. Int J Mol Sci 2016; 17:ijms17060907. [PMID: 27338356 PMCID: PMC4926441 DOI: 10.3390/ijms17060907] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 04/02/2016] [Revised: 05/31/2016] [Accepted: 06/03/2016] [Indexed: 11/17/2022]
Abstract
The protein-protein interaction network (PIN) is a useful tool for systematic investigation of the complex biological activities in the cell. With increasing interest in proteome-wide interaction networks, PINs have been reconstructed for many species, including viruses, bacteria, plants, animals, and humans. With the development of biological techniques, the reconstruction methods for PINs have been further improved, and PINs have gradually entered many fields of biological research. In this work we systematically review the development of PINs over the past fifteen years, with respect to their reconstruction and their application to function annotation, subsystem investigation, evolution analysis, hub protein analysis, and regulation mechanism analysis. Due to the significant role of PINs in the in-depth exploration of biological process mechanisms, they will be preferred by more and more researchers for the systematic study of protein systems in various kinds of organisms.
30
Jiménez-Sánchez A. Coevolution of RAC Small GTPases and their Regulators GEF Proteins. Evol Bioinform Online 2016; 12:121-31. [PMID: 27226705 PMCID: PMC4872645 DOI: 10.4137/ebo.s38031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/26/2015] [Revised: 03/31/2016] [Accepted: 04/03/2016] [Indexed: 01/16/2023]
Abstract
RAC proteins are small GTPases involved in important cellular processes in eukaryotes, and their deregulation may contribute to cancer. Activation of RAC proteins is regulated by the DOCK and DBL protein families of guanine nucleotide exchange factors (GEFs). Although DOCK and DBL proteins act as GEFs on RAC proteins, DOCK and DBL family members are evolutionarily unrelated. To understand how the DBL and DOCK families perform the same function on RAC proteins despite their unrelated primary structure, phylogenetic analyses of the RAC, DBL, and DOCK families were performed, and interaction patterns that may suggest a coevolutionary process were sought. Interestingly, while RAC and DOCK proteins are very well conserved in humans and among eukaryotes, DBL proteins are highly divergent. Moreover, correlation analyses of the phylogenetic distances of RAC and GEF proteins and covariation analyses between residues in the interacting domains showed significant coevolution rates for both RAC–DOCK and RAC–DBL interactions.
Affiliation(s)
- Alejandro Jiménez-Sánchez
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK; previously at Department of Biology, University of York, York, UK
31
Feinauer C, Szurmant H, Weigt M, Pagnani A. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon. PLoS One 2016; 11:e0149166. [PMID: 26882169 PMCID: PMC4755613 DOI: 10.1371/journal.pone.0149166] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 03/11/2015] [Accepted: 01/28/2016] [Indexed: 11/29/2022]
Abstract
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases. For the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas under the ROC curves of 0.69 and 0.81, respectively, for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. At the level of residue interactions, we show that for the small and the large ribosomal subunits our approach predicts interacting residues in the system with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data.
Affiliation(s)
- Christoph Feinauer
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Hendrik Szurmant
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Martin Weigt
- Sorbonne Universités, UPMC, UMR 7238, Computational and Quantitative Biology, Paris, France
- CNRS, UMR 7238, Computational and Quantitative Biology, Paris, France
- Andrea Pagnani
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation, Molecular Biotechnology Center (MBC), Torino, Italy
32
Li Z, Tang J, Guo F. Identification of 14-3-3 Proteins Phosphopeptide-Binding Specificity Using an Affinity-Based Computational Approach. PLoS One 2016; 11:e0147467. [PMID: 26828594 PMCID: PMC4734684 DOI: 10.1371/journal.pone.0147467] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/16/2015] [Accepted: 01/04/2016] [Indexed: 11/17/2022]
Abstract
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules, expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ is the only isoform directly linked to cancer in epithelial cells, which is regulated by major tumor suppressor genes. For each 14-3-3 isoform, we have 1,000 peptide motifs with experimental binding affinity values. In this paper, we present a novel method for identifying peptide motifs binding to the 14-3-3σ isoform. First, we propose a sampling criterion to build a predictor for each new peptide sequence. Then, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlative properties of amino acids in any two positions. Finally, we use elastic net, based on ridge regression and the least absolute shrinkage and selection operator (LASSO), to predict affinity values of peptide motifs. Our method is tested on the 1,000 known peptide motifs binding to seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method has overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) values of 0.84 and 252.31 for the N-terminal sublibrary, and 0.77 and 269.13 for the C-terminal sublibrary. We predict affinity values of 16,000 peptide sequences and relative binding ability across six permuted positions that are similar to experimental values. We identify phosphopeptides that preferentially bind to 14-3-3σ over other isoforms. Several positions on peptide motifs are in the same amino acid category as the experimental substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is fast and reliable and is a general computational method that can be used in peptide-protein binding identification in proteomics research.
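The auto-cross covariance step mentioned in this abstract can be sketched as follows. This is a minimal illustration using the standard definition ACC(i, j, lag) = 1/(L - lag) * sum over k of (P[k][i] - mean_i) * (P[k + lag][j] - mean_j). It assumes each peptide has already been converted to a list of per-position property vectors. The function name and the input layout are assumptions, and the paper's nine properties and choice of lags are not reproduced here.

```python
def auto_cross_covariance(props, i, j, lag):
    """Covariance between property i at position k and property j at position k + lag.

    props is a list of length L (one entry per peptide position); each entry
    is a list of numeric property values for the residue at that position.
    With i == j this reduces to an auto covariance; with i != j it is a
    cross covariance between two different properties.
    """
    length = len(props)
    if not 0 < lag < length:
        raise ValueError("lag must satisfy 0 < lag < peptide length")
    mean_i = sum(row[i] for row in props) / length
    mean_j = sum(row[j] for row in props) / length
    total = sum(
        (props[k][i] - mean_i) * (props[k + lag][j] - mean_j)
        for k in range(length - lag)
    )
    return total / (length - lag)


def acc_features(props, max_lag):
    """Flatten ACC values over all property pairs and lags 1..max_lag."""
    n_props = len(props[0])
    return [
        auto_cross_covariance(props, i, j, lag)
        for lag in range(1, max_lag + 1)
        for i in range(n_props)
        for j in range(n_props)
    ]
```

The resulting fixed-length vector is the kind of input that a penalized regression such as elastic net can consume, independent of which positions of the peptide vary.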
Affiliation(s)
- Zhao Li
- School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China
- Jijun Tang
- School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China; School of Computational Science and Engineering, University of South Carolina, Columbia, United States of America
- Fei Guo
- School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China
33
Calvo-Martín JM, Librado P, Aguadé M, Papaceit M, Segarra C. Adaptive selection and coevolution at the proteins of the Polycomb repressive complexes in Drosophila. Heredity (Edinb) 2016; 116:213-23. [PMID: 26486609 PMCID: PMC4806890 DOI: 10.1038/hdy.2015.91] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/22/2015] [Revised: 07/23/2015] [Accepted: 08/10/2015] [Indexed: 11/08/2022]
Abstract
Polycomb group (PcG) proteins are important epigenetic regulatory proteins that modulate the chromatin state through posttranslational histone modifications. These interacting proteins form multimeric complexes that repress gene expression. Thus, PcG proteins are expected to evolve coordinately, which might be reflected in their phylogenetic trees by concordant episodes of positive selection and by a correlation in evolutionary rates. In order to detect these signals of coevolution, the molecular evolution of 17 genes encoding the subunits of five Polycomb repressive complexes has been analyzed in the Drosophila genus. The observed distribution of divergence differs substantially among and along proteins. Indeed, CAF1 is uniformly conserved, whereas only the established protein domains are conserved in other proteins, such as PHO, PHOL, PSC, PH-P and ASX. Moreover, regions with a low divergence not yet described as protein domains are present, for instance, in SFMBT and SU(Z)12. Maximum likelihood methods indicate an acceleration in the nonsynonymous substitution rate at the lineage ancestral to the obscura group species in most genes encoding subunits of the Pcl-PRC2 complex and in genes Sfmbt, Psc and Kdm2. These methods also allow inferring the action of positive selection in this lineage at genes E(z) and Sfmbt. Finally, the protein interaction network predicted from the complete proteomes of 12 Drosophila species using a coevolutionary approach shows two tight PcG clusters. These clusters include well-established binary interactions among PcG proteins as well as new putative interactions.
Affiliation(s)
- J M Calvo-Martín
- Facultat de Biologia, Departament de Genètica, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- P Librado
- Facultat de Biologia, Departament de Genètica, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- M Aguadé
- Facultat de Biologia, Departament de Genètica, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- M Papaceit
- Facultat de Biologia, Departament de Genètica, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- C Segarra
- Facultat de Biologia, Departament de Genètica, and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
34
Avila-Herrera A, Pollard KS. Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species. BMC Bioinformatics 2015; 16:268. [PMID: 26303588 PMCID: PMC4549020 DOI: 10.1186/s12859-015-0677-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/20/2015] [Accepted: 07/17/2015] [Indexed: 01/09/2023]
Abstract
Background When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure. Results Our quantitative benchmarking showed that all coevolutionary methods clearly benefit from alignments with many sequences. Methods that aim to detect direct correlations generally outperform other approaches. However, faster mutual information based methods are occasionally competitive in small alignments and with relaxed false positive rates. Two commonly used null distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments. Conclusions We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs.
Affiliation(s)
- Aram Avila-Herrera
- Bioinformatics Graduate Program, University of California, San Francisco, USA; Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA
- Katherine S Pollard
- Bioinformatics Graduate Program, University of California, San Francisco, USA; Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, USA; Institute for Human Genetics, University of California, San Francisco, 94158, CA, USA
35
Identification of Protein–Protein Interactions by Detecting Correlated Mutation at the Interface. J Chem Inf Model 2015; 55:2042-9. [DOI: 10.1021/acs.jcim.5b00320] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]
36
Wolfe NW, Clark NL. ERC analysis: web-based inference of gene function via evolutionary rate covariation. Bioinformatics 2015; 31:3835-7. [PMID: 26243019 DOI: 10.1093/bioinformatics/btv454] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/20/2015] [Accepted: 07/25/2015] [Indexed: 11/12/2022]
Abstract
The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. AVAILABILITY AND IMPLEMENTATION Analyses and data at http://csb.pitt.edu/erc_analysis/ CONTACT nclark@pitt.edu.
Affiliation(s)
- Nicholas W Wolfe
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Nathan L Clark
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
37
Drinkwater B, Charleston MA. A time and space complexity reduction for coevolutionary analysis of trees generated under both a Yule and Uniform model. Comput Biol Chem 2015; 57:61-71. [DOI: 10.1016/j.compbiolchem.2015.02.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/02/2015] [Accepted: 02/03/2015] [Indexed: 11/30/2022]
38
Arenas AF, Salcedo GE, Montoya AM, Gomez-Marin JE. MSCA: a spectral comparison algorithm between time series to identify protein-protein interactions. BMC Bioinformatics 2015; 16:152. [PMID: 25963052 PMCID: PMC4448560 DOI: 10.1186/s12859-015-0599-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/05/2014] [Accepted: 04/13/2015] [Indexed: 12/27/2022]
Abstract
Background The interactions between pathogen proteins and their hosts allow pathogens to manipulate host cellular mechanisms to their advantage. The identification of host proteins that are targeted by virulent pathogen proteins is crucial to increase our understanding of infection mechanisms and to propose new therapeutics that target pathogens. Understanding the virulence mechanisms of pathogens requires a detailed molecular description of the proteins involved, but acquiring this knowledge is time consuming and prohibitively expensive. Therefore, we develop a statistical method based on hypothesis testing to compare the time series obtained by converting the physicochemical characteristics of the amino acids that form the primary structure of proteins, and thus to propose potential functional relations between proteins. We called this algorithm the multiple spectral comparison algorithm (MSCA); the MSCA was inspired by the BLASTP tool and was implemented in R code. The algorithm compares and relates multiple time series according to their spectral similarities, and the biological relation between them could be interpreted as either a similar function or a protein-protein interaction (PPI). Results A simulation study showed that the MSCA performs well when comparing unequal time series generated from ARMA processes, with power close to 1. The MSCA presented a 70% average accuracy of detecting protein interactions using a threshold of 0.7 for our spectral measure, indicating that this algorithm could predict novel PPIs and pathogen-host interactions (PHIs) with acceptable confidence. The MSCA also was validated by its identification of well-known interactions of the human proteins MAGI1, SCRIB and JAK1, as well as interactions of the virulence proteins ROP16, ROP18, ROP17 and ROP5. We verified the spectral similarities for human intraspecific PPIs and PHIs that were previously demonstrated experimentally by other authors. We suggest that human GBP (GTPase group induced by interferon) and the CREB transcription factor family could be human substrates for the complex of ROP18, ROP17 and ROP5. Conclusions Using multiple-hypothesis testing between the spectral densities of a set of unequal time series, we developed an algorithm that is able to identify the similarities or interactions between a set of proteins.
Affiliation(s)
- Ailan F Arenas
- Gepamol, Universidad del Quindío, Carrera 15 Calle 12N, Armenia, Colombia
- Gladys E Salcedo
- Grupo de Investigación y Asesoría en Estadística, Carrera 15 Calle 12N, 460, Armenia, Colombia
- Andrey M Montoya
- Grupo de Investigación y Asesoría en Estadística, Carrera 15 Calle 12N, 460, Armenia, Colombia
39
Scaife MA, Nguyen GTDT, Rico J, Lambert D, Helliwell KE, Smith AG. Establishing Chlamydomonas reinhardtii as an industrial biotechnology host. The Plant Journal: For Cell and Molecular Biology 2015; 82:532-546. [PMID: 25641561 PMCID: PMC4515103 DOI: 10.1111/tpj.12781] [Citation(s) in RCA: 115] [Impact Index Per Article: 12.8] [Received: 11/28/2014] [Revised: 01/19/2015] [Accepted: 01/20/2015] [Indexed: 05/20/2023]
Abstract
Microalgae constitute a diverse group of eukaryotic unicellular organisms that are of interest for pure and applied research. Owing to their natural synthesis of value-added natural products microalgae are emerging as a source of sustainable chemical compounds, proteins and metabolites, including but not limited to those that could replace compounds currently made from fossil fuels. For the model microalga, Chlamydomonas reinhardtii, this has prompted a period of rapid development so that this organism is poised for exploitation as an industrial biotechnology platform. The question now is how best to achieve this? Highly advanced industrial biotechnology systems using bacteria and yeasts were established in a classical metabolic engineering manner over several decades. However, the advent of advanced molecular tools and the rise of synthetic biology provide an opportunity to expedite the development of C. reinhardtii as an industrial biotechnology platform, avoiding the process of incremental improvement. In this review we describe the current status of genetic manipulation of C. reinhardtii for metabolic engineering. We then introduce several concepts that underpin synthetic biology, and show how generic parts are identified and used in a standard manner to achieve predictable outputs. Based on this we suggest that the development of C. reinhardtii as an industrial biotechnology platform can be achieved more efficiently through adoption of a synthetic biology approach.
Affiliation(s)
- Mark A Scaife
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
- Ginnie TDT Nguyen
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
- Juan Rico
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
- Devinn Lambert
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
- Katherine E Helliwell
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
- Alison G Smith
- Department of Plant Science, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK
40
Ochoa D, Juan D, Valencia A, Pazos F. Detection of significant protein coevolution. Bioinformatics 2015; 31:2166-73. [PMID: 25717190 DOI: 10.1093/bioinformatics/btv102] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/18/2014] [Accepted: 02/11/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals. RESULTS In this work, we developed a method for associating confidence estimators (P values) to the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach largely improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution in all the stages of the mirrortree workflow, independently of the starting genomic information. This not only leads to a better understanding of protein coevolution and its biological implications, but also to obtain a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes using only genomic information. AVAILABILITY AND IMPLEMENTATION The software and datasets used in this work are freely available at: http://csbg.cnb.csic.es/pMT/. CONTACT pazos@cnb.csic.es SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
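The tree-similarity score and the idea of attaching a P value to it can be sketched as follows. This is an illustration only: the score is the usual mirrortree-style Pearson correlation between two inter-species distance matrices, and the P value comes from a generic Mantel-style permutation test that shuffles species labels. That permutation null is a stand-in chosen for simplicity and is not the null model the authors designed. All function names are hypothetical, and both matrices are assumed to be symmetric and indexed by the same species in the same order.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlation between the pairwise distances of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


def permutation_p_value(dist_a, dist_b, n_perm=999, seed=0):
    """Fraction of label shufflings of dist_b scoring at least the observed value."""
    rng = random.Random(seed)
    observed = mirrortree_score(dist_a, dist_b)
    n = len(dist_b)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        shuffled = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if mirrortree_score(dist_a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With only a handful of species the number of distinct permutations is small, so the smallest attainable P value is correspondingly coarse. This is one reason a purpose-built null model, as described in the abstract, is attractive.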
Affiliation(s)
- David Ochoa
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
- David Juan
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Alfonso Valencia
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid and Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/ Melchor Fernández Almagro 3, 28029 Madrid, Spain
41
Priedigkeit N, Wolfe N, Clark NL. Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene-based networks. PLoS Genet 2015; 11:e1004967. [PMID: 25679399 PMCID: PMC4334549 DOI: 10.1371/journal.pgen.1004967] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/22/2014] [Accepted: 12/19/2014] [Indexed: 12/27/2022] Open
Abstract
Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
Molecular evolution has informed our understanding of gene function; however, classical methods have largely been static in their implementation, focusing on single genes. Here, we present and prove the utility of a dynamic, network-based understanding of molecular evolution to infer relationships between genes associated with human diseases. We have shown previously that groups of genes within functional niches tend to share similar evolutionary histories. Exploiting the availability of whole genomes from multiple species, these histories can be numerically scored and dynamically compared to one another using a sequence-based signature termed Evolutionary Rate Covariation (ERC). To explore potential applications, we characterized ERC amongst disease genes and found that many diseases contain significant ERC signatures between their contributing genes. We show that ERC can also prioritize “true” disease genes amongst unrelated gene candidates. Lastly, these signatures can serve as a foundation for creating instructive gene-based networks, unveiling novel relationships between diseases thought to be clinically distinct. Our hope is that this study will add to the increasing evidence that advancing our understanding of molecular evolution can be a crucial asset in large-scale gene discovery pursuits (Link to our webserver that provides intuitive ERC analysis tools: http://csb.pitt.edu/erc_analysis/).
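The prioritization step described above can be sketched in a few lines. The sketch assumes that ERC between two genes is the Pearson correlation of their branch-specific relative rates, and that a candidate is ranked by its mean ERC with the known disease genes. The gene labels, rate values, and function names are hypothetical, and the ranking rule is an illustrative assumption rather than the authors' exact procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def erc(rates_a, rates_b):
    """ERC sketch: correlation of per-branch relative rates of two genes."""
    return pearson(rates_a, rates_b)


def prioritize(candidates, known, rates):
    """Rank candidates by mean ERC with the known disease genes (best first)."""
    def mean_erc(gene):
        return sum(erc(rates[gene], rates[k]) for k in known) / len(known)
    return sorted(candidates, key=mean_erc, reverse=True)


if __name__ == "__main__":
    # Hypothetical relative rates on five branches of a species tree.
    rates = {
        "K1": [1.0, 0.5, 1.5, 0.8, 1.2],  # known disease gene
        "K2": [1.1, 0.6, 1.4, 0.9, 1.3],  # known disease gene
        "C1": [0.9, 0.4, 1.6, 0.7, 1.1],  # candidate that covaries with K1, K2
        "C2": [1.0, 1.5, 0.5, 1.2, 0.8],  # candidate that does not
    }
    print(prioritize(["C2", "C1"], ["K1", "K2"], rates))
```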
Affiliation(s)
- Nolan Priedigkeit
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Nicholas Wolfe
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Nathan L. Clark
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
|
42
|
Qian W, Zhou H, Tang K. Recent coselection in human populations revealed by protein-protein interaction network. Genome Biol Evol 2014; 7:136-53. [PMID: 25532814 PMCID: PMC4316623 DOI: 10.1093/gbe/evu270] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022] Open
Abstract
Genome-wide scans for signals of natural selection in human populations have identified a large number of candidate loci that underlie local adaptations. This is surprising given the relatively short evolutionary time since the divergence of human populations. One hypothesis that has not been formally examined is whether and how recent human evolution may have been shaped by coselection in the context of the complex molecular interactome. In this study, genome-wide signals of selection were scanned in East Asians, Europeans, and Africans using 1000 Genomes data, and subsequently mapped onto the protein-protein interaction (PPI) network. We found that the candidate genes of recent positive selection localized significantly closer to each other on the PPI network than expected, revealing substantial clustering of selected genes. Furthermore, gene pairs at shorter PPI network distances showed higher similarity in their recent evolutionary paths than those further apart. Last, subnetworks enriched with recent coselection signals were identified, which are substantially overrepresented in biological pathways related to signal transduction, neurogenesis, and immune function. These results provide the first genome-wide evidence for association of recent selection signals with the PPI network, shedding light on the potential mechanisms of recent coselection in the human genome.
Affiliation(s)
- Wei Qian
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Hang Zhou
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Kun Tang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
43
|
Abstract
The past decade has seen a dramatic expansion in the number and range of techniques available to obtain genome-wide information and to analyze this information so as to infer both the functions of individual molecules and how they interact to modulate the behavior of biological systems. Here, we review these techniques, focusing on the construction of physical protein-protein interaction networks, and highlighting approaches that incorporate protein structure, which is becoming an increasingly important component of systems-level computational techniques. We also discuss how network analyses are being applied to enhance our basic understanding of biological systems and their dysregulation, as well as how these networks are being used in drug development.
Affiliation(s)
- Donald Petrey
- Center for Computational Biology and Bioinformatics, Department of Systems Biology
|
44
|
Vicens A, Roldan ER. Coevolution of Positively Selected IZUMO1 and CD9 in Rodents: Evidence of Interaction Between Gamete Fusion Proteins? Biol Reprod 2014; 90:113. [DOI: 10.1095/biolreprod.113.116871] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/01/2022] Open
|
45
|
Ochoa D, Pazos F. Practical aspects of protein co-evolution. Front Cell Dev Biol 2014; 2:14. [PMID: 25364721 PMCID: PMC4207036 DOI: 10.3389/fcell.2014.00014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 02/21/2014] [Accepted: 04/02/2014] [Indexed: 11/15/2022] Open
Abstract
Co-evolution is a fundamental aspect of Evolutionary Theory. At the molecular level, co-evolutionary linkages between protein families have long been used as indicators of protein interactions and functional relationships. Due to the complexity of the problem and the amount of genomic data required for these approaches to perform well, it took a relatively long time from the appearance of the first ideas and concepts to the everyday application of these approaches and their incorporation into the standard toolboxes of bioinformaticians and molecular biologists. Today, these methodologies are mature (both in terms of performance and usability/implementation), and the genomic information that feeds them is large enough to allow their general application. This review summarizes the current landscape of co-evolution-based methodologies, with a strong emphasis on describing interesting cases where their application to important biological systems, alone or in combination with other computational and experimental approaches, yielded new insight into those systems.
Affiliation(s)
- David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) Hinxton, UK
- Florencio Pazos
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC) Madrid, Spain
|
46
|
El-Kebir M, Marschall T, Wohlers I, Patterson M, Heringa J, Schönhuth A, Klau GW. Mapping proteins in the presence of paralogs using units of coevolution. BMC Bioinformatics 2014; 14 Suppl 15:S18. [PMID: 24564758 PMCID: PMC3852051 DOI: 10.1186/1471-2105-14-s15-s18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022] Open
Abstract
Background: We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results: Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion: CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible.
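To make the structure of the mapping problem concrete, here is a toy sketch in which each unit (two mapped protein pairs) is scored by how well the distance between the two proteins in family A matches the distance between their partners in family B. The scoring function is a hypothetical stand-in for the paper's statistical score, and the exhaustive search over permutations replaces the Lagrangian relaxation used by CUPID, so it is only practical for a handful of paralogs.

```python
from itertools import combinations, permutations


def unit_score(d_a, d_b, pair1, pair2):
    """Score one unit: two mapped pairs (i -> pi) and (k -> pk).

    Hypothetical score: zero when the A-side and B-side distances agree,
    increasingly negative as they diverge.
    """
    (i, pi), (k, pk) = pair1, pair2
    return -abs(d_a[i][k] - d_b[pi][pk])


def best_mapping(d_a, d_b):
    """Exhaustively search one-to-one mappings; sum scores over all units.

    The objective is quadratic in the mapping, since every term involves two
    mapped pairs at once. Brute force is used here purely for illustration.
    """
    n = len(d_a)
    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(unit_score(d_a, d_b, (i, perm[i]), (k, perm[k]))
                    for i, k in combinations(range(n), 2))
        if total > best_total:
            best, best_total = perm, total
    return best, best_total


if __name__ == "__main__":
    positions = [0, 1, 3, 7]  # toy paralogs of family A on a line
    d_a = [[abs(p - q) for q in positions] for p in positions]
    true_map = (2, 0, 3, 1)  # hidden partner of each A protein in family B
    d_b = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for k in range(4):
            d_b[true_map[i]][true_map[k]] = d_a[i][k]
    print(best_mapping(d_a, d_b))
```

Because the number of mappings grows factorially with the number of paralogs, any realistic instance needs a relaxation or heuristic of the kind the abstract describes.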
|
47
|
Prediction of protein-protein interaction with pairwise kernel support vector machine. Int J Mol Sci 2014; 15:3220-33. [PMID: 24566145 PMCID: PMC3958907 DOI: 10.3390/ijms15023220] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 01/01/2014] [Revised: 01/27/2014] [Accepted: 01/29/2014] [Indexed: 11/17/2022] Open
Abstract
Protein–protein interactions (PPIs) play a key role in many cellular processes. Unfortunately, the experimental methods currently used to identify PPIs are both time-consuming and expensive. These obstacles could be overcome by developing computational approaches to predict PPIs. Here, we report two methods of amino acid feature extraction: (i) distance frequency with PCA for dimensionality reduction (DFPCA) and (ii) amino acid index distribution (AAID) representing the protein sequences. In order to obtain the most robust and reliable results for PPI prediction, a pairwise kernel function and support vector machines (SVM) were employed so that the prediction does not depend on the order in which the feature vectors of the two proteins are concatenated. The highest prediction accuracies of AAID and DFPCA were 94% and 93.96%, respectively, using the 10 CV test, and the results of the pairwise radial basis kernel function are considerably improved over those based on the radial basis kernel function. Overall, the PPI prediction tool, termed PPI-PKSVM, which is freely available at http://159.226.118.31/PPI/index.html, promises to become useful in such areas as bio-analysis and drug development.
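The order-independence property mentioned in this abstract can be illustrated with a commonly used symmetric pairwise kernel built from a radial basis function. Whether PPI-PKSVM uses exactly this form is an assumption; the function names, the `gamma` value, and the toy feature vectors are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Radial basis function kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_rbf(pair_p, pair_q, gamma=0.5):
    """Symmetric pairwise kernel between protein pairs (a, b) and (c, d).

    K((a, b), (c, d)) = k(a, c) k(b, d) + k(a, d) k(b, c),
    so swapping the two proteins within a pair leaves the value unchanged.
    """
    (a, b), (c, d) = pair_p, pair_q
    return rbf(a, c, gamma) * rbf(b, d, gamma) + rbf(a, d, gamma) * rbf(b, c, gamma)


def concat_rbf(pair_p, pair_q, gamma=0.5):
    """Baseline: RBF on concatenated vectors, which depends on the order."""
    (a, b), (c, d) = pair_p, pair_q
    return rbf(list(a) + list(b), list(c) + list(d), gamma)


if __name__ == "__main__":
    a, b = [0.1, 0.9], [0.8, 0.2]  # toy features of two proteins
    c, d = [0.2, 0.7], [0.9, 0.1]
    print(pairwise_rbf((a, b), (c, d)), pairwise_rbf((b, a), (c, d)))
    print(concat_rbf((a, b), (c, d)), concat_rbf((b, a), (c, d)))
```

A Gram matrix filled with such kernel values could then be passed to any SVM implementation that accepts precomputed kernels.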
|
48
|
Sandler I, Zigdon N, Levy E, Aharoni A. The functional importance of co-evolving residues in proteins. Cell Mol Life Sci 2014; 71:673-82. [PMID: 23995987 PMCID: PMC11113390 DOI: 10.1007/s00018-013-1458-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/12/2013] [Revised: 07/26/2013] [Accepted: 08/13/2013] [Indexed: 10/26/2022]
Abstract
Computational approaches for detecting co-evolution in proteins allow for the identification of protein-protein interaction networks in different organisms and the assignment of function to under-explored proteins. The detection of co-variation of amino acids within or between proteins, moreover, allows for the discovery of residue-residue contacts and highlights functional residues that can affect the binding affinity, catalytic activity, or substrate specificity of a protein. To explore the functional impact of co-evolutionary changes in proteins, a combined experimental and computational approach must be recruited. Here, we review recent studies that apply computational and experimental tools to obtain novel insight into the structure, function, and evolution of proteins. Specifically, we describe the application of co-evolutionary analysis for predicting high-resolution three-dimensional structures of proteins. In addition, we describe computational approaches followed by experimental analysis for identifying specificity-determining residues in proteins. Finally, we discuss studies addressing the importance of such residues in terms of the functional divergence of proteins, allowing proteins to evolve new functions while avoiding crosstalk with existing cellular pathways or forming reproductive barriers and hence promoting speciation.
Affiliation(s)
- Inga Sandler
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
- Nitzan Zigdon
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
- Efrat Levy
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
- Amir Aharoni
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
- National Institute for Biotechnology in the Negev (NIBN), Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
|
49
|
Evolutionary rate covariation identifies new members of a protein network required for Drosophila melanogaster female post-mating responses. PLoS Genet 2014; 10:e1004108. [PMID: 24453993 PMCID: PMC3894160 DOI: 10.1371/journal.pgen.1004108] [Citation(s) in RCA: 114] [Impact Index Per Article: 11.4] [Received: 04/26/2013] [Accepted: 11/27/2013] [Indexed: 11/19/2022] Open
Abstract
Seminal fluid proteins transferred from males to females during copulation are required for full fertility and can exert dramatic effects on female physiology and behavior. In Drosophila melanogaster, the seminal protein sex peptide (SP) affects mated females by increasing egg production and decreasing receptivity to courtship. These behavioral changes persist for several days because SP binds to sperm that are stored in the female. SP is then gradually released, allowing it to interact with its female-expressed receptor. The binding of SP to sperm requires five additional seminal proteins, which act together in a network. Hundreds of uncharacterized male and female proteins have been identified in this species, but individually screening each protein for network function would present a logistical challenge. To prioritize the screening of these proteins for involvement in the SP network, we used a comparative genomic method to identify candidate proteins whose evolutionary rates across the Drosophila phylogeny co-vary with those of the SP network proteins. Subsequent functional testing of 18 co-varying candidates by RNA interference identified three male seminal proteins and three female reproductive tract proteins that are each required for the long-term persistence of SP responses in females. Molecular genetic analysis showed the three new male proteins are required for the transfer of other network proteins to females and for SP to become bound to sperm that are stored in mated females. The three female proteins, in contrast, act downstream of SP binding and sperm storage. These findings expand the number of seminal proteins required for SP's actions in the female and show that multiple female proteins are necessary for the SP response. Furthermore, our functional analyses demonstrate that evolutionary rate covariation is a valuable predictive tool for identifying candidate members of interacting protein networks. 
Reproduction requires more than a sperm and an egg. In animals with internal fertilization, other proteins in the seminal fluid and the female are essential for full fertility. Although hundreds of such reproductive proteins are known, our ability to understand how they interact remains limited. In this study, we investigated whether shared patterns of protein sequence evolution were predictive of functional interactions by focusing on a small network of proteins that control fertility and female post-mating behavior in the fruit fly, Drosophila melanogaster. We first showed that the six proteins already known to act in this network display correlated patterns of evolution across the Drosophila phylogeny. We then screened hundreds of otherwise uncharacterized male and female reproductive proteins and identified those with patterns of evolution most similar to those of the known network proteins. We tested each of these candidate genes and found six new network members that are each required for long-term fertility. Using molecular genetics, we also observed that the steps in the network at which these new proteins act are consistent with their strongest evolutionary correlations. Our results suggest that patterns of coevolution may be broadly useful for predicting protein interactions in a variety of biological processes.
|
50
|
Zahiri J, Bozorgmehr JH, Masoudi-Nejad A. Computational Prediction of Protein-Protein Interaction Networks: Algorithms and Resources. Curr Genomics 2014; 14:397-414. [PMID: 24396273 PMCID: PMC3861891 DOI: 10.2174/1389202911314060004] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 06/12/2013] [Revised: 08/07/2013] [Accepted: 08/26/2013] [Indexed: 01/15/2023] Open
Abstract
Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true in the case of diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental methods for finding protein-protein interactions, however, is rather dubious, and high-throughput experimental results have shown high rates of both false positives and false negatives. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and validate experimental results. In this study, we review several computational methods for protein-protein interaction prediction and describe major databases, which store both predicted and detected protein-protein interactions, as well as the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability.
Affiliation(s)
- Javad Zahiri
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
- Joseph Hannon Bozorgmehr
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
|