1. Sefer E. Biocode: A Data-Driven Procedure to Learn the Growth of Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; PP:1-1. [PMID: 35380966 DOI: 10.1109/tcbb.2022.3165092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Probabilistic biological network growth models have been used for many tasks, including capturing the mechanisms and dynamics of biological growth, representing null models, and detecting anomalies. Well-known examples of these probabilistic models are the Kronecker model, the preferential attachment model, and the duplication-based model. However, as new networks are observed, new models must continually be developed to better fit and explain their features, and developing a growth model each time a new network is studied is difficult. In this paper, we propose Biocode, a framework to automatically discover novel biological growth models matching user-specified graph attributes in directed and undirected biological graphs. Biocode designs a basic set of instructions that is general enough to model a number of well-known biological graph growth models. We combine this instruction-wise representation with a genetic-algorithm-based optimization procedure to encode models for various biological networks. We mainly evaluate Biocode in discovering models for biological collaboration networks, gene regulatory networks, and protein interaction networks; the features of the generated graphs, such as assortativity, clustering coefficient, and degree distribution, closely match those of the corresponding real biological networks.
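The abstract names preferential attachment and duplication-based growth as classic mechanisms. As a rough illustration only (this is not Biocode's instruction set or its genetic-algorithm search), the Python sketch below grows an undirected graph by mixing the two mechanisms and then measures one of the attributes mentioned, the average clustering coefficient. The function names and the parameters `p_dup` and `p_keep` are hypothetical.

```python
import random
from collections import defaultdict


def grow_network(n_nodes, m=2, p_dup=0.5, p_keep=0.6, seed=0):
    """Toy growth model mixing duplication and preferential attachment (hypothetical parameters)."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    # Seed graph: a clique on m + 1 nodes, so every node starts with degree >= 1.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    for new in range(m + 1, n_nodes):
        nodes = list(adj)
        if rng.random() < p_dup:
            # Duplication: copy a random node's edges, keeping each with probability p_keep.
            template = rng.choice(nodes)
            targets = {v for v in adj[template] if rng.random() < p_keep}
            # Simplifying assumption: always link the copy to its template.
            targets.add(template)
        else:
            # Preferential attachment: m distinct targets, chosen with probability proportional to degree.
            weights = [len(adj[v]) for v in nodes]
            targets = set()
            while len(targets) < m:
                targets.add(rng.choices(nodes, weights=weights)[0])
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
    return dict(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours of u (each pair once).
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


graph = grow_network(200)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
print(len(graph), degrees[:5], round(average_clustering(graph), 3))
```

A search procedure such as the one described would tune parameters like these (or, in Biocode's case, the instruction sequence itself) until measured attributes match a target network.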
2. Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022]
Abstract
Network alignment provides a comprehensive way to discover the similar parts between molecular systems of different species based on topological and biological similarity. With such a strong basis, one can do comparative studies at a systems level in the field of computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative algorithms for network alignment in the past two decades as well as the state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature to benchmark the performance of these approaches. Finally, we address several future challenges and the possible ways to conquer the existing problems of biological network alignment.
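Among the evaluation measures used to benchmark aligners, edge correctness (EC) is one of the most common: the fraction of edges in the first network whose endpoints are mapped onto an edge of the second network. A minimal Python sketch, assuming simple undirected graphs given as edge lists and an alignment given as a dict; the function and variable names are illustrative.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of G1 edges mapped onto G2 edges by the node mapping (undirected)."""
    if not edges1:
        raise ValueError("G1 has no edges")
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in e2
    )
    return conserved / len(edges1)


g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
alignment = {"a": "x", "b": "y", "c": "z"}
print(edge_correctness(g1, g2, alignment))  # 2 of 3 edges conserved
```

Edges are stored as `frozenset`s so that (u, v) and (v, u) compare equal, which is what undirected PPI networks require.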
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
- Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
3. Heger P, Zheng W, Rottmann A, Panfilio KA, Wiehe T. The genetic factors of bilaterian evolution. eLife 2020; 9:e45530. [PMID: 32672535 PMCID: PMC7535936 DOI: 10.7554/elife.45530] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 01/31/2019] [Accepted: 07/03/2020] [Indexed: 12/13/2022]
Abstract
The Cambrian explosion was a unique animal radiation ~540 million years ago that produced the full range of body plans across bilaterians. The genetic mechanisms underlying these events are unknown, leaving a fundamental question in evolutionary biology unanswered. Using large-scale comparative genomics and advanced orthology evaluation techniques, we identified 157 bilaterian-specific genes. They include the entire Nodal pathway, a key regulator of mesoderm development and left-right axis specification; components for nervous system development, including a suite of G-protein-coupled receptors that control physiology and behaviour, the Robo-Slit midline repulsion system, and the neurotrophin signalling system; a high number of zinc finger transcription factors; and novel factors that previously escaped attention. Contradicting the current view, our study reveals that genes with bilaterian origin are robustly associated with key features in extant bilaterians, suggesting a causal relationship.
Affiliation(s)
- Peter Heger
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Wen Zheng
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Anna Rottmann
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Kristen A Panfilio
- Institute for Zoology: Developmental Biology, Cologne Biocenter, University of Cologne, Cologne, Germany
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, United Kingdom
- Thomas Wiehe
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
4. Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. PHYSICS REPORTS 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 01/03/2025]
5. Woo HM, Jeong H, Yoon BJ. NAPAbench 2: A network synthesis algorithm for generating realistic protein-protein interaction (PPI) network families. PLoS One 2020; 15:e0227598. [PMID: 31986158 PMCID: PMC6984706 DOI: 10.1371/journal.pone.0227598] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/10/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022]
Abstract
Comparative network analysis provides effective computational means for gaining novel insights into the structural and functional compositions of biological networks. In recent years, various methods have been developed for biological network alignment, whose main goal is to identify important similarities and critical differences between networks in terms of their topology and composition. A major impediment to advancing network alignment techniques has been the lack of gold-standard benchmarks that can be used for accurate and comprehensive performance assessment of such algorithms. The original NAPAbench (network alignment performance assessment benchmark) was developed to address this problem, and it has been widely utilized by many researchers for the development, evaluation, and comparison of novel network alignment techniques. In this work, we introduce NAPAbench 2, a major update of the original NAPAbench that was introduced in 2012. NAPAbench 2 includes a completely redesigned network synthesis algorithm that can generate protein-protein interaction (PPI) network families whose characteristics closely match those of the latest real PPI networks. Furthermore, the network synthesis algorithm comes with an intuitive GUI that allows users to easily generate PPI network families with an arbitrary number of networks of any size, according to a flexible user-defined phylogeny. In addition, NAPAbench 2 provides updated benchmark datasets, created using the redesigned network synthesis algorithm, which can be used for comprehensive performance assessment of network alignment algorithms and their scalability.
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
- Hyundoo Jeong
- Department of Mechatronics Engineering, Incheon National University, Incheon, Republic of Korea
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, United States of America
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States of America
6. Defoort J, Van de Peer Y, Vermeirssen V. Function, dynamics and evolution of network motif modules in integrated gene regulatory networks of worm and plant. Nucleic Acids Res 2019; 46:6480-6503. [PMID: 29873777 PMCID: PMC6061849 DOI: 10.1093/nar/gky468] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/07/2017] [Accepted: 05/14/2018] [Indexed: 12/29/2022]
Abstract
Gene regulatory networks (GRNs) consist of different molecular interactions that closely work together to establish proper gene expression in time and space. Especially in higher eukaryotes, many questions remain on how these interactions collectively coordinate gene regulation. We study high quality GRNs consisting of undirected protein–protein, genetic and homologous interactions, and directed protein–DNA, regulatory and miRNA–mRNA interactions in the worm Caenorhabditis elegans and the plant Arabidopsis thaliana. Our data-integration framework integrates interactions in composite network motifs, clusters these in biologically relevant, higher-order topological network motif modules, overlays these with gene expression profiles and discovers novel connections between modules and regulators. Similar modules exist in the integrated GRNs of worm and plant. We show how experimental or computational methodologies underlying a certain data type impact network topology. Through phylogenetic decomposition, we found that proteins of worm and plant tend to functionally interact with proteins of a similar age, while at the regulatory level TFs favor same age, but also older target genes. Despite some influence of the duplication mode difference, we also observe at the motif and module level for both species a preference for age homogeneity for undirected and age heterogeneity for directed interactions. This leads to a model where novel genes are added together to the GRNs in a specific biological functional context, regulated by one or more TFs that also target older genes in the GRNs. Overall, we detected topological, functional and evolutionary properties of GRNs that are potentially universal in all species.
Affiliation(s)
- Jonas Defoort
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- Vanessa Vermeirssen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium
7. Jain A, Perisa D, Fliedner F, von Haeseler A, Ebersberger I. The Evolutionary Traceability of a Protein. Genome Biol Evol 2019; 11:531-545. [PMID: 30649284 PMCID: PMC6394115 DOI: 10.1093/gbe/evz008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Accepted: 01/11/2019] [Indexed: 12/12/2022]
Abstract
Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. However, the similarity between orthologs decays with time, and ultimately it becomes insufficient to infer common ancestry. This leaves ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce the “evolutionary traceability” as a measure that quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. Using yeast, we show that genes that were thought to date back to the last universal common ancestor are of high traceability. Their functions mostly involve catalysis, ion transport, and ribonucleoprotein complex assembly. In turn, the fraction of yeast genes whose traceability is not sufficient to infer their presence in last universal common ancestor is enriched for regulatory functions. Computing the traceabilities of genes that have been experimentally characterized as being essential for a self-replicating cell reveals that many of the genes that lack orthologs outside bacteria have low traceability. This leaves open whether their orthologs in the eukaryotic and archaeal domains have been overlooked. Looking at the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, the evolutionary traceability helps to differentiate between true absence and nondetection of orthologs, and thus improves our understanding about the evolutionary conservation of functional protein networks. “protTrace,” a software tool for computing evolutionary traceability, is freely available at https://github.com/BIONF/protTrace.git; last accessed February 10, 2019.
Affiliation(s)
- Arpit Jain
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Dominik Perisa
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Fabian Fliedner
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University Vienna, Austria; Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Austria
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Center (BiK-F), Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
8. Banerjee S, Chakraborty S. Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. MOLECULAR BIOSYSTEMS 2018; 13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
The emergence of new protein-coding genes in a specific lineage or species provides raw materials for evolutionary adaptations. Until recently, the biology of new genes emerging particularly from non-genic sequences remained unexplored. Although new genes are subjected to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often get integrated into pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unanswered to date. Since structural orientation contributes greatly to the mode of proteins' physical interactions, we asked whether new genes encode proteins with stable folds. Addressing this question, we demonstrated that intrinsic disorder inversely correlates with evolutionary gene age, i.e. young proteins are richer in intrinsic disorder than ancient ones. We further noted that young proteins, which are initially poorly connected, tend to be structurally more disordered than well-connected ancient proteins. This phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We argue that structural disorder might help poorly connected young proteins undergo promiscuous interactions, which provide the foundation for novel protein interactions. The study examines young proteins from an evolutionary perspective in the light of structural adaptations.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.
9. Jeon H, Kim SR, Yoo YJ. Topological properties of protein interaction network and phylogenetic age of proteins. 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) 2017:1761-1768. [DOI: 10.1109/bibm.2017.8217927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
10. Mohammadi S, Gleich DF, Kolda TG, Grama A. Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1446-1458. [PMID: 27483461 DOI: 10.1109/tcbb.2016.2595583] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/06/2023]
Abstract
Network alignment has extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provides a new objective function that maximizes the number of aligned substructures. This objective function corresponds to an integer programming problem, which is NP-hard. Consequently, we identify a closely related surrogate function whose maximization results in a tensor eigenvector problem. Based on this formulation, we present an algorithm called Triangular AlignMEnt (TAME), which attempts to maximize the number of aligned triangles across networks. Using a case study on the NAPAbench dataset, we show that triangular alignment is capable of producing mappings with high node correctness. We further evaluate our method by aligning yeast and human interactomes. Our results indicate that TAME outperforms state-of-the-art alignment methods in terms of conserved triangles. In addition, we show that the number of conserved triangles is more significantly correlated with node correctness and with co-expression of edges than the number of conserved edges is. Our formulation and resulting algorithms can be easily extended to arbitrary motifs.
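TAME's objective counts aligned triangles. The Python sketch below does not implement TAME's tensor eigenvector method; it only scores a given alignment by the number of conserved triangles, i.e. triangles in the first network whose mapped nodes also form a triangle in the second. It assumes simple undirected graphs without self-loops, and the function names are illustrative.

```python
def triangles(edges):
    """All triangles of a simple undirected graph, each as a frozenset of 3 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    tris = set()
    for u, v in edges:
        # Any common neighbour of an edge's endpoints closes a triangle.
        for w in adj[u] & adj[v]:
            tris.add(frozenset((u, v, w)))
    return tris


def conserved_triangles(edges1, edges2, mapping):
    """Number of G1 triangles whose mapped nodes form a triangle in G2."""
    t2 = triangles(edges2)
    count = 0
    for tri in triangles(edges1):
        if all(node in mapping for node in tri):
            if frozenset(mapping[node] for node in tri) in t2:
                count += 1
    return count


g1 = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "d"), ("c", "d")]
g2 = [("x", "y"), ("y", "z"), ("z", "x"), ("y", "w")]
f = {"a": "x", "b": "y", "c": "z", "d": "w"}
print(conserved_triangles(g1, g2, f))
```

Here G1 has two triangles, {a, b, c} and {b, c, d}; only the first maps onto a G2 triangle, because G2 lacks the edge between z and w.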
11. Holland DO, Shapiro BH, Xue P, Johnson ME. Protein-protein binding selectivity and network topology constrain global and local properties of interface binding networks. Sci Rep 2017; 7:5631. [PMID: 28717235 PMCID: PMC5514078 DOI: 10.1038/s41598-017-05686-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/26/2017] [Accepted: 06/01/2017] [Indexed: 01/30/2023]
Abstract
Protein-protein interaction networks (PPINs) are known to share a highly conserved structure across all organisms. What is poorly understood, however, is the structure of the child interface interaction networks (IINs), which map the binding sites proteins use for each interaction. In this study we analyze four independently constructed IINs from yeast and humans and find a conserved structure of these networks with a unique topology distinct from the parent PPIN. Using an IIN sampling algorithm and a fitness function trained on the manually curated PPINs, we show that IIN topology can largely be explained as a balance between limits on interface diversity and a need for physico-chemical binding complementarity. This complementarity must be optimized both for functional interactions and against mis-interactions, and this selectivity is encoded in the IIN motifs. To test whether the parent PPIN shapes IINs, we compared optimal IINs in biological PPINs versus random PPINs. We found that the hubs in biological networks allow for selective binding with minimal interfaces, suggesting that binding specificity is an additional pressure for a scale-free-like PPIN. We confirm through phylogenetic analysis that hub interfaces are strongly conserved and that rewiring of interactions between proteins involved in endocytosis preserves interface binding selectivity.
Affiliation(s)
- David O Holland
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Benjamin H Shapiro
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Pei Xue
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Margaret E Johnson
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
12. Bauer R, Kaiser M. Nonlinear growth: an origin of hub organization in complex networks. ROYAL SOCIETY OPEN SCIENCE 2017; 4:160691. [PMID: 28405356 PMCID: PMC5383813 DOI: 10.1098/rsos.160691] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/12/2016] [Accepted: 02/21/2017] [Indexed: 05/06/2023]
Abstract
Many real-world networks contain highly connected nodes called hubs. Hubs are often crucial for network function and spreading dynamics. However, classical models of how hubs originate during network development unrealistically assume that new nodes attain information about the connectivity (for example the degree) of existing nodes. Here, we introduce hub formation through nonlinear growth where the number of nodes generated at each stage increases over time and new nodes form connections independent of target node features. Our model reproduces variation in number of connections, hub occurrence time, and rich-club organization of networks ranging from protein-protein, neuronal and fibre tract brain networks to airline networks. Moreover, nonlinear growth gives a more generic representation of these networks compared with previous preferential attachment or duplication-divergence models. Overall, hub creation through nonlinear network expansion can serve as a benchmark model for studying the development of many real-world networks.
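The core idea here, that the number of nodes added per stage increases over time while new nodes pick targets without using degree information, can be sketched in a few lines. This is a toy under stated assumptions and not the authors' model: the geometric schedule `base * rate**s`, the rule that new nodes link to `k` nodes drawn uniformly from those present before the stage, and all names are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def nonlinear_growth(stages, k=2, base=1.0, rate=1.5, seed=0):
    """Toy sketch: stage s adds about base * rate**s nodes (a hypothetical schedule).

    Each new node links to k distinct nodes chosen uniformly at random among
    those present before the stage began, i.e. independently of their degree.
    """
    rng = random.Random(seed)
    n_seed = k + 1
    adj = {i: set() for i in range(n_seed)}
    # Seed graph: a small clique so that k targets are always available.
    for u in range(n_seed):
        for v in range(u + 1, n_seed):
            adj[u].add(v)
            adj[v].add(u)
    born = {i: 0 for i in range(n_seed)}
    next_id = n_seed
    for s in range(1, stages + 1):
        existing = list(adj)  # snapshot: nodes born in this stage are not targets
        for _ in range(max(1, round(base * rate ** s))):
            adj[next_id] = set()
            born[next_id] = s
            for v in rng.sample(existing, k):
                adj[next_id].add(v)
                adj[v].add(next_id)
            next_id += 1
    return adj, born


adj, born = nonlinear_growth(stages=8)
for s in sorted(set(born.values())):
    print(s, round(mean(len(adj[n]) for n in adj if born[n] == s), 2))
```

Printing mean degree by birth stage gives a quick way to check whether, under this particular schedule, early-born nodes accumulate more links than late ones; the published model should be consulted for its actual growth function and attachment rules.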
Affiliation(s)
- Roman Bauer
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Interdisciplinary Computing and Complex BioSystems Research Group (ICOS), School of Computing Science, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Marcus Kaiser
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Interdisciplinary Computing and Complex BioSystems Research Group (ICOS), School of Computing Science, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
13. Shui Y, Cho YR. Alignment of PPI Networks Using Semantic Similarity for Conserved Protein Complex Prediction. IEEE Trans Nanobioscience 2017; 15:380-389. [PMID: 28113907 DOI: 10.1109/tnb.2016.2555802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Network alignment is a computational technique to identify topological similarity of graph data by mapping link patterns. In bioinformatics, network alignment algorithms have been applied to protein-protein interaction (PPI) networks to discover evolutionarily conserved substructures at the system level. In particular, local network alignment of PPI networks searches for conserved functional components between species and predicts unknown protein complexes and signaling pathways. In this article, we present a novel approach of local network alignment by semantic mapping. While most previous methods find protein matches between species by sequence homology, our approach uses semantic similarity. Given Gene Ontology (GO) and its annotation data, we estimate functional closeness between two proteins by measuring their semantic similarity. We adopted a new semantic similarity measure, simVICD, which has the best performance for PPI validation and functional match. We tested alignment between the PPI networks of well-studied yeast protein complexes and the genome-wide PPI network of human in order to predict human protein complexes. The experimental results demonstrate that our approach has higher accuracy in protein complex prediction than graph clustering algorithms, and higher efficiency than previous network alignment algorithms.
14. Patra B, Kon Y, Yadav G, Sevold AW, Frumkin JP, Vallabhajosyula RR, Hintze A, Østman B, Schossau J, Bhan A, Marzolf B, Tamashiro JK, Kaur A, Baliga NS, Grayhack EJ, Adami C, Galas DJ, Raval A, Phizicky EM, Ray A. A genome wide dosage suppressor network reveals genomic robustness. Nucleic Acids Res 2016; 45:255-270. [PMID: 27899637 PMCID: PMC5224485 DOI: 10.1093/nar/gkw1148] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/29/2016] [Revised: 10/17/2016] [Accepted: 11/07/2016] [Indexed: 01/17/2023]
Abstract
Genomic robustness is the extent to which an organism has evolved to withstand the effects of deleterious mutations. We explored the extent of genomic robustness in budding yeast by genome-wide dosage suppressor analysis of 53 conditional lethal mutations in cell division cycle and RNA synthesis-related genes, revealing 660 suppressor interactions, of which 642 are novel. This collection has several distinctive features, including high co-occurrence of mutant-suppressor pairs within protein modules, highly correlated functions between the pairs, and higher diversity of functions among the co-suppressors than previously observed. Dosage suppression of essential genes encoding RNA polymerase subunits and the chromosome cohesion complex suggests a surprising degree of functional plasticity of macromolecular complexes, and the existence of numerous degenerate pathways for circumventing the effects of potentially lethal mutations. These results imply that organisms and cancer are likely able to exploit the genomic robustness properties, due to the persistence of cryptic gene and pathway functions, to generate variation and adapt to selective pressures.
Affiliation(s)
- Biranchi Patra
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Yoshiko Kon
- Department of Biochemistry, University of Rochester School of Medicine, Rochester, NY 14627, USA
- Gitanjali Yadav
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA; National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi 110067, India
- Anthony W Sevold
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Jesse P Frumkin
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Arend Hintze
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Bjørn Østman
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Jory Schossau
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Ashish Bhan
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Bruz Marzolf
- Institute for Systems Biology, 1441 N 34th St, Seattle, WA 98103, USA
- Amardeep Kaur
- Institute for Systems Biology, 1441 N 34th St, Seattle, WA 98103, USA
- Nitin S Baliga
- Institute for Systems Biology, 1441 N 34th St, Seattle, WA 98103, USA
- Elizabeth J Grayhack
- Department of Biochemistry, University of Rochester School of Medicine, Rochester, NY 14627, USA
- Christoph Adami
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- David J Galas
- Institute for Systems Biology, 1441 N 34th St, Seattle, WA 98103, USA
- Alpan Raval
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA; Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711, USA
- Eric M Phizicky
- Department of Biochemistry, University of Rochester School of Medicine, Rochester, NY 14627, USA
- Animesh Ray
- Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA; Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
15. Hashemifar S, Huang Q, Xu J. Joint Alignment of Multiple Protein–Protein Interaction Networks via Convex Optimization. J Comput Biol 2016; 23:903-911. [DOI: 10.1089/cmb.2016.0025] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Affiliation(s)
- Qixing Huang
- Toyota Technological Institute at Chicago, Chicago, Illinois
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
16. Elmsallati A, Clark C, Kalita J. Global Alignment of Protein-Protein Interaction Networks: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:689-705. [PMID: 26336140 DOI: 10.1109/tcbb.2015.2474391] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 06/05/2023]
Abstract
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented focusing on datasets that have been used recently.
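One of the topological evaluation metrics commonly reported for global aligners is the symmetric substructure score (S3). It divides the number of conserved edges by the number of edges in the union of the first network's edge set and the edges induced among the aligned nodes of the second network, so that mapping a sparse region onto a dense one is penalized. A minimal Python sketch, assuming simple undirected graphs and a mapping defined on every node that appears in the first network's edge list; the example is contrived so that every edge of the first network is conserved while S3 is still below 1.

```python
def s3_score(edges1, edges2, mapping):
    """Symmetric substructure score:
    conserved / (|E1| + |edges of G2 induced on mapped nodes| - conserved).
    """
    e2 = {frozenset(e) for e in edges2}
    image = set(mapping.values())
    conserved = sum(
        1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in e2
    )
    # Edges of G2 whose endpoints both lie in the image of the mapping.
    induced = sum(1 for e in e2 if e <= image)
    denominator = len(edges1) + induced - conserved
    if denominator == 0:
        raise ValueError("no edges on either side of the alignment")
    return conserved / denominator


path = [("a", "b"), ("b", "c")]                   # G1: a path on three nodes
triangle = [("x", "y"), ("y", "z"), ("z", "x")]   # G2: a triangle
f = {"a": "x", "b": "y", "c": "z"}
print(s3_score(path, triangle, f))  # 2 / (2 + 3 - 2)
```

Both path edges are conserved, but the edge between z and x in G2 has no counterpart in G1, which lowers the score.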
17.
Abstract
Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene’s age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use 13 orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at geneages.org so that researchers can propagate this uncertainty through their analyses.
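To make "consensus gene age with attendant error" concrete, the Python sketch below summarizes, for one gene, the age calls produced by several orthology-inference algorithms. It is a simple stand-in (median plus two disagreement statistics) and not the procedure used to build the geneages.org datasets; age classes are assumed to be integers where 0 is the oldest node on the species tree, and the gene names are hypothetical. As the abstract itself cautions, such a simple consensus does not remove systematic error.

```python
from statistics import median


def consensus_age(calls):
    """Summarize one gene's age calls (ints, 0 = oldest age class) from several algorithms.

    Returns (median age call, max - min spread, share of calls differing from the median).
    """
    if not calls:
        raise ValueError("need at least one age call")
    consensus = median(calls)
    spread = max(calls) - min(calls)
    disagreement = sum(1 for c in calls if c != consensus) / len(calls)
    return consensus, spread, disagreement


# Hypothetical age calls from five algorithms for two genes.
calls_by_gene = {"gene_A": [2, 2, 3, 2, 5], "gene_B": [0, 0, 0, 0, 0]}
for gene, calls in calls_by_gene.items():
    print(gene, consensus_age(calls))
```

With an even number of calls, `statistics.median` averages the two middle values, so the consensus can fall between age classes; `statistics.median_low` is an alternative if an integer class is required.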
Affiliation(s)
- Benjamin J Liebeskind
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
- Center for Computational Biology and Bioinformatics, University of Texas at Austin
- Claire D McWhite
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
18
Liang C, Luo J, Song D. Network simulation reveals significant contribution of network motifs to the age-dependency of yeast protein-protein interaction networks. Mol Biosyst 2015; 10:2277-88. [PMID: 24964354 DOI: 10.1039/c4mb00230j] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Advances in proteomic technologies combined with sophisticated computing and modeling methods have generated an unprecedented amount of high-throughput data for system-scale analysis. As a result, the study of protein-protein interaction (PPI) networks has garnered much attention in recent years. One of the most fundamental problems in studying PPI networks is to understand how their architecture originated and evolved to its current state. By investigating how proteins of different ages are connected in the yeast PPI networks, one can infer how the ancient, primitive network expanded and evolved. Studies have shown that proteins are often connected to other proteins of a similar age, suggesting a strong age preference between interacting proteins. Although several theories have been proposed to explain this phenomenon, none of them has considered protein clusters as a contributing factor. Here we first investigate the age-dependency of proteins from the perspective of network motifs. Our analysis confirms that proteins of the same age group tend to form interacting network motifs; furthermore, proteins within motifs tend to lie within protein complexes, and the interactions among them account for much of the observed age preference in the yeast PPI networks. In light of these results, we describe a new modeling approach, based on "network motifs", whereby topologically connected protein clusters in the network are treated as single evolutionary units. Instead of modeling single proteins, our approach models the connections and evolutionary relationships of multiple related protein clusters or "network motifs" that are collectively integrated into an existing PPI network. Through simulation studies, we found that the "network motif" modeling approach captures yeast PPI network properties better than an approach that treats individual proteins as the simplest evolutionary units.
Our approach provides a fresh perspective on modeling the evolution of yeast PPI networks, specifically that PPI networks may have a much higher age-dependency of interaction density than had been previously envisioned.
Affiliation(s)
- Cheng Liang
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China.
19
Brion C, Pflieger D, Friedrich A, Schacherer J. Evolution of intraspecific transcriptomic landscapes in yeasts. Nucleic Acids Res 2015; 43:4558-68. [PMID: 25897111 PMCID: PMC4482089 DOI: 10.1093/nar/gkv363] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/20/2015] [Accepted: 04/02/2015] [Indexed: 01/13/2023]
Abstract
Variations in gene expression have been widely explored in order to obtain an accurate overview of the changes in regulatory networks that underlie phenotypic diversity. Numerous studies have characterized differences in genomic expression between large numbers of individuals of model organisms such as Saccharomyces cerevisiae. To survey the evolution of the transcriptomic landscape more broadly across species, we used RNAseq to measure whole-genome expression in a large collection of isolates of another yeast species, Lachancea kluyveri (formerly Saccharomyces kluyveri). Interestingly, this species diverged from the S. cerevisiae lineage before the ancestral whole-genome duplication in that lineage. Moreover, L. kluyveri harbors a chromosome-scale compositional heterogeneity due to a 1-Mb ancestral introgressed region, as well as a large set of unique unannotated genes. In this context, our comparative transcriptomic analysis clearly showed a link between gene evolutionary history and expression behavior. Indeed, genes that have been recently acquired or are under functional relaxation tend to be less transcribed, show higher intraspecific variation (plasticity), and are less involved in networks (connectivity). Moreover, applying this approach to L. kluyveri also highlighted specific regulatory network signatures in aerobic respiration, amino-acid biosynthesis and glycosylation, presumably due to its different lifestyle. Our data set sheds important light on the evolution of intraspecific transcriptomic variation across distant species.
Affiliation(s)
- Christian Brion
- Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- David Pflieger
- Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- Anne Friedrich
- Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- Joseph Schacherer
- Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
20
From local to global changes in proteins: a network view. Curr Opin Struct Biol 2015; 31:1-8. [DOI: 10.1016/j.sbi.2015.02.015] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 10/15/2014] [Revised: 02/15/2015] [Accepted: 02/26/2015] [Indexed: 02/01/2023]
21
Conant GC. Structure, Interaction, and Evolution: Reflections on the Natural History of Proteins. Evolutionary Biology: Biodiversification from Genotype to Phenotype 2015:187-201. [DOI: 10.1007/978-3-319-19932-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
22
Popadin K, Gutierrez-Arcelus M, Lappalainen T, Buil A, Steinberg J, Nikolaev S, Lukowski S, Bazykin G, Seplyarskiy V, Ioannidis P, Zdobnov E, Dermitzakis E, Antonarakis S. Gene age predicts the strength of purifying selection acting on gene expression variation in humans. Am J Hum Genet 2014; 95:660-74. [PMID: 25480033 DOI: 10.1016/j.ajhg.2014.11.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 07/10/2014] [Accepted: 11/10/2014] [Indexed: 10/24/2022]
Abstract
Gene expression levels can be subject to selection. We hypothesized that the age of gene origin is associated with expression constraints, given that it affects the level of gene integration into the functional cellular environment. By studying the genetic variation affecting gene expression levels (cis expression quantitative trait loci [cis-eQTLs]) and protein levels (cis protein QTLs [cis-pQTLs]), we determined that young, primate-specific genes are enriched in cis-eQTLs and cis-pQTLs. Compared to cis-eQTLs of old genes originating before the zebrafish divergence, cis-eQTLs of young genes have a higher effect size, are located closer to the transcription start site, are more significant, and tend to influence genes in multiple tissues and populations. These results suggest that the expression constraint of each gene increases throughout its lifespan. We also detected a positive correlation between expression constraints (approximated by cis-eQTL properties) and coding constraints (approximated by Ka/Ks) and observed that this correlation might be driven by gene age. To uncover factors associated with the increase in gene-age-related expression constraints, we demonstrated that gene connectivity, gene involvement in complex regulatory networks, gene haploinsufficiency, and the strength of posttranscriptional regulation increase with gene age. We also observed an increase in heritability of gene expression levels with age, implying a reduction of the environmental component. In summary, we show that gene age shapes key gene properties during evolution and is therefore an important component of genome function.
23
Dissecting the human protein-protein interaction network via phylogenetic decomposition. Sci Rep 2014; 4:7153. [PMID: 25412639 PMCID: PMC4239568 DOI: 10.1038/srep07153] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 08/14/2014] [Accepted: 11/04/2014] [Indexed: 12/18/2022]
Abstract
The protein-protein interaction (PPI) network offers a conceptual framework for better understanding the functional organization of the proteome. However, the network's complexity makes comprehensive analysis difficult. Here, we adopted a phylogenetic grouping method combined with force-directed graph simulation to decompose the human PPI network in a multi-dimensional manner. This network model enabled us to associate the network's topological properties with evolutionary and biological implications. First, we found that ancient proteins occupy the core of the network, whereas young proteins tend to reside on the periphery. Second, the presence of age homophily suggests that selection pressure may have acted on the duplication and divergence process during PPI network evolution. Lastly, functional analysis revealed that each age group shows highly specific enrichment of biological processes and pathway engagement, which could correspond to its evolutionary role in eukaryotic cells. Interestingly, the network landscape closely coincides with the subcellular localization of proteins. Together, these findings suggest the potential of such conceptual frameworks to mimic the true functional organization of a living cell.
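Age homophily of the kind mentioned above is often quantified by comparing the observed fraction of same-age interactions with what random pairing would give. The sketch below assumes an edge list and an age label per protein. Its expectation comes from pairing edge endpoints ("stubs") at random, which approximately preserves how many interactions each age group contributes. This is one common null model, not necessarily the one used in the study, and `age_homophily` is a hypothetical name.

```python
from collections import Counter


def age_homophily(edges, age):
    """Observed vs. expected fraction of same-age interactions.

    edges: list of undirected edges (u, v); age: dict node -> age label.
    The expectation pairs edge endpoints ("stubs") at random, so it keeps
    the number of interactions contributed by each age group fixed.
    """
    same = sum(1 for u, v in edges if age[u] == age[v])
    observed = same / len(edges)
    stubs = Counter()
    for u, v in edges:
        stubs[age[u]] += 1
        stubs[age[v]] += 1
    total = 2 * len(edges)
    expected = sum((n / total) ** 2 for n in stubs.values())
    return observed, expected


edges = [(1, 2), (3, 4), (1, 3)]
ages = {1: "old", 2: "old", 3: "young", 4: "young"}
print(age_homophily(edges, ages))  # observed 2/3 vs. expected 0.5
```

An observed value well above the expected one, as in this toy graph, is what "age homophily" refers to; a real analysis would also assess significance, for example by rewiring.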
24
Abstract
Ongoing improvements in computational biology research have generated massive Protein–Protein Interaction (PPI) datasets. The availability of PPI data for several organisms has in turn motivated computational methods for the measurement, analysis, modeling, comparison, clustering, and alignment of biological networks. However, exact network comparison is computationally intractable, and as a result several alternative methods have been used instead. We present a probabilistic approach for relating protein nodes that belong to different networks using the Chapman–Kolmogorov (CK) formula. We compared the CK formula with SMETANA, a method based on a semi-Markov random walk, and observed that CK outperforms SMETANA in efficiency, speed, space, and complexity. We modified the MATLAB source code of SMETANA in the light of the CK formula. Discriminant-Expectation Maximization (D-EM) estimates the parameters of protein network datasets and determines a linear transformation that simplifies the assumed probabilistic form of the data distributions and finds good features dynamically. Our implementation shows that D-EM performs satisfactorily in protein network alignment applications.
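The Chapman–Kolmogorov formula referred to above states that, for a Markov chain with one-step transition matrix P, multi-step transition probabilities compose as P^(m+n) = P^(m) P^(n), that is, p_ij^(m+n) = Σ_k p_ik^(m) p_kj^(n). The sketch below checks this identity on a simple random walk over a toy interaction graph, using exact fractions so equality holds exactly. How the cited work uses the formula for alignment is not shown, and the helper names are hypothetical.

```python
from fractions import Fraction


def transition_matrix(adj):
    """Random-walk transition matrix: move to a uniformly chosen neighbor.

    adj[i] lists the neighbors of node i (nodes are 0..n-1). A node with no
    neighbors gets an all-zero row, so the matrix is only stochastic when
    every node has at least one neighbor.
    """
    n = len(adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            P[i][j] = Fraction(1, len(nbrs))
    return P


def matmul(A, B):
    """Exact matrix product: (AB)[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def n_step(P, steps):
    """P raised to the given power, starting from the identity matrix."""
    n = len(P)
    R = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        R = matmul(R, P)
    return R


P = transition_matrix([[1], [0, 2], [1]])  # path graph 0 - 1 - 2
print(n_step(P, 2)[0])  # row 0 of P^2: [1/2, 0, 1/2]
# Chapman-Kolmogorov: the 3-step matrix equals the 1-step times the 2-step.
print(n_step(P, 3) == matmul(n_step(P, 1), n_step(P, 2)))  # True
```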
Affiliation(s)
- Md. Sarwar Kamal
- Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh
- Mohammad Ibrahim Khan
- Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh
25
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
26
Micale G, Pulvirenti A, Giugno R, Ferro A. GASOLINE: a Greedy And Stochastic algorithm for optimal Local multiple alignment of Interaction NEtworks. PLoS One 2014; 9:e98750. [PMID: 24911103 PMCID: PMC4049608 DOI: 10.1371/journal.pone.0098750] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 01/07/2014] [Accepted: 05/07/2014] [Indexed: 11/19/2022]
Abstract
The analysis of structure and dynamics of biological networks plays a central role in understanding the intrinsic complexity of biological systems. Biological networks have been considered a suitable formalism to extend evolutionary and comparative biology. In this paper we present GASOLINE, an algorithm for multiple local network alignment based on statistical iterative sampling in connection with a greedy strategy. GASOLINE overcomes the limits of current approaches by producing biologically significant alignments within a feasible running time, even for very large input instances. The method has been extensively tested on a database of real and synthetic biological networks. A comprehensive comparison with state-of-the-art algorithms clearly shows that GASOLINE yields the best results in terms of both reliability of alignments and running time on real biological networks, and comparable results in terms of alignment quality on synthetic networks. GASOLINE has been developed in Java, and is available, along with all the computed alignments, at the following URL: http://ferrolab.dmi.unict.it/gasoline/gasoline.html.
Affiliation(s)
- Giovanni Micale
- Department of Computer Science, University of Pisa, Pisa, Italy
- Alfredo Pulvirenti
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Rosalba Giugno
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Alfredo Ferro
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
27
Clark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics 2014; 30:2351-9. [PMID: 24794929 DOI: 10.1093/bioinformatics/btu307] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Indexed: 01/14/2023]
Abstract
MOTIVATION As biological inquiry produces ever more network data, such as protein-protein interaction networks, gene regulatory networks and metabolic networks, many algorithms have been proposed for the purpose of pairwise network alignment: finding a mapping from the nodes of one network to the nodes of another in such a way that the mapped nodes can be considered to correspond with respect to both their place in the network topology and their biological attributes. This technique is helpful in identifying previously undiscovered homologies between proteins of different species and revealing functionally similar subnetworks. In the past few years, a wealth of different aligners has been published, but few of them have been compared with one another, and no comprehensive review of these algorithms has yet appeared. RESULTS We present the problem of biological network alignment, provide a guide to existing alignment algorithms and comprehensively benchmark existing algorithms on both synthetic and real-world biological data, finding dramatic differences between existing algorithms in the quality of the alignments they produce. Additionally, we find that many of these tools are inconvenient to use in practice, and there remains a need for easy-to-use cross-platform tools for performing network alignment.
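To make the pairwise alignment problem concrete, the sketch below performs a brute-force search over injective node mappings that maximizes the number of conserved edges, a purely topological objective. It is only a toy illustration under stated assumptions. Real aligners also weigh biological attributes such as sequence similarity, and they rely on heuristics because exhaustive search grows factorially with network size. `best_alignment` is a hypothetical name.

```python
from itertools import permutations


def best_alignment(nodes1, edges1, nodes2, edges2):
    """Exhaustively find an injective map G1 -> G2 maximizing conserved edges.

    Only feasible for a handful of nodes: the search visits
    len(nodes2)! / (len(nodes2) - len(nodes1))! candidate mappings.
    """
    e2 = {frozenset(e) for e in edges2}
    best_map, best_score = None, -1
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        score = sum(1 for u, v in edges1
                    if frozenset((mapping[u], mapping[v])) in e2)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score


# A 3-node path aligned into a star: the path's middle node must map to the hub.
mapping, score = best_alignment(["a", "b", "c"], [("a", "b"), ("b", "c")],
                                ["x", "y", "z", "w"],
                                [("y", "x"), ("y", "z"), ("y", "w")])
print(mapping["b"], score)  # y 2
```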
Affiliation(s)
- Connor Clark
- Department of Computer Science, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA
- Jugal Kalita
- Department of Computer Science, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA
28
Model the evolution of protein interaction network assisted with protein age. J Theor Biol 2013; 333:10-7. [DOI: 10.1016/j.jtbi.2013.05.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/03/2013] [Revised: 04/24/2013] [Accepted: 05/07/2013] [Indexed: 11/21/2022]
29
Capra JA, Stolzer M, Durand D, Pollard KS. How old is my gene? Trends Genet 2013; 29:659-68. [PMID: 23915718 DOI: 10.1016/j.tig.2013.07.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 04/18/2013] [Revised: 06/13/2013] [Accepted: 07/03/2013] [Indexed: 11/26/2022]
Abstract
Gene functions, interactions, disease associations, and ecological distributions are all correlated with gene age. However, it is challenging to estimate the intricate series of evolutionary events leading to a modern-day gene and then to reduce this history to a single age estimate. Focusing on eukaryotic gene families, we introduce a framework that can be used to compare current strategies for quantifying gene age, discuss key differences between these methods, and highlight several common problems. We argue that genes with complex evolutionary histories do not have a single well-defined age. As a result, care must be taken to articulate the goals and assumptions of any analysis that uses gene age estimates. Recent algorithmic advances offer the promise of gene age estimates that are fast, accurate, and consistent across gene families. This will enable a shift to integrated genome-wide analyses of all events in gene evolutionary histories in the near future.
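A common strategy behind many single-number gene-age estimates is phylostratigraphy: a gene is assigned to the oldest clade in which a homolog is detected. The sketch below assumes homolog detection has already been done and that clades are listed from oldest to youngest. The clade and species lists are illustrative, and `phylostratum` is a hypothetical helper.

```python
def phylostratum(strata, homolog_species):
    """Assign a gene to the oldest phylostratum containing a detected homolog.

    strata: list of (clade_name, set_of_species) ordered from the oldest
    split to the youngest; the last entry should contain the focal species.
    homolog_species: set of species in which a homolog was detected.
    """
    for clade, species in strata:
        if species & homolog_species:
            return clade
    raise ValueError("no homolog detected in any stratum")


# Illustrative strata for a human focal gene (one representative species each).
strata = [("Eukaryota", {"A. thaliana"}),
          ("Opisthokonta", {"S. cerevisiae"}),
          ("Metazoa", {"D. melanogaster"}),
          ("Mammalia", {"M. musculus", "H. sapiens"})]
print(phylostratum(strata, {"H. sapiens", "D. melanogaster"}))  # Metazoa
print(phylostratum(strata, {"H. sapiens", "A. thaliana"}))      # Eukaryota
```

The second call shows how a single distant hit, whether a genuine homolog or a spurious match, moves the whole gene back to Eukaryota, which is the kind of sensitivity the review cautions about.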
Affiliation(s)
- John A Capra
- Center for Human Genetics Research and Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
30
Sahraeian SME, Yoon BJ. SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PLoS One 2013; 8:e67995. [PMID: 23874484 PMCID: PMC3710069 DOI: 10.1371/journal.pone.0067995] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 02/19/2013] [Accepted: 05/24/2013] [Indexed: 12/03/2022]
Abstract
In this paper we introduce an efficient algorithm for aligning multiple large-scale biological networks. In this scheme, we first compute a probabilistic similarity measure between nodes that belong to different networks using a semi-Markov random walk model. The estimated probabilities are further enhanced by incorporating local and cross-species network similarity information through two different types of probabilistic consistency transformations. The transformed alignment probabilities are used to predict the alignment of multiple networks based on a greedy approach. We demonstrate that the proposed algorithm, called SMETANA, outperforms many state-of-the-art network alignment techniques in terms of computational efficiency, alignment accuracy, and scalability. Our experiments show that SMETANA can align tens of genome-scale networks with thousands of nodes on a personal computer without difficulty. The source code of SMETANA can be downloaded from http://www.ece.tamu.edu/~bjyoon/SMETANA/.
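The final step described above turns alignment probabilities into a node mapping greedily. A generic greedy one-to-one matching between two networks is sketched below, assuming the probabilities are given as a dict keyed by node pairs. It is not SMETANA's exact multiple-network procedure, and `greedy_match` and the `min_prob` cutoff are hypothetical.

```python
def greedy_match(scores, min_prob=0.0):
    """Greedy one-to-one matching from pairwise alignment probabilities.

    scores: dict mapping (node_in_G1, node_in_G2) -> probability.
    Pairs are taken in decreasing order of probability, skipping any pair
    whose nodes are already matched or whose probability is below min_prob.
    """
    used_1, used_2, matching = set(), set(), {}
    for (u, v), p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if p < min_prob:
            break  # all remaining pairs score even lower
        if u not in used_1 and v not in used_2:
            matching[u] = v
            used_1.add(u)
            used_2.add(v)
    return matching


probs = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.85, ("b", "y"): 0.3}
print(greedy_match(probs))                # {'a': 'x', 'b': 'y'}
print(greedy_match(probs, min_prob=0.5))  # {'a': 'x'}
```

Greedy matching is fast but not globally optimal: here it commits to a-x first and is left with the weak b-y pair, whereas a maximum-weight assignment would weigh both options.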
Affiliation(s)
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A & M University, College Station, Texas, United States of America
31
Reece-Hoyes JS, Pons C, Diallo A, Mori A, Shrestha S, Kadreppa S, Nelson J, Diprima S, Dricot A, Lajoie BR, Ribeiro PSM, Weirauch MT, Hill DE, Hughes TR, Myers CL, Walhout AJM. Extensive rewiring and complex evolutionary dynamics in a C. elegans multiparameter transcription factor network. Mol Cell 2013; 51:116-27. [PMID: 23791784 DOI: 10.1016/j.molcel.2013.05.018] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 12/20/2012] [Revised: 03/28/2013] [Accepted: 05/15/2013] [Indexed: 10/26/2022]
Abstract
Gene duplication results in two identical paralogs that diverge through mutation, leading to loss or gain of interactions with other biomolecules. Here, we comprehensively characterize such network rewiring for C. elegans transcription factors (TFs) within and across four newly delineated molecular networks. Remarkably, we find that even highly similar TFs often have different interaction degrees and partners. In addition, we find that most TF families have a member that is highly connected in multiple networks. Further, different TF families have opposing correlations between network connectivity and phylogenetic age, suggesting that they are subject to different evolutionary pressures. Finally, TFs that have similar partners in one network generally do not in another, indicating a lack of pressure to retain cross-network similarity. Our multiparameter analyses provide unique insights into the evolutionary dynamics that shaped TF networks.
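Partner overlap between similar TFs, within and across networks, can be quantified in several ways. One simple choice, assumed here, is the Jaccard index of partner sets, computed separately in each network. The network labels and TF names below are hypothetical placeholders, not the networks delineated in the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two partner sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def partner_similarity(networks, tf1, tf2):
    """Per-network Jaccard similarity of the partners of two TFs.

    networks: dict mapping a network label to {tf: set of partners}.
    A TF absent from a network is treated as having no partners there.
    """
    return {
        name: jaccard(net.get(tf1, set()), net.get(tf2, set()))
        for name, net in networks.items()
    }


nets = {"net_A": {"tf1": {"g1", "g2", "g3"}, "tf2": {"g2", "g3", "g4"}},
        "net_B": {"tf1": {"p1"}, "tf2": {"p2"}}}
print(partner_similarity(nets, "tf1", "tf2"))  # {'net_A': 0.5, 'net_B': 0.0}
```

A pair that is similar in one network but not in another, as in this toy example, is the pattern the abstract describes as a lack of pressure to retain cross-network similarity.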
32
A kinetic model of the evolution of a protein interaction network. BMC Genomics 2013; 14:172. [PMID: 23497092 PMCID: PMC3751699 DOI: 10.1186/1471-2164-14-172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/22/2012] [Accepted: 03/08/2013] [Indexed: 11/10/2022]
Abstract
Background Known protein interaction networks have very particular properties. Old proteins tend to have more interactions than new ones. One of the best statistical representatives of this property is the node degree distribution (distribution of proteins having a given number of interactions). It has previously been shown that this distribution is very close to the sum of two distinct exponential components. In this paper, we asked: What are the possible mechanisms of evolution for such types of networks? To answer this question, we tested a kinetic model for simplified evolution of a protein interactome. Our proposed model considers the emergence of new genes and interactions and the loss of old ones. We assumed that there are generally two coexisting classes of proteins. Proteins constituting the first class are essential only for ecological adaptations and are easily lost when ecological conditions change. Proteins of the second class are essential for basic life processes and, hence, are always effectively protected against deletion. All proteins can transit between the above classes in both directions. We also assumed that the phenomenon of gene duplication is always related to ecological adaptation and that a new copy of a duplicated gene is not essential. According to this model, all proteins gain new interactions with a rate that preferentially increases with the number of interactions (the rich get richer). Proteins can also gain interactions because of duplication. Proteins lose their interactions both with and without the loss of partner genes. Results The proposed model reproduces the main properties of protein-protein interaction networks very well. The connectivity of the oldest part of the interaction network is densest, and the node degree distribution follows the sum of two shifted power-law functions, which is a theoretical generalization of the previous finding. 
The above distribution covers the wide range of node degrees very well, much better than a power law or a generalized power law supplemented with an exponential cut-off. The presented model also relates the total number of interactome links to the total number of interacting proteins. The theoretical results were compared with the interactomes of A. thaliana, B. taurus, C. elegans, D. melanogaster, E. coli, H. pylori, H. sapiens, M. musculus, R. norvegicus and S. cerevisiae. Conclusions Using these approaches, the kinetic parameters could be estimated. Finally, the model revealed the evolutionary kinetics of proteome formation, the phenomenon of protein differentiation and the process of gaining new interactions.
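To compare the two distribution shapes mentioned above, one natural way to write them is shown below, with P(k) the fraction of proteins having k interactions. The symbols (amplitudes A_i and B_i, shifts k_i, exponents γ_i, scales κ_i) are illustrative notation, and the paper's exact parametrization may differ.

```latex
% Earlier empirical description: sum of two exponential components
P(k) \approx B_1 e^{-k/\kappa_1} + B_2 e^{-k/\kappa_2}

% Generalization reported above: sum of two shifted power laws
P(k) \approx A_1 (k + k_1)^{-\gamma_1} + A_2 (k + k_2)^{-\gamma_2}
```

Each shifted term behaves like a pure power law k^{-γ_i} when k is much larger than k_i, and flattens when k is small compared with k_i. A sum of two such terms can therefore bend in two places, which is one way a single expression can cover a wide range of degrees.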
33
Ferreira RM, Rybarczyk-Filho JL, Dalmolin RJS, Castro MAA, Moreira JCF, Brunnet LG, de Almeida RMC. Preferential duplication of intermodular hub genes: an evolutionary signature in eukaryotes genome networks. PLoS One 2013; 8:e56579. [PMID: 23468868 PMCID: PMC3582557 DOI: 10.1371/journal.pone.0056579] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/25/2012] [Accepted: 01/14/2013] [Indexed: 12/31/2022]
Abstract
Whole-genome protein-protein association networks are not random, and their topological properties stem from genome evolution mechanisms. In fact, more connected but less clustered proteins are related to genes that, in general, have more paralogs than other genes, indicating frequent past gene duplication episodes. On the other hand, genes related to conserved biological functions have few or no paralogs and yield proteins that are highly connected and clustered. These general network characteristics must have an evolutionary explanation. Using data from the STRING database, we present experimental evidence that, beyond not being scale-free, the protein degree distributions of organisms show an increased probability of high-degree nodes. Based on this evidence, we propose a simulation model for genome evolution in which genes are either acquired de novo using a preferential attachment rule, or duplicated with a probability that grows linearly with gene degree and decreases with the gene's clustering coefficient. For the first time, a model yields results that simultaneously describe different topological distributions. This model also correctly predicts that producing protein-protein association networks with numbers of links and nodes in the range observed for eukaryotes requires 90% gene duplication and 10% de novo gene acquisition. This scenario implies a universal mechanism for genome evolution.
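The growth rule described above can be simulated in a few lines. The sketch below is an illustrative assumption rather than the authors' model. The duplication weight 1 + k(1 - C) is one simple form that grows linearly with degree k and decreases with clustering coefficient C. A copy inherits each of its parent's links with a hypothetical probability `keep`, and a de novo gene attaches to `m` existing genes chosen with probability proportional to degree + 1. The default `p_dup=0.9` mirrors the 90%/10% split reported above.

```python
import random


def clustering(adj, v):
    """Local clustering coefficient of node v in an undirected graph."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return links / (k * (k - 1) / 2)


def grow(n_nodes, p_dup=0.9, keep=0.5, m=2, seed=0):
    """Grow a network by duplication (prob. p_dup) or de novo attachment."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0, 2}, 2: {1}}  # small seed: a 3-node path
    while len(adj) < n_nodes:
        nodes = list(adj)
        new = len(adj)  # nodes are never removed, so ids stay 0..n-1
        if rng.random() < p_dup:
            # Duplication: weight grows linearly with degree k and falls with C.
            weights = [1 + len(adj[v]) * (1 - clustering(adj, v)) for v in nodes]
            parent = rng.choices(nodes, weights=weights)[0]
            # The copy inherits each parental link with probability `keep`.
            partners = {u for u in adj[parent] if rng.random() < keep}
        else:
            # De novo gene: preferential attachment (degree + 1) to m distinct nodes.
            weights = [len(adj[v]) + 1 for v in nodes]
            partners = set()
            while len(partners) < min(m, len(nodes)):
                partners.add(rng.choices(nodes, weights=weights)[0])
        adj[new] = partners
        for u in partners:
            adj[u].add(new)
    return adj


g = grow(200)
print(len(g), sum(len(s) for s in g.values()) // 2)  # node and edge counts
```

Fitting `keep`, `m` and the weight form to the STRING degree and clustering distributions would be the real modeling work; the sketch only shows how the two growth moves interact.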
Affiliation(s)
- Ricardo M. Ferreira
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Rodrigo J. S. Dalmolin
- Departamento de Bioquímica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Mauro A. A. Castro
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- National Institute of Science and Technology for Complex Systems, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- José C. F. Moreira
- Departamento de Bioquímica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Leonardo G. Brunnet
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Rita M. C. de Almeida
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- National Institute of Science and Technology for Complex Systems, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
34
Liu Z, Guo F, Zhang J, Wang J, Lu L, Li D, He F. Proteome-wide prediction of self-interacting proteins based on multiple properties. Mol Cell Proteomics 2013; 12:1689-700. [PMID: 23422585 DOI: 10.1074/mcp.m112.021790] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022]
Abstract
Self-interacting proteins, two or more copies of which can interact with each other, play important roles in cellular functions and in the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact contributes to, and is sometimes crucial for, elucidating its functions. Previous research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge of their overall properties is limited. Meanwhile, the two most common current high-throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, and bioinformatic methods for predicting self-interacting proteins are lacking. This study aims to systematically characterize and predict self-interacting proteins from an overall perspective. We find that, compared with other proteins, self-interacting proteins contain more domains (structural aspect); tend to be conserved and ancient (evolutionary aspect); are significantly enriched in enzyme genes, housekeeping genes, and drug targets (functional aspect); and tend to occupy important positions in PINs (topological aspect). Based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Under 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service, SLIPPER (SeLf-Interacting Protein PrEdictoR). Users may submit a list of proteins, and SLIPPER returns probability scores measuring how likely each is to be a self-interacting protein, along with various related annotation information.
This work helps us understand the role self-interacting proteins play in cellular functions from an overall perspective, and the constructed prediction model may contribute to the high throughput finding of self-interacting proteins and provide clues for elucidating their functions.
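The predictor described above combines several features with logistic regression. A dependency-free sketch of fitting and applying such a model by batch gradient descent follows. The three binary features and the six toy training examples are hypothetical stand-ins for the study's six real features, and the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a protein with features x self-interacts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical binary features: [feature_1, feature_2, feature_3]; 1 = self-interacting.
X = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_logistic(X, y)
print([round(predict_proba(w, b, x), 2) for x in X])
```

In practice a library implementation, such as scikit-learn's `LogisticRegression`, combined with k-fold cross-validation would be the more usual choice; the hand-written version only makes the scoring rule visible.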
Affiliation(s)
- Zhongyang Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 100850, China
35
Pérez-Bercoff Å, Hudson CM, Conant GC. A conserved mammalian protein interaction network. PLoS One 2013; 8:e52581. [PMID: 23320073 PMCID: PMC3539715 DOI: 10.1371/journal.pone.0052581] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 06/13/2012] [Accepted: 11/20/2012] [Indexed: 11/19/2022]
Abstract
Physical interactions between proteins mediate a variety of biological functions, including signal transduction, physical structuring of the cell and regulation. While extensive catalogs of such interactions are known from model organisms, their evolutionary histories are difficult to study given the lack of interaction data from phylogenetic outgroups. Using phylogenomic approaches, we infer an upper bound on the time of origin for a large set of human protein-protein interactions, showing that most such interactions appear relatively ancient, dating no later than the radiation of placental mammals. By analyzing paired alignments of orthologous and putatively interacting protein-coding genes from eight mammals, we find evidence for weak but significant co-evolution, as measured by relative selective constraint, between pairs of genes with interacting proteins. However, we find no strong evidence for shared instances of directional selection within an interacting pair. Finally, we use a network approach to show that the distribution of selective constraint across the protein interaction network is non-random, with a clear tendency for interacting proteins to share similar selective constraints. Collectively, the results suggest that, on the whole, protein interactions in mammals are under selective constraint, presumably due to their functional roles.
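The claim that interacting proteins share similar selective constraints can be tested by comparing an edge-level statistic against shuffled node labels. The sketch below assumes a scalar constraint value per protein (for example, a relative selective constraint score) and uses the mean absolute difference across edges. The statistic, the one-sided p-value with a +1 correction, and the function names are illustrative choices, not necessarily those of the study.

```python
import random


def edge_mean_diff(edges, value):
    """Mean absolute difference of node values across interacting pairs."""
    return sum(abs(value[u] - value[v]) for u, v in edges) / len(edges)


def permutation_test(edges, value, n_perm=1000, seed=0):
    """One-sided test: are interacting proteins more similar than chance?

    Node values are shuffled across nodes while the network is kept fixed;
    the p-value is the share of shuffles at least as similar as observed.
    """
    rng = random.Random(seed)
    observed = edge_mean_diff(edges, value)
    nodes = list(value)
    vals = [value[n] for n in nodes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(vals)
        if edge_mean_diff(edges, dict(zip(nodes, vals))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: two 5-protein cliques with contrasting constraint values.
vals = {i: (0.1 if i < 5 else 0.9) for i in range(10)}
edges = [(i, j) for g in (range(5), range(5, 10)) for i in g for j in g if i < j]
print(permutation_test(edges, vals))
```

Shuffling values over nodes preserves the network's topology, so any signal reflects how constraint is placed on the network rather than the network's shape.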
Affiliation(s)
- Åsa Pérez-Bercoff
- Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
- Corey M. Hudson
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Gavin C. Conant
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
36
Rito T, Deane CM, Reinert G. The importance of age and high degree, in protein-protein interaction networks. J Comput Biol 2012; 19:785-95. [PMID: 22697248 DOI: 10.1089/cmb.2012.0054] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
Abstract
Here we present an in-depth analysis of the protein age patterns found in the edge and triangle subgraphs of the yeast protein-protein interaction network (PIN). We assess their statistical significance both according to what would be expected by chance given the node frequencies found in the yeast PIN, and also, for the case of triangles, given the age frequencies observed in the currently available pairwise data. We find that pairwise interactions between Old proteins are over-represented even when controlling for high degree, and triangle interactions between Old proteins are over-represented even when controlling for pairwise interaction frequencies. There is evidence for negative selection of interactions between Middle-aged and Old proteins within triangles, despite pairwise Middle-Old interactions being common. Most triangles consist solely of vertices with high degree. Our findings point towards an architecture of the yeast PIN that is highly heterogeneous, having connected clumps which contain a large number of interacting Old proteins along with selective age-dependent interaction patterns. Supplementary Material is available online (www.liebertonline.com/cmb).
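Below is a minimal sketch of the triangle bookkeeping. It enumerates triangles, tallies them by the age classes of their three vertices, and compares the all-Old count with a naive expectation based only on node age frequencies. The helper names and the toy graph are hypothetical. Unlike the study, this null does not control for high degree or for pairwise interaction frequencies.

```python
from collections import Counter


def triangles(edges):
    """Return every triangle of an undirected graph as a frozenset of three nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:  # common neighbours close a triangle
            found.add(frozenset((u, v, w)))
    return found


def triangle_age_profile(edges, age):
    """Count triangles by the sorted tuple of their vertices' age classes."""
    return Counter(tuple(sorted(age[n] for n in tri)) for tri in triangles(edges))


def expected_uniform_age(n_triangles, age, label):
    """Expected number of triangles whose three vertices all have `label`, if each
    vertex's class were drawn independently at its network-wide node frequency."""
    p = Counter(age.values())[label] / len(age)
    return n_triangles * p ** 3


# Hypothetical toy network with Old / Middle / Young age classes.
age = {"a": "Old", "b": "Old", "c": "Old", "d": "Middle",
       "e": "Young", "f": "Young", "g": "Middle"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e"), ("f", "g")]
profile = triangle_age_profile(edges, age)
n_tri = sum(profile.values())
print(profile[("Old", "Old", "Old")], expected_uniform_age(n_tri, age, "Old"))
```

An observed all-Old count well above the expectation would point toward over-representation. A real analysis would replace the naive expectation with a null that conditions on degree, as the abstract describes.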
Affiliation(s)
- Tiago Rito
- Department of Statistics, University of Oxford, Oxford, United Kingdom.
37
Bottinelli A, Bassetti B, Lagomarsino MC, Gherardi M. Influence of homology and node age on the growth of protein-protein interaction networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2012; 86:041919. [PMID: 23214627 DOI: 10.1103/physreve.86.041919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2012] [Indexed: 06/01/2023]
Abstract
Proteins participating in a protein-protein interaction network can be grouped into homology classes following their common ancestry. Proteins added to the network correspond to genes added to the classes, so the dynamics of the two objects are intrinsically linked. Here we first introduce a statistical model describing the joint growth of the network and the partitioning of nodes into classes, which is studied through a combined mean-field and simulation approach. We then employ this unified framework to address the specific issue of the age dependence of protein interactions through the definition of three different node wiring or divergence schemes. A comparison with empirical data indicates that an age-dependent divergence move is necessary in order to reproduce the basic topological observables together with the age correlation between interacting nodes visible in empirical data. We also discuss the possibility of nontrivial joint partition and topology observables.
38
A network synthesis model for generating protein interaction network families. PLoS One 2012; 7:e41474. [PMID: 22912671 PMCID: PMC3418285 DOI: 10.1371/journal.pone.0041474] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 12/20/2011] [Accepted: 06/27/2012] [Indexed: 11/19/2022]
Abstract
In this work, we introduce a novel network synthesis model that can generate families of evolutionarily related synthetic protein-protein interaction (PPI) networks. Given an ancestral network, the proposed model generates the network family according to a hypothetical phylogenetic tree, where the descendant networks are obtained through duplication and divergence of their ancestors, followed by network growth using network evolution models. We demonstrate that this network synthesis model can effectively create synthetic networks whose internal and cross-network properties closely resemble those of real PPI networks. The proposed model can serve as an effective framework for generating comprehensive benchmark datasets that can be used for reliable performance assessment of comparative network analysis algorithms. Using this model, we constructed a large-scale network alignment benchmark, called NAPAbench, and evaluated the performance of several representative network alignment algorithms. Our analysis clearly shows the relative performance of the leading network algorithms, with their respective advantages and disadvantages. The algorithm and source code of the network synthesis model and the network alignment benchmark NAPAbench are publicly available at http://www.ece.tamu.edu/bjyoon/NAPAbench/.
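The core idea of deriving a network family from one ancestor can be sketched with a plain duplication-divergence step. In each step a copy of a random node keeps each inherited edge with probability `keep_prob`. This is a simplified illustration, not the NAPAbench generator. It assumes a star-shaped phylogeny with two children and omits the subsequent growth models the abstract mentions. All names are hypothetical.

```python
import random


def duplicate_diverge(ancestor, n_new, keep_prob, rng):
    """Grow a copy of `ancestor` ({int node: set of neighbours}) by n_new
    duplication-divergence steps; each inherited edge survives with keep_prob."""
    adj = {u: set(vs) for u, vs in ancestor.items()}  # leave the ancestor untouched
    next_id = max(adj) + 1
    for _ in range(n_new):
        anchor = rng.choice(sorted(adj))
        new, next_id = next_id, next_id + 1
        adj[new] = set()
        for w in adj[anchor]:
            if rng.random() < keep_prob:  # divergence: drop some inherited edges
                adj[new].add(w)
                adj[w].add(new)
    return adj


def synthesize_family(ancestor, n_new, keep_prob, seed=0):
    """Two descendants of one ancestor. The ancestral node IDs they share give a
    known ground-truth correspondence for testing network alignment; IDs above
    max(ancestor) are child-specific and do not correspond across children."""
    rng = random.Random(seed)
    child_a = duplicate_diverge(ancestor, n_new, keep_prob, rng)
    child_b = duplicate_diverge(ancestor, n_new, keep_prob, rng)
    return child_a, child_b, set(ancestor)


ancestor = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
a, b, truth = synthesize_family(ancestor, n_new=5, keep_prob=0.5, seed=1)
print(len(a), len(b), sorted(truth))
```

Knowing which nodes descend from the same ancestral node is what makes such synthetic families useful as an alignment benchmark. An aligner's output can be scored directly against `truth`.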
39
ProteinHistorian: tools for the comparative analysis of eukaryote protein origin. PLoS Comput Biol 2012; 8:e1002567. [PMID: 22761559 PMCID: PMC3386163 DOI: 10.1371/journal.pcbi.1002567] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 10/20/2011] [Accepted: 05/06/2012] [Indexed: 12/04/2022]
Abstract
The evolutionary history of a protein reflects the functional history of its ancestors. Recent phylogenetic studies identified distinct evolutionary signatures that characterize proteins involved in cancer, Mendelian disease, and different ontogenic stages. Despite the potential to yield insight into the cellular functions and interactions of proteins, such comparative phylogenetic analyses are rarely performed, because they require custom algorithms. We developed ProteinHistorian to make tools for performing analyses of protein origins widely available. Given a list of proteins of interest, ProteinHistorian estimates the phylogenetic age of each protein, quantifies enrichment for proteins of specific ages, and compares variation in protein age with other protein attributes. ProteinHistorian allows flexibility in the definition of protein age by including several algorithms for estimating ages from different databases of evolutionary relationships. We illustrate the use of ProteinHistorian with three example analyses. First, we demonstrate that proteins with high expression in human, compared to chimpanzee and rhesus macaque, are significantly younger than those with human-specific low expression. Next, we show that human proteins with annotated regulatory functions are significantly younger than proteins with catalytic functions. Finally, we compare protein length and age in many eukaryotic species and, as expected from previous studies, find a positive, though often weak, correlation between protein age and length. ProteinHistorian is available through a web server with an intuitive interface and as a set of command line tools; this allows biologists and bioinformaticians alike to integrate these approaches into their analysis pipelines. ProteinHistorian's modular, extensible design facilitates the integration of new datasets and algorithms. 
The ProteinHistorian web server, source code, and pre-computed ages for 32 eukaryotic genomes are freely available under the GNU public license at http://lighthouse.ucsf.edu/ProteinHistorian/.
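One common way to quantify enrichment for proteins of a specific age is an upper-tail hypergeometric test, sketched below. The abstract does not say whether ProteinHistorian uses exactly this test. The function name and toy data are hypothetical, and the query is assumed to be a list of distinct proteins drawn from the background.

```python
from math import comb


def age_enrichment(ages, query, age_class):
    """Over-representation of one age class in a query list relative to all
    proteins in `ages`: returns (observed count, upper-tail hypergeometric p)."""
    N = len(ages)                                        # proteins in the background
    K = sum(1 for a in ages.values() if a == age_class)  # background proteins of that age
    n = len(query)                                       # proteins in the query
    k = sum(1 for p in query if ages[p] == age_class)    # query proteins of that age
    # P(X >= k) for X ~ Hypergeometric(N, K, n), summed exactly with integers
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical background: 10 proteins, 4 of them "young".
ages = {f"P{i}": ("young" if i < 4 else "old") for i in range(10)}
print(age_enrichment(ages, ["P0", "P1", "P2"], "young"))
```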
40
Fokkens L, Hogeweg P, Snel B. Gene duplications contribute to the overrepresentation of interactions between proteins of a similar age. BMC Evol Biol 2012; 12:99. [PMID: 22732003 PMCID: PMC3457867 DOI: 10.1186/1471-2148-12-99] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/11/2012] [Accepted: 06/07/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The study of biological networks and how they have evolved is fundamental to our understanding of the cell. By investigating how proteins of different ages are connected in the protein interaction network, one can infer how that network has expanded in evolution, without the need for explicit reconstruction of ancestral networks. Studies that implement this approach show that proteins are often connected to proteins of a similar age, suggesting a simultaneous emergence of interacting proteins. There are several theories explaining this phenomenon, but despite the importance of gene duplication in genome evolution, none consider protein family dynamics as a contributing factor. RESULTS In an S. cerevisiae protein interaction network we investigate to what extent edges that arise from duplication events contribute to the observed tendency to interact with proteins of a similar age. We find that part of this tendency is explained by interactions between paralogs. Age is usually defined on the level of protein families, rather than individual proteins, hence paralogs have the same age. The major contribution, however, is from interaction partners that are shared between paralogs. These interactions have most likely been conserved after a duplication event. To investigate to what extent a nearly neutral process of network growth can explain these results, we adjust a well-studied network growth model to incorporate protein families. Our model shows that the number of edges between paralogs can be amplified by subsequent duplication events, thus explaining the overrepresentation of interparalog edges in the data. The fact that interaction partners shared by paralogs are often of the same age as the paralogs does not arise naturally from our model and needs further investigation.
CONCLUSION We amend previous theories that explain why proteins of a similar age prefer to interact by demonstrating that this observation can be partially explained by gene duplication events. There is an ongoing debate on whether the protein interaction network is predominantly shaped by duplication and subfunctionalization or whether network rewiring is most important. Our analyses of S. cerevisiae protein interaction networks demonstrate that duplications have influenced at least one property of the protein interaction network: how proteins of different ages are connected.
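The amplification argument can be illustrated with a toy duplication model in which each copy inherits its anchor's family label. This is an assumed, simplified model, not the authors' adjusted version of their growth model. Once a paralog edge exists, duplicating one of its endpoints together with its edges copies that edge to the new paralog. Later duplications therefore multiply interparalog edges. Parameter and function names are hypothetical.

```python
import random


def grow_with_families(n_steps, keep_prob, link_prob, seed=0):
    """Duplication growth in which every copy inherits its anchor's family label.

    Each inherited edge is kept with keep_prob; with link_prob the copy is also
    wired to its anchor, creating an edge between two paralogs.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed graph: one edge between two distinct families
    family = {0: 0, 1: 1}
    for new in range(2, 2 + n_steps):
        anchor = rng.randrange(new)  # nodes 0 .. new-1 already exist
        adj[new] = set()
        family[new] = family[anchor]
        for w in list(adj[anchor]):
            if rng.random() < keep_prob:  # inherit the anchor's interaction partners
                adj[new].add(w)
                adj[w].add(new)
        if rng.random() < link_prob:  # paralog-paralog edge
            adj[new].add(anchor)
            adj[anchor].add(new)
    return adj, family


def interparalog_edges(adj, family):
    """Number of edges whose two endpoints belong to the same family."""
    return sum(1 for u in adj for v in adj[u] if u < v and family[u] == family[v])


for link_prob in (0.0, 0.1):
    adj, fam = grow_with_families(200, keep_prob=0.5, link_prob=link_prob)
    print(link_prob, interparalog_edges(adj, fam))
```

With `link_prob=0.0` every edge joins the two founding families, so the count stays at zero. Any paralog link that is created can later be copied to further family members.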
Affiliation(s)
- Like Fokkens
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Padualaan 8, 3584CH, Utrecht, The Netherlands
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Padualaan 8, 3584CH, Utrecht, The Netherlands
- Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Padualaan 8, 3584CH, Utrecht, The Netherlands
- Netherlands Consortium for Systems Biology (NCSB), c/o NISB Bureau, University of Amsterdam, Science Park 904, 1098XH, Amsterdam, The Netherlands
41
Zhao Y, Mooney SD. Functional organization and its implication in evolution of the human protein-protein interaction network. BMC Genomics 2012; 13:150. [PMID: 22530615 PMCID: PMC3375200 DOI: 10.1186/1471-2164-13-150] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/14/2011] [Accepted: 04/24/2012] [Indexed: 01/27/2023]
Abstract
BACKGROUND Based on the distinguishing properties of protein-protein interaction networks such as power-law degree distribution and modularity structure, several stochastic models for the evolution of these networks have been proposed, motivated by the idea that a validated model should reproduce similar topological properties of the empirical network. However, being able to capture topological properties does not necessarily mean that a model correctly reproduces how networks emerge and evolve. More importantly, there is already evidence suggesting functional organization and significance of these networks. The current stochastic models of evolution, however, grow the network without consideration for biological function and natural selection. RESULTS To test whether protein interaction networks are functionally organized and how this organization affects their evolution, we analyzed their evolution at both the topological and functional level. We find that the human network is functionally organized, and its function evolves with the topological properties of the network. Our analysis suggests that function most likely affects local modularity of the network. Consistently, we further found that the topological unit is also the functional unit of the network. CONCLUSION We have demonstrated functional organization of a protein interaction network. Given our observations, we suggest that its significance should not be overlooked when studying network evolution.
Affiliation(s)
- Yiqiang Zhao
- Buck Institute for Research on Aging, Novato, California, USA
42
Emmert-Streib F. Limitations of gene duplication models: evolution of modules in protein interaction networks. PLoS One 2012; 7:e35531. [PMID: 22530042 PMCID: PMC3329483 DOI: 10.1371/journal.pone.0035531] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/02/2011] [Accepted: 03/18/2012] [Indexed: 01/05/2023]
Abstract
It has been generally acknowledged that the module structure of protein interaction networks plays a crucial role with respect to the functional understanding of these networks. In this paper, we study evolutionary aspects of the module structure of protein interaction networks, which forms a mesoscopic level of description with respect to the architectural principles of networks. The purpose of this paper is to investigate limitations of well known gene duplication models by showing that these models are lacking crucial structural features present in protein interaction networks on a mesoscopic scale. This observation reveals our incomplete understanding of the structural evolution of protein networks on the module level.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom.
43
Chae L, Lee I, Shin J, Rhee SY. Towards understanding how molecular networks evolve in plants. CURRENT OPINION IN PLANT BIOLOGY 2012; 15:177-84. [PMID: 22280840 DOI: 10.1016/j.pbi.2012.01.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 10/14/2011] [Revised: 12/20/2011] [Accepted: 01/05/2012] [Indexed: 05/02/2023]
Abstract
Residing beneath the phenotypic landscape of a plant are intricate and dynamic networks of genes and proteins. As evolution operates on phenotypes, we expect its forces to shape these underlying molecular networks in some way. In this review, we discuss progress being made to elucidate the nature of these forces and their impact on the composition and structure of molecular networks. We also outline current limitations and open questions facing the broader field of plant network analysis.
Affiliation(s)
- Lee Chae
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA.
44
Sun MGF, Sikora M, Costanzo M, Boone C, Kim PM. Network evolution: rewiring and signatures of conservation in signaling. PLoS Comput Biol 2012; 8:e1002411. [PMID: 22438796 PMCID: PMC3305342 DOI: 10.1371/journal.pcbi.1002411] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 06/13/2011] [Accepted: 01/14/2012] [Indexed: 01/09/2023]
Abstract
The analysis of network evolution has been hampered by limited availability of protein interaction data for different organisms. In this study, we investigate evolutionary mechanisms in Src Homology 3 (SH3) domain and kinase interaction networks using high-resolution specificity profiles. We constructed and examined networks for 23 fungal species ranging from Saccharomyces cerevisiae to Schizosaccharomyces pombe. We quantify rates of different rewiring mechanisms and show that interaction change through binding site evolution is faster than through gene gain or loss. We found that SH3 interactions evolve swiftly, at rates similar to those found in phosphoregulation evolution. Importantly, we show that interaction changes are sufficiently rapid to exhibit saturation phenomena at the observed timescales. Finally, focusing on the SH3 interaction network, we observe extensive clustering of binding sites on target proteins by SH3 domains and a strong correlation between the number of domains that bind a target protein (target in-degree) and interaction conservation. The relationship between in-degree and interaction conservation is driven by two different effects: the number of clusters that correspond to interaction interfaces, and the number of domains that bind to each cluster. These lead to sequence-specific conservation, which in turn results in interaction conservation. In summary, we uncover several network evolution mechanisms likely to generalize across peptide recognition modules. Protein interaction networks control virtually all cellular processes. The rules governing their evolution have remained elusive, as comprehensive protein interaction data is available for only a small number of very distant species, making evolutionary network studies difficult. Here we attempt to overcome this limitation by computationally constructing protein interaction networks for 23 relatively tightly spaced yeast species.
We focus on networks consisting of kinase and peptide binding domain interactions, which play central roles in signaling pathways. These networks enable us to investigate evolutionary network mechanisms. We are able, for the first time, to accurately quantify the contribution of different rewiring mechanisms. Interaction change appears to be mainly accomplished through binding site evolution rather than through gene gain or loss. This is in contrast to other evolutionary processes, where gene duplication or deletion is a major driving factor. Moreover, our analysis reveals that interaction changes are very fast – fast enough that the number of changes saturates, i.e., the actual rate of change has been strongly underestimated in previous studies. Our analysis also reveals different mechanisms by which certain interactions are conserved throughout evolution. Our results likely transfer to other species and networks, and will benefit future evolutionary studies of signaling pathways.
Affiliation(s)
- Mark G. F. Sun
- Department of Computer Science, University of Toronto, Toronto, Canada
- Martin Sikora
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Institut de Biologia Evolutiva (UPF-CSIC), CEXS-UPF-PRBB, Barcelona, Spain
- Michael Costanzo
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Charles Boone
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Philip M. Kim
- Department of Computer Science, University of Toronto, Toronto, Canada
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
45
Wang F, Liu M, Song B, Li D, Pei H, Guo Y, Huang J, Zhang D. Prediction and characterization of protein-protein interaction networks in swine. Proteome Sci 2012; 10:2. [PMID: 22230699 PMCID: PMC3306829 DOI: 10.1186/1477-5956-10-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/04/2011] [Accepted: 01/10/2012] [Indexed: 11/13/2022]
Abstract
Background Studying the large-scale protein-protein interaction (PPI) network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into understanding their biological processes. Results We used three methods, Interolog-based prediction of porcine PPI network, domain-motif interactions from structural topology-based prediction of porcine PPI network and motif-motif interactions from structural topology-based prediction of porcine PPI network, to predict porcine protein interactions among 25,767 porcine proteins. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in the iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in the iPfam database. For GO annotations, 52.68%, 75.54%, and 27.20% of the predicted PPIs, respectively, shared GO terms, whereas none of the 10,000 randomized networks reached these proportions. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively. Conclusion The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at (http://pppid.biositemap.com/).
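The comparison against randomized networks amounts to an empirical null distribution for the fraction of interactions whose partners share an annotation. The abstract does not say how the study's randomized networks were built. The sketch below therefore assumes a simple null: the same proteins and number of edges, wired uniformly at random without preserving degrees. The term labels and function names are placeholders.

```python
import random


def fraction_sharing(edges, terms):
    """Fraction of interactions whose two proteins share at least one annotation term."""
    return sum(1 for u, v in edges if terms[u] & terms[v]) / len(edges)


def random_network_fractions(edges, terms, n_random=1000, seed=0):
    """Sharing fractions for random networks with the same proteins and edge count.

    Null model (an assumption of this sketch): edges placed uniformly at random
    between distinct annotated proteins, without preserving node degrees.
    """
    rng = random.Random(seed)
    proteins = sorted(terms)
    m = len({frozenset(e) for e in edges})
    if m > len(proteins) * (len(proteins) - 1) // 2:
        raise ValueError("more edges than distinct protein pairs")
    fractions = []
    for _ in range(n_random):
        rand_edges = set()
        while len(rand_edges) < m:
            rand_edges.add(frozenset(rng.sample(proteins, 2)))
        fractions.append(fraction_sharing([tuple(e) for e in rand_edges], terms))
    return fractions


# Hypothetical annotations and predicted interactions.
terms = {"p1": {"termA"}, "p2": {"termA"}, "p3": {"termA"},
         "p4": {"termB"}, "p5": {"termB"}, "p6": {"termB"}}
predicted = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
observed = fraction_sharing(predicted, terms)
null = random_network_fractions(predicted, terms, n_random=1000)
reached = sum(f >= observed for f in null)
print(observed, reached, len(null))
```

The count of random networks that reach the observed fraction plays the role of the "0 out of 10,000" statement in the abstract. With realistic network sizes a degree-preserving null is usually preferred.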
Affiliation(s)
- Fen Wang
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China.
46
Hogeweg P. Toward a theory of multilevel evolution: long-term information integration shapes the mutational landscape and enhances evolvability. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 751:195-224. [PMID: 22821460 DOI: 10.1007/978-1-4614-3567-9_10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
Most of evolutionary theory has abstracted away from how information is coded in the genome and how this information is transformed into traits on which selection takes place. While in the earliest stages of biological evolution, in the RNA world, the mapping from the genotype into function was largely predefined by the physical-chemical properties of the evolving entities (RNA replicators, e.g. from sequence to folded structure and catalytic sites), in present-day organisms, the mapping itself is the result of evolution. I will review results of several in silico evolutionary studies which examine the consequences of evolving the genetic coding, and the ways this information is transformed, while adapting to prevailing environments. Such multilevel evolution leads to long-term information integration. Through genome, network, and dynamical structuring, the occurrence and/or effect of random mutations becomes nonrandom, and facilitates rapid adaptation. This is what does happen in the in silico experiments. Is it also what did happen in biological evolution? I will discuss some data that suggest that it did. In any case, these results provide us with novel search images to tackle the wealth of biological data.
Affiliation(s)
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, Utrecht, The Netherlands.
47
Sun MGF, Kim PM. Evolution of biological interaction networks: from models to real data. Genome Biol 2011; 12:235. [PMID: 22204388 PMCID: PMC3334609 DOI: 10.1186/gb-2011-12-12-235] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 08/15/2011] [Accepted: 12/12/2011] [Indexed: 01/19/2023]
Abstract
We are beginning to uncover common mechanisms leading to the evolution of biological networks. The driving force behind these advances is the increasing availability of comparative data in several species.
Affiliation(s)
- Mark G F Sun
- Department of Computer Science, University of Toronto, 160 College St, Toronto, Ontario, Canada
48
Ali W, Deane C, Reinert G. Protein Interaction Networks and Their Statistical Analysis. HANDBOOK OF STATISTICAL SYSTEMS BIOLOGY 2011:200-234. [DOI: 10.1002/9781119970606.ch10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
49
Liu Z, Liu Q, Sun H, Hou L, Guo H, Zhu Y, Li D, He F. Evidence for the additions of clustered interacting nodes during the evolution of protein interaction networks from network motifs. BMC Evol Biol 2011; 11:133. [PMID: 21595981 PMCID: PMC3128043 DOI: 10.1186/1471-2148-11-133] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/20/2010] [Accepted: 05/20/2011] [Indexed: 11/22/2022]
Abstract
Background High-throughput screens have revealed large-scale protein interaction networks defining most cellular functions. How the proteins were added to the protein interaction network during its growth is a basic and important issue. Network motifs represent the simplest building blocks of cellular machines and are of biological significance. Results Here we study the evolution of protein interaction networks from the perspective of network motifs. We find that in current protein interaction networks, proteins of the same age class tend to form motifs and such co-origins of motif constituents are affected by their topologies and biological functions. Further, we find that the proteins within motifs whose constituents are of the same age class tend to be densely interconnected, co-evolve and share the same biological functions, and these motifs tend to be within protein complexes. Conclusions Our findings provide novel evidence for the hypothesis of the additions of clustered interacting nodes and point out network motifs, especially the motifs with the dense topology and specific function may play important roles during this process. Our results suggest functional constraints may be the underlying driving force for such additions of clustered interacting nodes.
Affiliation(s)
- Zhongyang Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing, China
50
Navlakha S, Kingsford C. Network archaeology: uncovering ancient networks from present-day interactions. PLoS Comput Biol 2011; 7:e1001119. [PMID: 21533211 PMCID: PMC3077358 DOI: 10.1371/journal.pcbi.1001119] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 09/01/2010] [Accepted: 03/10/2011] [Indexed: 11/25/2022]
Abstract
What proteins interacted in a long-extinct ancestor of yeast? How have different members of a protein complex assembled together over time? Our ability to answer such questions has been limited by the unavailability of ancestral protein-protein interaction (PPI) networks. To overcome this limitation, we propose several novel algorithms to reconstruct the growth history of a present-day network. Our likelihood-based method finds a probable previous state of the graph by applying an assumed growth model backwards in time. This approach retains node identities so that the history of individual nodes can be tracked. Using this methodology, we estimate protein ages in the yeast PPI network that are in good agreement with sequence-based estimates of age and with structural features of protein complexes. Further, by comparing the quality of the inferred histories for several different growth models (duplication-mutation with complementarity, forest fire, and preferential attachment), we provide additional evidence that a duplication-based model captures many features of PPI network growth better than models designed to mimic social network growth. From the reconstructed history, we model the arrival time of extant and ancestral interactions and predict that complexes have significantly re-wired over time and that new edges tend to form within existing complexes. We also hypothesize a distribution of per-protein duplication rates, track the change of the network's clustering coefficient, and predict paralogous relationships between extant proteins that are likely to be complementary to the relationships inferred using sequence alone. Finally, we infer plausible parameters for the model, thereby predicting the relative probability of various evolutionary events. The success of these algorithms indicates that parts of the history of the yeast PPI are encoded in its present-day form. 
Many questions about present-day interaction networks could be answered by tracking how the network changed over time. We present a suite of algorithms to uncover an approximate node-by-node and edge-by-edge history of changes of a network when given only a present-day network and a plausible growth model by which it evolved. Our approach tracks the extant network backwards in time by finding high-likelihood previous configurations. Using topology alone, we show we can estimate protein ages and can identify anchor nodes from which proteins have duplicated. Our reconstructed histories also allow us to study how topological properties of the network have changed over time and how interactions and modules may have evolved. Further, we provide another line of evidence indicating that major features of the evolution of the yeast PPI are best captured by a duplication-based model. The study of inferred ancient networks is a novel application of dynamic network analysis that can unveil the evolutionary principles that drive cellular mechanisms. The algorithms presented here will likely also be useful for investigating other ancient, unavailable networks.
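The forward model that network archaeology runs in reverse can be sketched as a single duplication-mutation with complementarity (DMC) step. The step copies a random anchor's edges to a new node, drops each shared edge from one of the two copies with probability `q_mod`, and links anchor and copy with probability `q_con`. This follows the usual description of DMC, as an assumption. The backward direction is shown only as a simplified per-step score used greedily. That score is a hypothetical illustration, not the authors' likelihood, and because it is symmetric it cannot tell which node of the pair was the anchor.

```python
import random
from itertools import combinations


def dmc_step(adj, q_mod, q_con, rng):
    """Apply one forward DMC step in place to {int node: set of neighbours}."""
    anchor = rng.choice(sorted(adj))
    new = max(adj) + 1
    adj[new] = set(adj[anchor])  # duplication: copy every edge of the anchor
    for w in list(adj[anchor]):
        adj[w].add(new)
        if rng.random() < q_mod:  # mutation: drop the edge to w from one copy
            lost = rng.choice((anchor, new))
            adj[lost].discard(w)
            adj[w].discard(lost)
    if rng.random() < q_con:  # complementarity: link anchor and copy
        adj[anchor].add(new)
        adj[new].add(anchor)
    return anchor, new


def step_likelihood(adj, u, v, q_mod, q_con):
    """Simplified probability that one of u, v is a DMC copy of the other made in
    the most recent step (ignores the anchor-choice factor)."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    p = q_con if v in adj[u] else 1 - q_con
    for w in nu | nv:
        # shared neighbour: edge survived; private neighbour: one copy lost it
        p *= (1 - q_mod) if (w in nu and w in nv) else q_mod / 2
    return p


def most_likely_last_duplication(adj, q_mod, q_con):
    """Greedy backward step: the node pair that best explains one DMC duplication."""
    return max(combinations(sorted(adj), 2),
               key=lambda pair: step_likelihood(adj, pair[0], pair[1], q_mod, q_con))


g = {0: {1}, 1: {0, 2}, 2: {1}}
print(dmc_step(g, q_mod=0.3, q_con=0.1, rng=random.Random(1)), g)
print(most_likely_last_duplication({0: {1, 2}, 1: {0}, 2: {0}}, 0.1, 0.1))
```

Repeatedly merging the highest-scoring pair and recording the order would give a crude reconstructed history. The published method is considerably more careful about likelihoods and ties.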
Affiliation(s)
- Saket Navlakha
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Carl Kingsford
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America