1
Gondhalekar R, Kempes CP, McGlynn SE. Scaling of Protein Function across the Tree of Life. Genome Biol Evol 2023; 15:evad214. [PMID: 38007693 PMCID: PMC10715193 DOI: 10.1093/gbe/evad214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 11/07/2023] [Accepted: 11/12/2023] [Indexed: 11/28/2023] Open
Abstract
Scaling laws are a powerful way to compare genomes because they put all organisms onto a single curve and reveal nontrivial generalities as genomes change in size. The abundance of functional categories across genomes has previously been found to show power law scaling with respect to the total number of functional categories, suggesting that universal constraints shape genomic category abundance. Here, we look across the tree of life to understand how genome evolution may be related to functional scaling. We revisit previous observations of functional genome scaling with an expanded taxonomy by analyzing 3,726 bacterial, 220 archaeal, and 79 unicellular eukaryotic genomes. We find that for some functional classes, scaling is best described by multiple exponents, revealing previously unobserved shifts in scaling as genome-encoded protein annotations increase or decrease. Furthermore, we find that scaling varies between phyletic groups at both the domain and phyla levels and is less universal than previously thought. This variability in functional scaling is not related to taxonomic phylogeny resolved at the phyla level, suggesting that differences in cell plan or physiology outweigh broad patterns of taxonomic evolution. Since genomes are maintained and replicated by the functional proteins encoded by them, these results point to functional degeneracy between taxonomic groups and unique evolutionary trajectories toward these. We also find that individual phyla frequently span scaling exponents of functional classes, revealing that individual clades can move across scaling exponents. Together, our results reveal unique shifts in functions across the tree of life and highlight that as genomes grow or shrink, proteins of various functions may be added or lost.
Affiliation(s)
- Riddhi Gondhalekar
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- School of Life Sciences and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Shawn Erin McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- School of Life Sciences and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, USA
- Center for Sustainable Resource Science, RIKEN, Saitama, Japan
2
Tovo A, Menzel P, Krogh A, Cosentino Lagomarsino M, Suweis S. Taxonomic classification method for metagenomics based on core protein families with Core-Kaiju. Nucleic Acids Res 2020; 48:e93. [PMID: 32633756 PMCID: PMC7498351 DOI: 10.1093/nar/gkaa568] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/14/2020] [Revised: 06/12/2020] [Accepted: 06/24/2020] [Indexed: 12/19/2022] Open
Abstract
Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. Determining microbiome diversity implies the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying microbial taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and shotgun sequencing to three mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We then test the proposed method on various mock communities and show that Core-Kaiju reliably predicts both the number of taxa and their abundances. Finally, we apply our method to human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.
Affiliation(s)
- Anna Tovo
- Physics and Astronomy Department, LIPh Lab, University of Padova, Via Marzolo 8, 35131 Padova, Italy; Mathematics Department, University of Padova, via Trieste 63, 35121 Padova, Italy
- Peter Menzel
- Labor Berlin Charité Vivantes GmbH, Sylter Str. 2, 13353 Berlin, Germany
- Anders Krogh
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark
- Marco Cosentino Lagomarsino
- IFOM, FIRC Institute of Molecular Oncology, Via Adamello 16, 20143 Milan, Italy; Physics Department, University of Milan, and I.N.F.N., Via Celoria 16, 20133 Milan, Italy
- Samir Suweis
- Physics and Astronomy Department, LIPh Lab, University of Padova, Via Marzolo 8, 35131 Padova, Italy; Padova Neuroscience Center, University of Padova, Via Orus 2/B, 35131 Padova, Italy
3
Mazzolini A, Grilli J, De Lazzari E, Osella M, Lagomarsino MC, Gherardi M. Zipf and Heaps laws from dependency structures in component systems. Phys Rev E 2018; 98:012315. [PMID: 30110773 DOI: 10.1103/physreve.98.012315] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/02/2018] [Indexed: 06/08/2023]
Abstract
Complex natural and technological systems can be considered, on a coarse-grained level, as assemblies of elementary components: for example, genomes as sets of genes or texts as sets of words. On one hand, the joint occurrence of components emerges from architectural and specific constraints in such systems. On the other hand, general regularities may unify different systems, such as the broadly studied Zipf and Heaps laws, respectively concerning the distribution of component frequencies and their number as a function of system size. Dependency structures (i.e., directed networks encoding the dependency relations between the components in a system) were proposed recently as a possible organizing principle underlying some of the regularities observed. However, the consequences of this assumption were explored only in binary component systems, where solely the presence or absence of components is considered, and multiple copies of the same component are not allowed. Here we consider a simple model that generates, from a given ensemble of dependency structures, a statistical ensemble of sets of components, allowing for components to appear with any multiplicity. Our model is a minimal extension that is memoryless and therefore accessible to analytical calculations. A mean-field analytical approach (analogous to the "Zipfian ensemble" in the linguistics literature) captures the relevant laws describing the component statistics, as we show by comparison with numerical computations. In particular, we recover a power-law Zipf rank plot, with a set of core components, and a Heaps law displaying three consecutive regimes (linear, sublinear, and saturating) that we characterize quantitatively.
Affiliation(s)
- Andrea Mazzolini
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
- Jacopo Grilli
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- Eleonora De Lazzari
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- Matteo Osella
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
- Marco Cosentino Lagomarsino
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- CNRS, UMR 7238, Paris, France
- IFOM, Milan, Italy
- Marco Gherardi
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
4
Mahmoudabadi G, Phillips R. A comprehensive and quantitative exploration of thousands of viral genomes. eLife 2018; 7:31955. [PMID: 29624169 PMCID: PMC5908442 DOI: 10.7554/elife.31955] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 09/17/2017] [Accepted: 03/30/2018] [Indexed: 01/27/2023] Open
Abstract
The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends – such as gene density, noncoding percentage, and abundances of functional gene categories – across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends.
Affiliation(s)
- Gita Mahmoudabadi
- Department of Bioengineering, California Institute of Technology, Pasadena, United States
- Rob Phillips
- Department of Bioengineering, California Institute of Technology, Pasadena, United States; Department of Applied Physics, California Institute of Technology, Pasadena, United States
5
De Lazzari E, Grilli J, Maslov S, Cosentino Lagomarsino M. Family-specific scaling laws in bacterial genomes. Nucleic Acids Res 2017; 45:7615-7622. [PMID: 28605556 PMCID: PMC5737699 DOI: 10.1093/nar/gkx510] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/07/2017] [Accepted: 05/30/2017] [Indexed: 01/21/2023] Open
Abstract
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes.
Affiliation(s)
- Eleonora De Lazzari
- Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 4 Place Jussieu, Paris 75005, France
- Jacopo Grilli
- Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL 60637, USA
- Sergei Maslov
- Department of Bioengineering, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- To whom correspondence should be addressed. Tel: +33 144277341. Correspondence may also be addressed to Sergei Maslov. Tel: +1 217 265 5705.
- Marco Cosentino Lagomarsino
- Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 4 Place Jussieu, Paris 75005, France
- CNRS, UMR 7238, Paris, France
- FIRC Institute of Molecular Oncology (IFOM), 20139 Milan, Italy
- To whom correspondence should be addressed. Tel: +33 144277341. Correspondence may also be addressed to Sergei Maslov. Tel: +1 217 265 5705.
6
Covariations in ecological scaling laws fostered by community dynamics. Proc Natl Acad Sci U S A 2017; 114:10672-10677. [PMID: 28830995 DOI: 10.1073/pnas.1708376114] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022] Open
Abstract
Scaling laws in ecology, intended both as functional relationships among ecologically relevant quantities and the probability distributions that characterize their occurrence, have long attracted the interest of empiricists and theoreticians. Empirical evidence exists of power laws associated with the number of species inhabiting an ecosystem, their abundances, and traits. Although their functional form appears to be ubiquitous, empirical scaling exponents vary with ecosystem type and resource supply rate. The idea that ecological scaling laws are linked has been entertained before, but the full extent of macroecological pattern covariations, the role of the constraints imposed by finite resource supply, and a comprehensive empirical verification are still unexplored. Here, we propose a theoretical scaling framework that predicts the linkages of several macroecological patterns related to species' abundances and body sizes. We show that such a framework is consistent with the stationary-state statistics of a broad class of resource-limited community dynamics models, regardless of parameterization and model assumptions. We verify predicted theoretical covariations by contrasting empirical data and provide testable hypotheses for yet unexplored patterns. We thus place the observed variability of ecological scaling exponents into a coherent statistical framework where patterns in ecology embed constrained fluctuations.
7
Kristensen DM, Saeed U, Frishman D, Koonin EV. A census of α-helical membrane proteins in double-stranded DNA viruses infecting bacteria and archaea. BMC Bioinformatics 2015; 16:380. [PMID: 26554846 PMCID: PMC4641393 DOI: 10.1186/s12859-015-0817-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/22/2015] [Accepted: 11/06/2015] [Indexed: 01/21/2023] Open
Abstract
Background: Viruses are the most abundant and genetically diverse biological entities on earth, yet the repertoire of viral proteins remains poorly explored. As the number of sequenced virus genomes grows into the thousands, and the number of viral proteins into the hundreds of thousands, we report a systematic computational analysis of the point of first contact between viruses and their hosts, namely viral transmembrane (TM) proteins. Results: The complement of α-helical TM proteins in double-stranded DNA viruses infecting bacteria and archaea reveals large-scale trends that differ from those of their hosts. Viruses typically encode a substantially lower fraction of TM proteins than archaea or bacteria, with the notable exception of viruses with virions containing a lipid component such as a lipid envelope, internal lipid core, or inner membrane vesicle. Compared to bacteriophages, archaeal viruses are substantially enriched in membrane proteins. However, this feature is not always stable throughout the evolution of a viral lineage; for example, TM proteins are not part of the common heritage shared between Lipothrixviridae and Rudiviridae. In contrast to bacteria and archaea, viruses almost completely lack proteins with complicated membrane topologies composed of more than 4 TM segments, with the few detected exceptions being obvious cases of relatively recent horizontal transfer from the host. Conclusions: The dramatic differences between the membrane proteomes of cells and viruses stem from the fact that viruses do not depend on essential membranes for energy transformation, ion homeostasis, nutrient transport and signaling. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0817-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- David M Kristensen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA; Current address: Department of Biomedical Engineering, University of Iowa, Iowa City, IA, USA.
- Usman Saeed
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354, Freising, Germany; Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany.
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354, Freising, Germany; Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany.
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
8
9
Abstract
The widespread exchange of genes between bacteria must have consequences on the global architecture of their genomes, which are being found in the abundant genomic data available today. Most of the expansion of bacterial protein families can be attributed to transfer events, which are positively biased for smaller evolutionary distances between genomes, and more frequent for classes that are larger, when summed over all known bacteria. Moreover, “innovation” events where horizontal transfers carry exogenous evolutionary families appear to be less frequent for larger genomes. This dynamic expansion of evolutionary families is interconnected with the acquisition of new biological functions and thus with the size and distribution of the genes’ functional categories found on a genome. This commentary presents our recent contributions to this line of work and possible future directions.
Affiliation(s)
- Luigi Grassi
- Dipartimento di Fisica, Sapienza Università di Roma; Rome, Italy
10
Grilli J, Romano M, Bassetti F, Cosentino Lagomarsino M. Cross-species gene-family fluctuations reveal the dynamics of horizontal transfers. Nucleic Acids Res 2014; 42:6850-60. [PMID: 24829449 PMCID: PMC4066789 DOI: 10.1093/nar/gku378] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/03/2022] Open
Abstract
Prokaryotes vary their protein repertoire mainly through horizontal transfer and gene loss. To elucidate the links between these processes and the cross-species gene-family statistics, we perform a large-scale data analysis of the cross-species variability of gene-family abundance (the number of members of the family found on a given genome). We find that abundance fluctuations are related to the rate of horizontal transfers. This is rationalized by a minimal theoretical model, which predicts this link. The families that are not captured by the model show abundance profiles that are markedly peaked around a mean value, possibly because of specific abundance selection. Based on these results, we define an abundance variability index that captures a family's evolutionary behavior (and thus some of its relevant functional properties) purely based on its cross-species abundance fluctuations. Analysis and model, combined, show a quantitative link between cross-species family abundance statistics and horizontal transfer dynamics, which can be used to analyze genome ‘flux’. Groups of families with different values of the abundance variability index correspond to genome sub-parts having different plasticity in terms of the level of horizontal exchange allowed by natural selection.
Affiliation(s)
- Jacopo Grilli
- Dipartimento di Fisica e Astronomia "G. Galilei", Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Mariacristina Romano
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria, 16, 20133 Milano, Italy
- Federico Bassetti
- Università di Pavia, Dipartimento di Matematica, via Ferrata 1, 27100 Pavia, Italy
- Marco Cosentino Lagomarsino
- CNRS, UMR 7238, Paris, France; Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 15 rue de l'École de Médecine, Paris, France
11
Abstract
In a series of conceptual articles published around the millennium, Carl Woese emphasized that evolution of cells is the central problem of evolutionary biology, that the three-domain ribosomal tree of life is an essential framework for reconstructing cellular evolution, and that the evolutionary dynamics of functionally distinct cellular systems are fundamentally different, with the information processing systems “crystallizing” earlier than operational systems. The advances of evolutionary genomics over the last decade vindicate major aspects of Woese’s vision. Despite the observations of pervasive horizontal gene transfer among bacteria and archaea, the ribosomal tree of life comes across as a central statistical trend in the “forest” of phylogenetic trees of individual genes, and hence, an appropriate scaffold for evolutionary reconstruction. The evolutionary stability of information processing systems, primarily translation, becomes ever more striking with the accumulation of comparative genomic data indicating that nearly all of the few universal genes encode translation system components. Woese’s view on the fundamental distinctions between the three domains of cellular life also withstands the test of comparative genomics, although his non-acceptance of symbiogenetic scenarios for the origin of eukaryotes might not. Above all, Woese’s key prediction that understanding evolution of microbes will be the core of the new evolutionary biology appears to be materializing.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
12
Martínez-Núñez MA, Poot-Hernandez AC, Rodríguez-Vázquez K, Perez-Rueda E. Increments and duplication events of enzymes and transcription factors influence metabolic and regulatory diversity in prokaryotes. PLoS One 2013; 8:e69707. [PMID: 23922780 PMCID: PMC3726781 DOI: 10.1371/journal.pone.0069707] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 01/03/2013] [Accepted: 06/13/2013] [Indexed: 11/18/2022] Open
Abstract
In this work, the content of enzymes and DNA-binding transcription factors (TFs) in 794 non-redundant prokaryotic genomes was evaluated. The identification of enzymes was based on annotations deposited in the KEGG database as well as in databases of functional domains (COG and PFAM) and structural domains (Superfamily). For the identification of TFs, hidden Markov profiles were constructed based on well-known transcriptional regulatory families. From these analyses, we obtained diverse and interesting results, such as the negative rate of incremental changes in the number of detected enzymes with respect to genome size. In contrast, for TFs the rate increased as the complexity of the genome increased. This inverse relationship shapes the diversity of metabolic and regulatory networks and impacts the availability of enzymes and TFs. Furthermore, the intersection of the derivatives between enzymes and TFs was identified at 9,659 genes; beyond this point, regulatory complexity grows faster than metabolic complexity. In addition, TFs have a low number of duplications, in contrast to the apparent high number of duplications associated with enzymes. Despite the greater number of duplicated enzymes versus TFs, the increment by which duplicates appear is higher in TFs. A lower proportion of enzymes among archaeal genomes (22%) than in the bacterial ones (27%) was also found. This low proportion might be compensated by the interconnection between the metabolic pathways in Archaea. A similar proportion was also found for the archaeal TFs, for which the formation of regulatory complexes has been proposed. Finally, an enrichment of multifunctional enzymes in Bacteria, as a mechanism of ecological adaptation, was detected.
Affiliation(s)
- Mario Alberto Martínez-Núñez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, México D.F., México
- * E-mail: (MMN); (EPR)
- Augusto Cesar Poot-Hernandez
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Katya Rodríguez-Vázquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, México D.F., México
- Ernesto Perez-Rueda
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- * E-mail: (MMN); (EPR)
13
Bottinelli A, Bassetti B, Lagomarsino MC, Gherardi M. Influence of homology and node age on the growth of protein-protein interaction networks. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:041919. [PMID: 23214627 DOI: 10.1103/physreve.86.041919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2012] [Indexed: 06/01/2023]
Abstract
Proteins participating in a protein-protein interaction network can be grouped into homology classes following their common ancestry. Proteins added to the network correspond to genes added to the classes, so the dynamics of the two objects are intrinsically linked. Here we first introduce a statistical model describing the joint growth of the network and the partitioning of nodes into classes, which is studied through a combined mean-field and simulation approach. We then employ this unified framework to address the specific issue of the age dependence of protein interactions through the definition of three different node wiring or divergence schemes. A comparison with empirical data indicates that an age-dependent divergence move is necessary in order to reproduce the basic topological observables together with the age correlation between interacting nodes visible in empirical data. We also discuss the possibility of nontrivial joint partition and topology observables.
14
Abstract
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law–like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as “laws of evolutionary genomics” in the same sense “law” is understood in modern physics.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.