Reference Citation Analysis

For: Nasir A, Kim KM, Caetano-Anollés G. Global patterns of protein domain gain and loss in superkingdoms. PLoS Comput Biol 2014;10:e1003452. [PMID: 24499935 DOI: 10.1371/journal.pcbi.1003452] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 07/16/2013] [Accepted: 12/03/2013] [Indexed: 12/21/2022] Open

Cited by Other Article(s)

Caetano-Anollés G. Are Viruses Taxonomic Units? A Protein Domain and Loop-Centric Phylogenomic Assessment. Viruses 2024;16:1061. [PMID: 39066224 PMCID: PMC11281659 DOI: 10.3390/v16071061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 06/26/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024] Open

Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]

Kwon T, Hovde BT. Global characterization of biosynthetic gene clusters in non-model eukaryotes using domain architectures. Sci Rep 2024;14:1534. [PMID: 38233413 PMCID: PMC10794256 DOI: 10.1038/s41598-023-50095-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024] Open

Nguyen LAC, Mori M, Yasuda Y, Galipon J. Functional Consequences of Shifting Transcript Boundaries in Glucose Starvation. Mol Cell Biol 2023;43:611-628. [PMID: 37937348 PMCID: PMC10761120 DOI: 10.1080/10985549.2023.2270406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 10/10/2023] [Indexed: 11/09/2023] Open

Caetano-Anollés G. Agency in evolution of biomolecular communication. Ann N Y Acad Sci 2023;1525:88-103. [PMID: 37219369 DOI: 10.1111/nyas.15005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]

Genomic Survey of Flavin Monooxygenases in Wild and Cultivated Rice Provides Insight into Evolution and Functional Diversities. Int J Mol Sci 2023;24:ijms24044190. [PMID: 36835601 PMCID: PMC9960948 DOI: 10.3390/ijms24044190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/08/2023] [Accepted: 01/12/2023] [Indexed: 02/22/2023] Open

Budimir I, Giampieri E, Saccenti E, Suarez-Diez M, Tarozzi M, Dall'Olio D, Merlotti A, Curti N, Remondini D, Castellani G, Sala C. Intraspecies characterization of bacteria via evolutionary modeling of protein domains. Sci Rep 2022;12:16595. [PMID: 36198716 PMCID: PMC9534902 DOI: 10.1038/s41598-022-21036-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open

Romei M, Sapriel G, Imbert P, Jamay T, Chomilier J, Lecointre G, Carpentier M. Protein folds as synapomorphies of the tree of life. Evolution 2022;76:1706-1719. [PMID: 35765784 PMCID: PMC9541633 DOI: 10.1111/evo.14550] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]

New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022;13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022] Open

Caetano-Anollés G. The Compressed Vocabulary of Microbial Life. Front Microbiol 2021;12:655990. [PMID: 34305827 PMCID: PMC8292947 DOI: 10.3389/fmicb.2021.655990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022] Open

Abstract

Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including Zipf's law, a special case of the scale-free distribution, Heaps' law describing sublinear growth typical of economies of scale, and the Menzerath-Altmann law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which together with comparative and evolutionary genomic data can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction.
While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.

Nasir A, Mughal F, Caetano-Anollés G. The tree of life describes a tripartite cellular world. Bioessays 2021;43:e2000343. [PMID: 33837594 DOI: 10.1002/bies.202000343] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/25/2020] [Revised: 03/11/2021] [Accepted: 03/15/2021] [Indexed: 12/28/2022]

Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021;68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]

Harris HMB, Hill C. A Place for Viruses on the Tree of Life. Front Microbiol 2021;11:604048. [PMID: 33519747 PMCID: PMC7840587 DOI: 10.3389/fmicb.2020.604048] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/10/2020] [Accepted: 12/14/2020] [Indexed: 12/15/2022] Open

Nasir A, Romero-Severson E, Claverie JM. Investigating the Concept and Origin of Viruses. Trends Microbiol 2020;28:959-967. [PMID: 33158732 PMCID: PMC7609044 DOI: 10.1016/j.tim.2020.08.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/29/2020] [Revised: 08/25/2020] [Accepted: 08/27/2020] [Indexed: 12/21/2022]

Defosset A, Kress A, Nevers Y, Ripp R, Thompson JD, Poch O, Lecompte O. Proteome-Scale Detection of Differential Conservation Patterns at Protein and Subprotein Levels with BLUR. Genome Biol Evol 2020;13:5991441. [PMID: 33211099 PMCID: PMC7851591 DOI: 10.1093/gbe/evaa248] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 11/18/2020] [Indexed: 11/23/2022] Open

Yadav A, Fernández-Baca D, Cannon SB. Family-Specific Gains and Losses of Protein Domains in the Legume and Grass Plant Families. Evol Bioinform Online 2020;16:1176934320939943. [PMID: 32694909 PMCID: PMC7350399 DOI: 10.1177/1176934320939943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2020] [Accepted: 06/15/2020] [Indexed: 11/27/2022] Open

Abstract

Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in 2 large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate 4 types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. The 4 types of domain feature matrices attempt to capture different aspects of domain changes through which the protein sequences may evolve—that is, via gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, or through changes in the number of adjacent domain partners. All the feature matrices were analyzed using feature selection techniques and statistical tests to select protein domains that have significantly different feature values in legumes and grasses. We report the biological functions of the top selected domains from the analysis of all the feature matrices. In addition, we also perform domain-centric gene ontology (dcGO) enrichment analysis on all selected domains from all 4 feature matrices to study the gene ontology terms associated with the significantly evolving domains in legumes and grasses.
Domain content analysis revealed a striking loss of protein domains from the Fanconi anemia (FA) pathway, the pathway responsible for the repair of interstrand DNA crosslinks. The abundance analysis of domains found in legumes revealed an increase in glutathione synthase enzyme, an antioxidant required for nitrogen fixation, and a decrease in xanthine oxidizing enzymes, a phenomenon confirmed by previous studies. In grasses, the abundance analysis showed increases in domains related to gene silencing, which could be due to polyploidy or to an enhanced response to viral infection. We provide a docker container that can be used to perform this analysis workflow on any user-defined sets of species, available at https://cloud.docker.com/u/akshayayadav/repository/docker/akshayayadav/protein-domain-evolution-project.

Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020;12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 01/31/2020] [Indexed: 12/24/2022] Open

Mughal F, Caetano-Anollés G. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 2019;14:e0224201. [PMID: 31648227 PMCID: PMC6812854 DOI: 10.1371/journal.pone.0224201] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/14/2019] [Accepted: 10/08/2019] [Indexed: 11/30/2022] Open

Abstract

Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database, an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. The database, which was introduced in 2006, traces the deep history of the structural domains of enzymes in metabolic pathways. Here we release version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. 
Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system. MANET is available at http://manet.illinois.edu.

Deryusheva EI, Machulin AV, Matyunin MA, Galzitskaya OV. Investigation of the Relationship between the S1 Domain and Its Molecular Functions Derived from Studies of the Tertiary Structure. Molecules 2019;24:E3681. [PMID: 31614904 PMCID: PMC6832287 DOI: 10.3390/molecules24203681] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/29/2019] [Accepted: 10/11/2019] [Indexed: 11/16/2022] Open

Caetano-Anollés D, Nasir A, Kim KM, Caetano-Anollés G. Testing Empirical Support for Evolutionary Models that Root the Tree of Life. J Mol Evol 2019;87:131-142. [PMID: 30887086 PMCID: PMC6443624 DOI: 10.1007/s00239-019-09891-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/10/2018] [Accepted: 03/06/2019] [Indexed: 12/12/2022]

Evolution of Protein Domain Architectures. Methods Mol Biol 2019;1910:469-504. [PMID: 31278674 DOI: 10.1007/978-1-4939-9074-0_15] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/04/2022]

Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018;14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open

Caetano-Anollés D, Caetano-Anollés K, Caetano-Anollés G. Evolution of macromolecular structure: a 'double tale' of biological accretion and diversification. Sci Prog 2018;101:360-383. [PMID: 30296968 PMCID: PMC10365222 DOI: 10.3184/003685018x15379391431599] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]

Staley JT, Caetano-Anollés G. Archaea-First and the Co-Evolutionary Diversification of Domains of Life. Bioessays 2018;40:e1800036. [PMID: 29944192 DOI: 10.1002/bies.201800036] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/19/2018] [Revised: 05/12/2018] [Indexed: 12/13/2022]

Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. Biology 2017;6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]

A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters. Sci Rep 2017;7:14210. [PMID: 29079755 PMCID: PMC5660162 DOI: 10.1038/s41598-017-13297-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/18/2016] [Accepted: 09/21/2017] [Indexed: 12/28/2022] Open

Laurie J, Chattopadhyay AK, Flower DR. Protein lipograms. J Theor Biol 2017;430:109-116. [PMID: 28716385 DOI: 10.1016/j.jtbi.2017.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2017] [Revised: 06/30/2017] [Accepted: 07/12/2017] [Indexed: 11/20/2022]

Leuenberger P, Ganscha S, Kahraman A, Cappelletti V, Boersema PJ, von Mering C, Claassen M, Picotti P. Cell-wide analysis of protein thermal unfolding reveals determinants of thermostability. Science 2017;355:355/6327/eaai7825. [PMID: 28232526 DOI: 10.1126/science.aai7825] [Citation(s) in RCA: 255] [Impact Index Per Article: 36.4] [Received: 08/18/2016] [Accepted: 01/12/2017] [Indexed: 12/14/2022]

Eggermont L, Verstraeten B, Van Damme EJM. Genome-Wide Screening for Lectin Motifs in Arabidopsis thaliana. The Plant Genome 2017;10. [PMID: 28724081 DOI: 10.3835/plantgenome2017.02.0010] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 05/11/2023]

Nasir A, Kim KM, Caetano-Anollés G. Phylogenetic Tracings of Proteome Size Support the Gradual Accretion of Protein Structural Domains and the Early Origin of Viruses from Primordial Cells. Front Microbiol 2017;8:1178. [PMID: 28690608 PMCID: PMC5481351 DOI: 10.3389/fmicb.2017.01178] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/27/2017] [Accepted: 06/09/2017] [Indexed: 01/05/2023] Open

Abstract

Untangling the origin and evolution of viruses remains a challenging proposition. We recently studied the global distribution of protein domain structures in thousands of completely sequenced viral and cellular proteomes with comparative genomics, phylogenomics, and multidimensional scaling methods. A tree of life describing the evolution of proteomes revealed viruses emerging from the base of the tree as a fourth supergroup of life. A tree of domains indicated an early origin of modern viral lineages from ancient cells that co-existed with the cellular ancestors. However, it was recently argued that the rooting of our trees and the basal placement of viruses was artifactually induced by small genome (proteome) size. Here we show that these claims arise from misunderstanding and misinterpretations of cladistic methodology. Trees are reconstructed unrooted, and thus, their topologies cannot be distorted a posteriori by the rooting methodology. Tracing proteome size in trees and multidimensional views of evolutionary relationships as well as tests of leaf stability and exclusion/inclusion of taxa demonstrated that the smallest proteomes were neither attracted toward the root nor caused any topological distortions of the trees. Simulations confirmed that taxa clustering patterns were independent of proteome size and were determined by the presence of known evolutionary relatives in data matrices, highlighting the need for broader taxon sampling in phylogeny reconstruction. Instead, phylogenetic tracings of proteome size revealed a slowdown in innovation of the structural domain vocabulary and four regimes of allometric scaling that reflected a Heaps law. These regimes explained increasing economies of scale in the evolutionary growth and accretion of kernel proteome repertoires of viruses and cellular organisms that resemble growth of human languages with limited vocabulary sizes. 
Results reconcile dynamic and static views of domain frequency distributions that are consistent with the axiom of spatiotemporal continuity that is a tenet of evolutionary thinking.

Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017;12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022] Open

Abstract

The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.

Staley JT, Fuerst JA. Ancient, highly conserved proteins from a LUCA with complex cell biology provide evidence in support of the nuclear compartment commonality (NuCom) hypothesis. Res Microbiol 2017;168:395-412. [PMID: 28111289 DOI: 10.1016/j.resmic.2017.01.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/24/2016] [Revised: 01/08/2017] [Accepted: 01/09/2017] [Indexed: 12/23/2022]

Arguments Reinforcing the Three-Domain View of Diversified Cellular Life. Archaea: An International Microbiological Journal 2016;2016:1851865. [PMID: 28050162 PMCID: PMC5165138 DOI: 10.1155/2016/1851865] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/16/2016] [Revised: 10/18/2016] [Accepted: 11/03/2016] [Indexed: 11/18/2022]

Cardenas JP, Quatrini R, Holmes DS. Aerobic Lineage of the Oxidative Stress Response Protein Rubrerythrin Emerged in an Ancient Microaerobic, (Hyper)Thermophilic Environment. Front Microbiol 2016;7:1822. [PMID: 27917155 PMCID: PMC5114695 DOI: 10.3389/fmicb.2016.01822] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 08/03/2016] [Accepted: 10/31/2016] [Indexed: 11/27/2022] Open

Abstract

Rubrerythrins (RBRs) are non-heme di-iron proteins belonging to the ferritin-like superfamily. They are involved in oxidative stress defense as peroxide scavengers in a wide range of organisms. The vast majority of RBRs, including classical forms of this protein, contain a C-terminal rubredoxin-like domain involved in electron transport that is used during catalysis in anaerobic conditions. Rubredoxin is an ancient and large protein family of short length (<100 residues) that contains a Fe-S center involved in electron transfer. However, functional forms of the enzyme lacking the rubredoxin-like domain have been reported (e.g., sulerythrin and ferriperoxin). In this study, phylogenomic evidence is presented that suggests that a complete lineage of rubrerythrins, lacking the rubredoxin-like domain, arose in ancient microaerobic and (hyper)thermophilic environments in the ancestors of the Archaea Thermoproteales and Sulfolobales. This lineage (termed the “aerobic-type” lineage) subsequently evolved to become adapted to environments with progressively lower temperatures and higher oxygen concentrations via the acquisition of two co-localized genes, termed DUF3501 and RFO, encoding a conserved protein of unknown function and a predicted Fe-S oxidoreductase, respectively. Proposed Horizontal Gene Transfer events from these archaeal ancestors to Bacteria expanded the opportunities for further evolution of this RBR, including adaptation to lower temperatures. The second lineage (termed the cyanobacterial lineage) is proposed to have evolved in cyanobacterial ancestors, maybe in direct response to the production of oxygen via oxygenic photosynthesis during the Great Oxygen Event (GOE). It is hypothesized that both lineages of RBR emerged in a largely anaerobic world with “whiffs” of oxygen and that their subsequent independent evolutionary trajectories allowed microorganisms to transition from this anaerobic world to an aerobic one.

Repeat proteins challenge the concept of structural domains. Biochem Soc Trans 2016;43:844-9. [PMID: 26517892 DOI: 10.1042/bst20150083] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/17/2022]

Brunk E, Mih N, Monk J, Zhang Z, O’Brien EJ, Bliven SE, Chen K, Chang RL, Bourne PE, Palsson BO. Systems biology of the structural proteome. BMC Systems Biology 2016;10:26. [PMID: 26969117 PMCID: PMC4787049 DOI: 10.1186/s12918-016-0271-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 10/17/2015] [Accepted: 02/16/2016] [Indexed: 12/19/2022]

Abstract

BACKGROUND

The success of genome-scale models (GEMs) can be attributed to the high-quality, bottom-up reconstructions of metabolic, protein synthesis, and transcriptional regulatory networks on an organism-specific basis. Such reconstructions are biochemically, genetically, and genomically structured knowledge bases that can be converted into a mathematical format to enable a myriad of computational biological studies. In recent years, genome-scale reconstructions have been extended to include protein structural information, which has opened up new vistas in systems biology research and empowered applications in structural systems biology and systems pharmacology.

RESULTS

Here, we present the generation, application, and dissemination of genome-scale models with protein structures (GEM-PRO) for Escherichia coli and Thermotoga maritima. We show the utility of integrating molecular scale analyses with systems biology approaches by discussing several comparative analyses on the temperature dependence of growth, the distribution of protein fold families, substrate specificity, and characteristic features of whole cell proteomes. Finally, to aid in the grand challenge of big data to knowledge, we provide several explicit tutorials of how protein-related information can be linked to genome-scale models in a public GitHub repository (https://github.com/SBRG/GEMPro/tree/master/GEMPro_recon/).

CONCLUSIONS

Translating genome-scale, protein-related information to structured data in the format of a GEM provides a direct mapping of gene to gene-product to protein structure to biochemical reaction to network states to phenotypic function. Integration of molecular-level details of individual proteins, such as their physical, chemical, and structural properties, further expands the description of biochemical network-level properties, and can ultimately influence how to model and predict whole cell phenotypes as well as perform comparative systems biology approaches to study differences between organisms. GEM-PRO offers insight into the physical embodiment of an organism's genotype, and its use in this comparative framework enables exploration of adaptive strategies for these organisms, opening the door to many new lines of research. With these provided tools, tutorials, and background, the reader will be in a position to run GEM-PRO for their own purposes.

Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016;9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]

Jeong H, Sung S, Kwon T, Seo M, Caetano-Anollés K, Choi SH, Cho S, Nasir A, Kim H. HGTree: database of horizontally transferred genes determined by tree reconciliation. Nucleic Acids Res 2015;44:D610-9. [PMID: 26578597 PMCID: PMC4702880 DOI: 10.1093/nar/gkv1245] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 08/11/2015] [Accepted: 11/01/2015] [Indexed: 01/13/2023]

Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015;35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]

Nasir A, Caetano-Anollés G. A phylogenomic data-driven exploration of viral origins and evolution. Sci Adv 2015;1:e1500527. [PMID: 26601271 PMCID: PMC4643759 DOI: 10.1126/sciadv.1500527] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Received: 04/27/2015] [Accepted: 06/30/2015] [Indexed: 05/05/2023]

Shahzad K, Mittenthal JE, Caetano-Anollés G. The organization of domains in proteins obeys Menzerath-Altmann's law of language. BMC Syst Biol 2015;9:44. [PMID: 26260760 PMCID: PMC4531524 DOI: 10.1186/s12918-015-0192-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/25/2015] [Accepted: 07/30/2015] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The combination of domains in multidomain proteins enhances their function and structure but lengthens the molecules and increases their cost at the cellular level.

METHODS

The dependence of domain length on the number of domains a protein holds was surveyed for a set of 60 proteomes representing free-living organisms from all kingdoms of life. Distributions were fitted using non-linear functions and fitted parameters interpreted with a formulation of decreasing returns.

RESULTS

We find that domain length decreases with increasing number of domains in proteins, following the Menzerath-Altmann (MA) law of language. Highly significant negative correlations exist for the set of proteomes examined. Mathematically, the MA law is expressed as a power law relationship that unfolds when molecular persistence P is a function of domain accretion. P holds two terms, one reflecting the matter-energy cost of adding domains and extending their length, the other reflecting how domain length and number impinge on information and biophysics. The pattern of diminishing returns can therefore be explained as a frustrated interplay between the strategies of economy, flexibility and robustness, matching previously observed trade-offs in the domain makeup of proteomes. Proteomes of Archaea, Fungi and, to a lesser degree, Plants show the largest push towards molecular economy, each at their own economic stratum. Fungi increase domain size in single-domain proteins while reinforcing the pattern of diminishing returns. In contrast, Metazoa, and to lesser degrees Protista and Bacteria, relax economy. Metazoa achieve maximum flexibility and robustness by harboring compact molecules and complex domain organization, offering a new functional vocabulary for molecular biology.

CONCLUSIONS

The tendency of parts to decrease their size when systems enlarge is universal for language and music, and now for parts of macromolecules, extending the MA law to natural systems.
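
As a rough illustration of the power-law form mentioned in the Results above, the following sketch fits L = a * n^b (mean domain length L against number of domains n) by ordinary least squares on log-transformed values. The data points, the generating parameters (a = 200, b = -0.3), and the function name `fit_power_law` are hypothetical and are not taken from the study; the abstract does not specify the non-linear fitting procedure that was actually used, so a simple log-log regression is swapped in here.

```python
import math


def fit_power_law(n_domains, mean_lengths):
    """Fit L = a * n**b by least squares on log(L) = log(a) + b * log(n).

    Returns the tuple (a, b). All inputs must be strictly positive.
    """
    xs = [math.log(n) for n in n_domains]
    ys = [math.log(length) for length in mean_lengths]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log-log space is log(a).
    a = math.exp(y_bar - b * x_bar)
    return a, b


# Hypothetical, noise-free data generated from L = 200 * n**-0.3
# (illustrative values only, not measurements from any proteome).
n_domains = [1, 2, 3, 4, 5, 6]
mean_lengths = [200.0 * n ** -0.3 for n in n_domains]

a, b = fit_power_law(n_domains, mean_lengths)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative fitted exponent b corresponds to the reported pattern of domain length decreasing as the number of domains grows. With real, noisy data a non-linear fit on the untransformed values may weight the points differently from this log-log regression.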

Caetano-Anollés G, Caetano-Anollés D. Computing the origin and evolution of the ribosome from its structure - Uncovering processes of macromolecular accretion benefiting synthetic biology. Comput Struct Biotechnol J 2015;13:427-47. [PMID: 27096056 PMCID: PMC4823900 DOI: 10.1016/j.csbj.2015.07.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/09/2015] [Revised: 07/16/2015] [Accepted: 07/19/2015] [Indexed: 12/11/2022]

Abstract

Accretion occurs pervasively in nature at widely different timeframes. The process also manifests in the evolution of macromolecules. Here we review recent computational and structural biology studies of evolutionary accretion that make use of the ideographic (historical, retrodictive) and nomothetic (universal, predictive) scientific frameworks. Computational studies uncover explicit timelines of accretion of structural parts in molecular repertoires and molecules. Phylogenetic trees of protein structural domains and proteomes and their molecular functions were built from a genomic census of millions of encoded proteins and associated terminal Gene Ontology terms. Trees reveal a ‘metabolic-first’ origin of proteins, the late development of translation, and a patchwork distribution of proteins in biological networks mediated by molecular recruitment. Similarly, the natural history of ancient RNA molecules inferred from trees of molecular substructures built from a census of molecular features shows patchwork-like accretion patterns. Ideographic analyses of ribosomal history uncover the early appearance of structures supporting mRNA decoding and tRNA translocation, the coevolution of ribosomal proteins and RNA, and a first evolutionary transition that brings ribosomal subunits together into a processive protein biosynthetic complex. Nomothetic structural biology studies of tertiary interactions and ancient insertions in rRNA complement these findings, once concentric layering assumptions are removed. Patterns of coaxial helical stacking reveal a frustrated dynamics of outward and inward ribosomal growth possibly mediated by structural grafting. The early rise of the ribosomal ‘turnstile’ suggests an evolutionary transition in natural biological computation. Results make explicit the need to understand processes of molecular growth and information transfer of macromolecules.

The Origin and Evolution of Baeyer-Villiger Monooxygenases (BVMOs): An Ancestral Family of Flavin Monooxygenases. PLoS One 2015;10:e0132689. [PMID: 26161776 PMCID: PMC4498894 DOI: 10.1371/journal.pone.0132689] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 12/29/2014] [Accepted: 06/18/2015] [Indexed: 12/13/2022]

Goncearenco A, Berezovsky IN. Protein function from its emergence to diversity in contemporary proteins. Phys Biol 2015;12:045002. [PMID: 26057563 DOI: 10.1088/1478-3975/12/4/045002] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 12/18/2022]

Chang TC, Stergiopoulos I. Evolutionary analysis of the global landscape of protein domain types and domain architectures associated with family 14 carbohydrate-binding modules. FEBS Lett 2015;589:1813-8. [PMID: 26067847 DOI: 10.1016/j.febslet.2015.05.048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/15/2015] [Revised: 05/11/2015] [Accepted: 05/20/2015] [Indexed: 10/23/2022]

Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015;16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023]

Caetano-Anollés G, Mittenthal JE, Caetano-Anollés D, Kim KM. A calibrated chronology of biochemistry reveals a stem line of descent responsible for planetary biodiversity. Front Genet 2014;5:306. [PMID: 25309572 PMCID: PMC4161044 DOI: 10.3389/fgene.2014.00306] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/26/2014] [Accepted: 08/18/2014] [Indexed: 11/13/2022]

Kim KM, Nasir A, Hwang K, Caetano-Anollés G. A tree of cellular life inferred from a genomic census of molecular functions. J Mol Evol 2014;79:240-62. [PMID: 25128982 DOI: 10.1007/s00239-014-9637-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/30/2014] [Accepted: 08/05/2014] [Indexed: 10/24/2022]

On How Many Fundamental Kinds of Cells are Present on Earth: Looking for Phylogenetic Traits that Would Allow the Identification of the Primary Lines of Descent. J Mol Evol 2014;78:313-20. [DOI: 10.1007/s00239-014-9626-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/03/2014] [Accepted: 05/21/2014] [Indexed: 11/26/2022]

Caetano-Anollés G, Nasir A, Zhou K, Caetano-Anollés D, Mittenthal JE, Sun FJ, Kim KM. Archaea: the first domain of diversified life. Archaea 2014;2014:590214. [PMID: 24987307 PMCID: PMC4060292 DOI: 10.1155/2014/590214] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/30/2013] [Revised: 02/15/2014] [Accepted: 03/25/2014] [Indexed: 01/23/2023]