1
Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024; 16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/11/2024] [Indexed: 11/01/2024] Open
Abstract
The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. Here, we discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrate diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.
Affiliation(s)
- Felix Langschied
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Salvatore Cosentino
- Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
- Diego Fuentes-Palacios
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Natasha Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Michael Hiller
- Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Yanhui Hu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Wataru Iwasaki
- Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
- Sina Majidian
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Saioa Manzano-Morales
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Toni Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Ingo Ebersberger
- Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
2
Langleib M, Calvelo J, Costábile A, Castillo E, Tort JF, Hoffmann FG, Protasio AV, Koziol U, Iriarte A. Evolutionary analysis of species-specific duplications in flatworm genomes. Mol Phylogenet Evol 2024; 199:108141. [PMID: 38964593 DOI: 10.1016/j.ympev.2024.108141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 06/15/2024] [Accepted: 07/01/2024] [Indexed: 07/06/2024]
Abstract
Platyhelminthes, also known as flatworms, is a phylum of bilaterian invertebrates infamous for their parasitic representatives. The classes Cestoda, Monogenea, and Trematoda comprise parasitic helminths inhabiting multiple hosts, including fishes, humans, and livestock, and are responsible for considerable economic damage and burden on human health. As in other animals, the genomes of flatworms have a wide variety of paralogs, genes related via duplication, whose origins could be mapped throughout the evolution of the phylum. Through in-silico analysis, we studied inparalogs, i.e., species-specific duplications, focusing on their biological functions, expression changes, and evolutionary rate. These genes are thought to be key players in the adaptation process of species to each particular niche. Our results showed that genes related to specific functional terms, such as response to stress, transferase activity, oxidoreductase activity, and peptidases, are overrepresented among inparalogs. This trend is conserved among species from different classes, including free-living species. Available expression data from Schistosoma mansoni, a parasite from the trematode class, demonstrated high conservation of expression patterns between inparalogs, but with notable exceptions, which also display evidence of rapid evolution. We discuss how natural selection may operate to maintain these genes and the particular duplication models that better fit the observations. Our work supports the critical role of gene duplication in the evolution of flatworms, representing the first study of inparalog evolution at the genome-wide level in this group.
Affiliation(s)
- Mauricio Langleib
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay; Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Javier Calvelo
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Alicia Costábile
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Estela Castillo
- Laboratorio de Biología Parasitaria, Instituto de Higiene, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- José F Tort
- Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Mississippi, United States of America; Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi, United States of America
- Anna V Protasio
- Department of Pathology, University of Cambridge, Tennis Court Road, CB2 1QP, Cambridge, United Kingdom
- Uriel Koziol
- Sección Biología Celular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay.
3
Yuan H, Mancuso CA, Johnson K, Braasch I, Krishnan A. Computational strategies for cross-species knowledge transfer and translational biomedicine. arXiv 2024:arXiv:2408.08503v1. [PMID: 39184546 PMCID: PMC11343225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.
Affiliation(s)
- Hao Yuan
- Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Christopher A. Mancuso
- Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus
- Kayla Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Ingo Braasch
- Department of Integrative Biology; Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
4
Vandepoele K, Thierens S, Van Bel M. Application of orthology and network biology to infer gene functions in non-model plants. Physiologia Plantarum 2024; 176:e14441. [PMID: 39019770 DOI: 10.1111/ppl.14441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 07/19/2024]
Abstract
Approximately 60% of the genes and gene products in the model species Arabidopsis thaliana have been functionally characterized. In non-model plant species, the functional annotation of the gene space is largely based on homology, with the assumption that genes with shared common ancestry have conserved functions. However, the wide variety in possible morphological, physiological, and ecological differences between plant species gives rise to many species- and clade-specific genes, for which this transfer of knowledge is not possible. Other complications, such as difficulties with genetic transformation, the absence of large-scale mutagenesis methods, and long generation times, further lead to the slow characterization of genes in non-model species. Here, we discuss different resources that integrate plant gene function information. Different approaches that support the functional annotation of gene products, based on orthology or network biology, are described. While sequence-based tools to characterize the functional landscape in non-model species are maturing and becoming more readily available, easy-to-use network-based methods inferring plant gene functions are not as prevalent and have limited functionality.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- VIB Center for AI & Computational Biology, VIB, Ghent, Belgium
- Sander Thierens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- Michiel Van Bel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
5
Hou Z, Yang S, He W, Lu T, Feng X, Zang L, Bai W, Chen X, Nie B, Li C, Wei M, Ma L, Han Z, Zou Q, Li W, Wang L. The haplotype-resolved genome of diploid Chrysanthemum indicum unveils new acacetin synthases genes and their evolutionary history. The Plant Journal: for Cell and Molecular Biology 2024. [PMID: 38864745 DOI: 10.1111/tpj.16854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 03/31/2024] [Accepted: 05/03/2024] [Indexed: 06/13/2024]
Abstract
Acacetin, a flavonoid compound, possesses a wide range of pharmacological effects, including antimicrobial, immune regulation, and anticancer effects. Some key steps in its biosynthetic pathway were largely unknown in flowering plants. Here, we present the first haplotype-resolved genome of Chrysanthemum indicum, whose dried flowers contain abundant flavonoids and have been utilized as traditional Chinese medicine. Various phylogenetic analyses revealed almost equal proportions of three tree topologies among three Chrysanthemum species (C. indicum, C. nankingense, and C. lavandulifolium), indicating that frequent gene flow among Chrysanthemum species or incomplete lineage sorting due to rapid speciation might contribute to conflicting topologies. The expanded gene families in C. indicum were associated with oxidative functions. Through comprehensive candidate gene screening, we identified five flavonoid O-methyltransferase (FOMT) candidates, which were highly expressed in flowers and whose expression levels were significantly correlated with the content of acacetin. Further experiments validated that two FOMTs (CI02A009970 and CI03A006662) were capable of catalyzing the conversion of apigenin into acacetin, and these two genes are possibly responsible for acacetin accumulation in disc florets and young leaves, respectively. Furthermore, combined analyses of ancestral chromosome reconstruction and phylogenetic trees revealed the distinct evolutionary fates of the two validated FOMT genes. Our study provides new insights into the biosynthetic pathway of flavonoid compounds in the Asteraceae family and offers a model for tracing the origin and evolutionary routes of single genes. These findings will facilitate in vitro biosynthetic production of flavonoid compounds through cellular and metabolic engineering and expedite molecular breeding of C. indicum cultivars.
Affiliation(s)
- Zhuangwei Hou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Song Yang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Weijun He
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Tingting Lu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Xunmeng Feng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lanlan Zang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Wenhui Bai
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Xueqing Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Bao Nie
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Cheng Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Min Wei
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Liangju Ma
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Zhengzhou Han
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Qingjun Zou
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- National Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, Chinese Academy of Chinese Medical Sciences, Beijing, 100700, China
- Wei Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Li Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Beijing, 100700, China
6
Sierra NC, Olsman N, Yi L, Pachter L, Goentoro L, Gold DA. A Novel Approach to Comparative RNA-Seq Does Not Support a Conserved Set of Orthologs Underlying Animal Regeneration. Genome Biol Evol 2024; 16:evae120. [PMID: 38922665 PMCID: PMC11214158 DOI: 10.1093/gbe/evae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/27/2024] Open
Abstract
Molecular studies of animal regeneration typically focus on conserved genes and signaling pathways that underlie morphogenesis. To date, a holistic analysis of gene expression across animals has not been attempted, as it presents a suite of problems related to differences in experimental design and gene homology. By combining orthology analyses with a novel statistical method for testing gene enrichment across large data sets, we are able to test whether tissue regeneration across animals shares transcriptional regulation. We applied this method to a meta-analysis of six publicly available RNA-Seq data sets from diverse examples of animal regeneration. We recovered 160 conserved orthologous gene clusters, which are enriched in structural genes as opposed to those regulating morphogenesis. A breakdown of gene presence/absence provides limited support for the conservation of pathways typically implicated in regeneration, such as Wnt signaling and cell pluripotency pathways. Such pathways are only conserved if we permit large amounts of paralog switching through evolution. Overall, our analysis does not support the hypothesis that a shared set of ancestral genes underlies regeneration mechanisms in animals. After applying the same method to heat shock studies and getting similar results, we raise broader questions about the ability of comparative RNA-Seq to reveal conserved gene pathways across deep evolutionary relationships.
Affiliation(s)
- Noémie C Sierra
- Department of Earth and Planetary Sciences, University of California, Davis, Davis, CA 95616, USA
- Noah Olsman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Lynn Yi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Lea Goentoro
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- David A Gold
- Department of Earth and Planetary Sciences, University of California, Davis, Davis, CA 95616, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
7
Xiao L, Wang X, Jiang Y, Ye B, Yu K, Wang Q, Yang X, Zhang J, Ouyang Q, Jin H, Tian E. Lipid and sugar metabolism play an essential role in pollen development and male sterility: a case analysis in Brassica napus. Physiologia Plantarum 2024; 176:e14394. [PMID: 38894535 DOI: 10.1111/ppl.14394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 05/14/2024] [Accepted: 06/04/2024] [Indexed: 06/21/2024]
Abstract
AIMS: The genic male sterility (GMS) system is an important strategy for generating heterosis in plants. To better understand the essential role of lipid and sugar metabolism and to identify additional candidates for pollen development and male sterility, transcriptome and metabolome analysis of a GMS line of 1205AB in B. napus was used as a case study.
DATA RESOURCES GENERATED: To characterize the GMS system, the transcriptome and metabolome profiles were generated for 24 samples and 48 samples of 1205AB in B. napus, respectively. Transcriptome analysis yielded a total of 156.52 Gb of clean data and revealed the expression levels of 109,541 genes and 8,501 novel genes. In addition, a total of 1,353 metabolites were detected in the metabolomic analysis, including 784 in positive ion mode and 569 in negative ion mode.
KEY RESULTS: A total of 15,635 differentially expressed genes (DEGs) and 83 differential metabolites (DMs) were identified from different comparison groups, most of which were involved in lipid and sugar metabolism. The combination of transcriptome and metabolome analysis revealed 49 orthologous GMS genes related to lipid metabolism and 46 orthologous GMS genes related to sugar metabolism, as well as 45 novel genes.
UTILITY OF THE RESOURCE: The transcriptome and metabolome profiles and their analysis provide useful reference data for the future discovery of additional GMS genes and the development of more robust male sterility breeding systems for use in the production of plant hybrids.
Affiliation(s)
- Lijing Xiao
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Xianya Wang
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Yingfen Jiang
- Institute of Crop Science, Anhui Academy of Agricultural Science, Hefei, China
- Botao Ye
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Kunjiang Yu
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Qian Wang
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Xu Yang
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Jinze Zhang
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Qingjing Ouyang
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Hairui Jin
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
- Entang Tian
- Agricultural College of Guizhou University, Guizhou University, Guiyang, China
8
Ouedraogo WYDD, Ouangraoua A. Orthology and Paralogy Relationships at Transcript Level. J Comput Biol 2024; 31:277-293. [PMID: 38621191 DOI: 10.1089/cmb.2023.0400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024] Open
Abstract
Eukaryotic genes undergo a mechanism called alternative processing, resulting in transcriptome diversity by allowing the production of multiple distinct transcripts from a gene. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we present the definition of orthology and paralogy between transcripts of homologous genes, together with an algorithm to compute clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are utilized to define various types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is employed to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those from previous studies that compared orthologous gene transcripts. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond the scope of orthologous genes, is likely to yield valuable information.
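The Reciprocal Best Hits idea mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not the authors' algorithm: the transcript identifiers, the similarity scores, and the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and the paper's own transcript-level similarity measure is not reproduced here. Two transcripts are paired only when each is the other's highest-scoring match.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (query_id, target_id, similarity_score); higher scores are better.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return, for every query, the target with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        # Keep the first-seen target on ties (strictly greater replaces).
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between transcripts of gene A (a1..a3) and gene B (b1, b2).
a_vs_b = [("a1", "b1", 95.0), ("a1", "b2", 60.0), ("a2", "b2", 80.0), ("a3", "b2", 85.0)]
b_vs_a = [("b1", "a1", 94.0), ("b2", "a3", 85.0), ("b2", "a2", 79.0)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, `a2` has `b2` as its best hit, but `b2` prefers `a3`, so `a2` is left unpaired; only mutual best matches are reported.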
Affiliation(s)
- Aida Ouangraoua
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
9
Domazet-Lošo M, Široki T, Šimičević K, Domazet-Lošo T. Macroevolutionary dynamics of gene family gain and loss along multicellular eukaryotic lineages. Nat Commun 2024; 15:2663. [PMID: 38531970 DOI: 10.1038/s41467-024-47017-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
The gain and loss of genes fluctuate over evolutionary time in major eukaryotic clades. However, the full profile of these macroevolutionary trajectories is still missing. To give a more inclusive view on the changes in genome complexity across the tree of life, here we recovered the evolutionary dynamics of gene family gain and loss ranging from the ancestor of cellular organisms to 352 eukaryotic species. We show that in all considered lineages the gene family content follows a common evolutionary pattern, where the number of gene families reaches the highest value at a major evolutionary and ecological transition, and then gradually decreases towards extant organisms. This supports theoretical predictions and suggests that the genome complexity is often decoupled from commonly perceived organismal complexity. We conclude that simplification by gene family loss is a dominant force in Phanerozoic genomes of various lineages, probably underpinned by intense ecological specializations and functional outsourcing.
Affiliation(s)
- Mirjana Domazet-Lošo
- Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia.
- Tin Široki
- Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Korina Šimičević
- Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tomislav Domazet-Lošo
- Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000, Zagreb, Croatia.
10
Beavan A, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci U S A 2024; 121:e2304934120. [PMID: 38147560 PMCID: PMC10769857 DOI: 10.1073/pnas.2304934120] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/27/2023] [Accepted: 11/05/2023] [Indexed: 12/28/2023] Open
Abstract
Pangenomes exhibit remarkable variability in many prokaryotic species, much of which is maintained through the processes of horizontal gene transfer and gene loss. Repeated acquisitions of near-identical homologs can easily be observed across pangenomes, leading to the question of whether these parallel events potentiate similar evolutionary trajectories, or whether the remarkably different genetic backgrounds of the recipients mean that postacquisition evolutionary trajectories end up being quite different. In this study, we present a machine learning method that predicts the presence or absence of genes in the Escherichia coli pangenome based on complex patterns of the presence or absence of other accessory genes within a genome. Our analysis leverages the repeated transfer of genes through the E. coli pangenome to observe patterns of repeated evolution following similar events. We find that the presence or absence of a substantial set of genes is highly predictable from other genes alone, indicating that selection potentiates and maintains gene-gene co-occurrence and avoidance relationships deterministically over long-term bacterial evolution and is robust to differences in host evolutionary history. We propose that at least part of the pangenome can be understood as a set of genes with relationships that govern their likely cohabitants, analogous to an ecosystem's set of interacting organisms. Our findings indicate that intragenomic gene fitness effects may be key drivers of prokaryotic evolution, influencing the repeated emergence of complex gene-gene relationships across the pangenome.
Affiliation(s)
- Alan Beavan
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
| | - Maria Rosa Domingo-Sananes
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
- School of Science and Technology, Nottingham Trent University, Nottingham NG1 4FQ, United Kingdom
| | - James O. McInerney
- School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
|
11
|
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia.
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia.
| | - Philipp E Bayer
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Cassandria G Tay Fernandez
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Patrick M Finnegan
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
|
12
|
Li S, Nakayama H, Sinha NR. How to utilize comparative transcriptomics to dissect morphological diversity in plants. Curr Opin Plant Biol 2023; 76:102474. [PMID: 37804608 DOI: 10.1016/j.pbi.2023.102474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Revised: 09/11/2023] [Accepted: 09/11/2023] [Indexed: 10/09/2023]
Abstract
Comparative transcriptomics has emerged as a powerful approach that allows us to unravel the genetic basis of organ morphogenesis and its diversification processes during evolution. However, the application of comparative transcriptomics in studying plant morphological diversity faces challenges such as identifying homologous gene pairs, selecting appropriate developmental stages for comparison, and extracting biologically meaningful networks. Methods such as phylostratigraphy, clustering, and gene co-expression networks are explored to identify functionally equivalent genes, align developmental stages, and uncover gene regulatory relationships. In the current review, we highlight the importance of these approaches in overcoming the complexity of plant genomes, the impact of heterochrony on stage alignment, and the integration of gene networks with additional data for a comprehensive understanding of morphological evolution.
Affiliation(s)
- Siyu Li
- Department of Plant Biology, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
| | - Hokuto Nakayama
- Department of Plant Biology, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA; Graduate School of Science, Department of Biological Sciences, The University of Tokyo, Science Build. #2, 7-3-1 Hongo Bunkyo-ku Tokyo, 113-0033, Japan
| | - Neelima R Sinha
- Department of Plant Biology, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.
|
13
|
Song Y, Miao Z, Brazma A, Papatheodorou I. Benchmarking strategies for cross-species integration of single-cell RNA sequencing data. Nat Commun 2023; 14:6495. [PMID: 37838716 PMCID: PMC10576752 DOI: 10.1038/s41467-023-41855-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/11/2022] [Accepted: 09/21/2023] [Indexed: 10/16/2023] Open
Abstract
The growing number of available single-cell gene expression datasets from different species creates opportunities to explore evolutionary relationships between cell types across species. Cross-species integration of single-cell RNA-sequencing data has been particularly informative in this context. However, in order to do so robustly it is essential to have rigorous benchmarking and appropriate guidelines to ensure that integration results truly reflect biology. Here, we benchmark 28 combinations of gene homology mapping methods and data integration algorithms in a variety of biological settings. We examine the capability of each strategy to perform species-mixing of known homologous cell types and to preserve biological heterogeneity using 9 established metrics. We also develop a new biology conservation metric to address the maintenance of cell type distinguishability. Overall, scANVI, scVI and SeuratV4 methods achieve a balance between species-mixing and biology conservation. For evolutionarily distant species, including in-paralogs is beneficial. SAMap outperforms when integrating whole-body atlases between species with challenging gene homology annotation. We provide our freely available cross-species integration and assessment pipeline to help analyse new data and develop new algorithms.
Affiliation(s)
- Yuyao Song
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.
| | - Zhichao Miao
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou, 510005, China
| | - Alvis Brazma
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom
| | - Irene Papatheodorou
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, United Kingdom.
|
14
|
Belcour A, Got J, Aite M, Delage L, Collén J, Frioux C, Leblanc C, Dittami SM, Blanquart S, Markov GV, Siegel A. Inferring and comparing metabolism across heterogeneous sets of annotated genomes using AuCoMe. Genome Res 2023; 33:972-987. [PMID: 37468308 PMCID: PMC10629481 DOI: 10.1101/gr.277056.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/22/2022] [Accepted: 05/23/2023] [Indexed: 07/21/2023]
Abstract
Comparative analysis of genome-scale metabolic networks (GSMNs) may yield important information on the biology, evolution, and adaptation of species. However, it is impeded by the high heterogeneity of the quality and completeness of structural and functional genome annotations, which may bias the results of such comparisons. To address this issue, we developed AuCoMe, a pipeline to automatically reconstruct homogeneous GSMNs from a heterogeneous set of annotated genomes without discarding available manual annotations. We tested AuCoMe with three data sets, one bacterial, one fungal, and one algal, and showed that it successfully reduces technical biases while capturing the metabolic specificities of each organism. Our results also point out shared and divergent metabolic traits among evolutionarily distant algae, underlining the potential of AuCoMe to accelerate the broad exploration of metabolic evolution across the tree of life.
Affiliation(s)
- Arnaud Belcour
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France;
| | - Jeanne Got
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
| | - Méziane Aite
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
| | - Ludovic Delage
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | - Jonas Collén
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | | | - Catherine Leblanc
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | - Simon M Dittami
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | | | - Gabriel V Markov
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | - Anne Siegel
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France;
|
15
|
Titus-McQuillan JE, Nanni AV, McIntyre LM, Rogers RL. Estimating transcriptome complexities across eukaryotes. BMC Genomics 2023; 24:254. [PMID: 37170194 PMCID: PMC10173493 DOI: 10.1186/s12864-023-09326-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/02/2023] [Accepted: 04/20/2023] [Indexed: 05/13/2023] Open
Abstract
BACKGROUND Genomic complexity is a growing field of evolution, with case studies for comparative evolutionary analyses in model and emerging non-model systems. Understanding complexity and the functional components of the genome is an untapped wealth of knowledge ripe for exploration. With the "remarkable lack of correspondence" between genome size and complexity, there needs to be a way to quantify complexity across organisms. In this study, we use a set of complexity metrics that allow for evaluating changes in complexity using TranD. RESULTS We ascertain if complexity is increasing or decreasing across transcriptomes and at what structural level, as complexity varies. In this study, we define three metrics (TpG, EpT, and EpG) to quantify the transcriptome's complexity that encapsulates the dynamics of alternative splicing. Here we compare complexity metrics across 1) whole genome annotations, 2) a filtered subset of orthologs, and 3) novel genes to elucidate the impacts of orthologs and novel genes in transcript model analysis. Effective Exon Number (EEN) is used to compare the distribution of exon sizes within transcripts against random expectations of uniform exon placement. EEN accounts for differences in exon size, which is important because novel gene differences in complexity for orthologs and whole-transcriptome analyses are biased towards low-complexity genes with few exons and few alternative transcripts. CONCLUSIONS With our metric analyses, we are able to quantify changes in complexity across diverse lineages with greater precision and accuracy than previous cross-species comparisons under ortholog conditioning. These analyses represent a step toward whole-transcriptome analysis in the emerging field of non-model evolutionary genomics, with key insights for evolutionary inference of complexity changes on deep timescales across the tree of life. We suggest a means to quantify biases generated in ortholog calling and correct complexity analysis for lineage-specific effects. With these metrics, we directly assay the quantitative properties of newly formed lineage-specific genes as they lower complexity.
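The abstract names the three metrics without defining them. The sketch below illustrates one plausible reading, assuming TpG means transcripts per gene, EpT means exons per transcript, and EpG means unique exons per gene; these expansions, the function name `complexity_metrics`, and the toy annotation are all assumptions for illustration and are not taken from TranD itself.

```python
from collections import defaultdict

# Toy annotation rows: (gene_id, transcript_id, exon_start, exon_end).
# All identifiers and coordinates are invented for illustration.
EXONS = [
    ("geneA", "A.1", 100, 200),
    ("geneA", "A.1", 300, 400),
    ("geneA", "A.2", 100, 200),
    ("geneA", "A.2", 500, 600),
    ("geneB", "B.1", 50, 150),
]


def complexity_metrics(exons):
    """Return mean TpG, EpT and EpG under the assumed definitions."""
    transcripts_by_gene = defaultdict(set)
    exons_by_transcript = defaultdict(set)
    exons_by_gene = defaultdict(set)
    for gene, transcript, start, end in exons:
        transcripts_by_gene[gene].add(transcript)
        # Key transcripts by (gene, transcript) so IDs need not be globally unique.
        exons_by_transcript[(gene, transcript)].add((start, end))
        # Exons shared between transcripts of one gene are counted once per gene.
        exons_by_gene[gene].add((start, end))

    n_genes = len(transcripts_by_gene)
    n_transcripts = len(exons_by_transcript)
    return {
        "TpG": sum(len(t) for t in transcripts_by_gene.values()) / n_genes,
        "EpT": sum(len(e) for e in exons_by_transcript.values()) / n_transcripts,
        "EpG": sum(len(e) for e in exons_by_gene.values()) / n_genes,
    }


metrics = complexity_metrics(EXONS)
print(metrics)
```

In this toy case geneA has two transcripts that share one exon, so it contributes three unique exons to EpG rather than four; whether shared exons should be collapsed this way is a design choice of the sketch, not something the abstract states.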
Affiliation(s)
- James E Titus-McQuillan
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
| | - Adalena V Nanni
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
| | - Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
| | - Rebekah L Rogers
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
|
16
|
Laslo M, Just J, Angelini DR. Theme and variation in the evolution of insect sex determination. J Exp Zool B Mol Dev Evol 2023; 340:162-181. [PMID: 35239250 PMCID: PMC10078687 DOI: 10.1002/jez.b.23125] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/07/2021] [Revised: 11/24/2021] [Accepted: 01/03/2022] [Indexed: 11/07/2022]
Abstract
The development of dimorphic adult sexes is a critical process for most animals, one that is subject to intense selection. Work in vertebrate and insect model species has revealed that sex determination mechanisms vary widely among animal groups. However, this variation is not uniform, with a limited number of conserved factors. Therefore, sex determination offers an excellent context to consider themes and variations in gene network evolution. Here we review the literature describing sex determination in diverse insects. We have screened public genomic sequence databases for orthologs and duplicates of 25 genes involved in insect sex determination, identifying patterns of presence and absence. These genes and a reference set of 43 others were used to infer phylogenies and compared to accepted organismal relationships to examine patterns of congruence and divergence. The function of candidate genes for roles in sex determination (virilizer, female-lethal-2-d, transformer-2) and sex chromosome dosage compensation (male specific lethal-1, msl-2, msl-3) were tested using RNA interference in the milkweed bug, Oncopeltus fasciatus. None of these candidate genes exhibited conserved roles in these processes. Amidst this variation we wish to highlight the following themes for the evolution of sex determination: (1) Unique features within taxa influence network evolution. (2) Their position in the network influences a component's evolution. Our analyses also suggest an inverse association of protein sequence conservation with functional conservation.
Affiliation(s)
- Mara Laslo
- Department of Cell Biology, Curriculum Fellows Program, Harvard Medical School, 25 Shattuck St, Boston, Massachusetts, USA
| | - Josefine Just
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St, Cambridge, Massachusetts, USA
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
| | - David R. Angelini
- Department of Biology, Colby College, 5734 Mayflower Hill Dr, Waterville, Maine, USA
|
17
|
Kress A, Poch O, Lecompte O, Thompson JD. Real or fake? Measuring the impact of protein annotation errors on estimates of domain gain and loss events. Front Bioinform 2023; 3:1178926. [PMID: 37151482 PMCID: PMC10158824 DOI: 10.3389/fbinf.2023.1178926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023] Open
Abstract
Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the UniProt and InterPro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins. Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.
|
18
|
Drobek M. Paralogous Genes Involved in Embryonic Development: Lessons from the Eye and Other Tissues. Genes (Basel) 2022; 13:2082. [PMID: 36360318 PMCID: PMC9690401 DOI: 10.3390/genes13112082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2022] [Revised: 10/23/2022] [Accepted: 11/05/2022] [Indexed: 07/09/2024] Open
Abstract
During evolution, gene duplications lead to a naturally increased gene dosage. Duplicated genes can be further retained or eliminated over time by purifying selection pressure. The retention probability is increased by functional diversification and by the acquisition of novel functions. Interestingly, functionally diverged paralogous genes can maintain a certain level of functional redundancy and at least a partial ability to replace each other. In such cases, diversification probably occurred at the level of transcriptional regulation. Nevertheless, some duplicated genes can maintain functional redundancy after duplication and the ability to functionally compensate for the loss of each other. Many of them are involved in proper embryonic development. The development of particular tissues/organs and developmental processes can be more or less sensitive to the overall gene dosage. Alterations in the gene dosage or a decrease below a threshold level may have dramatic phenotypic consequences or even lead to embryonic lethality. The number of functional alleles of particular paralogous genes and their mutual cooperation and interactions influence the gene dosage, and therefore, these factors play a crucial role in development. This review will discuss individual interactions between paralogous genes and gene dosage sensitivity during development. The eye was used as a model system, but other tissues are also included.
Affiliation(s)
- Michaela Drobek
- Laboratory of Transcriptional Regulation, Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Praha 4, Czech Republic
- Laboratory of RNA Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Praha 4, Czech Republic
|
19
|
Cenci A, Concepción-Hernández M, Guignon V, Angenon G, Rouard M. Genome-Wide Classification and Phylogenetic Analyses of the GDSL-Type Esterase/Lipase (GELP) Family in Flowering Plants. Int J Mol Sci 2022; 23:ijms232012114. [PMID: 36292971 PMCID: PMC9602515 DOI: 10.3390/ijms232012114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 10/05/2022] [Accepted: 10/07/2022] [Indexed: 11/16/2022] Open
Abstract
GDSL-type esterase/lipase (GELP) enzymes have key functions in plants, such as developmental processes, anther and pollen development, and responses to biotic and abiotic stresses. Genes that encode GELP belong to a complex and large gene family, ranging from tens to more than hundreds of members per plant species. To facilitate functional transfer between them, we conducted a genome-wide classification of GELP in 46 plant species. First, we applied an iterative phylogenetic method using a selected set of representative angiosperm genomes (three monocots and five dicots) and identified 10 main clusters, subdivided into 44 orthogroups (OGs). An expert curation for gene structures, orthogroup composition, and functional annotation was made based on a literature review. Then, using the HMM profiles as seeds, we expanded the classification to 46 plant species. Our results revealed the variable evolutionary dynamics between OGs in which some expanded, mostly through tandem duplications, while others were maintained as single copies. Among these, dicot-specific clusters and specific amplifications in monocots and wheat were characterized. This approach, by combining manual curation and automatic identification, was effective in characterizing a large gene family, allowing the establishment of a classification framework for gene function transfer and a better understanding of the evolutionary history of GELP.
Affiliation(s)
- Alberto Cenci
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- Correspondence: (A.C.); (M.R.)
| | - Mairenys Concepción-Hernández
- Instituto de Biotecnología de las Plantas, Universidad Central “Marta Abreu” de Las Villas (UCLV), Carretera a Camajuaní km 5.5, Santa Clara C.P. 54830, Villa Clara, Cuba
- Research Group Plant Genetics, Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium
| | - Valentin Guignon
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
| | - Geert Angenon
- Research Group Plant Genetics, Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium
| | - Mathieu Rouard
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- Correspondence: (A.C.); (M.R.)
|
20
|
Almeida de Jesus D, Batista DM, Monteiro EF, Salzman S, Carvalho LM, Santana K, André T. Structural changes and adaptative evolutionary constraints in FLOWERING LOCUS T and TERMINAL FLOWER1-like genes of flowering plants. Front Genet 2022; 13:954015. [PMID: 36246591 PMCID: PMC9556947 DOI: 10.3389/fgene.2022.954015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022] Open
Abstract
Regulation of flowering is a crucial event in the evolutionary history of angiosperms. The production of flowers is regulated through the integration of different environmental and endogenous stimuli, many of which involve the activation of different genes in a hierarchical and complex signaling network. The FLOWERING LOCUS T/TERMINAL FLOWER 1 (FT/TFL1) gene family is known to regulate important aspects of flowering in plants. To better understand the pivotal events that changed FT and TFL1 functions during the evolution of angiosperms, we reconstructed the ancestral sequences of FT/TFL1-like genes and predicted protein structures through in silico modeling to identify determinant sites that evolved in both proteins and allowed the adaptative diversification in the flowering phenology and developmental processes. In addition, we demonstrate that the occurrence of destabilizing mutations in residues located at the phosphatidylcholine binding sites of FT structure are under positive selection, and some residues of 4th exon are under negative selection, which is compensated by the occurrence of stabilizing mutations in key regions and the P-loop to maintain the overall protein stability. Our results shed light on the evolutionary history of key genes involved in the diversification of angiosperms.
Affiliation(s)
- Deivid Almeida de Jesus
- Institute of Biology Genetics Graduate Program, Federal University of Rio de Janeiro Rio de Janeiro, Rio de Janeiro, Brazil
| | - Darlisson Mesquista Batista
- Programa de Pós-Graduação em Biodiversidade, Universidade Federal do Oeste do Pará Santarém, Pará, Santarém, Brazil
| | - Elton Figueira Monteiro
- Programa de Pós-Graduação em Biodiversidade, Universidade Federal do Oeste do Pará Santarém, Pará, Santarém, Brazil
| | - Shayla Salzman
- School of Integrative Plant Sciences. Section of Plant Biology. Cornell University Ithaca, New York, NY, United States
| | - Lucas Miguel Carvalho
- Center for Computing in Engineering and Sciences, State University of Campinas. Campinas, São Paulo, Brazil
| | - Kauê Santana
- Institute of Biodiversity, Federal University of Western Pará Santarém Pará, Santarém, Brazil
- *Correspondence: Kauê Santana; Thiago André
| | - Thiago André
- Botany Department, University of Brasília, Brasília, Brazil
- *Correspondence: Kauê Santana; Thiago André
|
21
|
Ahsan F, Yan Z, Precup D, Blanchette M. PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information. Bioinformatics 2022; 38:i299-i306. [PMID: 35758792 PMCID: PMC9235490 DOI: 10.1093/bioinformatics/btac259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
Abstract
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle, easy to implement and yet yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Faizy Ahsan
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
| | - Zichao Yan
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
| | - Doina Precup
- School of Computer Science, McGill University, Montreal H3A 0G4, Canada
|
22
|
Tanabe TS, Dahl C. HMS-S-S: a tool for the identification of sulfur metabolism-related genes and analysis of operon structures in genome and metagenome assemblies. Mol Ecol Resour 2022; 22:2758-2774. [PMID: 35579058 DOI: 10.1111/1755-0998.13642] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/02/2022] [Revised: 04/25/2022] [Accepted: 05/11/2022] [Indexed: 11/26/2022]
Abstract
Sulfur compounds are used in a variety of biological processes including respiration and photosynthesis. Sulfide and sulfur compounds of intermediary oxidation state can serve as electron donors for lithotrophic growth while sulfate, thiosulfate and sulfur are used as electron acceptors in anaerobic respiration. The biochemistry underlying the manifold transformations of inorganic sulfur compounds occurring in sulfur metabolizing prokaryotes is astonishingly complex and knowledge about it has immensely increased over the last years. The advent of next-generation sequencing approaches as well as the significant increase of data availability in public databases has driven focus of environmental microbiology to probing the metabolic capacity of microbial communities by analysis of this sequence data. To facilitate these analyses, we created HMS-S-S, a comprehensive equivalogous hidden Markov model (HMM)-supported tool. Protein sequences related to sulfur compound oxidation, reduction, transport and intracellular transfer are efficiently detected and related enzymes involved in dissimilatory sulfur oxidation as opposed to sulfur compound reduction can be confidently distinguished. HMM search results are coupled to corresponding genes, which allows analysis of co-occurrence, synteny and genomic neighborhood. The HMMs were validated on an annotated test dataset and by cross-validation. We also proved its performance by exploring meta-assembled genomes isolated from samples from environments with active sulfur cycling, including members of the cable bacteria, novel Acidobacteria and assemblies from a sulfur-rich glacier, and were able to replicate and extend previous reports.
Affiliation(s)
- Tomohisa Sebastian Tanabe
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Christiane Dahl
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| |
|
23
|
Long HS, Greenaway S, Powell G, Mallon AM, Lindgren CM, Simon MM. Making sense of the linear genome, gene function and TADs. Epigenetics Chromatin 2022; 15:4. [PMID: 35090532 PMCID: PMC8800309 DOI: 10.1186/s13072-022-00436-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/18/2021] [Accepted: 01/06/2022] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Topologically associating domains (TADs) are thought to act as functional units in the genome. TADs co-localise genes and their regulatory elements as well as forming the unit of genome switching between active and inactive compartments. This has led to the speculation that genes which are required for similar processes may fall within the same TADs, allowing them to share regulatory programs and efficiently switch between chromatin compartments. However, evidence to link genes within TADs to the same regulatory program is limited. RESULTS We investigated the functional similarity of genes which fall within the same TAD. To do this we developed a TAD randomisation algorithm to generate sets of "random TADs" to act as null distributions. We found that while pairs of paralogous genes are enriched in TADs overall, they are largely depleted in TADs with CCCTC-binding factor (CTCF) ChIP-seq peaks at both boundaries. By assessing gene constraint as a proxy for functional importance we found that genes which singly occupy a TAD have greater functional importance than genes which share a TAD, and these genes are enriched for developmental processes. We found little evidence that pairs of genes in CTCF bound TADs are more likely to be co-expressed or share functional annotations than can be explained by their linear proximity alone. CONCLUSIONS These results suggest that algorithmically defined TADs consist of two functionally different groups, those which are bound by CTCF and those which are not. We detected no association between genes sharing the same CTCF TADs and increased co-expression or functional similarity, other than that explained by linear genome proximity. We do, however, find that functionally important genes are more likely to fall within a TAD on their own suggesting that TADs play an important role in the insulation of these genes.
Affiliation(s)
- Helen S Long
- Nuffield Department of Medicine, University of Oxford, Oxford, UK.
- Mammalian Genetics Unit, Harwell Institute, Didcot, UK.
| | | | - George Powell
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Mammalian Genetics Unit, Harwell Institute, Didcot, UK
| | | | - Cecilia M Lindgren
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | |
|
24
|
Vazquez JM, Pena MT, Muhammad B, Kraft M, Adams LB, Lynch VJ. Parallel evolution of reduced cancer risk and tumor suppressor duplications in Xenarthra. eLife 2022; 11:82558. [PMID: 36480266 PMCID: PMC9810328 DOI: 10.7554/elife.82558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
The risk of developing cancer is correlated with body size and lifespan within species, but there is no correlation between cancer and either body size or lifespan between species, indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Previously we showed that several large-bodied Afrotherian lineages evolved reduced intrinsic cancer risk, particularly elephants and their extinct relatives (Proboscideans), coincident with pervasive duplication of tumor suppressor genes (Vazquez and Lynch, 2021). Unexpectedly, we also found that Xenarthrans (sloths, armadillos, and anteaters) evolved very low intrinsic cancer risk. Here, we show that: (1) several Xenarthran lineages independently evolved large bodies, long lifespans, and reduced intrinsic cancer risk; (2) the reduced cancer risk in the stem lineages of Xenarthra and Pilosa coincided with bursts of tumor suppressor gene duplications; (3) cells from sloths proliferate extremely slowly while Xenarthran cells induce apoptosis at very low doses of DNA damaging agents; and (4) the prevalence of cancer is extremely low in Xenarthrans, and cancer is nearly absent from armadillos. These data implicate the duplication of tumor suppressor genes in the evolution of remarkably large body sizes and decreased cancer risk in Xenarthrans and suggest they are a remarkably cancer-resistant group of mammals.
Affiliation(s)
- Juan Manuel Vazquez
- Department of Integrative Biology, Valley Life Sciences, University of California, Berkeley, Berkeley, United States
| | - Maria T Pena
- United States Department of Health and Human Services, Health Resources and Services Administration, Health Systems Bureau, National Hansen's Disease Program, Baton Rouge, United States
| | - Baaqeyah Muhammad
- Department of Biological Sciences, University at Buffalo, SUNY, Buffalo, United States
| | - Morgan Kraft
- Department of Biological Sciences, University at Buffalo, SUNY, Buffalo, United States
| | - Linda B Adams
- United States Department of Health and Human Services, Health Resources and Services Administration, Health Systems Bureau, National Hansen's Disease Program, Baton Rouge, United States
| | - Vincent J Lynch
- Department of Biological Sciences, University at Buffalo, SUNY, Buffalo, United States
| |
|
25
|
Begum T, Serrano‐Serrano ML, Robinson‐Rechavi M. Performance of a phylogenetic independent contrast method and an improved pairwise comparison under different scenarios of trait evolution after speciation and duplication. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Tina Begum
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Martha Liliana Serrano‐Serrano
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson‐Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
26
|
Mo N, Zhang X, Shi W, Yu G, Chen X, Yang JR. Bidirectional Genetic Control of Phenotypic Heterogeneity and Its Implication for Cancer Drug Resistance. Mol Biol Evol 2021; 38:1874-1887. [PMID: 33355660 PMCID: PMC8097262 DOI: 10.1093/molbev/msaa332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023] Open
Abstract
Negative genetic regulators of phenotypic heterogeneity, or phenotypic capacitors/stabilizers, elevate population average fitness by limiting deviation from the optimal phenotype and increase the efficacy of natural selection by enhancing the phenotypic differences among genotypes. Stabilizers can presumably be switched off to release phenotypic heterogeneity in the face of extreme or fluctuating environments to ensure population survival. This task could, however, also be achieved by positive genetic regulators of phenotypic heterogeneity, or "phenotypic diversifiers," as shown by recently reported evidence that a bacterial divisome factor enhances antibiotic resistance. We hypothesized that such active creation of phenotypic heterogeneity by diversifiers, which is functionally independent of stabilizers, is more common than previously recognized. Using morphological phenotypic data from 4,718 single-gene knockout strains of Saccharomyces cerevisiae, we systematically identified 324 stabilizers and 160 diversifiers and constructed a bipartite network between these genes and the morphological traits they control. Further analyses showed that, compared with stabilizers, diversifiers tended to be weaker and more promiscuous (regulating more traits) regulators targeting traits unrelated to fitness. Moreover, there is a general division of labor between stabilizers and diversifiers. Finally, by incorporating NCI-60 human cancer cell line anticancer drug screening data, we found that human one-to-one orthologs of yeast diversifiers/stabilizers likely regulate the anticancer drug resistance of human cancer cell lines, suggesting that these orthologs are potential targets for auxiliary treatments. Our study therefore highlights stabilizers and diversifiers as the genetic regulators for the bidirectional control of phenotypic heterogeneity as well as their distinct evolutionary roles and functional independence.
Affiliation(s)
- Ning Mo
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Xiaoyu Zhang
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Wenjun Shi
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Gongwang Yu
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Xiaoshu Chen
- Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Corresponding author
| | - Jian-Rong Yang
- Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, China
- RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
- Corresponding author
| |
|
27
|
Begum T, Robinson-Rechavi M. Special Care Is Needed in Applying Phylogenetic Comparative Methods to Gene Trees with Speciation and Duplication Nodes. Mol Biol Evol 2021; 38:1614-1626. [PMID: 33169790 PMCID: PMC8042747 DOI: 10.1093/molbev/msaa288] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022] Open
Abstract
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the “ortholog conjecture,” the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis on the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases, due to the inclusion of trees with no duplication nodes, to the relative age of speciations and duplications, to systematic differences in branch lengths, and to non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic method in empirical gene trees with duplications can be problematic. Controlling for biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
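The abstract above refers to tissue specificity (τ) without giving its formula. As a minimal sketch, assuming τ denotes the widely used index that averages, over n tissues, one minus each expression value normalised by the gene's maximum, it could be computed as follows. The function name `tau` and the toy expression profiles are illustrative and are not taken from the paper.

```python
def tau(expression):
    """Tissue-specificity index for one gene.

    `expression` is a sequence of non-negative expression values, one per
    tissue. Returns 0.0 for perfectly uniform expression and 1.0 when the
    gene is expressed in a single tissue only.
    Assumed definition: tau = sum(1 - x_i / max(x)) / (n - 1).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene has no detectable expression")
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical profiles across tissues.
housekeeping = [5, 5, 5, 5]   # same level everywhere
specific = [0, 0, 0, 8]       # a single tissue
intermediate = [10, 5, 0]

print(tau(housekeeping), tau(specific), tau(intermediate))
```

A pairwise comparison of the kind the abstract mentions would then contrast τ between two orthologs or two paralogs, whereas phylogenetic independent contrasts additionally use the gene tree and its branch lengths; neither step is reproduced in this sketch.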
Affiliation(s)
- Tina Begum
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
28
|
Tarashansky AJ, Musser JM, Khariton M, Li P, Arendt D, Quake SR, Wang B. Mapping single-cell atlases throughout Metazoa unravels cell type evolution. eLife 2021; 10:e66747. [PMID: 33944782 PMCID: PMC8139856 DOI: 10.7554/elife.66747] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 01/26/2021] [Accepted: 04/30/2021] [Indexed: 12/11/2022] Open
Abstract
Comparing single-cell transcriptomic atlases from diverse organisms can elucidate the origins of cellular diversity and assist the annotation of new cell atlases. Yet, comparison between distant relatives is hindered by complex gene histories and diversifications in expression programs. Previously, we introduced the self-assembling manifold (SAM) algorithm to robustly reconstruct manifolds from single-cell data (Tarashansky et al., 2019). Here, we build on SAM to map cell atlas manifolds across species. This new method, SAMap, identifies homologous cell types with shared expression programs across distant species within phyla, even in complex examples where homologous tissues emerge from distinct germ layers. SAMap also finds many genes with more similar expression to their paralogs than their orthologs, suggesting paralog substitution may be more common in evolution than previously appreciated. Lastly, comparing species across animal phyla, spanning sponge to mouse, reveals ancient contractile and stem cell families, which may have arisen early in animal evolution.
Affiliation(s)
| | - Jacob M Musser
- European Molecular Biology Laboratory, Developmental Biology Unit, Heidelberg, Germany
| | | | - Pengyang Li
- Department of Bioengineering, Stanford University, Stanford, United States
| | - Detlev Arendt
- European Molecular Biology Laboratory, Developmental Biology Unit, Heidelberg, Germany
- Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
| | - Stephen R Quake
- Department of Bioengineering, Stanford University, Stanford, United States
- Department of Applied Physics, Stanford University, Stanford, United States
- Chan Zuckerberg Biohub, San Francisco, United States
| | - Bo Wang
- Department of Bioengineering, Stanford University, Stanford, United States
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, United States
| |
|
29
|
Schaller D, Geiß M, Stadler PF, Hellmuth M. Complete Characterization of Incorrect Orthology Assignments in Best Match Graphs. J Math Biol 2021; 82:20. [PMID: 33606106 PMCID: PMC7894253 DOI: 10.1007/s00285-021-01564-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/04/2020] [Revised: 09/23/2020] [Accepted: 12/21/2020] [Indexed: 02/06/2023]
Abstract
Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least [Formula: see text] of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
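The reciprocal best match idea that this abstract starts from can be illustrated with a small sketch. It assumes that best matches are approximated by the highest pairwise similarity score, which is a common practical shortcut rather than the evolutionary definition the paper works with, and it does not implement the paper's detection of unambiguous false positives. The function names and the toy scores are hypothetical.

```python
def best_matches(scores):
    """Return {(query, target_species): set of top-scoring targets}.

    `scores` maps ordered pairs (query, target) to a similarity score,
    where each gene is a (species, gene_name) tuple. Ties are kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query[0] == target[0]:
            continue  # best matches are defined between different species
        key = (query, target[0])
        top, hits = best.get(key, (float("-inf"), set()))
        if score > top:
            best[key] = (score, {target})
        elif score == top:
            hits.add(target)
            best[key] = (top, hits)
    return {key: hits for key, (_, hits) in best.items()}


def reciprocal_best_matches(scores):
    """Return unordered gene pairs that are best matches of each other."""
    bm = best_matches(scores)
    pairs = set()
    for (query, _species), hits in bm.items():
        for target in hits:
            # Keep the pair only if the query is also a best match of the target.
            if query in bm.get((target, query[0]), set()):
                pairs.add(frozenset((query, target)))
    return pairs


# Hypothetical scores between one gene of species A and two genes of species B.
a1, b1, b2 = ("A", "a1"), ("B", "b1"), ("B", "b2")
scores = {
    (a1, b1): 90, (a1, b2): 60,
    (b1, a1): 90, (b2, a1): 60,
}
print(reciprocal_best_matches(scores))
```

Here b2 has a1 as its best match in species A, but a1 prefers b1, so only the pair a1 and b1 is reported. In the paper's terms, each reported pair is an edge of the reciprocal best match graph and therefore only a candidate ortholog pair.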
Affiliation(s)
- David Schaller
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Manuela Geiß
- Software Competence Center Hagenberg GmbH, Softwarepark 21, A-4232 Hagenberg, Austria
| | - Peter F. Stadler
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Inst. f. Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad National de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
| | - Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, SE 106 91 Stockholm, Sweden
| |
|
30
|
Rosselli R, La Porta N, Muresu R, Stevanato P, Concheri G, Squartini A. Pangenomics of the Symbiotic Rhizobiales. Core and Accessory Functions Across a Group Endowed with High Levels of Genomic Plasticity. Microorganisms 2021; 9:microorganisms9020407. [PMID: 33669391 PMCID: PMC7920277 DOI: 10.3390/microorganisms9020407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/04/2020] [Revised: 02/10/2021] [Accepted: 02/11/2021] [Indexed: 11/16/2022] Open
Abstract
Pangenome analyses reveal major clues on evolutionary instances and critical genome core conservation. The order Rhizobiales encompasses several families with rather disparate ecological attitudes. Among them, Rhizobiaceae, Bradyrhizobiaceae, Phyllobacteriaceae and Xanthobacteriaceae include members proficient in mutualistic symbioses with plants based on the bacterial conversion of N2 into ammonia (nitrogen fixation). The pangenome of 12 nitrogen-fixing plant symbionts of the Rhizobiales was analyzed, yielding a total of 37,364 loci, with a core genome comprising 700 genes. The percentage of core genes averaged 10.2% over single genomes, and between 5% and 7% were found to be plasmid-associated. The comparison between a representative reference genome and the core genome subset showed the core genome to be highly enriched in genes for macromolecule metabolism, ribosomal constituents and overall translation machinery, while membrane/periplasm-associated genes and transport domains were under-represented. The analysis of protein functions revealed that between 1.7% and 4.9% of core proteins could putatively have different functions.
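The core-genome figures quoted in this abstract rest on simple set arithmetic, which the following sketch illustrates. It assumes that genes have already been clustered into families shared across genomes; the genome names, family identifiers and the helper `core_and_pan` are hypothetical and do not reproduce the authors' pipeline or data.

```python
def core_and_pan(genomes):
    """Return (core, pan) gene-family sets for a dict {genome: set of families}.

    The core genome is the set of families present in every genome; the
    pangenome is the set of families present in at least one genome.
    """
    family_sets = list(genomes.values())
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


# Toy input: three genomes described by the gene families they carry.
genomes = {
    "g1": {"f1", "f2", "f3", "f4"},
    "g2": {"f1", "f2", "f5"},
    "g3": {"f1", "f2", "f3", "f6", "f7"},
}

core, pan = core_and_pan(genomes)

# Share of each genome that belongs to the core, then the average across genomes.
core_share = {name: len(core) / len(families) for name, families in genomes.items()}
mean_core_share = sum(core_share.values()) / len(core_share)

print(sorted(core), len(pan), round(100 * mean_core_share, 1))
```

The averaged per-genome share is the toy counterpart of the "percentage of core genes averaged over single genomes" reported above; with real data the counts depend heavily on how gene families are defined.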
Affiliation(s)
- Riccardo Rosselli
- Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute of Sea Research, NL-1790 AB Den Burg, The Netherlands
- Departamento de Fisiología, Genética y Microbiología, Universidad de Alicante, 03690 Alicante, Spain
| | - Nicola La Porta
- Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, 38098 San Michele all’Adige, Italy
- MOUNTFOR Project Centre, European Forest Institute, 38098 San Michele all’Adige, Italy
| | - Rosella Muresu
- Institute of Animal Production Systems in Mediterranean Environments-National Research Council, 07040 Sassari, Italy
| | - Piergiorgio Stevanato
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
| | - Giuseppe Concheri
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
| | - Andrea Squartini
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, 35020 Legnaro, Italy
- Correspondence: Tel.: +39-049-8272-923
| |
|
31
|
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
|
32
|
Kim J, Coradetti ST, Kim YM, Gao Y, Yaegashi J, Zucker JD, Munoz N, Zink EM, Burnum-Johnson KE, Baker SE, Simmons BA, Skerker JM, Gladden JM, Magnuson JK. Multi-Omics Driven Metabolic Network Reconstruction and Analysis of Lignocellulosic Carbon Utilization in Rhodosporidium toruloides. Front Bioeng Biotechnol 2021; 8:612832. [PMID: 33585414 PMCID: PMC7873862 DOI: 10.3389/fbioe.2020.612832] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Accepted: 12/04/2020] [Indexed: 01/11/2023] Open
Abstract
The oleaginous yeast Rhodosporidium toruloides is a promising host for converting lignocellulosic biomass to bioproducts and biofuels. In this work, we performed multi-omics analysis of lignocellulosic carbon utilization in R. toruloides and reconstructed the genome-scale metabolic network of R. toruloides. High-quality metabolic network models for model organisms and orthologous protein mapping were used to build a draft metabolic network reconstruction. The reconstruction was manually curated to build a metabolic model using functional annotation and multi-omics data including transcriptomics, proteomics, metabolomics, and RB-TDNA sequencing. The multi-omics data and metabolic model were used to investigate R. toruloides metabolism including lipid accumulation and lignocellulosic carbon utilization. The developed metabolic model was validated against high-throughput growth phenotyping and gene fitness data, and further refined to resolve the inconsistencies between prediction and data. We believe that this is the most complete and accurate metabolic network model available for R. toruloides to date.
Affiliation(s)
- Joonhoon Kim
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Samuel T Coradetti
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Sandia National Laboratories, Livermore, CA, United States
| | - Young-Mo Kim
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Yuqian Gao
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Junko Yaegashi
- Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Jeremy D Zucker
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Nathalie Munoz
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Erika M Zink
- Pacific Northwest National Laboratory, Richland, WA, United States
| | - Kristin E Burnum-Johnson
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Scott E Baker
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| | - Blake A Simmons
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Lawrence Berkeley National Laboratory, Berkeley, CA, United States
| | - Jeffrey M Skerker
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, United States
| | - John M Gladden
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Sandia National Laboratories, Livermore, CA, United States
| | - Jon K Magnuson
- Department of Energy, Agile BioFoundry, Emeryville, CA, United States; Department of Energy, Joint BioEnergy Institute, Emeryville, CA, United States; Pacific Northwest National Laboratory, Richland, WA, United States
| |
|