1
Mwamburi SM, Kawato S, Furukawa M, Konishi K, Nozaki R, Hirono I, Kondo H. De Novo Assembly and Annotation of the Siganus fuscescens (Houttuyn, 1782) Genome: Marking a Pioneering Advance for the Siganidae Family. MARINE BIOTECHNOLOGY (NEW YORK, N.Y.) 2024; 26:902-916. [PMID: 38850360 DOI: 10.1007/s10126-024-10325-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024]
Abstract
This study presents the first draft genome of Siganus fuscescens, and thereby establishes the first whole-genome sequence for a species in the Siganidae family. Leveraging both long- and short-read sequencing technologies, i.e., Oxford Nanopore and Illumina sequencing, we assembled a mitogenome spanning 16.494 kb and a first haploid genome of 498 Mb. The assembled genome accounted for 99.6% of the estimated genome size and was organized into 164 contigs with an N50 of 7.2 Mb. This genome assembly showed a GC content of 42.9% and a high Benchmarking Universal Single-Copy Orthologue (BUSCO) completeness score of 99.5% using the actinopterygii_odb10 lineage, thereby meeting stringent quality standards. Beyond its structural aspects, our study also examined the functional genomics of this species, including its capacity to biosynthesize long-chain polyunsaturated fatty acids (LC-PUFAs) and to secrete venom. Our analyses revealed various repeat elements, which collectively constituted 17.43% of the genome. Moreover, annotation of 28,351 genes uncovered both shared genetic signatures and those unique to S. fuscescens. Our assembled genome also displayed a moderate prevalence of gene duplication compared to other fish species, which suggests that this species has a distinctive evolutionary trajectory and potentially unique functional constraints. Taken together, this genomic resource establishes a robust foundation for future research on the biology, evolution, and aquaculture potential of S. fuscescens.
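The contig N50 and GC content quoted in this abstract are standard assembly statistics. As a minimal illustration (not the authors' pipeline), the Python sketch below computes both; the function names `n50` and `gc_content` are hypothetical, and the GC calculation assumes ambiguous bases such as N are excluded from the denominator.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def gc_content(seqs):
    """Percent G+C over unambiguous bases (A, C, G, T).

    Assumption: N and other IUPAC ambiguity codes are ignored.
    """
    gc = at = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(n50([2, 3, 4, 5, 6]))            # 5
print(gc_content(["ACGT", "GGCC"]))    # 75.0
```

In practice the lengths would come from the assembled contigs (e.g. parsed from a FASTA file); the toy values above only show how the definitions behave.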
Affiliation(s)
- Samuel Mwakisha Mwamburi
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Department of Fisheries, Kenya Marine and Fisheries Research Institute, P.O BOX 81651-80100, Mombasa, Kenya
- Satoshi Kawato
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Miho Furukawa
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Kayo Konishi
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Reiko Nozaki
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Ikuo Hirono
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
- Hidehiro Kondo
- Department of Marine Biosciences, Tokyo University of Marine Science and Technology, Konan 4-5-7, Minato-ku, Tokyo, 108-8477, Japan
2
Gutiérrez EG, Maldonado JE, Castellanos-Morales G, Eguiarte LE, Martínez-Méndez N, Ortega J. Unraveling genomic features and phylogenomics through the analysis of three Mexican endemic Myotis genomes. PeerJ 2024; 12:e17651. [PMID: 38993980 PMCID: PMC11238727 DOI: 10.7717/peerj.17651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 06/07/2024] [Indexed: 07/13/2024] [Open Access]
Abstract
Background Genomic resource development for non-model organisms is rapidly progressing, seeking to uncover the molecular mechanisms and evolutionary adaptations that enable organisms to thrive in diverse environments. Limited genomic data for bat species hinder insights into their evolutionary processes, particularly within the diverse Myotis genus of the Vespertilionidae family. Fifteen Myotis species occur in Mexico, three of which (M. vivesi, M. findleyi, and M. planiceps) are endemic and of conservation concern. Methods We obtained samples of Myotis vivesi, M. findleyi, and M. planiceps for genomic analysis. Genomic DNA from each of the three species was extracted, sequenced, and assembled. Scaffolding was carried out with ntJoin using a genome-referenced approach, with the M. yumanensis genome as the reference. GapCloser was employed to fill gaps. Repeat elements were characterized, and gene prediction was performed with ab initio and homology-based methods in the MAKER pipeline. Functional annotation involved InterProScan, BLASTp, and KEGG. Non-coding RNAs were annotated with INFERNAL and tRNAscan-SE. Orthologous genes were clustered using OrthoFinder, and a phylogenomic tree was reconstructed using IQ-TREE. Results We present genome assemblies of these endemic species sequenced on the Illumina NovaSeq 6000, each exceeding 2.0 Gb, with BUSCO analyses indicating that over 90% of the benchmark single-copy genes are represented. Transposable elements, including LINEs and SINEs, constitute over 30% of each genome. Helitrons were identified, consistent with other vespertilionid genomes. Gene annotation yielded around 20,000 genes per assembly, which were associated with specific functions. Comparative analysis of orthologs among eight Myotis species revealed 20,820 orthogroups, 4,789 of which are single-copy orthogroups. Non-coding RNA elements were annotated. Phylogenomic tree analysis supported the evolutionary relationships among chiropterans.
These resources contribute significantly to understanding gene evolution and diversification patterns, and to aiding conservation efforts for these endangered bat species.
Affiliation(s)
- Edgar G. Gutiérrez
- Departamento de Zoología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, Mexico
- Jesus E. Maldonado
- Center for Conservation Genomics, Smithsonian’s National Zoo and Conservation Biology Institute, Washington, D.C., United States of America
- Gabriela Castellanos-Morales
- Departamento de Conservación de la Biodiversidad, El Colegio de la Frontera Sur, Unidad Villahermosa (ECOSUR-Villahermosa), Villahermosa, Tabasco, Mexico
- Luis E. Eguiarte
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
- Norberto Martínez-Méndez
- Departamento de Zoología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, Mexico
- Jorge Ortega
- Departamento de Zoología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, Mexico
3
Guerreiro R, Bonthala VS, Schlüter U, Hoang NV, Triesch S, Schranz ME, Weber APM, Stich B. A genomic panel for studying C3-C4 intermediate photosynthesis in the Brassiceae tribe. PLANT, CELL & ENVIRONMENT 2023; 46:3611-3627. [PMID: 37431820 DOI: 10.1111/pce.14662] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/02/2023] [Revised: 05/18/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Research on C4 and C3-C4 photosynthesis has attracted significant attention because understanding the genetic underpinnings of these traits will support the introduction of their characteristics into commercially relevant crop species. We used a panel of 19 taxa of 18 Brassiceae species with different photosynthesis characteristics (C3 and C3-C4) with the following objectives: (i) create draft genome assemblies and annotations, (ii) quantify orthology levels using synteny maps between all pairs of taxa, (iii) describe the phylogenetic relatedness across all the species, and (iv) track the evolution of C3-C4 intermediate photosynthesis in the Brassiceae tribe. Our results indicate that the draft de novo genome assemblies are of high quality and cover at least 90% of the gene space. With these assemblies, we more than doubled the sampling depth of genomes in the Brassiceae tribe, which comprises commercially important as well as biologically interesting species. The gene annotation generated high-quality gene models, and for most genes extensive upstream sequences are available for all taxa, offering the potential to explore variants in regulatory sequences. The genome-based phylogenetic tree of the Brassiceae contained two main clades and indicated that C3-C4 intermediate photosynthesis has evolved independently five times. Furthermore, our study provides the first genomic support for the hypothesis that Diplotaxis muralis is a natural hybrid of D. tenuifolia and D. viminea. Altogether, the de novo genome assemblies and annotations reported in this study are a valuable resource for research on the evolution of C3-C4 intermediate photosynthesis.
Affiliation(s)
- Ricardo Guerreiro
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Venkata Suresh Bonthala
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Urte Schlüter
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Nam V Hoang
- Biosystematics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Sebastian Triesch
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- M Eric Schranz
- Biosystematics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Andreas P M Weber
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Benjamin Stich
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, Köln, Germany
4
Rivetti C, Houghton J, Basili D, Hodges G, Campos B. Genes-to-Pathways Species Conservation Analysis: Enabling the Exploration of Conservation of Biological Pathways and Processes Across Species. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY 2023; 42:1152-1166. [PMID: 36861224 DOI: 10.1002/etc.5600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/19/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
The last two decades have witnessed strong momentum toward integrating cell-based and computational approaches in safety assessments. This is fueling a global regulatory paradigm shift toward reducing and replacing the use of animals in toxicity tests while promoting the use of new approach methodologies. Understanding the conservation of molecular targets and pathways provides an opportunity to extrapolate effects across species and ultimately to determine the taxonomic applicability domain of assays and biological effects. Despite the wealth of genome-linked data available, there is a compelling need to make these data more accessible while ensuring that they reflect the underlying biology. We present a novel pipeline, Genes-to-Pathways Species Conservation Analysis (G2P-SCAN), to further support the understanding of cross-species extrapolation of biological processes. This R package extracts, synthesizes, and structures the data available from different databases (gene orthologs, protein families, entities, and reactions) linked to human genes and their respective pathways across six relevant model species. G2P-SCAN enables an overall analysis of orthology and functional families to substantiate the identification of conservation and susceptibility at the pathway level. In the present study we discuss five case studies, demonstrating the validity of the developed pipeline and its potential use as species extrapolation support. We foresee that this pipeline will provide valuable biological insights and create space for the use of mechanistically based data to inform potential species susceptibility for research and safety decision purposes. Environ Toxicol Chem 2023;42:1152-1166. © 2023 UNILEVER GLOBAL IP LTD. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
Affiliation(s)
- Claudia Rivetti
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Bedfordshire, United Kingdom
- Jade Houghton
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Bedfordshire, United Kingdom
- Danilo Basili
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Bedfordshire, United Kingdom
- Geoff Hodges
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Bedfordshire, United Kingdom
- Bruno Campos
- Safety and Environmental Assurance Centre, Unilever, Colworth Science Park, Bedfordshire, United Kingdom
5
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Hiller M, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Integrating gene annotation with orthology inference at scale. Science 2023; 380:eabn3107. [PMID: 37104600 DOI: 10.1126/science.abn3107] [Citation(s) in RCA: 45] [Impact Index Per Article: 45.0] [Indexed: 04/29/2023]
Abstract
Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of genomes, which we demonstrate by applying it to 488 placental mammal and 501 bird assemblies, creating the largest comparative gene resources so far. Additionally, TOGA detects gene losses, enables selection screens, and automatically provides a superior measure of mammalian genome quality. TOGA is a powerful and scalable method to annotate and compare genes in the genomic era.
Affiliation(s)
- Bogdan M Kirilenko
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Chetan Munegowda
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- David Jebb
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Moritz Blumer
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Ariadna E Morales
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Alexis-Walid Ahmed
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Dimitrios-Georgios Kontopoulos
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Leon Hilgers
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
- Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
6
Liu K, Chen Q, Huang GH. An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF. Genes (Basel) 2023; 14:421. [PMID: 36833348 PMCID: PMC9957060 DOI: 10.3390/genes14020421] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/10/2023] [Open Access]
Abstract
Gene families, which are part of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses of the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF, a new feature selection algorithm that overcomes the inefficiencies of traditional methods, is used to select features from the gene feature matrix. Finally, a support vector machine is used to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a balance between robustness and discrimination. Additionally, the proposed method's classification performance is superior to that of state-of-the-art feature selection approaches.
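The framework above builds on ReliefF, a classic instance-based feature-weighting method. The sketch below implements plain ReliefF (the multi-class variant with k nearest hits and misses and range-normalised differences) in pure Python. It does not reproduce the NMF step or the authors' NMF-ReliefF, and the function name `relieff` and the toy data are hypothetical.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Classic ReliefF feature weights for numeric features (a sketch).

    X: list of samples, each a list of floats; y: list of class labels.
    Every sample is used once as the query instance.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    ranges = []
    for a in range(d):
        col = [row[a] for row in X]
        ranges.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / ranges[a]

    def dist(r1, r2):
        # Range-normalised Manhattan distance over all features.
        return sum(diff(a, r1, r2) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Rank all other samples by distance to sample i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        # Nearest same-class neighbours ("hits") lower the weight.
        hits = [j for j in others if y[j] == y[i]][:k]
        for a in range(d):
            w[a] -= sum(diff(a, X[i], X[j]) for j in hits) / (n * k)
        # Nearest other-class neighbours ("misses") raise it,
        # weighted by the prior of each other class.
        for c in priors:
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = priors[c] / (1.0 - priors[y[i]])
            for a in range(d):
                w[a] += scale * sum(diff(a, X[i], X[j]) for j in misses) / (n * k)
    return w


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.2], [1.0, 0.4], [0.9, 0.1]]
y = [0, 0, 1, 1]
print(relieff(X, y, k=1))  # feature 0 receives the higher weight
```

Features whose values differ more between nearest neighbours of different classes than between neighbours of the same class receive higher weights; a downstream classifier such as an SVM could then be trained on the top-weighted features.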
Affiliation(s)
- Kai Liu
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Qi Chen
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
7
Yepes-Blandón JA, Bian C, Benítez-Galeano MJ, Aristizabal-Regino JL, Estrada-Posada AL, Mir D, Vásquez-Machado G, Atencio-García VJ, Shi Q, Rodríguez-Osorio N. Draft genome assembly for the colombian freshwater bocachico fish, Prochilodus magdalenae. Front Genet 2023; 13:989788. [PMID: 36744175 PMCID: PMC9893009 DOI: 10.3389/fgene.2022.989788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 12/13/2022] [Indexed: 01/21/2023] [Open Access]
Abstract
We report the first draft genome assembly for Prochilodus magdalenae, the leading representative species of the Prochilodontidae family in Colombia. This 1.2-Gb assembly, with a GC content of 42.0% and a repetitive content of around 31.0%, is in the range of previously reported characid species genomes. Annotation identified 34,725 nuclear genes, and BUSCO completeness value was 94.9%. Gene ontology and primary metabolic pathway annotations indicate similar gene profiles for P. magdalenae and the closest species with annotated genomes: blind cave fish (Astyanax mexicanus) and red piranha (Pygocentrus nattereri). A comparative analysis showed similar genome traits to other characid species. The fully sequenced and annotated mitochondrial genome reproduces the taxonomic classification of P. magdalenae and confirms the low mitochondrial genetic divergence inside the Prochilodus genus. Phylogenomic analysis, using nuclear single-copy orthologous genes, also confirmed the evolutionary position of the species. This genome assembly provides a high-resolution genetic resource for sustainable P. magdalenae management in Colombia and, as the first genome assembly for the Prochilodontidae family, will contribute to fish genomics throughout South America.
Affiliation(s)
- Chao Bian
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, Shenzhen, Guangdong, China
- María José Benítez-Galeano
- Unidad de Genómica y Bioinformática, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Daiana Mir
- Unidad de Genómica y Bioinformática, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
- Qiong Shi
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, Shenzhen, Guangdong, China
- Nélida Rodríguez-Osorio
- Unidad de Genómica y Bioinformática, Departamento de Ciencias Biológicas, CENUR Litoral Norte, Universidad de la República, Salto, Uruguay
8
Kuznetsov D, Tegenfeldt F, Manni M, Seppey M, Berkeley M, Kriventseva E, Zdobnov EM. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 2022; 51:D445-D451. [PMID: 36350662 PMCID: PMC9825584 DOI: 10.1093/nar/gkac998] [Citation(s) in RCA: 81] [Impact Index Per Article: 40.5] [Received: 09/15/2022] [Revised: 10/15/2022] [Accepted: 10/26/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
OrthoDB provides evolutionary and functional annotations of genes in a diverse sampling of eukaryotes, prokaryotes, and viruses. Genomics continues to accelerate our exploration of gene diversity and orthology is the most precise way of bridging gene functional knowledge with the rapidly expanding universe of genomic sequences. OrthoDB samples the most diverse organisms with the best quality genomics data to provide the leading coverage of species diversity. This update of the underlying data to over 18 000 prokaryotes and almost 2000 eukaryotes with over 100 million genes propels the coverage to another level. This achievement also demonstrates the scalability of the underlying OrthoLoger software for delineation of orthologs, freely available from https://orthologer.ezlab.org. In addition to the ab-initio computations of gene orthology used for the OrthoDB release, the OrthoLoger software allows mapping of novel gene sets to precomputed orthologs and thereby links to their annotations. The LEMMI-style benchmarking of OrthoLoger ensures its state-of-the-art performance and is available from https://lemortho.ezlab.org. The OrthoDB web interface has been further developed to include a pairwise orthology view from any gene to any other sampled species. OrthoDB-computed evolutionary annotations as well as extensively collated functional annotations can be accessed via REST API or SPARQL/RDF, downloaded or browsed online from https://www.orthodb.org.
Affiliation(s)
- Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Mathieu Seppey
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Matthew Berkeley
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Evgeny M Zdobnov
- To whom correspondence should be addressed. Tel: +41 22 379 59 73;
9
Nevers Y, Jones TEM, Jyothi D, Yates B, Ferret M, Portell-Silva L, Codo L, Cosentino S, Marcet-Houben M, Vlasova A, Poidevin L, Kress A, Hickman M, Persson E, Piližota I, Guijarro-Clarke C, Iwasaki W, Lecompte O, Sonnhammer E, Roos DS, Gabaldón T, Thybert D, Thomas PD, Hu Y, Emms DM, Bruford E, Capella-Gutierrez S, Martin MJ, Dessimoz C, Altenhoff A. The Quest for Orthologs orthology benchmark service in 2022. Nucleic Acids Res 2022; 50:W623-W632. [PMID: 35552456 PMCID: PMC9252809 DOI: 10.1093/nar/gkac330] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 02/23/2022] [Revised: 04/07/2022] [Accepted: 04/30/2022] [Indexed: 11/15/2022] [Open Access]
Abstract
The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource for comparing existing and new methods of orthology inference (the bedrock of many comparative genomics and phylogenetic analyses) on a standard dataset and through common procedures. The Quest for Orthologs consortium is dedicated to keeping the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.
Affiliation(s)
- Yannis Nevers
- To whom correspondence should be addressed. Tel: +41 21 692 5449
- Tamsin E M Jones
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Dushyanth Jyothi
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Bethan Yates
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Meritxell Ferret
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Laura Portell-Silva
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Laia Codo
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Salvatore Cosentino
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
- Marina Marcet-Houben
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Anna Vlasova
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Laetitia Poidevin
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- BiGEst-ICube Platform, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Mark Hickman
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Emma Persson
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Guijarro-Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Wataru Iwasaki
- Department of Biological Sciences, Graduate School of Science, the University of Tokyo, Tokyo, Japan
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, Strasbourg, France
- Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- David S Roos
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), Plaça Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Centro de Investigaciones Biomédicas en Red de Enfermedades Infecciosas, Barcelona, Spain
- David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90032, USA
- Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- David M Emms
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
- Elspeth Bruford
- HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Maria J Martin
- Protein Function development, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Department of Computer Science, University College London, London, UK
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Adrian Altenhoff
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Computer Science Department, ETH Zurich, Zurich, Switzerland

10
Inoue J. ORTHOSCOPE*: a phylogenetic pipeline to infer gene histories from genome-wide data. Mol Biol Evol 2021; 39:6400256. [PMID: 34662403 PMCID: PMC8763121 DOI: 10.1093/molbev/msab301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Comparative genome-scale analyses of protein-coding gene sequences are employed to examine evidence for whole-genome duplication and horizontal gene transfer. For this purpose, an orthogroup should be delineated to infer evolutionary history regarding each gene, and results of all orthogroup analyses need to be integrated to infer a genome-scale history. An orthogroup is a set of genes descended from a single gene in the last common ancestor of all species under consideration. However, such analyses confront several problems: (1) analytical pipelines to infer all gene histories with methods comparing species and gene trees are not fully developed, and (2) without detailed analyses within orthogroups, evolutionary events of paralogous genes in the same orthogroup cannot be distinguished for genome-wide integration of results derived from multiple orthogroup analyses. Here I present an analytical pipeline, ORTHOSCOPE* (star), to infer evolutionary histories of animal/plant genes from genome-scale data. ORTHOSCOPE* estimates a tree for a specified gene, detects speciation/gene duplication events that occurred at nodes belonging to only one lineage leading to a species of interest, and then integrates results derived from gene trees estimated for all query genes in genome-wide data. Thus, ORTHOSCOPE* can be used to detect species nodes just after whole genome duplications as a first step of comparative genomic analyses. Moreover, by examining the presence or absence of genes belonging to species lineages with dense taxon sampling available from the ORTHOSCOPE web version, ORTHOSCOPE* can detect genes lost in specific lineages and horizontal gene transfers. This pipeline is available at https://github.com/jun-inoue/ORTHOSCOPE_STAR.
Affiliation(s)
- Jun Inoue
- Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, University of Tokyo, Kashiwa, Japan

11
Huang LC, Taujale R, Gravel N, Venkat A, Yeung W, Byrne DP, Eyers PA, Kannan N. KinOrtho: a method for mapping human kinase orthologs across the tree of life and illuminating understudied kinases. BMC Bioinformatics 2021; 22:446. [PMID: 34537014 PMCID: PMC8449880 DOI: 10.1186/s12859-021-04358-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/10/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Protein kinases are among the largest druggable families of signaling proteins and are involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the application of comparative genomics approaches for illuminating understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species. RESULTS Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species and eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of using domain-based approaches for identifying domain fusion events, highlighting a case involving the understudied serine/threonine kinase TAOK1 and the metabolic kinase PIK3C2A, which show high co-expression in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference approaches. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions of several understudied kinases, including the role of TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development.
CONCLUSIONS In sum, KinOrtho is a novel query-based tool for identifying one-to-one orthologous relationships across thousands of proteomes. We use KinOrtho here to identify kinase orthologs and show that its well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases, and that the KinOrtho framework can be extended to any protein family of interest.
Affiliation(s)
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Rahil Taujale
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Nathan Gravel
- PREP@UGA, University of Georgia, 500 D.W. Brooks Drive, Athens, GA 30602 USA
- Aarya Venkat
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Dominic P. Byrne
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Patrick A. Eyers
- Department of Biochemistry and Systems Biology, University of Liverpool, Crown St, Liverpool, UK
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green St., Athens, GA 30602 USA

12
Jiang D, Li Y, Wu W, Zhang H, Xu R, Xu H, Zhan R, Sun L. Identification and engineering on the nonconserved residues of metallo-β-lactamase-type thioesterase to improve the enzymatic activity. Biotechnol Bioeng 2021; 118:4623-4634. [PMID: 34427915 DOI: 10.1002/bit.27921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 08/14/2021] [Accepted: 08/14/2021] [Indexed: 11/12/2022]
Abstract
The standalone metallo-β-lactamase-type thioesterase (MβL-TE), which belongs to the group V nonreducing polyketide synthase gene cluster, catalyzes the rate-limiting step of product release. We first investigated orthologous MβL-TEs from different origins to determine which nonconserved amino acid residues are important for hydrolysis efficiency. A series of chimeric MβL-TEs was constructed by fragment swapping and site-directed mutagenesis, and in vivo enzymatic assays showed that two nonconserved residues, A19 and E75 (numbering in HyTE), were critical to catalytic performance. Protein structure modeling suggested that these two residues are located in different regions of HyTE: A19 is at the entrance to the active sites, whereas E75 resides in the linker between the two β strands that hold the metal-binding sites. By combining computational simulations with comparative enzymatic assays, we set up different screening criteria for selecting variants at these two noncatalytic, nonconserved key residues to improve catalytic activity. Rational design at A19 and E75 yielded five candidates in total, two of which (A19F and E75Q) significantly improved the enzymatic performance of HyTE. A double mutant was then constructed to further improve activity, increasing product accumulation 28.4-fold compared with wild-type HyTE. This study provides a novel approach for engineering nonconserved residues to optimize enzymatic performance.
Affiliation(s)
- Dayong Jiang
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Ya Li
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Wanqi Wu
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Hong Zhang
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Ruoxuan Xu
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Hui Xu
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Ruoting Zhan
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China
- Lei Sun
- Research Center of Chinese Herbal Resource Science and Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Key Laboratory of Chinese Medicinal Resource from Lingnan (Guangzhou University of Chinese Medicine), Ministry of Education, Guangzhou, China
- Joint Laboratory of National Engineering Research Center for the Pharmaceutics of Traditional Chinese Medicines, Guangzhou, China

13
Abstract
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
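The species overlap rule summarized above can be sketched in a few lines. This is a minimal illustration, not Possvm's code: it assumes a rooted, fully bifurcating gene tree encoded as nested Python tuples, uses a hypothetical `Species_gene` leaf-naming convention, and swaps Possvm's MCL step for plain connected components (union-find) to keep the example dependency-free.

```python
from itertools import product
from typing import Union

# Hypothetical encoding: a leaf is a string "Species_gene"; an internal node
# is a 2-tuple (left_subtree, right_subtree) of a rooted, bifurcating tree.
Tree = Union[str, tuple]


def species_of(leaf: str) -> str:
    # Assumed naming convention: species code before the first underscore.
    return leaf.split("_", 1)[0]


def leaves(tree: Tree) -> list[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def orthologous_pairs(tree: Tree) -> set[frozenset[str]]:
    """Species overlap rule: a node whose two child subtrees share no species
    is a speciation node, so every cross-child gene pair is orthologous; a node
    whose children share a species is a duplication and yields no pairs."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    pairs: set[frozenset[str]] = set()
    if not {species_of(x) for x in left_leaves} & {species_of(x) for x in right_leaves}:
        pairs.update(frozenset(p) for p in product(left_leaves, right_leaves))
    return pairs | orthologous_pairs(left) | orthologous_pairs(right)


def orthogroups(pairs: set[frozenset[str]]) -> list[list[str]]:
    """Connected components of the orthology graph (a simplification
    standing in for Possvm's MCL clustering step)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for gene in parent:
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(g) for g in groups.values())


# Two human/mouse paralog pairs produced by a duplication at the root.
tree = (("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2"))
print(orthogroups(orthologous_pairs(tree)))
# [['Hsap_A1', 'Mmus_A1'], ['Hsap_A2', 'Mmus_A2']]
```

If an outgroup leaf such as `Dmel_A` is added at the root, it becomes orthologous to all four genes and connected components merge both paralog pairs into one group; a graph-clustering step such as MCL is one way such groups may be split more finely.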
Affiliation(s)
- Xavier Grau-Bové
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain
- Arnau Sebé-Pedrós
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, 08003, Spain

14
GenOrigin: A comprehensive protein-coding gene origination database on the evolutionary timescale of life. J Genet Genomics 2021; 48:1122-1129. [PMID: 34538772 DOI: 10.1016/j.jgg.2021.03.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/02/2021] [Revised: 03/21/2021] [Accepted: 03/29/2021] [Indexed: 11/20/2022]
Abstract
The origination of new genes contributes to the biological diversity of life. New genes may quickly build their networks, exert important functions, and generate novel phenotypes. Dating gene age and inferring the origination mechanisms of new genes, such as primate-specific genes, are the basis for functional study of these genes. However, no comprehensive resource of gene age estimates across species is available. Here, we systematically date the age of 9,102,113 protein-coding genes from 565 species in the Ensembl and Ensembl Genomes databases, including 82 bacteria, 57 protists, 134 fungi, 58 plants, 56 metazoa, and 178 vertebrates, using a protein-family-based pipeline with the Wagner parsimony algorithm. We also collect gene age estimates from other studies and uniformly convert them into time ranges in millions of years so that they can be compared across studies. All the data are cataloged in GenOrigin (http://genorigin.chenzxlab.cn/), a user-friendly new database of gene age estimates, where users can browse gene age estimates by species, age, and gene ontology. GenOrigin provides researchers with gene age estimates, annotation, gene ontology, ortholog, and paralog information, as well as detailed gene presence/absence views for gene age inference based on the species tree with an evolutionary timescale, to support the exploration of gene functions.

15
Conover JL, Sharbrough J, Wendel JF. pSONIC: Ploidy-aware Syntenic Orthologous Networks Identified via Collinearity. G3-GENES GENOMES GENETICS 2021; 11:6275219. [PMID: 33983433 PMCID: PMC8496325 DOI: 10.1093/g3journal/jkab170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Accepted: 05/03/2021] [Indexed: 11/14/2022]
Abstract
With the rapid rise in availability of high-quality genomes for closely related species, methods for orthology inference that incorporate synteny are increasingly useful. Polyploidy perturbs the expected 1:1 frequency of orthologs between two species, complicating the identification of orthologs. Here we present a method of ortholog inference, Ploidy-aware Syntenic Orthologous Networks Identified via Collinearity (pSONIC). We demonstrate the utility of pSONIC using four species in the cotton tribe (Gossypieae), including one allopolyploid, and place between 75% and 90% of genes from each species into nearly 32,000 orthologous groups, 97% of which consist of at most singletons or tandemly duplicated genes (58.8% more than comparable methods that do not incorporate synteny). We show that 99% of singleton gene groups follow the expected tree topology and that our ploidy-aware algorithm recovers 97.5% identical groups when compared to splitting the allopolyploid into its two respective subgenomes, treating each as a separate “species.”
Affiliation(s)
- Justin L Conover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Joel Sharbrough
- Biology Department, Colorado State University, Fort Collins, CO 80521, USA
- Biology Department, New Mexico Institute of Mining and Technology, Socorro, NM 87801, USA
- Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA

16
Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. Over the following quarter century, the prokaryote genome database has grown exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing an increasingly unbiased characterization of global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes, and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes, and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA

17
Zdobnov EM, Kuznetsov D, Tegenfeldt F, Manni M, Berkeley M, Kriventseva EV. OrthoDB in 2020: evolutionary and functional annotations of orthologs. Nucleic Acids Res 2021; 49:D389-D393. [PMID: 33196836 PMCID: PMC7779051 DOI: 10.1093/nar/gkaa1009] [Citation(s) in RCA: 88] [Impact Index Per Article: 29.3] [Received: 09/15/2020] [Revised: 10/12/2020] [Accepted: 10/29/2020] [Indexed: 12/22/2022]
Abstract
OrthoDB provides evolutionary and functional annotations of orthologs, inferred for a vast number of available organisms. OrthoDB leads in coverage and genomic diversity sampling of Eukaryotes, Prokaryotes and Viruses, and its sampling of Bacteria is set to increase three-fold. The user interface has been enhanced in response to the massive growth in data. OrthoDB provides three views of the data: (i) a list of orthologous groups related to a user query, now arranged to visualize their hierarchical relations; (ii) a detailed view of an orthologous group, now featuring a Sankey diagram to facilitate navigation between the levels of orthology, from more finely resolved to more general groups of orthologs, as well as an arrangement of orthologs into an interactive organism taxonomy structure; and (iii) a newly added gene-centric view, showing the gene functional annotations and the pairwise orthologs in example species. The OrthoDB standalone software for delineation of orthologs, Orthologer, is freely available. Online BUSCO assessments and mapping of user-uploaded data to OrthoDB enable interactive exploration of related annotations and generation of comparative charts. OrthoDB strives to predict orthologs across the broadest coverage of species, to extensively collate available functional annotations, and to compute evolutionary annotations such as evolutionary rate and phyletic profile. OrthoDB data can be accessed via SPARQL/RDF and a REST API, downloaded, or browsed online at https://orthodb.org.
Affiliation(s)
- Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Dmitry Kuznetsov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Fredrik Tegenfeldt
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Matthew Berkeley
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Evgenia V Kriventseva
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland, and Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland

18
Emms DM, Kelly S. Benchmarking Orthogroup Inference Accuracy: Revisiting Orthobench. Genome Biol Evol 2020; 12:2258-2266. [PMID: 33022036 PMCID: PMC7738749 DOI: 10.1093/gbe/evaa211] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Accepted: 09/29/2020] [Indexed: 01/24/2023]
Abstract
Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to reinterrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.
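One common way to score predicted orthogroups against curated reference groups (such as the RefOGs described above) is pairwise precision and recall over gene pairs placed in the same group. The sketch below assumes that definition and restricts predicted pairs to genes that appear in some reference group; it is not taken from the Orthobench suite, whose exact scoring may differ, and the gene names are hypothetical.

```python
from itertools import combinations


def same_group_pairs(groups, genes=None):
    """All unordered gene pairs sharing a group, optionally restricted
    to genes contained in `genes`."""
    pairs = set()
    for group in groups:
        members = [g for g in group if genes is None or g in genes]
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def pairwise_precision_recall(predicted, reference):
    # Only genes that occur in some reference group can be scored.
    ref_genes = {g for group in reference for g in group}
    pred_pairs = same_group_pairs(predicted, ref_genes)
    ref_pairs = same_group_pairs(reference)
    true_pos = len(pred_pairs & ref_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(ref_pairs) if ref_pairs else 0.0
    return precision, recall


reference = [{"a", "b", "c"}]
predicted = [{"a", "b"}, {"c", "d"}]
print(pairwise_precision_recall(predicted, reference))  # (1.0, 0.3333333333333333)
```

Here the split of the reference group costs recall (only one of its three pairs is recovered) but not precision, while gene `d`, absent from the reference, is simply ignored.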
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, United Kingdom
- Steven Kelly
- Department of Plant Sciences, University of Oxford, United Kingdom

19
Zhou S, Chen Y, Guo C, Qi J. PhyloMCL: Accurate clustering of hierarchical orthogroups guided by phylogenetic relationship and inference of polyploidy events. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13401] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Shengyu Zhou
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai, China
- Yamao Chen
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai, China
- Chunce Guo
- Jiangxi Provincial Key Laboratory for Bamboo Germplasm Resources and Utilization, Forestry College, Jiangxi Agricultural University, Nanchang, China
- Ji Qi
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai, China

20
Heger P, Zheng W, Rottmann A, Panfilio KA, Wiehe T. The genetic factors of bilaterian evolution. eLife 2020; 9:e45530. [PMID: 32672535 PMCID: PMC7535936 DOI: 10.7554/elife.45530] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/31/2019] [Accepted: 07/03/2020] [Indexed: 12/13/2022]
Abstract
The Cambrian explosion was a unique animal radiation ~540 million years ago that produced the full range of body plans across bilaterians. The genetic mechanisms underlying these events are unknown, leaving a fundamental question in evolutionary biology unanswered. Using large-scale comparative genomics and advanced orthology evaluation techniques, we identified 157 bilaterian-specific genes. They include the entire Nodal pathway, a key regulator of mesoderm development and left-right axis specification; components for nervous system development, including a suite of G-protein-coupled receptors that control physiology and behaviour, the Robo-Slit midline repulsion system, and the neurotrophin signalling system; a high number of zinc finger transcription factors; and novel factors that previously escaped attention. Contradicting the current view, our study reveals that genes with bilaterian origin are robustly associated with key features in extant bilaterians, suggesting a causal relationship.
Affiliation(s)
- Peter Heger
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Wen Zheng
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Anna Rottmann
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
- Kristen A Panfilio
- Institute for Zoology: Developmental Biology, Cologne Biocenter, University of Cologne, Cologne, Germany
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, United Kingdom
- Thomas Wiehe
- Institute for Genetics, Cologne Biocenter, University of Cologne, Cologne, Germany
21
Christian RW, Hewitt SL, Roalson EH, Dhingra A. Genome-Scale Characterization of Predicted Plastid-Targeted Proteomes in Higher Plants. Sci Rep 2020; 10:8281. [PMID: 32427841 PMCID: PMC7237471 DOI: 10.1038/s41598-020-64670-5] [Received: 06/21/2019] [Accepted: 04/20/2020]
Abstract
Plastids are morphologically and functionally diverse organelles that depend on nuclear-encoded, plastid-targeted proteins for all biochemical and regulatory functions. However, how plastid proteomes vary temporally, spatially, and taxonomically has historically been difficult to analyze at a genome-wide scale using experimental methods. A bioinformatics workflow was developed and evaluated that combines fast, user-friendly subcellular prediction programs to maximize performance and accuracy for chloroplast transit peptides, and the technique was demonstrated on the predicted proteomes of 15 sequenced plant genomes. Gene family grouping was then performed in parallel using modified approaches of reciprocal best BLAST hits (RBH) and UCLUST. A total of 628 protein families were found to have conserved plastid targeting across angiosperm species using RBH, and 828 using UCLUST. However, thousands of clusters were also detected in which only one species had predicted plastid targeting, most notably in Panicum virgatum, which had 1,458 proteins with species-unique targeting. An average overlap of 45% with Arabidopsis was found in plastid-targeted protein-coding gene families, but an additional 20% of proteins matched against the full Arabidopsis proteome, indicating a unique evolution of plastid targeting. Neofunctionalization through subcellular relocalization is known to impart novel biological functions but has not previously been described on a genome-wide scale for the plastid proteome. Further work to correlate these predicted novel plastid-targeted proteins with transcript abundance and high-throughput proteomics will uncover unique aspects of plastid biology and shed light on how the plastid proteome has evolved to influence plastid morphology and biochemistry.
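The reciprocal best BLAST hit (RBH) criterion pairs two genes from different species when each is the other's top-scoring hit. A minimal Python sketch of that rule, assuming the two directional BLAST searches have already been parsed into dictionaries of bitscores (the data layout and gene names are hypothetical, and this is the plain RBH rule, not the modified variant used in the study):

```python
from typing import Dict, Set, Tuple

# query id -> {subject id: bitscore}; a hypothetical layout for parsed BLAST output
Hits = Dict[str, Dict[str, float]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (on ties, the first one seen wins)."""
    return {q: max(subjects, key=subjects.get) for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


if __name__ == "__main__":
    a_vs_b = {"At1": {"Pv1": 250.0, "Pv2": 90.0}, "At2": {"Pv2": 300.0}, "At3": {"Pv2": 200.0}}
    b_vs_a = {"Pv1": {"At1": 245.0}, "Pv2": {"At2": 310.0, "At3": 190.0}}
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('At1', 'Pv1'), ('At2', 'Pv2')]
```

In the example, At3 is dropped: its best hit is Pv2, but Pv2's best hit is At2, so the relationship is not reciprocal.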
Affiliation(s)
- Ryan W Christian
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
- Seanna L Hewitt
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
- Eric H Roalson
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
- School of Biological Sciences, Washington State University, Pullman, WA, USA
- Amit Dhingra
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Molecular Plant Sciences Program, Washington State University, Pullman, WA, USA
22
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
23
Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, Zdobnov EM. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res 2019; 47:D807-D811. [PMID: 30395283 PMCID: PMC6323947 DOI: 10.1093/nar/gky1053] [Received: 09/15/2018] [Accepted: 10/29/2018]
Abstract
OrthoDB (https://www.orthodb.org) provides evolutionary and functional annotations of orthologs. This update features a major scaling up of the resource coverage, sampling the genomic diversity of 1271 eukaryotes, 6013 prokaryotes and 6488 viruses. These include putative orthologs among 448 metazoan, 117 plant, 549 fungal, 148 protist, 5609 bacterial, and 404 archaeal genomes, picking up the best sequenced and annotated representatives for each species or operational taxonomic unit. OrthoDB relies on a concept of hierarchy of levels-of-orthology to enable more finely resolved gene orthologies for more closely related species. Since orthologs are the most likely candidates to retain functions of their ancestor gene, OrthoDB is aimed at narrowing down hypotheses about gene functions and enabling comparative evolutionary studies. Optional registered-user sessions allow on-line BUSCO assessments of gene set completeness and mapping of the uploaded data to OrthoDB to enable further interactive exploration of related annotations and generation of comparative charts. The accelerating expansion of genomics data continues to add valuable information, and OrthoDB strives to provide orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations and to compute evolutionary annotations. The data can be browsed online, downloaded or accessed via REST API or SPARQL RDF compatible with both UniProt and Ensembl.
Affiliation(s)
- Evgenia V Kriventseva
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Dmitry Kuznetsov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Fredrik Tegenfeldt
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Mosè Manni
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Renata Dias
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Felipe A Simão
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, rue Michel-Servet 1, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
24
Krishnamurthy P, Tsukamoto C, Ishimoto M. Reconstruction of the Evolutionary Histories of UGT Gene Superfamily in Legumes Clarifies the Functional Divergence of Duplicates in Specialized Metabolism. Int J Mol Sci 2020; 21:E1855. [PMID: 32182686 PMCID: PMC7084467 DOI: 10.3390/ijms21051855] [Received: 01/21/2020] [Revised: 03/03/2020] [Accepted: 03/05/2020]
Abstract
Plant uridine 5'-diphosphate glycosyltransferases (UGTs) influence the physicochemical properties of several classes of specialized metabolites, including triterpenoids, via glycosylation. To uncover the evolutionary history of the UGTs of soyasaponins (a group of beneficial triterpene glycosides widespread among the Leguminosae), the UGT gene superfamily was systematically mined in the Medicago truncatula, Glycine max, Phaseolus vulgaris, Lotus japonicus, and Trifolium pratense genomes. A total of 834 nonredundant UGTs were identified and categorized into 98 putative orthologous loci (POLs) using tree-based and graph-based methods. The key findings of this study were that (i) 17 POLs represent potential catalysts for triterpene glycosylation in legumes; (ii) the UGTs responsible for adding the second (UGT73P2: galactosyltransferase and UGT73P10: arabinosyltransferase) and third (UGT91H4: rhamnosyltransferase and UGT91H9: glucosyltransferase) sugars of the C-3 sugar chain of soyasaponins resulted from duplication events that occurred before and after the Hologalegina-millettioid split, respectively, followed by neofunctionalization in a species- or lineage-specific manner; and (iii) the UGTs responsible for the C-22-O glycosylation of group A saponins (arabinosyltransferase) and DDMP saponins (DDMP transferase), and for the second sugar of the C-22 sugar chain of group A saponins (UGT73F2: glucosyltransferase), may all share a common ancestor. Our findings demonstrate a way to trace the evolutionary history of UGTs involved in specialized metabolism.
Affiliation(s)
- Chigen Tsukamoto
- Faculty of Agriculture, Iwate University, Morioka 020-8550, Japan
- Masao Ishimoto
- Institute of Crop Science, NARO, 2-1-2 Kannondai, Tsukuba 305-8518, Japan
25
Dolby GA, Morales M, Webster TH, DeNardo DF, Wilson MA, Kusumi K. Discovery of a New TLR Gene and Gene Expansion Event through Improved Desert Tortoise Genome Assembly with Chromosome-Scale Scaffolds. Genome Biol Evol 2020; 12:3917-3925. [PMID: 32011707 PMCID: PMC7058155 DOI: 10.1093/gbe/evaa016] [Accepted: 01/20/2020]
Abstract
Toll-like receptors (TLRs) are a complex family of innate immune genes that are well characterized in mammals and birds but less well understood in nonavian sauropsids (reptiles). The advent of highly contiguous draft genomes of nonmodel organisms enables study of such gene families through analysis of synteny and sequence identity. Here, we analyze TLR genes from the genomes of 22 tetrapod species. Findings reveal a TLR8 gene expansion in crocodilians and turtles (TLR8B), and a second duplication (TLR8C) specifically within turtles, followed by pseudogenization of that gene in the nonfreshwater species (desert tortoise and green sea turtle). Additionally, the Mojave desert tortoise (Gopherus agassizii) has a stop codon in TLR8B (TLR8-1) that is polymorphic among conspecifics. Revised orthology further reveals a new TLR homolog, TLR21-like, which is exclusive to lizards, snakes, turtles, and crocodilians. These analyses were made possible by a new draft genome assembly of the desert tortoise (gopAga2.0), which used chromatin-based assembly to yield draft chromosomal scaffolds (L50 = 26 scaffolds, N50 = 28.36 Mb, longest scaffold = 107 Mb) and an enhanced de novo genome annotation with 25,469 genes. Our three-step approach to orthology curation and comparative analysis of TLR genes shows what new insights are possible using genome assemblies with chromosome-scale scaffolds that permit integration of synteny conservation data.
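The scaffold statistics quoted above follow the usual definitions: sort sequences from longest to shortest and accumulate their lengths until at least half of the total assembly length is covered; N50 is the length of the sequence at which that threshold is crossed, and L50 is the number of sequences needed to reach it. A minimal Python sketch of the calculation (the function name is ours, not from the paper):

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, counting how many are needed.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # at least half the assembly is covered
            return length, count
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

With the toy lengths above, the two longest sequences (100 + 80 = 180) cover more than half of the 300-unit total, so N50 is 80 and L50 is 2.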
Affiliation(s)
- Greer A Dolby
- School of Life Sciences, Arizona State University
- Center for Mechanisms of Evolution, Arizona State University
- Timothy H Webster
- School of Life Sciences, Arizona State University
- Department of Anthropology, University of Utah
- Melissa A Wilson
- School of Life Sciences, Arizona State University
- Center for Evolution and Medicine, Arizona State University
- Kenro Kusumi
- School of Life Sciences, Arizona State University
26
Trefflich S, Dalmolin RJS, Ortega JM, Castro MAA. Which came first, the transcriptional regulator or its target genes? An evolutionary perspective into the construction of eukaryotic regulons. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2019; 1863:194472. [PMID: 31825805 DOI: 10.1016/j.bbagrm.2019.194472] [Received: 05/31/2019] [Revised: 11/06/2019] [Accepted: 11/30/2019]
Abstract
Eukaryotic regulons are regulatory units formed by a set of genes under the control of the same transcription factor (TF). Despite the functional plasticity, TFs are highly conserved and recognize the same DNA sequences in different organisms. One of the main factors that confer regulatory specificity is the distribution of the binding sites of the TFs along the genome, allowing the configuration of different transcriptional regulatory networks (TRNs) from the same regulator. A similar scenario occurs between tissues of the same organism, where a TRN can be rewired by epigenetic factors, modulating the accessibility of the TF to its binding sites. In this article we discuss concepts that can help to formulate testable hypotheses about the construction of regulons, exploring the presence and absence of the elements that form a TRN throughout the evolution of an ancestral lineage. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Sheyla Trefflich
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil; Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
- Rodrigo J S Dalmolin
- Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal 59078-400, Brazil
- José Miguel Ortega
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- Mauro A A Castro
- Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba 81520-260, Brazil
27
Rubanov LI, Zaraisky AG, Shilovsky GA, Seliverstov AV, Zverkov OA, Lyubetsky VA. Screening for mouse genes lost in mammals with long lifespans. BioData Min 2019; 12:20. [PMID: 31728160 PMCID: PMC6842137 DOI: 10.1186/s13040-019-0208-x] [Received: 03/07/2019] [Accepted: 10/25/2019]
Abstract
Background: Gerontogenes include those that modulate life expectancy in various species and may be the actual longevity genes. We believe that a long (relative to body weight) lifespan in individual rodent and primate species can be due, among other things, to the loss of particular genes that are present in short-lived species of the same orders. These genes may also explain the widely different rates of aging among diverse species, as well as why similarly sized rodents or primates sometimes have anomalous life expectancies (e.g., naked mole-rats and humans). Here, we consider gene loss in the context of the prediction of Williams' theory concerning the reallocation of an organism's physiological resources between active reproduction (r-strategy) and self-maintenance (K-strategy). We identified such lost genes using an original computer-aided approach; the software treats the loss of a gene as a disruption in gene orthology, local gene synteny, or both. Results: We propose a method and software that identify genes absent from a predefined set of species but present in another predefined set of species. Examples of such pairs of sets include long-lived vs short-lived, homeothermic vs poikilothermic, amniotic vs anamniotic, aquatic vs terrestrial, and neotenic vs nonneotenic species, among others. Species are assigned to one of the two sets according to the property of interest, such as longevity or homeothermy. The program is agnostic to the choice of pair, i.e., to the underlying property, although the sets should consist of species with high-quality genome assemblies. Here, the proposed method was applied to study the longevity of Euarchontoglires species. It predominantly predicted genes that are highly expressed in the testis, epididymis, uterus, mammary glands, and the vomeronasal and other reproduction-related organs. This agrees with Williams' theory, which hypothesizes a species transition from r-strategy to K-strategy.
For instance, the method predicts the mouse gene Smpd5, whose expression level in the testis is 20 times greater than in organs unrelated to reproduction, as experimentally demonstrated elsewhere. By contrast, its paralog Smpd3 is not predicted by the program and is widely expressed in many organs not specifically related to reproduction. Conclusions: The method and program, applied here to screen for gene losses that may accompany increased lifespan, have also been applied to study reduced regenerative capacity, development of the telencephalon, neoteny, and other traits. Some of these results have been carefully tested experimentally; we therefore expect the method to be widely applicable.
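The core filtering step described above (keep genes present in one predefined species set and absent from the other) can be expressed with set operations once gene presence per species is known. The Python sketch below captures only that filter; it assumes a precomputed presence table with hypothetical species and gene names, and it does not reproduce the paper's detection of gene loss through disrupted orthology or synteny.

```python
from typing import Dict, Iterable, Set


def screen_lost_genes(presence: Dict[str, Set[str]],
                      present_in: Iterable[str],
                      absent_from: Iterable[str]) -> Set[str]:
    """Genes found in every species of `present_in` and in none of `absent_from`."""
    keep_species = list(present_in)
    if not keep_species:
        raise ValueError("present_in must name at least one species")
    # set.intersection returns a new set, so the input table is not modified.
    candidates = set.intersection(*(presence[sp] for sp in keep_species))
    for sp in absent_from:
        candidates -= presence[sp]
    return candidates


if __name__ == "__main__":
    # Hypothetical toy table: species -> genes detected in its assembly.
    presence = {
        "short_lived_1": {"g1", "g2", "g3"},
        "short_lived_2": {"g1", "g2"},
        "long_lived_1": {"g1"},
        "long_lived_2": {"g1", "g3"},
    }
    print(screen_lost_genes(presence, ["short_lived_1", "short_lived_2"],
                            ["long_lived_1", "long_lived_2"]))  # {'g2'}
```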
Affiliation(s)
- Lev I Rubanov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Andrey G Zaraisky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences (IBCH RAS) 16/10, Miklukho-Maklaya str., Moscow, 117997 Russia
- Gregory A Shilovsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Alexandr V Seliverstov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Oleg A Zverkov
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
- Vassily A Lyubetsky
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute) IITP RAS, 19 build. 1 Bolshoy Karetny per., Moscow, 127051 Russia
28
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019; 20:238. [PMID: 31727128 PMCID: PMC6857279 DOI: 10.1186/s13059-019-1832-y] [Received: 04/24/2019] [Accepted: 09/23/2019]
Abstract
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder's high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
29
Miller JB, Pickett BD, Ridge PG. JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm. Bioinformatics 2019; 35:546-552. [PMID: 30084941 PMCID: PMC6378933 DOI: 10.1093/bioinformatics/bty669] [Received: 05/23/2018] [Revised: 07/11/2018] [Accepted: 07/31/2018]
Abstract
Motivation: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory-intensive. Results: In contrast to existing approaches, JustOrthologs exploits the conservation of gene structure by using the lengths of coding sequence regions and dinucleotide percentages to identify orthologs. In comparison with OrthoMCL, OMA and OrthoFinder, JustOrthologs decreases ortholog identification runtime by more than 96% and achieves comparable precision and recall scores. The computational speedup allowed us to conduct pairwise comparisons of 1,197 complete genomes (780 eukaryotes and 417 archaea). We confirmed gene annotations for 384,120 genes, grouped 1,675,415 genes in previously unreported ortholog groups, and identified 51,429 potentially mislabeled genes across 622,843 ortholog groups. Availability and implementation: JustOrthologs is an open-source collaborative software package available in the GitHub repository: https://github.com/ridgelab/JustOrthologs/. All test FASTA files used for comparisons are freely available at https://github.com/ridgelab/JustOrthologs/comparisonFastaFiles/. Reference genomes used in this work are available for download from the NCBI repository: ftp://ftp.ncbi.nih.gov/genomes/. Supplementary information: Supplementary data are available at Bioinformatics online.
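JustOrthologs compares coding-sequence (CDS) region lengths and dinucleotide percentages rather than running all-versus-all BLAST. The Python sketch below only illustrates those two kinds of features; the function names, the length tolerance, and the distance metric are our assumptions and do not reproduce the tool's actual scoring, which is documented in its GitHub repository.

```python
from itertools import product
from typing import Dict, List

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_percentages(seq: str) -> Dict[str, float]:
    """Percentage of each overlapping dinucleotide among valid ACGT pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing N or other symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return {d: (100.0 * c / total if total else 0.0) for d, c in counts.items()}


def dinucleotide_distance(a: str, b: str) -> float:
    """Mean absolute difference in dinucleotide percentages (hypothetical metric)."""
    pa, pb = dinucleotide_percentages(a), dinucleotide_percentages(b)
    return sum(abs(pa[d] - pb[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


def similar_cds_structure(lens_a: List[int], lens_b: List[int], tol: float = 0.1) -> bool:
    """Same number of CDS regions, each pair of lengths within `tol` (a fraction)."""
    if len(lens_a) != len(lens_b):
        return False
    return all(abs(x - y) <= tol * max(x, y) for x, y in zip(lens_a, lens_b))
```

A candidate ortholog pair would then need both a matching CDS layout and a small dinucleotide distance; how those two signals are thresholded here is purely illustrative.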
Affiliation(s)
- Justin B Miller
- Department of Biology, Brigham Young University, Provo, UT, USA
- Perry G Ridge
- Department of Biology, Brigham Young University, Provo, UT, USA
30
Hu X, Friedberg I. SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier. Gigascience 2019; 8:giz118. [PMID: 31648300 PMCID: PMC6812468 DOI: 10.1093/gigascience/giz118] [Received: 02/13/2019] [Revised: 06/07/2019] [Accepted: 09/05/2019]
Abstract
Background: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only on computational clusters. Findings: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has high accuracy. Conclusions: SwiftOrtho enables accurate comparative genomic analyses of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
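The two sensitivity-recovering ideas named above, a reduced amino acid alphabet and spaced seeds, combine naturally: residues are first collapsed into similarity groups, and then only the positions marked `1` in a seed pattern are compared, so two peptides can share a seed even when they have no identical residues. The Python sketch below illustrates this; the residue grouping and the seed pattern are illustrative assumptions, not SwiftOrtho's actual tables or defaults.

```python
from typing import Dict, Iterator, List, Tuple

# Illustrative grouping by physicochemical similarity (assumption, not SwiftOrtho's table).
GROUPS = ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: grp[0] for grp in GROUPS for aa in grp}


def reduce_seq(protein: str) -> str:
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def spaced_kmers(seq: str, seed: str = "1101011") -> Iterator[Tuple[int, str]]:
    """Yield (offset, key) where the key keeps only positions marked '1' in the seed."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        yield start, "".join(seq[start + i] for i in care)


def shared_seed_hits(a: str, b: str, seed: str = "1101011") -> List[Tuple[int, int]]:
    """Offsets (i in a, j in b) whose reduced spaced-seed keys match exactly."""
    index: Dict[str, List[int]] = {}
    for pos, key in spaced_kmers(reduce_seq(a), seed):
        index.setdefault(key, []).append(pos)
    return [(i, j) for j, key in spaced_kmers(reduce_seq(b), seed) for i in index.get(key, [])]
```

For example, `shared_seed_hits("MKTAYIA", "LRSGFVG")` returns `[(0, 0)]`: the two peptides share no identical residues, yet both reduce to `LKSAFLA` and therefore produce the same seed key.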
Affiliation(s)
- Xiao Hu
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
31
Transcriptome Landscape Variation in the Genus Thymus. Genes (Basel) 2019; 10:genes10080620. [PMID: 31426352 PMCID: PMC6723042 DOI: 10.3390/genes10080620] [Received: 05/10/2019] [Revised: 07/31/2019] [Accepted: 08/12/2019]
Abstract
Within the Lamiaceae family, the genus Thymus is economically important due to its medicinal and aromatic properties. Most molecular research on Thymus has focused on determining the phylogenetic relationships between species, but no published work has examined the evolution of the transcriptome across the genus to elucidate the genes involved in terpenoid biosynthesis. Hence, in this study, the transcriptomes of five Thymus species were generated and analyzed to mine putative genes involved in thymol and carvacrol biosynthesis. High-throughput sequencing produced ~43 million high-quality reads per sample, which were assembled de novo using several tools and then subjected to quality evaluation. The best assembly for each species was used as the query set to search the UniProt, KEGG (Kyoto Encyclopedia of Genes and Genomes), COG (Clusters of Orthologous Groups) and TF (Transcription Factors) databases. Mining the transcriptomes identified 592 single-copy orthogroups, which were used for phylogenetic analysis. The data strongly support a close genetic relationship between Thymus vulgaris and Thymus daenensis. Additionally, this study dates the speciation events to 1.5–2.1 and 9–10.2 MYA, depending on the methodology. Our study provides a global overview of genes related to the terpenoid pathway in Thymus and can help establish an understanding of the relationships among Thymus species.
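Selecting single-copy orthogroups for phylogenetic analysis amounts to keeping only the groups in which every species contributes exactly one gene. A minimal Python sketch, assuming orthogroups are held as a dictionary of per-species gene lists (the layout and identifiers are hypothetical, not taken from the study):

```python
from typing import Dict, List

# orthogroup id -> {species: [gene ids]}; a hypothetical in-memory layout
Orthogroups = Dict[str, Dict[str, List[str]]]


def single_copy_orthogroups(groups: Orthogroups, species: List[str]) -> List[str]:
    """Sorted ids of orthogroups with exactly one gene from each listed species."""
    return sorted(
        og for og, members in groups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    )
```

A group with a paralog pair in any species, or with a species missing entirely, is excluded, which keeps the downstream gene trees free of duplication events.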
32
Hellmuth M, Huber KT, Moulton V. Reconciling event-labeled gene trees with MUL-trees and species networks. J Math Biol 2019; 79:1885-1925. [PMID: 31410552 DOI: 10.1007/s00285-019-01414-8] [Received: 12/21/2018] [Revised: 05/08/2019]
Abstract
Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.
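One classical way to label gene-tree vertices as speciations or duplications, given a species tree, is LCA reconciliation: each internal gene-tree node is mapped to the smallest species clade containing all of its species, and it is labeled a duplication when it maps to the same clade as one of its children. The Python sketch below implements only this classical tree-to-tree case, using nested tuples as binary trees and hypothetical gene names; it does not attempt the MUL-tree or network constructions that are the subject of the paper.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # binary tree as nested 2-tuples of leaf names


def leaf_set(t: Tree) -> FrozenSet[str]:
    return frozenset([t]) if isinstance(t, str) else leaf_set(t[0]) | leaf_set(t[1])


def clades(t: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every subtree of the species tree."""
    if isinstance(t, str):
        return [leaf_set(t)]
    return [leaf_set(t)] + clades(t[0]) + clades(t[1])


def lca_event_labels(gene_tree: Tree, species_tree: Tree,
                     species_of: Dict[str, str]) -> List[str]:
    """Post-order 'speciation'/'duplication' labels for internal gene-tree nodes."""
    all_clades = clades(species_tree)

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Smallest species clade that contains all of the given species.
        return min((c for c in all_clades if species <= c), key=len)

    labels: List[str] = []

    def walk(t: Tree) -> FrozenSet[str]:
        if isinstance(t, str):
            return lca(frozenset([species_of[t]]))
        left, right = walk(t[0]), walk(t[1])
        here = lca(left | right)
        labels.append("duplication" if here in (left, right) else "speciation")
        return here

    walk(gene_tree)
    return labels
```

For the species tree `(("human", "mouse"), "fish")` and the gene tree `((("h1", "m1"), ("h2", "m2")), "f1")`, the node joining the two human-mouse pairs maps to the same clade as its children and is labeled a duplication; the other three internal nodes are speciations.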
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
33
Ding W, Baumdicker F, Neher RA. panX: pan-genome analysis and exploration. Nucleic Acids Res 2018; 46:e5. [PMID: 29077859 PMCID: PMC5758898 DOI: 10.1093/nar/gkx977] [Received: 09/14/2016] [Accepted: 10/10/2017]
Abstract
Horizontal transfer, gene loss, and duplication result in dynamic bacterial genomes shaped by a complex mixture of different modes of evolution. Closely related strains can differ in the presence or absence of many genes, and the total number of distinct genes found in a set of related isolates—the pan-genome—is often many times larger than the genome of individual isolates. We have developed a pipeline that efficiently identifies orthologous gene clusters in the pan-genome. This pipeline is coupled to a powerful yet easy-to-use web-based visualization for interactive exploration of the pan-genome. The visualization consists of connected components that allow rapid filtering and searching of genes and inspection of their evolutionary history. For each gene cluster, panX displays an alignment, a phylogenetic tree, maps mutations within that cluster to the branches of the tree and infers gain and loss of genes on the core-genome phylogeny. PanX is available at pangenome.de. Custom pan-genomes can be visualized either using a web server or by serving panX locally as a browser-based application.
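Once orthologous gene clusters have been assigned, the pan-genome bookkeeping described above reduces to set operations over strains. A minimal Python sketch, assuming each strain is already represented by the set of cluster IDs it carries (the clustering itself, which is the main work panX does, is not shown, and the names are hypothetical). Here "core" means present in every strain; real tools may apply a softer threshold.

```python
from typing import Dict, Set, Tuple


def pan_core_accessory(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene-cluster sets for a group of strains."""
    if not strains:
        return set(), set(), set()
    sets = list(strains.values())
    pan = set().union(*sets)        # clusters seen in at least one strain
    core = set.intersection(*sets)  # clusters seen in every strain
    return pan, core, pan - core    # accessory: present in some strains only
```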
Affiliation(s)
- Wei Ding
- Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Franz Baumdicker
- Mathematisches Institut, Albert-Ludwigs University of Freiburg, 79104 Freiburg, Germany
- Richard A Neher
- Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Biozentrum and SIB Swiss Institute of Bioinformatics, University of Basel, 4056 Basel, Switzerland
| |
Collapse
|
34
Vialle RA, Tamuri AU, Goldman N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol Biol Evol 2019; 35:1783-1797. [PMID: 29618097 PMCID: PMC5995191 DOI: 10.1093/molbev/msy055] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/17/2022]
Abstract
Accurate reconstruction of ancestral states is a critical evolutionary analysis when studying ancient proteins and comparing biochemical properties between parental or extinct species and their extant relatives. It relies on multiple sequence alignment (MSA) which may introduce biases, and it remains unknown how MSA methodological approaches impact ancestral sequence reconstruction (ASR). Here, we investigate how MSA methodology modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topology and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodological approaches. In addition, we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodological differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately determine ancestral states with confidence.
Affiliation(s)
- Ricardo Assunção Vialle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom; Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Department of Genetics and Molecular Biology, Laboratory of Human and Medical Genetics, Federal University of Pará, Belém, Pará, Brazil
- Asif U Tamuri
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom; Research IT Services, University College London, London, United Kingdom
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
35
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features. Comput Struct Biotechnol J 2019; 17:785-796. [PMID: 31312416 PMCID: PMC6607062 DOI: 10.1016/j.csbj.2019.05.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/27/2019] [Revised: 05/23/2019] [Accepted: 05/26/2019] [Indexed: 12/23/2022]
Abstract
The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that the numbers of essential gene orthologs comprise small fractions when compared with the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine learning approaches.
Key Words
- CRISPR, Clustered regularly interspaced short palindromic repeats
- Essential genes
- Essentiality prediction
- Eukaryotes
- GBM, Gradient boosting method
- GI, Genetic interaction
- GLM, Generalised linear model
- GO, Gene ontology
- ML, Machine-learning
- Machine-learning
- NN, Artificial neural network
- OGEE, Online GEne essentiality database
- PPI, Protein-protein interaction
- PR-AUC, Area under the precision-recall curve
- RF, Random Forest
- RNAi, RNA interference
- ROC-AUC, Area under the receiver operating characteristic curve
- SPLS, Sparse partial least squares
- SVM, Support-Vector machine
36
Li Y, Ning S, Calvo SE, Mootha VK, Liu JS. Bayesian hidden Markov tree models for clustering genes with shared evolutionary history. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
37
Gilbert DG. Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 2019; 7:e6374. [PMID: 30723633 PMCID: PMC6361002 DOI: 10.7717/peerj.6374] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/10/2018] [Accepted: 12/29/2018] [Indexed: 01/19/2023]
Abstract
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and collected at the NCBI reference sequence database section. Gene reconstruction from transcribed gene evidence of RNA-seq now can accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline of EvidentialGene project.
38
Abstract
The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
39
Song H, Sun J, Yang G. Comparative analysis of selection mode reveals different evolutionary rate and expression pattern in Arachis duranensis and Arachis ipaënsis duplicated genes. Plant Mol Biol 2018; 98:349-361. [PMID: 30298428 DOI: 10.1007/s11103-018-0784-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/21/2018] [Accepted: 09/28/2018] [Indexed: 06/08/2023]
Abstract
Our results reveal that Ks is a determining factor affecting selective pressure and different evolution and expression patterns are detected between PSGs and NSGs in wild Arachis duplicates. Selective pressure, including purifying (negative) and positive selection, can be detected in organisms. However, studies on comparative evolutionary rates, gene expression patterns and gene features between negatively selected genes (NSGs) and positively selected genes (PSGs) are lagging in paralogs of plants. Arachis duranensis and Arachis ipaënsis are ancestors of the cultivated peanut, an important oil and protein crop. Here, we carried out a series of systematic analyses, comparing NSG and PSG in paralogs, using genome sequences and transcriptome datasets in A. duranensis and A. ipaënsis. We found that synonymous substitution rate (Ks) is a determining factor affecting selective pressure in A. duranensis and A. ipaënsis duplicated genes. Lower expression level, lower gene expression breadth, higher codon bias and shorter polypeptide length were found in PSGs and not in NSGs. The correlation analyses showed that gene expression breadth was positively correlated with polypeptide length and GC content at the first codon site (GC1) in PSGs and NSGs, respectively. There was a negative correlation between expression level and polypeptide length in PSGs. In NSGs, the Ks was positively correlated with expression level, gene expression breadth, GC1, and GC content at the third codon site (GC3), but selective pressure was negatively correlated with expression level, gene expression breadth, polypeptide length, GC1, and GC3 content. The function of most duplicated gene pairs was divergent under drought and nematode stress. Taken together, our results show that different evolution and expression patterns occur between PSGs and NSGs in paralogs of two wild Arachis species.
Affiliation(s)
- Hui Song
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China
- Juan Sun
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China
- Guofeng Yang
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China
40
Ghiselli F, Iannello M, Puccio G, Chang PL, Plazzi F, Nuzhdin SV, Passamonti M. Comparative Transcriptomics in Two Bivalve Species Offers Different Perspectives on the Evolution of Sex-Biased Genes. Genome Biol Evol 2018; 10:1389-1402. [PMID: 29897459 PMCID: PMC6007409 DOI: 10.1093/gbe/evy082] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Accepted: 04/19/2018] [Indexed: 12/13/2022]
Abstract
Comparative genomics has become a central tool for evolutionary biology, and a better knowledge of understudied taxa represents the foundation for future work. In this study, we characterized the transcriptome of male and female mature gonads in the European clam Ruditapes decussatus, compared with that in the Manila clam Ruditapes philippinarum, providing, for the first time in bivalves, information about transcription dynamics and sequence evolution of sex-biased genes. In both species, we found a relatively low number of sex-biased genes (1,284, corresponding to 41.3% of the orthologous genes between the two species), probably due to the absence of sexual dimorphism, and the transcriptional bias is maintained in only 33% of the orthologs. The dN/dS is generally low, indicating purifying selection, with genes where the female-biased transcription is maintained between the two species showing a significantly higher dN/dS. Genes involved in embryo development, cell proliferation, and maintenance of genome stability show a faster sequence evolution. Finally, we report a lack of clear correlation between transcription level and evolutionary rate in these species, in contrast with studies that reported a negative correlation. We discuss such discrepancy and call into question some methodological approaches and rationales generally used in this type of comparative studies.
Affiliation(s)
- Fabrizio Ghiselli
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
- Mariangela Iannello
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
- Guglielmo Puccio
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
- Peter L Chang
- Program in Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Federico Plazzi
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
- Sergey V Nuzhdin
- Program in Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Marco Passamonti
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
41
Sheikhizadeh Anari S, de Ridder D, Schranz ME, Smit S. Efficient inference of homologs in large eukaryotic pan-proteomes. BMC Bioinformatics 2018; 19:340. [PMID: 30257640 PMCID: PMC6158922 DOI: 10.1186/s12859-018-2362-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/13/2017] [Accepted: 09/09/2018] [Indexed: 12/31/2022]
Abstract
Background: Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data. Results: To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa. Conclusions: We clearly observed the trade-off between recall and precision in our homology inference. Whether to favor recall or precision depends strongly on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at https://github.com/sheikhizadeh/pantools as an extension to our pan-genomic analysis tool, PanTools. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2362-4) contains supplementary material, which is available to authorized users.
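The general idea of k-mer prefiltering can be sketched as follows. This is a simplified illustration under stated assumptions, not the PanTools algorithm: the k-mer length `k`, the `min_shared` threshold and the toy sequences are hypothetical choices, and the subsequent alignment step is not shown. Proteins are indexed by their distinct k-mers, and only pairs sharing enough k-mers are kept as candidates for alignment.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(proteins: Dict[str, str], k: int = 3,
                    min_shared: int = 2) -> Set[Tuple[str, str]]:
    """Keep only protein pairs sharing at least `min_shared` distinct k-mers.

    Only these pairs would be passed on to a (costly) pairwise alignment step.
    The values of k and min_shared here are illustrative, not PanTools defaults.
    """
    index = defaultdict(set)  # k-mer -> IDs of proteins containing it
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(pid)
    shared = defaultdict(int)  # (id1, id2) -> number of shared distinct k-mers
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}


# Hypothetical toy proteome: p1 and p2 are near-identical, p3 is unrelated.
proteins = {
    "p1": "MKTAYIAKQR",
    "p2": "MKTAYIAKQL",
    "p3": "GGHHWWPPEE",
}
print(candidate_pairs(proteins))
```

With this toy input only p1 and p2 share k-mers, so one pair instead of all three possible pairs would go on to alignment; the saving grows quadratically with proteome size.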
Affiliation(s)
- Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- M Eric Schranz
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands
- Sandra Smit
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
42
Abstract
This chapter covers the theory and practice of ortholog gene set computation. In the theoretical part we give detailed and formal descriptions of the relevant concepts. We also cover the topic of graph-based clustering as a tool to compute ortholog gene sets. In the second part we provide an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes, briefly describing some of the most popular programs and resources currently available for this task.
43
Campos DA, Pereira EC, Jardim R, Cuadrat RRC, Bernardes JS, Dávila AMR. Homology Inference Based on a Reconciliation Approach for the Comparative Genomics of Protozoa. Evol Bioinform Online 2018; 14:1176934318785138. [PMID: 30034216 PMCID: PMC6048835 DOI: 10.1177/1176934318785138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2018] [Accepted: 05/30/2018] [Indexed: 11/16/2022]
Abstract
Protozoa parasites are responsible for several diseases in tropical countries, such as malaria, sleeping sickness, Chagas disease, leishmaniasis, amebiasis, and giardiasis, which together threaten millions of people around the world. In addition, most of the classic parasitic diseases due to protozoa are zoonotic. Understanding the biology of these organisms plays a relevant role in combating these diseases. Using homology inference and comparative genomics, this study targeted 3 protozoan species from different Phyla: Cryptosporidium muris (Apicomplexa), Entamoeba invadens (Amoebozoa), and Trypanosoma grayi (Euglenozoa). In this study, we propose a new approach for the identification of homologs, based on the reconciliation of the results of 2 different homology inference software programs. Our results showed that 46.1% (59/128) of the groups inferred by our reconciliation approach could be validated using this methodology. These validated groups are here called homologous Supergroups and were compared with SUPERFAMILY and Pfam Clans.
Affiliation(s)
- Darueck A Campos
- Acre Federal Institute of Education, Science and Technology, Rio Branco, Brazil
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute (FIOCRUZ), Rio de Janeiro, Brazil
- Elisa C Pereira
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute (FIOCRUZ), Rio de Janeiro, Brazil
- Rodrigo Jardim
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute (FIOCRUZ), Rio de Janeiro, Brazil
- Rafael RC Cuadrat
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute (FIOCRUZ), Rio de Janeiro, Brazil
- Bioinformatics core facility, Max Planck Institute for Biology of Ageing, Cologne, Germany
- Juliana S Bernardes
- Biologie Computationnelle et Quantitative, Université Pierre et Marie Curie, Paris, France
- Alberto MR Dávila
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute (FIOCRUZ), Rio de Janeiro, Brazil
44
Train CM, Glover NM, Gonnet GH, Altenhoff AM, Dessimoz C. Orthologous Matrix (OMA) algorithm 2.0: more robust to asymmetric evolutionary rates and more scalable hierarchical orthologous group inference. Bioinformatics 2018; 33:i75-i82. [PMID: 28881964 PMCID: PMC5870696 DOI: 10.1093/bioinformatics/btx229] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Indexed: 01/28/2023]
Abstract
Motivation: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is necessary to improve the scalability and robustness of orthology inference methods. Results: We present improvements in the OMA algorithm: (i) refining the pairwise orthology inference step to account for same-species paralogs evolving at different rates, and (ii) minimizing errors in the pairwise orthology verification step by testing the consistency of pairwise distance estimates, which can be problematic in the presence of fragmentary sequences. In addition, we introduce a more scalable procedure for hierarchical orthologous group (HOG) clustering, which is several orders of magnitude faster on large datasets. Using the Quest for Orthologs consortium orthology benchmark service, we show that these changes translate into substantial improvement on multiple empirical datasets. Availability and Implementation: This new OMA 2.0 algorithm is used in the OMA database (http://omabrowser.org) from the March 2017 release onwards, and can be run on custom genomes using OMA standalone version 2.0 and above (http://omabrowser.org/standalone).
Affiliation(s)
- Clément-Marie Train
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center of Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Natasha M Glover
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center of Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Gaston H Gonnet
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Adrian M Altenhoff
- Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Christophe Dessimoz
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center of Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Genetics, Evolution and Environment, University College London, London, UK; Department of Computer Science, University College London, London, UK
45
Nøjgaard N, Geiß M, Merkle D, Stadler PF, Wieseke N, Hellmuth M. Time-consistent reconciliation maps and forbidden time travel. Algorithms Mol Biol 2018; 13:2. [PMID: 29441122 PMCID: PMC5800358 DOI: 10.1186/s13015-018-0121-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/21/2017] [Accepted: 01/20/2018] [Indexed: 12/04/2022]
Abstract
Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer. Results: We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an $\mathcal{O}(|V(T)|\log(|V(S)|))$-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparatively easy task of checking whether a small auxiliary graph is acyclic. The algorithms are implemented in C++ using the boost graph library and are freely available at https://github.com/Nojgaard/tc-recon. Significance: The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.
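The abstract reduces the time-consistency decision to testing whether an auxiliary graph is acyclic. How that graph is built from T and S is defined in the paper and is not reproduced here; the sketch below only shows a standard acyclicity test (Kahn's topological-sort algorithm) on a directed graph given as an adjacency dictionary with hypothetical node labels.

```python
from collections import deque
from typing import Dict, Hashable, List


def is_acyclic(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Kahn's algorithm: a directed graph is acyclic iff all of its nodes can be
    removed by repeatedly deleting nodes whose in-degree has dropped to zero."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)  # include nodes that only appear as edge targets
    indegree = {n: 0 for n in nodes}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for t in graph.get(n, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # Nodes left over all lie on (or downstream of) a directed cycle.
    return removed == len(nodes)


print(is_acyclic({"a": ["b"], "b": ["c"]}))              # True
print(is_acyclic({"a": ["b"], "b": ["c"], "c": ["a"]}))  # False
```

Kahn's algorithm runs in time linear in the number of nodes and edges; a depth-first search for back edges would serve equally well. Neither choice is claimed to be the one used in tc-recon.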
46
Farrer RA. Synima: a Synteny imaging tool for annotated genome assemblies. BMC Bioinformatics 2017; 18:507. [PMID: 29162056 PMCID: PMC5697234 DOI: 10.1186/s12859-017-1939-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/21/2017] [Accepted: 11/14/2017] [Indexed: 11/10/2022]
Abstract
Background: Ortholog prediction and synteny visualization across whole genomes are valuable methods for detecting and representing a range of evolutionary processes such as genome expansion, chromosomal rearrangement, and chromosomal translocation. Few standalone methods are currently available to visualize synteny across any number of annotated genomes. Results: Here, I present a Synteny Imaging tool (Synima) written in Perl, which uses the graphical features of R. Synima takes orthologues computed from reciprocal best BLAST hits or OrthoMCL, and DAGchainer, and outputs an overview of genome-wide synteny in PDF. Each of these programs is included with the Synima package, along with a pipeline for their use. Synima has a range of graphical parameters, including size, colours, order, and labels, which are specified in a config file generated by the first run of Synima and can be subsequently edited. Synima runs quickly on a command line to generate informative and publication-quality figures. Synima is open source and freely available from https://github.com/rhysf/Synima under the MIT License. Conclusions: Synima should be a valuable tool for visualizing synteny between two or more annotated genome assemblies.
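Reciprocal best hits, one of the orthologue sources Synima accepts, can be sketched as below. The input is assumed to be already-parsed (query, subject, bit score) tuples for both search directions; parsing real BLAST tabular output and Synima's own implementation are not shown, and the gene IDs and scores are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query ID, subject ID, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (the first one seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical search results for genomes A and B.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0),
          ("A2", "B2", 180.0), ("A3", "B2", 200.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A3", 205.0), ("B2", "A2", 170.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here A2 is not paired with B2 because B2's best hit back in genome A is A3, so only (A1, B1) and (A3, B2) are reported as putative orthologues.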
Affiliation(s)
- Rhys A Farrer
- Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK; Department of Genetics, Environment and Evolution, University College London, London, WC1E 6BT, UK
47
Song H, Gao H, Liu J, Tian P, Nan Z. Comprehensive analysis of correlations among codon usage bias, gene expression, and substitution rate in Arachis duranensis and Arachis ipaënsis orthologs. Sci Rep 2017; 7:14853. [PMID: 29093502 PMCID: PMC5665869 DOI: 10.1038/s41598-017-13981-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/08/2017] [Accepted: 10/04/2017] [Indexed: 11/22/2022]
Abstract
The relationship between evolutionary rates and gene expression in model plant orthologs is well documented. However, little is known about the relationships between gene expression and evolutionary trends in Arachis orthologs. We identified 7,435 one-to-one orthologs, including 925 single-copy and 6,510 multiple-copy sequences in Arachis duranensis and Arachis ipaënsis. Codon usage was stronger for shorter polypeptides, which were encoded by codons with higher GC contents. Highly expressed coding sequences had higher codon usage bias, GC content, and expression breadth. Additionally, expression breadth was positively correlated with polypeptide length, but there was no correlation between gene expression and polypeptide length. Inferred selective pressure was also negatively correlated with both gene expression and expression breadth in all one-to-one orthologs, while positively but non-significantly correlated with gene expression in sequences with signatures of positive selection. Gene expression levels and expression breadth were significantly higher for single-copy genes than for multiple-copy genes. Similarly, the gene expression and expression breadth in sequences with signatures of purifying selection were higher than those of sequences with positive selective signatures. These results indicated that gene expression differed between single-copy and multiple-copy genes as well as sequences with signatures of positive and purifying selection.
Affiliation(s)
- Hui Song
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Hongjuan Gao
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Jing Liu
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Pei Tian
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Zhibiao Nan
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
48
Hellmuth M. Biologically feasible gene trees, reconciliation maps and informative triples. Algorithms Mol Biol 2017; 12:23. [PMID: 28861118 PMCID: PMC5576477 DOI: 10.1186/s13015-017-0114-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/12/2017] [Accepted: 08/16/2017] [Indexed: 11/17/2022]
Abstract
Background: The history of gene families, which are equivalent to event-labeled gene trees, can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are biologically feasible, that is, whether there is a possible true history that would explain a given gene tree. In practice, this problem boils down to finding a reconciliation map (also known as a DTL-scenario) between the event-labeled gene trees and a (possibly unknown) species tree. Results: In this contribution, we first characterize whether there is a valid reconciliation map for binary event-labeled gene trees T that contain speciation, duplication and horizontal gene transfer events and some unknown species tree S in terms of "informative" triples that are displayed in T and provide information of the topology of S. These informative triples are used to infer the unknown species tree S for T. We obtain a similar result for non-binary gene trees. To this end, however, the reconciliation map needs to be further restricted. We provide a polynomial-time algorithm to decide whether there is a species tree for a given event-labeled gene tree, and in the positive case, to construct the species tree and the respective (restricted) reconciliation map. However, informative triples as well as DTL-scenarios have their limitations when they are used to explain the biological feasibility of gene trees. While reconciliation maps imply biological feasibility, we show that the converse is not true in general. Moreover, we show that informative triples provide enough information to characterize neither "relaxed" DTL-scenarios nor non-restricted reconciliation maps for non-binary biologically feasible gene trees.
Collapse
Affiliation(s)
- Marc Hellmuth
- Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Strasse 47, 17487 Greifswald, Germany
- Center for Bioinformatics, Saarland University, Building E 2.1, P.O. Box 151150, 66041 Saarbrücken, Germany
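The abstract above ties the existence of a reconciliation map to "informative" triples and a polynomial-time species-tree construction. The Python sketch below illustrates only the simpler speciation/duplication case without horizontal gene transfer, which is the setting of earlier duplication/loss work rather than the full DTL characterization of this paper. As we understand that restricted setting, one collects a species triple σ(a)σ(b)|σ(c) for every gene triple ab|c whose last common ancestor is a speciation vertex (with three distinct species), and a species tree exists when these triples are consistent, which Aho et al.'s BUILD algorithm decides. The `Node` class, the `"S"`/`"D"` event labels and the nested-tuple tree output are illustrative assumptions, not the paper's notation.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    # Assumed encoding: inner vertices are labeled "S" (speciation) or
    # "D" (duplication); leaves carry the species label sigma(x) of their gene.
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    if not node.children:
        return [node]
    return [x for c in node.children for x in leaves(c)]


def species_triples(root):
    """Species triples sigma(a)sigma(b)|sigma(c) for gene triples ab|c whose
    lca is a binary speciation vertex (no HGT edges in this sketch)."""
    triples = set()
    stack = [root]
    while stack:
        v = stack.pop()
        stack.extend(v.children)
        if v.label != "S" or len(v.children) != 2:
            continue
        left, right = (leaves(c) for c in v.children)
        # a, b below one child and c below the other => lca(a, b, c) = v.
        for inner, outer in ((left, right), (right, left)):
            for a, b in combinations(inner, 2):
                for c in outer:
                    if len({a.label, b.label, c.label}) == 3:
                        triples.add((frozenset((a.label, b.label)), c.label))
    return triples


def build(taxa, triples):
    """Aho et al.'s BUILD: a nested-tuple tree displaying all triples,
    or None if the triple set is inconsistent."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    parent = {x: x for x in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Aho graph: join a and b for every triple ab|c lying inside taxa.
    for pair, c in triples:
        a, b = tuple(pair)
        if a in taxa and b in taxa and c in taxa:
            parent[find(a)] = find(b)
    comps = {}
    for x in taxa:
        comps.setdefault(find(x), set()).add(x)
    if len(comps) == 1:
        return None  # connected Aho graph: no tree displays all triples
    subtrees = []
    for comp in comps.values():
        sub = build(comp, [(p, c) for p, c in triples if p <= comp and c in comp])
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(sorted(subtrees, key=str))


if __name__ == "__main__":
    consistent = Node("D", [
        Node("S", [Node("S", [Node("A"), Node("B")]), Node("C")]),
        Node("S", [Node("A"), Node("C")]),
    ])
    conflicting = Node("D", [
        Node("S", [Node("S", [Node("A"), Node("B")]), Node("C")]),
        Node("S", [Node("S", [Node("A"), Node("C")]), Node("B")]),
    ])
    print(build({"A", "B", "C"}, species_triples(consistent)))   # (('A', 'B'), 'C')
    print(build({"A", "B", "C"}, species_triples(conflicting)))  # None
```

In the second tree the triples AB|C and AC|B conflict, so no species tree exists in this duplication-only setting; explaining such trees through horizontal transfer is exactly what the paper's extended notion of informative triples is meant to handle.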
49
Positive diversifying selection is a pervasive adaptive force throughout the Drosophila radiation. Mol Phylogenet Evol 2017; 112:230-243. [DOI: 10.1016/j.ympev.2017.04.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/25/2016] [Revised: 04/26/2017] [Accepted: 04/26/2017] [Indexed: 01/02/2023]
50
Graham AM, Presnell JS. Hypoxia Inducible Factor (HIF) transcription factor family expansion, diversification, divergence and selection in eukaryotes. PLoS One 2017; 12:e0179545. [PMID: 28614393 PMCID: PMC5470732 DOI: 10.1371/journal.pone.0179545] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Received: 10/05/2016] [Accepted: 05/31/2017] [Indexed: 01/01/2023]
Abstract
Hypoxia inducible factor (HIF) transcription factors are crucial for regulating a variety of cellular activities in response to oxygen stress (hypoxia). In this study, we determine the evolutionary history of HIF genes and their associated transactivation domains, and perform selection and functional divergence analyses across their four characteristic domains. Here we show that the HIF genes are restricted to metazoans: at least one HIF-α homolog is found within the genomes of non-bilaterians and bilaterian invertebrates, while most vertebrate genomes contain between two and six HIF-α genes. We also find widespread purifying selection across all four characteristic domain types (bHLH, PAS, NTAD and CTAD) in HIF-α genes, and evidence for Type I functional divergence between HIF-1α, HIF-2α/EPAS, and invertebrate HIF genes. Overall, we describe the evolutionary histories of the HIF transcription factor gene family and its associated transactivation domains in eukaryotes. We show that the NTAD and CTAD domains appear de novo, without any appearance outside of the HIF-α subunits. Although both appear in invertebrate as well as vertebrate HIF-α sequences, they seem either to have been substantially lost across invertebrates or to have been convergently acquired in these few lineages. We reaffirm that HIF-1α is phylogenetically conserved among most metazoans, whereas HIF-2α appeared later. Taken together, our findings can be attributed to the substantial integration of this transcription factor family into the critical tasks associated with maintenance of oxygen homeostasis and vascularization, particularly in the vertebrate lineage.
Affiliation(s)
- Allie M. Graham
- Department of Biology, University of Miami, Coral Gables, Florida, United States of America
- Jason S. Presnell
- Department of Biology, University of Miami, Coral Gables, Florida, United States of America