1
Reyes-Herrera PH, Delgadillo-Duran DA, Flores-Gonzalez M, Mueller LA, Cristancho MA, Barrero LS. Chromosome-scale genome assembly and annotation of the tetraploid potato cultivar Diacol Capiro adapted to the Andean region. G3 (BETHESDA, MD.) 2024:jkae139. [PMID: 39058924 DOI: 10.1093/g3journal/jkae139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 06/05/2024] [Indexed: 07/28/2024]
Abstract
Potato (Solanum tuberosum) is an essential crop for food security and is ranked as the third most important crop worldwide for human consumption. The Diacol Capiro cultivar holds the dominant position in Colombian cultivation, primarily catering to the food processing industry. This highly heterozygous, autotetraploid cultivar belongs to the Andigenum group and stands out for its adaptation to a wide variety of environments spanning altitudes from 1,800 to 3,200 meters above sea level. Here, a chromosome-scale assembly, referred to as DC, is presented for this cultivar. The assembly was generated by combining circular consensus sequencing with proximity ligation Hi-C for scaffolding; it spans 2.369 Gb, with 48 pseudochromosomes covering 2.091 Gb and an anchor rate of 88.26%. The reference genome metrics, including an N50 of 50.5 Mb, a BUSCO (Benchmarking Universal Single-Copy Orthologs) score of 99.38%, and a Long Terminal Repeat Assembly Index (LAI) score of 13.53, collectively indicate high assembly quality. A comprehensive annotation yielded a total of 154,114 genes, and the associated BUSCO score of 95.78% for the annotated sequences attests to their completeness. The number of predicted NLRs (nucleotide-binding and leucine-rich-repeat genes) was 2,107, of which 99.85% contain an NB-ARC domain (the nucleotide-binding domain shared by Apaf-1, certain R gene products, and CED-4). Further comparative analysis of the annotated assembly against known high-quality potato genomes showed similar genome metrics, with differences in total gene numbers related to ploidy status. The genome assembly and annotation of DC presented in this study represent a valuable asset for understanding potato genetics. This resource aids targeted breeding initiatives and contributes to the creation of enhanced, resilient, and more productive potato varieties, particularly beneficial for countries in Latin America.
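The N50 cited above is a standard contiguity statistic. Sort the sequences from longest to shortest, then report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch follows; the scaffold lengths are hypothetical and are not taken from the DC assembly:

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe "running >= total / 2"
            return length


# Hypothetical scaffold lengths in Mb (illustrative only).
print(n50([80, 70, 50, 40, 30, 20, 10]))
```

For these made-up lengths the total is 300 Mb, and the running sum reaches 150 Mb at the second scaffold, so the N50 is 70 Mb.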
Affiliation(s)
- Paula H Reyes-Herrera
- Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), Bogotá, Cundinamarca 250047, Colombia
- Diego A Delgadillo-Duran
- Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), Bogotá, Cundinamarca 250047, Colombia
- Marco A Cristancho
- Vicerrectoría de Investigación y Creación, Universidad de los Andes, Bogotá 111711, Colombia
- Luz Stella Barrero
- Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), Bogotá, Cundinamarca 250047, Colombia
2
Hayford RK, Haley OC, Cannon EK, Portwood JL, Gardiner JM, Andorf CM, Woodhouse MR. Functional annotation and meta-analysis of maize transcriptomes reveal genes involved in biotic and abiotic stress. BMC Genomics 2024; 25:533. [PMID: 38816789 PMCID: PMC11137889 DOI: 10.1186/s12864-024-10443-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 05/22/2024] [Indexed: 06/01/2024]
Abstract
BACKGROUND: Environmental stress factors, such as biotic and abiotic stress, are becoming more common due to climate variability, significantly affecting global maize yield. Transcriptome profiling studies provide insights into the molecular mechanisms underlying stress response in maize, though the functions of many genes are still unknown. To enhance the functional annotation of maize-specific genes, MaizeGDB has outlined a data-driven approach with an emphasis on identifying genes and traits related to biotic and abiotic stress. RESULTS: We mapped high-quality RNA-Seq reads from 24 publicly available datasets (17 abiotic and seven biotic studies) generated from the B73 cultivar to the recent version of the B73 reference genome (B73v5) and deduced stress-related functional annotations for maize gene models. A robust meta-analysis of the transcriptome profiles from these datasets identified 3,230 stress-responsive differentially expressed genes (DEGs): 2,555 DEGs regulated in response to abiotic stress, 408 DEGs regulated during biotic stress, and 267 common DEGs (co-DEGs) that overlap between abiotic and biotic stress. We identified hub genes through network analyses; among the hub genes of the co-DEGs was IDP275 (Zm00001eb369060), a putative NAC domain transcription factor superfamily protein previously reported to respond to herbivory and drought stress. In our analysis, IDP275 was up-regulated in response to eight different abiotic and four different biotic stresses. Gene set enrichment and pathway analyses of the co-DEG hub genes revealed hormone-mediated signaling processes and phenylpropanoid biosynthesis pathways, respectively. Using phylostratigraphic analysis, we also demonstrated how abiotic and biotic stress genes differentially evolve to adapt to changing environments.
CONCLUSIONS: These results will help facilitate the functional annotation of multiple stress-response gene models in maize. Data can be accessed and downloaded at the Maize Genetics and Genomics Database (MaizeGDB).
Affiliation(s)
- Rita K Hayford
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA.
- Olivia C Haley
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- Ethalinda K Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- John L Portwood
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA
- Jack M Gardiner
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
- Carson M Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, 50011, USA.
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA.
3
Sen S, Woodhouse MR, Portwood JL, Andorf CM. Maize Feature Store: A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications. Database (Oxford) 2023; 2023:baad078. [PMID: 37935586 PMCID: PMC10634621 DOI: 10.1093/database/baad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Revised: 09/16/2023] [Accepted: 10/19/2023] [Indexed: 11/09/2023]
Abstract
Big-data analysis of the complex data associated with maize genomes accelerates genetic research and the improvement of agronomic traits. As a result, efforts have increased to integrate diverse datasets and extract meaning from these measurements. Machine learning models are a powerful tool for gaining knowledge from large and complex datasets; however, they must be trained on high-quality features to succeed. Currently, no resource hosts maize multi-omics datasets together with end-to-end tools for evaluating features and linking them to target gene annotations. Our work presents the Maize Feature Store (MFS), a versatile application that combines features built on complex data to facilitate exploration, modeling and analysis. Feature stores allow researchers to rapidly deploy machine learning applications by managing and providing access to frequently used features. We populated the MFS for the maize reference genome with over 14,000 gene-based features based on published genomic, transcriptomic, epigenomic, variomic and proteomic datasets. Using the MFS, we created an accurate pan-genome classification model with an AUC-ROC score of 0.87. The MFS is publicly available through the Maize Genetics and Genomics Database. Database URL: https://mfs.maizegdb.org/
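An AUC-ROC such as the 0.87 reported above can be read as a probability. It is the chance that a model scores a randomly chosen positive example above a randomly chosen negative one, with ties counted as half. Below is a minimal pairwise sketch of that definition. The scores and labels are made up, and this is not MFS code or data. The quadratic loop is chosen for clarity; rank-based formulations scale better on large datasets.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(auc_roc(scores, labels), 3))
```

Here 5 of the 6 positive-negative pairs are correctly ordered, giving an AUC of about 0.833. A perfect ranking would give 1.0, and random scoring would give about 0.5.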
Affiliation(s)
- Shatabdi Sen
- Department of Plant Pathology & Microbiology, Iowa State University, 1344 Advanced Teaching & Research Bldg, 2213 Pammel Dr, Ames, IA 50011, USA
- Margaret R Woodhouse
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, 819 Wallace Road, Ames, IA 50011, USA
- John L Portwood
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, 819 Wallace Road, Ames, IA 50011, USA
- Carson M Andorf
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, 819 Wallace Road, Ames, IA 50011, USA
- Department of Computer Science, Iowa State University, Atanasoff Hall, 2434 Osborn Dr, Ames, IA 50011, USA
4
Barrera-Redondo J, Lotharukpong JS, Drost HG, Coelho SM. Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biol 2023; 24:54. [PMID: 36964572 PMCID: PMC10037820 DOI: 10.1186/s13059-023-02895-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/10/2023] [Indexed: 03/26/2023]
Abstract
We present GenEra ( https://github.com/josuebarrera/GenEra ), a DIAMOND-fueled gene-family founder inference framework that addresses previously raised limitations and biases in genomic phylostratigraphy, such as homology detection failure. GenEra also reduces computational time from several months to a few days for any genome of interest. We analyze the emergence of taxonomically restricted gene families during major evolutionary transitions in plants, animals, and fungi. Our results indicate that the impact of homology detection failure on inferred patterns of gene emergence is lineage-dependent, suggesting that plants are more prone to evolve novelty through the emergence of new genes compared to animals and fungi.
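Genomic phylostratigraphy, which GenEra refines, assigns each gene to a phylostratum. This is the oldest node on the focal species' lineage at which a homolog of the gene is detected. Below is a minimal sketch of only that final assignment step. It assumes each homolog hit has already been reduced to the most recent common ancestor it shares with the focal species. The lineage and hits are hypothetical. The sketch omits the DIAMOND-based homology searches and the homology-detection-failure analysis that GenEra adds, so it should not be read as GenEra's implementation.

```python
# Hypothetical lineage of a focal species, ordered from oldest to youngest node.
LINEAGE = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Embryophyta",
    "Poaceae",
    "Zea mays",
]


def assign_phylostratum(hit_ancestors, lineage=LINEAGE):
    """Return the oldest lineage node among a gene's homolog hits.

    hit_ancestors: iterable of lineage nodes, each being the most recent
    common ancestor shared by the focal species and one homolog hit.
    """
    positions = [lineage.index(node) for node in hit_ancestors]
    if not positions:
        raise ValueError("no homolog hits; a gene should at least hit itself")
    return lineage[min(positions)]


# Hypothetical genes and the ancestors implied by their homolog hits.
genes = {
    "gene_A": {"Zea mays", "Poaceae", "Eukaryota"},  # deep homologs
    "gene_B": {"Zea mays"},  # self-hit only: a candidate orphan gene
}
for name, ancestors in genes.items():
    print(name, assign_phylostratum(ancestors))
```

A gene whose only hit is in the focal species itself lands in the youngest stratum. Homology detection failure can push genes there spuriously, which is the bias GenEra is designed to assess.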
Affiliation(s)
- Josué Barrera-Redondo
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
- Jaruwatana Sodai Lotharukpong
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Hajk-Georg Drost
- Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
- Susana M Coelho
- Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany.
5
Nesterenko M, Miroliubov A. From head to rootlet: comparative transcriptomic analysis of a rhizocephalan barnacle Peltogaster reticulata (Crustacea: Rhizocephala). F1000Res 2023; 11:583. [PMID: 36447930 PMCID: PMC9664023 DOI: 10.12688/f1000research.110492.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/04/2023] [Indexed: 01/11/2023]
Abstract
Background: Rhizocephalan barnacles stand out in the diverse world of metazoan parasites. The body of a rhizocephalan female is so modified that it lacks any recognizable morphological features, consisting of the interna, a system of rootlets, and the externa, a sac-like reproductive body. Moreover, rhizocephalans have an outstanding ability to control their hosts, literally turning them into "zombies". Despite all these amazing traits, no genomic or transcriptomic data were available for any rhizocephalan. Methods: We collected transcriptomes from four body parts of an adult female of the rhizocephalan Peltogaster reticulata: the externa, and the main, growing, and thoracic parts of the interna. We used all of these data for the de novo assembly of the reference transcriptome. Next, we determined the set of encoded proteins, calculated the expression levels of protein-coding genes in different parts of the parasite's body, and identified lists of enriched bioprocesses. We also identified and analyzed in silico sets of potential excretory/secretory proteins. Finally, we applied phylostratigraphy and evolutionary transcriptomics approaches to our data. Results: The assembled reference transcriptome included transcripts of 12,620 protein-coding genes and was the first for any rhizocephalan. Based on these results, we established the spatial heterogeneity of protein-coding gene expression across regions of the adult female body of P. reticulata. Both transcriptomic analysis and histological studies indicated the presence of germ-like cells in the lumen of the interna. A potential molecular basis for the interaction between the host's nervous system and the parasite's interna was also identified. Given the prolonged expression of development-associated genes, we suggest that rhizocephalans "got stuck in their metamorphosis", even at the reproductive stage.
Conclusions: The results of the first comparative transcriptomic analysis for Rhizocephala not only clarified but also expanded the existing ideas about the biology of these extraordinary parasites.
Affiliation(s)
- Maksim Nesterenko
- Department of Invertebrate Zoology, St Petersburg State University, St Petersburg, 199034, Russian Federation
- Laboratory of parasitic worms and protists, Zoological Institute of Russian Academy of Sciences, St Petersburg, 199034, Russian Federation
- Aleksei Miroliubov
- Laboratory of parasitic worms and protists, Zoological Institute of Russian Academy of Sciences, St Petersburg, 199034, Russian Federation
6
Ma S, Skarica M, Li Q, Xu C, Risgaard RD, Tebbenkamp AT, Mato-Blanco X, Kovner R, Krsnik Ž, de Martin X, Luria V, Martí-Pérez X, Liang D, Karger A, Schmidt DK, Gomez-Sanchez Z, Qi C, Gobeske KT, Pochareddy S, Debnath A, Hottman CJ, Spurrier J, Teo L, Boghdadi AG, Homman-Ludiye J, Ely JJ, Daadi EW, Mi D, Daadi M, Marín O, Hof PR, Rasin MR, Bourne J, Sherwood CC, Santpere G, Girgenti MJ, Strittmatter SM, Sousa AM, Sestan N. Molecular and cellular evolution of the primate dorsolateral prefrontal cortex. Science 2022; 377:eabo7257. [PMID: 36007006 PMCID: PMC9614553 DOI: 10.1126/science.abo7257] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Indexed: 12/12/2022]
Abstract
The granular dorsolateral prefrontal cortex (dlPFC) is an evolutionary specialization of primates that is centrally involved in cognition. We assessed more than 600,000 single-nucleus transcriptomes from adult human, chimpanzee, macaque, and marmoset dlPFC. Although most cell subtypes defined transcriptomically are conserved, we detected several that exist only in a subset of species as well as substantial species-specific molecular differences across homologous neuronal, glial, and non-neural subtypes. The latter are exemplified by human-specific switching between expression of the neuropeptide somatostatin and tyrosine hydroxylase, the rate-limiting enzyme in dopamine production in certain interneurons. The above molecular differences are also illustrated by expression of the neuropsychiatric risk gene FOXP2, which is human-specific in microglia and primate-specific in layer 4 granular neurons. We generated a comprehensive survey of the dlPFC cellular repertoire and its shared and divergent features in anthropoid primates.
Affiliation(s)
- Shaojie Ma
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Mario Skarica
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Qian Li
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Chuan Xu
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Ryan D. Risgaard
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Medical Scientist Training Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Xoel Mato-Blanco
- Neurogenomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), MELIS, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Rothem Kovner
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Željka Krsnik
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Croatian Institute for Brain Research, School of Medicine, University of Zagreb, 10000 Zagreb, Croatia
- Xabier de Martin
- Neurogenomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), MELIS, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Victor Luria
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Xavier Martí-Pérez
- Neurogenomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), MELIS, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Dan Liang
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Amir Karger
- IT-Research Computing, Harvard Medical School, Boston, MA, USA
- Danielle K. Schmidt
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Zachary Gomez-Sanchez
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Cai Qi
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Kevin T. Gobeske
- Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, CT 06510, USA
- Sirisha Pochareddy
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Ashwin Debnath
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Cade J. Hottman
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Joshua Spurrier
- Program in Cellular Neuroscience, Neurodegeneration and Repair, Department of Neurology, Yale School of Medicine, New Haven, CT 06536, USA
- Leon Teo
- Australian Regenerative Medicine Institute, 15 Innovation Walk, Monash University, Clayton VIC, 3800, Australia
- Anthony G. Boghdadi
- Australian Regenerative Medicine Institute, 15 Innovation Walk, Monash University, Clayton VIC, 3800, Australia
- Jihane Homman-Ludiye
- Australian Regenerative Medicine Institute, 15 Innovation Walk, Monash University, Clayton VIC, 3800, Australia
- John J. Ely
- MAEBIOS, Alamogordo, NM 88310, USA
- Department of Anthropology and Center for the Advanced Study of Human Paleobiology, The George Washington University, Washington, DC, USA
- Etienne W. Daadi
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
- Da Mi
- Tsinghua-Peking Center for Life Sciences, IDG/McGovern Institute for Brain Research, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Marcel Daadi
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
- Department of Cell Systems & Anatomy, Radiology, Long School of Medicine, UT Health San Antonio
- NeoNeuron LLC, Palo Alto, CA 94306, USA
- Oscar Marín
- Centre for Developmental Neurobiology, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE1 1UL, UK
- MRC Centre for Neurodevelopmental Disorders, King’s College London, London SE1 1UL, UK
- Patrick R. Hof
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Mladen-Roko Rasin
- Department of Neuroscience and Cell Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ 08854, USA
- James Bourne
- Australian Regenerative Medicine Institute, 15 Innovation Walk, Monash University, Clayton VIC, 3800, Australia
- Chet C. Sherwood
- Department of Anthropology and Center for the Advanced Study of Human Paleobiology, The George Washington University, Washington, DC, USA
- Gabriel Santpere
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Neurogenomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), MELIS, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Matthew J. Girgenti
- Department of Psychiatry, Yale School of Medicine, New Haven, CT 06510, USA
- National Center for PTSD, US Department of Veterans Affairs, White River Junction, VT, USA
- Stephen M. Strittmatter
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Program in Cellular Neuroscience, Neurodegeneration and Repair, Department of Neurology, Yale School of Medicine, New Haven, CT 06536, USA
- Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- André M.M. Sousa
- Waisman Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Department of Neuroscience, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI 53705, USA
- Nenad Sestan
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Department of Psychiatry, Yale School of Medicine, New Haven, CT 06510, USA
- Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
- Departments of Genetics and Comparative Medicine, Program in Cellular Neuroscience, Neurodegeneration and Repair, and Yale Child Study Center, Yale School of Medicine, New Haven, CT 06510, USA
7
Raxwal VK, Singh S, Agarwal M, Riha K. Transcriptional and post-transcriptional regulation of young genes in plants. BMC Biol 2022; 20:134. [PMID: 35676681 PMCID: PMC9178820 DOI: 10.1186/s12915-022-01339-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2022] [Accepted: 05/30/2022] [Indexed: 12/03/2022]
Abstract
Background: New genes continuously emerge from non-coding DNA or by diverging from existing genes, but most of them are rapidly lost and only a few become fixed within the population. We hypothesized that young genes are subject to transcriptional and post-transcriptional regulation that limits their expression and minimizes their exposure to purifying selection. Results: We performed a protein-based homology search across the tree of life to determine the evolutionary age of protein-coding genes in the rice genome. We found that young genes in rice have relatively low expression levels, which can be attributed to distal enhancers and closed chromatin conformation at their transcription start sites (TSS). The chromatin in TSS regions can be remodeled in response to abiotic stress, indicating conditional expression of young genes. Furthermore, transcripts of young genes in Arabidopsis tend to be targeted by nonsense-mediated RNA decay, presenting another layer of regulation limiting their expression. Conclusions: These data suggest that transcriptional and post-transcriptional mechanisms contribute to the conditional expression of young genes, which may alleviate purging selection while providing an opportunity for phenotypic exposure and functionalization.
Affiliation(s)
- Vivek Kumar Raxwal
- Department of Botany, University of Delhi, Delhi, 110007, India
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic.
- Somya Singh
- Department of Botany, University of Delhi, Delhi, 110007, India
- Manu Agarwal
- Department of Botany, University of Delhi, Delhi, 110007, India.
- Karel Riha
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic.
8
Nesterenko M, Miroliubov A. From head to rootlet: comparative transcriptomic analysis of a rhizocephalan barnacle Peltogaster reticulata (Crustacea: Rhizocephala). F1000Res 2022; 11:583. [PMID: 36447930 PMCID: PMC9664023 DOI: 10.12688/f1000research.110492.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/04/2023] [Indexed: 09/16/2023]
Abstract
Background: Rhizocephalan barnacles stand out in the diverse world of metazoan parasites. The body of a rhizocephalan female is so modified that it lacks any recognizable morphological features, consisting of the interna, a system of rootlets, and the externa, a sac-like reproductive body. Moreover, rhizocephalans have an outstanding ability to control their hosts, literally turning them into "zombies". Despite all these amazing traits, no genomic or transcriptomic data were available for any rhizocephalan. Methods: We collected transcriptomes from four body parts of an adult female of the rhizocephalan Peltogaster reticulata: the externa, and the main, growing, and thoracic parts of the interna. We used all of these data for the de novo assembly of the reference transcriptome. Next, we determined the set of encoded proteins, calculated the expression levels of protein-coding genes in different parts of the parasite's body, and identified lists of enriched bioprocesses. We also identified and analyzed in silico sets of potential excretory/secretory proteins. Finally, we applied phylostratigraphy and evolutionary transcriptomics approaches to our data. Results: The assembled reference transcriptome included transcripts of 12,620 protein-coding genes and was the first for any rhizocephalan. Based on these results, we established the spatial heterogeneity of protein-coding gene expression across regions of the adult female body of P. reticulata. Both transcriptomic analysis and histological studies indicated the presence of germ-like cells in the lumen of the interna. A potential molecular basis for the interaction between the host's nervous system and the parasite's interna was also identified. Given the prolonged expression of development-associated genes, we suggest that rhizocephalans "got stuck in their metamorphosis", even at the reproductive stage.
Conclusions: The results of the first comparative transcriptomic analysis for Rhizocephala not only clarified but also expanded the existing ideas about the biology of these extraordinary parasites.
Affiliation(s)
- Maksim Nesterenko
- Department of Invertebrate Zoology, St Petersburg State University, St Petersburg, 199034, Russian Federation
- Laboratory of parasitic worms and protists, Zoological Institute of Russian Academy of Sciences, St Petersburg, 199034, Russian Federation
- Aleksei Miroliubov
- Laboratory of parasitic worms and protists, Zoological Institute of Russian Academy of Sciences, St Petersburg, 199034, Russian Federation
9
Li J, Singh U, Bhandary P, Campbell J, Arendsee Z, Seetharam AS, Wurtele ES. Foster thy young: enhanced prediction of orphan genes in assembled genomes. Nucleic Acids Res 2021; 50:e37. [PMID: 34928390 PMCID: PMC9023268 DOI: 10.1093/nar/gkab1238] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/27/2021] [Revised: 10/22/2021] [Accepted: 12/02/2021] [Indexed: 02/06/2023]
Abstract
Proteins encoded by newly emerged genes ('orphan genes') share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates the ab initio predictions of BRAKER with Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (recovering as few as 11 percent under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
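Per-phylostratum sensitivity, as in the 68% (orphan) and 99% (ancient) figures above, is the fraction of gold-standard genes in each age class that a pipeline recovers. Below is a minimal sketch. It assumes predictions have already been matched to gold-standard gene IDs; real benchmarks match gene structures or coordinates, a step omitted here. All IDs and stratum labels are hypothetical.

```python
def sensitivity_by_stratum(gold, predicted):
    """Fraction of gold-standard genes in each phylostratum that were predicted.

    gold: dict mapping gene id -> phylostratum label (e.g. "orphan", "ancient").
    predicted: set of gene ids recovered by a pipeline, assumed to be
    already matched to gold-standard ids.
    """
    totals, hits = {}, {}
    for gene, stratum in gold.items():
        totals[stratum] = totals.get(stratum, 0) + 1
        if gene in predicted:
            hits[stratum] = hits.get(stratum, 0) + 1
    return {stratum: hits.get(stratum, 0) / n for stratum, n in totals.items()}


# Hypothetical gold standard and pipeline predictions.
gold = {"g1": "orphan", "g2": "orphan", "g3": "orphan",
        "g4": "ancient", "g5": "ancient"}
predicted = {"g1", "g4", "g5", "novel_1"}  # novel_1 is absent from the gold set
print(sensitivity_by_stratum(gold, predicted))
```

Predictions absent from the gold standard (novel_1 here) do not affect sensitivity. They would count against precision, which this sketch does not compute.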
Affiliation(s)
- Jing Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Priyanka Bhandary
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Jacqueline Campbell
- Corn Insects and Crop Genetics Research Unit, US Department of Agriculture, Agricultural Research Service, Ames, IA 50014, USA
- Zebulun Arendsee
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
- Arun S Seetharam
- Genome Informatics Facility, Iowa State University, Ames, IA 50014, USA
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50014, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50014, USA
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA 50014, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50014, USA
10
Li F, Rane RV, Luria V, Xiong Z, Chen J, Li Z, Catullo RA, Griffin PC, Schiffer M, Pearce S, Lee SF, McElroy K, Stocker A, Shirriffs J, Cockerell F, Coppin C, Sgrò CM, Karger A, Cain JW, Weber JA, Santpere G, Kirschner MW, Hoffmann AA, Oakeshott JG, Zhang G. Phylogenomic analyses of the genus Drosophila reveals genomic signals of climate adaptation. Mol Ecol Resour 2021; 22:1559-1581. [PMID: 34839580 PMCID: PMC9299920 DOI: 10.1111/1755-0998.13561] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/09/2020] [Accepted: 11/10/2021] [Indexed: 01/13/2023]
Abstract
Many Drosophila species differ widely in their distributions and climate niches, making them excellent subjects for evolutionary genomic studies. Here, we have developed a database of high‐quality assemblies for 46 Drosophila species and one closely related Zaprionus. Fifteen of the genomes were newly sequenced, and 20 were improved with additional sequencing. New or improved annotations were generated for all 47 species, assisted by new transcriptomes for 19. Phylogenomic analyses of these data resolved several previously ambiguous relationships, especially in the melanogaster species group. However, they also revealed significant phylogenetic incongruence among genes, mainly in the form of incomplete lineage sorting in the subgenus Sophophora but also including asymmetric introgression in the subgenus Drosophila. Using the phylogeny as a framework and taking these incongruences into account, we then screened the data for genome‐wide signals of adaptation to different climatic niches. First, phylostratigraphy revealed relatively high rates of recent novel gene gain in three temperate pseudoobscura and five desert‐adapted cactophilic mulleri subgroup species. Second, we found differing ratios of nonsynonymous to synonymous substitutions in several hundred orthologues between climate generalists and specialists, with trends toward significantly higher ratios in tropical specialists and lower ratios in temperate‐continental specialists relative to the climate generalists. Finally, resequencing natural populations of 13 species revealed that tropics‐restricted species generally had smaller population sizes, lower genome diversity and more deleterious mutations than the more widespread species. We conclude that adaptation to different climates in the genus Drosophila has been associated with large‐scale and multifaceted genomic changes.
Affiliation(s)
- Fang Li
- BGI-Shenzhen, Shenzhen, China.,Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rahul V Rane
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia.,Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia
- Victor Luria
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Zijun Xiong
- BGI-Shenzhen, Shenzhen, China.,State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences (CAS), Kunming, Yunnan, China.,College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Renee A Catullo
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia.,Division of Ecology and Evolution, Centre for Biodiversity Analysis, The Australian National University, Acton, ACT, Australia
- Philippa C Griffin
- Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia
- Michele Schiffer
- Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia.,Daintree Rainforest Observatory, James Cook University, Cape Tribulation, Qld, Australia
- Stephen Pearce
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia
- Siu Fai Lee
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia.,Applied BioSciences, Macquarie University, North Ryde, NSW, Australia
- Kerensa McElroy
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia
- Ann Stocker
- Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia
- Jennifer Shirriffs
- Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia
- Fiona Cockerell
- School of Biological Sciences, Monash University, Clayton, Vic., Australia
- Chris Coppin
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia
- Carla M Sgrò
- School of Biological Sciences, Monash University, Clayton, Vic., Australia
- Amir Karger
- IT - Research Computing, Harvard Medical School, Boston, Massachusetts, USA
- John W Cain
- Department of Mathematics, Harvard University, Cambridge, Massachusetts, USA
- Jessica A Weber
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Gabriel Santpere
- Neurogenomics Group, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences (DCEXS), Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Marc W Kirschner
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Ary A Hoffmann
- Bio21 Institute, School of BioSciences, University of Melbourne, Parkville, Vic., Australia
- John G Oakeshott
- Commonwealth Scientific and Industrial Research Organisation, Acton, ACT, Australia.,Applied BioSciences, Macquarie University, North Ryde, NSW, Australia
- Guojie Zhang
- BGI-Shenzhen, Shenzhen, China.,Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark.,State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences (CAS), Kunming, Yunnan, China.,Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
11
Stein WD, Hoshen MB. During evolution from the earliest tetrapoda, newly-recruited genes are increasingly paralogues of existing genes and distribute non-randomly among the chromosomes. BMC Genomics 2021; 22:794. [PMID: 34736418 PMCID: PMC8570013 DOI: 10.1186/s12864-021-08066-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2020] [Accepted: 09/28/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND The present availability of full genome sequences of a broad range of animal species across the whole range of evolutionary history enables one to ask questions as to the distribution of genes across the chromosomes. Do newly recruited genes, as new clades emerge, distribute at random or at non-random locations? RESULTS We extracted values for the ages of the human genes and for their current chromosome locations, from published sources. A quantitative analysis showed that the distribution of newly-added genes among and within the chromosomes appears to be increasingly non-random if one observes animals along the evolutionary series from the precursors of the tetrapoda through to the great apes, whereas the oldest genes are randomly distributed. CONCLUSIONS Randomization will result from chromosome evolution, but less and less time is available for this process as evolution proceeds. Much of the bunching of recently-added genes arises from new gene formation as paralogues in gene families, near the location of genes that were recruited in the preceding phylostratum. As examples we cite the KRTAP, ZNF, OR and some minor gene families. We show that bunching can also result from the evolution of the chromosomes themselves when, as for the KRTAP genes, blocks of genes that had previously been on disparate chromosomes become linked together.
Affiliation(s)
- Wilfred D Stein
- Silberman Institute of Life Sciences, Hebrew University, 91904, Jerusalem, Israel.
- Moshe B Hoshen
- Bioinformatics Department, Jerusalem College of Technology, Tal Campus, Beit HaDfus 7, 95483, Jerusalem, Israel
12
Seetharam AS, Yu Y, Bélanger S, Clark LG, Meyers BC, Kellogg EA, Hufford MB. The Streptochaeta Genome and the Evolution of the Grasses. FRONTIERS IN PLANT SCIENCE 2021; 12:710383. [PMID: 34671369 PMCID: PMC8521107 DOI: 10.3389/fpls.2021.710383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/16/2021] [Accepted: 09/08/2021] [Indexed: 05/15/2023]
Abstract
In this work, we sequenced and annotated the genome of Streptochaeta angustifolia, one of two genera in the grass subfamily Anomochlooideae, a lineage sister to all other grasses. The final assembly size is over 99% of the estimated genome size. We find good collinearity with the rice genome and have captured most of the gene space. Streptochaeta is similar to other grasses in the structure of its fruit (a caryopsis or grain) but has peculiar flowers and inflorescences that are distinct from those in the outgroups and in other grasses. To provide tools for investigations of floral structure, we analyzed two large families of transcription factors, AP2-like and R2R3 MYBs, that are known to control floral and spikelet development in rice and maize among other grasses. Many of these are also regulated by small RNAs. The structure of the gene trees showed that the well-documented whole-genome duplication at the origin of the grasses (ρ) occurred before the divergence of the Anomochlooideae lineage from the lineage leading to the rest of the grasses (the spikelet clade), and thus that the common ancestor of all grasses probably had two copies of the developmental genes. However, Streptochaeta (and by inference other members of Anomochlooideae) has lost one copy of many genes. The peculiar floral morphology of Streptochaeta may thus have derived from an ancestral plant that was morphologically similar to the spikelet-bearing grasses. We further identify 114 loci producing microRNAs and 89 loci generating phased, secondary siRNAs, classes of small RNAs known to be influential in transcriptional and post-transcriptional regulation of several plant functions.
Affiliation(s)
- Arun S. Seetharam
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, United States
- Yunqing Yu
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Lynn G. Clark
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, United States
- Blake C. Meyers
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Division of Plant Sciences, University of Missouri, Columbia, MO, United States
- Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, United States
13
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022]
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers that most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to that of annotated genes. These data reveal the unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
14
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS FINDER takes a completely automated approach to annotate genes directly from raw expression data. 
It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
- Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
15
Sociality sculpts similar patterns of molecular evolution in two independently evolved lineages of eusocial bees. Commun Biol 2021; 4:253. [PMID: 33637860 PMCID: PMC7977082 DOI: 10.1038/s42003-021-01770-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/09/2020] [Accepted: 01/28/2021] [Indexed: 12/19/2022]
Abstract
While it is well known that the genome can affect social behavior, recent models posit that social lifestyles can, in turn, influence genome evolution. Here, we perform the most phylogenetically comprehensive comparative analysis of 16 bee genomes to date, incorporating two published and four new carpenter bee genomes (Apidae: Xylocopinae) for a first-ever genomic comparison with a monophyletic clade containing solitary through advanced eusocial taxa. We find that eusocial lineages have undergone more gene family expansions, feature more signatures of positive selection, and have higher counts of taxonomically restricted genes than solitary and weakly social lineages. Transcriptomic data reveal that caste-affiliated genes are deeply conserved; gene regulatory and functional elements are more closely tied to social phenotype than to phylogenetic lineage; and regulatory complexity increases steadily with social complexity. Overall, our study provides robust empirical evidence that social evolution can act as a major and surprisingly consistent driver of macroevolutionary genomic change.
16
Singh U, Hur M, Dorman K, Wurtele ES. MetaOmGraph: a workbench for interactive exploratory data analysis of large expression datasets. Nucleic Acids Res 2020; 48:e23. [PMID: 31956905 PMCID: PMC7039010 DOI: 10.1093/nar/gkz1209] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/28/2019] [Revised: 12/05/2019] [Accepted: 12/17/2019] [Indexed: 12/17/2022]
Abstract
The diverse and growing omics data in public domains provide researchers with tremendous opportunity to extract hidden, as yet undiscovered, knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), a free, open-source, standalone software for exploratory analysis of massive datasets. Researchers, without coding, can interactively visualize and evaluate data in the context of its metadata, honing in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interaction with data is easy via interactive visualizations such as line charts, box plots, scatter plots, histograms and volcano plots. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create new MOG projects from any numerical data or explore an existing MOG project. MOG projects, with history of explorations, can be saved and shared. We illustrate MOG by case studies of large curated datasets from human cancer RNA-Seq, where we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.
Affiliation(s)
- Urminder Singh
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Manhoi Hur
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Karin Dorman
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Department of Statistics, Iowa State University, Ames, IA 50011, USA
- Eve Syrkin Wurtele
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
17
Abstract
Analysis of yeast, fly and human genomes suggests that sequence divergence is not the main source of orphan genes.
Affiliation(s)
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, United States
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, United States
18
Leiboff S, Hake S. Reconstructing the Transcriptional Ontogeny of Maize and Sorghum Supports an Inverse Hourglass Model of Inflorescence Development. Curr Biol 2019; 29:3410-3419.e3. [PMID: 31587998 DOI: 10.1016/j.cub.2019.08.044] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/03/2019] [Revised: 06/29/2019] [Accepted: 08/19/2019] [Indexed: 12/31/2022]
Abstract
Assembling meaningful comparisons between species is a major limitation in studying the evolution of organismal form. To understand development in maize and sorghum, closely related species with architecturally distinct inflorescences, we collected RNA-seq profiles encompassing inflorescence body-plan specification in both species. We reconstructed molecular ontogenies from 40 B73 maize tassels and 47 BTx623 sorghum panicles and separated them into transcriptional stages. To discover new markers of inflorescence development, we used random forest machine learning to determine stage by RNA-seq. We used two descriptions of transcriptional conservation to identify hourglass-like stages during inflorescence development. Despite a relatively short 12 million years since their last common ancestor, we found maize and sorghum inflorescences are most different during their hourglass-like stages of development, following an inverse-hourglass model of development. We discuss whether agricultural selection may account for the rapid divergence signatures in these species and the observed separation of evolutionary pressure and developmental reprogramming.
Affiliation(s)
- Samuel Leiboff
- Plant Gene Expression Center, U.S. Department of Agriculture-Agricultural Research Service and University of California, Berkeley, Albany, CA 94710, USA
- Sarah Hake
- Plant Gene Expression Center, U.S. Department of Agriculture-Agricultural Research Service and University of California, Berkeley, Albany, CA 94710, USA
19
Arendsee Z, Li J, Singh U, Bhandary P, Seetharam A, Wurtele ES. fagin: synteny-based phylostratigraphy and finer classification of young genes. BMC Bioinformatics 2019; 20:440. [PMID: 31455236 PMCID: PMC6712868 DOI: 10.1186/s12859-019-3023-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/14/2019] [Accepted: 08/08/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequence in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred over half the "Unknown" A. thaliana query genes, and about 20% for S. cerevisiae, as lacking a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.
Affiliation(s)
- Zebulun Arendsee
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Jing Li
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Urminder Singh
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Arun Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA, 50011, USA
- Eve Syrkin Wurtele
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA