1
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
- Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
2
Wolfsberger W, Chhugani K, Shchubelka K, Frolova A, Salyha Y, Zlenko O, Arych M, Dziuba D, Parkhomenko A, Smolanka V, Gümüş ZH, Sezgin E, Diaz-Lameiro A, Toth VR, Maci M, Bortz E, Kondrashov F, Morton PM, Łabaj PP, Romero V, Hlávka J, Mangul S, Oleksyk TK. Scientists without borders: lessons from Ukraine. Gigascience 2023; 12:giad045. [PMID: 37496156 PMCID: PMC10372202 DOI: 10.1093/gigascience/giad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 07/28/2023] Open
Abstract
Conflicts and natural disasters affect entire populations of the countries involved and, in addition to the thousands of lives destroyed, have a substantial negative impact on the scientific advances these countries provide. The unprovoked invasion of Ukraine by Russia, the devastating earthquake in Turkey and Syria, and the ongoing conflicts in the Middle East are just a few examples. Millions of people have been killed or displaced, their futures uncertain. These events have resulted in extensive infrastructure collapse, with loss of electricity, transportation, and access to services. Schools, universities, and research centers have been destroyed along with decades' worth of data, samples, and findings. Scholars in disaster areas face short- and long-term problems in terms of what they can accomplish now for obtaining grants and for employment in the long run. In our interconnected world, conflicts and disasters are no longer a local problem but have wide-ranging impacts on the entire world, both now and in the future. Here, we focus on the current and ongoing impact of war on the scientific community within Ukraine and from this draw lessons that can be applied to all affected countries where scientists at risk are facing hardship. We present and classify examples of effective and feasible mechanisms used to support researchers in countries facing hardship and discuss how these can be implemented with help from the international scientific community and what more is desperately needed. Reaching out, providing accessible training opportunities, and developing collaborations should increase inclusion and connectivity, support scientific advancements within affected communities, and expedite postwar and disaster recovery.
Affiliation(s)
- Walter Wolfsberger
- Department of Biological Sciences, Oakland University, Rochester, MI 48309-4479, USA
- Karishma Chhugani
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Khrystyna Shchubelka
- Department of Biological Sciences, Oakland University, Rochester, MI 48309-4479, USA
- Alina Frolova
- Institute of Molecular Biology and Genetics of National Academy of Sciences of Ukraine, Kyiv Academic University, Kyiv 03143, Ukraine
- Yuriy Salyha
- Institute of Animal Biology, National Academy of Agrarian Sciences (NAAS) of Ukraine, Lviv 79034, Ukraine
- Oksana Zlenko
- National Scientific Center “Institute of Experimental and Clinical Veterinary Medicine,” Kharkiv 61023, Ukraine
- Mykhailo Arych
- Institute of Economics and Management, National University of Food Technologies (NUFT) of Ukraine, Kyiv 01601, Ukraine
- Dmytro Dziuba
- Department of Anesthesiology and Intensive Care, P.L. Shpyk NUHC Ukraine, Kyiv 04112, Ukraine
- Andrii Parkhomenko
- Department of Finance and Business Economics, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA
- Volodymyr Smolanka
- Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
- Zeynep H Gümüş
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Efe Sezgin
- Department of Food Engineering, Izmir Institute of Technology, Urla, Izmir 35430, Turkey
- Alondra Diaz-Lameiro
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez 00681, Puerto Rico
- Viktor R Toth
- Aquatic Botany and Microbial Ecology Research Group, Balaton Limnological Research Institute, Tihany 8237, Hungary
- Megi Maci
- Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60153, USA
- Eric Bortz
- Department of Biological Sciences, University of Alaska, Anchorage, AK 99508, USA
- Fyodor Kondrashov
- Institute of Science and Technology Austria, Klosterneuburg 3400, Austria
- Patricia M Morton
- Department of Sociology, Department of Public Health, Wayne State University, Detroit, MI 48202, USA
- Paweł P Łabaj
- Małopolska Centre of Biotechnology, Jagiellonian University, Kraków 30-348, Poland
- Veronika Romero
- Department of Neurobiology, University of Utah, Salt Lake City, UT 84112, USA
- Jakub Hlávka
- Price School of Public Policy, University of Southern California, Los Angeles, CA 90089-3333, USA
- Masaryk University, Brno 6017, Czech Republic
- Serghei Mangul
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Department of Computational Biology, University of Southern California, Los Angeles, CA 90033, USA
- Taras K Oleksyk
- Department of Biological Sciences, Oakland University, Rochester, MI 48309-4479, USA
- Department of Biology, Uzhhorod National University, Uzhhorod 88000, Ukraine
3
Doyle SR. Improving helminth genome resources in the post-genomic era. Trends Parasitol 2022; 38:831-840. [PMID: 35810065 DOI: 10.1016/j.pt.2022.06.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/24/2022] [Revised: 06/14/2022] [Accepted: 06/14/2022] [Indexed: 01/02/2023]
Abstract
Rapid advancement in high-throughput sequencing and analytical approaches has seen a steady increase in the generation of genomic resources for helminth parasites. Now, helminth genomes and their annotations are a cornerstone of numerous efforts to compare genetic and transcriptomic variation, from single cells to populations of globally distributed parasites, to genome modifications to understand gene function. Our understanding of helminths is increasingly reliant on these genomic resources, which are primarily static once published and vary widely in quality and completeness between species. This article seeks to highlight the cause and effect of this variation and argues for the continued improvement of these genomic resources - even after their publication - which is necessary to provide a more accurate and complete understanding of the biology of these important pathogens.
Affiliation(s)
- Stephen R Doyle
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
4
Grimplet J. Genomic and Bioinformatic Resources for Perennial Fruit Species. Curr Genomics 2022; 23:217-233. [PMID: 36777875 PMCID: PMC9875543 DOI: 10.2174/1389202923666220428102632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 03/12/2022] [Accepted: 03/12/2022] [Indexed: 11/22/2022] Open
Abstract
In the post-genomic era, data management and development of bioinformatic tools are critical for the adequate exploitation of genomics data. In this review, we address the current situation for the subset of crops represented by the perennial fruit species. The agronomical singularity of these species compared to plant and crop model species poses significant challenges for the implementation of good practices generally not addressed in other species. Studies are usually performed over several years in non-controlled environments, usage of rootstock is common, and breeders heavily rely on vegetative propagation. A reference genome is now available for all the major species as well as many members of the economically important genera for breeding purposes. Development of pangenomes for these species is beginning to gain momentum, which will require a substantial effort in terms of bioinformatic tool development. The available tools for genome annotation and functional analysis are also presented.
Affiliation(s)
- Jérôme Grimplet
- Centro de Investigación y Tecnología Agroalimentaria de Aragón (CITA), Unidad de Hortofruticultura, Gobierno de Aragón, Avda. Montañana, Zaragoza, Spain; Instituto Agroalimentario de Aragón–IA2 (CITA-Universidad de Zaragoza), Calle Miguel Servet, Zaragoza, Spain. Address correspondence to this author at the Centro de Investigación y Tecnología Agroalimentaria de Aragón (CITA), Unidad de Hortofruticultura, Gobierno de Aragón, Avda. Montañana, Zaragoza, Spain; Instituto Agroalimentario de Aragón–IA2 (CITA-Universidad de Zaragoza), Calle Miguel Servet, Zaragoza, Spain; Tel: +34976713635; E-mail:
5
Reynolds M, de Oliveira L, Vosburg C, Paris T, Massimino C, Norus J, Ortiz Y, Espino M, Davis N, Masse R, Neiman A, Holcomb R, Gervais K, Kemp M, Hoang M, Shippy TD, Hosmani PS, Flores-Gonzalez M, Pelz-Stelinski K, Qureshi JA, Mueller LA, Hunter WB, Benoit JB, Brown SJ, D’Elia T, Saha S. Annotation of putative circadian rhythm-associated genes in Diaphorina citri (Hemiptera: Liviidae). GIGABYTE 2022; 2022:gigabyte48. [PMID: 36824532 PMCID: PMC9662589 DOI: 10.46471/gigabyte.48] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2021] [Accepted: 03/28/2022] [Indexed: 11/09/2022] Open
Abstract
The circadian rhythm involves multiple genes that generate an internal molecular clock, allowing organisms to anticipate environmental conditions produced by the Earth's rotation on its axis. Here, we present the results of the manual curation of 27 genes that are associated with circadian rhythm in the genome of Diaphorina citri, the Asian citrus psyllid. This insect is the vector for the bacterial pathogen Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening disease (Huanglongbing). This disease severely affects citrus industries and has drastically decreased crop yields worldwide. Based on cry1 and cry2 identified in the psyllid genome, D. citri likely possesses a circadian model similar to the lepidopteran butterfly, Danaus plexippus. Manual annotation will improve the quality of circadian rhythm gene models, allowing the future development of molecular therapeutics, such as RNA interference or antisense technologies, to target these genes to disrupt the psyllid biology.
Affiliation(s)
- Max Reynolds
- Indian River State College, Fort Pierce, FL 34981, USA
- Chad Vosburg
- Indian River State College, Fort Pierce, FL 34981, USA; Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, PA 16802, USA
- Thomson Paris
- Entomology and Nematology Department, University of Florida, North Florida Research and Education Center, Research Road, Quincy 32351, Florida, USA
- Jordan Norus
- Indian River State College, Fort Pierce, FL 34981, USA
- Yasmin Ortiz
- Indian River State College, Fort Pierce, FL 34981, USA
- Nina Davis
- Indian River State College, Fort Pierce, FL 34981, USA
- Ron Masse
- Indian River State College, Fort Pierce, FL 34981, USA
- Alan Neiman
- Indian River State College, Fort Pierce, FL 34981, USA
- Kylie Gervais
- Indian River State College, Fort Pierce, FL 34981, USA
- Melissa Kemp
- Indian River State College, Fort Pierce, FL 34981, USA
- Maria Hoang
- Indian River State College, Fort Pierce, FL 34981, USA
- Teresa D. Shippy
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
- Kirsten Pelz-Stelinski
- Department of Entomology and Nematology, University of Florida, Lake Alfred, FL 33850, USA
- Jawwad A. Qureshi
- Indian River Research and Education Center, University of Florida, IFAS, 2199 South Rock Road, Fort Pierce, FL 34945-3138, USA; Southwest Florida Research and Education Center, University of Florida, IFAS, 2685 State Road 29 North, Immokalee, FL 34142, USA
- Wayne B. Hunter
- USDA-ARS, US Horticultural Research Laboratory, Fort Pierce, FL 34945, USA
- Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Susan J. Brown
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
- Tom D’Elia
- Indian River State College, Fort Pierce, FL 34981, USA
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, USA; Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ 85721, USA. Corresponding author. E-mail:
6
Tamayo B, Kercher K, Vosburg C, Massimino C, Jernigan MR, Hasan DL, Harper D, Mathew A, Adkins S, Shippy T, Hosmani PS, Flores-Gonzalez M, Panitz N, Mueller LA, Hunter WB, Benoit JB, Brown SJ, D’Elia T, Saha S. Annotation of glycolysis, gluconeogenesis, and trehaloneogenesis pathways provide insight into carbohydrate metabolism in the Asian citrus psyllid. GIGABYTE 2022; 2022:gigabyte41. [PMID: 36824510 PMCID: PMC9933520 DOI: 10.46471/gigabyte.41] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 02/11/2022] [Indexed: 11/09/2022] Open
Abstract
Citrus greening disease is caused by the pathogen Candidatus Liberibacter asiaticus and transmitted by the Asian citrus psyllid, Diaphorina citri. No curative treatment or significant prevention mechanism exists for this disease, which causes economic losses from reduced citrus production. A high-quality genome of D. citri is being manually annotated to provide accurate gene models to identify novel control targets and increase understanding of this pest. Here, we annotated 25 D. citri genes involved in glycolysis and gluconeogenesis, and seven in trehaloneogenesis. Comparative analysis showed that glycolysis genes in D. citri are highly conserved but copy numbers vary. Analysis of expression levels revealed upregulation of several enzymes in the glycolysis pathway in the thorax, consistent with the primary use of glucose by thoracic flight muscles. Manually annotating these core metabolic pathways provides an accurate genomic foundation for developing gene-targeting therapeutics to control D. citri.
Affiliation(s)
- Blessy Tamayo
- Indian River State College, Fort Pierce, FL 34981, USA
- Kyle Kercher
- Indian River State College, Fort Pierce, FL 34981, USA
- Chad Vosburg
- Indian River State College, Fort Pierce, FL 34981, USA
- Anuja Mathew
- Indian River State College, Fort Pierce, FL 34981, USA
- Samuel Adkins
- Indian River State College, Fort Pierce, FL 34981, USA
- Teresa Shippy
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
- Wayne B. Hunter
- US Department of Agriculture-Agricultural Research Service (USDA-ARS), US Horticultural Research Laboratory, Fort Pierce, FL 34945, USA
- Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Susan J. Brown
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
- Tom D’Elia
- Indian River State College, Fort Pierce, FL 34981, USA
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, USA; Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ 85721, USA. Corresponding author. E-mail:
7
Thurlow KE, Lovering RC, De Miranda Pinheiro S. Student biocuration projects as a learning environment. F1000Res 2022; 10:1023. [PMID: 35211294 PMCID: PMC8831850 DOI: 10.12688/f1000research.72808.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/06/2022] [Indexed: 11/20/2022] Open
Abstract
Background: Bioinformatics is becoming an essential tool for the majority of biological and biomedical researchers. Although bioinformatics data is exploited by academic and industrial researchers, there is limited focus on teaching this area to undergraduates, postgraduates and senior scientists. Many scientists are developing their own expertise without formal training and often without appreciating the source of the data they are reliant upon. Some universities do provide courses on a variety of bioinformatics resources and tools; a few also provide biocuration projects, during which students submit data to annotation resources. Methods: To assess the usefulness and enjoyability of annotation projects, a survey was sent to University College London (UCL) students who have undertaken Gene Ontology biocuration projects. Results: Analysis of survey responses suggests that these projects provide students with an opportunity not only to learn about bioinformatics resources but also to improve their literature analysis, presentation and writing skills. Conclusion: Biocuration student projects provide valuable annotations as well as enabling students to develop a variety of skills relevant to their future careers. It is also hoped that, as future scientists, these students will critically assess their own manuscripts and ensure that these are written with the biocurators of the future in mind.
Affiliation(s)
- Katherine E. Thurlow
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, WC1E 6JF, UK
- Ruth C. Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, WC1E 6JF, UK
- Sandra De Miranda Pinheiro
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, WC1E 6JF, UK
8
Ramsey J, McIntosh B, Renfro D, Aleksander SA, LaBonte S, Ross C, Zweifel AE, Liles N, Farrar S, Gill JJ, Erill I, Ades S, Berardini TZ, Bennett JA, Brady S, Britton R, Carbon S, Caruso SM, Clements D, Dalia R, Defelice M, Doyle EL, Friedberg I, Gurney SMR, Hughes L, Johnson A, Kowalski JM, Li D, Lovering RC, Mans TL, McCarthy F, Moore SD, Murphy R, Paustian TD, Perdue S, Peterson CN, Prüß BM, Saha MS, Sheehy RR, Tansey JT, Temple L, Thorman AW, Trevino S, Vollmer AC, Walbot V, Willey J, Siegele DA, Hu JC. Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO). PLoS Comput Biol 2021; 17:e1009463. [PMID: 34710081 PMCID: PMC8553046 DOI: 10.1371/journal.pcbi.1009463] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022] Open
Abstract
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.
Affiliation(s)
- Jolene Ramsey
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
- Brenley McIntosh
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Daniel Renfro
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Suzanne A. Aleksander
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Sandra LaBonte
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Curtis Ross
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
- Adrienne E. Zweifel
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Nathan Liles
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Shabnam Farrar
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Jason J. Gill
- Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
- Department of Animal Science, Texas A&M University, College Station, Texas, United States of America
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
- Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
- Sarah Ades
- Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Tanya Z. Berardini
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
- Jennifer A. Bennett
- Department of Biology and Earth Science, Otterbein University, Westerville, Ohio, United States of America
- Siobhan Brady
- Department of Plant Biology and Genome Center, University of California Davis, Davis, California, United States of America
- Robert Britton
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Steven M. Caruso
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
- Dave Clements
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Ritu Dalia
- Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
- Meredith Defelice
- Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Erin L. Doyle
- Biology Department, Doane University, Crete, Nebraska, United States of America
- Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Susan M. R. Gurney
- Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
- Lee Hughes
- Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
- Allison Johnson
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Jason M. Kowalski
- Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
- Donghui Li
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
- Ruth C. Lovering
- Institute of Cardiovascular Science, University College London, London, United Kingdom
- Tamara L. Mans
- Department of Biochemistry and Biotechnology, Minnesota State University Moorhead, Brooklyn Park, Minnesota, United States of America
- Fiona McCarthy
- Department of Basic Science, College of Veterinary Medicine, Mississippi State University, Starkville, Mississippi, United States of America
- Sean D. Moore
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America
- Rebecca Murphy
- Department of Biology, Centenary College of Louisiana, Shreveport, Louisiana, United States of America
- Timothy D. Paustian
- Department of Bacteriology, University of Wisconsin, Madison, Wisconsin, United States of America
- Sarah Perdue
- Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
- Celeste N. Peterson
- Biology Department, Suffolk University, Boston, Massachusetts, United States of America
- Birgit M. Prüß
- Microbiological Sciences Department, North Dakota State University, Fargo, North Dakota, United States of America
- Margaret S. Saha
- Department of Biology, College of William & Mary, Williamsburg, Virginia, United States of America
- Robert R. Sheehy
- Biology Department, Radford University, Radford, Virginia, United States of America
- John T. Tansey
- Department of Biochemistry and Molecular Biology, Otterbein University, Westerville, Ohio, United States of America
- Louise Temple
- School of Integrated Sciences, James Madison University, Harrisonburg, Virginia, United States of America
- Alexander William Thorman
- Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, Ohio, United States of America
- Saul Trevino
- Department of Chemistry, Math, and Physics, Houston Baptist University, Houston, Texas, United States of America
- Amy Cheng Vollmer
- Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Virginia Walbot
- Department of Biology, Stanford University, Stanford, California, United States of America
- Joanne Willey
- Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Deborah A. Siegele
- Department of Biology, Texas A&M University, College Station, Texas, United States of America
- James C. Hu
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
9
Finkers R, van Kaauwen M, Ament K, Burger-Meijer K, Egging R, Huits H, Kodde L, Kroon L, Shigyo M, Sato S, Vosman B, van Workum W, Scholten O. Insights from the first genome assembly of Onion (Allium cepa). G3 (BETHESDA, MD.) 2021; 11. [PMID: 34544132 DOI: 10.1101/2021.03.05.434149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Accepted: 07/06/2021] [Indexed: 05/18/2023]
Abstract
Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of a doubled haploid onion line DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 464 Kb. Of this, 2.4 Gb was ordered into eight pseudomolecules using four genetic linkage maps. The remainder of the genome is available in 89.6 K scaffolds. Only 72.4% of the genome could be identified as repetitive sequences, which consist, to a large extent, of (retro)transposons. In addition, an estimated 20% of the putative (retro)transposons had accumulated a large number of mutations, hampering their identification but facilitating their assembly. These elements are probably already quite old. The ab initio gene prediction indicated 540,925 putative gene models, which is far more than expected, possibly due to the presence of pseudogenes. Of these models, 47,066 showed RNA-Seq support. No gene-rich regions were found; genes are uniformly distributed over the genome. Analysis of synteny with Allium sativum (garlic) showed collinearity but also major rearrangements between the two species. This assembly is the first high-quality genome sequence available for the study of onion and will be a valuable resource for further research.
Affiliation(s)
- Richard Finkers
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Martijn van Kaauwen
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Kai Ament
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Karin Burger-Meijer
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | | | - Henk Huits
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Linda Kodde
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Laurens Kroon
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Masayoshi Shigyo
- Laboratory of Vegetable Crop Science, College of Agriculture, Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Yamaguchi City, Yamaguchi 753-8515, Japan
| | - Shusei Sato
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
| | - Ben Vosman
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | | | - Olga Scholten
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| |
|
10
|
Finkers R, van Kaauwen M, Ament K, Burger-Meijer K, Egging R, Huits H, Kodde L, Kroon L, Shigyo M, Sato S, Vosman B, van Workum W, Scholten O. Insights from the first genome assembly of Onion (Allium cepa). G3 (BETHESDA, MD.) 2021; 11:jkab243. [PMID: 34544132 PMCID: PMC8496297 DOI: 10.1093/g3journal/jkab243] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/29/2021] [Accepted: 07/06/2021] [Indexed: 11/17/2022]
Abstract
Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of a doubled haploid onion line DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 464 Kb. Of this, 2.4 Gb was ordered into eight pseudomolecules using four genetic linkage maps. The remainder of the genome is available in 89.6 K scaffolds. Only 72.4% of the genome could be identified as repetitive sequences and consist, to a large extent, of (retro) transposons. In addition, an estimated 20% of the putative (retro) transposons had accumulated a large number of mutations, hampering their identification, but facilitating their assembly. These elements are probably already quite old. The ab initio gene prediction indicated 540,925 putative gene models, which is far more than expected, possibly due to the presence of pseudogenes. Of these models, 47,066 showed RNASeq support. No gene rich regions were found, genes are uniformly distributed over the genome. Analysis of synteny with Allium sativum (garlic) showed collinearity but also major rearrangements between both species. This assembly is the first high-quality genome sequence available for the study of onion and will be a valuable resource for further research.
Affiliation(s)
- Richard Finkers
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Martijn van Kaauwen
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Kai Ament
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Karin Burger-Meijer
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | | | - Henk Huits
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Linda Kodde
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | - Laurens Kroon
- Bejo Zaden B.V., 1749 CZ Warmerhuizen, The Netherlands
| | - Masayoshi Shigyo
- Laboratory of Vegetable Crop Science, College of Agriculture, Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Yamaguchi City, Yamaguchi 753-8515, Japan
| | - Shusei Sato
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
| | - Ben Vosman
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| | | | - Olga Scholten
- Plant Breeding, Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands
| |
|
11
|
Saha S, Cooksey AM, Childers AK, Poelchau MF, McCarthy FM. Workflows for Rapid Functional Annotation of Diverse Arthropod Genomes. INSECTS 2021; 12:748. [PMID: 34442314 PMCID: PMC8397112 DOI: 10.3390/insects12080748] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Revised: 07/16/2021] [Accepted: 08/11/2021] [Indexed: 12/03/2022]
Abstract
Genome sequencing of a diverse array of arthropod genomes is already underway, and these genomes will be used to study human health, agriculture, biodiversity, and ecology. These new genomes are intended to serve as community resources and provide the foundational information required to apply 'omics technologies to a more diverse set of species. However, biologists require genome annotation to use these genomes and derive a better understanding of complex biological systems. Genome annotation incorporates two related, but distinct, processes: Demarcating genes and other elements present in genome sequences (structural annotation); and associating a function with genetic elements (functional annotation). While there are well-established and freely available workflows for structural annotation of gene identification in newly assembled genomes, workflows for providing the functional annotation required to support functional genomics studies are less well understood. Genome-scale functional annotation is required for functional modeling (enrichment, networks, etc.). A first-pass genome-wide functional annotation effort can rapidly identify under-represented gene sets for focused community annotation efforts. We present an open-source, open access, and containerized pipeline for genome-scale functional annotation of insect proteomes and apply it to various arthropod species. We show that the performance of the predictions is consistent across a set of arthropod genomes with varying assembly and annotation quality.
Affiliation(s)
- Surya Saha
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
- School of Animal and Comparative Biomedical Sciences, University of Arizona, 1117 E. Lowell St., Tucson, AZ 85721, USA
| | - Amanda M. Cooksey
- School of Animal and Comparative Biomedical Sciences, University of Arizona, 1117 E. Lowell St., Tucson, AZ 85721, USA
- CyVerse, BioScience Research Laboratories, University of Arizona, 1230 N. Cherry Ave., Tucson, AZ 85721, USA
| | - Anna K. Childers
- Bee Research Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, 10300 Baltimore Ave., Beltsville, MD 20705, USA
| | - Monica F. Poelchau
- National Agricultural Library, Agricultural Research Service, USDA, 10301 Baltimore Ave., Beltsville, MD 20705, USA
| | - Fiona M. McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, 1117 E. Lowell St., Tucson, AZ 85721, USA
| |
|
12
|
Massimino C, Vosburg C, Shippy T, Hosmani PS, Flores-Gonzalez M, Mueller LA, Hunter WB, Benoit JB, Brown SJ, D’Elia T, Saha S. Annotation of yellow genes in Diaphorina citri, the vector for Huanglongbing disease. GIGABYTE 2021; 2021:gigabyte20. [PMID: 36824344 PMCID: PMC9631960 DOI: 10.46471/gigabyte.20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/21/2020] [Accepted: 05/18/2021] [Indexed: 11/09/2022] Open
Abstract
Huanglongbing (HLB), also known as citrus greening disease, is caused by the bacterium Candidatus Liberibacter asiaticus (CLas). It is a serious threat to global citrus production. This bacterium is transmitted by the Asian citrus psyllid, Diaphorina citri (Hemiptera). There are no effective in planta treatments for CLas. Therefore, one strategy is to manage the psyllid population. Manual annotation of the D. citri genome can identify and characterize gene families that could be novel targets for psyllid control. The yellow gene family is an excellent target because yellow genes, which have roles in melanization, are linked to development and immunity. Combined analysis of the genome with RNA-seq datasets, sequence homology, and phylogenetic trees was used to identify and annotate nine yellow genes in the D. citri genome. Manual curation of genes in D. citri provided in-depth analysis of the yellow family among hemipteran insects and provides new targets for molecular control of this psyllid pest. Manual annotation was done as part of a collaborative Citrus Greening community annotation project.
Affiliation(s)
| | - Chad Vosburg
- Indian River State College, Fort Pierce, FL 34981, USA
| | - Teresa Shippy
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
| | | | | | | | - Wayne B. Hunter
- USDA-ARS, US Horticultural Research Laboratory, Fort Pierce, FL 34945, USA
| | - Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
| | - Susan J. Brown
- Division of Biology, Kansas State University, Manhattan, KS 66506, USA
| | - Tom D’Elia
- Indian River State College, Fort Pierce, FL 34981, USA
| | - Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, USA; Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ 85721, USA. Corresponding author.
| |
|
13
|
Saha S, Shippy TD, Brown SJ, Benoit JB, D’Elia T. Undergraduate Virtual Engagement in Community Genome Annotation Provides Flexibility to Overcome Course Disruptions. JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION 2021; 22:22.1.38. [PMID: 33884059 PMCID: PMC8011878 DOI: 10.1128/jmbe.v22i1.2395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 01/05/2021] [Indexed: 06/12/2023]
Abstract
Recently, students and faculty have been forced to deal with unprecedented disruptions to their courses and broader uncertainties that have presented serious challenges to quality instruction. We present a flexible, team-based approach to teaching and learning that can transition seamlessly between face-to-face, hybrid, and fully online instruction when disruptions occur. We have built a community genome annotation program that can be implemented as a module in a biology course, as an entire course, or as directed research projects. This approach maintains an engaging and supportive educational environment and provides students the opportunity to learn and contribute to science with undergraduate research. Students are provided guidance through multiple interactions with faculty and peer mentors to support their progress and encourage learning. Integration of the developed instructional tools with available technology ensures that students can contribute remotely. Through this process, students seamlessly continue their annotation coursework, participate in undergraduate research, and prepare abstracts and posters for virtual conferences. Importantly, this strategy does not impose any additional burden or workload on students, who may already be overwhelmed with the additional work associated with the transition to remote learning. Here, we present tips for implementing this instructional platform, provide an overview of tools that facilitate instruction, and discuss expected educational outcomes.
Affiliation(s)
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, and Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ 85721
| | - Teresa D. Shippy
- Division of Biology, Kansas State University, Manhattan, KS 66506
| | - Susan J. Brown
- Division of Biology, Kansas State University, Manhattan, KS 66506
| | - Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221
| | - Tom D’Elia
- Biology Department, Indian River State College, Fort Pierce, FL 34981
| |
|
14
|
Stein W, Talasu S, Vidal-Gadea A, DeMaegd ML. Physiologists turned Geneticists: Identifying transcripts and genes for neuronal function in the Marbled Crayfish, Procambarus virginalis. JOURNAL OF UNDERGRADUATE NEUROSCIENCE EDUCATION : JUNE : A PUBLICATION OF FUN, FACULTY FOR UNDERGRADUATE NEUROSCIENCE 2020; 19:A36-A51. [PMID: 33880091 PMCID: PMC8040847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2020] [Revised: 09/01/2020] [Accepted: 09/21/2020] [Indexed: 06/12/2023]
Abstract
The number of undergraduate researchers interested in pursuing neurophysiological research exceeds the research laboratory positions and hands-on course experiences available because these types of experiments often require extensive experience or expensive equipment. In contrast, genetic and molecular tools can more easily incorporate undergraduates with less time or training. With the explosion of newly sequenced genomes and transcriptomes, there is a large pool of untapped molecular and genetic information which would greatly inform neurophysiological processes. Classically trained neurophysiologists often struggle to make use of newly available genetic information for themselves and their trainees, despite the clear advantage of combining genetic and physiological techniques. This is particularly prevalent among researchers working with organisms that historically had no or only few genetic tools available. Combining these two fields will expose undergraduates to a greater variety of research approaches, concepts, and hands-on experiences. The goal of this manuscript is to provide an easily understandable and reproducible workflow that can be applied in both lab and classroom settings to identify genes involved in neuronal function. We outline clear learning objectives that can be acquired by following our workflow and assessed by peer-evaluation. Using our workflow, we identify and validate the sequence of two new Gamma Aminobutyric Acid A (GABAA) receptor subunit homologs in the recently published genome and transcriptome of the marbled crayfish, Procambarus virginalis. Altogether, this allows undergraduate students to apply their knowledge of the processes of gene expression to functional neuronal outcomes. It also provides them with opportunities to contribute significantly to physiological research, thereby exposing them to interdisciplinary approaches.
Affiliation(s)
- Wolfgang Stein
- School of Biological Science, Illinois State University, Normal, IL 61790
| | - Saisupritha Talasu
- School of Biological Science, Illinois State University, Normal, IL 61790
| | - Andrés Vidal-Gadea
- School of Biological Science, Illinois State University, Normal, IL 61790
| | - Margaret L DeMaegd
- School of Biological Science, Illinois State University, Normal, IL 61790
| |
|
15
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
| | - J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
| | - Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
| | - Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
| |
|
16
|
Sargent L, Liu Y, Leung W, Mortimer NT, Lopatto D, Goecks J, Elgin SCR. G-OnRamp: Generating genome browsers to facilitate undergraduate-driven collaborative genome annotation. PLoS Comput Biol 2020; 16:e1007863. [PMID: 32497138 PMCID: PMC7272004 DOI: 10.1371/journal.pcbi.1007863] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022] Open
Abstract
Scientists are sequencing new genomes at an increasing rate with the goal of associating genome contents with phenotypic traits. After a new genome is sequenced and assembled, structural gene annotation is often the first step in analysis. Despite advances in computational gene prediction algorithms, most eukaryotic genomes still benefit from manual gene annotation. This requires access to good genome browsers to enable annotators to visualize and evaluate multiple lines of evidence (e.g., sequence similarity, RNA sequencing [RNA-Seq] results, gene predictions, repeats) and necessitates many volunteers to participate in the work. To address the technical barriers to creating genome browsers, the Genomics Education Partnership (GEP; https://gep.wustl.edu/) has partnered with the Galaxy Project (https://galaxyproject.org) to develop G-OnRamp (http://g-onramp.org), a web-based platform for creating UCSC Genome Browser Assembly Hubs and JBrowse genome browsers. G-OnRamp also converts a JBrowse instance into an Apollo instance for collaborative genome annotations in research and educational settings. The genome browsers produced can be transferred to the CyVerse Data Store for long-term access. G-OnRamp enables researchers to easily visualize their experimental results, educators to create Course-based Undergraduate Research Experiences (CUREs) centered on genome annotation, and students to participate in genomics research. In the process, students learn about genes/genomes and about how to utilize large datasets. Development of G-OnRamp was guided by extensive user feedback. Sixty-five researchers/educators from >40 institutions participated through in-person workshops, which produced >20 genome browsers now available for research and education. Genome browsers generated for four parasitoid wasp species have been used in a CURE engaging students at 15 colleges and universities. Our assessment results in the classroom demonstrate that the genome browsers produced by G-OnRamp are effective tools for engaging undergraduates in research and in enabling their contributions to the scientific literature in genomics. Expansion of such genomics research/education partnerships will be beneficial to researchers, faculty, and students alike.
Affiliation(s)
- Luke Sargent
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Yating Liu
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Wilson Leung
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Nathan T. Mortimer
- School of Biological Sciences, Illinois State University, Normal, Illinois, United States of America
| | - David Lopatto
- Department of Psychology, Grinnell College, Grinnell, Iowa, United States of America
| | - Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Sarah C. R. Elgin
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
| |
|
17
|
Monnahan PJ, Michno JM, O'Connor C, Brohammer AB, Springer NM, McGaugh SE, Hirsch CN. Using multiple reference genomes to identify and resolve annotation inconsistencies. BMC Genomics 2020; 21:281. [PMID: 32264824 DOI: 10.1101/651984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2019] [Accepted: 03/24/2020] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure of gene models within these regions can differ. Of particular concern are split-gene misannotations, in which a single gene is incorrectly annotated as two distinct genes or two genes are incorrectly annotated as a single gene. These misannotations can have major impacts on functional prediction, estimates of expression, and many downstream analyses. RESULTS We developed a high-throughput method based on pairwise comparisons of annotations that detects potential split-gene misannotations and quantifies support for whether the genes should be merged into a single gene model. We demonstrated the utility of our method using gene annotations of three reference genomes from maize (B73, PH207, and W22), a difficult system from an annotation perspective due to the size and complexity of the genome. On average, we found several hundred of these potential split-gene misannotations in each pairwise comparison, corresponding to 3-5% of gene models across annotations. To determine which state (i.e. one gene or multiple genes) is biologically supported, we utilized RNAseq data from 10 tissues throughout development along with a novel metric and simulation framework. The methods we have developed require minimal human interaction and can be applied to future assemblies to aid in annotation efforts. CONCLUSIONS Split-gene misannotations occur at appreciable frequency in maize annotations. We have developed a method to easily identify and correct these misannotations. Importantly, this method is generic in that it can utilize any type of short-read expression data. Failure to account for split-gene misannotations has serious consequences for biological inference, particularly for expression-based analyses.
Affiliation(s)
- Patrick J Monnahan
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Jean-Michel Michno
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - Christine O'Connor
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
| | - Alex B Brohammer
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - Nathan M Springer
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Suzanne E McGaugh
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA.
| |
|
18
|
Monnahan PJ, Michno JM, O'Connor C, Brohammer AB, Springer NM, McGaugh SE, Hirsch CN. Using multiple reference genomes to identify and resolve annotation inconsistencies. BMC Genomics 2020; 21:281. [PMID: 32264824 PMCID: PMC7140576 DOI: 10.1186/s12864-020-6696-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/30/2019] [Accepted: 03/24/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Advances in sequencing technologies have led to the release of reference genomes and annotations for multiple individuals within more well-studied systems. While each of these new genome assemblies shares significant portions of synteny between each other, the annotated structure of gene models within these regions can differ. Of particular concern are split-gene misannotations, in which a single gene is incorrectly annotated as two distinct genes or two genes are incorrectly annotated as a single gene. These misannotations can have major impacts on functional prediction, estimates of expression, and many downstream analyses. RESULTS We developed a high-throughput method based on pairwise comparisons of annotations that detects potential split-gene misannotations and quantifies support for whether the genes should be merged into a single gene model. We demonstrated the utility of our method using gene annotations of three reference genomes from maize (B73, PH207, and W22), a difficult system from an annotation perspective due to the size and complexity of the genome. On average, we found several hundred of these potential split-gene misannotations in each pairwise comparison, corresponding to 3-5% of gene models across annotations. To determine which state (i.e. one gene or multiple genes) is biologically supported, we utilized RNAseq data from 10 tissues throughout development along with a novel metric and simulation framework. The methods we have developed require minimal human interaction and can be applied to future assemblies to aid in annotation efforts. CONCLUSIONS Split-gene misannotations occur at appreciable frequency in maize annotations. We have developed a method to easily identify and correct these misannotations. Importantly, this method is generic in that it can utilize any type of short-read expression data. Failure to account for split-gene misannotations has serious consequences for biological inference, particularly for expression-based analyses.
Affiliation(s)
- Patrick J Monnahan
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Jean-Michel Michno
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - Christine O'Connor
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
| | - Alex B Brohammer
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
| | - Nathan M Springer
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Suzanne E McGaugh
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, 55108, USA
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA.
| |
|
19
|
Lopatto D, Rosenwald AG, DiAngelo JR, Hark AT, Skerritt M, Wawersik M, Allen AK, Alvarez C, Anderson S, Arrigo C, Arsham A, Barnard D, Bazinet C, Bedard JEJ, Bose I, Braverman JM, Burg MG, Burgess RC, Croonquist P, Du C, Dubowsky S, Eisler H, Escobar MA, Foulk M, Furbee E, Giarla T, Glaser RL, Goodman AL, Gosser Y, Haberman A, Hauser C, Hays S, Howell CE, Jemc J, Johnson ML, Jones CJ, Kadlec L, Kagey JD, Keller KL, Kennell J, Key SCS, Kleinschmit AJ, Kleinschmit M, Kokan NP, Kopp OR, Laakso MM, Leatherman J, Long LJ, Manier M, Martinez-Cruzado JC, Matos LF, McClellan AJ, McNeil G, Merkhofer E, Mingo V, Mistry H, Mitchell E, Mortimer NT, Mukhopadhyay D, Myka JL, Nagengast A, Overvoorde P, Paetkau D, Paliulis L, Parrish S, Preuss ML, Price JV, Pullen NA, Reinke C, Revie D, Robic S, Roecklein-Canfield JA, Rubin MR, Sadikot T, Sanford JS, Santisteban M, Saville K, Schroeder S, Shaffer CD, Sharif KA, Sklensky DE, Small C, Smith M, Smith S, Spokony R, Sreenivasan A, Stamm J, Sterne-Marr R, Teeter KC, Thackeray J, Thompson JS, Peters ST, Van Stry M, Velazquez-Ulloa N, Wolfe C, Youngblom J, Yowler B, Zhou L, Brennan J, Buhler J, Leung W, Reed LK, Elgin SCR. Facilitating Growth through Frustration: Using Genomics Research in a Course-Based Undergraduate Research Experience. JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION 2020; 21:jmbe-21-6. [PMID: 32148609 PMCID: PMC7048401 DOI: 10.1128/jmbe.v21i1.2005] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/16/2019] [Accepted: 01/23/2020] [Indexed: 06/10/2023]
Abstract
A hallmark of the research experience is encountering difficulty and working through those challenges to achieve success. This ability is essential to being a successful scientist, but replicating such challenges in a teaching setting can be difficult. The Genomics Education Partnership (GEP) is a consortium of faculty who engage their students in a genomics Course-Based Undergraduate Research Experience (CURE). Students participate in genome annotation, generating gene models using multiple lines of experimental evidence. Our observations suggested that the students' learning experience is continuous and recursive, frequently beginning with frustration but eventually leading to success as they come up with defendable gene models. In order to explore our "formative frustration" hypothesis, we gathered data from faculty via a survey, and from students via both a general survey and a set of student focus groups. Upon analyzing these data, we found that all three datasets mentioned frustration and struggle, as well as learning and better understanding of the scientific process. Bioinformatics projects are particularly well suited to the process of iteration and refinement because iterations can be performed quickly and are inexpensive in both time and money. Based on these findings, we suggest that a dynamic of "formative frustration" is an important aspect for a successful CURE.
Affiliation(s)
- David Lopatto
- Center for Teaching, Learning and Assessment, Grinnell College, Grinnell, IA 50112, USA
- Amy T. Hark
- Biology, Muhlenberg College, Allentown, PA 18104, USA
- Matthew Wawersik
- Biology, College of William and Mary, Williamsburg, VA 23187, USA
- Anna K. Allen
- Biology, Howard University, Washington, DC 20059, USA
- Sara Anderson
- Biosciences, Minnesota State University Moorhead, Moorhead, MN 56563, USA
- Cindy Arrigo
- Biology, New Jersey City University, Jersey City, NJ 07305, USA
- Andrew Arsham
- Biology, Bemidji State University, Bemidji, MN 56601, USA
- Daron Barnard
- Biology, Worcester State University, Worcester, MA 01602, USA
- James E. J. Bedard
- Biology, University of the Fraser Valley, Abbotsford, BC, V2S 7M8, Canada
- Indrani Bose
- Biology, Western Carolina University, Cullowhee, NC 28723, USA
- Martin G. Burg
- Biomedical Sciences and Cell & Molecular Biology, Grand Valley State University, Allendale, MI 49401, USA
- Paula Croonquist
- Biology, Anoka-Ramsey Community College, Coon Rapids, MN 55433, USA
- Chunguang Du
- Biology, Montclair State University, Montclair, NJ 07043, USA
- Heather Eisler
- Biology, University of the Cumberlands, Williamsburg, KY 40769, USA
- Matthew A. Escobar
- Biological Sciences, California State University San Marcos, CA 92096, USA
- Emily Furbee
- Biology, Washington and Jefferson College, Washington, PA 15301, USA
- Rivka L. Glaser
- Biological Sciences, Stevenson University, Owings Mills, MD 21117, USA
- Anya L. Goodman
- Chemistry and Biochemistry, California Polytechnic State University, San Luis Obispo, CA 93407, USA
- Yuying Gosser
- Student Research and Scholarship, City College CUNY, New York, NY 10031, USA
- Adam Haberman
- Biology, University of San Diego, San Diego, CA 92110, USA
- Shan Hays
- Biology, Western Colorado University, Gunnison, CO 81231, USA
- Carina E. Howell
- Biological Sciences, Lock Haven University, Lock Haven, PA 17745, USA
- Jennifer Jemc
- Biology, Loyola University Chicago, Chicago, IL 60660, USA
- Lisa Kadlec
- Biology, Wilkes University, Wilkes-Barre, PA 18766, USA
- Jacob D. Kagey
- Biology, University of Detroit Mercy, Detroit, MI 48221, USA
- S. Catherine Silver Key
- Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA
- Nighat P. Kokan
- Natural Sciences, Cardinal Stritch University, Milwaukee, WI 53217, USA
- Meg M. Laakso
- Biology, Eastern University, St. Davids, PA 19087, USA
- Judith Leatherman
- Biological Sciences, University of Northern Colorado, Greeley, CO 80639, USA
- Lindsey J. Long
- Biology, Oklahoma Christian University, Oklahoma City, OK 73136, USA
- Mollie Manier
- Biological Sciences, George Washington University, Washington, DC 20052, USA
- Luis F. Matos
- Biology, Eastern Washington University, Cheney, WA 99004, USA
- Amie Jo McClellan
- Science and Mathematics, Bennington College, Bennington, VT 05201, USA
- Gerard McNeil
- Biology, York College / CUNY, Jamaica, NY 11451, USA
- Evan Merkhofer
- Natural Sciences, Mount Saint Mary College, Newburgh, NY 12550, USA
- Vida Mingo
- Biology, Columbia College, Columbia, SC 29203, USA
- Hemlata Mistry
- Biology and Biochemistry, Widener University, Chester, PA 19013, USA
- Debaditya Mukhopadhyay
- Molecular Biology, Biochemistry, and Bioinformatics, Towson University, Towson, MD 21252, USA
- Alexis Nagengast
- Chemistry and Biochemistry, Widener University, Chester, PA 19013, USA
- Don Paetkau
- Biology, Saint Mary’s College, Notre Dame, IN 46556, USA
- Susan Parrish
- Biology, McDaniel College, Westminster, MD 21157, USA
- Mary Lai Preuss
- Biological Sciences, Webster University, St. Louis, MO 63119, USA
- Nicholas A. Pullen
- Biological Sciences, University of Northern Colorado, Greeley, CO 80639, USA
- Dennis Revie
- Biology, California Lutheran University, Thousand Oaks, CA 91360, USA
- Michael R. Rubin
- Biology, University of Puerto Rico at Cayey, Cayey, PR 00736, USA
- Maria Santisteban
- Biology, University of North Carolina at Pembroke, Pembroke, NC 28372, USA
- Karim A. Sharif
- Biology, Massasoit Community College, Brockton, MA 02302, USA
- Chiyedza Small
- Biology, Medgar Evers College, CUNY, Brooklyn, NY 11225, USA
- Mary Smith
- Biology, North Carolina A & T State University, Greensboro, NC 27411, USA
- Sheryl Smith
- Biology, Arcadia University, Glenside, PA 19038, USA
- Rebecca Spokony
- Natural Sciences, Baruch College, CUNY, New York, NY 10010, USA
- Aparna Sreenivasan
- Biology, School of Natural Sciences, California State University, Monterey Bay, Seaside, CA 93950, USA
- Joyce Stamm
- Biology, University of Evansville, Evansville, IN 47722, USA
- Cindy Wolfe
- Biology, Kentucky Wesleyan College, Owensboro, KY 42301, USA
- James Youngblom
- Biological Sciences, California State University Stanislaus, Turlock, CA 95382, USA
- Brian Yowler
- Biology, Grove City College, Grove City, PA 16127, USA
- Leming Zhou
- Health Information Management, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Janie Brennan
- Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
- Jeremy Buhler
- Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
- Wilson Leung
- Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Laura K. Reed
- Biological Sciences, University of Alabama, Tuscaloosa, AL 35487, USA
- Sarah C. R. Elgin
- Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
20
Tello-Ruiz MK, Marco CF, Hsu FM, Khangura RS, Qiao P, Sapkota S, Stitzer MC, Wasikowski R, Wu H, Zhan J, Chougule K, Barone LC, Ghiban C, Muna D, Olson AC, Wang L, Ware D, Micklos DA. Double triage to identify poorly annotated genes in maize: The missing link in community curation. PLoS One 2019; 14:e0224086. [PMID: 31658277 PMCID: PMC6816542 DOI: 10.1371/journal.pone.0224086] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/27/2019] [Accepted: 10/05/2019] [Indexed: 02/02/2023]
Abstract
The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.
Affiliation(s)
- Marcela K. Tello-Ruiz
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Department of Biological Sciences, State University of New York at Old Westbury, Old Westbury, New York, United States of America
- Cristina F. Marco
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Fei-Man Hsu
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Rajdeep S. Khangura
- Department of Biochemistry, Purdue University, West Lafayette, Indiana, United States of America
- Pengfei Qiao
- Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York, United States of America
- Sirjan Sapkota
- Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Michelle C. Stitzer
- Department of Plant Sciences and Center for Population Biology, University of California Davis, Davis, California, United States of America
- Rachael Wasikowski
- Department of Biological Sciences, University of Toledo, Toledo, Ohio, United States of America
- Hao Wu
- Genetics, Development & Cell Biology Department, Iowa State University, Ames, Iowa, United States of America
- Junpeng Zhan
- School of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- Donald Danforth Plant Science Center, St. Louis, Missouri, United States of America
- Kapeel Chougule
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Lindsay C. Barone
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Cornel Ghiban
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Demitri Muna
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Andrew C. Olson
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Liya Wang
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- USDA, Agricultural Research Service, Washington, D.C., United States of America
- David A. Micklos
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America