1
Espinoza JL, Phillips A, Prentice MB, Tan GS, Kamath PL, Lloyd KG, Dupont CL. Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing. Nucleic Acids Res 2024; 52:e63. [PMID: 38909293 DOI: 10.1093/nar/gkae528] [Received: 03/08/2024] [Revised: 05/21/2024] [Accepted: 06/10/2024] [Indexed: 06/24/2024]
Abstract
The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA's versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible software suite that bridges the gap between genomics and biotechnological solutions.
Affiliation(s)
- Josh L Espinoza
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Allan Phillips
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Melanie B Prentice
- School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
- Gene S Tan
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Pauline L Kamath
- School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
- Maine Center for Genetics in the Environment, University of Maine, Orono, ME 04469, USA
- Karen G Lloyd
- Microbiology Department, University of Tennessee, Knoxville, TN 37917, USA
- Chris L Dupont
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
2
McEvoy SL, Meyer RS, Hasenstab-Lehman KE, Guilliams CM. The reference genome of an endangered Asteraceae, Deinandra increscens subsp. villosa, endemic to the Central Coast of California. G3 (Bethesda) 2024; 14:jkae117. [PMID: 38845594 PMCID: PMC11304951 DOI: 10.1093/g3journal/jkae117] [Received: 02/28/2024] [Accepted: 05/26/2024] [Indexed: 08/09/2024]
Abstract
We present a reference genome for the federally endangered Gaviota tarplant, Deinandra increscens subsp. villosa (Madiinae, Asteraceae), an annual herb endemic to the Central California coast. Using PacBio HiFi, Oxford Nanopore Technologies, and Dovetail Omni-C data, we assembled a haploid consensus genome of 1.67 Gb in 28.7 K scaffolds with a scaffold N50 of 74.9 Mb. We annotated repeat content in 74.8% of the genome. Long terminal repeats (LTRs) covered 44.0% of the genome, with Copia families predominant at 22.9%, followed by Gypsy at 14.2%. Both Gypsy and Copia elements were common in ancestral peaks of LTRs, and the most abundant element was a Gypsy element containing nested Copia/Angela sequence similarity, reflecting a complex evolutionary history of repeat activity. Gene annotation produced 33,257 genes and 68,942 transcripts, of which 99% were functionally annotated. BUSCO scores for the annotated proteins were 96.0% complete, of which 77.6% were single-copy and 18.4% duplicated. Synonymous mutation rates of whole-genome duplicates in Gaviota tarplant and sunflower (Helianthus annuus) shared peaks that correspond to the last Asteraceae polyploidization event and subsequent divergence from a common ancestor at ∼27 MYA. Regions of high-density tandem genes were identified, pointing to potentially important loci of environmental adaptation in this species.
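The abstract reports assembly contiguity as a scaffold N50. The following is a minimal sketch of the conventional N50 definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The function name and the toy scaffold lengths are hypothetical and are not taken from the Deinandra assembly. Assembly QC tools may differ from this sketch in edge-case handling.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy scaffold lengths: total = 300, and 80 + 70 reaches half.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```

Sorting in descending order means the returned value is always one of the input lengths. This matches how N50 is usually reported.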
Affiliation(s)
- Susan L McEvoy
- Department of Conservation and Research, Santa Barbara Botanic Garden, Santa Barbara, CA 93105, USA
- Rachel S Meyer
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- C Matt Guilliams
- Department of Conservation and Research, Santa Barbara Botanic Garden, Santa Barbara, CA 93105, USA
3
Chen L, Li C, Li B, Zhou X, Bai Y, Zou X, Zhou Z, He Q, Chen B, Wang M, Xue Y, Jiang Z, Feng J, Zhou T, Liu Z, Xu P. Evolutionary divergence of subgenomes in common carp provides insights into speciation and allopolyploid success. Fundam Res 2024; 4:589-602. [PMID: 38933191 PMCID: PMC11197550 DOI: 10.1016/j.fmre.2023.06.011] [Received: 10/25/2022] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 06/28/2024]
Abstract
Hybridization and polyploidization have made great contributions to speciation, heterosis, and agricultural production within plants, but they remain poorly understood and underutilized in animals. The reorganization of subgenome structure and expression, and cooperation between subgenomes after hybridization and polyploidization, are essential for speciation and allopolyploid success. However, the underlying mechanisms have not yet been comprehensively assessed in animals. Here, we produced a high-fidelity reference genome sequence for common carp, a typical allotetraploid fish species cultured worldwide. This genome enabled in-depth analysis of the evolution of subgenome architecture and expression responses. Most genes were expressed with subgenome biases, with a trend of transition from the expression of subgenome A during the early stages to that of subgenome B during the late stages of embryonic development. While subgenome A evolved more rapidly, subgenome B contributed to a greater level of expression during development and under stressful conditions. Stable dominant patterns for homoeologous gene pairs both during development and under thermal stress suggest a potential fixed heterosis in the allotetraploid genome. Preferential higher-level expression of either copy of a homoeologous gene during development and in response to stress indicates the dominant effect of heterosis. The plasticity of subgenomes and their shifting of dominant expression during early development, and in response to stressful conditions, provide novel insights into the molecular basis of the successful speciation, evolution, and heterosis of the allotetraploid common carp.
Affiliation(s)
- Lin Chen
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Chengyu Li
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Bijun Li
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Xiaofan Zhou
- Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Yulin Bai
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Xiaoqing Zou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Zhixiong Zhou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Qian He
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Baohua Chen
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Mei Wang
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Yaguo Xue
- College of Fisheries, Henan Normal University, Xinxiang 453007, China
- Zhou Jiang
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Jianxin Feng
- Henan Academy of Fishery Science, Zhengzhou 450044, China
- Tao Zhou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Zhanjiang Liu
- Department of Biology, College of Arts and Sciences, Syracuse University, Syracuse 13244, USA
- Peng Xu
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
4
Roces V, Guerrero S, Álvarez A, Pascual J, Meijón M. PlantFUNCO: Integrative Functional Genomics Database Reveals Clues into Duplicates Divergence Evolution. Mol Biol Evol 2024; 41:msae042. [PMID: 38411627 PMCID: PMC10917205 DOI: 10.1093/molbev/msae042] [Received: 09/06/2023] [Revised: 02/08/2024] [Accepted: 02/16/2024] [Indexed: 02/28/2024]
Abstract
Evolutionary epigenomics and, more generally, evolutionary functional genomics are emerging fields that study how non-DNA-encoded alterations in gene expression regulation are an important form of plasticity and adaptation. Previous work on plant comparative functional genomics has mostly compared assay-matched experiments, missing the power of heterogeneous datasets for conservation inference. To fill this gap, we developed the PlantFUN(ctional)CO(nservation) database, which comprises several tools and two main resources: interspecies chromatin states and functional genomics conservation scores. Both are presented and analyzed in this work for three well-established plant models (Arabidopsis thaliana, Oryza sativa, and Zea mays). Overall, PlantFUNCO elucidated evolutionary information in terms of cross-species functional agreement, thereby providing a new, complementary comparative-genomics resource for evolutionary studies. To illustrate the potential applications of this database, we replicated two previously published models predicting genetic redundancy in A. thaliana and found that chromatin states are a determinant of the degree of functional divergence between paralogs. These predictions were validated based on the phenotypes of mitochondrial alternative oxidase knockout mutants under two different stressors. Taken together, PlantFUNCO aims to leverage data diversity and extrapolate findings on molecular mechanisms across model organisms to determine the extent of functional conservation, thereby deepening our understanding of how the plant epigenome and functional noncoding genome have evolved. PlantFUNCO is available at https://rocesv.github.io/PlantFUNCO.
Affiliation(s)
- Víctor Roces
- Plant Physiology, Department of Organisms and Systems Biology, Faculty of Biology and Biotechnology Institute of Asturias, University of Oviedo, Asturias, Spain
- Sara Guerrero
- Plant Physiology, Department of Organisms and Systems Biology, Faculty of Biology and Biotechnology Institute of Asturias, University of Oviedo, Asturias, Spain
- Ana Álvarez
- Plant Physiology, Department of Organisms and Systems Biology, Faculty of Biology and Biotechnology Institute of Asturias, University of Oviedo, Asturias, Spain
- Jesús Pascual
- Plant Physiology, Department of Organisms and Systems Biology, Faculty of Biology and Biotechnology Institute of Asturias, University of Oviedo, Asturias, Spain
- Mónica Meijón
- Plant Physiology, Department of Organisms and Systems Biology, Faculty of Biology and Biotechnology Institute of Asturias, University of Oviedo, Asturias, Spain
5
Song B, Buckler ES, Stitzer MC. New whole-genome alignment tools are needed for tapping into plant diversity. Trends Plant Sci 2024; 29:355-369. [PMID: 37749022 DOI: 10.1016/j.tplants.2023.08.013] [Received: 04/24/2023] [Revised: 07/19/2023] [Accepted: 08/23/2023] [Indexed: 09/27/2023]
Abstract
Genome alignment is one of the most foundational methods for genome sequence studies. With rapid advances in sequencing and assembly technologies, newly assembled genomes challenge alignment tools to cope with increased complexity and scale. Plant genome alignment is technologically challenging because of frequent whole-genome duplications (WGDs), chromosome rearrangements and fractionation, high nucleotide diversity, widespread structural variation, and high transposable element (TE) activity that produces large proportions of repeat elements. We summarize classical pairwise and multiple genome alignment (MGA) methods, and highlight techniques that are widely used or are being developed by the plant research community. We also outline the remaining challenges for precise genome alignment and the interpretation of alignment results in plants.
Affiliation(s)
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, China
- Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853, USA
- Michelle C Stitzer
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
6
Abdelwahab O, Belzile F, Torkamaneh D. Performance analysis of conventional and AI-based variant callers using short and long reads. BMC Bioinformatics 2023; 24:472. [PMID: 38097928 PMCID: PMC10720095 DOI: 10.1186/s12859-023-05596-3] [Received: 08/30/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND The accurate detection of variants is essential for genomics-based studies. Various tools are currently available to detect genomic variants; however, deciding which tool to use has always been a challenge, especially since major genome projects have chosen different tools. Thus far, most existing tools were developed mainly for short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently been shown to be usable for variant calling as well. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. RESULTS In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost, using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones in most aspects when calling SNVs and INDELs from both long and short reads. In addition, we demonstrate the advantages and drawbacks of each tool and rank them in each aspect of these comparisons. CONCLUSION This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
Affiliation(s)
- Omar Abdelwahab
- Département de Phytologie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Institut intelligence et données (IID), Université Laval, Québec, Canada
- François Belzile
- Département de Phytologie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Institut intelligence et données (IID), Université Laval, Québec, Canada
7
White LC. Shallow sequencing can mislead when evaluating hybridization capture methods. Conserv Genet Resour 2023. [DOI: 10.1007/s12686-023-01298-3] [Indexed: 04/05/2023]
8
Li LZ, Xu ZG, Chang TG, Wang L, Kang H, Zhai D, Zhang LY, Zhang P, Liu H, Zhu XG, Wang JW. Common evolutionary trajectory of short life-cycle in Brassicaceae ruderal weeds. Nat Commun 2023; 14:290. [PMID: 36653415 PMCID: PMC9849336 DOI: 10.1038/s41467-023-35966-7] [Received: 05/27/2022] [Accepted: 01/10/2023] [Indexed: 01/19/2023]
Abstract
Weed species are detrimental to crop yield. An understanding of how weeds originate and adapt to field environments is needed for successful crop management and reduction of herbicide use. Although early flowering is one of the weed trait syndromes that enable ruderal weeds to overcome frequent disturbances, the underlying genetic basis is poorly understood. Here, we establish Cardamine occulta as a model to study weed ruderality. By genome assembly and QTL mapping, we identify impairment of the vernalization response regulator gene FLC and a subsequent dominant mutation in the blue-light receptor gene CRY2 as genetic drivers for the establishment of short life cycle in ruderal weeds. Population genomics study further suggests that the mutations in these two genes enable individuals to overcome human disturbances through early deposition of seeds into the soil seed bank and quickly dominate local populations, thereby facilitating their spread in East China. Notably, functionally equivalent dominant mutations in CRY2 are shared by another weed species, Rorippa palustris, suggesting a common evolutionary trajectory of early flowering in ruderal weeds in Brassicaceae.
Affiliation(s)
- Ling-Zi Li
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Zhou-Geng Xu
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- University of Chinese Academy of Sciences, Shanghai, 200032, China
- Tian-Gen Chang
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Long Wang
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Heng Kang
- Department of Computer Science and Technology, Nanjing University, Nanjing, 210093, China
- Dong Zhai
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- University of Chinese Academy of Sciences, Shanghai, 200032, China
- Lu-Yi Zhang
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- University of Chinese Academy of Sciences, Shanghai, 200032, China
- Peng Zhang
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Hongtao Liu
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Xin-Guang Zhu
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- Jia-Wei Wang
- National Key Laboratory of Plant Molecular Genetics (NKLPMG), CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai, 200032, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
9
Henniges MC, Johnston E, Pellicer J, Hidalgo O, Bennett MD, Leitch IJ. The Plant DNA C-Values Database: A One-Stop Shop for Plant Genome Size Data. Methods Mol Biol 2023; 2703:111-122. [PMID: 37646941 DOI: 10.1007/978-1-0716-3389-2_9] [Indexed: 09/01/2023]
Abstract
Genome size is a plant character with far-reaching implications, ranging from impacts on the financial and computing feasibility of sequencing and assembling genomes all the way to influencing the very ecology and evolution of species. The increasing recognition of the role of genome size in plant science has led to a rising demand for comprehensive and easily accessible sources of genome size data. The Plant DNA C-values database has established itself as a trusted and widely used central hub for users needing to access available plant genome size data, complemented with related cytogenetic (ploidy level) and karyological (chromosome number) information where available. Since its inception in 2001, the database has undergone six major updates to incorporate newly available genome size information, leading to the most recent release (Release 7.1), which comprises data for 12,273 species across all the major land plant and some algal lineages. Here we describe how to use the database efficiently, making use of its different query and filtering settings.
Affiliation(s)
- Marie C Henniges
- Royal Botanic Gardens, Kew, Richmond, UK
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Jaume Pellicer
- Royal Botanic Gardens, Kew, Richmond, UK
- Institut Botànic de Barcelona, IBB (CSIC-Ajuntament de Barcelona), Barcelona, Spain
- Oriane Hidalgo
- Royal Botanic Gardens, Kew, Richmond, UK
- Institut Botànic de Barcelona, IBB (CSIC-Ajuntament de Barcelona), Barcelona, Spain
10
Russo A, Mayjonade B, Frei D, Potente G, Kellenberger RT, Frachon L, Copetti D, Studer B, Frey JE, Grossniklaus U, Schlüter PM. Low-Input High-Molecular-Weight DNA Extraction for Long-Read Sequencing From Plants of Diverse Families. Front Plant Sci 2022; 13:883897. [PMID: 35665166 PMCID: PMC9161206 DOI: 10.3389/fpls.2022.883897] [Received: 02/25/2022] [Accepted: 04/21/2022] [Indexed: 05/16/2023]
Abstract
Long-read DNA sequencing technologies require high molecular weight (HMW) DNA of adequate purity and integrity, which can be difficult to isolate from plant material. Plant leaves usually contain high levels of carbohydrates and secondary metabolites that can impact DNA purity, affecting downstream applications. Several protocols and kits are available for HMW DNA extraction, but they usually require a large amount of input material and often lead to substantial DNA fragmentation, making sequencing suboptimal in terms of read length and data yield. Here we describe a protocol for plant HMW DNA extraction from low-input material (0.1 g) that is easy to follow and quick (2.5 h). This method enabled us to extract HMW DNA from four species in different families (Orchidaceae, Poaceae, Brassicaceae, Asteraceae). For recalcitrant species, we show that an additional purification step is sufficient to deliver a clean DNA sample. We demonstrate the suitability of our protocol for long-read sequencing on the Oxford Nanopore Technologies PromethION® platform, with and without the use of a short fragment depletion kit.
Affiliation(s)
- Alessia Russo
- Department of Plant and Microbial Biology and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
- Department of Plant Evolutionary Biology, Institute of Biology, University of Hohenheim, Stuttgart, Germany
- Department of Systematic and Evolutionary Botany and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
- Baptiste Mayjonade
- Laboratoire des Interactions Plantes Microbes Environnement (LIPME), INRAE, Toulouse, France
- Daniel Frei
- Department of Method Development and Analytics, Agroscope, Wädenswil, Switzerland
- Giacomo Potente
- Department of Systematic and Evolutionary Botany and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
- Léa Frachon
- Department of Systematic and Evolutionary Botany and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
- Dario Copetti
- Institute of Agricultural Sciences and Zurich-Basel Plant Science Centre, ETH Zürich, Zurich, Switzerland
- Bruno Studer
- Institute of Agricultural Sciences and Zurich-Basel Plant Science Centre, ETH Zürich, Zurich, Switzerland
- Jürg E. Frey
- Department of Method Development and Analytics, Agroscope, Wädenswil, Switzerland
- Ueli Grossniklaus
- Department of Plant and Microbial Biology and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
- Philipp M. Schlüter
- Department of Plant Evolutionary Biology, Institute of Biology, University of Hohenheim, Stuttgart, Germany
- Department of Systematic and Evolutionary Botany and Zurich-Basel Plant Science Centre, University of Zurich, Zurich, Switzerland
11
Amerifar S, Norouzi M, Ghandi M. A tool for feature extraction from biological sequences. Brief Bioinform 2022; 23:6563937. [PMID: 35383372 DOI: 10.1093/bib/bbac108] [Received: 11/19/2021] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/12/2022]
Abstract
With advances in sequencing technologies, huge amounts of biological data are now generated. Analyzing this volume of data is beyond human capacity, creating a major opportunity for machine learning methods. These methods, however, are practical only when sequences are converted into feature vectors. Many tools target this task, including iLearnPlus, a Python-based tool that supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool supports not only all features in iLearnPlus but also 30 additional features from the literature. Moreover, our tool is written in R, which gives bioinformaticians an alternative way to transform sequences into feature vectors. We compared the conversion time of our tool with that of iLearnPlus and found that ours transforms sequences much faster: it is a median of 2.8X faster for small nucleotide sequences and 6.3X faster for large sequences. For amino acid sequences, our tool achieves a median speedup of 23.9X.
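The abstract notes that machine learning methods need sequences converted into feature vectors. One of the simplest such features is k-mer composition, in which a sequence is encoded as the normalized counts of every length-k substring. The following is a minimal Python sketch of that idea for DNA. It is not the R tool's API: the function name, the default k, and the skipping of ambiguous bases such as 'N' are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequency_vector(seq, k=2, alphabet="ACGT"):
    """Encode a sequence as normalized k-mer frequencies (hypothetical helper).

    Returns a list of length len(alphabet) ** k, ordered lexicographically
    over the alphabet. k-mers containing characters outside the alphabet
    (e.g. 'N') are skipped.
    """
    seq = seq.upper()
    # Enumerate every possible k-mer in a fixed order: AA, AC, ..., TT for k=2.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    # Slide a window of width k across the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    # Normalize to frequencies so sequences of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


vec = kmer_frequency_vector("ACGTAC", k=2)
print(len(vec))  # 16 possible dinucleotides
```

In "ACGTAC", the dinucleotide AC occurs twice among five windows, so its entry is 0.4. A fixed lexicographic ordering keeps vector positions consistent across sequences, which downstream models require.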
Affiliation(s)
- Sare Amerifar
- Bioinformatics, Tatbiat Modares University, Jalal Al Ahmad, 14115-111, Tehran, Iran
- Mahammad Norouzi
- Computer Science, Technical University of Darmstadt, Hochschulstr. 1, 64293, Hesse, Germany
- Mahmoud Ghandi
- Bioinformatics, Monte Rosa Therapeutics, Summer Street, 02210, Boston, United States
12
Cheng A, Harikrishna JA, Redwood CS, Lit LC, Nath SK, Chua KH. Genetics Matters: Voyaging from the Past into the Future of Humanity and Sustainability. Int J Mol Sci 2022; 23:ijms23073976. [PMID: 35409335 PMCID: PMC8999725 DOI: 10.3390/ijms23073976] [Received: 02/24/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 12/02/2022]
Abstract
The understanding of how genetic information may be inherited through generations was established by Gregor Mendel in the 1860s when he developed the fundamental principles of inheritance. The science of genetics, however, began to flourish only during the mid-1940s when DNA was identified as the carrier of genetic information. The world has since then witnessed rapid development of genetic technologies, with the latest being genome-editing tools, which have revolutionized fields from medicine to agriculture. This review walks through the historical timeline of genetics research and deliberates how this discipline might furnish a sustainable future for humanity.
Affiliation(s)
- Acga Cheng
- Institute of Biological Science, Faculty of Science, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Jennifer Ann Harikrishna
- Institute of Biological Science, Faculty of Science, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Centre for Research in Biotechnology for Agriculture, University of Malaya, Kuala Lumpur 50603, Malaysia
- Charles S. Redwood
- Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Lei Cheng Lit
- Department of Physiology, Faculty of Medicine, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Swapan K. Nath
- Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
- Correspondence: (S.K.N.); (K.H.C.)
- Kek Heng Chua
- Department of Biomedical Science, Faculty of Medicine, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Correspondence: (S.K.N.); (K.H.C.)
13
Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics 2022; 23:157. [PMID: 35193521 PMCID: PMC8864876 DOI: 10.1186/s12864-022-08375-1] [Received: 06/24/2021] [Accepted: 02/08/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Pacific Biosciences HiFi read technology is currently the industry standard for high-accuracy long-read sequencing and has been widely adopted by large sequencing and assembly initiatives to generate de novo assemblies of non-model organisms. Though adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted for HiFi workflows. RESULTS: Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step to remove sequence artifacts derived from PacBio library preparation from read pools is necessary, as adapter sequences can be erroneously integrated into assemblies. CONCLUSIONS: Here we describe the nature of adapter-contaminated reads and their consequences in assembly, and present HiFiAdapterFilt, a simple and memory-efficient solution for removing adapter-contaminated reads prior to assembly.
Affiliation(s)
- Sheina B Sim
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA
- Renee L Corpuz
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA
- Tyler J Simmonds
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA
- Oak Ridge Institute for Science and Education, Oak Ridge Associated Universities, Oak Ridge, TN, 37830, USA
- Scott M Geib
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA
14
Wambugu PW, Henry R. Supporting in situ conservation of the genetic diversity of crop wild relatives using genomic technologies. Mol Ecol 2022; 31:2207-2222. [PMID: 35170117 PMCID: PMC9303585 DOI: 10.1111/mec.16402] [Received: 10/25/2021] [Revised: 02/08/2022] [Accepted: 02/11/2022] [Indexed: 11/27/2022]
Abstract
The last decade has witnessed huge technological advances in genomics, particularly in DNA sequencing. Here, we review the actual and potential application of genomics in supporting in situ conservation of crop wild relatives (CWRs). In addition to helping in prioritization of protection of CWR taxa and in situ conservation sites, genome analysis is allowing the identification of novel alleles that need to be prioritized for conservation. Genomics is enabling the identification of potential sources of important adaptive traits that can guide the establishment or enrichment of in situ genetic reserves. Genomic tools also have the potential for developing a robust framework for monitoring and reporting genome‐based indicators of genetic diversity changes associated with factors such as land use or climate change. These tools have been demonstrated to have an important role in managing the conservation of populations, supporting sustainable access and utilization of CWR diversity, enhancing accelerated domestication of new crops and forensic genomics, thereby preventing misappropriation of genetic resources. Despite this great potential, many policy makers and conservation managers have failed to recognize and appreciate the need to accelerate the application of genomics to support the conservation and management of biodiversity in CWRs to underpin global food security. Limited funding and inadequate genomic expertise among conservation practitioners also remain major hindrances to the widespread application of genomics in conservation.
Affiliation(s)
- Peterson W Wambugu
- Kenya Agricultural and Livestock Research Organization, Genetic Resources Research Institute, P.O. Box 30148, 00100, Nairobi, Kenya
- Robert Henry
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, 4072, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, QLD, 4072, Australia
15
Penev L, Koureas D, Groom Q, Lanfear J, Agosti D, Casino A, Miller J, Arvanitidis C, Cochrane G, Hobern D, Banki O, Addink W, Kõljalg U, Copas K, Mergen P, Güntsch A, Benichou L, Benito Gonzalez Lopez J, Ruch P, Martin C, Barov B, Hristova K. Biodiversity Community Integrated Knowledge Library (BiCIKL). Res Ideas Outcomes 2022. [DOI: 10.3897/rio.8.e81136] [Indexed: 11/12/2022]
Abstract
BiCIKL is a European Union Horizon 2020 project that will initiate and build a new European starting community of key research infrastructures, establishing open science practices in the domain of biodiversity by providing access to data and associated tools and services at each separate stage of, and along, the entire research cycle. BiCIKL will provide new methods and workflows for integrated harvesting, liberating, linking, accessing and re-using of sub-article-level data (specimens, material citations, samples, sequences, taxonomic names, taxonomic treatments, figures, tables) extracted from literature. BiCIKL will, for the first time, provide access and tools for seamless linking and usage tracking of data along the line: specimens > sequences > species > analytics > publications > biodiversity knowledge graph > re-use.
16
Kress WJ, Soltis DE, Kersey PJ, Wegrzyn JL, Leebens-Mack JH, Gostel MR, Liu X, Soltis PS. Green plant genomes: What we know in an era of rapidly expanding opportunities. Proc Natl Acad Sci U S A 2022; 119:e2115640118. [PMID: 35042803 PMCID: PMC8795535 DOI: 10.1073/pnas.2115640118] [Indexed: 01/01/2023]
Abstract
Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future.
Affiliation(s)
- W John Kress
- National Museum of Natural History, Smithsonian Institution, Department of Botany, Washington, DC 20013-7012
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
- Arnold Arboretum, Harvard University, Boston, MA 02130
- Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Biodiversity Institute, University of Florida, Gainesville, FL 32611
- Department of Biology, University of Florida, Gainesville, FL 32611
- Paul J Kersey
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, United Kingdom
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, Institute for Systems Genomics: Computational Biology Core, University of Connecticut, Storrs, CT 06269-3214
- James H Leebens-Mack
- Department of Plant Biology, 2101 Miller Plant Sciences, University of Georgia, Athens, GA 30602-7271
- Morgan R Gostel
- Botanical Research Institute of Texas, Fort Worth, TX 76107-3400
- Xin Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
- Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Biodiversity Institute, University of Florida, Gainesville, FL 32611
17
Song B, Marco-Sola S, Moreto M, Johnson L, Buckler ES, Stitzer MC. AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proc Natl Acad Sci U S A 2022; 119:e2113075119. [PMID: 34934012 PMCID: PMC8740769 DOI: 10.1073/pnas.2113075119] [Accepted: 11/15/2021] [Indexed: 12/04/2022]
Abstract
Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication-informed collinear anchor identification between genomes and performs base pair-resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor-binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
Affiliation(s)
- Baoxing Song
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
- Santiago Marco-Sola
- Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain
- Miquel Moreto
- Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain
- Lynn Johnson
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853
- Agricultural Research Service, US Department of Agriculture, Ithaca, NY 14853
- Michelle C Stitzer
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853
18
Marks RA, Hotaling S, Frandsen PB, VanBuren R. Representation and participation across 20 years of plant genome sequencing. Nat Plants 2021; 7:1571-1578. [PMID: 34845350 PMCID: PMC8677620 DOI: 10.1038/s41477-021-01031-8] [Received: 05/31/2021] [Accepted: 10/27/2021] [Indexed: 05/22/2023]
Abstract
The field of plant genome sequencing has grown rapidly in the past 20 years, leading to increases in the quantity and quality of publicly available genomic resources. The growing wealth of genomic data from an increasingly diverse set of taxa provides unprecedented potential to better understand the genome biology and evolution of land plants. Here we provide a contemporary view of land plant genomics, including analyses on assembly quality, taxonomic distribution of sequenced species and national participation. We show that assembly quality has increased dramatically in recent years, that substantial taxonomic gaps exist and that the field has been dominated by affluent nations in the Global North and China, despite a wide geographic distribution of study species. We identify numerous disconnects between the native range of focal species and the national affiliation of the researchers studying them, which we argue are rooted in colonialism, both past and present. Luckily, falling sequencing costs, widening availability of analytical tools and an increasingly connected scientific community provide key opportunities to improve existing assemblies, fill sampling gaps and empower a more global plant genomics community.
Affiliation(s)
- Rose A Marks
- Department of Horticulture, Michigan State University, East Lansing, MI, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, USA
- Department of Molecular and Cell Biology, University of Cape Town, Rondebosch, South Africa
- Scott Hotaling
- School of Biological Sciences, Washington State University, Pullman, WA, USA
- Paul B Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Data Science Lab, Smithsonian Institution, Washington, DC, USA
- Robert VanBuren
- Department of Horticulture, Michigan State University, East Lansing, MI, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, USA
19
Tan MH, Loke S, Croft LJ, Gleason FH, Lange L, Pilgaard B, Trevathan-Tackett SM. First Genome of Labyrinthula sp., an Opportunistic Seagrass Pathogen, Reveals Novel Insight into Marine Protist Phylogeny, Ecology and CAZyme Cell-Wall Degradation. Microb Ecol 2021; 82:498-511. [PMID: 33410934 DOI: 10.1007/s00248-020-01647-x] [Received: 09/15/2020] [Accepted: 11/15/2020] [Indexed: 06/12/2023]
Abstract
Labyrinthula spp. are saprobic, marine protists that also act as opportunistic pathogens and are the causative agents of seagrass wasting disease (SWD). Despite the threat of local- and large-scale SWD outbreaks, there are currently gaps in our understanding of the drivers of SWD, particularly surrounding Labyrinthula spp. virulence and ecology. Given these uncertainties, we investigated the Labyrinthula genus from a novel genomic perspective by presenting the first draft genome and predicted proteome of a pathogenic isolate, Labyrinthula SR_Ha_C, generated from a hybrid assembly of Nanopore and Illumina sequences. Phylogenetic and cross-phyla comparisons revealed insights into the evolutionary history of Stramenopiles. Genome annotation showed evidence of glideosome-type machinery and an apicoplast protein typically found in protist pathogens and parasites. Proteins involved in Labyrinthula SR_Ha_C's actin-myosin mode of transport, as well as in carbohydrate degradation, were also prevalent. Further, CAZyme functional predictions revealed a repertoire of enzymes involved in the breakdown of cell-wall and carbohydrate storage compounds common to seagrasses. The relatively low number of CAZymes annotated from the genome of Labyrinthula SR_Ha_C compared to other Labyrinthulea species may reflect the conservative annotation parameters, a specialized substrate affinity and the scarcity of characterized protist enzymes. Inherently, there is a high probability of finding both unique and novel enzymes in Labyrinthula spp. This study provides resources for further exploration of Labyrinthula spp. ecology and evolution, and will hopefully be the catalyst for new hypothesis-driven SWD research revealing more details of molecular interactions between the Labyrinthula genus and its host substrate.
Affiliation(s)
- Mun Hua Tan
- Centre of Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia
- Deakin Genomics Centre, Deakin University, Geelong, Victoria, Australia
- School of BioSciences, Bio21 Institute, University of Melbourne, Parkville, Victoria, Australia
- Department of Microbiology and Immunology, University of Melbourne, Bio21 Institute, Melbourne, Victoria, Australia
- Stella Loke
- Centre of Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia
- Deakin Genomics Centre, Deakin University, Geelong, Victoria, Australia
- Laurence J Croft
- Centre of Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia
- Deakin Genomics Centre, Deakin University, Geelong, Victoria, Australia
- Frank H Gleason
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Lene Lange
- BioEconomy, Research & Advisory, Valby, Copenhagen, Denmark
- Bo Pilgaard
- Protein Chemistry and Enzyme Technology, Department of Bioengineering, Technical University of Denmark, Kgs. Lyngby, Denmark
- Stacey M Trevathan-Tackett
- Centre of Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria, Australia
20
Guo X, Chen F, Gao F, Li L, Liu K, You L, Hua C, Yang F, Liu W, Peng C, Wang L, Yang X, Zhou F, Tong J, Cai J, Li Z, Wan B, Zhang L, Yang T, Zhang M, Yang L, Yang Y, Zeng W, Wang B, Wei X, Xu X. CNSA: a data repository for archiving omics data. Database (Oxford) 2021; 2020:5875523. [PMID: 32705130 PMCID: PMC7377928 DOI: 10.1093/database/baaa055] [Received: 04/16/2020] [Revised: 05/31/2020] [Accepted: 06/25/2020] [Indexed: 12/16/2022]
Abstract
With the application and development of high-throughput sequencing technology in the life and health sciences, the resulting massive multi-omics data pose the problem of efficient management and utilization. Database development and biocuration are prerequisites for the reuse of these big data. Here, relying on China National GeneBank (CNGB), we present CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and the results of further analyses, which are currently organized into six objects, namely Project, Sample, Experiment, Run, Assembly and Variation. Moreover, CNSA has created a correlation model of living samples, sample information and analytical data on some projects. Both living samples and analytical data are directly correlated with the sample information. From either one, information or data of the other two can be obtained, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for storing, managing and sharing of omics data. We will continue to improve the data standards and provide free access to open-data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/.
Affiliation(s)
- Xueqin Guo
- China National GeneBank, Shenzhen 518120, China
- Fei Gao
- China National GeneBank, Shenzhen 518120, China
- Ling Li
- China National GeneBank, Shenzhen 518120, China
- Ke Liu
- China National GeneBank, Shenzhen 518120, China
- Lijin You
- China National GeneBank, Shenzhen 518120, China
- Cong Hua
- China National GeneBank, Shenzhen 518120, China
- Fan Yang
- China National GeneBank, Shenzhen 518120, China
- Lina Wang
- China National GeneBank, Shenzhen 518120, China
- Feiyu Zhou
- China National GeneBank, Shenzhen 518120, China
- Jiawei Tong
- China National GeneBank, Shenzhen 518120, China
- Jia Cai
- China National GeneBank, Shenzhen 518120, China
- Zhiyong Li
- China National GeneBank, Shenzhen 518120, China
- Bo Wan
- China National GeneBank, Shenzhen 518120, China
- Lei Zhang
- China National GeneBank, Shenzhen 518120, China
- Tao Yang
- China National GeneBank, Shenzhen 518120, China
- Linlin Yang
- China National GeneBank, Shenzhen 518120, China
- Yawen Yang
- China National GeneBank, Shenzhen 518120, China
- Wenjun Zeng
- China National GeneBank, Shenzhen 518120, China
- Bo Wang
- China National GeneBank, Shenzhen 518120, China
- Xun Xu
- China National GeneBank, Shenzhen 518120, China
- BGI-Shenzhen, Shenzhen 518083, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen 518120, China
21
Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 2021; 18:366-368. [PMID: 33828273 PMCID: PMC8026399 DOI: 10.1038/s41592-021-01101-x] [Received: 07/23/2020] [Accepted: 02/22/2021] [Indexed: 12/05/2022]
Abstract
We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP. An updated version of DIAMOND uses improved algorithmic procedures and a customized high-performance computing framework to make seemingly prohibitive large-scale protein sequence alignments feasible.
22
Rossetto M, Yap JYS, Lemmon J, Bain D, Bragg J, Hogbin P, Gallagher R, Rutherford S, Summerell B, Wilson TC. A conservation genomics workflow to guide practical management actions. Glob Ecol Conserv 2021. [DOI: 10.1016/j.gecco.2021.e01492] [Indexed: 02/07/2023]
23
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023]
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - by contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations; as such, the acquisition of new functions appears to be neither difficult nor rare. Yet only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
- Alisdair R. Fernie
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
24
Tian Z, Wang JW, Li J, Han B. Designing future crops: challenges and strategies for sustainable agriculture. Plant J 2021; 105:1165-1178. [PMID: 33258137 DOI: 10.1111/tpj.15107] [Received: 10/07/2020] [Revised: 11/22/2020] [Accepted: 11/26/2020] [Indexed: 05/26/2023]
Abstract
Crop production is facing unprecedented challenges. Despite the fact that the food supply has significantly increased over the past half-century, ~8.9% and 14.3% of people are still suffering from hunger and malnutrition, respectively. Agricultural environments are continuously threatened by a booming world population, a shortage of arable land, and rapid changes in climate. To ensure food and ecosystem security, there is a need to design future crops for sustainable agriculture development by maximizing net production and minimizing undesirable effects on the environment. The future crops design projects, recently launched by the National Natural Science Foundation of China and Chinese Academy of Sciences (CAS), aim to develop a roadmap for rapid design of customized future crops using cutting-edge technologies in the Breeding 4.0 era. In this perspective, we first introduce the background and missions of these projects. We then outline strategies to design future crops, such as improvement of current well-cultivated crops, de novo domestication of wild species and redomestication of current cultivated crops. We further discuss how these ambitious goals can be achieved by the recent development of new integrative omics tools, advanced genome-editing tools and synthetic biology approaches. Finally, we summarize related opportunities and challenges in these projects.
Affiliation(s)
- Zhixi Tian
- State Key Laboratory of Plant Cell and Chromosome Engineering, Innovation Academy for Seed Design, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jia-Wei Wang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, 200032, China
- ShanghaiTech University, Shanghai, 200031, China
- Jiayang Li
- University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Plant Genomics, and National Center for Plant Gene Research (Beijing), Innovation Academy for Seed Design, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Bin Han
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, 200032, China
- ShanghaiTech University, Shanghai, 200031, China
- National Center for Gene Research, Shanghai, 200233, China
25
Besse P. Guidelines for the Choice of Sequences for Molecular Plant Taxonomy. Methods Mol Biol 2021; 2222:39-55. [PMID: 33301086 DOI: 10.1007/978-1-0716-0997-2_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
This chapter presents an overview of the major plant DNA sequences and molecular methods available for plant taxonomy. Guidelines are provided for the choice of sequences and methods to be used, based on the DNA compartment (nuclear, chloroplastic, mitochondrial), evolutionary mechanisms, and the level of taxonomic differentiation of the plants under survey.
Affiliation(s)
- Pascale Besse
- UMR PVBMT, Université de La Réunion, St Pierre, Réunion, France
26
Fernie AR. Decoding indigo: the chromosome-scale genome of Strobilanthes cusia a highly pigmented plant important to diverse ethnic cultures in Asia. Plant J 2020; 104:861-863. [PMID: 33217084 DOI: 10.1111/tpj.15016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
27
Henkhaus N, Bartlett M, Gang D, Grumet R, Jordon‐Thaden I, Lorence A, Lyons E, Miller S, Murray S, Nelson A, Specht C, Tyler B, Wentworth T, Ackerly D, Baltensperger D, Benfey P, Birchler J, Chellamma S, Crowder R, Donoghue M, Dundore‐Arias JP, Fletcher J, Fraser V, Gillespie K, Guralnick L, Haswell E, Hunter M, Kaeppler S, Kepinski S, Li F, Mackenzie S, McDade L, Min Y, Nemhauser J, Pearson B, Petracek P, Rogers K, Sakai A, Sickler D, Taylor C, Wayne L, Wendroth O, Zapata F, Stern D. Plant science decadal vision 2020-2030: Reimagining the potential of plants for a healthy and sustainable future. Plant Direct 2020; 4:e00252. [PMID: 32904806 PMCID: PMC7459197 DOI: 10.1002/pld3.252] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/28/2020] [Accepted: 07/15/2020] [Indexed: 05/17/2023]
Abstract
Plants, and the biological systems around them, are key to the future health of the planet and its inhabitants. The Plant Science Decadal Vision 2020-2030 frames our ability to perform vital and far-reaching research in plant systems sciences, essential to how we value participants and apply emerging technologies. We outline a comprehensive vision for addressing some of our most pressing global problems through discovery, practical applications, and education. The Decadal Vision was developed by the participants at the Plant Summit 2019, a community event organized by the Plant Science Research Network. The Decadal Vision describes a holistic vision for the next decade of plant science that blends recommendations for research, people, and technology. Going beyond discoveries and applications, we, the plant science community, must implement bold, innovative changes to research cultures and training paradigms in this era of automation, virtualization, and the looming shadow of climate change. Our vision and hopes for the next decade are encapsulated in the phrase "reimagining the potential of plants for a healthy and sustainable future." The Decadal Vision recognizes the vital intersection of human and scientific elements and demands an integrated implementation of strategies for research (Goals 1-4), people (Goals 5 and 6), and technology (Goals 7 and 8). This report is intended to help inspire and guide the research community, scientific societies, federal funding agencies, private philanthropies, corporations, educators, entrepreneurs, and early career researchers over the next 10 years.
The research goals encompass experimental and computational approaches to understanding and predicting ecosystem behavior; novel production systems for food, feed, and fiber with greater crop diversity, efficiency, productivity, and resilience that improve ecosystem health; approaches to realize the potential for advances in nutrition, discovery and engineering of plant-based medicines, and "green infrastructure." Launching the Transparent Plant will use experimental and computational approaches to break down the phytobiome into a "parts store" that supports tinkering, query, prediction, and rapid-response problem solving. Equity, diversity, and inclusion are indispensable cornerstones of realizing our vision. We make recommendations around funding and systems that support customized professional development. Plant systems are frequently taken for granted; therefore, we make recommendations to improve plant awareness and community science programs to increase understanding of scientific research. We prioritize emerging technologies, focusing on non-invasive imaging, sensors, and plug-and-play portable lab technologies, coupled with enabling computational advances. Plant systems science will benefit from data management and future advances in automation, machine learning, natural language processing, and artificial intelligence-assisted data integration, pattern identification, and decision making. Implementation of this vision will transform plant systems science and ripple outwards through society and across the globe. Beyond deepening our biological understanding, we envision entirely new applications. We further anticipate a wave of diversification of plant systems practitioners while stimulating community engagement, underpinning increasing entrepreneurship.
This surge of engagement and knowledge will help satisfy and stoke people's natural curiosity about the future, and their desire to prepare for it, as they seek fuller information about food, health, climate and ecological systems.
Affiliation(s)
- Andrew Nelson
- Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Brett Tyler
- Center for Genome Research and Biocomputing, and Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Fay‐Wei Li
- Boyce Thompson Institute, and Plant Biology Section, Cornell University, Ithaca, NY, USA
- Ya Min
- Harvard University, Seattle, WA, USA
- Katie Rogers
- American Society of Plant Biologists, Rockville, MD, USA
- David Stern
- Boyce Thompson Institute for Plant Research, Ithaca, NY, USA