1
Joe S, Park JL, Kim J, Kim S, Park JH, Yeo MK, Lee D, Yang JO, Kim SY. Comparison of structural variant callers for massive whole-genome sequence data. BMC Genomics 2024; 25:318. [PMID: 38549092 PMCID: PMC10976732 DOI: 10.1186/s12864-024-10239-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] Open
Abstract
BACKGROUND Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers. RESULTS Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient computing resources, and both Manta and MELT demonstrated relatively good precision regarding calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance in terms of the deletions and insertions. CONCLUSIONS Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.
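Caller accuracy of the kind compared above is usually summarized as precision, recall, and F1 against a truth set. The following is a minimal sketch, assuming calls and truth records are simple `(chrom, start, end, svtype)` tuples and that a call matches a truth SV when both share chromosome and type and overlap reciprocally by at least 50%. The tuple layout, the function names, and the 50% threshold are illustrative assumptions, not the matching criteria used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (chrom, start, end, svtype)
SV = Tuple[str, int, int, str]


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if no match is possible."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))


def evaluate(calls: List[SV], truth: List[SV], min_ro: float = 0.5) -> Dict[str, float]:
    """Greedy one-to-one matching of calls to truth SVs, then precision/recall/F1."""
    matched = set()  # indices of truth SVs already claimed by a call
    tp = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i not in matched and reciprocal_overlap(call, t) >= min_ro:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}


truth_set = [("chr1", 100, 200, "DEL"), ("chr1", 1000, 2000, "DEL")]
call_set = [("chr1", 110, 210, "DEL"), ("chr2", 5, 50, "DEL")]
metrics = evaluate(call_set, truth_set)
```

Greedy matching keeps the sketch short; a production benchmark would typically also compare breakpoints, SV length, and genotype.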
Grants
- NRF-2020M3E5D708517212, 2020M3A9I6A0103605713 Ministry of Science and ICT, South Korea
- NTIS-1711170620 KRIBB Research Initiative Program
Affiliation(s)
- Soobok Joe
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jong-Lyul Park
- Aging Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Functional Genomics, University of Science and Technology (UST), 34113, Daejeon, Republic of Korea
- Jun Kim
- Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Sangok Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Ji-Hwan Park
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Min-Kyung Yeo
- Department of Pathology, Chungnam National University School of Medicine, Daejeon, 35015, Republic of Korea
- Dongyoon Lee
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jin Ok Yang
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea.
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Seon-Young Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea.
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea.
2
Le MH, Morgan B, Lu MY, Moctezuma V, Burgos O, Huang JP. The genomes of Hercules beetles reveal putative adaptive loci and distinct demographic histories in pristine North American forests. Mol Ecol Resour 2024; 24:e13908. [PMID: 38063363 DOI: 10.1111/1755-0998.13908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Revised: 01/14/2023] [Accepted: 11/20/2023] [Indexed: 01/12/2024]
Abstract
Beetles, despite their remarkable biodiversity and a long history of research, remain lacking in reference genomes annotated with structural variations in loci of adaptive significance. We sequenced and assembled high-quality chromosome-level genomes of four Hercules beetles which exhibit divergence in male horn size and shape and body colouration. The four Hercules beetle genomes were assembled to 11 pseudo-chromosomes, where the three genomes assembled using Nanopore data (Dynastes grantii, D. hyllus and D. tityus) were mapped to the genome assembled using PacBio + Hi-C data (D. maya). We demonstrated a striking similarity in genome structure among the four species. This conservative genome structure may be attributed to our use of the D. maya assembly as the reference; however, it is worth noting that such a conservative genome structure is a recurring phenomenon among scarab beetles. We further identified homologues of nine and three candidate-gene families that may be associated with the evolution of horn structure and body colouration respectively. Structural variations in Scr and Ebony2 were detected and discussed for their putative impacts on generating morphological diversity in beetles. We also reconstructed the demographic histories of the four Hercules beetles using heterozygosity information from the diploid genomes. We found that the demographic histories of the beetles closely recapitulated historical changes in suitable forest habitats driven by climate shifts.
Affiliation(s)
- My-Hanh Le
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Brett Morgan
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Smithsonian Environmental Research Center, Edgewater, Maryland, USA
- Mei-Yeh Lu
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Victor Moctezuma
- Centro Tlaxcala de Biología de la Conducta, Universidad Autónoma de Tlaxcala, Tlaxcala de Xicohténcatl, Tlaxcala, Mexico
- Oscar Burgos
- Centro de Investigaciones Biológicas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
- Jen-Pan Huang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
3
Bringloe TT, Parent GJ. Contrasting new and available reference genomes to highlight uncertainties in assemblies and areas for future improvement: an example with monodontid species. BMC Genomics 2023; 24:693. [PMID: 37985969 PMCID: PMC10659057 DOI: 10.1186/s12864-023-09779-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/31/2023] [Indexed: 11/22/2023] Open
Abstract
BACKGROUND Reference genomes provide a foundational framework for evolutionary investigations, ecological analysis, and conservation science, yet uncertainties in the assembly of reference genomes are difficult to assess, and by extension rarely quantified. Reference genomes for monodontid cetaceans span a wide spectrum of data types and analytical approaches, providing the context to derive broader insights related to discrepancies and regions of uncertainty in reference genome assembly. We generated three beluga (Delphinapterus leucas) and one narwhal (Monodon monoceros) reference genomes and contrasted these with published chromosomal scale assemblies for each species to quantify discrepancies associated with genome assemblies. RESULTS The new reference genomes achieved chromosomal scale assembly using a combination of PacBio long reads, Illumina short reads, and Hi-C scaffolding data. For beluga, we identified discrepancies in the order and orientation of contigs in 2.2-3.7% of the total genome depending on the pairwise comparison of references. In addition, unsupported higher order scaffolding was identified in published reference genomes. In contrast, we estimated 8.2% of the compared narwhal genomes featured discrepancies, with inversions being notably abundant (5.3%). Discrepancies were linked to repetitive elements in both species. CONCLUSIONS We provide several new reference genomes for beluga (Delphinapterus leucas), while highlighting potential avenues for improvements. In particular, additional layers of data providing information on ultra-long genomic distances are needed to resolve persistent errors in reference genome construction. The comparative analyses of monodontid reference genomes suggested that the three new reference genomes for beluga are more accurate compared to the currently published reference genome, but that the new narwhal genome is less accurate than one published. 
We also present a conceptual summary for improving the accuracy of reference genomes with relevance to end-user needs and how they relate to levels of assembly quality and uncertainty.
Affiliation(s)
- Trevor T Bringloe
- Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada.
- Geneviève J Parent
- Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada.
4
Schelkunov MI. Mabs, a suite of tools for gene-informed genome assembly. BMC Bioinformatics 2023; 24:377. [PMID: 37794322 PMCID: PMC10548655 DOI: 10.1186/s12859-023-05499-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Despite constantly improving genome sequencing methods, error-free eukaryotic genome assembly has not yet been achieved. Among other kinds of problems of eukaryotic genome assembly are so-called "haplotypic duplications", which may manifest themselves as cases of alleles being mistakenly assembled as paralogues. Haplotypic duplications are dangerous because they create illusions of gene family expansions and, thus, may lead scientists to incorrect conclusions about genome evolution and functioning. RESULTS Here, I present Mabs, a suite of tools that serve as parameter optimizers of the popular genome assemblers Hifiasm and Flye. By optimizing the parameters of Hifiasm and Flye, Mabs tries to create genome assemblies with the genes assembled as accurately as possible. Tests on 6 eukaryotic genomes showed that in 6 out of 6 cases, Mabs created assemblies with more accurately assembled genes than those generated by Hifiasm and Flye when they were run with default parameters. When assemblies of Mabs, Hifiasm and Flye were postprocessed by a popular tool for haplotypic duplication removal, Purge_dups, genes were better assembled by Mabs in 5 out of 6 cases. CONCLUSIONS Mabs is useful for making high-quality genome assemblies. It is available at https://github.com/shelkmike/Mabs.
5
Ramu P, Srivastava RK, Sanyal A, Fengler K, Cao J, Zhang Y, Nimkar M, Gerke J, Shreedharan S, Llaca V, May G, Peterson-Burch B, Lin H, King M, Das S, Bhupesh V, Mandaokar A, Maruthachalam K, Krishnamurthy P, Gandhi H, Rathore A, Gupta R, Chitikineni A, Bajaj P, Gupta SK, Satyavathi CT, Pandravada A, Varshney RK, Babu R. Improved pearl millet genomes representing the global heterotic pool offer a framework for molecular breeding applications. Commun Biol 2023; 6:902. [PMID: 37667032 PMCID: PMC10477261 DOI: 10.1038/s42003-023-05258-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 08/18/2023] [Indexed: 09/06/2023] Open
Abstract
High-quality reference genome assemblies, representative of global heterotic patterns, offer an ideal platform to accurately characterize and utilize genetic variation in the primary gene pool of hybrid crops. Here we report three platinum-grade, de novo, near gap-free, chromosome-level reference genome assemblies from the active breeding germplasm in pearl millet with a high degree of contiguity, completeness, and accuracy. An improved Tift genome (Tift23D2B1-P1-P5) assembly has a contig N50 of 126 Mb, a ~7,000-fold improvement over the previous version, and better alignment in centromeric regions. Comparative genome analyses of these three lines clearly demonstrate a high level of collinearity and multiple structural variations, including inversions greater than 1 Mb. Differential genes in the improved Tift genome are enriched for serine O-acetyltransferase and the glycerol-3-phosphate metabolic process, which play an important role in improving the nutritional quality of seed protein and disease resistance in plants, respectively. Multiple marker-trait associations are identified for a range of agronomic traits, including grain yield, through a genome-wide association study. Improved genome assemblies and marker resources developed in this study provide a comprehensive framework/platform for future applications such as marker-assisted selection of mono/oligogenic traits as well as whole-genome prediction and haplotype-based breeding of complex traits.
Affiliation(s)
- Punna Ramu
- Corteva Agriscience, Hyderabad, Telangana, India
- Rakesh K Srivastava
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India.
- Jun Cao
- Corteva Agriscience, Johnston, IA, 50131, USA
- Yun Zhang
- Corteva Agriscience, Johnston, IA, 50131, USA
- Gregory May
- Corteva Agriscience, Johnston, IA, 50131, USA
- Haining Lin
- Corteva Agriscience, Johnston, IA, 50131, USA
- Moderna, 200 Technology Square, Cambridge, MA, 02139, USA
- Matthew King
- Corteva Agriscience, Johnston, IA, 50131, USA
- Natera Inc, San Carlos, CA, 94070, USA
- Sayan Das
- Corteva Agriscience, Hyderabad, Telangana, India
- Vaid Bhupesh
- Corteva Agriscience, Hyderabad, Telangana, India
- Harish Gandhi
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Abhishek Rathore
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- International Maize and Wheat Improvement Center (CIMMYT), Hyderabad, India
- Rajeev Gupta
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- Cereal Crops Research Unit, Edward T. Schafer Agricultural Research Center, USDA-ARS, Fargo, ND, 58102, USA
- Annapurna Chitikineni
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- Centre for Crop & Food Innovation, State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, WA, 6150, Australia
- Prasad Bajaj
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- S K Gupta
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India
- C Tara Satyavathi
- Indian Council of Agricultural Research - All India Coordinated Research Project on Pearl Millet, Jodhpur, India
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, Telangana, India.
- Centre for Crop & Food Innovation, State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, WA, 6150, Australia.
- Raman Babu
- Corteva Agriscience, Hyderabad, Telangana, India.
6
Lehle JD, McCarrey JR. Accelerating the alignment processing speed of the comprehensive end-to-end whole-genome bisulfite sequencing pipeline, wg-blimp. Biol Methods Protoc 2023; 8:bpad012. [PMID: 37431446 PMCID: PMC10329742 DOI: 10.1093/biomethods/bpad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/12/2023] Open
Abstract
Analyzing whole-genome bisulfite and related sequencing datasets is a time-intensive process due to the complexity and size of the input raw sequencing files and lengthy read alignment step requiring correction for conversion of all unmethylated Cs to Ts genome-wide. The objective of this study was to modify the read alignment algorithm associated with the whole-genome bisulfite sequencing methylation analysis pipeline (wg-blimp) to shorten the time required to complete this phase while retaining overall read alignment accuracy. Here, we report an update to the recently published pipeline wg-blimp achieved by replacing the use of the bwa-meth aligner with the faster gemBS aligner. This improvement to the wg-blimp pipeline has led to a more than ×7 acceleration in the processing speed of samples when scaled to larger publicly available FASTQ datasets containing 80-160 million reads while maintaining nearly identical accuracy of properly mapped reads when compared with data from the previous pipeline. The modifications to the wg-blimp pipeline reported here merge the speed and accuracy of the gemBS aligner with the comprehensive analysis and data visualization assets of the wg-blimp pipeline to provide a significantly accelerated workflow that can produce high-quality data much more rapidly without compromising read accuracy at the expense of increasing RAM requirements up to 48 GB.
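The "correction for conversion of all unmethylated Cs to Ts" mentioned above is commonly handled by aligning in a reduced three-letter alphabet. The sketch below illustrates that general idea only; it is an assumption about the strategy used by bisulfite aligners in general and does not reproduce the internals of bwa-meth, gemBS, or wg-blimp. The function names are hypothetical.

```python
def c_to_t(seq: str) -> str:
    """In-silico conversion for the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """In-silico conversion for the reverse strand: every G becomes A."""
    return seq.upper().replace("G", "A")


def matches_after_conversion(read: str, reference: str) -> bool:
    """A bisulfite read is compared with the reference after both are C->T converted,
    so converted (unmethylated) and unconverted (methylated) Cs align equally well."""
    return c_to_t(read) == c_to_t(reference)


reference_fragment = "ACGTC"
bisulfite_read = "ATGTT"  # both Cs were unmethylated and read as T
is_match = matches_after_conversion(bisulfite_read, reference_fragment)
```

Methylation is then called afterwards by comparing the original, unconverted read bases with the reference at each C position.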
Affiliation(s)
- Jake D Lehle
- Correspondence address: Department of Neurosciences, Developmental and Regenerative Biology, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX 78249, USA. Tel: +1 (512)-992-8144
- John R McCarrey
- Department of Neuroscience, Developmental and Regenerative Biology, The University of Texas at San Antonio, San Antonio, TX 78249, USA
7
Mokhtar MM, Abd-Elhalim HM, El Allali A. A large-scale assessment of the quality of plant genome assemblies using the LTR assembly index. AoB PLANTS 2023; 15:plad015. [PMID: 37197714 PMCID: PMC10184434 DOI: 10.1093/aobpla/plad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/01/2023] [Indexed: 05/19/2023]
Abstract
Recent advances in genome sequencing have led to an increase in the number of sequenced genomes. However, the presence of repetitive sequences complicates the assembly of plant genomes. The LTR assembly index (LAI) has recently been widely used to assess the quality of genome assemblies, as a higher LAI is associated with a higher-quality assembly. Here, we assessed the quality of 1664 assembled plant and algal genomes using LAI and reported the results in a data repository called PlantLAI (https://bioinformatics.um6p.ma/PlantLAI). A total of 55 117 586 pseudomolecules/scaffolds with a combined length of 988.11 gigabase-pairs were examined using the LAI workflow. A total of 46 583 551 accurate LTR-RTs were discovered, including 2 263 188 Copia, 2 933 052 Gypsy, and 1 387 311 unknown superfamilies. Consequently, only 1136 plant genomes are suitable for LAI calculation, with values ranging from 0 to 31.59. Based on the quality classification system, 476 diploid genomes were classified as draft, 472 as reference, and 135 as gold genomes. We also provide a free webtool to calculate the LAI of newly assembled genomes and the ability to save the result in the repository. The data repository is designed to fill in the gaps in the reported LAI of existing genomes, while the webtool is designed to help researchers calculate the LAI of their newly sequenced genomes.
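As a rough illustration of the draft/reference/gold classification mentioned above, the sketch below computes a raw LAI as the percentage of total LTR sequence length contributed by intact LTR-RTs and assigns a quality class. The thresholds (below 10 draft, 10 to below 20 reference, 20 and above gold) are the ones commonly cited from the original LAI description and are an assumption here, since the abstract does not state them. The published LAI also applies a correction based on LTR sequence identity, which this sketch omits. The function names are hypothetical.

```python
def raw_lai(intact_ltr_bp: int, total_ltr_bp: int) -> float:
    """Raw LAI: intact LTR-RT length as a percentage of total LTR sequence length."""
    if total_ltr_bp <= 0:
        raise ValueError("total LTR length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def classify_lai(lai: float) -> str:
    """Assumed thresholds: <10 draft, 10 to <20 reference, >=20 gold."""
    if lai < 0:
        raise ValueError("LAI cannot be negative")
    if lai < 10:
        return "draft"
    if lai < 20:
        return "reference"
    return "gold"


example_lai = raw_lai(intact_ltr_bp=15_000_000, total_ltr_bp=100_000_000)
example_class = classify_lai(example_lai)
```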
Affiliation(s)
- Haytham M Abd-Elhalim
- Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt
8
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
9
Hotaling S, Wilcox ER, Heckenhauer J, Stewart RJ, Frandsen PB. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics 2023; 24:117. [PMID: 36927511 PMCID: PMC10018877 DOI: 10.1186/s12864-023-09193-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long-reads broadly as we could identify when they were used from GenBank metadata. RESULTS HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~ 20 Kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size. 
CONCLUSIONS Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.
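The contiguity comparisons above rest on contig N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch, with a hypothetical function name and toy contig lengths that are not taken from the study:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the contig N50 for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the assembly.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract pairs it with the accurate assembly of specific repetitive loci.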
Affiliation(s)
- Scott Hotaling
- Department of Watershed Sciences, Utah State University, Logan, UT, USA.
- Edward R Wilcox
- DNA Sequencing Center, Department of Biology, Brigham Young University, Provo, UT, USA
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325, Frankfurt, Germany
- Russell J Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany.
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA.
- Data Science Lab, Smithsonian Institution, Washington, DC, USA.
10
Shi J, Tian Z, Lai J, Huang X. Plant pan-genomics and its applications. Molecular Plant 2023; 16:168-186. [PMID: 36523157 DOI: 10.1016/j.molp.2022.12.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 09/22/2022] [Revised: 12/07/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
Plant genomes are so highly diverse that a substantial proportion of genomic sequences are not shared among individuals. The variable DNA sequences, along with the conserved core sequences, compose the more sophisticated pan-genome that represents the collection of all non-redundant DNA in a species. With rapid progress in genome sequencing technologies, pan-genome research in plants is now accelerating. Here we review recent advances in plant pan-genomics, including major driving forces of structural variations that constitute the variable sequences, methodological innovations for representing the pan-genome, and major successes in constructing plant pan-genomes. We also summarize recent efforts toward decoding the remaining dark matter in telomere-to-telomere or gapless plant genomes. These new genome resources, which have remarkable advantages over numerous previously assembled less-than-perfect genomes, are expected to become new references for genetic studies and plant breeding.
Affiliation(s)
- Junpeng Shi
- State Key Laboratory of Biocontrol, School of Agriculture, Sun Yat-sen University, Shenzhen 518107, China.
- Zhixi Tian
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Xuehui Huang
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China.
11
Rabanal FA, Gräff M, Lanz C, Fritschi K, Llaca V, Lang M, Carbonell-Bejerano P, Henderson I, Weigel D. Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes. Nucleic Acids Res 2022; 50:12309-12327. [PMID: 36453992 PMCID: PMC9757041 DOI: 10.1093/nar/gkac1115] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/18/2022] [Revised: 09/13/2022] [Accepted: 11/10/2022] [Indexed: 12/05/2022] Open
Abstract
Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.
Affiliation(s)
- Fernando A Rabanal
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Maike Gräff
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
| | - Christa Lanz
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
| | - Katrin Fritschi
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
| | - Victor Llaca
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
| | - Michelle Lang
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
| | - Pablo Carbonell-Bejerano
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
| | - Ian Henderson
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
|
12
|
Steenwyk JL, Buida TJ III, Gonçalves C, Goltz DC, Morales G, Mead ME, LaBella AL, Chavez CM, Schmitz JE, Hadjifrangiskou M, Li Y, Rokas A. BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data. Genetics 2022; 221:6583183. [PMID: 35536198 PMCID: PMC9252278 DOI: 10.1093/genetics/iyac079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Accepted: 05/03/2022] [Indexed: 11/14/2022] Open
Abstract
Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line programs have been developed to conduct some of these individual analyses, but unified toolkits that conduct all these analyses are lacking. To address this gap, we introduce BioKIT, a versatile command-line toolkit that has, upon publication, 42 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPi (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
Affiliation(s)
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | | | - Carla Gonçalves
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.,Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal.,UCIBIO-Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
| | | | - Grace Morales
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Matthew E Mead
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Abigail L LaBella
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Christina M Chavez
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Jonathan E Schmitz
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Maria Hadjifrangiskou
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.,Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Yuanning Li
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA.,Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
|
13
|
Zhang H, Li R, Guo Y, Zhang Y, Zhang D, Yang L. LIFE-Seq: a universal Large Integrated DNA Fragment Enrichment Sequencing strategy for deciphering the transgene integration of genetically modified organisms. PLANT BIOTECHNOLOGY JOURNAL 2022; 20:964-976. [PMID: 34990051 PMCID: PMC9055813 DOI: 10.1111/pbi.13776] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/18/2021] [Revised: 12/18/2021] [Accepted: 12/30/2021] [Indexed: 06/14/2023]
Abstract
Molecular characterization of genetically modified organisms (GMOs) yields basic information on exogenous DNA integration, including integration sites, entire inserted sequences and structures, flanking sequences and copy number, providing key data for biosafety assessment. However, there are few effective methods for deciphering transgene integration, especially for large DNA fragment integration with complex rearrangements, inversions and tandem repeats. Herein, we developed a universal Large Integrated DNA Fragments Enrichment strategy combined with PacBio Sequencing (LIFE-Seq) for deciphering transgene integration in GMOs. Universal tiling DNA probes targeting transgenic elements and exogenous genes facilitate specific enrichment of large inserted DNA fragments associated with transgenes from plant genomes, followed by PacBio sequencing. LIFE-Seq was evaluated using six GM events and four crop species. Target DNA fragments averaging ~6275 bp were enriched and sequenced, generating ~26,352 high-fidelity reads for each sample. Transgene integration structures were determined with high repeatability and sensitivity. Compared with next-generation whole-genome sequencing, LIFE-Seq achieved better data integrity and accuracy, greater universality and lower cost, especially for transgenic crops with complex inserted DNA structures. LIFE-Seq could be applied in molecular characterization of transgenic crops and animals, and complex DNA structure analysis in genetics research.
Affiliation(s)
- Hanwen Zhang
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Rong Li
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yongkun Guo
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yuchen Zhang
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Dabing Zhang
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Litao Yang
- National Center for the Molecular Characterization of Genetically Modified Organisms, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
|
14
|
Vargas-Chavez C, Longo Pendy NM, Nsango SE, Aguilera L, Ayala D, González J. Transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii. Genome Res 2021; 32:189-202. [PMID: 34965939 PMCID: PMC8744685 DOI: 10.1101/gr.275761.121] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Accepted: 11/24/2021] [Indexed: 11/28/2022]
Abstract
Anopheles coluzzii is one of the primary vectors of human malaria in sub-Saharan Africa. Recently, it has spread into the main cities of Central Africa, threatening vector control programs. The adaptation of An. coluzzii to urban environments partly results from an increased tolerance to organic pollution and insecticides. Some of the molecular mechanisms for ecological adaptation are known, but the role of transposable elements (TEs) in the adaptive processes of this species has not been studied yet. As a first step toward assessing the role of TEs in rapid urban adaptation, we used long reads to sequence six An. coluzzii genomes from natural breeding sites in two major Central African cities. We de novo annotated TEs in these genomes and in an additional high-quality An. coluzzii genome, and we identified 64 new TE families. TEs were nonrandomly distributed throughout the genome, with significant differences in the number of insertions of several superfamilies across the studied genomes. We identified seven putatively active families with insertions near genes with functions related to vectorial capacity, and several TEs that may provide promoter and transcription factor binding sites to insecticide resistance and immune-related genes. Overall, the analysis of multiple high-quality genomes allowed us to generate the most comprehensive TE annotation in this species to date and identify several TE insertions that could potentially impact both genome architecture and the regulation of functionally relevant genes. These results provide a basis for future studies of the impact of TEs on the biology of An. coluzzii.
Affiliation(s)
- Carlos Vargas-Chavez
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Neil Michel Longo Pendy
- Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), BP 769, Franceville, Gabon.,École Doctorale Régional (EDR) en Infectiologie Tropicale d'Afrique Centrale, BP 876, Franceville, Gabon
| | - Sandrine E Nsango
- Faculté de Médecine et des Sciences Pharmaceutiques, Université de Douala, BP 2701, Douala, Cameroun
| | - Laura Aguilera
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Diego Ayala
- Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), BP 769, Franceville, Gabon.,Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), Université Montpellier, CNRS, IRD, 64501 Montpellier, France
| | - Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
|
15
|
Wang W, Chen L, Fengler K, Bolar J, Llaca V, Wang X, Clark CB, Fleury TJ, Myrvold J, Oneal D, van Dyk MM, Hudson A, Munkvold J, Baumgarten A, Thompson J, Cai G, Crasta O, Aggarwal R, Ma J. A giant NLR gene confers broad-spectrum resistance to Phytophthora sojae in soybean. Nat Commun 2021; 12:6263. [PMID: 34741017 PMCID: PMC8571336 DOI: 10.1038/s41467-021-26554-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Accepted: 10/06/2021] [Indexed: 11/29/2022] Open
Abstract
Phytophthora root and stem rot caused by P. sojae is a destructive soil-borne disease of soybean found worldwide. Discovery of genes conferring broad-spectrum resistance to the pathogen is needed to prevent outbreaks of the disease. Here, we show that soybean Rps11 is a 27.7-kb nucleotide-binding site-leucine-rich repeat (NBS-LRR or NLR) gene conferring broad-spectrum resistance to the pathogen. Rps11 is located in a genomic region harboring a cluster of large NLR genes of a single origin in soybean, and is derived from rounds of unequal recombination. Such events result in promoter fusion and LRR expansion that may contribute to the broad resistance spectrum. The NLR gene cluster exhibits drastic structural diversification among phylogenetically representative varieties, including gene copy number variation ranging from five to 23 copies, and absence of allelic copies of Rps11 in any of the non-Rps11-donor varieties examined, exemplifying innovative evolution of NLR genes and NLR gene clusters.
Affiliation(s)
- Weidong Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
| | - Liyang Chen
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
| | - Kevin Fengler
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Joy Bolar
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Victor Llaca
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Xutong Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
| | - Chancelor B Clark
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
| | - Tomara J Fleury
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Crop Production and Pest Control Research Unit, USDA, ARS, West Lafayette, IN, 47907, USA
| | - Jon Myrvold
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - David Oneal
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | | | - Ashley Hudson
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Jesse Munkvold
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Andy Baumgarten
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Jeff Thompson
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
| | - Guohong Cai
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Crop Production and Pest Control Research Unit, USDA, ARS, West Lafayette, IN, 47907, USA
| | - Oswald Crasta
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA
- R&D, Equinom, Inc., Indianapolis, IN, 46268, USA
| | - Rajat Aggarwal
- Research and Development, Corteva Agriscience™, Johnston, IA, 50131, USA.
| | - Jianxin Ma
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA.
|
16
|
Bornowski N, Michel KJ, Hamilton JP, Ou S, Seetharam AS, Jenkins J, Grimwood J, Plott C, Shu S, Talag J, Kennedy M, Hundley H, Singan VR, Barry K, Daum C, Yoshinaga Y, Schmutz J, Hirsch CN, Hufford MB, de Leon N, Kaeppler SM, Buell CR. Genomic variation within the maize stiff-stalk heterotic germplasm pool. THE PLANT GENOME 2021; 14:e20114. [PMID: 34275202 DOI: 10.1002/tpg2.20114] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Accepted: 05/06/2021] [Indexed: 05/28/2023]
Abstract
The stiff-stalk heterotic group in maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines: LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing, achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and that there are highly conserved haplotypes in released public and ex-PVP inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified, supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.
Affiliation(s)
- Nolan Bornowski
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
| | - Kathryn J Michel
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
| | - John P Hamilton
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
| | - Shujun Ou
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Arun S Seetharam
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Jane Grimwood
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Chris Plott
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Shengqiang Shu
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Jayson Talag
- Arizona Genomics Institute, School of Plant Sciences, Univ. of Arizona, 1657 E Helen Street, Tucson, AZ, 85721, USA
| | - Megan Kennedy
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Hope Hundley
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Vasanth R Singan
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Kerrie Barry
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Chris Daum
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Yuko Yoshinaga
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Candice N Hirsch
- Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 1991 Upper Buford Circle, Saint Paul, MN, 55108, USA
| | - Matthew B Hufford
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Natalia de Leon
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
| | - Shawn M Kaeppler
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Wisconsin Crop Innovation Center, Univ. of Wisconsin - Madison, 8520 University Green, Middleton, WI, 53562, USA
| | - C Robin Buell
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
|
17
|
Abstract
In genomics, optical mapping technology provides long-range contiguity information to improve genome sequence assemblies and detect structural variation. Originally a laborious manual process, Bionano Genomics platforms now offer high-throughput, automated optical mapping based on chips packed with nanochannels through which unwound DNA is guided and the fluorescent DNA backbone and specific restriction sites are recorded. Although the raw image data obtained is of high quality, the processing and assembly software accompanying the platforms is closed source and does not seem to make full use of data, labeling approximately half of the measured signals as unusable. Here we introduce two new software tools, independent of Bionano Genomics software, to extract and process molecules from raw images (OptiScan) and to perform molecule-to-molecule and molecule-to-reference alignments using a novel signal-based approach (OptiMap). We demonstrate that the molecules detected by OptiScan can yield better assemblies, and that the approach taken by OptiMap results in higher use of molecules from the raw data. These tools lay the foundation for a suite of open-source methods to process and analyze high-throughput optical mapping data. The Python implementations of the OptiTools are publicly available through http://www.bif.wur.nl/.
|
18
|
LeafGo: Leaf to Genome, a quick workflow to produce high-quality de novo plant genomes using long-read sequencing technology. Genome Biol 2021; 22:256. [PMID: 34479618 PMCID: PMC8414726 DOI: 10.1186/s13059-021-02475-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 08/20/2021] [Indexed: 02/06/2023] Open
Abstract
Currently, different sequencing platforms are used to generate plant genomes and no workflow has been properly developed to optimize time, cost, and assembly quality. We present LeafGo, a complete de novo plant genome workflow, that starts from tissue and produces genomes with modest laboratory and bioinformatic resources in approximately 7 days and using one long-read sequencing technology. LeafGo is optimized with ten different plant species, three of which are used to generate high-quality chromosome-level assemblies without any scaffolding technologies. Finally, we report the diploid genomes of Eucalyptus rudis and E. camaldulensis and the allotetraploid genome of Arachis hypogaea.
|
19
|
Hufford MB, Seetharam AS, Woodhouse MR, Chougule KM, Ou S, Liu J, Ricci WA, Guo T, Olson A, Qiu Y, Della Coletta R, Tittes S, Hudson AI, Marand AP, Wei S, Lu Z, Wang B, Tello-Ruiz MK, Piri RD, Wang N, Kim DW, Zeng Y, O'Connor CH, Li X, Gilbert AM, Baggs E, Krasileva KV, Portwood JL, Cannon EKS, Andorf CM, Manchanda N, Snodgrass SJ, Hufnagel DE, Jiang Q, Pedersen S, Syring ML, Kudrna DA, Llaca V, Fengler K, Schmitz RJ, Ross-Ibarra J, Yu J, Gent JI, Hirsch CN, Ware D, Dawe RK. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 2021; 373:655-662. [PMID: 34353948 PMCID: PMC8733867 DOI: 10.1126/science.abg5289] [Citation(s) in RCA: 233] [Impact Index Per Article: 77.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Accepted: 06/24/2021] [Indexed: 12/24/2022]
Abstract
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Affiliation(s)
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Arun S Seetharam
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Genome Informatics Facility, Iowa State University, Ames, IA 50011, USA
| | - Margaret R Woodhouse
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | | | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Jianing Liu
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - William A Ricci
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Tingting Guo
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Silas Tittes
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - Asher I Hudson
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | | | - Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Rebecca D Piri
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Na Wang
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Dong Won Kim
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Yibing Zeng
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Christine H O'Connor
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA
| | - Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Amanda M Gilbert
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Erin Baggs
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - John L Portwood
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Ethalinda K S Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Carson M Andorf
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Nancy Manchanda
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Samantha J Snodgrass
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David E Hufnagel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
| | - Qiuhan Jiang
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Sarah Pedersen
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Michael L Syring
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David A Kudrna
- Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
| | | | | | - Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Jeffrey Ross-Ibarra
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Jonathan I Gent
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Doreen Ware
- USDA-ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - R Kelly Dawe
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA.
|
20
|
Wick RR, Judd LM, Wyres KL, Holt KE. Recovery of small plasmid sequences via Oxford Nanopore sequencing. Microb Genom 2021; 7:000631. [PMID: 34431763 PMCID: PMC8549360 DOI: 10.1099/mgen.0.000631] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2021] [Accepted: 06/11/2021] [Indexed: 12/13/2022] Open
Abstract
Oxford Nanopore Technologies (ONT) sequencing platforms currently offer two approaches to whole-genome native-DNA library preparation: ligation and rapid. In this study, we compared these two approaches for bacterial whole-genome sequencing, with a specific aim of assessing their ability to recover small plasmid sequences. To do so, we sequenced DNA from seven plasmid-rich bacterial isolates in three different ways: ONT ligation, ONT rapid and Illumina. Using the Illumina read depths to approximate true plasmid abundance, we found that small plasmids (<20 kbp) were underrepresented in ONT ligation read sets (by a mean factor of ~4) but were not underrepresented in ONT rapid read sets. This effect correlated with plasmid size, with the smallest plasmids being the most underrepresented in ONT ligation read sets. We also found lower rates of chimaeric reads in the rapid read sets relative to ligation read sets. These results show that when small plasmid recovery is important, ONT rapid library preparations are preferable to ligation-based protocols.
Affiliation(s)
- Ryan R. Wick
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Louise M. Judd
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Kelly L. Wyres
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
- Kathryn E. Holt
  - Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK

21
Baiakhmetov E, Guyomar C, Shelest E, Nobis M, Gudkova PD. The first draft genome of feather grasses using SMRT sequencing and its implications in molecular studies of Stipa. Sci Rep 2021; 11:15345. [PMID: 34321531 PMCID: PMC8319324 DOI: 10.1038/s41598-021-94068-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/17/2020] [Accepted: 06/24/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
The Eurasian plant Stipa capillata is the most widespread species within feather grasses. Many taxa of the genus are dominants in steppe plant communities and can be used for their classification and in studies related to climate change. Moreover, some species are of economic importance mainly as fodder plants and can be used for soil remediation processes. Although large-scale molecular data has begun to appear, there is still no complete or draft genome for any Stipa species. Thus, here we present a single-molecule long-read sequencing dataset generated using the Pacific Biosciences Sequel System. A draft genome of about 1004 Mb was obtained with a contig N50 length of 351 kb. Importantly, here we report 81,224 annotated protein-coding genes, present 77,614 perfect and 58 unique imperfect SSRs, reveal the putative allopolyploid nature of S. capillata, investigate the evolutionary history of the genus, demonstrate structural heteroplasmy of the chloroplast genome and announce for the first time the mitochondrial genome in Stipa. The assembled nuclear, mitochondrial and chloroplast genomes provide a significant source of genetic data for further works on phylogeny, hybridisation and population studies within Stipa and the grass family Poaceae.
Affiliation(s)
- Evgenii Baiakhmetov
  - Institute of Botany, Faculty of Biology, Jagiellonian University, Gronostajowa 3, 30-387, Kraków, Poland
  - Research Laboratory 'Herbarium', National Research Tomsk State University, Lenin 36 Ave., Tomsk, 634050, Russia
- Cervin Guyomar
  - German Centre for Integrative Biodiversity Research (iDiv), Puschstrasse 4, 04103, Leipzig, Germany
  - Institute for Genetics, Environment and Plant Protection (IGEPP), Agrocampus Ouest, INRAE, University of Rennes 1, 35650, Le Rheu, France
- Ekaterina Shelest
  - German Centre for Integrative Biodiversity Research (iDiv), Puschstrasse 4, 04103, Leipzig, Germany
  - Centre for Enzyme Innovation, University of Portsmouth, Portsmouth, PO1 2UP, UK
- Marcin Nobis
  - Institute of Botany, Faculty of Biology, Jagiellonian University, Gronostajowa 3, 30-387, Kraków, Poland
  - Research Laboratory 'Herbarium', National Research Tomsk State University, Lenin 36 Ave., Tomsk, 634050, Russia
- Polina D Gudkova
  - Research Laboratory 'Herbarium', National Research Tomsk State University, Lenin 36 Ave., Tomsk, 634050, Russia
  - Department of Biology, Altai State University, Lenin 61 Ave., Barnaul, Russia, 656049

22
Sutton JM, Millwood JD, Case McCormack A, Fierst JL. Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies. GigaByte 2021; 2021:gigabyte27. [PMID: 36824342 PMCID: PMC9650304 DOI: 10.46471/gigabyte.27] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/17/2021] [Accepted: 07/05/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increase. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics to plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.
Affiliation(s)
- John M. Sutton
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Joshua D. Millwood
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- A. Case McCormack
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA
- Janna L. Fierst
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487-0344, USA

23
Lin G, He C, Zheng J, Koo DH, Le H, Zheng H, Tamang TM, Lin J, Liu Y, Zhao M, Hao Y, McFraland F, Wang B, Qin Y, Tang H, McCarty DR, Wei H, Cho MJ, Park S, Kaeppler H, Kaeppler SM, Liu Y, Springer N, Schnable PS, Wang G, White FF, Liu S. Chromosome-level genome assembly of a regenerable maize inbred line A188. Genome Biol 2021; 22:175. [PMID: 34108023 PMCID: PMC8188678 DOI: 10.1186/s13059-021-02396-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 11/21/2020] [Accepted: 05/28/2021] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND: The maize inbred line A188 is an attractive model for elucidation of gene function and improvement due to its high embryogenic capacity and many contrasting traits to the first maize reference genome, B73, and other elite lines. The lack of a genome assembly of A188 limits its use as a model for functional studies.
RESULTS: Here, we present a chromosome-level genome assembly of A188 using long reads and optical maps. Comparison of A188 with B73 using both whole-genome alignments and read depths from sequencing reads identifies approximately 1.1 Gb of syntenic sequences as well as extensive structural variation, including a 1.8-Mb duplication containing the Gametophyte factor1 locus for unilateral cross-incompatibility, and six inversions of 0.7 Mb or greater. Increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 is associated with elevated expression during seed development. High ccd1 expression in seeds together with low expression of yellow endosperm 1 (y1) reduces carotenoid accumulation, accounting for the white seed phenotype of A188. Furthermore, transcriptome and epigenome analyses reveal enhanced expression of defense pathways and altered DNA methylation patterns of the embryonic callus.
CONCLUSIONS: The A188 genome assembly provides a high-resolution sequence for a complex genome species and a foundational resource for analyses of genome variation and gene function in maize. The genome, in comparison to B73, contains extensive intra-species structural variations and other genetic differences. Expression and network analyses identify discrete profiles for embryonic callus and other tissues.
Affiliation(s)
- Guifang Lin
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Cheng He
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Jun Zheng
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Dal-Hoe Koo
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Ha Le
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Huakun Zheng
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Tej Man Tamang
  - Department of Horticulture and Natural Resources, Kansas State University, Manhattan, KS, 66506-5502, USA
- Jinguang Lin
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
  - Present Address, Corvallis, OR, 97330, USA
- Yan Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Mingxia Zhao
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Yangfan Hao
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA
- Frank McFraland
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Bo Wang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Yang Qin
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Haibao Tang
  - Center for Genomics and Biotechnology and Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Donald R McCarty
  - Department of Horticulture, University of Florida, Gainesville, FL, 32611-0680, USA
- Hairong Wei
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
- Myeong-Je Cho
  - Innovative Genomics Institute, University of California-Berkeley, Sunnyvale, CA, 94704, USA
- Sunghun Park
  - Department of Horticulture and Natural Resources, Kansas State University, Manhattan, KS, 66506-5502, USA
- Heidi Kaeppler
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Shawn M Kaeppler
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Yunjun Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Nathan Springer
  - Department of Plant Biology, University of Minnesota, Saint Paul, MN, 55108, USA
- Patrick S Schnable
  - Department of Agronomy, Iowa State University, Ames, IA, 50011-3605, USA
- Guoying Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Frank F White
  - Department of Plant Pathology, University of Florida, Gainesville, FL, 32611-0680, USA
- Sanzhen Liu
  - Department of Plant Pathology, Kansas State University, 4024 Throckmorton Center, Manhattan, KS, 66506-5502, USA

24
Yang N, Yan J. New genomic approaches for enhancing maize genetic improvement. Curr Opin Plant Biol 2021; 60:101977. [PMID: 33418269 DOI: 10.1016/j.pbi.2020.11.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/11/2020] [Revised: 11/07/2020] [Accepted: 11/16/2020] [Indexed: 05/13/2023]
Abstract
Maize (Zea mays) is one of the most widely grown crops in the world, with an annual global production of over 1147 million tons. Genomics approaches are thought to be the best solution for accelerating yield improvement to meet the challenges of a growing population and global climate change. Here, we review current approaches to the exploration of novel genetic variation in genomes, DNA modifications, and transcription levels of cultivated maize, landraces, and wild relatives. We discuss applications of genetic engineering to maize yield improvement and highlight future directions for maize genomics studies.
Affiliation(s)
- Ning Yang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China

25
Luo J, Wei Y, Lyu M, Wu Z, Liu X, Luo H, Yan C. A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform 2021; 22:6149347. [PMID: 33634311 DOI: 10.1093/bib/bbab033] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/14/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022] [Open Access]
Abstract
In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
Affiliation(s)
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Yawei Wei
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Mengna Lyu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Zhengjiang Wu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Xiaoyan Liu
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China

26
Genome assembly and population genomic analysis provide insights into the evolution of modern sweet corn. Nat Commun 2021; 12:1227. [PMID: 33623026 PMCID: PMC7902669 DOI: 10.1038/s41467-021-21380-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/10/2020] [Accepted: 01/26/2021] [Indexed: 01/31/2023] [Open Access]
Abstract
Sweet corn is one of the most important vegetables in the United States and Canada. Here, we present a de novo assembly of a sweet corn inbred line Ia453 with the mutated shrunken2-reference allele (Ia453-sh2). This mutation accumulates more sugar and is present in most commercial hybrids developed for the processing and fresh markets. The ten pseudochromosomes cover 92% of the total assembly and 99% of the estimated genome size, with a scaffold N50 of 222.2 Mb. This reference genome completely assembles the large structural variation that created the mutant sh2-R allele. Furthermore, comparative genomics analysis with six field corn genomes highlights differences in single-nucleotide polymorphisms, structural variations, and transposon composition. Phylogenetic analysis of 5,381 diverse maize and teosinte accessions reveals genetic relationships between sweet corn and other types of maize. Our results show evidence for a common origin in northern Mexico for modern sweet corn in the U.S. Finally, population genomic analysis identifies regions of the genome under selection and candidate genes associated with sweet corn traits, such as early flowering, endosperm composition, plant and tassel architecture, and kernel row number. Our study provides a high-quality reference-genome sequence to facilitate comparative genomics, functional studies, and genomic-assisted breeding for sweet corn.

27
Schwartz C, Lenderts B, Feigenbutz L, Barone P, Llaca V, Fengler K, Svitashev S. CRISPR-Cas9-mediated 75.5-Mb inversion in maize. Nat Plants 2020; 6:1427-1431. [PMID: 33299151 DOI: 10.1038/s41477-020-00817-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 07/24/2020] [Accepted: 11/04/2020] [Indexed: 05/11/2023]
Abstract
CRISPR-Cas is a powerful double-strand-break technology with wide-ranging applications from gene discovery to commercial product development. Thus far, this tool has been almost exclusively used for gene knockouts and deletions, with a few examples of gene edits and targeted gene insertions. Here, we demonstrate the application of CRISPR-Cas9 technology to mediate targeted 75.5-Mb pericentric inversion in chromosome 2 in one of the elite maize inbred lines from Corteva Agriscience. This inversion unlocks a large chromosomal region containing substantial genetic variance for recombination, thus providing opportunities for the development of new maize varieties with improved phenotypes.

28
Hufnagel DE, Hufford MB, Seetharam AS. SequelTools: a suite of tools for working with PacBio Sequel raw sequence data. BMC Bioinformatics 2020; 21:429. [PMID: 33004007 PMCID: PMC7532105 DOI: 10.1186/s12859-020-03751-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/19/2019] [Accepted: 09/11/2020] [Indexed: 12/20/2022] [Open Access]
Abstract
BACKGROUND: PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data.
RESULTS: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent.
CONCLUSIONS: SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools.
Affiliation(s)
- David E Hufnagel
  - Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
  - Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
- Matthew B Hufford
  - Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Arun S Seetharam
  - Genome Informatics Facility, Iowa State University, Ames, IA, 50011, USA

29
Pham GM, Hamilton JP, Wood JC, Burke JT, Zhao H, Vaillancourt B, Ou S, Jiang J, Buell CR. Construction of a chromosome-scale long-read reference genome assembly for potato. Gigascience 2020; 9:giaa100. [PMID: 32964225 PMCID: PMC7509475 DOI: 10.1093/gigascience/giaa100] [Citation(s) in RCA: 132] [Impact Index Per Article: 33.0] [Received: 06/04/2020] [Revised: 08/26/2020] [Accepted: 09/05/2020] [Indexed: 01/19/2023] [Open Access]
Abstract
BACKGROUND: Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1-3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality chromosome-scale genome assemblies at minimal cost.
FINDINGS: Here, we present an updated version of the DM1-3 516 R44 genome sequence (v6.1) using Oxford Nanopore Technologies long reads coupled with proximity-by-ligation scaffolding (Hi-C), yielding a chromosome-scale assembly. The new (v6.1) assembly represents 741.6 Mb of sequence (87.8% of the estimated 844 Mb genome), of which 741.5 Mb is non-gapped with 731.2 Mb anchored to the 12 chromosomes. Use of Oxford Nanopore Technologies full-length complementary DNA sequencing enabled annotation of 32,917 high-confidence protein-coding genes encoding 44,851 gene models that had a significantly improved representation of conserved orthologs compared with the previous annotation. The new assembly has improved contiguity with a 595-fold increase in N50 contig size, 99% reduction in the number of contigs, a 44-fold increase in N50 scaffold size, and an LTR Assembly Index score of 13.56, placing it in the category of reference genome quality. The improved assembly also permitted annotation of the centromeres via alignment to sequencing reads derived from CENH3 nucleosomes.
CONCLUSIONS: Access to advanced sequencing technologies and improved software permitted generation of a high-quality, long-read, chromosome-scale assembly and improved annotation dataset for the reference genotype of potato that will facilitate research aimed at improving agronomic traits and understanding genome evolution.
Affiliation(s)
- Gina M Pham
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- John P Hamilton
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- Joshua C Wood
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- Joseph T Burke
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- Hainan Zhao
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- Brieanne Vaillancourt
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
- Shujun Ou
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, 2200 Osborne Dr, Ames, IA 50011, USA
- Jiming Jiang
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
  - Department of Horticulture, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
  - MSU AgBioResearch, Michigan State University, 446 W. Circle Drive, East Lansing, MI 48824, USA
- C Robin Buell
  - Department of Plant Biology, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA
  - MSU AgBioResearch, Michigan State University, 446 W. Circle Drive, East Lansing, MI 48824, USA
  - Plant Resilience Institute, Michigan State University, 612 Wilson Road, East Lansing, MI 48824, USA

30
Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, Fan G, Liu X, Xu X, Deng L, Zhang Y. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience 2020; 9:giaa094. [PMID: 32893860 PMCID: PMC7476103 DOI: 10.1093/gigascience/giaa094] [Citation(s) in RCA: 139] [Impact Index Per Article: 34.8] [Received: 02/04/2020] [Revised: 05/15/2020] [Accepted: 08/14/2020] [Indexed: 12/16/2022] [Open Access]
Abstract
BACKGROUND: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited.
FINDINGS: We developed a software tool to close sequence gaps in genome assemblies, TGS-GapCloser, that uses low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge gap regions between 2 contigs within a scaffold, error corrects only the candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, covering up to 94.8% of gaps with sequence data at 97.7% positive predictive value. These improved assemblies achieve 99.998% (Q46) single-base accuracy with final inserted sequences having 99.97% (Q35) accuracy, despite the high raw error rate of single-molecule reads, enabling high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as the ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.
CONCLUSIONS: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
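The accuracy figures quoted in this abstract (99.998% as Q46, 99.97% as Q35) appear to follow the standard Phred convention, Q = -10 * log10(error rate). The sketch below shows that conversion; the function name is illustrative and not taken from TGS-GapCloser, and it is an assumption that the abstract truncates the score to an integer.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    # Phred definition: Q = -10 * log10(probability of an incorrect base)
    return -10.0 * math.log10(error_rate)


# Accuracy values taken from the abstract above.
for acc in (0.99998, 0.9997):
    q = accuracy_to_phred(acc)
    # math.floor is an assumed rounding rule, chosen because it matches Q46 and Q35.
    print(f"{acc:.5%} accuracy -> Q{q:.2f} (truncated: Q{math.floor(q)})")
```

Under these assumptions the two values come out slightly above the quoted integers, which is consistent with truncation rather than rounding to nearest.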
Affiliation(s)
- Mengyang Xu
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Lidong Guo
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - BGI Education Center, University of Chinese Academy of Sciences, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Shengqiang Gu
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - BGI Education Center, University of Chinese Academy of Sciences, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Ou Wang
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - MGI, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Rui Zhang
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
- Brock A Peters
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - Complete Genomics Inc., 2904 Orchard Pkwy, San Jose, CA 95134, USA
- Guangyi Fan
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Xin Liu
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - China National GeneBank, BGI-Shenzhen, Jinsha Road, Dapeng New District, Shenzhen, 518120, China
- Xun Xu
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - China National GeneBank, BGI-Shenzhen, Jinsha Road, Dapeng New District, Shenzhen, 518120, China
- Li Deng
  - BGI-Qingdao, BGI-Shenzhen, 2 Hengyunshan Road, West Coast New Area, Qingdao, 266426, China
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Yongwei Zhang
  - BGI-Shenzhen, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
  - Complete Genomics Inc., 2904 Orchard Pkwy, San Jose, CA 95134, USA