1
Medhi U, Chaliha C, Singh A, Nath BK, Kalita E. Third generation sequencing transforming plant genome research: Current trends and challenges. Gene 2025; 940:149187. [PMID: 39724994 DOI: 10.1016/j.gene.2024.149187] [Received: 06/11/2024] [Revised: 12/15/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]
Abstract
In recent years, third-generation sequencing (TGS) technologies have transformed genomics and transcriptomics research, providing novel opportunities for significant discoveries. The long-read sequencing platforms, with their unique advantages over next-generation sequencing (NGS), including a definitive protocol, reduced operational time, and real-time sequencing, possess the potential to transform plant genomics. TGS optimizes and enhances the efficiency of data analysis by removing the necessity for time-consuming assembly tools. The current review examines the development and application of bioinformatics tools for data analysis and annotation, driven by the rapid advancement of TGS platforms like Oxford Nanopore Technologies and Pacific Biosciences. Transcriptome analysis utilizing TGS has been extensively employed to elucidate complex plant transcriptomes and genomes, particularly those characterized by high frequencies of duplicated genomes and repetitive sequences. As a result, current methodologies that allow for generating transcriptomes and comprehensive whole-genome sequences of complex plant genomes employing tailored hybrid sequencing techniques that integrate NGS and TGS technologies have been emphasized herein. This paper, thus, articulates a vision for a future in which TGS effectively addresses the challenges faced in plant research, offering a comprehensive understanding of its advantages, applications, limitations, and promising prospects.
Affiliation(s)
- Upasana Medhi
- Department of Molecular Biology and Biotechnology, Cotton University, Panbazar, Guwahati, Assam, 781001, India
- Chayanika Chaliha
- School of Natural Resource Management, College of Post Graduate Studies in Agricultural Sciences-CAU Imphal, Umiam, Meghalaya, 793104, India
- Archana Singh
- Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi, 110021, India
- Bikash K Nath
- Department of Molecular Biology and Biotechnology, Tezpur University, Assam, 784028, India
- Eeshan Kalita
- Department of Molecular Biology and Biotechnology, Cotton University, Panbazar, Guwahati, Assam, 781001, India.
2
Hu Y, Jiang K, Xia S, Zhang W, Guo J, Wang H. Amoeba community dynamics and assembly mechanisms in full-scale drinking water distribution networks under various disinfectant regimens. Water Res 2025; 271:122861. [PMID: 39615115 DOI: 10.1016/j.watres.2024.122861] [Received: 08/20/2024] [Revised: 11/24/2024] [Accepted: 11/25/2024] [Indexed: 01/14/2025]
Abstract
Free-living amoebae (FLA) are prevalent in drinking water distribution networks (DWDNs), yet our understanding of FLA community dynamics and assembly mechanisms in DWDNs remains limited. This study characterized the occurrence patterns of amoeba communities and identified key factors influencing their assembly across four full-scale DWDNs in three Chinese cities, using different disinfectants (chlorine, chloramine, and chlorine dioxide). High-throughput sequencing of full-length 18S rRNA genes revealed highly diverse FLA communities and an array of rare FLA species in DWDNs. Unique FLA community structures and higher gene copy numbers of three amoeba taxa of concern (Vermamoeba vermiformis, Acanthamoeba, and Naegleria fowleri) were observed in the chloraminated DWDN, highlighting the distinct impact of chloramine on shaping the amoeba community. The FLA communities in DWDNs were primarily driven by deterministic processes, with disinfectant and nitrogen compounds (nitrate, nitrite, and ammonia) identified as the main influencing factors. Machine learning models revealed high SHapley Additive exPlanations (SHAP) values of dominant amoeba genera (e.g., Vannella and Vermamoeba), indicating their critical ecological roles in shaping broader bacterial and eukaryotic communities. Correlation analyses between amoeba genera and bacterial taxa revealed that 82% of the bacterial taxa exhibiting a negative correlation with amoebae were gram-negative, suggesting that amoebae preferentially prey on gram-negative bacteria. Network analysis revealed only one or two amoebae within distinct modules, suggesting that individual amoebae might graze selectively. These findings provide insight into the amoeba community dynamics, assembly mechanisms, and ecological roles of amoebae in drinking water, which can aid in risk assessments and mitigation strategies within DWDNs.
Affiliation(s)
- Yuxing Hu
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China; Australian Centre for Water and Environmental Biotechnology (ACWEB, formerly AWMC), The University of Queensland, St Lucia, Queensland 4072, Australia
- Kaiyang Jiang
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Siqing Xia
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
- Weixian Zhang
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Jianhua Guo
- Australian Centre for Water and Environmental Biotechnology (ACWEB, formerly AWMC), The University of Queensland, St Lucia, Queensland 4072, Australia
- Hong Wang
- State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China.
3
Wang R, Song N, Zhao L. Chromosome-Level Genome Assembly and Comparative Genomic Analysis of Planiliza haematocheilus: Insights into Environmental Adaptation and Hypoxia Tolerance Mechanisms. Mar Biotechnol (NY) 2025; 27:36. [PMID: 39878786 DOI: 10.1007/s10126-025-10419-y] [Received: 12/05/2024] [Accepted: 01/15/2025] [Indexed: 01/31/2025]
Abstract
Planiliza haematocheilus, a teleost species noted for its ecological adaptability and economic significance, thrives in both freshwater and marine environments. This study presents a novel chromosome-level genome assembly generated with Hi-C, PacBio CCS, and Illumina sequencing. The assembled genome has a final size of 651.58 Mb, with 24 chromosomes anchoring 91.94% of contigs. Contig N50 and scaffold N50 are 25.52 Mb and 28.59 Mb, respectively. Of the 22,476 protein-coding genes identified in the genome, 21,834 have functional annotations. BUSCO (Benchmarking Universal Single-Copy Orthologs) assessments of the genome and the gene annotation yielded scores of 96% and 96.6%, respectively. The genome of P. haematocheilus revealed 228 expanded and 1,433 contracted gene families. Comparative genomic analyses highlight environmental adaptation and hypoxia tolerance linked to protein synthesis, immune response, and metabolic regulation. The high-quality genome assembly supports further studies of gene expression patterns under different environmental stressors and contributes to genetic improvement efforts for this economically important aquaculture species.
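The contig and scaffold N50 values quoted above are easiest to read with the definition in hand: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition; the function name `n50` and the toy lengths are illustrative assumptions, not code or data from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one sequence length is required")
    if any(length < 0 for length in ordered):
        raise ValueError("sequence lengths must be non-negative")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical toy assembly: five contigs totalling 28 bp.
print(n50([10, 8, 5, 3, 2]))  # 8
```

Applying the same function to contig lengths and then to scaffold lengths yields the two N50 figures; because scaffolds join contigs across gaps, scaffold N50 is normally at least as large as contig N50.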
Affiliation(s)
- Ruizhi Wang
- Ministry of Education, The Key Laboratory of Mariculture (Ocean University of China), Qingdao, 266100, China
- Na Song
- Ministry of Education, The Key Laboratory of Mariculture (Ocean University of China), Qingdao, 266100, China.
- Linlin Zhao
- Marine Ecology Research Center, Ministry of Natural Resources, First Institute of Oceanography, Qingdao, 266061, China.
4
Guo M, Bi G, Wang H, Ren H, Chen J, Lian Q, Wang X, Fang W, Zhang J, Dong Z, Pang Y, Zhang Q, Huang S, Yan J, Zhao X. Genomes of autotetraploid wild and cultivated Ziziphus mauritiana reveal polyploid evolution and crop domestication. Plant Physiol 2024; 196:2701-2720. [PMID: 39325737 DOI: 10.1093/plphys/kiae512] [Received: 04/29/2024] [Revised: 08/28/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024]
Abstract
Indian jujube (Ziziphus mauritiana) holds a prominent position in the global fruit and pharmaceutical markets. Here, we report the assemblies of haplotype-resolved, telomere-to-telomere genomes of autotetraploid wild and cultivated Indian jujube plants using a 2-stage assembly strategy. The generation of these genomes permitted in-depth investigations into the divergence and evolutionary history of this important fruit crop. Using a graph-based pan-genome constructed from 8 monoploid genomes, we identified structural variation (SV)-FST hotspots and SV hotspots. Gap-free genomes provide a means to obtain a global view of centromere structures. We identified presence-absence variation-related genes in 4 monoploid genomes (cI, cIII, wI, and wIII) and resequencing populations. We also present the population structure and domestication trajectory of the Indian jujube based on the resequencing of 73 wild and cultivated accessions. Metabolomic and transcriptomic analyses of mature fruits of wild and cultivated accessions unveiled the genetic basis underlying loss of fruit astringency during domestication of Indian jujube. This study reveals mechanisms underlying the divergence, evolution, and domestication of the autotetraploid Indian jujube and provides rich and reliable genetic resources for future research.
Affiliation(s)
- Mingxin Guo
- College of Life Sciences, Luoyang Normal University, Luoyang 471934, China
- Guiqi Bi
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Huan Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, and College of Plant Protection, South China Agricultural University, Guangzhou 510642, China
- Hui Ren
- Horticultural Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
- Jiaying Chen
- South Subtropical Crops Research Institute, Chinese Academy of Tropical Agricultural Sciences, Zhanjiang 524000, China
- Qun Lian
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Xiaomei Wang
- Horticultural Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
- Weikuan Fang
- Horticultural Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
- Jiangjiang Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Zhaonian Dong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yi Pang
- College of Life Sciences, Luoyang Normal University, Luoyang 471934, China
- Quanling Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Jianbin Yan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Xusheng Zhao
- College of Life Sciences, Luoyang Normal University, Luoyang 471934, China
5
Iyer SV, Goodwin S, McCombie WR. Leveraging the power of long reads for targeted sequencing. Genome Res 2024; 34:1701-1718. [PMID: 39567237 PMCID: PMC11610587 DOI: 10.1101/gr.279168.124] [Received: 03/15/2024] [Accepted: 10/01/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing technologies have improved the contiguity and, as a result, the quality of genome assemblies by generating reads long enough to span and resolve complex or repetitive regions of the genome. Several groups have shown the power of long reads in detecting thousands of genomic and epigenomic features that were previously missed by short-read sequencing approaches. While these studies demonstrate how long reads can help resolve repetitive and complex regions of the genome, they also highlight the throughput and coverage requirements needed to accurately resolve variant alleles across large populations using these platforms. At the time of this review, whole-genome long-read sequencing is more expensive than short-read sequencing on the highest-throughput short-read instruments; thus, achieving sufficient coverage to detect low-frequency variants (such as somatic variation) in heterogeneous samples remains challenging. Targeted sequencing, on the other hand, provides the depth necessary to detect these low-frequency variants in heterogeneous populations. Here, we review currently used and recently developed targeted sequencing strategies that leverage existing long-read technologies to increase the resolution with which we can look at nucleic acids in a variety of biological contexts.
Affiliation(s)
- Shruti V Iyer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
6
Xu B, Chen J, Song P, Gu H, Jiang F, Li B, Wei Q, Zhang T. A high-quality chromosome-level reference genome assembly of Tibetan antelope (Pantholops hodgsonii). Sci Data 2024; 11:1215. [PMID: 39532915 PMCID: PMC11557879 DOI: 10.1038/s41597-024-04089-z] [Received: 06/17/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024]
Abstract
Tibetan antelope (Pantholops hodgsonii), a wild ruminant endemic to the Qinghai-Tibetan Plateau (QTP) in China, has evolved a series of genetic and physiological adaptation strategies to thrive in the harsh plateau environment. However, genomic research on this species remains limited. Here, we established a high-quality chromosome-level reference genome assembly of the Tibetan antelope using PacBio HiFi, DNBSEQ, and Hi-C sequencing data. The assembly, totaling 3.13 Gb, consists of 31 chromosomes (29 + X + partial Y), with a scaffold N50 length of 92.23 Mb. The quality value (QV) and Benchmarking Universal Single-Copy Orthologs (BUSCO) score were 70.14 and 98.20%, respectively, indicating that the genome sequence is of high quality and completeness. This genome not only contributes to the genetic conservation of Tibetan antelope but also provides a valuable resource for genetic, ecological, and evolutionary research within the subfamily Caprinae.
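The QV of 70.14 reported above is a Phred-scaled consensus accuracy, QV = -10 log10(e), where e is the estimated per-base error rate; k-mer tools such as Merqury report QV this way, although the abstract does not state which tool was used. The Python sketch below converts between the two scales, assuming errors are spread uniformly across the 3.13 Gb assembly; the helper names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled QV."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    qv = 70.14            # QV reported in the abstract
    assembly_bp = 3.13e9  # assembly size reported in the abstract
    err = qv_to_error_rate(qv)
    # Assumption: errors are uniformly distributed across the assembly.
    print(f"error rate ~ {err:.2e} per base (about 1 error per {1 / err:,.0f} bp)")
    print(f"expected errors in {assembly_bp:.2e} bp ~ {err * assembly_bp:,.0f}")
```

Under that uniform-error assumption, a QV of about 70 corresponds to roughly one error per 10 million bases, or a few hundred errors genome-wide. Real errors are often concentrated in difficult regions such as repeats, so the figure is an average rather than a per-region guarantee.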
Affiliation(s)
- Bo Xu
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China
- Jiarui Chen
- College of Ecological and Environmental Engineering, Qinghai University 10743, Xining, 810016, Qinghai, China
- Pengfei Song
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China
- Haifeng Gu
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China
- Feng Jiang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China
- Bin Li
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China
- Qing Wei
- College of Ecological and Environmental Engineering, Qinghai University 10743, Xining, 810016, Qinghai, China.
- Tongzuo Zhang
- Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China.
- Qinghai Provincial Key Laboratory of Animal Ecological Genomics, Xining, 810008, Qinghai, China.
7
Tiwari VK, Saripalli G, Sharma PK, Poland J. Wheat genomics: genomes, pangenomes, and beyond. Trends Genet 2024; 40:982-992. [PMID: 39191555 DOI: 10.1016/j.tig.2024.07.004] [Received: 05/01/2024] [Revised: 07/18/2024] [Accepted: 07/19/2024] [Indexed: 08/29/2024]
Abstract
There is an urgent need to improve wheat for upcoming challenges, including biotic and abiotic stresses. Sustainable wheat improvement requires the introduction of new genes and alleles into high-yielding wheat cultivars. Using new approaches, tools, and technologies to identify and introduce new genes into wheat cultivars is critical. High-quality genomes, transcriptomes, and pangenomes provide essential resources and tools to examine wheat closely and to identify and manipulate new and targeted genes and alleles. Wheat genomics has advanced substantially in the past 5 years, generating multiple genomes, pangenomes, and transcriptomes. Leveraging these resources allows us to accelerate our crop improvement pipelines. This review summarizes the progress made in wheat genomics and trait discovery in the past 5 years.
Affiliation(s)
- Vijay K Tiwari
- Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA.
- Gautam Saripalli
- Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA; Department of Plant and Environmental Sciences, Pee Dee Research and Education Center, Clemson University, Florence, SC 29506, USA
- Parva K Sharma
- Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA
- Jesse Poland
- Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
8
Rodilla C, Núñez-Moreno G, Benitez Y, Romero R, Fernández-Caballero L, Mínguez P, Corton M, Ayuso C. Cas9-targeted-based long-read sequencing for genetic screening of RPE65 locus. Front Genet 2024; 15:1439153. [PMID: 39469149 PMCID: PMC11513366 DOI: 10.3389/fgene.2024.1439153] [Received: 05/27/2024] [Accepted: 08/06/2024] [Indexed: 10/30/2024]
Abstract
Introduction: Long-read sequencing (LRS) enables accurate structural variant detection and variant phasing. When a molecular diagnosis is suspected, target enrichment can reduce the cost and duration of sequencing.
Methods: LRS was conducted in five inherited retinal dystrophy (IRD) patients harboring a monoallelic variant in RPE65 that remained uncharacterized after clinical exome sequencing (CES). CRISPR-Cas9 guide RNA probes were designed to target a 31 kb region, including the entire RPE65 locus. The DNA was sequenced on a MinION platform. Short-read 30× whole-genome sequencing (WGS) was performed for the five patients to validate the nanopore results.
Results: The nanopore sequencing process yielded a median of 271 reads within the targeted region, with a mean depth of 109 and a median read size of 8 kb. All variants identified by CES were detected using this approach, and no additional causative RPE65 variants were found. Nanopore variant detection performed comparably to short-read WGS at similar coverage levels, although it produced more false positive calls at lower coverage.
Discussion: In this study, we explore the advantages of using a targeted approach together with long-read sequencing to identify variants associated with IRD. The results underscore the utility of targeted long reads for characterizing patients affected by rare diseases when first-tier diagnostic tests are inconclusive.
Affiliation(s)
- Cristina Rodilla
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Gonzalo Núñez-Moreno
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Yolanda Benitez
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Raquel Romero
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Lidia Fernández-Caballero
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Pablo Mínguez
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Bioinformatics Unit, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Marta Corton
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Carmen Ayuso
- Department of Genetics and Genomics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
9
Wang R, Zheng Y, Zhang Z, Song K, Wu E, Zhu X, Wu TP, Ding J. MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell. Nat Commun 2024; 15:8798. [PMID: 39394211 PMCID: PMC11470080 DOI: 10.1038/s41467-024-53114-7] [Received: 09/25/2023] [Accepted: 09/24/2024] [Indexed: 10/13/2024]
Abstract
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either 'best-mapped' or 'random-mapped' locations and categorize them at the subfamily level, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
Affiliation(s)
- Ruohan Wang
- School of Computer Science, McGill University, Montreal, Quebec, Canada
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Yumin Zheng
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada
- Zijian Zhang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Kailu Song
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada
- Erxi Wu
- Department of Neurosurgery, Baylor College of Medicine, Temple, TX, USA
- College of Medicine and Irma Lerma Rangel College of Pharmacy, Texas A&M University, College Station, TX, USA
- LIVESTRONG Cancer Institutes and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX, USA
- Tao P Wu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Jun Ding
- School of Computer Science, McGill University, Montreal, Quebec, Canada.
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada.
- Department of Medicine, McGill University, Montreal, Quebec, Canada.
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada.
- Mila-Quebec AI Institute, Montreal, Quebec, Canada.
10
Sellinger T, Johannes F, Tellier A. Improved inference of population histories by integrating genomic and epigenomic data. eLife 2024; 12:RP89470. [PMID: 39264367 PMCID: PMC11392530 DOI: 10.7554/elife.89470] [Indexed: 09/13/2024]
Abstract
With the availability of high-quality full genome polymorphism (SNP) data, it becomes feasible to study the past demographic and selective history of populations in exquisite detail. However, such inferences still suffer from a lack of statistical resolution for recent events (for example, bottlenecks) and/or for populations with low nucleotide diversity. Additional heritable (epi)genetic markers, such as indels, transposable elements, microsatellites, or cytosine methylation, may provide further, yet untapped, information on recent population history. We extend the Sequential Markovian Coalescent (SMC) framework to jointly use SNPs and other hyper-mutable markers. We are able to (1) improve the accuracy of demographic inference in recent times, (2) uncover past demographic events hidden to SNP-based inference methods, and (3) infer the hyper-mutable marker mutation rates under a finite site model. As a proof of principle, we focus on demographic inference in Arabidopsis thaliana using DNA methylation diversity data from 10 European natural accessions. We demonstrate that segregating single methylated polymorphisms (SMPs) satisfy the modeling assumptions of the SMC framework, while differentially methylated regions (DMRs) are not suitable because their length exceeds the genomic distance between two recombination events. Combining SNPs and SMPs while accounting for site- and region-level epimutation processes, we provide new estimates of the glacial age bottleneck and post-glacial population expansion of the European A. thaliana population. Our SMC framework readily accounts for a wide range of heritable genomic markers, thus paving the way for next-generation inference of evolutionary history by combining information from several genetic and epigenetic markers.
Affiliation(s)
- Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
- Department of Environment and Biodiversity, Paris Lodron University of Salzburg, Salzburg, Austria
- Frank Johannes
- Professorship for Plant Epigenomics, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Aurélien Tellier
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Munich, Germany
11
Zeng Z, Zhang Z, Tso N, Zhang S, Chen Y, Shu Q, Li J, Liang Z, Wang R, Wang J, Qiong L. Complete mitochondrial genome of Hippophae tibetana: insights into adaptation to high-altitude environments. FRONTIERS IN PLANT SCIENCE 2024; 15:1449606. [PMID: 39170791 PMCID: PMC11335646 DOI: 10.3389/fpls.2024.1449606] [Received: 06/15/2024] [Accepted: 07/17/2024] [Indexed: 08/23/2024]
Abstract
Hippophae tibetana, a member of the Elaeagnaceae family, is endemic to the Qinghai-Tibet Plateau and is valued for its remarkable ecological restoration capabilities as well as its medicinal and edible properties. Although it is acknowledged as a useful species, mitochondrial genome data for it, and for other species of the Elaeagnaceae family, have been lacking to date. In this study, we assembled the mitochondrial genome of H. tibetana for the first time; it is 464,208 bp long and comprises 31 tRNA genes, 3 rRNA genes, 37 protein-coding genes, and 3 pseudogenes. Analysis of the genome revealed a high copy number of the trnM-CAT gene and a high prevalence of repetitive sequences, both of which likely contribute to genome rearrangement and adaptive evolution. Through nucleotide diversity and codon usage bias analyses, we identified specific genes that are crucial for adaptation to high-altitude conditions. Notably, genes such as atp6, ccmB, nad4L, and nad7 exhibited signs of positive selection, indicating unique adaptive traits for survival in extreme environments. Phylogenetic analysis confirmed the close relationship between the Elaeagnaceae family and other related families, whereas intergenomic sequence transfer analysis revealed a substantial presence of homologous fragments among the mitochondrial, chloroplast, and whole genomes, which may be linked to the high-altitude adaptation mechanisms of H. tibetana. These findings not only enrich our knowledge of H. tibetana molecular biology but also advance our understanding of the adaptive evolution of plants on the Qinghai-Tibet Plateau. This study provides a solid scientific foundation for the molecular breeding, conservation, and utilization of H. tibetana genetic resources.
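The abstract refers to codon usage bias analyses without naming the metric. One widely used measure is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a single in-frame coding sequence; the function name `rscu` is hypothetical and is not taken from the study.

```python
from collections import Counter

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed codon count / (family total / number of codons in family).
    Values > 1 mean a codon is used more often than under uniform usage.
    """
    seq = cds.upper()
    # Count complete in-frame codons; a trailing partial codon is ignored,
    # and codons containing non-ACGT characters never match CODE.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group codons into synonymous families (stop codons form their own family).
    families: dict[str, list[str]] = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(synonyms)
        for c in synonyms:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Hypothetical toy CDS: Met, four Leu codons, stop.
    print(rscu("ATGCTGCTGCTGCTATAA"))
```

Within each synonymous family the RSCU values sum to the family size, so single-codon amino acids such as Met always score 1.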
Affiliation(s)
- Zhefei Zeng
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Yani Observation and Research Station for Wetland Ecosystem of the Tibet (Xizang) Autonomous Region, Tibet University, Lhasa, China
- Zhengyan Zhang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, School of Life Sciences, Institute of Biodiversity Science, Fudan University, Shanghai, China
- Norzin Tso
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Shutong Zhang
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Yan Chen
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Qi Shu
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Junru Li
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Ziyi Liang
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Ruoqiu Wang
- Tech X Academy, Shenzhen Polytechnic University, Shenzhen, China
- Junwei Wang
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Yani Observation and Research Station for Wetland Ecosystem of the Tibet (Xizang) Autonomous Region, Tibet University, Lhasa, China
- La Qiong
- Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Tibet University, Lhasa, China
- Yani Observation and Research Station for Wetland Ecosystem of the Tibet (Xizang) Autonomous Region, Tibet University, Lhasa, China
12
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Hwang JP, Bhamidipati SV, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. CELL GENOMICS 2024; 4:100590. [PMID: 38908378 PMCID: PMC11293582 DOI: 10.1016/j.xgen.2024.100590] [Received: 11/13/2023] [Revised: 12/27/2023] [Accepted: 05/31/2024] [Indexed: 06/24/2024]
Abstract
The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH), in whom we found evidence for all 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (Strand-seq), we resolved the haplotype structure in 18 samples. In 4 samples, the template-switching point was mapped to a segment of ∼2.2-5.5 kb sharing 100% nucleotide identity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as substrates for recombination. This type of CGR can result in multiple conformers that generate diverse SV haplotypes at susceptible dosage-sensitive loci.
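In an inverted repeat pair, the second copy reads as the reverse complement of the first, so a stretch of 100% identity between the arms appears as an exact common substring between one arm and the reverse complement of the other. The Python sketch below illustrates that idea with a standard dynamic-programming longest-common-substring search on hypothetical toy sequences; the function names are invented for illustration, and this is not the authors' breakpoint-mapping method.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_identical_stretch(a: str, b: str) -> tuple[int, int]:
    """Longest exact common substring of a and b, O(len(a) * len(b)).

    Returns (length, end) such that the stretch is a[end - length:end].
    N bases never count as matches.
    """
    a, b = a.upper(), b.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and a[i - 1] != "N":
                # Extend the match ending at a[i-2], b[j-2] by one base.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, best_end


if __name__ == "__main__":
    # Hypothetical toy arms of an inverted repeat pair.
    left_arm = "TTTACGTACGGA"
    right_arm = "AACCGTACGTGG"  # contains the reverse complement of ACGTACGG
    length, end = longest_identical_stretch(left_arm, reverse_complement(right_arm))
    print(length, left_arm[end - length:end])
```

The quadratic search is only practical for short toy inputs; repeat arms several kilobases long would normally be compared with a dedicated aligner.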
Affiliation(s)
- Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Mira Gandhi
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
- Ming Yin Lun
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
- KyungHee Park
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eva Benito
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Medhat Mahmoud
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- James Paul Hwang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Sravya V Bhamidipati
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Sissel Juul
- Oxford Nanopore Technologies, New York, NY 10013, USA
- Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Fritz J Sedlazeck
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Computer Science, Rice University, Houston, TX 77030, USA
- Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA
- James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA
13
Luan T, Commichaux S, Hoffmann M, Jayeola V, Jang JH, Pop M, Rand H, Luo Y. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates. BMC Genomics 2024; 25:679. [PMID: 38978005 PMCID: PMC11232133 DOI: 10.1186/s12864-024-10582-x] [Received: 02/29/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even low levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain whether they achieve the accuracy needed for high-resolution source tracking of foodborne illness outbreaks. RESULTS We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy in reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999%, or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low-confidence regions) was obtained only by pipelines that combined long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best-performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered: using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS Short reads are still needed to correct errors in nanopore-sequenced assemblies to obtain the accuracy required for source-tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
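The accuracy figures in this abstract translate directly into expected error counts, since errors ≈ genome length × (1 − accuracy). The small Python sketch below makes that arithmetic explicit for the 4.8 Mbp genome cited above; the helper name is hypothetical, and the calculation ignores the low-confidence regions the authors excluded.

```python
GENOME_LENGTH_BP = 4_800_000  # ~4.8 Mbp S. enterica genome, as stated in the abstract


def expected_errors(accuracy: float, genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Expected number of erroneous bases at a given per-base accuracy (0-1)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return genome_length_bp * (1.0 - accuracy)


if __name__ == "__main__":
    for acc in (0.9995, 0.999999):
        print(f"{acc * 100:g}% accuracy -> ~{expected_errors(acc):,.0f} errors")
```

At 99.95% this gives about 2,400 errors, versus about 5 at 99.9999%, a roughly 500-fold difference for the same genome.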
Affiliation(s)
- Tu Luan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Seth Commichaux
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA
- Maria Hoffmann
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Victor Jayeola
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Jae Hee Jang
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Mihai Pop
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Hugh Rand
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Yan Luo
- Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
14
Stanojević D, Li Z, Bakić S, Foo R, Šikić M. Rockfish: A transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing. Nat Commun 2024; 15:5580. [PMID: 38961062 PMCID: PMC11222435 DOI: 10.1038/s41467-024-49847-0] [Received: 09/24/2023] [Accepted: 06/19/2024] [Indexed: 07/05/2024]
Abstract
DNA methylation plays an important role in various biological processes, including cell differentiation, ageing, and cancer development. The most important methylation in mammals is 5-methylcytosine, which occurs mostly in the context of CpG dinucleotides. Sequencing methods such as whole-genome bisulfite sequencing successfully detect 5-methylcytosine DNA modifications. However, they suffer from the serious drawbacks of short read lengths and possible amplification bias. Here we present Rockfish, a deep learning algorithm that significantly improves read-level 5-methylcytosine detection using Nanopore sequencing. Rockfish is compared with other Nanopore-based methods on R9.4.1 and R10.4.1 datasets. Single-base accuracy and the F1 measure increase by up to 5 percentage points on R9.4.1 datasets and by up to 0.82 percentage points on R10.4.1 datasets. Moreover, Rockfish shows a high correlation with whole-genome bisulfite sequencing, requires lower read depth, and achieves higher confidence in biologically important regions such as CpG-rich promoters while remaining computationally efficient. Its superior performance on human and mouse samples highlights its versatility for studying 5-methylcytosine methylation across varied organisms and diseases. Finally, its adaptable architecture ensures compatibility with new pore versions, chemistries, and modification types.
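Because the abstract reports gains in accuracy and the F1 measure in percentage points, it may help to recall how both are computed from a per-site confusion matrix of methylation calls. The Python sketch below uses hypothetical counts and function names; it shows the standard definitions, not Rockfish's own evaluation code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 (the harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all calls (methylated and unmethylated) that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def percentage_point_gain(before: float, after: float) -> float:
    """Absolute difference between two fractions, in percentage points."""
    return (after - before) * 100


if __name__ == "__main__":
    # Hypothetical counts: tp = methylated CpGs called methylated, and so on.
    tp, tn, fp, fn = 900, 850, 100, 150
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} "
          f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
    print(f"F1 0.85 -> 0.90: {percentage_point_gain(0.85, 0.90):.1f} percentage points")
```

A gain from 0.85 to 0.90 is 5 percentage points but only about a 5.9% relative improvement, so the two units should not be conflated.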
Affiliation(s)
- Dominik Stanojević
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Zhe Li
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Sara Bakić
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
- Roger Foo
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Mile Šikić
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
15
Liu B, Shen CC, Xia SW, Song SS, Su LH, Li Y, Hao Q, Liu YJ, Guan DL, Wang N, Wang WJ, Zhao X, Li HX, Li XX, Lai YS. A nanopore-based cucumber genome assembly reveals structural variations at two QTLs controlling hypocotyl elongation. PLANT PHYSIOLOGY 2024; 195:970-985. [PMID: 38478469 DOI: 10.1093/plphys/kiae153] [Received: 01/05/2024] [Accepted: 02/06/2024] [Indexed: 06/02/2024]
Abstract
The Xishuangbanna (XIS) cucumber (Cucumis sativus var. xishuangbannanesis) is a semiwild variety with many distinct agronomic traits. Here, long reads generated by Nanopore sequencing helped assemble a high-quality genome (contig N50 = 8.7 Mb) of the landrace XIS49. A total of 10,036 structural/sequence variations (SVs) were identified in comparison with Chinese Long (CL), and known SVs controlling spines, tubercles, and carpel number were confirmed in the XIS49 genome. Two QTLs for hypocotyl elongation under low light, SH3.1 and SH6.1, were fine-mapped using introgression lines (donor parent, XIS49; recurrent parent, CL). SH3.1 encodes the red-light receptor Phytochrome B (PhyB, CsaV3_3G015190). A ∼4 kb region containing a large deletion and highly divergent regions (HDRs) was identified in the promoter of the PhyB gene in XIS49. Loss of function of this PhyB caused a super-long hypocotyl phenotype. SH6.1 encodes a CCCH-type zinc finger protein, FRIGIDA-ESSENTIAL LIKE (FEL, CsaV3_6G050300). FEL negatively regulated hypocotyl elongation but was transcriptionally suppressed by a long terminal repeat retrotransposon insertion in CL cucumber. Mechanistically, FEL physically binds to the promoter of CONSTITUTIVE PHOTOMORPHOGENIC 1a (COP1a), regulating the expression of COP1a and downstream hypocotyl elongation. These results reveal the genetic mechanism of cucumber hypocotyl elongation under low light.
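Contig N50, quoted above as 8.7 Mb, is the standard contiguity summary: the largest length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name and the toy lengths are hypothetical, not values from the XIS49 assembly.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    cumulative = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        cumulative += length
        if cumulative >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half")


if __name__ == "__main__":
    # Hypothetical contig lengths in bp.
    print(n50([12_100_000, 9_400_000, 8_700_000, 5_200_000, 3_300_000]))
```

N50 rewards long contigs but says nothing about their correctness, which is why it is usually reported alongside accuracy or completeness measures.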
Affiliation(s)
- Bin Liu
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Hami-melon Research Center, Xinjiang Academy of Agricultural Sciences, 830091 Urumqi, China
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Cheng-Cheng Shen
- College of Horticulture, Shanxi Agricultural University, 030801 Jinzhong, China
- Shi-Wei Xia
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Shan-Shan Song
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Li-Hong Su
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Yu Li
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Qian Hao
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Yan-Jun Liu
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Dai-Lu Guan
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Ning Wang
- College of Horticulture, Shanxi Agricultural University, 030801 Jinzhong, China
- Wen-Jiao Wang
- College of Horticulture, Shanxi Agricultural University, 030801 Jinzhong, China
- Xiang Zhao
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Huan-Xiu Li
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
- Xi-Xiang Li
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, 100080 Beijing, China
- Yun-Song Lai
- College of Horticulture, Sichuan Agricultural University, 611130 Chengdu, China
16
Hu J, Wang Z, Liang F, Liu SL, Ye K, Wang DP. NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzad009. [PMID: 38862426 DOI: 10.1093/gpbjnl/qzad009] [Received: 05/10/2023] [Revised: 10/14/2023] [Accepted: 10/31/2023] [Indexed: 06/13/2024]
Abstract
The high-fidelity (HiFi) long-read sequencing technology developed by PacBio has greatly improved the base-level accuracy of genome assemblies. However, these assemblies still contain base-level errors, particularly within the error-prone regions of HiFi long reads. Existing genome polishing tools usually introduce overcorrections and haplotype switch errors when correcting errors in genomes assembled from HiFi long reads. Here, we describe an upgraded genome polishing tool, NextPolish2, which can fix the base errors remaining in these "highly accurate" HiFi-based assemblies without introducing excessive overcorrections or haplotype switch errors. We believe that NextPolish2 will be valuable for further improving the accuracy of telomere-to-telomere (T2T) genomes. NextPolish2 is freely available at https://github.com/Nextomics/NextPolish2.
Affiliation(s)
- Jiang Hu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- GrandOmics Biosciences, Beijing 102206, China
- Zhuo Wang
- GrandOmics Biosciences, Beijing 102206, China
- Fan Liang
- GrandOmics Biosciences, Beijing 102206, China
- Shan-Lin Liu
- Department of Entomology, College of Plant Protection, China Agricultural University, Beijing 100193, China
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
17
Mascher M, Marone MP, Schreiber M, Stein N. Are cereal grasses a single genetic system? NATURE PLANTS 2024; 10:719-731. [PMID: 38605239 PMCID: PMC7616769 DOI: 10.1038/s41477-024-01674-3] [Received: 03/27/2023] [Accepted: 03/17/2024] [Indexed: 04/13/2024]
Abstract
In 1993, a passionate and provocative call to arms urged cereal researchers to consider the taxon they study as a single genetic system and collaborate with each other. Since then, that group of scientists has seen their discipline blossom. In an attempt to understand what unity of genetic systems means and how the notion was borne out by later research, we survey the progress and prospects of cereal genomics: sequence assemblies, population-scale sequencing, resistance gene cloning and domestication genetics. Gene order may not be as extraordinarily well conserved in the grasses as once thought. Still, several recurring themes have emerged. The same ancestral molecular pathways defining plant architecture have been co-opted in the evolution of different cereal crops. Such genetic convergence as much as cross-fertilization of ideas between cereal geneticists has led to a rich harvest of genes that, it is hoped, will lead to improved varieties.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Marina Püpke Marone
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
- Mona Schreiber
- University of Marburg, Department of Biology, Marburg, Germany
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
18
Darian JC, Kundu R, Rajaby R, Sung WK. Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly. Nat Methods 2024; 21:574-583. [PMID: 38459383 DOI: 10.1038/s41592-023-02141-1] [Received: 09/08/2022] [Accepted: 11/30/2023] [Indexed: 03/10/2024]
Abstract
Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have relatively high error rates. Although existing genome polishers can enhance their quality, the remaining errors (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short uses Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and few switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and, optionally, Hi-C reads. Notably, with additional manual steps, our solution even allows the production of telomere-to-telomere diploid genomes. As a proof of concept, we assembled a fully phased telomere-to-telomere diploid genome of HG00733 with a quality value exceeding 50.
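A quality value (QV) is a Phred-scaled error rate, QV = −10 log10(error rate), so QV 50 corresponds to one error per 100,000 bases, or about 10 errors per megabase. The Python sketch below converts in both directions; the function names are hypothetical, and it assumes the reported QV is a genome-wide per-base estimate, which the abstract does not spell out.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    if error_rate <= 0:
        return math.inf  # no observed errors: QV is unbounded
    return -10 * math.log10(error_rate)


def errors_per_megabase(qv: float) -> float:
    """Expected number of erroneous bases per 1,000,000 bases at this QV."""
    return qv_to_error_rate(qv) * 1_000_000


if __name__ == "__main__":
    for qv in (30, 40, 50, 60):
        print(f"QV {qv}: error rate {qv_to_error_rate(qv):.0e}, "
              f"~{errors_per_megabase(qv):g} errors per Mb")
```

Each additional 10 QV units corresponds to a tenfold reduction in the error rate.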
Affiliation(s)
- Ritu Kundu
- School of Computing, National University of Singapore, Singapore, Singapore
- Wing-Kin Sung
- School of Computing, National University of Singapore, Singapore, Singapore
- Genome Institute of Singapore, Singapore, Singapore
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
- JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Hong Kong Genome Institute, Hong Kong, China
19
Xie L, Gong X, Yang K, Huang Y, Zhang S, Shen L, Sun Y, Wu D, Ye C, Zhu QH, Fan L. Technology-enabled great leap in deciphering plant genomes. NATURE PLANTS 2024; 10:551-566. [PMID: 38509222 DOI: 10.1038/s41477-024-01655-6] [Received: 09/22/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Plant genomes are essential resources for studying many aspects of plant biology and for applications such as breeding. From 2000 to 2020, 1,144 genomes of 782 plant species were sequenced. In the past three years (2021-2023), 2,373 genomes of 1,031 plant species, including 793 newly sequenced species, have been assembled, representing a great leap. These 2,373 newly assembled genomes, of which 63 are telomere-to-telomere assemblies and 921 were generated in pan-genome projects, cover the major phylogenetic clades. Substantial advances in read length, throughput, accuracy and cost-effectiveness have notably simplified the achievement of high-quality assemblies. Moreover, the development of multiple software tools using different algorithms makes it possible to generate more complete and complex assemblies. A database named N3: plants, genomes, technologies has been developed to accommodate the metadata associated with the 3,517 genomes that have been sequenced from 1,575 plant species since 2000. We also provide an outlook on emerging opportunities in plant genome sequencing.
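The cumulative totals quoted for the N3 database are consistent with the two periods reported above if each species is counted once: genome counts add directly, whereas only the 793 newly sequenced species from 2021-2023 add to the species count. On that reading, the remaining species assembled in 2021-2023 would already have had earlier assemblies.

$$
\begin{aligned}
1{,}144 + 2{,}373 &= 3{,}517 \ \text{genomes (2000-2023)} \\
782 + 793 &= 1{,}575 \ \text{species (2000-2023)} \\
1{,}031 - 793 &= 238 \ \text{species assembled in 2021-2023 that were not new}
\end{aligned}
$$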
Affiliation(s)
- Lingjuan Xie
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Xiaojiao Gong
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Kun Yang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Yujie Huang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Shiyu Zhang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Leti Shen
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Yanqing Sun
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Dongya Wu
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Chuyu Ye
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Qian-Hao Zhu
- CSIRO Agriculture and Food, Black Mountain Laboratories, Canberra, Australia
- Longjiang Fan
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
20
Le MH, Morgan B, Lu MY, Moctezuma V, Burgos O, Huang JP. The genomes of Hercules beetles reveal putative adaptive loci and distinct demographic histories in pristine North American forests. Mol Ecol Resour 2024; 24:e13908. [PMID: 38063363 DOI: 10.1111/1755-0998.13908] [Received: 09/24/2022] [Revised: 01/14/2023] [Accepted: 11/20/2023] [Indexed: 01/12/2024]
Abstract
Beetles, despite their remarkable biodiversity and long history of research, still lack reference genomes annotated with structural variations in loci of adaptive significance. We sequenced and assembled high-quality chromosome-level genomes of four Hercules beetles that differ in male horn size and shape and in body colouration. The four Hercules beetle genomes were assembled into 11 pseudo-chromosomes; the three genomes assembled using Nanopore data (Dynastes grantii, D. hyllus and D. tityus) were mapped to the genome assembled using PacBio + Hi-C data (D. maya). We demonstrated a striking similarity in genome structure among the four species. This conserved genome structure may be attributable to our use of the D. maya assembly as the reference; however, such conservation of genome structure is a recurring phenomenon among scarab beetles. We further identified homologues of nine and three candidate-gene families that may be associated with the evolution of horn structure and body colouration, respectively. Structural variations in Scr and Ebony2 were detected, and their putative roles in generating morphological diversity in beetles are discussed. We also reconstructed the demographic histories of the four Hercules beetles using heterozygosity information from the diploid genomes. We found that the demographic histories of the beetles closely recapitulated historical changes in suitable forest habitats driven by climate shifts.
Affiliation(s)
- My-Hanh Le
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Brett Morgan
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Smithsonian Environmental Research Center, Edgewater, Maryland, USA
- Mei-Yeh Lu
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Victor Moctezuma
- Centro Tlaxcala de Biología de la Conducta, Universidad Autónoma de Tlaxcala, Tlaxcala de Xicohténcatl, Tlaxcala, Mexico
- Oscar Burgos
- Centro de Investigaciones Biológicas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
- Jen-Pan Huang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
21
Scalabrin S, Magris G, Liva M, Vitulo N, Vidotto M, Scaglione D, Del Terra L, Ruosi MR, Navarini L, Pellegrino G, Berny Mier Y Teran JC, Toniutti L, Suggi Liverani F, Cerutti M, Di Gaspero G, Morgante M. A chromosome-scale assembly reveals chromosomal aberrations and exchanges generating genetic diversity in Coffea arabica germplasm. Nat Commun 2024; 15:463. [PMID: 38263403 PMCID: PMC10805892 DOI: 10.1038/s41467-023-44449-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Accepted: 12/13/2023] [Indexed: 01/25/2024]
Abstract
In order to better understand the mechanisms generating genetic diversity in the recent allotetraploid species Coffea arabica, here we present a chromosome-level assembly obtained with long read technology. Two genomic compartments with different structural and functional properties are identified in the two homoeologous genomes. The resequencing data from a large set of accessions reveals low intraspecific diversity in the center of origin of the species. Across a limited number of genomic regions, diversity increases in some cultivated genotypes to levels similar to those observed within one of the progenitor species, Coffea canephora, presumably as a consequence of introgressions deriving from the so-called Timor hybrid. It also reveals that, in addition to few, early-occurring exchanges between homoeologous chromosomes, there are numerous recent chromosomal aberrations including aneuploidies, deletions, duplications and exchanges. These events are still polymorphic in the germplasm and could represent a fundamental source of genetic variation in such a lowly variable species.
Affiliation(s)
- Gabriele Magris
- Istituto di Genomica Applicata, 33100, Udine, Italy
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, 33100, Udine, Italy
- Mario Liva
- IGA Technology Services, 33100, Udine, Italy
- Istituto di Genomica Applicata, 33100, Udine, Italy
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, 33100, Udine, Italy
- Nicola Vitulo
- Department of Biotechnology, University of Verona, 37134, Verona, Italy
- Lucile Toniutti
- World Coffee Research, Portland, 97225, OR, USA
- CIRAD, UMR AGAP Institut, 97130, Capesterre-Belle-Eau, Guadeloupe, France
- UMR AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro, 34060, Montpellier, France
- Michele Morgante
- Istituto di Genomica Applicata, 33100, Udine, Italy
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, 33100, Udine, Italy
22
Goldberg JK, Olcerst A, McKibben M, Hare JD, Barker MS, Bronstein JL. A de novo long-read genome assembly of the sacred datura plant (Datura wrightii) reveals a role of tandem gene duplications in the evolution of herbivore-defense response. BMC Genomics 2024; 25:15. [PMID: 38166627 PMCID: PMC10759348 DOI: 10.1186/s12864-023-09894-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 12/11/2023] [Indexed: 01/05/2024]
Abstract
The sacred datura plant (Solanales: Solanaceae: Datura wrightii) has been used to study plant-herbivore interactions for decades. The wealth of information that has resulted gives it potential as a model system for studying the ecological and evolutionary genomics of these interactions. We present a de novo Datura wrightii genome assembled using PacBio HiFi long reads. Our assembly is highly complete and contiguous (N50 = 179 Mb, BUSCO Complete = 97.6%). We successfully detected a previously documented ancient whole genome duplication using our assembly and have classified the gene duplication history that generated its coding sequence content. We use it as the basis for a genome-guided differential expression analysis to identify the induced responses of this plant to one of its specialized herbivores (Coleoptera: Chrysomelidae: Lema daturaphila). We find over 3000 differentially expressed genes associated with herbivory and that elevated expression levels of over 200 genes last for several days. We also combined our analyses to determine the role that different gene duplication categories have played in the evolution of Datura-herbivore interactions. We find that tandem duplications have expanded multiple functional groups of herbivore-responsive genes with defensive functions, including UGT-glycosyltransferases, oxidoreductase enzymes, and peptidase inhibitors. Overall, our results expand our knowledge of herbivore-induced plant transcriptional responses and the evolutionary history of the underlying herbivore-response genes.
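The N50 quoted above (179 Mb) is a standard contiguity statistic: the length L such that contigs or scaffolds of length L or longer together cover at least half of the total assembly length. A minimal Python sketch, assuming only a list of sequence lengths as input (the function name `n50` is illustrative and not taken from the paper):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted longest first and their lengths accumulated;
    N50 is the length at which the running sum first reaches half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues on odd totals.
        if 2 * running >= total:
            return length


# Toy example: total = 280, half = 140; 100 + 80 = 180 >= 140, so N50 = 80.
print(n50([100, 80, 50, 30, 20]))  # 80
```

Because N50 depends only on the length distribution, it says nothing about whether the sequence is correct or complete, which is why it is usually reported alongside a gene-content measure such as BUSCO completeness.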
Affiliation(s)
- Jay K Goldberg
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Aaron Olcerst
- Department of Entomology, University of California Riverside, Riverside, CA, USA
- Michael McKibben
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- J Daniel Hare
- Department of Entomology, University of California Riverside, Riverside, CA, USA
- Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Judith L Bronstein
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
23
Weng YM, Shashank PR, Godfrey RK, Plotkin D, Parker BM, Wist T, Kawahara AY. Evolutionary genomics of three agricultural pest moths reveals rapid evolution of host adaptation and immune-related genes. Gigascience 2024; 13:giad103. [PMID: 38165153 PMCID: PMC10759296 DOI: 10.1093/gigascience/giad103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 08/01/2023] [Accepted: 11/15/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Understanding the genotype of pest species provides an important baseline for designing integrated pest management (IPM) strategies. Recently developed long-read sequencing technologies make it possible to compare genomic features of nonmodel pest species and reveal the evolutionary paths underlying their pest profiles. Here we sequenced and assembled genomes for 3 agricultural pest gelechiid moths: Phthorimaea absoluta (tomato leafminer), Keiferia lycopersicella (tomato pinworm), and Scrobipalpa atriplicella (goosefoot groundling moth). We also compared the genomes of tomato leafminer and tomato pinworm with published genomes of Phthorimaea operculella and Pectinophora gossypiella to investigate gene family evolution related to the pest species profiles. RESULTS We found that the 3 solanaceous feeding species, P. absoluta, K. lycopersicella, and P. operculella, cluster together. Gene family evolution analyses with the 4 species show clear gene family expansions in host plant-associated genes for the 3 solanaceous feeding species. These genes are involved in host compound sensing (e.g., gustatory receptors), detoxification (e.g., ABC transporter C family, cytochrome P450, glucose-methanol-choline oxidoreductase, insect cuticle proteins, and UDP-glucuronosyl), and digestion (e.g., serine proteases and peptidase family S1). A gene ontology enrichment analysis of rapidly evolving genes also suggests enriched functions in host sensing and immunity. CONCLUSIONS The results of our gene family evolution analyses indicate that host plant adaptation and pathogen defense could be important drivers of species diversification among gelechiid moths.
Affiliation(s)
- Yi-Ming Weng
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Pathour R Shashank
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Division of Entomology, ICAR-Indian Agricultural Research Institute, Pusa, New Delhi 110012, India
- R Keating Godfrey
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- David Plotkin
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Brandon M Parker
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Tyler Wist
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Akito Y Kawahara
- McGuire Center for Lepidoptera & Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
24
Huang H, Zou H, Lin H, Dai Y, Lin J. Molecular insights into the mechanisms of a leaf color mutant in Anoectochilus roxburghii by gene mapping and transcriptome profiling based on PacBio Sequel II. Sci Rep 2023; 13:22751. [PMID: 38123722 PMCID: PMC10733416 DOI: 10.1038/s41598-023-50352-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 12/19/2023] [Indexed: 12/23/2023]
Abstract
Plants with partial or complete loss of chlorophylls and other pigments occur in nature but are uncommon. In the present study, we characterize a leaf color mutant, 'arly01', with an albino stripe in the middle of the leaf, which is an uncommon ornamental trait in Anoectochilus roxburghii. The albino "mutant" middle portion and the green "normal" leaf parts were observed by transmission electron microscopy (TEM), and their pigment contents were determined. The mutant portion exhibited underdeveloped plastids and had reduced chlorophyll and other pigment (carotenoid, anthocyanin, and flavonoid) content compared to the normal portion. Meanwhile, comparative transcriptome analysis and metabolic pathway mapping showed that a total of 599 differentially expressed genes were mapped to 78 KEGG pathways, most of which were down-regulated in the mutant portion. The five most affected metabolic pathways were oxidative phosphorylation, photosynthesis system, carbon fixation & starch and sucrose metabolism, porphyrin and chlorophyll metabolism, and flavonoid biosynthesis. Our findings suggest that the mutant 'arly01' exhibits partial albinism of A. roxburghii, characterized by underdeveloped chloroplasts, low contents of photosynthetic and other color pigments, and a number of down-regulated genes and metabolites. With the emergence of ornamental A. roxburghii in southern China, 'arly01' could become a popular cultivar due to its unique aesthetics.
Affiliation(s)
- Huiming Huang
- Institute of Subtropical Agriculture, Fujian Academy of Agricultural Sciences, 1499 Jiulong Avenue, Zhangzhou, 363005, Fujian, China
- Hui Zou
- Institute of Subtropical Agriculture, Fujian Academy of Agricultural Sciences, 1499 Jiulong Avenue, Zhangzhou, 363005, Fujian, China
- Hongting Lin
- Zhangzhou Fourth Municipal Hospital of Fujian Province, 41 Baiyun Village, Zhangzhou, 363100, Fujian, China
- Yimin Dai
- Institute of Subtropical Agriculture, Fujian Academy of Agricultural Sciences, 1499 Jiulong Avenue, Zhangzhou, 363005, Fujian, China
- Jiangbo Lin
- Institute of Subtropical Agriculture, Fujian Academy of Agricultural Sciences, 1499 Jiulong Avenue, Zhangzhou, 363005, Fujian, China
25
Yu SY, Xi YL, Xu FQ, Zhang J, Liu YS. Application of long read sequencing in rare diseases: The longer, the better? Eur J Med Genet 2023; 66:104871. [PMID: 38832911 DOI: 10.1016/j.ejmg.2023.104871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 10/11/2023] [Accepted: 10/22/2023] [Indexed: 06/06/2024]
Abstract
Rare diseases encompass a diverse group of genetic disorders that each affect a small proportion of the population. Identifying the underlying genetic causes of these conditions is challenging because of their genetic heterogeneity and complexity. Conventional short-read sequencing (SRS) techniques have been widely used to diagnose and investigate rare diseases, but they are limited by their short read lengths. In recent years, long-read sequencing (LRS) technologies have emerged as a valuable tool for overcoming these limitations. This minireview provides a concise overview of the applications of LRS in rare disease research and diagnosis, including the identification of disease-causing tandem repeat expansions and structural variations, and the comprehensive analysis of pathogenic variants.
Affiliation(s)
- Si-Yan Yu
- Department of Pediatric Laboratory, Affiliated Children's Hospital of Jiangnan University (Wuxi Children's Hospital), Wuxi, Jiangsu, China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Yu-Lin Xi
- Wuxi School of Medicine, Jiangnan University, Wuxi, Jiangsu, China
- Fu-Qiang Xu
- Department of Gynecology, Beijing Youan Hospital, Capital Medical University, Beijing, China
- Jian Zhang
- Department of Medical Laboratory, Affiliated Children's Hospital of Jiangnan University (Wuxi Children's Hospital), Wuxi, Jiangsu, China
- Yan-Shan Liu
- Department of Pediatric Laboratory, Affiliated Children's Hospital of Jiangnan University (Wuxi Children's Hospital), Wuxi, Jiangsu, China
- Wuxi School of Medicine, Jiangnan University, Wuxi, Jiangsu, China
26
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito-Garagorri E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Break-induced replication underlies formation of inverted triplications and generates unexpected diversity in haplotype structures. bioRxiv 2023:2023.10.02.560172. [PMID: 37873367 PMCID: PMC10592851 DOI: 10.1101/2023.10.02.560172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Background The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA after replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCRs) followed by iterative template switches, resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutational signature of pathogenicity in genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH), in whom we found evidence for all 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and Strand-seq, the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples, revealing a DNA segment of ∼2.2-5.5 kb with 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for the formation of copy-number-associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers, which contributes to generating diverse SV haplotypes at susceptible loci.
27
Hao J, Wang X, Shi Y, Li L, Chu J, Li J, Lin W, Yu T, Hou D. Integrated omic profiling of the medicinal mushroom Inonotus obliquus under submerged conditions. BMC Genomics 2023; 24:554. [PMID: 37726686 PMCID: PMC10507853 DOI: 10.1186/s12864-023-09656-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/27/2023] [Accepted: 09/06/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND The Inonotus obliquus mushroom, a fungus with edible and medicinal qualities, has been widely used as a folk medicine and shown to produce many secondary metabolites with potential pharmacological activity. The purpose of this study was to provide a global, genome-based integrated omic analysis of the fungus under lab-growth conditions. RESULTS This study presented a highly accurate and complete genome obtained with the PacBio Sequel II third-generation sequencing platform. The de novo assembled fungal genome was 36.13 Mb and contained 8352 predicted protein-coding genes, among which 365 carbohydrate-active enzyme (CAZyme)-coding genes and 19 biosynthetic gene clusters (BGCs) for secondary metabolites were identified. Comparative transcriptomic and proteomic analysis revealed a global view of differential metabolic change between seed and fermentation cultures, and demonstrated positive correlations between transcript and protein expression levels for 157 differentially expressed genes involved in the metabolism of amino acids, fatty acids and secondary metabolites, and in antioxidant and immune responses. Using a widely targeted metabolomic approach, a total of 307 secondary substances were identified and quantified, with a significant increase in the production of antioxidant polyphenols. CONCLUSION This study provided a comprehensive analysis of the fungus Inonotus obliquus and supplied fundamental information for further screening of promising target metabolites and for exploring the link between the genome and metabolites.
Affiliation(s)
- Jinghua Hao
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Xiaoli Wang
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Yanhua Shi
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Lingjun Li
- School of Modern Agriculture and Environment, Weifang Institute of Technology, Weifang, 261053, China
- Jinxin Chu
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Junjie Li
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Weiping Lin
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Tao Yu
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
- Dianhai Hou
- School of Bioscience and Technology, Weifang Medical University, Weifang, 261053, China
28
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date[1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes[2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established[1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
29
Pinto BJ, Gamble T, Smith CH, Wilson MA. A lizard is never late: Squamate genomics as a recent catalyst for understanding sex chromosome and microchromosome evolution. J Hered 2023; 114:445-458. [PMID: 37018459 PMCID: PMC10445521 DOI: 10.1093/jhered/esad023] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/20/2023] [Accepted: 04/03/2023] [Indexed: 04/07/2023]
Abstract
In 2011, the first high-quality genome assembly of a squamate reptile (lizard or snake) was published for the green anole. Dozens of genome assemblies were published over the following decade, yet these assemblies were largely inadequate for answering fundamental questions about genome evolution in squamates because they lacked contiguity or annotation. As the "genomics age" was beginning to hit its stride in many organismal study systems, progress in squamates was largely stagnant following the publication of the green anole genome. In fact, no high-quality (chromosome-level) squamate genomes were published between 2012 and 2017. However, since 2018, an exponential increase in high-quality genome assemblies has materialized, with 24 additional high-quality genomes published for species across the squamate tree of life. As the field of squamate genomics is rapidly evolving, we provide a systematic review from an evolutionary genomics perspective. We collated a near-complete list of publicly available squamate genome assemblies from more than half a dozen international and third-party repositories and systematically evaluated them with regard to their overall quality, phylogenetic breadth, and usefulness for continuing to provide accurate and efficient insights into genome evolution across squamate reptiles. This review both highlights and catalogs the currently available genomic resources in squamates and their ability to address broader questions in vertebrates, specifically sex chromosome and microchromosome evolution, while addressing why squamates may have received less historical focus and how this has caused their progress in genomics to lag behind that of peer taxa.
Affiliation(s)
- Brendan J Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, United States
- Tony Gamble
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, United States
- Department of Biological Sciences, Marquette University, Milwaukee, WI, United States
- Bell Museum of Natural History, University of Minnesota, St Paul, MN, United States
- Chase H Smith
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, United States
- Melissa A Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ, United States
30
Duan Y, Li Y, Zhang J, Song Y, Jiang Y, Tong X, Bi Y, Wang S, Wang S. Genome Survey and Chromosome-Level Draft Genome Assembly of Glycine max var. Dongfudou 3: Insights into Genome Characteristics and Protein Deficiencies. Plants (Basel) 2023; 12:2994. [PMID: 37631204 PMCID: PMC10459189 DOI: 10.3390/plants12162994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/11/2023] [Accepted: 08/17/2023] [Indexed: 08/27/2023]
Abstract
Dongfudou 3 is a highly sought-after soybean variety due to its lack of beany flavor. To support molecular breeding efforts, we conducted a genomic survey using next-generation sequencing. We determined the genome size, complexity, and characteristics of Dongfudou 3. Furthermore, we constructed a chromosome-level draft genome and speculated on the molecular basis of protein deficiency in GmLOX1, GmLOX2, and GmLOX3. These findings set the stage for high-quality genome analysis using third-generation sequencing. The estimated genome size is approximately 1.07 Gb, with repetitive sequences accounting for 72.50%. The genome is homozygous and devoid of microbial contamination. The draft genome consists of 916.00 Mb anchored onto 20 chromosomes, with annotations of 46,446 genes and 77,391 transcripts, achieving Benchmarking Single-Copy Orthologue (BUSCO) completeness of 99.5% for genome completeness and 99.1% for annotation. Deletions and substitutions were identified in the three GmLox genes, and they also lack corresponding active proteins. Our proposed approach, involving k-mer analysis after filtering out organellar DNA sequences, is applicable to genome surveys of all plant species, allowing for accurate assessments of size and complexity. Moreover, the process of constructing chromosome-level draft genomes using closely related reference genomes offers cost-effective access to valuable information, maximizing data utilization.
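The k-mer survey mentioned above rests on a simple relation: for a largely homozygous genome, the number of error-free k-mers divided by the depth of the main peak in the k-mer frequency histogram approximates the haploid genome size. A minimal Python sketch, assuming a depth histogram already produced by a k-mer counter (with organellar reads filtered out upstream) and a hypothetical error cut-off `min_depth`; it is not the authors' pipeline, and dedicated tools such as GenomeScope fit fuller models that account for heterozygosity and repeats:

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    min_depth: hypothetical threshold; lower-depth k-mers are treated as
               sequencing errors and discarded.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers above the error threshold")
    # Main (homozygous) peak: the depth with the most distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over depths of depth * count.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram: depths 1-2 are error k-mers; the main peak sits at depth 20.
toy = {1: 1_000_000, 2: 200_000, 10: 50, 20: 100, 30: 40}
print(estimate_genome_size(toy))  # 185.0
```

Discarding low-depth k-mers before summing matters: error k-mers are numerous but carry no information about genome size, and including them would inflate the estimate.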
Affiliation(s)
- Yajuan Duan
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Yue Li
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Jing Zhang
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Yongze Song
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Yan Jiang
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Xiaohong Tong
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Yingdong Bi
- Institute of Crop Cultivation and Tillage, Heilongjiang Academy of Agricultural Sciences, Harbin 150028, China
- Shaodong Wang
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
- Sui Wang
- Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, 600 Changjiang Road, Harbin 150030, China
31
Alejo-Jacuinde G, Nájera-González HR, Chávez Montes RA, Gutierrez Reyes CD, Barragán-Rosillo AC, Perez Sanchez B, Mechref Y, López-Arredondo D, Yong-Villalobos L, Herrera-Estrella L. Multi-omic analyses reveal the unique properties of chia (Salvia hispanica) seed metabolism. Commun Biol 2023; 6:820. [PMID: 37550387 PMCID: PMC10406817 DOI: 10.1038/s42003-023-05192-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Accepted: 07/28/2023] [Indexed: 08/09/2023]
Abstract
Chia (Salvia hispanica) is an emerging crop considered a functional food containing important substances with multiple potential applications. However, the molecular basis of some relevant chia traits, such as seed mucilage and polyphenol content, remains to be discovered. This study generates an improved chromosome-level reference of the chia genome, resolving some highly repetitive regions, describing methylation patterns, and refining genome annotation. Transcriptomic analysis shows that seeds exhibit a unique expression pattern compared to other organs and tissues. Thus, a metabolic and proteomic approach is implemented to study seed composition and seed-produced mucilage. The chia genome exhibits a significant expansion in mucilage synthesis genes (compared to Arabidopsis), and gene network analysis reveals potential regulators controlling seed mucilage production. Rosmarinic acid, a compound with enormous therapeutic potential, was classified as the most abundant polyphenol in seeds, and candidate genes for its complex pathway are described. Overall, this study provides important insights into the molecular basis for the unique characteristics of chia seeds.
Affiliation(s)
- Gerardo Alejo-Jacuinde
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Héctor-Rogelio Nájera-González
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Ricardo A Chávez Montes
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Alfonso Carlos Barragán-Rosillo
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Benjamin Perez Sanchez
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Yehia Mechref
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX, 79409, USA
- Damar López-Arredondo
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Lenin Yong-Villalobos
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Luis Herrera-Estrella
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX, 79409, USA
- Unidad de Genómica Avanzada/Langebio, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Gto., 36821, Mexico
32
Patnaik HH, Sang MK, Park JE, Song DK, Jeong JY, Hong CE, Kim YT, Shin HJ, Ziwei L, Hwang HJ, Park SY, Kang SW, Ko JH, Lee JS, Park HS, Jo YH, Han YS, Patnaik BB, Lee YS. A review of the endangered mollusks transcriptome under the threatened species initiative of Korea. Genes Genomics 2023; 45:969-987. [PMID: 37405596 DOI: 10.1007/s13258-023-01389-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 04/09/2023] [Indexed: 07/06/2023]
Abstract
Transcriptome studies for the conservation of endangered mollusks are a proactive approach to managing the threats and uncertainties these species face in natural environments. The populations of these species are declining due to habitat destruction, illicit wildlife trade, and global climate change. These pressures threaten the free movement of species across the wild landscape, cause the loss of breeding grounds, and restrict the expression of physiological attributes crucial for faunal welfare. Gastropods face the most negative ecological effects and have been listed under Korea's protective species consortium based on their population dynamics in recent years. Moreover, because genetic resources for such species are limited, conservation through informed planning is not possible. This review provides insights into the activities under the threatened species initiative of Korea, with special reference to the transcriptome assemblies of endangered mollusks. Gastropods such as Ellobium chinense, Aegista chejuensis, Aegista quelpartensis, Incilaria fruhstorferi, Koreanohadra kurodana, Satsuma myomphala, and Clithon retropictus are represented. The transcriptome summaries of the bivalve Cristaria plicata and the caenogastropod Charonia lampas sauliae are also discussed. Sequencing, de novo assembly, and annotation identified transcripts or homologs for each species, which were assigned predicted gene functions based on an understanding of the biochemical and molecular pathways. Mining simple sequence repeats from the transcriptomes has successfully supported genetic polymorphism studies. Finally, the transcriptome resources of Korean endangered mollusks are compared with the genomic resources of other endangered mollusks, and their homologies and analogies are discussed to guide future research.
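Simple sequence repeat (SSR) mining, mentioned above, amounts to scanning assembled transcripts for perfect tandem repeats of short motifs. The sketch below is a minimal regex-based illustration; the per-motif copy thresholds in `MIN_COPIES` and the function names are hypothetical choices, not the criteria used in the reviewed studies.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp);
# published SSR surveys use a variety of thresholds.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_copies=MIN_COPIES):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, copies_needed in min_copies.items():
        # One captured unit followed by at least (copies_needed - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, copies_needed - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # report each SSR under its shortest motif only
                copies = (m.end() - m.start()) // unit_len
                hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    transcript = "GCGTTA" + "AG" * 8 + "CTTGCA" + "AAT" * 6 + "GC"
    for hit in find_ssrs(transcript):
        print(hit)
```

Skipping non-primitive motifs avoids reporting an (AT)n tract a second time as (ATAT)n; compound and imperfect repeats are out of scope for this sketch.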
Affiliation(s)
- Hongray Howrelia Patnaik
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Min Kyu Sang
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Research Support Center for Bio-Bigdata Analysis and Utilization of Biological Resources, Soonchunhyang University, Asan, Chungnam, South Korea
- Jie Eun Park
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Research Support Center for Bio-Bigdata Analysis and Utilization of Biological Resources, Soonchunhyang University, Asan, Chungnam, South Korea
- Dae Kwon Song
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Research Support Center for Bio-Bigdata Analysis and Utilization of Biological Resources, Soonchunhyang University, Asan, Chungnam, South Korea
- Jun Yang Jeong
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Chan Eui Hong
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Yong Tae Kim
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Hyeon Jun Shin
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Liu Ziwei
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Hee Ju Hwang
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- So Young Park
- Biodiversity Research Team, Animal & Plant Research Department, Nakdonggang National Institute of Biological Resources, Sangju, Gyeongbuk, 37242, South Korea
- Se Won Kang
- Biological Resource Center (BRC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Jeongeup, Jeonbuk, 56212, South Korea
- Jung Ho Ko
- Police Science Institute, Korean National Police University, Asan, Chungnam, 31539, South Korea
- Jun Sang Lee
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Hong Seog Park
- Research Institute, GnC BIO Co., LTD., 621-6 Banseok-dong, Yuseong-gu, Daejeon, 34069, South Korea
- Yong Hun Jo
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
- Yeon Soo Han
- College of Agriculture and Life Science, Chonnam National University, 77 Yongbong-ro, Buk-gu, Gwangju, 61186, South Korea
- Bharat Bhusan Patnaik
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- P.G. Department of Biosciences and Biotechnology, Fakir Mohan University, Odisha, 756089, Nuapadhi, Balasore, India
- Yong Seok Lee
- Korea Native Animal Resources Utilization Convergence Research Institute (KNAR), Soonchunhyang University, Asan, Chungnam, South Korea
- Research Support Center for Bio-Bigdata Analysis and Utilization of Biological Resources, Soonchunhyang University, Asan, Chungnam, South Korea
- Department of Biology, College of Natural Sciences, Soonchunhyang University, Asan, Chungnam, 31538, South Korea
33
Zhao X, Yi L, Zuo Y, Gao F, Cheng Y, Zhang H, Zhou Y, Jia X, Su S, Zhang D, Zhang X, Ren Y, Mu Y, Jin X, Li Q, Bateer S, Lu Z. High-Quality Genome Assembly and Genome-Wide Association Study of Male Sterility Provide Resources for Flax Improvement. Plants (Basel) 2023; 12:2773. [PMID: 37570928 PMCID: PMC10421198 DOI: 10.3390/plants12152773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 07/21/2023] [Accepted: 07/23/2023] [Indexed: 08/13/2023]
Abstract
Flax is an economically important crop with a long history. It is grown worldwide and is mainly used for edible oil, industry, and textiles. Here, we report a high-quality genome assembly for "Neiya No. 9", a popular variety widely grown in China. Combining PacBio long reads, Hi-C sequencing, and a previously reported genetic map, we constructed a genome assembly of 473.55 Mb, which covers ~94.7% of the flax genome. These sequences were anchored onto 15 chromosomes. The N50 lengths of the contigs and scaffolds were 0.91 Mb and 31.72 Mb, respectively. A total of 32,786 protein-coding genes were annotated, and 95.9% of complete BUSCOs were found. Based on morphological and cytological observations, male sterility in flax was considered to be dominant nuclear sterility. GWAS analysis showed that the gene LUSG00017705 (a cysteine synthase gene) was closest to the most significant SNP, and the expression level of this gene was significantly lower in male sterile plants than in fertile plants. Among the significant SNPs identified in the GWAS analysis, only two were located in a coding region, and these two SNPs caused changes in the protein encoded by LUSG00017565 (a cysteine protease gene). It was speculated that these two genes may be related to male sterility in flax. This is the first time the molecular mechanism of male sterility in flax has been reported. The high-quality genome assembly and the male sterility genes revealed here provide a solid foundation for flax breeding.
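Contig and scaffold N50 values such as those quoted above follow a standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal sketch of that definition follows; the toy lengths are illustrative and are not the flax data.

```python
def n50(lengths):
    """Return N50: the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # same as running >= total / 2, without division
            return length


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy lengths, e.g. in Mb
    print(n50(contigs))  # 8
```

The same function gives a contig N50 when fed contig lengths and a scaffold N50 when fed scaffold lengths; scaffolding joins contigs, which is why the scaffold value is typically much larger.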
Affiliation(s)
- Xiaoqing Zhao
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Liuxi Yi
- Agricultural College, Inner Mongolia Agricultural University, Hohhot 010019, China
- Yongchun Zuo
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Hohhot 010019, China
- Fengyun Gao
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Yuchen Cheng
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control, Hohhot 010031, China
- Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010031, China
- Hui Zhang
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Yu Zhou
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Xiaoyun Jia
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Shaofeng Su
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Dejian Zhang
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control, Hohhot 010031, China
- Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010031, China
- Xiangqian Zhang
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control, Hohhot 010031, China
- Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010031, China
- Yongfeng Ren
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control, Hohhot 010031, China
- Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010031, China
- Yanxin Mu
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Xiaolei Jin
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Qiang Li
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- Siqin Bateer
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Zhanyuan Lu
- Inner Mongolia Academy of Agricultural & Animal Husbandry Sciences, Hohhot 010031, China
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Inner Mongolia Key Laboratory of Degradation Farmland Ecological Restoration and Pollution Control, Hohhot 010031, China
- Inner Mongolia Conservation Tillage Engineering Technology Research Center, Hohhot 010031, China
34
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023]
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
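ILRA corrects homopolymer indels using Illumina short reads; the sketch below does not perform that correction. It only locates the homopolymer tracts where such errors tend to cluster, for example to prioritise regions for polishing or to check which annotated genes overlap them. The `min_len` cutoff of 6 and the function names are illustrative assumptions, not part of ILRA.

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 6):
    """Return (start, end, base) for single-base runs of length >= min_len.

    Coordinates are 0-based and half-open; min_len is an arbitrary cutoff.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if base in "ACGT" and length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def fraction_in_homopolymers(seq: str, min_len: int = 6) -> float:
    """Fraction of bases lying inside flagged homopolymer tracts."""
    if not seq:
        return 0.0
    covered = sum(end - start for start, end, _ in homopolymer_runs(seq, min_len))
    return covered / len(seq)


if __name__ == "__main__":
    contig = "ACGTTTTTTGAAAAAAAC"
    print(homopolymer_runs(contig))
    print(f"{fraction_in_homopolymers(contig):.2f} of bases in tracts")
```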
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
- Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
- Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
35
Esteller-Cucala P, Palmada-Flores M, Kuderna LFK, Fontsere C, Serres-Armero A, Dabad M, Torralvo M, Faella A, Ferrández-Peral L, Llovera L, Fornas O, Julià E, Ramírez E, González I, Hecht J, Lizano E, Juan D, Marquès-Bonet T. Y chromosome sequence and epigenomic reconstruction across human populations. Commun Biol 2023; 6:623. [PMID: 37296226 PMCID: PMC10256797 DOI: 10.1038/s42003-023-05004-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 05/31/2023] [Indexed: 06/12/2023]
Abstract
Recent advances in long-read sequencing technologies have allowed the generation and curation of more complete genome assemblies, enabling the analysis of traditionally neglected chromosomes, such as the human Y chromosome (chrY). Native DNA was sequenced on a MinION Oxford Nanopore Technologies sequencing device to generate genome assemblies for seven major chrY human haplogroups. We analyzed and compared the chrY enrichment of sequencing data obtained using two different selective sequencing approaches: adaptive sampling and flow cytometry chromosome sorting. We show that adaptive sampling can produce data to create assemblies comparable to chromosome sorting while being a less expensive and time-consuming technique. We also assessed haplogroup-specific structural variants, which would be otherwise difficult to study using short-read sequencing data only. Finally, we took advantage of this technology to detect and profile epigenetic modifications among the considered haplogroups. Altogether, we provide a framework to study complex genomic regions with a simple, fast, and affordable methodology that could be applied to larger population genomics datasets.
Affiliation(s)
- Paula Esteller-Cucala
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Marc Palmada-Flores
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Lukas F K Kuderna
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Claudia Fontsere
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Aitor Serres-Armero
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Marc Dabad
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain
- María Torralvo
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Armida Faella
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Luis Ferrández-Peral
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Laia Llovera
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Oscar Fornas
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain
- Eva Julià
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Erika Ramírez
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Irene González
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Jochen Hecht
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Esther Lizano
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain
- David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Tomàs Marquès-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain
36
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
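Data sketching, one of the advances named above, can be illustrated with MinHash, a classic sketch for estimating the Jaccard similarity of two k-mer sets from small fixed-size signatures. The sketch below is a minimal illustration, not code from the review; seeding BLAKE2b with a prefix to obtain many hash functions is an assumption made for simplicity (production tools use faster hashes and bottom-k sketches). `num_hashes` embodies the accuracy-versus-memory trade-off: fewer hashes mean smaller signatures but noisier estimates.

```python
import hashlib
import random


def kmer_set(seq: str, k: int = 16) -> set:
    """All distinct k-mers of a sequence (no canonicalisation, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer under one of many seeded hash functions."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(kmers: set, num_hashes: int = 256) -> list:
    """For each seeded hash function, keep only the minimum hash value."""
    return [min(_hash(km, seed) for km in kmers) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of positions where two signatures agree, an estimate of Jaccard."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity, for comparison with the sketch estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(1000))
    a, b = kmer_set(genome[:700]), kmer_set(genome[300:])
    est = estimate_jaccard(minhash_signature(a), minhash_signature(b))
    print(f"exact {exact_jaccard(a, b):.3f}, MinHash estimate {est:.3f}")
```

The two signatures occupy 256 integers each regardless of genome size, which is what lets sketch-based tools compare very large datasets cheaply.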
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
37
Pinto BJ, Gamble T, Smith CH, Wilson MA. A lizard is never late: squamate genomics as a recent catalyst for understanding sex chromosome and microchromosome evolution. bioRxiv 2023:2023.01.20.524006. [PMID: 37034614 PMCID: PMC10081179 DOI: 10.1101/2023.01.20.524006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
In 2011, the first high-quality genome assembly of a squamate reptile (lizard or snake) was published for the green anole. Dozens of genome assemblies were subsequently published over the next decade, yet these assemblies were largely inadequate for answering fundamental questions regarding genome evolution in squamates due to their lack of contiguity or annotation. As the "genomics age" was beginning to hit its stride in many organismal study systems, progress in squamates was largely stagnant following the publication of the green anole genome. In fact, no high-quality (chromosome-level) squamate genomes were published between 2012 and 2017. However, since 2018, an exponential increase in high-quality genome assemblies has materialized, with 24 additional high-quality genomes published for species across the squamate tree of life. As the field of squamate genomics is rapidly evolving, we provide a systematic review from an evolutionary genomics perspective. We collated a near-complete list of publicly available squamate genome assemblies from more than half a dozen international and third-party repositories and systematically evaluated them with regard to their overall quality, phylogenetic breadth, and usefulness for continuing to provide accurate and efficient insights into genome evolution across squamate reptiles. This review both highlights and catalogs the currently available genomic resources in squamates and their ability to address broader questions in vertebrates, specifically sex chromosome and microchromosome evolution, while addressing why squamates may have received less historical focus and why their progress in genomics has lagged behind that of peer taxa.
Affiliation(s)
- Brendan J Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ USA
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI USA
- Tony Gamble
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI USA
- Department of Biological Sciences, Marquette University, Milwaukee, WI USA
- Bell Museum of Natural History, University of Minnesota, St Paul, MN USA
- Chase H Smith
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Melissa A Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ USA
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ USA
38
Lee H, Kim J, Lee J. Benchmarking datasets for assembly-based variant calling using high-fidelity long reads. BMC Genomics 2023; 24:148. [PMID: 36973656 PMCID: PMC10045170 DOI: 10.1186/s12864-023-09255-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/05/2022] [Accepted: 03/17/2023] [Indexed: 03/29/2023]
Abstract
BACKGROUND Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking. RESULTS We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, and compared the variant calling performance of the two platforms. HiFi identified a 1.65-fold higher number of true-positive variants on average, with 60% fewer false-positive variants, than CLR did. We also compared read-based and assembly-based variant calling methods in combination with subsampling of various sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detection of large insertions, even with 10× sequencing depth of accurate long-read sequencing data. CONCLUSIONS By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10× or more depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we herein propose appropriate methodologies for performing cost-effective and high-quality variant calling: 10× assembly-based variant calling. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level.
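The true-positive and false-positive counts compared above reduce to set operations between a call set and a truth set, from which precision, recall, and F1 follow. The sketch below assumes exact matching on (chromosome, position, REF, ALT); this is a simplification, since real benchmarks normalise variant representation (for example, left-aligning indels) before comparing. The `Variant` type and the coordinates in the usage example are hypothetical.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def benchmark_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Exact-match comparison of a call set against a truth set."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and true
    fp = len(calls - truth)   # called but not true
    fn = len(truth - calls)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [Variant("I", 100, "A", "G"), Variant("I", 200, "C", "T"),
             Variant("II", 50, "G", "GA")]
    calls = [Variant("I", 100, "A", "G"), Variant("II", 50, "G", "GA"),
             Variant("X", 7, "T", "C")]
    print(benchmark_calls(calls, truth))
```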
Affiliation(s)
- Hyunji Lee
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 08826 Korea
- Department of Biological Sciences, Seoul National University, Seoul, 08826 Korea
- Jun Kim
- Department of Biological Sciences, Seoul National University, Seoul, 08826 Korea
- Research Institute of Basic Sciences, Seoul National University, Seoul, 08826 Korea
- Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134 Korea
- Junho Lee
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 08826 Korea
- Department of Biological Sciences, Seoul National University, Seoul, 08826 Korea
- Research Institute of Basic Sciences, Seoul National University, Seoul, 08826 Korea
39
Hotaling S, Wilcox ER, Heckenhauer J, Stewart RJ, Frandsen PB. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics 2023; 24:117. [PMID: 36927511 PMCID: PMC10018877 DOI: 10.1186/s12864-023-09193-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 03/18/2023]
Abstract
BACKGROUND Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long-reads broadly as we could identify when they were used from GenBank metadata. RESULTS HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~ 20 Kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size. 
CONCLUSIONS Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.
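Contig N50, the contiguity metric used throughout this comparison, can be computed in a few lines. The sketch below follows the usual definition: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the contig lengths are hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("n50 requires at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical contig lengths (in Mb) for two toy assemblies of equal size.
fragmented = [10, 8, 6, 4, 2]
contiguous = [25, 3, 1, 1]
print(n50(fragmented), n50(contiguous))  # 8 25
```

Both toy assemblies total 30 Mb, so the difference in N50 reflects only how the same sequence is partitioned into contigs.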
Affiliation(s)
- Scott Hotaling
- Department of Watershed Sciences, Utah State University, Logan, UT, USA.
- Edward R Wilcox
- DNA Sequencing Center, Department of Biology, Brigham Young University, Provo, UT, USA
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325, Frankfurt, Germany
- Russell J Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany.
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA.
- Data Science Lab, Smithsonian Institution, Washington, DC, USA.
40
Dunn T, Blaauw D, Das R, Narayanasamy S. nPoRe: n-polymer realigner for improved pileup-based variant calling. BMC Bioinformatics 2023; 24:98. [PMID: 36927439 PMCID: PMC10022090 DOI: 10.1186/s12859-023-05193-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2022] [Accepted: 02/19/2023] [Indexed: 03/18/2023]
Abstract
Despite recent improvements in nanopore basecalling accuracy, germline variant calling of small insertions and deletions (INDELs) remains poor. Although precision and recall for single nucleotide polymorphisms (SNPs) now exceed 99.5%, INDEL recall remains below 80% for standard R9.4.1 flow cells. We show that read phasing and realignment can recover a significant portion of false-negative INDELs. In particular, we extend Needleman-Wunsch affine gap alignment by introducing new gap penalties for more accurately aligning repeated n-polymer sequences such as homopolymers ([Formula: see text]) and tandem repeats ([Formula: see text]). At the same precision, haplotype phasing improves INDEL recall from 63.76 to [Formula: see text] and nPoRe realignment improves it further to [Formula: see text].
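For orientation, the standard affine-gap global alignment that nPoRe extends can be sketched with three dynamic-programming matrices (Gotoh's formulation): one for alignments ending in a match or mismatch and one for each gap direction. This sketch computes only the optimal score, charges `gap_open` for the first gap column and `gap_extend` for each further column, and uses arbitrary example parameters. It does not implement nPoRe's n-polymer-specific penalties, which the abstract does not specify.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                        gap_open: int = -3, gap_extend: int = -1) -> float:
    """Optimal global alignment score with affine gaps (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    M: alignment ends with a[i-1] aligned to b[j-1].
    X: alignment ends with a[i-1] aligned to a gap.
    Y: alignment ends with b[j-1] aligned to a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,    # open a new gap
                          X[i - 1][j] + gap_extend,  # extend the current gap
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "ACGT"))  # 4
print(affine_global_score("AAAA", "AA"))    # -2: two matches, one 2-base gap
```

In the homopolymer example the two missing `A`s can be placed anywhere in the run with the same score, which illustrates why INDEL placement inside repeats is ambiguous for pileup-based callers.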
Affiliation(s)
- Tim Dunn
- University of Michigan, Ann Arbor, USA
41
Bruijnesteijn J. HLA/MHC and KIR characterization in humans and non-human primates using Oxford Nanopore Technologies and Pacific Biosciences sequencing platforms. HLA 2023; 101:205-221. [PMID: 36583332 DOI: 10.1111/tan.14957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022]
Abstract
The gene products of the HLA/MHC and KIR multigene families are important modulators of the immune system and are associated with health and disease. Characterization of the genes encoding these receptors has been integrated into different biomedical applications, including transplantation and reproduction biology, immune therapies and in fundamental research into disease susceptibility or resistance. Conventional short-read sequencing strategies have shown their value in high throughput typing, but are insufficient to uncover the entire complexity of the highly polymorphic HLA/MHC and KIR gene systems. The implementation of single-molecule and real-time sequencing platforms, offered by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), revolutionized the fields of genomics and transcriptomics. Using fundamentally distinct principles, these platforms generate long-read data that can unravel the plasticity of the HLA/MHC and KIR genes, including high-resolution characterization of genes, alleles, phased haplotypes, transcription levels and epigenetic modification patterns. These insights might have profound clinical relevance, such as improved matching of donors and patients in clinical transplantation, but could also lift disease association studies to a higher level. Moreover, a comprehensive characterization may refine animal models in preclinical studies. In this review, the different HLA/MHC and KIR characterization approaches using PacBio and ONT platforms are described and discussed.
Affiliation(s)
- Jesse Bruijnesteijn
- Department of Comparative Genetics and Refinement, Biomedical Primate Research Centre, Rijswijk, The Netherlands
42
Improved Assembly of Metagenome-Assembled Genomes and Viruses in Tibetan Saline Lake Sediment by HiFi Metagenomic Sequencing. Microbiol Spectr 2023; 11:e0332822. [PMID: 36475839 PMCID: PMC9927493 DOI: 10.1128/spectrum.03328-22] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
With the development and reduced costs of high-throughput sequencing technology, environmental dark matter, such as novel metagenome-assembled genomes (MAGs) and viruses, is now being discovered easily. However, due to read length limitations, MAGs and viromes often suffer from genome discontinuity and deficiencies in key functional elements. Here, by applying long-read sequencing technology to sediment samples from a Tibetan saline lake, we comprehensively analyzed the performance of high-fidelity (HiFi) reads and the possibility of integration with short-read next-generation sequencing (NGS) data. In total, 207 full-length nonredundant 16S rRNA gene sequences and 19 full-length nonredundant 18S rRNA genes were directly obtained from HiFi reads, which greatly surpassed the retrieval performance of NGS technology. We carried out a cross-sectional comparison among multiple assembly strategies, referred to as 'NGS', 'Hybrid (NGS+HiFi)', and 'HiFi'. Two MAGs and 29 viruses with circular genomes were reconstructed using HiFi reads alone, indicating the great power of the 'HiFi' approach to assemble high-quality microbial genomes. Among the 3 strategies, the 'Hybrid' approach produced the highest number of medium/high-quality MAGs and viral genomes, while the ratio of MAGs containing 16S rRNA genes was significantly improved in the 'HiFi' assembly results. Overall, our study provides a practical metagenomic resolution for analyzing complex environmental samples by taking advantage of both the short-read and HiFi long-read sequencing methods to extract the maximum amount of information, including data on prokaryotes, eukaryotes, and viruses, via the 'Hybrid' approach. IMPORTANCE To expand the understanding of microbial dark matter in the environment, we performed the first comparative evaluation of multiple assembly strategies based on high-throughput short-read and HiFi data from metagenomic sequencing of lake sediments. 
The results demonstrated a great improvement from the 'Hybrid' assembly method (short-read next-generation sequencing data plus HiFi data) in the recovery of medium/high-quality MAGs and viral genomes. Further analysis showed that HiFi data are important for retrieving complete circular prokaryotic and viral genomes. Meanwhile, hundreds of full-length 16S/18S rRNA genes were assembled directly from HiFi data, which facilitated species composition studies of complex environmental samples, especially for understanding micro-eukaryotes. Therefore, applying the latest HiFi long-read sequencing could greatly improve metagenomic assembly integrity and promote environmental microbiome research.
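The abstract does not state which criteria defined "medium/high-quality" MAGs. A common choice is the MIMAG standard, and the sketch below applies a simplified version of its thresholds as an assumption: high quality requires >90% completeness, <5% contamination, 5S, 16S and 23S rRNA genes, and at least 18 tRNAs; medium quality requires ≥50% completeness and <10% contamination. The `MAG` class, `mimag_tier` function and bin values are hypothetical. The rRNA requirement shows why recovering 16S genes inside MAGs matters for reaching the top tier.

```python
from dataclasses import dataclass


@dataclass
class MAG:
    name: str
    completeness: float            # percent, e.g. from single-copy marker genes
    contamination: float           # percent
    has_5s_16s_23s: bool = False   # all three rRNA genes recovered in the bin
    trna_count: int = 0


def mimag_tier(mag: MAG) -> str:
    """Assign a simplified MIMAG-style tier (thresholds are an assumption)."""
    if mag.contamination >= 10:
        return "unclassified"
    if (mag.completeness > 90 and mag.contamination < 5
            and mag.has_5s_16s_23s and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50:
        return "medium"
    return "low"


bins = [
    MAG("bin_1", 95.2, 1.1, has_5s_16s_23s=True, trna_count=20),
    MAG("bin_2", 95.2, 1.1),  # near-complete, but rRNA genes were not assembled
    MAG("bin_3", 62.0, 4.0),
    MAG("bin_4", 40.0, 2.0),
]
for b in bins:
    print(b.name, mimag_tier(b))
```

Here `bin_2` drops to "medium" despite its completeness, solely because its rRNA genes are missing.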
43
Cai ZF, Hu JY, Yin TT, Wang D, Shen QK, Ma C, Ou DQ, Xu MM, Shi X, Li QL, Wu RN, Ajuma L, Adeola AC, Zhang YP, Peng MS. Long amplicon HiFi sequencing for mitochondrial DNA genomes. Mol Ecol Resour 2023. [PMID: 36756726 DOI: 10.1111/1755-0998.13765] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/13/2022] [Revised: 01/11/2023] [Accepted: 02/03/2023] [Indexed: 02/10/2023]
Abstract
Long-read sequencing technology is a powerful approach with applications in various areas of genetic and genomic research. Herein, we developed a pipeline for long-amplicon high-fidelity (HiFi) sequencing and then applied it to sequence mitochondrial DNA (mtDNA) genomes from pools of 79 Tibetan Mastiffs. We amplified the mtDNA genome with long-range PCR using two pairs of primers. Two rounds of circular consensus sequencing (CCS) were conducted and their accuracy was evaluated. The results indicate that the second round of CCS can improve the accuracy of HiFi reads. In addition, the analysis of 79 high-quality mtDNA genomes shows that Tibetan Mastiffs from outside the Tibetan Plateau experienced hybridization with other dogs. The high quality reads generator (HQGR) software is provided to facilitate data analyses and is publicly accessible on GitHub (https://github.com/Caizf-script/HQGR). Our long-amplicon HiFi sequencing pipeline can also be applied in various target enrichment strategies for small genomes and candidate genes.
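Circular consensus sequencing gains accuracy because independent passes over the same molecule rarely share the same random error. A per-column majority vote over passes illustrates the idea, assuming the passes are already aligned and gap-padded to equal length (`-` marks a gap). This is a toy sketch with a hypothetical function name and sequences; it is not the algorithm used by PacBio's CCS software or by HQGR, which the abstract does not describe.

```python
from collections import Counter
from typing import List


def majority_consensus(passes: List[str]) -> str:
    """Per-column majority vote over pre-aligned, equal-length passes.

    Columns whose winning symbol is a gap ('-') are dropped, so an insertion
    seen in only a minority of passes is removed from the consensus.
    """
    if not passes:
        raise ValueError("need at least one pass")
    width = len(passes[0])
    if any(len(p) != width for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Three hypothetical passes over the molecule ACGTA: pass 2 carries a spurious
# insertion (T), pass 3 carries a substitution (G -> C).
passes = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(majority_consensus(passes))  # ACGTA
```

With an odd number of passes a single erroneous pass is outvoted at every column; production consensus callers instead align raw subreads and weigh the evidence with probabilistic models.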
Affiliation(s)
- Zheng-Fei Cai
- State Key Laboratory for Conservation and Utilization of Bio-resources in Yunnan, Yunnan University, Kunming, China; State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Ji-Yuan Hu
- School of Software, Yunnan University, Kunming, China
- Ting-Ting Yin
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Da Wang
- School of Software, Yunnan University, Kunming, China
- Quan-Kuan Shen
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Cheng Ma
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
- Ding-Qin Ou
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, China
- Ming-Min Xu
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
- Xian Shi
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
- Qing-Long Li
- State Key Laboratory for Conservation and Utilization of Bio-resources in Yunnan, Yunnan University, Kunming, China; State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Ru-Nian Wu
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Lameck Ajuma
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
- Adeniyi C Adeola
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Ya-Ping Zhang
- State Key Laboratory for Conservation and Utilization of Bio-resources in Yunnan, Yunnan University, Kunming, China; State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
- Min-Sheng Peng
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China
44
Gladman N, Goodwin S, Chougule K, Richard McCombie W, Ware D. Era of gapless plant genomes: innovations in sequencing and mapping technologies revolutionize genomics and breeding. Curr Opin Biotechnol 2023; 79:102886. [PMID: 36640454 PMCID: PMC9899316 DOI: 10.1016/j.copbio.2022.102886] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/24/2022] [Revised: 12/03/2022] [Accepted: 12/13/2022] [Indexed: 01/15/2023]
Abstract
Whole-genome sequencing and assembly have revolutionized plant genetics and molecular biology over the last two decades. However, significant shortcomings in first- and second-generation technology resulted in imperfect reference genomes: numerous and large gaps of low quality or undeterminable sequence in areas of highly repetitive DNA along with limited chromosomal phasing restricted the ability of researchers to characterize regulatory noncoding elements and genic regions that underwent recent duplication events. Recently, advances in long-read sequencing have resulted in the first gapless, telomere-to-telomere (T2T) assemblies of plant genomes. This leap forward has the potential to increase the speed and confidence of genomics and molecular experimentation while reducing costs for the research community.
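A "gapless" assembly, in the sense used here, has no runs of unknown bases left between joined contigs. Assuming gaps are encoded as runs of `N` (a common FASTA convention for scaffolds), a quick check can list them. The function name and example sequence are hypothetical. Telomere-to-telomere status additionally requires telomeric repeats at both chromosome ends, which this check does not test.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_length
    ]


scaffold = "ACGTNNNNNACGTNNACG"
print(find_gaps(scaffold))          # [(4, 9), (13, 15)]
print(find_gaps("ACGTACGT") == [])  # True: no gaps
```

An empty result for every chromosome is a necessary, but not sufficient, condition for a T2T assembly.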
Affiliation(s)
- Nicholas Gladman
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, 538 Tower Rd, Ithaca, NY 14853, USA; Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- Doreen Ware
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, 538 Tower Rd, Ithaca, NY 14853, USA; Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA.
45
Liang C, Wagstaff J, Aharony N, Schmit V, Manheim D. Managing the Transition to Widespread Metagenomic Monitoring: Policy Considerations for Future Biosurveillance. Health Secur 2023; 21:34-45. [PMID: 36629860 PMCID: PMC9940815 DOI: 10.1089/hs.2022.0029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/12/2023]
Abstract
The technological possibilities and future public health importance of metagenomic sequencing have received extensive attention, but there has been little discussion about the policy and regulatory issues that need to be addressed if metagenomic sequencing is adopted as a key technology for biosurveillance. In this article, we introduce metagenomic monitoring as a possible path to eventually replacing current infectious disease monitoring models. Many key enablers are technological, whereas others are not. We therefore highlight key policy challenges and implementation questions that need to be addressed for "widespread metagenomic monitoring" to be possible. Policymakers must address pitfalls like fragmentation of the technological base, private capture of benefits, privacy concerns, the usefulness of the system during nonpandemic times, and how the future systems will enable better response. If these challenges are addressed, the technological and public health promise of metagenomic sequencing can be realized.
Affiliation(s)
- Chelsea Liang
- Chelsea Liang is an Independent Researcher, University of New South Wales, School of Biotechnology and Biomolecular Sciences, Sydney, Australia
- James Wagstaff
- James Wagstaff, PhD, is a Research Fellow, Future of Humanity Institute, University of Oxford, Oxford, UK
- Noga Aharony
- Noga Aharony, MS, is a PhD Student, Department of Systems Biology, Columbia University, New York, NY
- Virginia Schmit
- Virginia Schmit, PhD, is Director of Research, 1DaySooner, DE, and a Policy Specialist, National Institute of Allergy and Infectious Diseases, Bethesda, MD
- David Manheim
- David Manheim, PhD, is Head of Policy and Research, ALTER, Rehovot, Israel; Lead Researcher, 1DaySooner, Claymont, DE; Visiting Researcher, Humanities and Arts Department, Technion – Israel Institute of Technology, Haifa, Israel. Address correspondence to: David B. Manheim, 8734 First Avenue, Silver Spring, MD 20910
46
Ferguson S, McLay T, Andrew RL, Bruhl JJ, Schwessinger B, Borevitz J, Jones A. Species-specific basecallers improve actual accuracy of nanopore sequencing in plants. Plant Methods 2022; 18:137. [PMID: 36517904 PMCID: PMC9749173 DOI: 10.1186/s13007-022-00971-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/02/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging as current basecallers are primarily based on mixtures of model species for training. Here we utilise both ONT PromethION and higher accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate sequencing accuracies achieved by ONT basecallers and assess accuracy gains by training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT's improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). For the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes with PacBio Sequel II HiFi reads. RESULTS Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy's plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining 93.24% and 95.16% read accuracies. R10.4 sequencing kits also improve sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific). CONCLUSIONS The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may be reducing the accuracy of nanopore sequencing, due to conflicting genome biology within the training dataset and study species. Training of single-species and genome-specific basecaller models improves read accuracy. 
Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could use sequencing accuracy gains and improving bioinformatics tools to improve study outcomes.
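Read accuracies such as these are often quoted on the Phred scale, Q = -10 log10(P_error), so 99% accuracy corresponds to Q20 and 99.9% to Q30. The helpers below convert between the two scales. They treat a read's overall accuracy as a single error probability, which is a simplification, and the function names are hypothetical.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a fractional accuracy (0 < accuracy < 1) to a Phred quality."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality back to a fractional accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Read accuracies reported in the abstract above.
for acc in (0.9196, 0.9415, 0.9546, 0.9687):
    print(f"{acc:.2%} -> Q{accuracy_to_phred(acc):.1f}")
```

On this scale the reported read accuracies fall at roughly Q11 to Q15, well below the Q20 (99%) threshold commonly used to describe highly accurate long reads.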
Affiliation(s)
- Scott Ferguson
- Research School of Biology, Australian National University, Canberra, ACT, Australia.
- Todd McLay
- National Herbarium of Victoria, Royal Botanic Gardens Victoria, South Yarra, Victoria, 3004, Australia
- School of Biosciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Rose L Andrew
- Botany & N.C.W. Beadle Herbarium, School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Jeremy J Bruhl
- Botany & N.C.W. Beadle Herbarium, School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia
- Benjamin Schwessinger
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Justin Borevitz
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Ashley Jones
- Research School of Biology, Australian National University, Canberra, ACT, Australia.
47
Rabanal FA, Gräff M, Lanz C, Fritschi K, Llaca V, Lang M, Carbonell-Bejerano P, Henderson I, Weigel D. Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes. Nucleic Acids Res 2022; 50:12309-12327. [PMID: 36453992 PMCID: PMC9757041 DOI: 10.1093/nar/gkac1115] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/18/2022] [Revised: 09/13/2022] [Accepted: 11/10/2022] [Indexed: 12/05/2022]
Abstract
Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.
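Because (GA)n on one strand is (TC)n on the other, the GA/TC repeats implicated in the remaining gaps can be located with a simple regular-expression scan. This sketch assumes an arbitrary minimum of 10 repeat units and reports tracts on the given strand only. The threshold, function name and test sequence are hypothetical and are not taken from the study.

```python
import re
from typing import List, Tuple


def find_ga_tc_tracts(sequence: str, min_units: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for runs of at least `min_units` consecutive
    GA or TC dinucleotides (0-based, half-open coordinates)."""
    pattern = re.compile(r"(?:GA){%d,}|(?:TC){%d,}" % (min_units, min_units))
    return [(m.start(), m.end(), m.group()[:2])
            for m in pattern.finditer(sequence.upper())]


seq = "ACGT" + "GA" * 12 + "CCA" + "TC" * 3
print(find_ga_tc_tracts(seq))  # [(4, 28, 'GA')]: the (TC)3 run is too short
```

A tract that begins in the AG phase is reported one base later than its true start; that offset is acceptable for flagging candidate gap regions but not for exact boundary calls.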
Affiliation(s)
- Fernando A Rabanal
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Maike Gräff
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Christa Lanz
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Katrin Fritschi
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Victor Llaca
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
- Michelle Lang
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
- Pablo Carbonell-Bejerano
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Ian Henderson
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
48
Hiltunen M, Ament-Velásquez SL, Ryberg M, Johannesson H. Stage-specific transposon activity in the life cycle of the fairy-ring mushroom Marasmius oreades. Proc Natl Acad Sci U S A 2022; 119:e2208575119. [PMID: 36343254 PMCID: PMC9674265 DOI: 10.1073/pnas.2208575119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 10/02/2022] [Indexed: 11/09/2022]
Abstract
Genetic variability can be generated by different mechanisms and across the life cycle. Many basidiomycete fungi have an extended somatic stage, during which each cell carries two genetically distinct haploid nuclei (dikaryosis), resulting from fusion of two compatible monokaryotic individuals. Recent findings have revealed remarkable genome stability at the nucleotide level during dikaryotic growth in these organisms, but whether this pattern extends to mutations affecting large genomic regions remains unknown. Furthermore, despite high genome integrity during dikaryosis, basidiomycete populations are not devoid of genetic diversity, raising the question of when this diversity is introduced. Here, we used a Marasmius oreades fairy ring to investigate the rise of large-scale variants during mono- and dikaryosis. By separating the two nuclear genotypes from four fruiting bodies and generating complete genome assemblies, we were able to investigate genomic changes of any size. We found that during dikaryotic growth in nature the genome stayed intact, but after separating the nucleotypes into monokaryons, a considerable amount of structural variation started to accumulate, driven to a large extent by transposons. Transposon insertions were also found in monokaryotic single-meiospore isolates. Hence, we show that genome integrity in basidiomycetes can be interrupted during monokaryosis, leading to genomic rearrangements and increased activity of transposable elements. We suggest that genetic diversification is disproportionate between life cycle stages in mushroom-forming fungi, so that the short-lived monokaryotic growth stage is more prone to genetic changes than the dikaryotic stage.
Affiliation(s)
- Markus Hiltunen
- Department of Organismal Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Martin Ryberg
- Department of Organismal Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Hanna Johannesson
- Department of Organismal Biology, Uppsala University, SE-752 36 Uppsala, Sweden
49
Wienert B, Cromer MK. CRISPR nuclease off-target activity and mitigation strategies. Front Genome Ed 2022; 4:1050507. [PMID: 36439866 PMCID: PMC9685173 DOI: 10.3389/fgeed.2022.1050507] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/21/2022] [Accepted: 10/26/2022] [Indexed: 11/11/2022]
Abstract
The discovery of CRISPR has allowed site-specific genomic modification to become a reality and this technology is now being applied in a number of human clinical trials. While this technology has demonstrated impressive efficacy in the clinic to date, there remains the potential for unintended on- and off-target effects of CRISPR nuclease activity. A variety of in silico-based prediction tools and empirically derived experimental methods have been developed to identify the most common unintended effect: small insertions and deletions at genomic sites with homology to the guide RNA. However, large-scale aberrations such as translocations, inversions, deletions, and even chromothripsis have recently been reported. These are more difficult to detect using current workflows, indicating a major unmet need in the field. In this review we summarize potential sequencing-based solutions that may be able to detect these large-scale effects even at low frequencies of occurrence. In addition, many of the current clinical trials using CRISPR involve ex vivo isolation of a patient's own stem cells, modification, and re-transplantation. However, there is growing interest in direct, in vivo delivery of genome editing tools. While this strategy has the potential to address disease in cell types that are not amenable to ex vivo manipulation, in vivo editing has only one desired outcome: on-target editing in the cell type of interest. CRISPR activity in unintended cell types (both on- and off-target) is therefore a major safety as well as ethical concern in tissues that could enable germline transmission. In this review, we have summarized the strengths and weaknesses of current editing and delivery tools and potential improvements to off-target and off-tissue CRISPR activity detection.
We have also outlined potential mitigation strategies that will ensure that the safety of CRISPR keeps pace with efficacy, a necessary requirement if this technology is to realize its full translational potential.
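The in silico prediction step mentioned above can be illustrated with a deliberately simplified scan. This is a minimal sketch under assumptions that are not taken from the review: SpCas9 with a 20-nt guide followed by an NGG PAM, a search of the given strand only, and an unweighted mismatch count with no bulges. The function name `candidate_sites` and its `max_mismatches` parameter are hypothetical. Dedicated prediction tools typically also search the reverse-complement strand and weight mismatches by position, and, as the abstract notes, no homology scan of this kind can reveal large-scale aberrations such as translocations or chromothripsis.

```python
"""Naive scan for candidate CRISPR off-target sites (illustrative sketch).

Assumptions (not from the reviewed paper): SpCas9 with a protospacer the
length of the guide followed by an NGG PAM, search of the given strand only,
and mismatches counted as plain Hamming distance (no bulges, no weighting).
"""

from typing import List, Tuple


def candidate_sites(genome: str, guide: str,
                    max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, protospacer, mismatches) for every NGG-adjacent site
    within max_mismatches of the guide on the given strand."""
    genome = genome.upper()
    guide = guide.upper()
    k = len(guide)
    hits = []
    # Each window needs k protospacer bases plus a 3-nt PAM right after it.
    for start in range(len(genome) - k - 3 + 1):
        pam = genome[start + k : start + k + 3]
        if pam[1:] != "GG":  # NGG: the first PAM base may be any nucleotide
            continue
        site = genome[start : start + k]
        mismatches = sum(1 for a, b in zip(site, guide) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits
```

For example, given a toy sequence that contains an exact NGG-adjacent copy of the guide, a two-mismatch NGG-adjacent copy, and an exact copy with no PAM, the function reports only the first two, each with its mismatch count.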
Affiliation(s)
- Beeke Wienert
- Graphite Bio, Inc., South San Francisco, CA, United States
- M. Kyle Cromer
- Department of Surgery, University of California, San Francisco, San Francisco, CA, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States
- Eli and Edythe Broad Center for Regeneration Medicine, University of California, San Francisco, San Francisco, CA, United States
50
Cell-Free DNA Fragmentation Patterns in a Cancer Cell Line. Diagnostics (Basel) 2022; 12:diagnostics12081896. [PMID: 36010246 PMCID: PMC9406536 DOI: 10.3390/diagnostics12081896] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/01/2022] [Revised: 07/29/2022] [Accepted: 08/03/2022] [Indexed: 12/20/2022]
Abstract
Distinct genetic, biological and pathological information is carried by differently sized cell-free DNA (cfDNA) populations. This is a significant discovery, but much of the phenomenon remains to be explored. We investigated cfDNA fragmentation patterns in cultured human bone cancer (143B) cells using increasingly sensitive electrophoresis assays: four automated microfluidic capillary electrophoresis assays from Agilent (DNA 1000, High Sensitivity DNA, dsDNA 915 and dsDNA 930) and an optimized manual agarose gel electrophoresis protocol. This comparison showed that (i) as the sensitivity and resolution of the sizing method increase, additional nucleosomal multiples are revealed (hepta-nucleosomes were detectable with manual agarose gel electrophoresis), while the estimated size range of high molecular weight (HMW) cfDNA fragments narrows correspondingly; (ii) the cfDNA laddering pattern extends well beyond the 1–3 nucleosomal multiples detected by commonly used methods; and (iii) the apparent modal size of HMW cfDNA populations is exaggerated by the limited resolving power of electrophoresis, and these populations instead consist of several poly-nucleosomal subpopulations that continue the DNA laddering series. Furthermore, the most sensitive automated assay used in this study (Agilent dsDNA 930) revealed an exponential decay in the relative contribution of increasingly longer cfDNA populations. This power-law distribution suggests the involvement of a stochastic inter-nucleosomal DNA cleavage process, in which shorter populations accumulate rapidly because they are fed by the degradation of all larger populations. This may explain why similar size profiles have historically been reported for cfDNA populations originating from different processes, such as apoptosis, necrosis, accidental cell lysis and purported active release. These results not only demonstrate the diversity of size profiles generated by different methods but also highlight the need for caution when drawing conclusions about the mechanisms that generate different cfDNA size populations, especially when only a single method is used for sizing.
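The proposed stochastic cleavage mechanism can be explored with a toy simulation. This is an illustrative model assumed here, not the authors' analysis: each chromatin chain is a string of nucleosomes, and in every round each still-intact linker is cut independently with a fixed probability. The names `simulate_fragment_sizes`, `cut_prob`, and `rounds` are hypothetical. Because a fragment of n nucleosomes requires n − 1 consecutive uncut linkers, its frequency falls off geometrically with n, and additional rounds shift mass toward mono- and di-nucleosomes as longer fragments are broken down.

```python
"""Toy model of stochastic inter-nucleosomal cleavage (illustrative sketch).

Assumption (not from the paper): linkers are cut independently, each with
probability cut_prob per round, and a cut linker stays cut.
"""

import random
from collections import Counter
from typing import Dict


def simulate_fragment_sizes(n_chains: int, chain_length: int, cut_prob: float,
                            rounds: int = 1, seed: int = 0) -> Dict[int, int]:
    """Return {fragment size in nucleosomes: count} after random linker cleavage."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_chains):
        # A chain of chain_length nucleosomes has chain_length - 1 linkers.
        cut = [False] * (chain_length - 1)
        for _ in range(rounds):
            for i in range(len(cut)):
                # Only intact linkers can be cleaved in a later round.
                if not cut[i] and rng.random() < cut_prob:
                    cut[i] = True
        # Walk along the chain; each cut linker closes the current fragment.
        size = 1
        for is_cut in cut:
            if is_cut:
                counts[size] += 1
                size = 1
            else:
                size += 1
        counts[size] += 1  # the last fragment ends at the chain terminus
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sizes = simulate_fragment_sizes(2000, 50, cut_prob=0.3, seed=1)
    for n in range(1, 8):
        print(f"{n}-nucleosome fragments: {sizes.get(n, 0)}")
```

Under this assumption, counts of mono-, di-, tri- and tetra-nucleosomes decrease in that order, and the total number of nucleosomes is conserved across fragments, which is a convenient sanity check on any simulation of this kind.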