1
Ejaz MR, Badr K, Hassan ZU, Al-Thani R, Jaoua S. Metagenomic approaches and opportunities in arid soil research. The Science of the Total Environment 2024; 953:176173. [PMID: 39260494 DOI: 10.1016/j.scitotenv.2024.176173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 09/04/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
Arid soils present unique challenges and opportunities for studying microbial diversity and bioactive potential because of the extreme environmental conditions they endure. This review examines soil metagenomics as an emerging tool to explore complex microbial dynamics and unexplored bioactive potential in harsh environments. Utilizing advanced metagenomic techniques, diverse microbial populations that grow under extreme conditions such as high temperatures, salinity, high pH levels, and exposure to metals and radiation can be studied. The use of extremophiles to discover novel natural products and biocatalysts emphasizes the role of functional metagenomics in identifying enzymes and secondary metabolites for industrial and pharmaceutical purposes. Metagenomic sequencing uncovers a complex network of microbial diversity, offering significant potential for discovering new bioactive compounds. Functional metagenomics, connecting taxonomic diversity to genetic capabilities, provides a pathway to identify the mechanisms microbes use to synthesize valuable secondary metabolites and other bioactive substances. Contrary to the common perception of desert soil as barren land, metagenomic analysis reveals a rich diversity of life forms adept at extreme survival. It provides valuable insights into their resilience and potential applications in biotechnology. Moreover, the review addresses the challenges associated with metagenomics in arid soils, such as low microbial biomass, high DNA degradation rates, and DNA extraction inhibitors, together with strategies to overcome them, and outlines the latest advancements in extraction methods, high-throughput sequencing, and bioinformatics. The importance of metagenomics for investigating diverse environments opens the way for future research to develop sustainable solutions in agriculture, industry, and medicine. Extensive studies are necessary to utilize the full potential of these powerful microbial communities.
This research will significantly improve our understanding of microbial ecology and biotechnology in arid environments.
Affiliation(s)
- Muhammad Riaz Ejaz
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Kareem Badr
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Zahoor Ul Hassan
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Roda Al-Thani
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Samir Jaoua
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
2
Versoza CJ, Pfeifer SP. A hybrid genome assembly of the endangered aye-aye (Daubentonia madagascariensis). G3 (Bethesda, Md.) 2024; 14:jkae185. [PMID: 39109845 PMCID: PMC11457058 DOI: 10.1093/g3journal/jkae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 06/24/2024] [Indexed: 10/08/2024]
Abstract
The aye-aye (Daubentonia madagascariensis) is the only extant member of the Daubentoniidae primate family. Although several reference genomes exist for this endangered strepsirrhine primate, the predominant usage of short-read sequencing has resulted in limited assembly contiguity and completeness, and no protein-coding gene annotations have yet been released. Here, we present a novel, fully annotated, chromosome-level hybrid de novo assembly for the species based on a combination of Oxford Nanopore Technologies long reads and Illumina short reads and scaffolded using genome-wide chromatin interaction data, a community resource that will improve future conservation efforts as well as primate comparative analyses.
Affiliation(s)
- Cyril J Versoza
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Susanne P Pfeifer
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
3
Kang X, Zhang W, Li Y, Luo X, Schönhuth A. HyLight: Strain aware assembly of low coverage metagenomes. Nat Commun 2024; 15:8665. [PMID: 39375348 DOI: 10.1038/s41467-024-52907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 09/23/2024] [Indexed: 10/09/2024] Open
Abstract
Different strains of identical species can vary substantially in terms of their spectrum of biomedically relevant phenotypes. Reconstructing the genomes of microbial communities at the level of their strains poses significant challenges, because sequencing errors can obscure strain-specific variants. Next-generation sequencing (NGS) reads are too short to resolve complex genomic regions. Third-generation sequencing (TGS) reads, although longer, are prone to higher error rates or substantially more expensive. Limiting TGS coverage to reduce costs compromises the accuracy of the assemblies. This explains why prior approaches share losses in strain awareness or accuracy, tend toward excessive costs, or combine these drawbacks. We introduce HyLight, a metagenome assembly approach that addresses these challenges by exploiting the complementary strengths of TGS and NGS data. HyLight employs strain-resolved overlap graphs (OG) to accurately reconstruct individual strains within microbial communities. Our experiments demonstrate that HyLight produces strain-aware and contiguous assemblies at minimal error content, while significantly reducing costs by utilizing low-coverage TGS data. HyLight achieves an average improvement of 19.05% in preserving strain identity and demonstrates near-complete strain awareness across diverse datasets. In summary, HyLight offers considerable advances in metagenome assembly, insofar as it delivers significantly enhanced strain awareness, contiguity, and accuracy without the typical compromises observed in existing approaches.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Wenhai Zhang
  - College of Biology, Hunan University, Changsha, China
- Yichen Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
4
Kim S, Kim J. Units containing telomeric repeats are prevalent in subtelomeric regions of a Mesorhabditis isolate collected from the Republic of Korea. Genes Genomics 2024:10.1007/s13258-024-01576-w. [PMID: 39367283 DOI: 10.1007/s13258-024-01576-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 09/11/2024] [Indexed: 10/06/2024]
Abstract
BACKGROUND: Mesorhabditis is known for its somatic genome being only a small portion of the germline genome due to programmed DNA elimination. This phenotype may be associated with the maintenance of telomeres at the ends of fragmented somatic chromosomes. OBJECTIVE: To comprehensively investigate the telomeric regions of Mesorhabditis nematodes at the sequence level, we endeavored to collect a Mesorhabditis nematode in the Republic of Korea and acquire its highly contiguous genome sequences. METHODS: We isolated a Mesorhabditis nematode and assembled its 108-Mb draft genome using both 6.3 Gb (53×) of short-read and 3.0 Gb (25×, N50 = 5.7 kb) of nanopore-based long-read sequencing data. Our genome assembly exhibits comparable quality to the public genome of Mesorhabditis belari in terms of contiguity and evolutionarily conserved genes. RESULTS: Unexpectedly, our Mesorhabditis genome has many more interstitial telomeric sequences (ITSs), specifically subtelomeric ones, compared to the genomes of Caenorhabditis elegans and M. belari. Moreover, several subtelomeric sequences containing ITSs had 4-26 homologous sequences, implying they are highly repetitive. Based on this highly repetitive nature, we hypothesize that subtelomeric ITSs might have accumulated through the action of transposable elements containing ITSs. CONCLUSIONS: It remains unclear whether these ITS-containing units are associated with programmed DNA elimination, but they may facilitate new telomere formation after DNA elimination. Our genomic resources for Mesorhabditis can aid in understanding how its distinct phenotypes have evolved.
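The read statistics quoted in this abstract (sequencing depth and read N50) follow standard definitions, sketched below in Python. The function names and the toy lengths are illustrative assumptions, not taken from the paper. If depth is computed as total bases divided by genome size, 3.0 Gb at 25× would correspond to a genome size of roughly 120 Mb; the abstract does not state which genome size was used, so that figure is only an inference.

```python
def n50(lengths):
    """Return the N50 of a collection of read or contig lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def coverage_depth(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


# Toy lengths (hypothetical), not data from the study.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))

# Assumed genome size of 120 Mb, chosen only to illustrate the arithmetic.
print(coverage_depth(3.0e9, 120e6))
```

Because depth depends on the denominator, the same 3.0 Gb would give a somewhat higher value if divided by the 108-Mb assembly size instead.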
Affiliation(s)
- Seoyeon Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Jun Kim
  - Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
5
Moon K, Basnet P, Um T, Choi IY. Review of the technology used for structural characterization of the GMO genome using NGS data. Genomics Inform 2024; 22:14. [PMID: 39358775 PMCID: PMC11445869 DOI: 10.1186/s44342-024-00016-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Accepted: 08/26/2024] [Indexed: 10/04/2024] Open
Abstract
The molecular characterization of genetically modified organisms (GMOs) is essential for ensuring safety and gaining regulatory approval for commercialization. According to CODEX standards, this characterization involves evaluating the presence of introduced genes, insertion sites, copy number, and nucleotide sequence structure. Advances in technology have led to the increased use of next-generation sequencing (NGS) over traditional methods such as Southern blotting. While both methods provide high reproducibility and accuracy, Southern blotting is labor-intensive and time-consuming due to the need for repetitive probe design and analyses for each target, resulting in low throughput. Conversely, NGS facilitates rapid and comprehensive analysis by mapping whole-genome sequencing (WGS) data to plasmid sequences, accurately identifying T-DNA insertion sites and flanking regions. This advantage allows for efficient detection of T-DNA presence, copy number, and unintended gene insertions without additional probe work. This paper reviews the current status of GMO genome characterization using NGS and proposes more efficient strategies for this purpose.
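The idea of locating T-DNA insertion sites from reads that span a genome/T-DNA boundary can be illustrated with a deliberately simplified Python sketch. It assumes exact substring matches, forward-strand reads only, and invented toy sequences; `find_junctions` is a hypothetical helper, not a method described in the review. Real analyses would rely on a read aligner and its clipped or split alignments.

```python
def find_junctions(reads, tdna, genome, min_anchor=6):
    """Toy junction finder based on exact substring matches (hypothetical helper).

    Returns a sorted list of (genome_coordinate, orientation) tuples, where the
    coordinate is the 0-based index of the host base immediately downstream of
    the inferred insertion point.
    """
    junctions = set()
    for read in reads:
        # Try every split that leaves at least min_anchor bases on each side.
        for cut in range(min_anchor, len(read) - min_anchor + 1):
            left, right = read[:cut], read[cut:]
            if left in genome and right in tdna:
                junctions.add((genome.find(left) + len(left), "genome->T-DNA"))
            elif left in tdna and right in genome:
                junctions.add((genome.find(right), "T-DNA->genome"))
    return sorted(junctions)


# Invented toy sequences; the insertion is simulated at host position 20.
genome = "ACGTACCGGTTAGCATCGGATTCAGGCTAAGCTTGCAATC"
tdna = "GATATCTCTAGAGGGCCCTTTAAACACGTG"
reads = [
    "GCATCGGAGATATCTC",  # host flank followed by the start of the T-DNA
    "AACACGTGTTCAGGCT",  # end of the T-DNA followed by host flank
    "ACGTACCGGTTAGCAT",  # host-only read
    "CTCTAGAGGGCCCTTT",  # T-DNA-only read
]
print(find_junctions(reads, tdna, genome))
```

Only the two chimeric reads contribute junctions; host-only and T-DNA-only reads are ignored. The sketch does not handle sequencing errors, reverse-complement reads, or deletions at the insertion site.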
Affiliation(s)
- Kahee Moon
  - Department of Agriculture and Life Industry, Kangwon National University, Chuncheon, South Korea
- Prakash Basnet
  - Department of Agriculture and Life Industry, Kangwon National University, Chuncheon, South Korea
- Taeyoung Um
  - Department of Agriculture and Life Industry, Kangwon National University, Chuncheon, South Korea
- Ik-Young Choi
  - Department of Agriculture and Life Industry, Kangwon National University, Chuncheon, South Korea
  - Department of Smart Farm and Agricultural Industry, Kangwon National University, Chuncheon, South Korea
6
Frampton S, Smith R, Ferson L, Gibson J, Hollox EJ, Cragg MS, Strefford JC. Fc gamma receptors: Their evolution, genomic architecture, genetic variation, and impact on human disease. Immunol Rev 2024. [PMID: 39345014 DOI: 10.1111/imr.13401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Fc gamma receptors (FcγRs) are a family of receptors that bind IgG antibodies and interface at the junction of humoral and innate immunity. Precise regulation of receptor expression provides the necessary balance to achieve healthy immune homeostasis by establishing an appropriate immune threshold to limit autoimmunity but respond effectively to infection. The underlying genetics of the FCGR gene family are central to achieving this immune threshold by regulating affinity for IgG, signaling efficacy, and receptor expression. The FCGR gene locus was duplicated during evolution, retaining very high homology and resulting in a genomic region that is technically difficult to study. Here, we review the recent evolution of the gene family in mammals, its complexity and variation through copy number variation and single-nucleotide polymorphism, and impact of these on disease incidence, resolution, and therapeutic antibody efficacy. We also discuss the progress and limitations of current approaches to study the region and emphasize how new genomics technologies will likely resolve much of the current confusion in the field. This will lead to definitive conclusions on the impact of genetic variation within the FCGR gene locus on immune function and disease.
Affiliation(s)
- Sarah Frampton
  - Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Rosanna Smith
  - Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
- Lili Ferson
  - Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Jane Gibson
  - Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
- Edward J Hollox
  - Department of Genetics, Genomics and Cancer Sciences, College of Life Sciences, University of Leicester, Leicester, UK
- Mark S Cragg
  - Antibody and Vaccine Group, Faculty of Medicine, School of Cancer Sciences, Centre for Cancer Immunology, University of Southampton, Southampton, UK
- Jonathan C Strefford
  - Cancer Genomics Group, Faculty of Medicine, School of Cancer Sciences, University of Southampton, Southampton, UK
7
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024] Open
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Affiliation(s)
- Clara D M van Karnebeek
  - Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
  - European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
  - Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
  - Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
  - The Cure and Action for Tay-Sachs (CATS) Foundation, Altringham, UK
- Bekim Sadikovic
  - Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
  - Department of Pathology and Laboratory Medicine, Western University, London, Canada
8
Yano N, Chong PF, Kojima KK, Miyoshi T, Luqmen-Fatah A, Kimura Y, Kora K, Kayaki T, Maizuru K, Hayashi T, Yokoyama A, Ajiro M, Hagiwara M, Kondo T, Kira R, Takita J, Yoshida T. Long-read sequencing identifies an SVA_D retrotransposon insertion deep within the intron of ATP7A as a novel cause of occipital horn syndrome. J Med Genet 2024; 61:950-958. [PMID: 38960580 DOI: 10.1136/jmg-2024-110056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 06/25/2024] [Indexed: 07/05/2024]
Abstract
BACKGROUND: SINE-VNTR-Alu (SVA) retrotransposons move from one genomic location to another in a 'copy-and-paste' manner. They continue to move actively and cause monogenic diseases through various mechanisms. Currently, disease-causing SVA retrotransposons are classified into human-specific young SVA_E or SVA_F subfamilies. In this study, we identified an evolutionarily old SVA_D retrotransposon as a novel cause of occipital horn syndrome (OHS). OHS is an X-linked copper metabolism disorder caused by dysfunction of the copper transporter ATP7A. METHODS: We investigated a 16-year-old boy with OHS whose pathogenic variant could not be detected via routine molecular genetic analyses. RESULTS: A 2.8 kb insertion was detected deep within the intron of the patient's ATP7A gene. This insertion caused aberrant mRNA splicing activated by a new donor splice site located within it. Long-read circular consensus sequencing enabled us to accurately read the entire insertion sequence, which contained highly repetitive and GC-rich segments. Consequently, the insertion was identified as an SVA_D retrotransposon. Antisense oligonucleotides (AOs) targeting the new splice site restored the expression of normal transcripts and functional ATP7A proteins. AO treatment alleviated excessive accumulation of copper in patient fibroblasts in a dose-dependent manner. Pedigree analysis revealed that the retrotransposon had moved into the OHS-causing position two generations ago. CONCLUSION: This is the first report of a human monogenic disease caused by the SVA_D retrotransposon. The fact that the evolutionarily old SVA_D is still actively transposed, leading to increased copy numbers, may have a notable impact on rare genetic disease research.
Affiliation(s)
- Naoko Yano
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Pin Fee Chong
  - Department of Pediatric Neurology, Fukuoka Children's Hospital, Fukuoka, Japan
  - Department of Pediatrics, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Kenji K Kojima
  - Genetic Information Research Institute, Cupertino, CA, USA
- Tomoichiro Miyoshi
  - Laboratory for Retrotransposon Dynamics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Department of Gene Mechanisms, Kyoto University Graduate School of Biostudies, Kyoto, Japan
- Ahmad Luqmen-Fatah
  - Laboratory for Retrotransposon Dynamics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yu Kimura
  - Department of Energy and Hydrocarbon Chemistry, Graduate School of Engineering, Kyoto University, Kyoto, Japan
- Kengo Kora
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Taisei Kayaki
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Kanako Maizuru
  - Department of Pediatrics, Tenri Yorozu Hospital, Tenri, Japan
- Takahiro Hayashi
  - Department of Pediatrics, Kurashiki Central Hospital, Kurashiki, Japan
- Atsushi Yokoyama
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Masahiko Ajiro
  - Division of Cancer RNA Research, National Cancer Center Research Institute, Tokyo, Japan
- Masatoshi Hagiwara
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
  - Department of Anatomy and Developmental Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Teruyuki Kondo
  - Department of Energy and Hydrocarbon Chemistry, Graduate School of Engineering, Kyoto University, Kyoto, Japan
- Ryutaro Kira
  - Department of Pediatric Neurology, Fukuoka Children's Hospital, Fukuoka, Japan
- Junko Takita
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Takeshi Yoshida
  - Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto, Japan
9
Huang Y, Gao Y, Ly K, Lin L, Lambooij JP, King EG, Janssen A, Wei KHC, Lee YCG. Varying recombination landscapes between individuals are driven by polymorphic transposable elements. bioRxiv: the preprint server for biology 2024:2024.09.17.613564. [PMID: 39345575 PMCID: PMC11429682 DOI: 10.1101/2024.09.17.613564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Meiotic recombination is a prominent force shaping genome evolution, and understanding the causes for varying recombination landscapes within and between species has remained a central, though challenging, question. Recombination rates are widely observed to negatively associate with the abundance of transposable elements (TEs), selfish genetic elements that move between genomic locations. While such associations are usually interpreted as recombination influencing the efficacy of selection at removing TEs, accumulating findings suggest that TEs could instead be the cause rather than the consequence. To test this prediction, we formally investigated the influence of polymorphic, putatively active TEs on recombination rates. We developed and benchmarked a novel approach that uses PacBio long-read sequencing to efficiently, accurately, and cost-effectively identify crossovers (COs), a key recombination product, among large numbers of pooled recombinant individuals. By applying this approach to Drosophila strains with distinct TE insertion profiles, we found that polymorphic TEs, especially RNA-based TEs and TEs with local enrichment of repressive marks, reduce the occurrence of COs. Such an effect leads to different CO frequencies between homologous sequences with and without TEs, contributing to varying CO maps between individuals. The suppressive effect of TEs on CO is further supported by two orthogonal approaches: analyzing the distributions of COs in panels of recombinant inbred lines in relation to TE polymorphism, and applying marker-assisted estimations of CO frequencies to isogenic strains with and without transgenically inserted TEs. Our investigations reveal how the constantly changing mobilome can actively modify recombination landscapes, shaping genome evolution within and between species.
Affiliation(s)
- Yuheng Huang
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Yi Gao
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Kayla Ly
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Leila Lin
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
- Jan Paul Lambooij
  - Center for Molecular Medicine, University Medical Center Utrecht, the Netherlands
- Aniek Janssen
  - Center for Molecular Medicine, University Medical Center Utrecht, the Netherlands
- Kevin H.-C. Wei
  - Department of Zoology, University of British Columbia, Canada
- Yuh Chwen G. Lee
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, USA
10
Deem KD, Brisson JA. Problems with Paralogs: The Promise and Challenges of Gene Duplicates in Evo-Devo Research. Integr Comp Biol 2024; 64:556-564. [PMID: 38565319 PMCID: PMC11406157 DOI: 10.1093/icb/icae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/13/2024] [Accepted: 03/23/2024] [Indexed: 04/04/2024] Open
Abstract
Gene duplicates, or paralogs, serve as a major source of new genetic material and comprise seeds for evolutionary innovation. While originally thought to be quickly lost or nonfunctionalized following duplication, now a vast number of paralogs are known to be retained in a functional state. Daughter paralogs can provide robustness through redundancy, specialize via sub-functionalization, or neo-functionalize to play new roles. Indeed, the duplication and divergence of developmental genes have played a monumental role in the evolution of animal forms (e.g., Hox genes). Still, despite their prevalence and evolutionary importance, the precise detection of gene duplicates in newly sequenced genomes remains technically challenging and often overlooked. This presents an especially pertinent problem for evolutionary developmental biology, where hypothesis testing requires accurate detection of changes in gene expression and function, often in nontraditional model species. Frequently, these analyses rely on molecular reagents designed within coding sequences that may be highly similar in recently duplicated paralogs, leading to cross-reactivity and spurious results. Thus, care is needed to avoid erroneously assigning diverged functions of paralogs to a single gene, and potentially misinterpreting evolutionary history. This perspective aims to overview the prevalence and importance of paralogs and to shed light on the difficulty of their detection and analysis while offering potential solutions.
Affiliation(s)
- Kevin D Deem
  - Department of Biology, University of Rochester, Rochester, NY, 14620
11
Lee Y, Choi K, Kim JE, Cha S, Nam JM. Integrating, Validating, and Expanding Information Space in Single-Molecule Surface-Enhanced Raman Spectroscopy for Biomolecules. ACS Nano 2024; 18:25359-25371. [PMID: 39228259 DOI: 10.1021/acsnano.4c09218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Single-molecule surface-enhanced Raman spectroscopy (SM-SERS) is an ultrahigh-resolution spectroscopic method for directly obtaining the complex vibrational mode information on individual molecules. SM-SERS offers a wide range of submolecular information on the hidden heterogeneity in its functional groups and varying structures, dynamics of conformational changes, binding and reaction kinetics, and interactions with the neighboring molecule and environment. Despite the richness in information on individual molecules and the potential of SM-SERS for various detection targets, including large and complex biomolecules, several issues and practical considerations remain to be addressed, such as the requirement of long integration time, challenges in forming reliable and controllable interfaces between nanostructures and biomolecules, difficulty in determining hotspot size and shape, and most importantly, insufficient signal reproducibility and stability. Moreover, utilizing and interpreting SERS spectra is challenging, mainly because of the complexity and dynamic nature of molecular fingerprint Raman spectra, and this leads to fragmentary analysis and incomplete understanding of the spectra. In this Perspective, we discuss the current challenges and future opportunities of SM-SERS from a systems perspective that integrates molecules of interest, Raman dyes, plasmonic nanostructures, and artificial intelligence, particularly for detecting and analyzing biomolecules to realize the validation and expansion of information space in SM-SERS.
Collapse
Affiliation(s)
- Yeonhee Lee
- Department of Chemistry, Seoul National University, Seoul 08826, South Korea
| | - Kyungin Choi
- Department of Chemistry, Seoul National University, Seoul 08826, South Korea
| | - Ji-Eun Kim
- Department of Chemistry, Seoul National University, Seoul 08826, South Korea
| | - Seungsang Cha
- Department of Chemistry, Seoul National University, Seoul 08826, South Korea
| | - Jwa-Min Nam
- Department of Chemistry, Seoul National University, Seoul 08826, South Korea
12
Wang S, Wang L, Wei M, Wang L, Yang Z, Chen C, Ma X, Chu Y, Wu H, Zhou G. An accurate haplotyping method using multiplex pyrosequencing with AS-PCR to detect ABCB1 haplotypes associated with rivaroxaban-derived hemorrhagic events. Talanta 2024; 281:126861. [PMID: 39260257 DOI: 10.1016/j.talanta.2024.126861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 09/05/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
In clinical practice, haplotypes have attracted greater attention than individual single nucleotide polymorphisms (SNPs) owing to the comprehensive genetic insights they offer. Because SNP locations can lie far apart, detecting a haplotype from genomic DNA is challenging. Current haplotyping methods are either expensive and labor-intensive (high-throughput DNA sequencing) or unable to haplotype a single clinical sample (computational approaches). Herein, we propose using mRNA as the haplotyping target to minimize the distance among SNPs and employing allele-specific PCR (AS-PCR) to pick up a desired haplotype, followed by multiplex pyrosequencing to type the alleles at the SNP locations of interest. AS-PCR was improved by adding a 3'-phosphorylated modified probe to achieve specific separation of two closely similar templates. Only samples with more than two heterozygotes need to be haplotyped; therefore, we propose a stratification strategy to screen the samples for further haplotyping. This method was evaluated by associating ABCB1 haplotypes with the rivaroxaban-derived side effect in a cohort of 505 patients with nephrotic syndrome, focusing on the SNPs of ABCB1: rs1236C > T, rs2677G > T/A, and rs3435C > T. We successfully identified five bleeding-related haplotypes: rs1236T-rs2677T-rs3435T, rs1236C-rs2677G-rs3435T, rs1236T-rs2677G-rs3435C, rs1236C-rs2677G-rs3435C, and rs1236T-rs2677T-rs3435C. We compared the results with those from the conventional computational algorithm PHASE and observed that PHASE dismissed the impact of rs1236C-rs2677G-rs3435C and rs1236C-rs2677G-rs3435T on bleeding risk and erroneously suggested a false-positive association of rs1236C-rs2677A-rs3435T with increased bleeding risk.
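To illustrate why unphased genotypes alone may not determine a haplotype, the sketch below enumerates every haplotype pair compatible with a set of per-SNP genotypes. It is not the authors' stratification procedure or their AS-PCR workflow; the function name and the example sample are hypothetical. With k heterozygous sites there are 2^(k-1) compatible unordered pairs, so the ambiguity grows quickly as more sites are heterozygous.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    `genotypes` is a list of (allele_a, allele_b) tuples, one per SNP.
    Hypothetical helper for illustration only.
    """
    het_idx = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if not het_idx:
        # Fully homozygous: both chromosomes carry the same haplotype.
        hap = tuple(a for a, _ in genotypes)
        return [(hap, hap)]

    pairs = []
    # Keep the first heterozygous site in a fixed orientation so each
    # unordered pair is produced once, then try both orientations elsewhere.
    for flips in product([False, True], repeat=len(het_idx) - 1):
        flip_at = dict(zip(het_idx[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.append((tuple(hap1), tuple(hap2)))
    return pairs


# Hypothetical sample, heterozygous at rs1236 (C/T), rs2677 (G/T), rs3435 (C/T).
triple_het = [("C", "T"), ("G", "T"), ("C", "T")]
for hap1, hap2 in compatible_haplotype_pairs(triple_het):
    print("-".join(hap1), "/", "-".join(hap2))
```

For the triple-heterozygous example, four haplotype pairs are compatible with the same genotype calls, which is the kind of ambiguity that a direct, molecule-level haplotyping step is meant to resolve.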
Affiliation(s)
- Shanshan Wang
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, 210009, China; Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China
- Liteng Wang
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China; School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
- Meng Wei
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China
- Lingfei Wang
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China; School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
- Ziyun Yang
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China; School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
- Chen Chen
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China
- Xueping Ma
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China
- Yana Chu
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China
- Haiping Wu
- Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China.
- Guohua Zhou
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, 210009, China; Department of Clinical Pharmacy, Jinling Hospital, Nanjing, 210002, China.
13
Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, shifting the focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods makes it possible to handle larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review introduces those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. Important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods are covered. For better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia.
14
Zhang Z, Zhang J, Kang L, Qiu X, Xu S, Xu J, Guo Y, Niu Z, Niu B, Bi A, Zhao X, Xu D, Wang J, Yin C, Lu F. Structural variation discovery in wheat using PacBio high-fidelity sequencing. The Plant Journal: for Cell and Molecular Biology 2024. [PMID: 39239888 DOI: 10.1111/tpj.17011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 08/09/2024] [Accepted: 08/22/2024] [Indexed: 09/07/2024]
Abstract
Structural variations (SVs) pervade plant genomes and contribute substantially to phenotypic diversity. However, most SVs have been ineffectively assayed because of their complex nature and the limitations of early genomic technologies. By applying PacBio high-fidelity (HiFi) sequencing to wheat genomes, we performed a comprehensive evaluation of mainstream long-read aligners and SV callers in SV detection. The results indicated that the accuracy of deletion discovery is markedly influenced by callers, which account for 87.73% of the variance, whereas both aligners (38.25%) and callers (49.32%) contributed substantially to the accuracy variance for insertions. Among the aligners, Winnowmap2 and NGMLR excelled in detecting deletions and insertions, respectively. Among the SV callers, SVIM achieved the best performance. We demonstrated that combining the aligners and callers mentioned above is optimal for SV detection. Furthermore, we evaluated the effect of sequencing depth on the accuracy of SV detection, revealing that low-coverage HiFi sequencing is sufficiently robust for high-quality SV discovery. This study thoroughly evaluated SV discovery approaches and established optimal workflows for investigating structural variations using low-coverage HiFi sequencing in the wheat genome, which will advance SV discovery and help decipher the biological functions of SVs in wheat and many other plants.
Affiliation(s)
- Zhiliang Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jijin Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lipeng Kang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xuebing Qiu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Song Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jun Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yafei Guo
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zelin Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Beirui Niu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Aoyue Bi
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xuebo Zhao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Daxing Xu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jing Wang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- Changbin Yin
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- Fei Lu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
15
Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, Baier LJ. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences. Genome Biol Evol 2024; 16:evae188. [PMID: 39190003 PMCID: PMC11384899 DOI: 10.1093/gbe/evae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 05/17/2024] [Accepted: 08/22/2024] [Indexed: 08/28/2024]
Abstract
There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads begins with aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples that were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Affiliation(s)
- Çiğdem Köroğlu
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Peng Chen
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Michael Traurig
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Serdar Altok
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Clifton Bogardus
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
- Leslie J Baier
- Diabetes Molecular Genetics Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ 85004, USA
16
Record CJ, Pipis M, Skorupinska M, Blake J, Poh R, Polke JM, Eggleton K, Nanji T, Zuchner S, Cortese A, Houlden H, Rossor AM, Laura M, Reilly MM. Whole genome sequencing increases the diagnostic rate in Charcot-Marie-Tooth disease. Brain 2024; 147:3144-3156. [PMID: 38481354 PMCID: PMC11370804 DOI: 10.1093/brain/awae064] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2023] [Revised: 01/17/2024] [Accepted: 02/07/2024] [Indexed: 09/04/2024]
Abstract
Charcot-Marie-Tooth disease (CMT) is one of the most common and genetically heterogeneous inherited neurological diseases, with more than 130 disease-causing genes. Whole genome sequencing (WGS) has improved diagnosis across genetic diseases, but the diagnostic impact in CMT is yet to be fully reported. We present the diagnostic results from a single specialist inherited neuropathy centre, including the impact of WGS diagnostic testing. Patients were assessed at our specialist inherited neuropathy centre from 2009 to 2023. Genetic testing was performed using single gene testing, next-generation sequencing targeted panels, research whole exome sequencing and WGS and, latterly, WGS through the UK National Health Service. Variants were assessed using the American College of Medical Genetics and Genomics and Association for Clinical Genomic Science criteria. Excluding patients with hereditary ATTR amyloidosis, 1515 patients with a clinical diagnosis of CMT and related disorders were recruited. In summary, 621 patients had CMT1 (41.0%), 294 CMT2 (19.4%), 205 intermediate CMT (CMTi, 13.5%), 139 hereditary motor neuropathy (HMN, 9.2%), 93 hereditary sensory neuropathy (HSN, 6.1%), 38 sensory ataxic neuropathy (2.5%), 72 hereditary neuropathy with liability to pressure palsies (HNPP, 4.8%) and 53 'complex' neuropathy (3.5%). Overall, a genetic diagnosis was reached in 76.9% (1165/1515). A diagnosis was most likely in CMT1 (96.8%, 601/621), followed by CMTi (81.0%, 166/205) and then HSN (69.9%, 65/93). Diagnostic rates remained less than 50% in CMT2, HMN and complex neuropathies. The most common genetic diagnosis was PMP22 duplication (CMT1A; 505/1165, 43.3%), then GJB1 (CMTX1; 151/1165, 13.0%), PMP22 deletion (HNPP; 72/1165, 6.2%) and MFN2 (CMT2A; 46/1165, 3.9%). 
We recruited 233 cases to the UK 100 000 Genomes Project (100KGP), of which 74 (31.8%) achieved a diagnosis; 28 had been otherwise diagnosed since recruitment, leaving a true diagnostic rate of WGS through the 100KGP of 19.7% (46/233). However, almost half of the solved cases (35/74) received a negative report from the study, and the diagnosis was made through our research access to the WGS data. The overall diagnostic uplift of WGS for the entire cohort was 3.5%. Our diagnostic rate is the highest reported from a single centre and has benefitted from the use of WGS, particularly access to the raw data. However, almost one-quarter of all cases remain unsolved, and a new reference genome and novel technologies will be important to narrow the 'diagnostic gap'.
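The percentages quoted in this abstract can be re-derived from the reported counts. The following is a minimal sketch for checking that arithmetic; all counts are copied from the abstract, while the variable and function names are illustrative assumptions rather than anything from the study.

```python
# Counts copied from the abstract; names are illustrative only.
TOTAL_PATIENTS = 1515
TOTAL_DIAGNOSED = 1165

subtype_counts = {
    "CMT1": 621, "CMT2": 294, "CMTi": 205, "HMN": 139,
    "HSN": 93, "sensory ataxic neuropathy": 38, "HNPP": 72, "complex": 53,
}


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Share of the cohort in each clinical subtype.
subtype_share = {name: pct(n, TOTAL_PATIENTS) for name, n in subtype_counts.items()}

# Overall and per-subtype diagnostic rates.
overall_rate = pct(TOTAL_DIAGNOSED, TOTAL_PATIENTS)
cmt1_rate = pct(601, 621)
cmti_rate = pct(166, 205)
hsn_rate = pct(65, 93)

# 100KGP subgroup: 74 of 233 solved, 28 of which were diagnosed by other routes.
solved_100kgp = pct(74, 233)
wgs_only_100kgp = pct(74 - 28, 233)

print(sum(subtype_counts.values()), overall_rate, solved_100kgp, wgs_only_100kgp)
```

The subtype counts sum to the full cohort of 1515, and the recomputed rates match the reported 76.9%, 31.8%, and 19.7%. The 3.5% overall diagnostic uplift is not recomputed here because the abstract does not give its numerator.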
Affiliation(s)
- Christopher J Record
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Menelaos Pipis
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Mariola Skorupinska
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Julian Blake
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Department of Clinical Neurophysiology, Norfolk and Norwich University Hospital, Norwich NR4 7UY, UK
- Roy Poh
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
- James M Polke
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
- Kelly Eggleton
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
- Tina Nanji
- Neurogenetics Laboratory, National Hospital for Neurology and Neurosurgery, London WC1N 3BG, UK
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Andrea Cortese
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Henry Houlden
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Alexander M Rossor
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Matilde Laura
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Mary M Reilly
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
17
Ahmad F, Muhmood T. Clinical translation of nanomedicine with integrated digital medicine and machine learning interventions. Colloids Surf B Biointerfaces 2024; 241:114041. [PMID: 38897022 DOI: 10.1016/j.colsurfb.2024.114041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Nanomaterial-based therapeutics are transforming disease prevention, diagnosis, and treatment as nanotechnology grows more sophisticated at a breakneck pace, but very few reach the clinic because of inconsistencies in preclinical studies followed by regulatory hindrances. To tackle this, integrating nanomedicine discovery with digital medicine provides technologies that serve as tools for measuring specific biological activity. On-site data acquisition and analytics, achieved by integrating intelligent sensors with artificial intelligence (AI) or machine learning (ML), can thereby overcome redundancies in nanomedicine discovery. Integrated AI/ML wearable sensors gather clinically relevant biochemical information directly from the subject's body and process the data so that physicians can make the right clinical decision(s) in a time- and cost-effective way. This review summarizes insights and recommends the infusion of actionable, big-data-computation-enabled sensors into the burgeoning field of nanomedicine in academia, research institutes, and the pharmaceutical industry, with the potential for clinical translation. Furthermore, it discusses the many blind spots in modern clinically relevant computation, one of which could prevent ML-guided, low-cost nanomedicine development from being successfully translated into the clinic.
Affiliation(s)
- Farooq Ahmad
- State Key Laboratory of Chemistry and Utilization of Carbon Based Energy Resources, College of Chemistry, Xinjiang University, Urumqi 830017, China.
- Tahir Muhmood
- International Iberian Nanotechnology Laboratory (INL), Avenida Mestre José Veiga, Braga 4715-330, Portugal.
18
Wang ZY, Ge LP, Ouyang Y, Jin X, Jiang YZ. Targeting transposable elements in cancer: developments and opportunities. Biochim Biophys Acta Rev Cancer 2024; 1879:189143. [PMID: 38936517 DOI: 10.1016/j.bbcan.2024.189143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 05/23/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
Transposable elements (TEs), comprising nearly 50% of the human genome, have transitioned from being perceived as "genomic junk" to key players in cancer progression. Contemporary research links TE regulatory disruptions with cancer development, underscoring their therapeutic potential. Advances in long-read sequencing, computational analytics, single-cell sequencing, proteomics, and CRISPR-Cas9 technologies have enriched our understanding of TEs' clinical implications, notably their impact on genome architecture, gene regulation, and evolutionary processes. In cancer, TEs, including long interspersed element-1 (LINE-1), Alus, and long terminal repeat (LTR) elements, demonstrate altered patterns, influencing both tumorigenic and tumor-suppressive mechanisms. TE-derived nucleic acids and tumor antigens play critical roles in tumor immunity, bridging innate and adaptive responses. Given their central role in oncology, TE-targeted therapies, particularly through reverse transcriptase inhibitors and epigenetic modulators, represent a novel avenue in cancer treatment. Combining these TE-focused strategies with existing chemotherapy or immunotherapy regimens could enhance efficacy and offer a new dimension in cancer treatment. This review delves into recent TE detection advancements, explores their multifaceted roles in tumorigenesis and immune regulation, discusses emerging diagnostic and therapeutic approaches centered on TEs, and anticipates future directions in cancer research.
Affiliation(s)
- Zi-Yu Wang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Li-Ping Ge
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yang Ouyang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Xi Jin
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yi-Zhou Jiang
- Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China.
19
Tang X, Berger MF, Solit DB. Precision oncology: current and future platforms for treatment selection. Trends Cancer 2024; 10:781-791. [PMID: 39030146 DOI: 10.1016/j.trecan.2024.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/20/2024] [Accepted: 06/21/2024] [Indexed: 07/21/2024]
Abstract
Genomic profiling of hundreds of cancer-associated genes is now a component of routine cancer care. DNA sequencing can identify mutations, mutational signatures, and structural alterations predictive of therapy response and assess for heritable cancer risk, but it has been less useful for identifying predictive biomarkers of sensitivity to cytotoxic chemotherapies, antibody drug conjugates, and immunotherapies. The clinical adoption of molecular profiling platforms, such as RNA sequencing, that are better suited to identifying those patients most likely to respond to immunotherapies and drug combinations will be critical to expanding the benefits of precision oncology. This review discusses the potential advantages of innovative molecular and functional profiling platforms designed to replace or complement targeted DNA sequencing and the major hurdles to their clinical adoption.
Affiliation(s)
- Xinran Tang
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Michael F Berger
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- David B Solit
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
20
Tsai CY, Hsu JSJ, Chen PL, Wu CC. Implementing next-generation sequencing for diagnosis and management of hereditary hearing impairment: a comprehensive review. Expert Rev Mol Diagn 2024; 24:753-765. [PMID: 39194060 DOI: 10.1080/14737159.2024.2396866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 08/22/2024] [Indexed: 08/29/2024]
Abstract
INTRODUCTION: Sensorineural hearing impairment (SNHI), a common childhood disorder with heterogeneous genetic causes, can lead to delayed language development and psychosocial problems. Next-generation sequencing (NGS) offers high-throughput screening and high-sensitivity detection of genetic etiologies of SNHI, enabling clinicians to make informed medical decisions, provide tailored treatments, and improve prognostic outcomes. AREAS COVERED: This review covers the diverse etiologies of hereditary hearing impairment (HHI) and the utility of different NGS modalities (targeted sequencing and whole exome/genome sequencing), and includes HHI-related studies on newborn screening, genetic counseling, prognostic prediction, and personalized treatment. Challenges such as the trade-off between cost and diagnostic yield, detection of structural variants, and exploration of the non-coding genome are also highlighted. EXPERT OPINION: In the current landscape of NGS-based diagnostics for HHI, there are both challenges (e.g. detection of structural variants and non-coding genome variants) and opportunities (e.g. the emergence of medical artificial intelligence tools). The authors advocate the use of technological advances such as long-read sequencing for structural variant detection, multi-omics analysis for non-coding variant exploration, and medical artificial intelligence for pathogenicity assessment and outcome prediction. By integrating these innovations into clinical practice, precision medicine in the diagnosis and management of HHI can be further improved.
Affiliation(s)
- Cheng-Yu Tsai
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Jacob Shu-Jui Hsu
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Chen-Chi Wu
- Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Department of Medical Research, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Department of Otolaryngology, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
| |
Collapse
|
21
|
Zheng J, Li T, Ye H, Jiang Z, Jiang W, Yang H, Wu Z, Xie Z. Comprehensive identification of pathogenic variants in retinoblastoma by long- and short-read sequencing. Cancer Lett 2024; 598:217121. [PMID: 39009069 DOI: 10.1016/j.canlet.2024.217121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 06/16/2024] [Accepted: 07/11/2024] [Indexed: 07/17/2024]
Abstract
Retinoblastoma (RB) is the most common intraocular malignancy in childhood. The causal variants in RB have mostly been characterized by short-read sequencing (SRS), which has technical limitations in identifying structural variants (SVs) and in providing phasing information. Long-read sequencing (LRS) technology has advantages over SRS in detecting SVs, phased genetic variants, and methylation. In this study, we comprehensively characterized the genetic landscape of RB using combined LRS and SRS of 16 RB tumors and 16 matched blood samples. We detected a total of 232 somatic SVs, with an average of 14.5 SVs per sample across the whole genome in our cohort. We identified 20 distinct pathogenic variants disrupting the RB1 gene, including three novel small variants and five somatic SVs. In RB samples with WGS data, more somatic SVs were detected by LRS than by SRS (140 vs. 122), particularly insertions (18 vs. 1). Furthermore, our analysis shows that, with the exception of one sample that lacked methylation data, all samples presented biallelic inactivation of RB1 in various forms, including two cases with a biallelic hypermethylated promoter and four cases with compound heterozygous mutations that were missed by SRS analysis. By inferring the relative timing of somatic events, we reveal a genetic progression during RB tumorigenesis in which RB1 disruption occurs early and is followed by copy number changes, including amplifications of Chr2p and deletions of Chr16q. Altogether, we characterize the comprehensive genetic landscape of RB, providing novel insights into the genetic alterations and mechanisms contributing to RB initiation and development. Our work also establishes a framework to analyze the genomic landscape of cancers based on LRS data.
Affiliation(s)
- Jingjing Zheng
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Huijing Ye
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehang Jiang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wenbing Jiang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Huasheng Yang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
- Zhikun Wu
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China.

22
Negi S, Stenton SL, Berger SI, McNulty B, Violich I, Gardner J, Hillaker T, O'Rourke SM, O'Leary MC, Carbonell E, Austin-Tse C, Lemire G, Serrano J, Mangilog B, VanNoy G, Kolmogorov M, Vilain E, O'Donnell-Luria A, Délot E, Miga KH, Monlong J, Paten B. Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. medRxiv 2024:2024.08.22.24312327. [PMID: 39228712 PMCID: PMC11370519 DOI: 10.1101/2024.08.22.24312327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
More than 50% of families with suspected rare monogenic diseases remain unsolved after whole genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate the additional diagnostic yield of LRS, we sequenced a rare disease cohort of 98 samples, including 41 probands and some family members, using nanopore sequencing, achieving per-sample ∼36x average coverage and a 32 kilobase (kb) read N50 from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. LRS covered, on average, coding exons in ∼280 genes and ∼5 known Mendelian disease genes that were not covered by SRS. In comparison to SRS, LRS detected additional rare, functionally annotated variants, including SVs and tandem repeats, and completely phased 87% of protein-coding genes. LRS detected additional de novo variants, and could be used to distinguish postzygotic mosaic variants from prezygotic de novo variants. Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates the potential of LRS to enhance diagnostic yield for rare monogenic diseases, implying utility in future clinical genomics workflows.
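The two run metrics quoted in this abstract, read N50 and average coverage, follow standard definitions: N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases, and average coverage is total sequenced bases divided by genome size. A minimal Python sketch of both is given here; the function names, the toy read lengths, and the toy genome size are illustrative assumptions, not values or code from the study.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    Reads are sorted from longest to shortest; the N50 is the length of
    the read at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_coverage(lengths, genome_size):
    """Return average depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Toy data (hypothetical): five reads over a 50 kb "genome".
toy_reads = [10_000, 20_000, 32_000, 40_000, 50_000]
toy_n50 = read_n50(toy_reads)
toy_depth = mean_coverage(toy_reads, genome_size=50_000)
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it substantially, which is why it is usually reported alongside coverage rather than instead of it.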

23
Gong B, Li D, Łabaj PP, Pan B, Novoradovskaya N, Thierry-Mieg D, Thierry-Mieg J, Chen G, Bergstrom Lucas A, LoCoco JS, Richmond TA, Tseng E, Kusko R, Happe S, Mercer TR, Pabón-Peña C, Salmans M, Tilgner HU, Xiao W, Johann DJ, Jones W, Tong W, Mason CE, Kreil DP, Xu J. Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing. Sci Data 2024; 11:892. [PMID: 39152166 PMCID: PMC11329654 DOI: 10.1038/s41597-024-03741-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 08/05/2024] [Indexed: 08/19/2024] [Open Access]
Abstract
Next-generation sequencing (NGS) has revolutionized genomic research by enabling high-throughput, cost-effective genome and transcriptome sequencing, accelerating personalized medicine for complex diseases, including cancer. Whole genome/transcriptome sequencing (WGS/WTS) provides comprehensive insights, while targeted sequencing is more cost-effective and sensitive. In comparison to short-read sequencing, which still dominates the field due to its high speed and cost-effectiveness, long-read sequencing can overcome alignment limitations and better discriminate similar sequences from alternative transcripts or repetitive regions. Hybrid sequencing combines the strengths of different technologies for a more comprehensive view of genomic/transcriptomic variations. Understanding each technology's strengths and limitations is critical for translating cutting-edge technologies into clinical applications. In this study, we sequenced DNA and RNA libraries of reference samples using various targeted DNA and RNA panels and the whole transcriptome on both short-read and long-read platforms. This study design enables a comprehensive analysis of sequencing technologies, targeting protocols, and library preparation methods. Our expanded profiling landscape establishes a reference point for assessing current sequencing technologies, facilitating informed decision-making in genomic research and precision medicine.
Affiliation(s)
- Binsheng Gong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Dan Li
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Paweł P Łabaj
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria
- Bohu Pan
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Guangchun Chen
  - Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hine Blvd., Dallas, TX, 75390, USA
- Anne Bergstrom Lucas
  - Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
- Todd A Richmond
  - Market & Application Development Bioinformatics, Roche Sequencing Solutions Inc., 4300 Hacienda Dr., Pleasanton, CA, 94588, USA
- Rebecca Kusko
  - Cellino Bio, 750 Main Street, Cambridge, MA, 02143, USA
- Scott Happe
  - Agilent Technologies, Inc., 1834 State Hwy 71 West, Cedar Creek, TX, 78612, USA
- Timothy R Mercer
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD, Australia
- Carlos Pabón-Peña
  - Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
- Hagen U Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Wenzhong Xiao
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
  - Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
- Donald J Johann
  - Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham St., Little Rock, AR, 72205, USA
- Wendell Jones
  - Q squared Solutions Genomics, 2400 Elis Road, Durham, NC, 27703, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA.
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
  - The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA.
- David P Kreil
  - Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria.
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA.

24
Luo C, Liu YH, Zhou XM. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2024; 15:6956. [PMID: 39138168 PMCID: PMC11322167 DOI: 10.1038/s41467-024-51282-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024] [Open Access]
Abstract
Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.
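Recall, precision, and F1 are the benchmark quantities this abstract reports for SV calling. As a hedged illustration of how the three relate, the Python sketch below derives them from counts of true-positive, false-positive, and false-negative calls. The class name `SvBenchmark` and the example counts are hypothetical; they are not taken from the VolcanoSV evaluation, and the sketch does not model how calls are matched to a truth set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvBenchmark:
    """Confusion counts for one SV call set compared with a truth set."""

    true_positives: int   # calls that match a truth-set SV
    false_positives: int  # calls with no matching truth-set SV
    false_negatives: int  # truth-set SVs that were not called

    @property
    def precision(self) -> float:
        called = self.true_positives + self.false_positives
        return self.true_positives / called if called else 0.0

    @property
    def recall(self) -> float:
        truth = self.true_positives + self.false_negatives
        return self.true_positives / truth if truth else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        denom = (2 * self.true_positives
                 + self.false_positives
                 + self.false_negatives)
        return 2 * self.true_positives / denom if denom else 0.0


# Hypothetical counts, for illustration only.
example = SvBenchmark(true_positives=90, false_positives=10, false_negatives=30)
```

F1 penalizes an imbalance between the two rates, so a caller cannot score well by maximizing recall at the expense of precision or the reverse.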
Affiliation(s)
- Can Luo
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yichen Henry Liu
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
  - Data Science Institute, Vanderbilt University, Nashville, TN, USA.

25
Liu S, Obert C, Yu YP, Zhao J, Ren BG, Liu JJ, Wiseman K, Krajacich BJ, Wang W, Metcalfe K, Smith M, Ben-Yehezkel T, Luo JH. Utility analyses of AVITI sequencing chemistry. BMC Genomics 2024; 25:778. [PMID: 39127634 DOI: 10.1186/s12864-024-10686-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 08/02/2024] [Indexed: 08/12/2024] [Open Access]
Abstract
BACKGROUND: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options, and prohibitive costs associated with high-throughput parallel short-read sequencing persist.
RESULTS: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences' AVITI and Illumina's NextSeq 550 chemistries. Similar comparisons were made for synthetic long-read sequencing of RNA and targeted single-cell transcripts between the AVITI and Illumina's NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had an average 32.5% lower error rate than the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms' respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate.
CONCLUSION: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.
Affiliation(s)
- Silvia Liu
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA.
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, USA.
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, USA.
- Caroline Obert
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Yan-Ping Yu
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, USA
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, USA
- Junhua Zhao
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Bao-Guo Ren
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, USA
- Jia-Jun Liu
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, USA
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, USA
- Kelly Wiseman
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Benjamin J Krajacich
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Wenjia Wang
  - Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, USA
- Kyle Metcalfe
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Mat Smith
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Tuval Ben-Yehezkel
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA, 92121, USA
- Jian-Hua Luo
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA.
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, USA.
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, USA.

26
Qi G, Battle A. Computational methods for allele-specific expression in single cells. Trends Genet 2024:S0168-9525(24)00169-0. [PMID: 39127549 DOI: 10.1016/j.tig.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 08/12/2024]
Abstract
Allele-specific expression (ASE) is a powerful signal that can be used to investigate multiple molecular mechanisms, such as cis-regulatory effects and imprinting. Single-cell RNA-sequencing (scRNA-seq) enables ASE characterization at the resolution of individual cells. In this review, we highlight the computational methods for processing and analyzing single-cell ASE data. We first describe a bioinformatics pipeline, synthesized from previous literature, to obtain ASE counts from raw reads. We then discuss statistical methods for detecting allelic imbalance and its variability across conditions using scRNA-seq data. In addition, we describe other methods that use single-cell ASE to address specific biological questions. Finally, we discuss future directions and emphasize the need for an integrated, optimized bioinformatics pipeline, and further development of statistical methods for different technologies.
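To make "detecting allelic imbalance" concrete, the Python sketch below applies an exact two-sided binomial test to reference and alternate allele read counts at a single heterozygous site, with a null hypothesis of balanced (50/50) expression. This is a generic textbook baseline chosen for illustration; it is an assumption here and not necessarily one of the methods the review discusses. The function name and the counts are hypothetical, and the sketch ignores overdispersion and mapping bias, which dedicated ASE methods typically model.

```python
from math import comb


def allelic_imbalance_pvalue(ref_count: int, alt_count: int) -> float:
    """Exact two-sided binomial p-value for H0: P(ref allele) = 0.5.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-allele count under the balanced null.
    """
    if ref_count < 0 or alt_count < 0:
        raise ValueError("counts must be non-negative")
    total = ref_count + alt_count
    if total == 0:
        return 1.0  # no reads, no evidence of imbalance
    pmf = [comb(total, k) / 2**total for k in range(total + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at one heterozygous SNP in one cell type.
balanced = allelic_imbalance_pvalue(ref_count=5, alt_count=5)
skewed = allelic_imbalance_pvalue(ref_count=10, alt_count=0)
```

Per-cell counts in scRNA-seq are often very low, so a test like this usually has little power on single cells; pooling cells or sharing information across them is one reason more specialized statistical methods exist.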
Affiliation(s)
- Guanghao Qi
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.

27
Plender EG, Prodanov T, Hsieh P, Nizamis E, Harvey WT, Sulovari A, Munson KM, Kaufman EJ, O'Neal WK, Valdmanis PN, Marschall T, Bloom JD, Eichler EE. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. Am J Hum Genet 2024; 111:1700-1716. [PMID: 38991590 PMCID: PMC11344006 DOI: 10.1016/j.ajhg.2024.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 06/14/2024] [Accepted: 06/17/2024] [Indexed: 07/13/2024] [Open Access]
Abstract
The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
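The abstract reports a deviation from Hardy-Weinberg equilibrium (HWE) at p < 0.05. As a rough illustration of what such a test checks, the Python sketch below computes the classical chi-square goodness-of-fit statistic for a biallelic locus and compares it with 3.841, the 5% critical value of the chi-square distribution with one degree of freedom. Collapsing haplogroups into two classes (for example H3 versus non-H3), the function name, and the genotype counts are all assumptions made for this example; the abstract does not state which HWE test the authors used.

```python
CHI2_CRITICAL_DF1_ALPHA05 = 3.841  # 5% critical value, 1 degree of freedom


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing genotype counts with HWE expectations.

    n_aa and n_bb are homozygote counts and n_ab is the heterozygote count.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotyped sample")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: an excess of heterozygotes relative to HWE.
stat = hwe_chi_square(n_aa=10, n_ab=80, n_bb=10)
deviates = stat > CHI2_CRITICAL_DF1_ALPHA05
```

An excess of heterozygotes, as in the toy counts, is the direction of deviation that would be compatible with the heterozygote-advantage interpretation mentioned in the abstract, although a significant statistic alone does not indicate the direction.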
Affiliation(s)
- Elizabeth G Plender
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timofey Prodanov
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Genetics, Cell Biology, and Development, University of Minnesota Medical School, Minneapolis, MN 55455, USA
- Evangelos Nizamis
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Eli J Kaufman
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Wanda K O'Neal
  - Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Paul N Valdmanis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany; Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- Jesse D Bloom
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.

28
Lewis SA, Ruttenberg A, Iyiyol T, Kong N, Jin SC, Kruer MC. Potential clinical applications of advanced genomic analysis in cerebral palsy. EBioMedicine 2024; 106:105229. [PMID: 38970919 PMCID: PMC11282942 DOI: 10.1016/j.ebiom.2024.105229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/26/2024] [Accepted: 06/20/2024] [Indexed: 07/08/2024] [Open Access]
Abstract
Cerebral palsy (CP) has historically been attributed to acquired insults, but emerging research suggests that genetic variations are also important causes of CP. Although studies based on microarrays and whole-exome sequencing have been the primary means of establishing new CP-gene relationships and providing a genetic etiology for individual patients, the cause of the condition remains unknown for many patients with CP. Recent advancements in genomic technologies offer additional opportunities to uncover variations in human genomes, transcriptomes, and epigenomes that have previously escaped detection. In this review, we outline the use of these state-of-the-art technologies to address the molecular diagnostic challenges experienced by individuals with CP. We also explore the importance of identifying a molecular etiology whenever possible, given the potential for genomic medicine to provide opportunities to treat patients with CP in new and more precise ways.
Affiliation(s)
- Sara A Lewis
  - Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States
- Andrew Ruttenberg
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Tuğçe Iyiyol
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Nahyun Kong
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States
- Sheng Chih Jin
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, United States; Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, United States.
- Michael C Kruer
  - Pediatric Movement Disorders Program, Barrow Neurological Institute, Phoenix Children's Hospital, Phoenix, AZ, United States; Departments of Child Health, Neurology, and Cellular & Molecular Medicine and Program in Genetics, University of Arizona College of Medicine, Phoenix, AZ, United States; Programs in Neuroscience and Molecular & Cellular Biology, School of Life Sciences, Arizona State University, Tempe, AZ, United States.

29
Huq A, Thompson B, Winship I. Clinical application of whole genome sequencing in young onset dementia: challenges and opportunities. Expert Rev Mol Diagn 2024; 24:659-675. [PMID: 39135326 DOI: 10.1080/14737159.2024.2388765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 08/01/2024] [Indexed: 08/30/2024]
Abstract
INTRODUCTION: Young onset dementia (YOD) by its nature is difficult to diagnose. Despite the involvement of multidisciplinary neurogenetics services, patients with YOD and their families face significant diagnostic delays. Genetic testing for people with YOD currently involves a staggered, iterative approach. There is currently no optimal single genetic investigation that simultaneously identifies the different genetic variants resulting in YOD.
AREAS COVERED: This review discusses the advances in clinical genomic testing for people with YOD. Whole genome sequencing (WGS) can be employed as a 'one stop shop' genomic test for YOD. In addition to single nucleotide variants, WGS can reliably detect structural variants, short tandem repeat expansions, and mitochondrial genetic variants, as well as capture single nucleotide polymorphisms for the calculation of polygenic risk scores.
EXPERT OPINION: WGS, when used as the initial genetic test, can enhance the likelihood of a precision diagnosis and shorten the time taken to reach it. Finding a clinical diagnosis using WGS can reduce invasive and expensive investigations and could be cost-effective. These advances need to be balanced against the limitations of the technology and the genetic counseling needs of these vulnerable patients and their families.
Affiliation(s)
- Aamira Huq
  - Department of Genomic Medicine, Royal Melbourne Hospital, Parkville, Victoria, Australia
  - Department of Medicine, University of Melbourne, Parkville, Victoria, Australia
- Bryony Thompson
  - Department of Medicine, Royal Melbourne Hospital, Parkville, Victoria, Australia
  - Department of Pathology, University of Melbourne, Parkville, Victoria, Australia
- Ingrid Winship
  - Department of Genomic Medicine, Royal Melbourne Hospital, Parkville, Victoria, Australia
  - Department of Medicine, University of Melbourne, Parkville, Victoria, Australia

30
Boukoura S, Larsen DH. Nucleolar organization and ribosomal DNA stability in response to DNA damage. Curr Opin Cell Biol 2024; 89:102380. [PMID: 38861757 DOI: 10.1016/j.ceb.2024.102380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Eukaryotic nuclei are structured into sub-compartments orchestrating various cellular functions. The nucleolus is the largest nuclear organelle: a biomolecular condensate with an architecture composed of immiscible fluids facilitating ribosome biogenesis. The nucleolus forms upon the transcription of the repetitive ribosomal RNA genes (rDNA) that cluster in this compartment. rDNA is intrinsically unstable and prone to rearrangements and copy number variation. Upon DNA damage, a specialized nucleolar-DNA Damage Response (n-DDR) is activated: nucleolar transcription is inhibited, the architecture is rearranged, and rDNA is relocated to the nucleolar periphery. Recent data have highlighted how the composition, structure, and chemical and physical properties of nucleoli contribute to rDNA stability. In this mini-review we focus on recent data that start to reveal how nucleolar composition and the n-DDR work together to ensure rDNA integrity.
Affiliation(s)
- Stavroula Boukoura
  - Nucleolar Stress and Disease Group, Danish Cancer Institute, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Dorthe Helena Larsen
  - Nucleolar Stress and Disease Group, Danish Cancer Institute, Strandboulevarden 49, 2100 Copenhagen, Denmark.

31
Liu L, Zhan J, Yan J. Engineering the future cereal crops with big biological data: toward intelligence-driven breeding by design. J Genet Genomics 2024; 51:781-789. [PMID: 38531485 DOI: 10.1016/j.jgg.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 03/17/2024] [Accepted: 03/17/2024] [Indexed: 03/28/2024]
Abstract
How to feed a human population of 10 billion is one of the challenges that must be addressed in the coming decades, especially under unpredictable climate change. Crop breeding, which began with phenotype-based selection by local farmers and has developed into current biotechnology-based breeding, has played a critical role in securing the global food supply. However, given the changing environment and ever-increasing human population, can we breed outstanding crop varieties fast enough to achieve high productivity, good quality, and widespread adaptability? This review outlines the recent achievements in understanding cereal crop breeding, including the current knowledge about crop agronomic traits, newly developed techniques, crop big biological data research, and the possibility of integrating them for intelligence-driven breeding by design, which ushers in a new era of crop breeding practice and shapes the novel architecture of future crops. This review focuses on the major cereal crops, including rice, maize, and wheat, to explain how intelligence-driven breeding by design is becoming a reality.
Affiliation(s)
- Lei Liu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Jimin Zhan
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
|
32
|
Duan J, Pan S, Ye Y, Hu Z, Chen L, Liang D, Fu T, Zhan L, Li Z, Liao J, Zhao X. Uncovering hidden genetic variations: long-read sequencing reveals new insights into tuberous sclerosis complex. Front Cell Dev Biol 2024; 12:1415258. [PMID: 39144255 PMCID: PMC11321964 DOI: 10.3389/fcell.2024.1415258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 07/10/2024] [Indexed: 08/16/2024] Open
Abstract
Background: Tuberous sclerosis is a multi-system disorder caused by mutations in either TSC1 or TSC2. The majority of affected patients (85%-90%) have heterozygous variants, and a smaller number (around 5%) have mosaic variants. Despite the use of various techniques, some patients still have "no mutation identified" (NMI). Methods: We hypothesized that the causal variants of patients with NMI may be structural variants or deep intronic variants. To investigate this, we sequenced the DNA of 26 tuberous sclerosis patients with NMI using targeted long-read sequencing. Results: We identified likely pathogenic/pathogenic variants in 13 of the cases, of which six were large deletions, four were InDels, two were deep intronic variants, one had a retrotransposon insertion in either TSC1 or TSC2, and one was a complex rearrangement. Furthermore, there was a de novo Alu element insertion with a high suspicion of pathogenicity that was classified as a variant of unknown significance. Conclusion: Our findings expand the current knowledge of known pathogenic variants related to tuberous sclerosis, particularly uncovering mosaic complex structural variations and retrotransposon insertions that have not been previously reported in tuberous sclerosis. Our findings suggest a higher prevalence of mosaicism among tuberous sclerosis patients than previously recognized. Our results indicate that long-read sequencing is a valuable approach for tuberous sclerosis cases with NMI.
Affiliation(s)
- Jing Duan
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
- Yuanzhen Ye
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
- Zhanqi Hu
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
- Li Chen
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
- Dachao Liang
  - Shenzhen A-Smart Medical Research Center, Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, Guangdong, China
- Tao Fu
  - Shenzhen A-Smart Medical Research Center, Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, Guangdong, China
- Zhuo Li
  - Shenzhen A-Smart Medical Research Center, Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, Guangdong, China
- Jianxiang Liao
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
- Xia Zhao
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, Guangdong, China
|
33
|
Liu C, Wu P, Wu X, Zhao X, Chen F, Cheng X, Zhu H, Wang O, Xu M. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline. Front Genet 2024; 15:1421565. [PMID: 39130747 PMCID: PMC11310137 DOI: 10.3389/fgene.2024.1421565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024] Open
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct the complete haplotypes due to the high sequencing error rates of long reads and the limited capturing efficiency of co-barcoded reads. We here present a pipeline, AsmMix, for generating both contiguous and accurate diploid genomes. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Affiliation(s)
- Chao Liu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Pei Wu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Xue Wu
  - BGI Research, Shenzhen, China
- Hongmei Zhu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Ou Wang
  - BGI Research, Shenzhen, China
- Mengyang Xu
  - BGI Research, Shenzhen, China
  - BGI Research, Qingdao, China
|
34
|
Cornejo-Corona I, Boland DJ, Devarenne TP. Method for isolation of high molecular weight genomic DNA from Botryococcus biomass. PLoS One 2024; 19:e0301680. [PMID: 39046949 PMCID: PMC11268603 DOI: 10.1371/journal.pone.0301680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/19/2024] [Indexed: 07/27/2024] Open
Abstract
The development of high molecular weight (HMW) genomic DNA (gDNA) extraction protocols for non-model species is essential to fully exploit long-read sequencing technologies in order to generate genome assemblies that can help answer complex questions about these organisms. Obtaining enough high-quality HMW gDNA can be challenging for these species, especially for tissues rich in polysaccharides such as biomass from species within the Botryococcus genus. The existing protocols based on column-based DNA extraction and biochemical lysis kits can be inefficient and may not be useful due to variations in biomass polysaccharide content. We developed an optimized protocol for the efficient extraction of HMW gDNA from Botryococcus biomass for use in long-read sequencing technologies. The protocol utilized an initial wash step with sorbitol to remove polysaccharides and yielded HMW gDNA concentrations up to 220 ng/μL with high purity. We then demonstrated the suitability of the HMW gDNA isolated from this protocol for long-read sequencing on the Oxford Nanopore PromethION platform for three Botryococcus species. Our protocol can be used as a standard for efficient HMW gDNA extraction in microalgae rich in polysaccharides and may be adapted for other challenging species.
Affiliation(s)
- Ivette Cornejo-Corona
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America
- Devon J. Boland
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America
- Timothy P. Devarenne
  - Biochemistry and Biophysics, Texas A&M University, College Station, Texas, United States of America
|
35
|
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
|
36
|
Liu Z, Xie Z, Li M. Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data. Genome Biol 2024; 25:188. [PMID: 39010145 PMCID: PMC11247875 DOI: 10.1186/s13059-024-03324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/26/2024] [Indexed: 07/17/2024] Open
Abstract
BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking. CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
Affiliation(s)
- Zhi Liu
  - Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
  - Program in Bioinformatics, Zhongshan School of Medicine, The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
  - Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China
  - Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, China
|
37
|
Kolesnikov A, Cook D, Nattestad M, Brambrink L, McNulty B, Gorzynski J, Goenka S, Ashley EA, Jain M, Miga KH, Paten B, Chang PC, Carroll A, Shafin K. Local read haplotagging enables accurate long-read small variant calling. Nat Commun 2024; 15:5907. [PMID: 39003259 PMCID: PMC11246426 DOI: 10.1038/s41467-024-50079-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/28/2024] [Indexed: 07/15/2024] Open
Abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they are introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
Affiliation(s)
- Daniel Cook
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Brandy McNulty
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Miten Jain
  - Northeastern University, Boston, MA, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Pi-Chuan Chang
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Andrew Carroll
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Kishwar Shafin
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
|
38
|
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
The completion of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieve localized haplotype phasing and rely on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To address this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, we provide a novel methodology that enables interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
Affiliation(s)
- Xiuzhen Bai
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Zonggui Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Kexuan Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Zixin Wu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Rui Wang
  - Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
- Jun'e Liu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Liang Chang
  - State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
  - Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Lu Wen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
|
39
|
Luan T, Commichaux S, Hoffmann M, Jayeola V, Jang JH, Pop M, Rand H, Luo Y. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates. BMC Genomics 2024; 25:679. [PMID: 38978005 PMCID: PMC11232133 DOI: 10.1186/s12864-024-10582-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered, i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
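As a rough check on the accuracy figures quoted in this abstract, the following Python sketch converts a per-base accuracy into an expected number of erroneous bases for a 4.8 Mbp genome. It is illustrative only: the function name `expected_errors` is hypothetical, errors are assumed to be spread uniformly over the genome, and the calculation ignores the abstract's exclusion of low confidence regions.

```python
def expected_errors(genome_length_bp: int, accuracy_percent: float) -> float:
    """Expected number of erroneous bases for a given per-base accuracy.

    Assumes a uniform, independent per-base error rate (a simplification).
    """
    error_rate = 1.0 - accuracy_percent / 100.0
    return genome_length_bp * error_rate


# Genome size taken from the abstract (4.8 Mbp).
GENOME_BP = 4_800_000

unpolished = expected_errors(GENOME_BP, 99.95)    # accuracy quoted for nanopore assemblies
polished = expected_errors(GENOME_BP, 99.9999)    # "near perfect" accuracy after polishing

print(f"99.95% accuracy:   about {unpolished:.0f} errors")
print(f"99.9999% accuracy: about {polished:.1f} errors")
```

Under these assumptions, 99.95% accuracy corresponds to roughly 2,400 errors across the genome, while 99.9999% corresponds to roughly 5, which matches the "~5 nucleotide errors" stated in the abstract.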
Affiliation(s)
- Tu Luan
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Seth Commichaux
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA
- Maria Hoffmann
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Victor Jayeola
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Jae Hee Jang
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Mihai Pop
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Hugh Rand
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Yan Luo
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
|
40
|
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Junfan Zhao
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiao Gong
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
|
41
|
Jia H, Tan S, Cai Y, Guo Y, Shen J, Zhang Y, Ma H, Zhang Q, Chen J, Qiao G, Ruan J, Zhang YE. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat Commun 2024; 15:5644. [PMID: 38969648 PMCID: PMC11226609 DOI: 10.1038/s41467-024-49992-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 06/20/2024] [Indexed: 07/07/2024] Open
Abstract
Long-read sequencing, exemplified by PacBio, has revolutionized genomics by overcoming challenges such as repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises broad adoption of PacBio sequencing, not only for mutational studies of flies and their symbionts but also for explorations of other small organisms or precious samples.
Collapse
Affiliation(s)
- Hangxing Jia
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Shengjun Tan
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Yingao Cai
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yanyan Guo
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jieyu Shen
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yaqiong Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Huijing Ma
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Qingzhu Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinfeng Chen
  - University of Chinese Academy of Sciences, Beijing, China
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Gexia Qiao
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- Yong E Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
  - University of Chinese Academy of Sciences, Beijing, China.
|
42
|
Jia H, Tan S, Zhang YE. Chasing Sequencing Perfection: Marching Toward Higher Accuracy and Lower Costs. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae024. [PMID: 38991976 PMCID: PMC11423848 DOI: 10.1093/gpbjnl/qzae024] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/23/2023] [Revised: 01/25/2024] [Accepted: 01/29/2024] [Indexed: 07/13/2024]
Abstract
Next-generation sequencing (NGS), represented by Illumina platforms, has been an essential cornerstone of basic and applied research. However, the sequencing error rate of 1 per 1000 bp (10⁻³) represents a serious hurdle for research areas focusing on rare mutations, such as somatic mosaicism or microbe heterogeneity. By examining the high-fidelity sequencing methods developed in the past decade, we summarized three major factors underlying errors and the corresponding 12 strategies mitigating these errors. We then proposed a novel framework to classify 11 preexisting representative methods according to the corresponding combinatory strategies and identified three trends that emerged during methodological developments. We further extended this analysis to eight long-read sequencing methods, emphasizing error reduction strategies. Finally, we suggest two promising future directions that could achieve comparable or even higher accuracy with lower costs in both NGS and long-read sequencing.
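As a rough illustration of why a per-base error rate of 1 per 1000 bp matters for rare-variant work, the following sketch converts that rate into a Phred quality score and into the expected number of errors per read. The 150-bp read length, the assumption of independent errors, and the function names are chosen for illustration only and do not come from the review.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred score Q = -10 * log10(p) for a per-base error probability p."""
    return -10.0 * math.log10(error_rate)


def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases in one read (independent errors)."""
    return read_length * error_rate


def prob_error_free(read_length: int, error_rate: float) -> float:
    """Probability that a read contains no error at all (independent errors)."""
    return (1.0 - error_rate) ** read_length


if __name__ == "__main__":
    rate = 1e-3        # error rate quoted in the abstract
    read_length = 150  # assumed read length, for illustration only
    print(f"Phred quality: Q{phred_quality(rate):.0f}")
    print(f"Expected errors per read: {expected_errors(read_length, rate):.2f}")
    print(f"Fraction of error-free reads: {prob_error_free(read_length, rate):.3f}")
```

Under these assumptions, a true variant whose frequency is near or below the error rate is hard to separate from sequencing noise, which is the hurdle the abstract describes.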
Affiliation(s)
- Hangxing Jia
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Shengjun Tan
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Yong E Zhang
  - CAS Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
43
|
Umair M, Alharbi M, Aloyouni E, Al Abdulrahman A, Aldrees M, Al Tuwaijri A, Bilal M, Alfadhel M. Mutated neuron navigator 3 as a candidate gene for a rare neurodevelopmental disorder. Mol Genet Genomic Med 2024; 12:e2473. [PMID: 39038237 PMCID: PMC11262617 DOI: 10.1002/mgg3.2473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 05/13/2024] [Accepted: 05/16/2024] [Indexed: 07/24/2024]
Abstract
BACKGROUND: Neuron navigator 3 (NAV3) is a member of the neuron navigator family of proteins (NAV1, NAV2, NAV3), which are predominantly expressed in the nervous system. The NAV3-encoded protein comprises conserved AAA and coiled-coil domains characteristic of ATPases, which are associated with different cellular activities. METHODS: We describe a Saudi proband presenting with a complex recessive neurodevelopmental disorder (NDD). Whole exome sequencing (WES) followed by Sanger sequencing, 3D protein modeling and RT-qPCR was performed. RESULTS: WES revealed a bi-allelic frameshift variant (c.2604_2605delAG; p.Val870SerfsTer12) in exon 12 of the NAV3 gene. Furthermore, RT-qPCR revealed a significant decrease in NAV3 mRNA expression in the patient sample, and 3D protein modeling revealed disruption of the overall secondary structure. CONCLUSION: For the first time, we associate a bi-allelic variant in the NAV3 gene with NDD in humans.
Affiliation(s)
- Muhammad Umair
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Meshael Alharbi
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Essra Aloyouni
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Abdulkareem Al Abdulrahman
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Mohammed Aldrees
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
- Abeer Al Tuwaijri
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
  - Clinical Laboratory Sciences Department, College of Applied Medical Sciences, KSAU-HS, Riyadh, Saudi Arabia
- Muhammad Bilal
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Karachi, Pakistan
- Majid Alfadhel
  - Medical Genomics Research Department, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGH), Riyadh, Saudi Arabia
  - Genetics and Precision Medicine Department, King Abdullah Specialized Children Hospital (KASCH), MNGHA, Riyadh, Saudi Arabia
|
44
|
Brankovic M, Ivanovic V, Basta I, Khang R, Lee E, Stevic Z, Ralic B, Tubic R, Seo G, Markovic V, Bozovic I, Svetel M, Marjanovic A, Veselinovic N, Mesaros S, Jankovic M, Savic-Pavicevic D, Jovin Z, Novakovic I, Lee H, Peric S. Whole exome sequencing in Serbian patients with hereditary spastic paraplegia. Neurogenetics 2024; 25:165-177. [PMID: 38499745 DOI: 10.1007/s10048-024-00755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/08/2024] [Indexed: 03/20/2024]
Abstract
Hereditary spastic paraplegia (HSP) is a group of neurodegenerative diseases with high genetic and clinical heterogeneity. Numerous HSP patients remain genetically undiagnosed despite screening for known genetic causes of HSP. Therefore, identification of novel variants and genes is needed. Our previous study analyzed 74 adult Serbian HSP patients from 65 families using a panel of the 13 most common HSP genes in combination with a copy number variation analysis. Conclusive genetic findings were established in 23 patients from 19 families (29%). In the present study, nine patients from nine families previously negative on the HSP gene panel were selected for whole exome sequencing (WES). Further, 44 newly diagnosed adult HSP patients from 44 families were sent to WES directly, since many studies showed WES may be used as the first step in HSP diagnosis. WES analysis of cohort 1 revealed a likely genetic cause in five (56%) of nine HSP families, including variants in the ETHE1, ZFYVE26, RNF170, CAPN1, and WASHC5 genes. In cohort 2, possible causative variants were found in seven (16%) of 44 patients (later updated to 27% when other diagnoses were excluded), comprising six different genes: SPAST, SPG11, WASHC5, KIF1A, KIF5A, and ABCD1. These results expand the genetic spectrum of HSP patients in Serbia and the region, with implications for molecular genetic diagnosis and future causative therapies. A wide HSP panel, alongside copy number variation (CNV) analysis, can be the first step in diagnosis, with WES performed afterwards.
Affiliation(s)
- Marija Brankovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia.
- Vukan Ivanovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Ivana Basta
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Zorica Stevic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Radoje Tubic
  - Institute for Oncology and Radiology of Serbia, Belgrade, Serbia
- Vladana Markovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Ivo Bozovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Marina Svetel
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Ana Marjanovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Nikola Veselinovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Sarlota Mesaros
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Milena Jankovic
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Dusanka Savic-Pavicevic
  - Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Zita Jovin
  - Neurology Clinic, University Clinical Center of Vojvodina, Novi Sad, Serbia
- Ivana Novakovic
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Hane Lee
  - 3Billion, Inc., Seoul, South Korea
- Stojan Peric
  - Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
  - Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
|
45
|
Yuen ZWS, Shanmuganandam S, Stanley M, Jiang S, Hein N, Daniel R, McNevin D, Jack C, Eyras E. Profiling age and body fluid DNA methylation markers using nanopore adaptive sampling. Forensic Sci Int Genet 2024; 71:103048. [PMID: 38640705 DOI: 10.1016/j.fsigen.2024.103048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 04/07/2024] [Accepted: 04/11/2024] [Indexed: 04/21/2024]
Abstract
DNA methylation plays essential roles in regulating physiological processes, from tissue and organ development to gene expression and aging processes and has emerged as a widely used biomarker for the identification of body fluids and age prediction. Currently, methylation markers are targeted independently at specific CpG sites as part of a multiplexed assay rather than through a unified assay. Methylation detection is also dependent on divergent methodologies, ranging from enzyme digestion and affinity enrichment to bisulfite treatment, alongside various technologies for high-throughput profiling, including microarray and sequencing. In this pilot study, we test the simultaneous identification of age-associated and body fluid-specific methylation markers using a single technology, nanopore adaptive sampling. This innovative approach enables the profiling of multiple CpG marker sites across entire gene regions from a single sample without the need for specialized DNA preparation or additional biochemical treatments. Our study demonstrates that adaptive sampling achieves sufficient coverage in regions of interest to accurately determine the methylation status, shows a robust consistency with whole-genome bisulfite sequencing data, and corroborates known CpG markers of age and body fluids. Our work also resulted in the identification of new sites strongly correlated with age, suggesting new possible age methylation markers. This study lays the groundwork for the systematic development of nanopore-based methodologies in both age prediction and body fluid identification, highlighting the feasibility and potential of nanopore adaptive sampling while acknowledging the need for further validation and expansion in future research.
Affiliation(s)
- Zaka Wing-Sze Yuen
  - EMBL Australia Partner Laboratory Network, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Shine-Dalgarno Centre for RNA Innovation, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Centre for Computational Biomedical Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, Australia
- Somasundhari Shanmuganandam
  - Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Maurice Stanley
  - Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Simon Jiang
  - Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia; Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia; Department of Renal Medicine, The Canberra Hospital, Canberra, ACT 2605, Australia
- Nadine Hein
  - ACRF Department of Cancer Biology and Therapeutics and Division of Genome Sciences and Cancer, John Curtin School of Medical Research, Australian National University, Acton, Canberra, Australia
- Runa Daniel
  - Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology, Queensland, Australia
- Dennis McNevin
  - Centre for Forensic Science, School of Mathematical & Physical Sciences, Faculty of Science, University of Technology Sydney, Sydney, Australia
- Cameron Jack
  - ANU Bioinformatics Consultancy, John Curtin School of Medical Research, The Australian National University, Canberra, Australia
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Shine-Dalgarno Centre for RNA Innovation, John Curtin School of Medical Research, The Australian National University, Canberra, Australia; The Centre for Computational Biomedical Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, Australia.
|
46
|
Trégouët DA, Morange PE. Next-generation sequencing strategies in venous thromboembolism: in whom and for what purpose? J Thromb Haemost 2024; 22:1826-1834. [PMID: 38641321 DOI: 10.1016/j.jtha.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
This invited review follows the oral presentation "To Sequence or Not to Sequence, That Is Not the Question; But 'When, Who, Which and What For?' Is" given during the State of the Art session "Translational Genomics in Thrombosis: From OMICs to Clinics" of the International Society on Thrombosis and Haemostasis 2023 Congress. Emphasizing the power of next-generation sequencing technologies and the diverse strategies associated with DNA variant analysis, this review highlights the unresolved questions and challenges in their implementation both for the clinical diagnosis of venous thromboembolism and in translational research.
Affiliation(s)
- David-Alexandre Trégouët
  - University of Bordeaux, Institut National de la Santé et de la Recherche Médicale, Bordeaux Population Health Research Center, Unité Mixte de Recherche 1219, Bordeaux, France.
- Pierre-Emmanuel Morange
  - Cardiovascular and Nutrition Research Center (Centre de Recherche en CardioVasculaire et Nutrition), Institut National de la Santé et de la Recherche Médicale, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Aix-Marseille University, Marseille, France
|
47
|
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
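To make the definition above concrete, the following minimal sketch scans a DNA string for perfect tandem repeats of a 1-6-bp motif. It is only an illustration of the definition, not one of the genotyping methods compared in the review: the function name `find_strs`, the default threshold of three copies, and the restriction to uninterrupted uppercase A/C/G/T repeats are all assumptions made here.

```python
import re


def find_strs(seq: str, min_copies: int = 3, max_motif: int = 6):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Only uninterrupted repeats of an uppercase A/C/G/T motif are reported;
    min_copies=3 is an illustrative threshold, not a standard.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # The lazy group picks the shortest motif at each start position;
    # the backreference then requires at least (min_copies - 1) more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence with a CAG repeat, the motif class expanded in Huntington disease.
    print(find_strs("GGTCAGCAGCAGCAGTTA"))
```

Real STR genotypers must additionally handle interrupted repeats, sequencing errors, and reads that do not span the whole repeat, which is the difficulty the abstract points to.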
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia.
|
48
|
Liu S, Obert C, Yu YP, Zhao J, Ren BG, Liu JJ, Wiseman K, Krajacich BJ, Wang W, Metcalfe K, Smith M, Ben-Yehezkel T, Luo JH. Utility Analyses of AVITI Sequencing Chemistry. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.18.590136. [PMID: 38712138 PMCID: PMC11071311 DOI: 10.1101/2024.04.18.590136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Background: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options and prohibitive costs associated with high-throughput parallel short-read sequencing persist. Results: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences' AVITI and Illumina's NextSeq 550 chemistries. Similar comparisons were evaluated for synthetic long-read sequencing for RNA and targeted single-cell transcripts between the AVITI and Illumina's NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had an average error rate 32.5% lower than that of the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms' respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate. Conclusion: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.
Affiliation(s)
- Silvia Liu
  - Department of Pathology, University of Pittsburgh School of Medicine, United States
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
- Caroline Obert
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Yan-Ping Yu
  - Department of Pathology, University of Pittsburgh School of Medicine, United States
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
- Junhua Zhao
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Bao-Guo Ren
  - Department of Pathology, University of Pittsburgh School of Medicine, United States
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
- Jia-Jun Liu
  - Department of Pathology, University of Pittsburgh School of Medicine, United States
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
- Kelly Wiseman
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Benjamin J Krajacich
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Wenjia Wang
  - Department of Biostatistics, University of Pittsburgh School of Public Health, United States
- Kyle Metcalfe
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Mat Smith
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Tuval Ben-Yehezkel
  - Element Biosciences Inc, 10055 Barnes Canyon Road, Suite 100, San Diego, CA 92121, United States
- Jian-Hua Luo
  - Department of Pathology, University of Pittsburgh School of Medicine, United States
  - High Throughput Genome Center, University of Pittsburgh School of Medicine, United States
  - Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, United States
|
49
|
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. BIOINFORMATICS ADVANCES 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
Summary: Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. Availability and implementation: All analysis done in the paper was based on publicly available data from the publication of each accessed tool.
Affiliation(s)
- Jiaying Lai
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
  - Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
|
50
|
Wu X, Lu M, Yun D, Gao S, Sun F. Long-read single-cell sequencing reveals the transcriptional landscape of spermatogenesis in obstructive azoospermia and Sertoli cell-only patients. QJM 2024; 117:422-435. [PMID: 38192002 DOI: 10.1093/qjmed/hcae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 12/16/2023] [Indexed: 01/10/2024]
Abstract
BACKGROUND: High-throughput single-cell RNA sequencing (scRNA-seq) is widely used in spermatogenesis. However, it only reveals short reads in germ and somatic cells, limiting the discovery of novel transcripts and genes. AIM: This study shows the long-read transcriptional landscape of spermatogenesis in obstructive azoospermia (OA) and Sertoli cell-only patients. DESIGN: Single cells were isolated from testicular biopsies of OA and non-obstructive azoospermia (NOA) patients. Cell culture was identified by comparing PacBio long-read single-cell sequencing (OA n = 3, NOA n = 3) with short-read scRNA-seq (OA n = 6, NOA n = 6). Ten germ cell types and eight somatic cell types were classified based on known markers. METHODS: PacBio long-read single-cell sequencing, short-read scRNA-seq, polymerase chain reaction. RESULTS: A total of 130 426 long-read transcripts (100 517 novel transcripts and 29 909 known transcripts) and 49 508 long-read transcripts (26 002 novel transcripts and 23 506 known transcripts) were detected in OA and NOA patients, respectively. Moreover, 36 373 and 1642 new genes were identified in OA and NOA patients, respectively. Importantly, specific expression of long-read transcripts was detected in germ and somatic cells during normal spermatogenesis. CONCLUSION: We identified full-length transcripts in OA and NOA patients and found new genes. Furthermore, specifically expressed full-length transcripts were detected, and the genomic structure of transcripts was mapped in different cell types. These findings may provide valuable information on human spermatogenesis and the treatment of male infertility.
Affiliation(s)
- X Wu
  - Department of Urology and Andrology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- M Lu
  - Department of Urology and Andrology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- D Yun
  - Institute of Reproductive Medicine, Medical School of Nantong University, Nantong, Jiangsu, China
- S Gao
  - Institute of Reproductive Medicine, Medical School of Nantong University, Nantong, Jiangsu, China
- F Sun
  - Department of Urology and Andrology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
|