1
|
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Hwang JP, Bhamidipati SV, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. CELL GENOMICS 2024; 4:100590. [PMID: 38908378 PMCID: PMC11293582 DOI: 10.1016/j.xgen.2024.100590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/13/2023] [Revised: 12/27/2023] [Accepted: 05/31/2024] [Indexed: 06/24/2024]
Abstract
The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.
Collapse
Affiliation(s)
| | | | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | - Ming Yin Lun
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | | | - KyungHee Park
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Eva Benito
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Medhat Mahmoud
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - James Paul Hwang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sravya V Bhamidipati
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | | | | | - Sissel Juul
- Oxford Nanopore Technologies, New York, NY 10013, USA
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Fritz J Sedlazeck
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Computer Science, Rice University, Houston TX 77030, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA
| | - James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA
| | | |
Collapse
|
2
|
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
The successful accomplishment of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieved localized haplotype phasing and relied on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To shed light on this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and acheived a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, here we provided a novel methodology that enabled interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polypoid genomes.
Collapse
Affiliation(s)
- Xiuzhen Bai
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
| | - Zonggui Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Changping Laboratory, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Kexuan Chen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
| | - Zixin Wu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Rui Wang
- Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
| | - Jun'e Liu
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
- School of Life Sciences, Peking University, Beijing, China
| | - Liang Chang
- State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
- Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Lu Wen
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
- Changping Laboratory, Beijing, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China.
- Changping Laboratory, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
- School of Life Sciences, Peking University, Beijing, China.
| |
Collapse
|
3
|
Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024] Open
Abstract
The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylations extensively exist in Human genomes and the advent of long-read technologies enabled direct report of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4-98.7% To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser on blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser .
Collapse
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | | | - Sissel Juul
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
- Department of Bioengineering, Rice University, Houston, TX, USA.
| | - Fritz J Sedlazeck
- Department of Computer Science, Rice University, Houston, TX, USA.
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
| |
Collapse
|
4
|
Ding W, Li X, Zhang J, Ji M, Zhang M, Zhong X, Cao Y, Liu X, Li C, Xiao C, Wang J, Li T, Yu Q, Mo F, Zhang B, Qi J, Yang JC, Qi J, Tian L, Xu X, Peng Q, Zhou WZ, Liu Z, Fu A, Zhang X, Zhang JJ, Sun Y, Hu B, An NA, Zhang L, Li CY. Adaptive functions of structural variants in human brain development. SCIENCE ADVANCES 2024; 10:eadl4600. [PMID: 38579006 DOI: 10.1126/sciadv.adl4600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 03/01/2024] [Indexed: 04/07/2024]
Abstract
Quantifying the structural variants (SVs) in nonhuman primates could provide a niche to clarify the genetic backgrounds underlying human-specific traits, but such resource is largely lacking. Here, we report an accurate SV map in a population of 562 rhesus macaques, verified by in-house benchmarks of eight macaque genomes with long-read sequencing and another one with genome assembly. This map indicates stronger selective constrains on inversions at regulatory regions, suggesting a strategy for prioritizing them with the most important functions. Accordingly, we identified 75 human-specific inversions and prioritized them. The top-ranked inversions have substantially shaped the human transcriptome, through their dual effects of reconfiguring the ancestral genomic architecture and introducing regional mutation hotspots at the inverted regions. As a proof of concept, we linked APCDD1, located on one of these inversions and down-regulated specifically in humans, to neuronal maturation and cognitive ability. We thus highlight inversions in shaping the human uniqueness in brain development.
Collapse
Affiliation(s)
- Wanqiu Ding
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xiangshang Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Mingjun Ji
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Mengling Zhang
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
| | - Xiaoming Zhong
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
| | - Yong Cao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 119S Fourth Ring Rd W, Fengtai District, Beijing, China
| | - Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chunqiong Li
- Chinese Institute for Brain Research, Beijing, China
| | - Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jiaxin Wang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Ting Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Qing Yu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Boya Zhang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jianhuan Qi
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jie-Chun Yang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Juntian Qi
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Lu Tian
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Qi Peng
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Wei-Zhen Zhou
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zhijin Liu
- College of Life Sciences, Capital Normal University, Beijing, China
| | - Aisi Fu
- Wuhan Dgensee Clinical Laboratory, Wuhan, China
| | - Xiuqin Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jian-Jun Zhang
- Shanxi Key Laboratory of Chinese Medicine Encephalopathy, National International Joint Research Center for Molecular Chinese Medicine, Shanxi University of Chinese Medicine, Jinzhong, China
| | - Yujie Sun
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
| | - Baoyang Hu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
| | - Li Zhang
- Chinese Institute for Brain Research, Beijing, China
| | - Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
- Southwest United Graduate School, Kunming 650092, China
| |
Collapse
|
5
|
Mahmoud M, Harting J, Corbitt H, Chen X, Jhangiani SN, Doddapaneni H, Meng Q, Han T, Lambert C, Zhang S, Baybayan P, Henno G, Shen H, Hu J, Han Y, Riegler C, Metcalf G, Henno G, Chinn IK, Eberle MA, Kingan S, Farinholt T, Carvalho CM, Gibbs RA, Kronenberg Z, Muzny D, Sedlazeck FJ. Closing the gap: Solving complex medically relevant genes at scale. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.14.24304179. [PMID: 38562723 PMCID: PMC10984040 DOI: 10.1101/2024.03.14.24304179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Comprehending the mechanism behind human diseases with an established heritable component represents the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis due to their complexity and repetitiveness or the so-called 'dark regions' of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet HiFi whole-genome sequencing (WGS) cost remains frequently prohibitive. We introduce a targeted sequencing and analysis framework, Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We highlight TADGP accuracy across eleven control samples and compare it to WGS. This demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data, but at a fraction of the cost. Thus, enabling scalability and broad applicability for studying rare diseases or complementing previously sequenced samples to gain insights into these complex genes. TADGP revealed several candidate variants across all cases and provided insight into LPA diversity when tested on samples from rare disease and cardiovascular disease cohorts. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., IKZF1, KCNE1). Nevertheless, the annotation of the variants across these 389 medically important genes remains challenging due to their underrepresentation in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants. Overall, we can demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess the dark regions of the human genome with clinical relevance.
Collapse
Affiliation(s)
- Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - John Harting
- Pacific Biosciences, Menlo Park, California, USA
| | | | - Xiao Chen
- Pacific Biosciences, Menlo Park, California, USA
| | | | - Harsha Doddapaneni
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Qingchang Meng
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Tina Han
- Twist Bioscience, South San Francisco, USA
| | | | - Siyuan Zhang
- Pacific Biosciences, Menlo Park, California, USA
| | | | - Geoff Henno
- Pacific Biosciences, Menlo Park, California, USA
| | - Hua Shen
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Jianhong Hu
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Yi Han
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | | | - Ginger Metcalf
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Geoff Henno
- Pacific Biosciences, Menlo Park, California, USA
| | - Ivan K. Chinn
- Department of Pediatrics, Section of Immunology Allergy and Rheumatology, Center for Human Immunobiology, Texas Children’s Hospital and Baylor College of Medicine, Houston, Texas, USA
| | | | - Sarah Kingan
- Pacific Biosciences, Menlo Park, California, USA
| | | | | | - Richard A. Gibbs
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | | | - Donna Muzny
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
| | - Fritz J. Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| |
Collapse
|
6
|
Holt JM, Saunders CT, Rowell WJ, Kronenberg Z, Wenger AM, Eberle M. HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 2024; 40:btae042. [PMID: 38269623 PMCID: PMC10868326 DOI: 10.1093/bioinformatics/btae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Revised: 12/13/2023] [Accepted: 01/22/2024] [Indexed: 01/26/2024] Open
Abstract
MOTIVATION In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants. However, current phasing tools typically only phase small variants, leaving larger variants unphased. RESULTS We developed HiPhase, a tool that jointly phases SNVs, indels, structural, and tandem repeat variants. The main benefits of HiPhase are (i) dual mode allele assignment for detecting large variants, (ii) a novel application of the A*-algorithm to phasing, and (iii) logic allowing phase blocks to span breaks caused by alignment issues around reference gaps and homozygous deletions. In our assessment, HiPhase produced an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, improving over the current state of the art. Additionally, HiPhase jointly phases SNVs, indels, structural, and tandem repeat variants and includes innate multi-threading, statistics gathering, and concurrent phased alignment output generation. AVAILABILITY AND IMPLEMENTATION HiPhase is available as source code and a pre-compiled Linux binary with a user guide at https://github.com/PacificBiosciences/HiPhase.
Collapse
Affiliation(s)
- James M Holt
- Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
| | | | - William J Rowell
- Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
| | - Zev Kronenberg
- Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
| | - Aaron M Wenger
- Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
| | - Michael Eberle
- Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
| |
Collapse
|
7
|
Mahmoud M, Huang Y, Garimella K, Audano PA, Wan W, Prasad N, Handsaker RE, Hall S, Pionzio A, Schatz MC, Talkowski ME, Eichler EE, Levy SE, Sedlazeck FJ. Utility of long-read sequencing for All of Us. Nat Commun 2024; 15:837. [PMID: 38281971 PMCID: PMC10822842 DOI: 10.1038/s41467-024-44804-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 01/03/2024] [Indexed: 01/30/2024] Open
Abstract
The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
Collapse
Affiliation(s)
- M Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Y Huang
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - K Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - P A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - W Wan
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - N Prasad
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - R E Handsaker
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
| | - S Hall
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - A Pionzio
- Discovery Life Sciences, Huntsville, AL, 35806, USA
| | - M C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - M E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - E E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - S E Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - F J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
8
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024:10.1038/s41587-023-02024-y. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Collapse
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | | | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
9
|
English A, Dolzhenko E, Jam HZ, Mckenzie S, Olson ND, De Coster W, Park J, Gu B, Wagner J, Eberle MA, Gymrek M, Chaisson MJP, Zook JM, Sedlazeck FJ. Benchmarking of small and large variants across tandem repeats. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.29.564632. [PMID: 37961319 PMCID: PMC10634962 DOI: 10.1101/2023.10.29.564632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
Collapse
|
10
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Collapse
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
Collapse
|
11
|
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito-Garagorri E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Break-induced replication underlies formation of inverted triplications and generates unexpected diversity in haplotype structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.02.560172. [PMID: 37873367 PMCID: PMC10592851 DOI: 10.1101/2023.10.02.560172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]
Abstract
Background The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis . Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long- read GS, optical genome mapping and StrandSeq the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples revealing a DNA segment of ∼2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers which contributes to generate diverse SV haplotypes in susceptible loci .
Collapse
|
12
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Alvarez Jerez P, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat Methods 2023; 20:1483-1492. [PMID: 37710018 PMCID: PMC11222905 DOI: 10.1038/s41592-023-01993-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.
Collapse
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Kimberley J Billingsley
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M Genner
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | | |
Collapse
|
13
|
Vollger MR, Korlach J, Eldred KC, Swanson E, Underwood JG, Cheng YHH, Ranchalis J, Mao Y, Blue EE, Schwarze U, Munson KM, Saunders CT, Wenger AM, Allworth A, Chanprasert S, Duerden BL, Glass I, Horike-Pyne M, Kim M, Leppig KA, McLaughlin IJ, Ogawa J, Rosenthal EA, Sheppeard S, Sherman SM, Strohbehn S, Yuen AL, Reh TA, Byers PH, Bamshad MJ, Hisama FM, Jarvik GP, Sancak Y, Dipple KM, Stergachis AB. Synchronized long-read genome, methylome, epigenome, and transcriptome for resolving a Mendelian condition. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.26.559521. [PMID: 37808736 PMCID: PMC10557686 DOI: 10.1101/2023.09.26.559521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/10/2023]
Abstract
Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid de novo genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1, and RB1) previously associated with single-gene MCs. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four 'omes' to resolve. These included nonsense-mediated decay, fusion transcript formation, enhancer adoption, transcriptional readthrough silencing, and inappropriate X chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.
Collapse
Affiliation(s)
- Mitchell R. Vollger
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | | | - Kiara C. Eldred
- University of Washington School of Medicine, Department of Biological Structure, Seattle, WA, USA
| | - Elliott Swanson
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
| | | | - Yong-Han H. Cheng
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
| | - Jane Ranchalis
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Yizi Mao
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Elizabeth E. Blue
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Ulrike Schwarze
- University of Washington School of Medicine, Department of Laboratory Medicine and Pathology, Seattle, WA, USA
| | - Katherine M. Munson
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
| | | | | | - Aimee Allworth
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Sirisak Chanprasert
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | | | - Ian Glass
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- University of Washington, Department of Pediatrics, Seattle, WA, USA
| | - Martha Horike-Pyne
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | | | - Kathleen A. Leppig
- Genetic Services, Kaiser Permanente Washington, Seattle, Washington, USA
| | | | | | | | - Sam Sheppeard
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Stephanie M. Sherman
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Samuel Strohbehn
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Amy L. Yuen
- Genetic Services, Kaiser Permanente Washington, Seattle, Washington, USA
| | | | - Thomas A. Reh
- University of Washington School of Medicine, Department of Biological Structure, Seattle, WA, USA
| | - Peter H. Byers
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- University of Washington School of Medicine, Department of Laboratory Medicine and Pathology, Seattle, WA, USA
| | - Michael J. Bamshad
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- University of Washington, Department of Pediatrics, Seattle, WA, USA
| | - Fuki M. Hisama
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Gail P. Jarvik
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Yasemin Sancak
- University of Washington School of Medicine, Department of Pharmacology, Seattle, WA, USA
| | - Katrina M. Dipple
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- University of Washington, Department of Pediatrics, Seattle, WA, USA
| | - Andrew B. Stergachis
- University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| |
Collapse
|
14
|
Colicchio JM, Amstutz CL, Garcia N, Prabhu KN, Cairns TM, Akman M, Gottilla T, Gollery T, Stricklin SL, Bayer TS. A tool for rapid, automated characterization of population epigenomics in plants. Sci Rep 2023; 13:12915. [PMID: 37591855 PMCID: PMC10435466 DOI: 10.1038/s41598-023-38356-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2023] [Accepted: 07/06/2023] [Indexed: 08/19/2023] Open
Abstract
Epigenetic variation in plant populations is an important factor in determining phenotype and adaptation to the environment. However, while advances have been made in the molecular and computational methods to analyze the methylation status of a given sample of DNA, tools to profile and compare the methylomes of multiple individual plants or groups of plants at high resolution and low cost are lacking. Here, we describe a computational approach and R package (sounDMR) that leverages the benefits of long read nanopore sequencing to enable robust identification of differential methylation from complex experimental designs, as well as assess the variability within treatment groups and identify individual plants of interest. We demonstrate the utility of this approach by profiling a population of Arabidopsis thaliana exposed to a demethylating agent and identify genomic regions of high epigenetic variability between individuals. Given the low cost of nanopore sequencing devices and the ease of sample preparation, these results show that high resolution epigenetic profiling of plant populations can be made more broadly accessible in plant breeding and biotechnology.
Collapse
Affiliation(s)
| | | | | | | | | | - Melis Akman
- Sound Agriculture Company, Emeryville, CA, USA
| | | | | | | | | |
Collapse
|
15
|
Ni P, Nie F, Zhong Z, Xu J, Huang N, Zhang J, Zhao H, Zou Y, Huang Y, Li J, Xiao CL, Luo F, Wang J. DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing. Nat Commun 2023; 14:4054. [PMID: 37422489 PMCID: PMC10329642 DOI: 10.1038/s41467-023-39784-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2022] [Accepted: 06/22/2023] [Indexed: 07/10/2023] Open
Abstract
Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Collapse
Affiliation(s)
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Zeyu Zhong
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Jinrui Xu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Haochen Zhao
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - You Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
| | - Yuanfeng Huang
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
| | - Jinchen Li
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, 410000, China
| | - Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, China.
| | - Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
| |
Collapse
|
16
|
Spealman P, De T, Chuong JN, Gresham D. Best Practices in Microbial Experimental Evolution: Using Reporters and Long-Read Sequencing to Identify Copy Number Variation in Experimental Evolution. J Mol Evol 2023; 91:356-368. [PMID: 37012421 PMCID: PMC10275804 DOI: 10.1007/s00239-023-10102-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Accepted: 02/21/2023] [Indexed: 04/05/2023]
Abstract
Copy number variants (CNVs), comprising gene amplifications and deletions, are a pervasive class of heritable variation. CNVs play a key role in rapid adaptation in both natural, and experimental, evolution. However, despite the advent of new DNA sequencing technologies, detection and quantification of CNVs in heterogeneous populations has remained challenging. Here, we summarize recent advances in the use of CNV reporters that provide a facile means of quantifying de novo CNVs at a specific locus in the genome, and nanopore sequencing, for resolving the often complex structures of CNVs. We provide guidance for the engineering and analysis of CNV reporters and practical guidelines for single-cell analysis of CNVs using flow cytometry. We summarize recent advances in nanopore sequencing, discuss the utility of this technology, and provide guidance for the bioinformatic analysis of these data to define the molecular structure of CNVs. The combination of reporter systems for tracking and isolating CNV lineages and long-read DNA sequencing for characterizing CNV structures enables unprecedented resolution of the mechanisms by which CNVs are generated and their evolutionary dynamics.
Collapse
Affiliation(s)
- Pieter Spealman
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Titir De
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Julie N Chuong
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - David Gresham
- Department of Biology, New York University, New York, NY, 10003, USA.
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
| |
Collapse
|
17
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Jerez PA, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.12.523790. [PMID: 36711673 PMCID: PMC9882142 DOI: 10.1101/2023.01.12.523790] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but to date have not been considered as feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short-reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with F1-score better than Illumina short-read sequencing. Small indel calling remains to be difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with F1-score comparable to state-of the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
Collapse
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
| | - Kimberley J. Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M. Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Kishwar Shafin
- Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J. Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | | |
Collapse
|
18
|
Behera S, LeFaive J, Orchard P, Mahmoud M, Paulin LF, Farek J, Soto DC, Parker SCJ, Smith AV, Dennis MY, Zook JM, Sedlazeck FJ. FixItFelix: improving genomic analysis by fixing reference errors. Genome Biol 2023; 24:31. [PMID: 36810122 PMCID: PMC9942314 DOI: 10.1186/s13059-023-02863-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Accepted: 01/20/2023] [Indexed: 02/23/2023] Open
Abstract
The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over multi-ethnic control samples, demonstrating improvements for population variant calling as well as eQTL studies.
Collapse
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jonathon LeFaive
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Peter Orchard
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Daniela C Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA, USA
| | - Stephen C J Parker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Albert V Smith
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Megan Y Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
19
|
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023] Open
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Collapse
|
20
|
Martin SL, Lujan Toro B, James T, Sauder CA, Laforest M. Insights from the genomes of 4 diploid Camelina spp. G3 (BETHESDA, MD.) 2022; 12:jkac182. [PMID: 35976116 PMCID: PMC9713399 DOI: 10.1093/g3journal/jkac182] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/02/2022] [Accepted: 06/12/2022] [Indexed: 11/12/2022]
Abstract
Plant evolution has been a complex process involving hybridization and polyploidization making understanding the origin and evolution of a plant's genome challenging even once a published genome is available. The oilseed crop, Camelina sativa (Brassicaceae), has a fully sequenced allohexaploid genome with 3 unknown ancestors. To better understand which extant species best represent the ancestral genomes that contributed to C. sativa's formation, we sequenced and assembled chromosome level draft genomes for 4 diploid members of Camelina: C. neglecta C. hispida var. hispida, C. hispida var. grandiflora, and C. laxa using long and short read data scaffolded with proximity data. We then conducted phylogenetic analyses on regions of synteny and on genes described for Arabidopsis thaliana, from across each nuclear genome and the chloroplasts to examine evolutionary relationships within Camelina and Camelineae. We conclude that C. neglecta is closely related to C. sativa's sub-genome 1 and that C. hispida var. hispida and C. hispida var. grandiflora are most closely related to C. sativa's sub-genome 3. Further, the abundance and density of transposable elements, specifically Helitrons, suggest that the progenitor genome that contributed C. sativa's sub-genome 3 maybe more similar to the genome of C. hispida var. hispida than that of C. hispida var. grandiflora. These diploid genomes show few structural differences when compared to C. sativa's genome indicating little change to chromosome structure following allopolyploidization. This work also indicates that C. neglecta and C. hispida are important resources for understanding the genetics of C. sativa and potential resources for crop improvement.
Collapse
Affiliation(s)
- Sara L Martin
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada
| | - Beatriz Lujan Toro
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada
| | - Tracey James
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada
| | - Connie A Sauder
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0CA, Canada
| | - Martin Laforest
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
| |
Collapse
|
21
|
Chander V, Mahmoud M, Hu J, Dardas Z, Grochowski CM, Dawood M, Khayat MM, Li H, Li S, Jhangiani S, Korchina V, Shen H, Weissenberger G, Meng Q, Gingras MC, Muzny DM, Doddapaneni H, Posey JE, Lupski JR, Sabo A, Murdock DR, Sedlazeck FJ, Gibbs RA. Long read sequencing and expression studies of AHDC1 deletions in Xia-Gibbs syndrome reveal a novel genetic regulatory mechanism. Hum Mutat 2022; 43:2033-2053. [PMID: 36054313 PMCID: PMC10167679 DOI: 10.1002/humu.24461] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Revised: 08/17/2022] [Accepted: 08/30/2022] [Indexed: 01/25/2023]
Abstract
Xia-Gibbs syndrome (XGS; MIM# 615829) is a rare mendelian disorder characterized by Development Delay (DD), intellectual disability (ID), and hypotonia. Individuals with XGS typically harbor de novo protein-truncating mutations in the AT-Hook DNA binding motif containing 1 (AHDC1) gene, although some missense mutations can also cause XGS. Large de novo heterozygous deletions that encompass the AHDC1 gene have also been ascribed as diagnostic for the disorder, without substantial evidence to support their pathogenicity. We analyzed 19 individuals with large contiguous deletions involving AHDC1, along with other genes. One individual bore the smallest known contiguous AHDC1 deletion (∼350 Kb), encompassing eight other genes within chr1p36.11 (Feline Gardner-Rasheed, IFI6, FAM76A, STX12, PPP1R8, THEMIS2, RPA2, SMPDL3B) and terminating within the first intron of AHDC1. The breakpoint junctions and phase of the deletion were identified using both short and long read sequencing (Oxford Nanopore). Quantification of RNA expression patterns in whole blood revealed that AHDC1 exhibited a mono-allelic expression pattern with no deficiency in overall AHDC1 expression levels, in contrast to the other deleted genes, which exhibited a 50% reduction in mRNA expression. These results suggest that AHDC1 expression in this individual is compensated by a novel regulatory mechanism and advances understanding of mutational and regulatory mechanisms in neurodevelopmental disorders.
Collapse
Affiliation(s)
- Varuna Chander
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Jianhong Hu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Zain Dardas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | | | - Moez Dawood
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Michael M. Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - He Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Shoudong Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Shalini Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Viktoriya Korchina
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Hua Shen
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | | | - Qingchang Meng
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Marie-Claude Gingras
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Harsha Doddapaneni
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Jennifer E. Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - James R. Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Texas Children’s Hospital, Houston, Texas, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA
| | - Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - David R. Murdock
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| |
Collapse
|
22
|
Stenløkk K, Saitou M, Rud-Johansen L, Nome T, Moser M, Árnyasi M, Kent M, Barson NJ, Lien S. The emergence of supergenes from inversions in Atlantic salmon. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210195. [PMID: 35694753 PMCID: PMC9189505 DOI: 10.1098/rstb.2021.0195] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Supergenes link allelic combinations into non-recombining units known to play an essential role in maintaining adaptive genetic variation. However, because supergenes can be maintained over millions of years by balancing selection and typically exhibit strong recombination suppression, both the underlying functional variants and how the supergenes are formed are largely unknown. Particularly, questions remain over the importance of inversion breakpoint sequences and whether supergenes capture pre-existing adaptive variation or accumulate this following recombination suppression. To investigate the process of supergene formation, we identified inversion polymorphisms in Atlantic salmon by assembling eleven genomes with nanopore long-read sequencing technology. A genome assembly from the sister species, brown trout, was used to determine the standard state of the inversions. We found evidence for adaptive variation through genotype-environment associations, but not for the accumulation of deleterious mutations. One young 3 Mb inversion segregating in North American populations has captured adaptive variation that is still segregating within the standard arrangement of the inversion, while some adaptive variation has accumulated after the inversion. This inversion and two others had breakpoints disrupting genes. Three multigene inversions with matched repeat structures at the breakpoints did not show any supergene signatures, suggesting that shared breakpoint repeats may obstruct supergene formation. This article is part of the theme issue 'Genomic architecture of supergenes: causes and evolutionary consequences'.
Collapse
Affiliation(s)
- Kristina Stenløkk
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Marie Saitou
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Live Rud-Johansen
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Torfinn Nome
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Michel Moser
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Mariann Árnyasi
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Matthew Kent
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Nicola Jane Barson
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| | - Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE) and Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
| |
Collapse
|
23
|
Jensen EL, Gaughran SJ, Fusco NA, Poulakakis N, Tapia W, Sevilla C, Málaga J, Mariani C, Gibbs JP, Caccone A. The Galapagos giant tortoise Chelonoidis phantasticus is not extinct. Commun Biol 2022; 5:546. [PMID: 35681083 PMCID: PMC9184544 DOI: 10.1038/s42003-022-03483-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Accepted: 05/11/2022] [Indexed: 11/25/2022] Open
Abstract
The status of the Fernandina Island Galapagos giant tortoise (Chelonoidis phantasticus) has been a mystery, with the species known from a single specimen collected in 1906. The discovery in 2019 of a female tortoise living on the island provided the opportunity to determine if the species lives on. By sequencing the genomes of both individuals and comparing them to all living species of Galapagos giant tortoises, here we show that the two known Fernandina tortoises are from the same lineage and distinct from all others. The whole genome phylogeny groups the Fernandina individuals within a monophyletic group containing all species with a saddleback carapace morphology and one semi-saddleback species. This grouping of the saddleback species is contrary to mitochondrial DNA phylogenies, which place the saddleback species across several clades. These results imply the continued existence of lineage long considered extinct, with a current known population size of a single individual. Based on genomic data, the Galapagos giant tortoise species native to Fernandina Island appears to be alive and well, survived by at least one female after being considered extinct since 1906.
Collapse
Affiliation(s)
- Evelyn L Jensen
- School of Natural and Environmental Sciences, Newcastle University, Newcastle Upon Tyne, UK.
| | - Stephen J Gaughran
- Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, USA.,Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| | - Nicole A Fusco
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| | - Nikos Poulakakis
- Department of Biology, School of Sciences and Engineering, University of Crete, Irakleio, Greece.,The Natural History Museum of Crete, School of Sciences and Engineering, University of Crete, Heraklion, Greece.,Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, Greece
| | - Washington Tapia
- Galapagos Conservancy, Fairfax, VA, USA.,University of Málaga, Campus Teatinos, Apdo, 59.29080, Málaga, Spain
| | - Christian Sevilla
- Conservation and Restoration of Insular Ecosystems Department, Galapagos National Park Directorate, Puerto Ayora, Galapagos, Ecuador
| | - Jeffreys Málaga
- Conservation and Restoration of Insular Ecosystems Department, Galapagos National Park Directorate, Puerto Ayora, Galapagos, Ecuador
| | - Carol Mariani
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA.,Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
| | - James P Gibbs
- Galapagos Conservancy, Fairfax, VA, USA.,Department of Environmental Biology, College of Environmental Science and Forestry, State University of New York, Syracuse, NY, USA
| | - Adalgisa Caccone
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| |
Collapse
|
24
|
Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa ( Medicago sativa L.). Cells 2021; 10:cells10123372. [PMID: 34943880 PMCID: PMC8699225 DOI: 10.3390/cells10123372] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 11/19/2021] [Accepted: 11/24/2021] [Indexed: 12/27/2022] Open
Abstract
Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated that low to moderate prediction accuracy values (<70%) were obtained in complex traits, such as yield and abiotic stress resistance. There is a need to increase the prediction value in order to employ GS in breeding programs. In this paper we reviewed different statistic models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data affiliated with alfalfa yield under salt stress to investigate approaches that use DNA marker importance values derived from machine learning models, and genome-wide association studies (GWAS) of marker-trait association scores based on different GWASpoly models, in weighted GBLUP analyses. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we expended the weighted GBLUP approach to potato and analyzed 13 phenotypic traits and obtained similar results. This is the first report on alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
Collapse
|