1
Bai X, Chen Z, Chen K, Wu Z, Wang R, Liu J, Chang L, Wen L, Tang F. Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq. Cell Discov 2024; 10:74. [PMID: 38977679 PMCID: PMC11231365 DOI: 10.1038/s41421-024-00694-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 05/23/2024] [Indexed: 07/10/2024]
Abstract
The successful accomplishment of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches only achieved localized haplotype phasing and relied on additional pedigree information for further whole-chromosome scale phasing. The short-read-based Strand-seq method is able to directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To shed light on this issue, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, here we provided a novel methodology that enabled interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
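The abstract reports phasing accuracy as a Hamming error rate. As a rough illustration of what that metric measures, the Python sketch below compares a called haplotype against a reference haplotype over heterozygous sites. The function name, the 0/1 allele encoding, and the toy data are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def hamming_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Fraction of heterozygous sites whose phase disagrees with the truth.

    Each sequence holds the allele (0 or 1) placed on haplotype 1 at each
    heterozygous site. Haplotype labels are arbitrary, so the called phasing
    is also scored after swapping the two haplotypes, and the smaller
    mismatch count is used.
    """
    if len(truth) != len(called):
        raise ValueError("haplotypes must cover the same sites")
    if not truth:
        raise ValueError("at least one site is required")
    mismatches = sum(t != c for t, c in zip(truth, called))
    swapped_mismatches = len(truth) - mismatches
    return min(mismatches, swapped_mismatches) / len(truth)


# Toy data (hypothetical): the call is the mirror image of the truth
# except at the last site, so one of ten sites is wrongly phased.
truth = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
called = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(hamming_error_rate(truth, called))
```

Scoring both label orientations matters because a phasing tool has no way of knowing which haplotype is "haplotype 1" in the reference.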
Affiliation(s)
- Xiuzhen Bai
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Zonggui Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Kexuan Chen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Zixin Wu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Rui Wang
  - Department of Medicine, Cancer Institute, Stanford University, Stanford, CA, USA
- Jun'e Liu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
- Liang Chang
  - State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education Beijing, Beijing, China
  - Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Lu Wen
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
  - Changping Laboratory, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - School of Life Sciences, Peking University, Beijing, China
2
Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024]
Abstract
The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylations extensively exist in human genomes and the advent of long-read technologies enabled direct report of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser on blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser .
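The abstract summarizes its gain as an increase in phase length N50. For readers unfamiliar with the statistic, the Python sketch below computes N50 from a list of phase block lengths and the relative change between two phasings. The block lengths are invented toy values, and the helper names are assumptions for this example; nothing here is taken from MethPhaser itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that blocks of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one block length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Toy phase blocks in kb (hypothetical): joining neighbouring blocks
# leaves the phased total unchanged but raises the N50.
before = [50, 50, 100, 100, 200]
after = [100, 200, 200]
print(n50(before), n50(after), percent_increase(n50(before), n50(after)))
```

In this toy case the total phased length is the same before and after, so the change in N50 reflects only how blocks were joined.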
Affiliation(s)
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Sissel Juul
  - Oxford Nanopore Technologies Inc, New York, NY, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
3
Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing. bioRxiv: the preprint server for biology 2024:2024.02.15.580432. [PMID: 38529499 PMCID: PMC10962706 DOI: 10.1101/2024.02.15.580432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
- Maryam Ghareghani
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- William Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany
4
Ewalt MD, Hsiao SJ. Molecular Methods: Clinical Utilization and Designing a Test Menu. Clin Lab Med 2024; 44:123-135. [PMID: 38821636 DOI: 10.1016/j.cll.2023.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2024]
Abstract
Pre-analytical factors in molecular oncology diagnostics are reviewed. Issues around sample collection, storage, and transport that might affect the stability of nucleic acids and the ability to perform molecular testing are addressed. In addition, molecular methods used commonly in clinical diagnostic laboratories, including newer technologies such as next-generation sequencing and digital droplet polymerase chain reaction, as well as their applications, are reviewed. Finally, we discuss considerations in designing a molecular test menu to deliver accurate and timely results in an efficient and cost-effective manner.
Affiliation(s)
- Mark D Ewalt
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, S-801C, New York, NY 10065, USA
- Susan J Hsiao
  - Department of Pathology & Cell Biology, Columbia University Medical Center, 630 West 168th Street, P&S16-408CB, New York, NY 10032, USA
5
Zhou Q, Ji F, Lin D, Liu X, Zhu Z, Ruan J. KSNP: a fast de Bruijn graph-based haplotyping tool approaching data-in time cost. Nat Commun 2024; 15:3126. [PMID: 38605047 PMCID: PMC11009271 DOI: 10.1038/s41467-024-47562-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 04/04/2024] [Indexed: 04/13/2024]
Abstract
Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time.
Affiliation(s)
- Qian Zhou
  - PengCheng Laboratory, Shenzhen, China
- Fahu Ji
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Dongxiao Lin
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Xianming Liu
  - PengCheng Laboratory, Shenzhen, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Zexuan Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
6
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Paasch A, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted phasing of 2-200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method. Sci Rep 2024; 14:7988. [PMID: 38580715 PMCID: PMC10997766 DOI: 10.1038/s41598-024-58733-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 04/02/2024] [Indexed: 04/07/2024]
Abstract
In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.
Affiliation(s)
- Madison Rzepka
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Yu Xia
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Peter L Chang
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Amber Paasch
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Long Pham
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Naisarg Modi
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Likun Yao
  - Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Adrian Perez-Agustin
  - Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Sara Pagans
  - Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Ming Lei
  - Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Yong Wang
  - Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Zhoutao Chen
  - Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
7
Holmes MJ, Mahjour B, Castro CP, Farnum GA, Diehl AG, Boyle AP. HaplotagLR: An efficient and configurable utility for haplotagging long reads. PLoS One 2024; 19:e0298688. [PMID: 38478504 PMCID: PMC10936807 DOI: 10.1371/journal.pone.0298688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 01/30/2024] [Indexed: 03/17/2024]
Abstract
Understanding the functional effects of sequence variation is crucial in genomics. Individual human genomes contain millions of variants that contribute to phenotypic variability and disease risks at the population level. Because variants rarely act in isolation, we must consider potential interactions of neighboring variants to accurately predict functional effects. We can accomplish this using haplotagging, which matches sequencing reads to their parental haplotypes using alleles observed at known heterozygous variants. However, few published tools for haplotagging exist and these share several technical and usability-related shortcomings that limit applicability, in particular a lack of insight or control over error rates, and lack of key metrics on the underlying sources of haplotagging error. Here we present HaplotagLR: a user-friendly tool that haplotags long sequencing reads based on a multinomial model and existing phased variant lists. HaplotagLR is user-configurable and includes a basic error model to control the empirical FDR in its output. We show that HaplotagLR outperforms the leading haplotagging method in simulated datasets, especially at high levels of specificity, and displays 7% greater sensitivity in haplotagging real data. HaplotagLR both advances the immediate utility of haplotagging and paves the way for further improvements to this important method.
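Haplotagging, as described in the abstract, matches a read to a parental haplotype using the alleles it carries at known heterozygous variants. The Python sketch below shows the idea with a plain vote count. This is a deliberate simplification and not HaplotagLR's multinomial model or error control; the function name, the data layout, and the `min_margin` parameter are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def haplotag(
    read_alleles: Dict[int, str],
    phased_variants: Dict[int, Tuple[str, str]],
    min_margin: int = 1,
) -> Optional[int]:
    """Return 1 or 2 for the haplotype a read supports, or None if ambiguous.

    read_alleles maps a reference position to the base seen in the read.
    phased_variants maps a position to (allele on haplotype 1, allele on
    haplotype 2). Bases matching neither allele are ignored.
    """
    votes_h1 = 0
    votes_h2 = 0
    for position, base in read_alleles.items():
        if position not in phased_variants:
            continue
        allele_h1, allele_h2 = phased_variants[position]
        if base == allele_h1:
            votes_h1 += 1
        elif base == allele_h2:
            votes_h2 += 1
    margin = votes_h1 - votes_h2
    if abs(margin) < min_margin:
        return None  # tie or too little evidence: leave the read untagged
    return 1 if margin > 0 else 2


# Toy phased variant list (hypothetical positions and alleles).
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
print(haplotag({100: "A", 250: "C", 400: "A"}, phased))  # two votes to one
print(haplotag({100: "A", 250: "T"}, phased))            # tied votes
```

A vote count treats every variant as equally reliable, which is exactly the kind of limitation a probabilistic model with an explicit error rate is meant to address.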
Affiliation(s)
- Monica J. Holmes
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Babak Mahjour
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- Christopher P. Castro
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gregory A. Farnum
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Adam G. Diehl
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Alan P. Boyle
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
8
van der Burg LLJ, de Wreede LC, Baldauf H, Sauter J, Schetelig J, Putter H, Böhringer S. Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region. Genet Epidemiol 2024; 48:3-26. [PMID: 37830494 DOI: 10.1002/gepi.22538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 09/06/2023] [Accepted: 09/25/2023] [Indexed: 10/14/2023]
Abstract
Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a gene region is the killer-cell immunoglobulin-like receptor (KIR) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation-maximization (EM)-algorithm to estimate haplotype frequencies (HTFs) which deals with the missing data components, and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information of multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM-algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study.
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
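The abstract builds on a standard EM algorithm for haplotype frequency estimation. To make that baseline concrete, the Python sketch below runs the textbook EM for two biallelic loci from unphased genotypes, assuming Hardy-Weinberg proportions. It is not the authors' method: it has no iterative gene addition, no rare-haplotype collapsing, and no handling of ambiguous calls or copy number. All names and the toy genotypes are assumptions for this example.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, int]
HAPLOTYPES: List[Haplotype] = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype: Sequence[int]) -> List[Tuple[Haplotype, Haplotype]]:
    """Unordered haplotype pairs whose allele counts add up to the genotype."""
    return [
        (h, k)
        for h, k in combinations_with_replacement(HAPLOTYPES, 2)
        if (h[0] + k[0], h[1] + k[1]) == tuple(genotype)
    ]


def em_haplotype_frequencies(
    genotypes: Sequence[Sequence[int]], n_iter: int = 50
) -> Dict[Haplotype, float]:
    """Estimate haplotype frequencies from (alt count locus 1, alt count locus 2)."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freqs = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}
    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for genotype in genotypes:
            pairs = compatible_pairs(genotype)
            # E-step: weight each compatible pair by its Hardy-Weinberg probability.
            weights = [(1 if h == k else 2) * freqs[h] * freqs[k] for h, k in pairs]
            total = sum(weights)
            for (h, k), weight in zip(pairs, weights):
                expected[h] += weight / total
                expected[k] += weight / total
        # M-step: expected haplotype counts become the new frequencies.
        n_chromosomes = 2 * len(genotypes)
        freqs = {h: count / n_chromosomes for h, count in expected.items()}
    return freqs


# Toy genotypes (hypothetical): only the double heterozygotes (1, 1) are
# ambiguous, and the unambiguous samples point to haplotypes 00 and 11.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimates = em_haplotype_frequencies(genotypes)
print({h: round(f, 4) for h, f in estimates.items()})
```

With only two loci there are four haplotypes, so full enumeration is cheap. The number of haplotypes grows exponentially as loci are added, which is the scaling problem the abstract's three extensions are designed to handle.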
Affiliation(s)
- Liesbeth C de Wreede
  - Biomedical Data Sciences, LUMC, Leiden, The Netherlands
  - DKMS, Dresden/Tübingen, Germany
- Johannes Schetelig
  - DKMS, Dresden/Tübingen, Germany
  - Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
- Hein Putter
  - Biomedical Data Sciences, LUMC, Leiden, The Netherlands
9
Holt JM, Saunders CT, Rowell WJ, Kronenberg Z, Wenger AM, Eberle M. HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 2024; 40:btae042. [PMID: 38269623 PMCID: PMC10868326 DOI: 10.1093/bioinformatics/btae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 12/13/2023] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
MOTIVATION: In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants. However, current phasing tools typically only phase small variants, leaving larger variants unphased.
RESULTS: We developed HiPhase, a tool that jointly phases SNVs, indels, structural, and tandem repeat variants. The main benefits of HiPhase are (i) dual mode allele assignment for detecting large variants, (ii) a novel application of the A*-algorithm to phasing, and (iii) logic allowing phase blocks to span breaks caused by alignment issues around reference gaps and homozygous deletions. In our assessment, HiPhase produced an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, improving over the current state of the art. Additionally, HiPhase jointly phases SNVs, indels, structural, and tandem repeat variants and includes innate multi-threading, statistics gathering, and concurrent phased alignment output generation.
AVAILABILITY AND IMPLEMENTATION: HiPhase is available as source code and a pre-compiled Linux binary with a user guide at https://github.com/PacificBiosciences/HiPhase.
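The abstract reports phasing quality partly as a count of switch-type errors within phase blocks. As a rough illustration, the Python sketch below counts plain switch errors between a reference and a called haplotype inside one phase block. It is a simplified stand-in and not the evaluation used in the paper; the function name, the 0/1 allele encoding, and the toy haplotypes are assumptions.

```python
from typing import Sequence


def count_switch_errors(truth: Sequence[int], called: Sequence[int]) -> int:
    """Count points where the called phase changes orientation versus the truth.

    Each sequence holds the allele (0 or 1) on haplotype 1 at consecutive
    heterozygous sites of one phase block. A switch is counted between two
    neighbouring sites when one agrees with the truth and the other does not.
    """
    if len(truth) != len(called):
        raise ValueError("haplotypes must cover the same sites")
    agrees = [t == c for t, c in zip(truth, called)]
    return sum(a != b for a, b in zip(agrees, agrees[1:]))


truth = [0, 1, 1, 0, 1, 0]
long_switch = [0, 1, 0, 1, 0, 1]   # phase inverted from the third site onward
single_flip = [0, 1, 0, 0, 1, 0]   # only the third site is wrong
print(count_switch_errors(truth, long_switch))
print(count_switch_errors(truth, single_flip))
```

A single mis-phased site produces two adjacent switches in this count, and that pattern is commonly called a flip. Because only neighbouring sites are compared, the count is unaffected by which haplotype is labelled first.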
Affiliation(s)
- James M Holt
  - Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
- William J Rowell
  - Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
- Zev Kronenberg
  - Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
- Aaron M Wenger
  - Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
- Michael Eberle
  - Computational Biology, PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025, United States
10
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O'Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. Nat Genet 2024; 56:152-161. [PMID: 38057443 PMCID: PMC10872287 DOI: 10.1038/s41588-023-01608-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/24/2023] [Accepted: 11/08/2023] [Indexed: 12/08/2023]
Abstract
Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.
Affiliation(s)
- Michael H Guo
- Department of Neurology, Hospital of the University of the Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah L Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Philip W Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joel N Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Departments of Genetics and Pediatrics, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
|
11
|
Kumar J, Karim A, Sweety UH, Sarma H, Nurunnabi M, Narayan M. Bioinspired Approaches for Central Nervous System Targeted Gene Delivery. ACS APPLIED BIO MATERIALS 2023. [PMID: 38100377 DOI: 10.1021/acsabm.3c00842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2023]
Abstract
Disorders of the central nervous system (CNS), which include a wide range of neurodegenerative and neurological conditions, have become a serious global issue. The presence of CNS barriers poses a significant challenge to the progress of designing effective therapeutic delivery systems, limiting the effectiveness of drugs, genes, and other therapeutic agents. Natural nanocarriers present in biological systems have inspired researchers to design unique delivery systems through biomimicry. As delivery systems derived from natural resources are more biocompatible, current research has focused on the development of delivery systems inspired by bacteria, viruses, fungi, and mammalian cells. Despite their structural potential and extensive physiological function, which make them an excellent choice for biomaterial engineering, the delivery of nucleic acids remains challenging due to their instability in biological systems. Similarly, the efficient delivery of genetic material within the tissues of interest remains a hurdle due to a lack of selectivity and targeting ability. Considering that gene therapies are the holy grail for intervention in diseases, including neurodegenerative disorders such as Alzheimer's disease, Parkinson's disease, and Huntington's disease, this review centers around recent advances in bioinspired approaches to gene delivery for the prevention of CNS disorders.
Affiliation(s)
- Jyotish Kumar
- Department of Chemistry and Biochemistry, The University of Texas at El Paso (UTEP), El Paso, Texas 79968, United States
| | - Afroz Karim
- Department of Chemistry and Biochemistry, The University of Texas at El Paso (UTEP), El Paso, Texas 79968, United States
| | - Ummy Habiba Sweety
- Environmental Science and Engineering, The University of Texas at El Paso (UTEP), El Paso, Texas 79968, United States
| | - Hemen Sarma
- Bioremediation Technology Research Group, Department of Botany, Bodoland University, Rangalikhata, Deborgaon, 783370, Kokrajhar (BTR), Assam, India
| | - Md Nurunnabi
- The Department of Pharmaceutical Sciences, School of Pharmacy, The University of Texas at El Paso, El Paso, Texas 79968, United States
| | - Mahesh Narayan
- Department of Chemistry and Biochemistry, The University of Texas at El Paso (UTEP), El Paso, Texas 79968, United States
|
12
|
Xie X, Sun X, Wang Y, Lehner B, Li X. Dominance vs epistasis: the biophysical origins and plasticity of genetic interactions within and between alleles. Nat Commun 2023; 14:5551. [PMID: 37689712 PMCID: PMC10492795 DOI: 10.1038/s41467-023-41188-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/08/2022] [Accepted: 08/25/2023] [Indexed: 09/11/2023] Open
Abstract
An important challenge in genetics, evolution and biotechnology is to understand and predict how mutations combine to alter phenotypes, including molecular activities, fitness and disease. In diploids, mutations in a gene can combine on the same chromosome or on different chromosomes as a "heteroallelic combination". However, a direct comparison of the extent, sign, and stability of the genetic interactions between variants within and between alleles is lacking. Here we use thermodynamic models of protein folding and ligand-binding to show that interactions between mutations within and between alleles are expected in even very simple biophysical systems. Protein folding alone generates within-allele interactions and a single molecular interaction is sufficient to cause between-allele interactions and dominance. These interactions change differently, quantitatively and qualitatively as a system becomes more complex. Altering the concentration of a ligand can, for example, switch alleles from dominant to recessive. Our results show that intra-molecular epistasis and dominance should be widely expected in even the simplest biological systems but also reinforce the view that they are plastic system properties and so a formidable challenge to predict. Accurate prediction of both intra-molecular epistasis and dominance will require either detailed mechanistic understanding and experimental parameterization or brute-force measurement and learning.
Affiliation(s)
- Xuan Xie
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
| | - Xia Sun
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Yuheng Wang
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK
| | - Ben Lehner
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
- ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
| | - Xianghua Li
- Zhejiang University - University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, 314400, P. R. China.
- Wellcome Sanger Institute, Wellcome Genome Campus Hinxton, Cambridge, CB10 1SA, UK.
- Deanery of Biomedical Sciences, College of Medicine & Veterinary Medicine, University of Edinburgh, Edinburgh, EH8 9XD, UK.
- Biomedical and Health Translational Centre of Zhejiang Province, Haizhou East Road 718, Haining, 314400, P. R. China.
|
13
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
| | - Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
| | - Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
14
|
Smirnov D, Konstantinovskiy N, Prokisch H. Integrative omics approaches to advance rare disease diagnostics. J Inherit Metab Dis 2023; 46:824-838. [PMID: 37553850 DOI: 10.1002/jimd.12663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/10/2023]
Abstract
Over the past decade, high-throughput DNA sequencing approaches, namely whole exome and whole genome sequencing, became a standard procedure in Mendelian disease diagnostics. Implementation of these technologies greatly facilitated diagnostics and shifted the analysis paradigm from variant identification to prioritisation and evaluation. The diagnostic rates vary widely depending on the cohort size, heterogeneity and disease, and range from around 30% to 50%, leaving the majority of patients undiagnosed. Advances in omics technologies and computational analysis provide an opportunity to increase these unfavourable rates by providing evidence for disease-causing variant validation and prioritisation. This review aims to provide an overview of the current application of several omics technologies, including RNA-sequencing, proteomics, metabolomics and DNA-methylation profiling, for diagnostics of rare genetic diseases in general and inborn errors of metabolism in particular.
Affiliation(s)
- Dmitrii Smirnov
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
| | - Nikita Konstantinovskiy
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
| | - Holger Prokisch
- School of Medicine, Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Neurogenomics, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
|
15
|
Li Y, Zhan G, Tu M, Wang Y, Cao J, Sun S. A chromosome-scale genome and proteome draft of Tremella fuciformis. Int J Biol Macromol 2023; 247:125749. [PMID: 37429350 DOI: 10.1016/j.ijbiomac.2023.125749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/09/2023] [Accepted: 07/07/2023] [Indexed: 07/12/2023]
Abstract
In this study, we first reported a high-quality chromosome-scale genome of Tremella fuciformis using PacBio HiFi sequencing combined with Hi-C technology. Based on 21.6 Gb of PacBio HiFi reads and 18.1 Gb of Hi-C valid reads, we drafted a T. fuciformis genome of 27.38 Mb assigned to 10 chromosomes, with a contig N50 of 2.28 Mb, GC content of 56.51%, BUSCO completeness of 93.1% and consensus quality value of 33.7. The subsequent annotation of genomic components predicted 5,171 repeat sequences, 283 RNAs, and 10,150 protein-coding genes. Next, the intracellular proteins at three different life stages of T. fuciformis (conidium, hyphal and fruiting body) were identified by shotgun proteomics. In total, 6,823 canonical proteins (68.1% of the predicted proteome) were identified with a protein FDR cut-off of 0.01, establishing the first proteome draft of the predicted protein-coding genes of T. fuciformis. Finally, 24 T. fuciformis polysaccharide (TPS) biosynthesis-related genes in mycelia were identified by comparative transcriptomics and proteomics; these genes may be more active in mycelia than in conidia, revealing the TPS biosynthesis process in mycelia. This study elucidated T. fuciformis genome composition and organization, drafted its associated proteome, and provided a genome-wide view of TPS biosynthesis, which will be a powerful platform for biological and genetic studies in T. fuciformis.
Affiliation(s)
- Yaxing Li
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China; Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, China
| | - Guanping Zhan
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Min Tu
- Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, China
| | - Yuhua Wang
- Basic Forestry and Proteomics Research Center, Fujian Agriculture and Forestry University, China
| | - Jixuan Cao
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Shujing Sun
- College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
|
16
|
Xie H, Li W, Guo Y, Su X, Chen K, Wen L, Tang F. Long-read-based single sperm genome sequencing for chromosome-wide haplotype phasing of both SNPs and SVs. Nucleic Acids Res 2023; 51:8020-8034. [PMID: 37351613 PMCID: PMC10450174 DOI: 10.1093/nar/gkad532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 06/01/2023] [Accepted: 06/09/2023] [Indexed: 06/24/2023] Open
Abstract
Although localized haploid phasing can be achieved using long read genome sequencing without parental data, reliable chromosome-scale phasing remains a great challenge. Given that sperm is a natural haploid cell, single-sperm genome sequencing can provide a chromosome-wide phase signal. Due to the limitation of read length, current short-read-based single-sperm genome sequencing methods can only achieve SNP haplotyping and come with difficulties in detecting and haplotyping structural variations (SVs) in complex genomic regions. To overcome these limitations, we developed a long-read-based single-sperm genome sequencing method and a corresponding data analysis pipeline that can accurately identify crossover events and chromosomal level aneuploidies in single sperm and efficiently detect SVs within individual sperm cells. Importantly, without parental genome information, our method can accurately conduct de novo phasing of heterozygous SVs as well as SNPs from male individuals at the whole chromosome scale. The accuracy for phasing of SVs was as high as 98.59% using 100 single sperm cells, and the accuracy for phasing of SNPs was as high as 99.95%. Additionally, our method reliably enabled deduction of the repeat expansions of haplotype-resolved STRs/VNTRs in single sperm cells. Our method provides a new opportunity for studying haplotype-related genetics in mammals.
Affiliation(s)
- Haoling Xie
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
| | - Wen Li
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Yuqing Guo
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Xinjie Su
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Kexuan Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Lu Wen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
| | - Fuchou Tang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
|
17
|
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O’Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.19.533370. [PMID: 36993580 PMCID: PMC10055215 DOI: 10.1101/2023.03.19.533370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n = 125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
Affiliation(s)
- Michael H. Guo
- Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Laurent C. Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah L. Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
| | - Julia K. Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Nicholas A. Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
| | - Philip W. Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joel N. Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children’s Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mark J. Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Konrad J. Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Daniel G. MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, UNSW Sydney, Sydney, Australia
- Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
| | - Kaitlin E. Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|
18
|
Gupta P, Nakamichi K, Bonnell AC, Yanagihara R, Radulovich N, Hisama FM, Chao JR, Mustafi D. Familial co-segregation and the emerging role of long-read sequencing to re-classify variants of uncertain significance in inherited retinal diseases. NPJ Genom Med 2023; 8:20. [PMID: 37558662 PMCID: PMC10412581 DOI: 10.1038/s41525-023-00366-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 08/02/2023] [Indexed: 08/11/2023] Open
Abstract
Phasing genetic variants is essential in determining those that are potentially disease-causing. In autosomal recessive inherited retinal diseases (IRDs), reclassification of variants of uncertain significance (VUS) can provide a genetic diagnosis in indeterminate compound heterozygote cases. We report four cases in which familial co-segregation demonstrated a VUS resided in trans to a known pathogenic variant, which in concert with other supporting criteria, led to the reclassification of the VUS to likely pathogenic, thereby providing a genetic diagnosis in each case. We also demonstrate in a simplex patient without access to family members for co-segregation analysis that targeted long-read sequencing can provide haplotagged variant calling. This can elucidate if variants reside in trans and provide phase of genetic variants from the proband alone without parental testing. This emerging method can alleviate the bottleneck of haplotype analysis in cases where genetic testing of family members is unfeasible to provide a complete genetic diagnosis.
Affiliation(s)
- Pankhuri Gupta
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98109, USA
| | - Kenji Nakamichi
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA
| | - Alyssa C Bonnell
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA
| | - Ryan Yanagihara
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA
| | - Nick Radulovich
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA
| | - Fuki M Hisama
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98109, USA
| | - Jennifer R Chao
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA
| | - Debarshi Mustafi
- Department of Ophthalmology and Roger and Karalis Johnson Retina Center, University of Washington, Seattle, WA, 98109, USA.
- Division of Ophthalmology, Seattle Children's Hospital, Seattle, WA, 98105, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, 98195, USA.
|
19
|
Frazer KA, Schork NJ. The human pangenome reference anticipates equitable and fundamental genomic insights. CELL GENOMICS 2023; 3:100360. [PMID: 37492100 PMCID: PMC10363913 DOI: 10.1016/j.xgen.2023.100360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
For the past few years, researchers in the Human Pangenome Reference Consortium (HPRC) have been working to catalog almost all human genomic diversity. Frazer and Schork preview an article recently published in Nature, "A draft human pangenome reference,"¹ which represents the initial release of 47 fully phased diploid assemblies of genomes of individuals with diverse ancestries.
Affiliation(s)
- Kelly A. Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
| | - Nicholas J. Schork
- Quantitative Medicine and Systems Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
- Departments of Molecular and Cellular Biology and Population Sciences, City of Hope National Medical Center, Duarte, CA 91010, USA
|
20
|
Ouchi S, Kajitani R, Itoh T. GreenHill: a de novo chromosome-level scaffolding and phasing tool using Hi-C. Genome Biol 2023; 24:162. [PMID: 37434204 DOI: 10.1186/s13059-023-03006-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/04/2023] [Indexed: 07/13/2023] Open
Abstract
Chromosome-level haplotype-resolved genome assembly is an important resource in molecular biology. However, current de novo haplotype assemblers require parental data or reference genomes and often fail to provide chromosome-level results. We present GreenHill, a novel scaffolding and phasing tool that considers various assemblers' contigs as input to reconstruct chromosome-level haplotypes using Hi-C without parental or reference data. Its unique functions include new error correction based on Hi-C contacts and the simultaneous use of Hi-C and long reads. Benchmarks reveal that GreenHill outperforms other approaches in contiguity and phasing accuracy, and the majority of chromosome arms are entirely phased.
Affiliation(s)
- Shun Ouchi
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
| | - Rei Kajitani
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
| | - Takehiko Itoh
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan.
|
21
|
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted Phasing of 2-200 Kilobase DNA Fragments with a Short-Read Sequencer and a Single-Tube Linked-Read Library Method. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.05.531179. [PMID: 36945366 PMCID: PMC10028795 DOI: 10.1101/2023.03.05.531179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
In the human genome, heterozygous sites are genomic positions with different alleles inherited from each parent. On average, there is a heterozygous site every 1-2 kilobases (kb). Resolving whether two alleles in neighboring heterozygous positions are physically linked (that is, phased) is possible with a short-read sequencer if the sequencing library captures long-range information. TELL-Seq is a library preparation method based on millions of barcoded micro-sized beads that enables instrument-free phasing of a whole human genome in a single PCR tube. TELL-Seq incorporates a unique molecular identifier (barcode) to the short reads generated from the same high-molecular-weight (HMW) DNA fragment (known as 'linked-reads'). However, genome-scale TELL-Seq is not cost-effective for applications focusing on a single locus or a few loci. Here, we present an optimized TELL-Seq protocol that enables the cost-effective phasing of enriched loci (targets) of varying sizes, purity levels, and heterozygosity. Targeted TELL-Seq maximizes linked-read efficiency and library yield while minimizing input requirements, fragment collisions on microbeads, and sequencing burden. To validate the targeted protocol, we phased seven 180-200 kb loci enriched by CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis, four 20 kb loci enriched by CRISPR/Cas9-mediated protection from exonuclease digestion, and six 2-13 kb loci amplified by PCR. The selected targets have clinical and research relevance (BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA). These analyses reveal that targeted TELL-Seq provides a reliable way of phasing allelic variants within targets (2-200 kb in length) with the low cost and high accuracy of short-read sequencing.
Affiliation(s)
- Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Peter L. Chang
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093 USA
- Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Ming Lei
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
- Yong Wang
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
- Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
|
22
|
Viering DH, Hureaux M, Neveling K, Latta F, Kwint M, Blanchard A, Konrad M, Bindels RJ, Schlingmann KP, Vargas-Poussou R, de Baaij JH. Long-Read Sequencing Identifies Novel Pathogenic Intronic Variants in Gitelman Syndrome. J Am Soc Nephrol 2023; 34:333-345. [PMID: 36302598 PMCID: PMC10103101 DOI: 10.1681/asn.2022050627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 10/17/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Gitelman syndrome is a salt-losing tubulopathy characterized by hypokalemic alkalosis and hypomagnesemia. It is caused by homozygous recessive or compound heterozygous pathogenic variants in SLC12A3, which encodes the Na+-Cl− cotransporter (NCC). In up to 10% of patients with Gitelman syndrome, current genetic techniques detect only one specific pathogenic variant. This study aimed to identify a second pathogenic variant in introns, splice sites, or promoters to increase the diagnostic yield. METHODS Long-read sequencing of SLC12A3 was performed in 67 DNA samples from individuals with suspected Gitelman syndrome in whom a single likely pathogenic or pathogenic variant was previously detected. In addition, we sequenced DNA samples from 28 individuals with one variant of uncertain significance or no candidate variant. Midigene splice assays assessed the pathogenicity of novel intronic variants. RESULTS A second likely pathogenic/pathogenic variant was identified in 45 (67%) patients. Those with two likely pathogenic/pathogenic variants had a more severe electrolyte phenotype than other patients. Of the 45 patients, 16 had intronic variants outside of canonic splice sites (nine variants, mostly deep intronic, six novel), whereas 29 patients had an exonic variant or canonic splice site variant. Midigene splice assays of the previously known c.1670-191C>T variant and intronic candidate variants demonstrated aberrant splicing patterns. CONCLUSION Intronic pathogenic variants explain an important part of the missing heritability in Gitelman syndrome. Long-read sequencing should be considered in diagnostic workflows for Gitelman syndrome.
Affiliation(s)
- Daan H.H.M. Viering
- Department of Physiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands
- Marguerite Hureaux
- Reference Center for Hereditary Kidney and Childhood Diseases (Maladies Rénales Héréditaires de l’Enfant et de l’Adulte, MARHEA), Paris, France
- Department of Genetics, Hôpital Européen Georges-Pompidou, Assistance Publique Hôpitaux de Paris, Paris, France
- Paris CardioVascular Research Center, Institut National de la Santé et de Recherche Médicale (INSERM) U970, Paris City University, Paris, France
- Kornelia Neveling
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Femke Latta
- Department of Physiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands
- Michael Kwint
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Anne Blanchard
- Reference Center for Hereditary Kidney and Childhood Diseases (Maladies Rénales Héréditaires de l’Enfant et de l’Adulte, MARHEA), Paris, France
- Clinical Investigations Center, Hôpital Européen Georges-Pompidou, Assistance Publique Hôpitaux de Paris, Paris, France
- Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, University of Paris, Centre National de la Recherche Scientifique (CNRS), Paris, France
- Martin Konrad
- Department of General Pediatrics, University Children’s Hospital, Münster, Germany
- René J.M. Bindels
- Department of Physiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands
- Rosa Vargas-Poussou
- Reference Center for Hereditary Kidney and Childhood Diseases (Maladies Rénales Héréditaires de l’Enfant et de l’Adulte, MARHEA), Paris, France
- Department of Genetics, Hôpital Européen Georges-Pompidou, Assistance Publique Hôpitaux de Paris, Paris, France
- Clinical Investigations Center, Hôpital Européen Georges-Pompidou, Assistance Publique Hôpitaux de Paris, Paris, France
- Jeroen H.F. de Baaij
- Department of Physiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands
|
23
|
Chan AP, Choi Y, Rangan A, Zhang G, Podder A, Berens M, Sharma S, Pirrotte P, Byron S, Duggan D, Schork NJ. Interrogating the Human Diplome: Computational Methods, Emerging Applications, and Challenges. Methods Mol Biol 2023; 2590:1-30. [PMID: 36335489 DOI: 10.1007/978-1-0716-2819-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Human DNA sequencing protocols have revolutionized human biology, biomedical science, and clinical practice, but still have very important limitations. One limitation is that most protocols do not separate or assemble (i.e., "phase") the nucleotide content of each of the maternally and paternally derived chromosomal homologs making up the 22 autosomal pairs and the chromosomal pair making up the pseudo-autosomal region of the sex chromosomes. This has led to a dearth of studies and a consequent underappreciation of many phenomena of fundamental importance to basic and clinical genomic science. We discuss a few protocols for obtaining phase information as well as their limitations, including those that could be used in tumor phasing settings. We then describe a number of biological and clinical phenomena that require phase information. These include phenomena that require precise knowledge of the nucleotide sequence in a chromosomal segment from germline or somatic cells, such as DNA binding events, and insight into unique cis- vs. trans-acting functionally impactful variant combinations (for example, variants implicated in a phenotype governed by compound heterozygosity). In addition, we comment on the need for reliable and consensus-based diploid-context computational workflows for variant identification as well as the need for laboratory-based functional verification strategies for validating cis vs. trans effects of variant combinations. We also briefly describe available resources, example studies, as well as areas of further research, and ultimately argue that the science behind the study of human diploidy, referred to as "diplomics," which will be enabled by nucleotide-level resolution of phased genomes, is a logical next step in the analysis of human genome biology.
Affiliation(s)
- Agnes P Chan
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Yongwook Choi
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Aditya Rangan
- Courant Institute of Mathematical Sciences at New York University, New York, NY, USA
- Guangfa Zhang
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Avijit Podder
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Michael Berens
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Sunil Sharma
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Patrick Pirrotte
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Sara Byron
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Dave Duggan
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA
- The City of Hope National Medical Center, Duarte, CA, USA
- Nicholas J Schork
- The Translational Genomics Research Institute (TGen), part of the City of Hope National Medical Center, Phoenix, AZ, USA.
- The City of Hope National Medical Center, Duarte, CA, USA.
|
24
|
Hoehe MR, Herwig R. Analysis of 1276 Haplotype-Resolved Genomes Allows Characterization of Cis- and Trans-Abundant Genes. Methods Mol Biol 2023; 2590:237-272. [PMID: 36335503 DOI: 10.1007/978-1-0716-2819-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many methods for haplotyping have materialized, but their application on a significant scale has been rare to date. Here we summarize analyses that were carried out in 1092 genomes from the 1000 Genomes Consortium and validated in an unprecedented number of 184 PGP genomes that have been experimentally haplotype-resolved by application of the Long-Fragment Read (LFR) technology. These analyses provided first insights into the diplotypic nature of human genomes and its potential functional implications. Thus, protein-changing variants were not randomly distributed between the two homologues of 18,121 autosomal protein-coding genes but occurred significantly more frequently in cis than in trans configurations in virtually each of the 1276 phased genomes. This resulted in global cis/trans ratios of ~60:40, establishing "cis abundance" as a universal characteristic of diploid human genomes. This phenomenon was based on two different classes of genes, a larger one exhibiting cis configurations of protein-changing variants in excess, so-called "cis-abundant" genes, and a smaller one of "trans-abundant" genes. These two gene classes, which together constitute a common diplotypic exome, were further functionally distinguished by means of gene ontology (GO) and pathway enrichment analysis. Moreover, they were distinguishable in terms of their effects on the human interactome, where they constitute distinct cis and trans modules, as shown with network propagation on a large integrated protein-protein interaction network. These analyses, recently performed with updated database and analysis tools, further consolidated the characterization of cis- and trans-abundant genes while expanding previous results. In this chapter, we present the key results along with the materials and methods to motivate readers to investigate these findings independently and gain further insights into the diplotypic nature of genes and genomes.
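The cis/trans distinction in the abstract above can be illustrated with a minimal sketch. It assumes phased heterozygous genotypes written in the VCF-style `0|1` / `1|0` notation and considers only pairs of variants within one gene; the function names and the toy data are hypothetical and are not taken from the chapter, whose actual pipeline is not described here.

```python
def configuration(gt_a: str, gt_b: str) -> str:
    """Classify two phased heterozygous genotypes as 'cis' or 'trans'.

    'cis'  : both alternate alleles sit on the same homologue.
    'trans': the alternate alleles sit on opposite homologues.
    """
    for gt in (gt_a, gt_b):
        # Accept only phased heterozygous calls such as "0|1" or "1|0".
        if sorted(gt.split("|")) != ["0", "1"]:
            raise ValueError(f"expected a phased heterozygous genotype, got {gt!r}")
    return "cis" if gt_a == gt_b else "trans"


def cis_fraction(pairs) -> float:
    """Fraction of variant pairs found in cis configuration."""
    labels = [configuration(a, b) for a, b in pairs]
    return labels.count("cis") / len(labels)


# Toy data (hypothetical): five variant pairs, three of them in cis.
toy_pairs = [
    ("0|1", "0|1"),
    ("1|0", "1|0"),
    ("0|1", "1|0"),
    ("1|0", "1|0"),
    ("1|0", "0|1"),
]
print(cis_fraction(toy_pairs))
```

With these toy pairs the cis fraction is 3 of 5, which happens to mirror the ~60:40 ratio quoted in the abstract; the numbers are illustrative only.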
Affiliation(s)
- Margret R Hoehe
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
- Ralf Herwig
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
25
|
Hu Y, Yang C, Zhang L, Zhou X. Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads. Methods Mol Biol 2023; 2590:161-182. [PMID: 36335499 DOI: 10.1007/978-1-0716-2819-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Phasing is essential for determining the origins of each set of alleles in the whole-genome sequencing data of individuals. As such, it provides essential information for the causes of hereditary diseases and the sources of individual variability. Recent technical breakthroughs in linked-read (referred to as co-barcoding in other chapters of the book) and long-read sequencing and downstream analysis have brought the goal of accurate and complete phasing within reach. Here we review recent progress related to the assembly and phasing of personal genomes based on linked-reads and related applications. Motivated by current limitations in generating high-quality diploid assemblies and detecting variants, a new suite of software tools, Aquila, was developed to fully take advantage of linked-read sequencing technology. The overarching goal of Aquila is to exploit the strengths of linked-read technology including long-range connectivity and inherent phasing of variants for reference-assisted local de novo assembly at the whole-genome scale. The diploid nature of the assemblies facilitates detection and phasing of genetic variation, including single nucleotide variations (SNVs), small insertions and deletions (indels), and structural variants (SVs). An extension of Aquila, Aquila_stLFR, focuses on another newly developed linked-reads sequencing technology, single-tube long-fragment read (stLFR). AquilaSV, a region-based diploid assembly approach, is used to characterize structural variants and can achieve diploid assembly in one target region at a time. Lastly, we introduce HAPDeNovo, a program that exploits phasing information from linked-read sequencing to improve detection of de novo mutations. Use of these tools is expected to harness the advantages of linked-reads technology, improve phasing, and advance variant discovery.
Affiliation(s)
- Yunfei Hu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
- Xin Zhou
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA.
- Data Science Institute, Nashville, TN, USA.
|
26
|
Mastromatteo S, Chen A, Gong J, Lin F, Thiruvahindrapuram B, Sung WW, Whitney J, Wang Z, Patel RV, Keenan K, Halevy A, Panjwani N, Avolio J, Wang C, Côté-Maurais G, Bégin S, Adam D, Brochiero E, Bjornson C, Chilvers M, Price A, Parkins M, van Wylick R, Mateos-Corral D, Hughes D, Smith MJ, Morrison N, Tullis E, Stephenson AL, Wilcox P, Quon BS, Leung WM, Solomon M, Sun L, Ratjen F, Strug LJ. High-quality read-based phasing of cystic fibrosis cohort informs genetic understanding of disease modification. HGG ADVANCES 2023; 4:100156. [PMID: 36386424 PMCID: PMC9647008 DOI: 10.1016/j.xhgg.2022.100156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022] Open
Abstract
Phasing of heterozygous alleles is critical for interpretation of cis-effects of disease-relevant variation. We sequenced 477 individuals with cystic fibrosis (CF) using linked-read sequencing; the resulting data display an average phase block N50 of 4.39 Mb. We use these samples to construct a graph representation of CFTR haplotypes, demonstrating its utility for understanding complex CF alleles. These are visualized in a Web app, CFTbaRcodes, that enables interactive exploration of CFTR haplotypes present in this cohort. We perform fine-mapping and phasing of the chr7q35 trypsinogen locus associated with CF meconium ileus, an intestinal obstruction at birth associated with more severe CF outcomes and pancreatic disease. A 20-kb deletion polymorphism and a PRSS2 missense variant p.Thr8Ile (rs62473563) are shown to independently contribute to meconium ileus risk (p = 0.0028, p = 0.011, respectively) and are PRSS2 pancreas eQTLs (p = 9.5 × 10^-7 and p = 1.4 × 10^-4, respectively), suggesting the mechanism by which these polymorphisms contribute to CF. The phase information from linked reads provides a putative causal explanation for variation at a CF-relevant locus, which also has implications for the genetic basis of non-CF pancreatitis, to which this locus has been reported to contribute.
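For readers unfamiliar with the phase block N50 statistic quoted above, the following is a minimal sketch of the standard N50 definition applied to phase block lengths: the length L such that blocks of length at least L together cover at least half of the total phased length. The function name and the example block lengths are hypothetical and are not drawn from the study.

```python
def phase_block_n50(block_lengths):
    """Return the N50 of a collection of phase block lengths (in bp)."""
    lengths = sorted(block_lengths, reverse=True)
    if not lengths:
        raise ValueError("no phase blocks given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest block down until half of the total is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical block lengths for one sample (total 16 Mb).
example_blocks = [6_000_000, 4_000_000, 3_000_000, 2_000_000, 1_000_000]
print(phase_block_n50(example_blocks))
```

In this toy case the two longest blocks (6 Mb and 4 Mb) already cover more than half of the 16 Mb total, so the N50 is 4 Mb.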
|
27
|
Noto K, Ruiz L. Accurate genome-wide phasing from IBD data. BMC Bioinformatics 2022; 23:502. [PMID: 36424541 PMCID: PMC9686111 DOI: 10.1186/s12859-022-05066-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 11/17/2022] [Indexed: 11/25/2022] Open
Abstract
As genotype databases increase in size, so too does the number of detectable segments of identity by descent (IBD): segments of the genome where two individuals share an identical copy of one of their two parental haplotypes, due to shared ancestry. We show that given a large enough genotype database, these segments of IBD collectively overlap entire chromosomes, including instances of IBD that span multiple chromosomes, and can be used to accurately separate the alleles inherited from each parent across the entire genome. The resulting phase is not an improvement over state-of-the-art local phasing methods, but provides accurate long-range phasing that indicates which of two haplotypes in different regions of the genome, including different chromosomes, was inherited from the same parent. We are able to separate the DNA inherited from each parent completely, across the entire genome, with 98% median accuracy in a test set of 30,000 individuals. We estimate the IBD data requirements for accurate genome-wide phasing, and we propose a method for estimating confidence in the resulting phase. We show that our methods do not require the genotypes of close family, and that they are robust to genotype errors and missing data. In fact, our method can impute missing data accurately and correct genotype errors.
|
28
|
Zhou Y, Leung AWS, Ahmed SS, Lam TW, Luo R. Duet: SNP-assisted structural variant calling and phasing using Oxford nanopore sequencing. BMC Bioinformatics 2022; 23:465. [PMCID: PMC9639287 DOI: 10.1186/s12859-022-05025-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 10/29/2022] [Indexed: 11/09/2022] Open
Abstract
Background: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging.
Results: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets of sequencing coverage ranging from 8× to 40×. At low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), the F1-score for SV phasing is further improved in comparison to the performance of other tools, while its performance of SV genotyping and SV calling remains higher than other tools.
Conclusion: Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It has great performance when scaled to high-coverage genomes, which is adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet.
Affiliation(s)
- Yekai Zhou
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Amy Wing-Sze Leung
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Syed Shakeel Ahmed
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
|
29
|
Holt GS, Batty LE, Alobaidi BKS, Smith HE, Oud MS, Ramos L, Xavier MJ, Veltman JA. Phasing of de novo mutations using a scaled-up multiple amplicon long-read sequencing approach. Hum Mutat 2022; 43:1545-1556. [PMID: 36047340 PMCID: PMC9826063 DOI: 10.1002/humu.24450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Revised: 08/11/2022] [Accepted: 08/18/2022] [Indexed: 01/11/2023]
Abstract
De novo mutations (DNMs) play an important role in severe genetic disorders that reduce fitness. To better understand their role in disease, it is important to determine the parent-of-origin and timing of mutational events that give rise to these mutations, especially in sex-specific developmental disorders such as male infertility. However, currently available short-read sequencing approaches are not ideally suited for phasing, as this requires long continuous DNA strands that span both the DNM and one or more informative single-nucleotide polymorphisms. To overcome these challenges, we optimized and implemented a multiplexed long-read sequencing approach using the Oxford Nanopore Technologies MinION platform. We focused on improving target amplification, integrating long-read sequenced data with high-quality short-read sequence data, and developing an anchored phasing computational method. This approach handled the inherent phasing challenges of long-range target amplification and the normal accumulation of sequencing error associated with long-read sequencing. In total, 77 of 109 DNMs (71%) were successfully phased and parent-of-origin identified. The majority of phased DNMs were prezygotic (90%), the accuracy of which is highlighted by an average mutant allele frequency of 49.6% and standard error of 0.84%. This study demonstrates the benefits of employing an integrated short-read and long-read sequencing approach for large-scale DNM phasing.
Affiliation(s)
- Giles S. Holt
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Lois E. Batty
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Bilal K. S. Alobaidi
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Hannah E. Smith
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Manon S. Oud
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
- Liliana Ramos
- Department of Obstetrics and Gynecology, Division of Reproductive Medicine, Radboudumc, Nijmegen, The Netherlands
- Miguel J. Xavier
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Joris A. Veltman
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
|
30
|
Tilleman L, Rubben K, Van Criekinge W, Deforce D, Van Nieuwerburgh F. Haplotyping pharmacogenes using TLA combined with Illumina or Nanopore sequencing. Sci Rep 2022; 12:17734. [PMID: 36273027 PMCID: PMC9587992 DOI: 10.1038/s41598-022-22499-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 10/16/2022] [Indexed: 01/18/2023] Open
Abstract
The currently used pharmacogenetic genotyping assays offer limited haplotype information, which can potentially cause specific functional effects to be missed. This study tested if Targeted Locus Amplification (TLA), when using non-patient-specific primers combined with Illumina or Nanopore sequencing, can offer an advantage in terms of accurate phasing. The TLA method selectively amplifies and sequences entire genes based on crosslinking DNA in close physical proximity. This way, DNA fragments that were initially further apart in the genome are ligated into one molecule, making it possible to sequence distant variants within one short read. In this study, four pharmacogenes, CYP2D6, CYP2C19, CYP1A2 and BRCA1, were sequenced after enrichment using different primer pairs. Only 24% or 38% of the nucleotides mapped on target when using Illumina or Nanopore sequencing, respectively. With an average depth of more than 1000X for the regions of interest, none of the genes were entirely covered with either sequencing method. For three of the four genes, less than half of the variants were phased correctly compared to the reference. The Nanopore dataset with the optimized primer pair for CYP2D6 resulted in the correct haplotype, showing that this method can be used for reliable genotyping and phasing of pharmacogenes but does require patient-specific primer design and optimization to be effective.
Affiliation(s)
- Laurentijn Tilleman
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium
- Kaat Rubben
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium
- Wim Van Criekinge
- Laboratory of Bioinformatics and Computational Genomics, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Dieter Deforce
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium
- Filip Van Nieuwerburgh
- Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, 9000, Ghent, Belgium.
|
31
|
Lyu R, Tsui V, Crismani W, Liu R, Shim H, McCarthy D. sgcocaller and comapr: personalised haplotype assembly and comparative crossover map analysis using single-gamete sequencing data. Nucleic Acids Res 2022; 50:e118. [PMID: 36107768 PMCID: PMC9723612 DOI: 10.1093/nar/gkac764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 08/17/2022] [Accepted: 09/06/2022] [Indexed: 12/24/2022] Open
Abstract
Profiling gametes of an individual enables the construction of personalised haplotypes and meiotic crossover landscapes, now achievable at larger scale than ever through the availability of high-throughput single-cell sequencing technologies. However, high-throughput single-gamete data commonly have low depth of coverage per gamete, which challenges existing gamete-based haplotype phasing methods. In addition, haplotyping a large number of single gametes from high-throughput single-cell DNA sequencing data and constructing meiotic crossover profiles using existing methods requires intensive processing. Here, we introduce efficient software tools for the essential tasks of generating personalised haplotypes and calling crossovers in gametes from single-gamete DNA sequencing data (sgcocaller), and constructing, visualising, and comparing individualised crossover landscapes from single gametes (comapr). With additional data pre-processing, the tools can also be applied to bulk-sequenced samples. We demonstrate that sgcocaller is able to generate impeccable phasing results for high-coverage datasets, on which it is more accurate and stable than existing methods, and also performs well on low-coverage single-gamete sequencing datasets for which current methods fail. Our tools achieve highly accurate results with user-friendly installation, comprehensive documentation, efficient computation times and minimal memory usage.
Affiliation(s)
- Ruqian Lyu
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria 3065, Australia
- Melbourne Integrative Genomics/School of Mathematics and Statistics, Faculty of Science, The University of Melbourne, Building 184, Royal Parade, Parkville, Victoria 3010, Australia
- Vanessa Tsui
- DNA Repair and Recombination Laboratory, St Vincent’s Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria 3065, Australia
- The Faculty of Medicine, Dentistry and Health Science, The University of Melbourne, Melbourne, Victoria 3010, Australia
- Wayne Crismani
- DNA Repair and Recombination Laboratory, St Vincent’s Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria 3065, Australia
- The Faculty of Medicine, Dentistry and Health Science, The University of Melbourne, Melbourne, Victoria 3010, Australia
- Ruijie Liu
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria 3065, Australia
- Davis J McCarthy
- To whom correspondence should be addressed. Tel: +61 3 9231 2480; Fax: +61 3 9416 2676
|
32
|
Ashton JJ, Seaby EG, Beattie RM, Ennis S. NOD2 in Crohn's disease- unfinished business. J Crohns Colitis 2022; 17:450-458. [PMID: 36006803 PMCID: PMC10069614 DOI: 10.1093/ecco-jcc/jjac124] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/17/2022] [Indexed: 01/01/2023]
Abstract
Studies of Crohn's disease consistently implicate NOD2 as the most important gene in disease pathogenesis since first being identified in 2001. Since this point, genome-wide association, next-generation sequencing, and functional analyses have all confirmed a key role for NOD2, but despite this, NOD2 also has significant unresolved complexity. More recent studies have reinvigorated an early hypothesis that NOD2 may be a single-gene cause of disease, and the distinct ileal stricturing phenotype seen with NOD2-related disease presents an opportunity for personalised diagnosis, disease prediction and targeted therapy. The genomics of NOD2 has much that remains unknown, including the role of rare variation, phasing of variants across the haplotype block and the role of variation in the NOD2-regulatory regions. Here, we discuss the evidence and the unmet needs of NOD2-research, based on recently published evidence, and suggest methods that may meet these requirements.
Affiliation(s)
- James J Ashton
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK; Department of Paediatric Gastroenterology, Southampton Children's Hospital, Southampton, UK
- Eleanor G Seaby
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- R Mark Beattie
- Department of Paediatric Gastroenterology, Southampton Children's Hospital, Southampton, UK
- Sarah Ennis
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
|
33
|
Takamatsu G, Yanagi K, Koganebuchi K, Yoshida F, Lee JS, Toyama K, Hattori K, Katagiri C, Kondo T, Kunugi H, Kimura R, Kaname T, Matsushita M. Haplotype phasing of a bipolar disorder pedigree revealed rare multiple mutations of SPOCD1 gene in the 1p36-35 susceptibility locus. J Affect Disord 2022; 310:96-105. [PMID: 35504398 DOI: 10.1016/j.jad.2022.04.150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Revised: 04/12/2022] [Accepted: 04/26/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND The etiology of bipolar disorder (BD) is poorly understood. Considering the complexity of BD, pedigree-based sequencing studies focusing on haplotypes at specific loci may be practical to discover high-impact risk variants. This study comprehensively examined the haplotype sequence at 1p36-35 BD and recurrent depressive disorder (RDD) susceptibility loci. METHODS We surveyed BD families in Okinawa, Japan. We performed linkage analysis and determined the phased sequence of the affected haplotype using whole genome sequencing. We filtered rare missense variants on the haplotype. For validation, we conducted a case-control genetic association study on approximately 3000 Japanese subjects. RESULTS We identified a three-generation multiplex pedigree with BD and RDD. Strikingly, we identified a significant linkage with mood disorders (logarithm of odds [LOD] = 3.61) at 1p36-35, supported in other ancestry studies. Finally, we determined the entire sequence of the 6.4-Mb haplotype shared by all affected subjects. Moreover, we found a rare triplet of missense variants in the SPOCD1 gene on the haplotype. Notably, despite the rare frequency, one heterozygote with multiple SPOCD1 variants was identified in an independent set of 88 BD type I genotyping samples. LIMITATIONS The 1p36-35 sequence was obtained from only a single pedigree. The replicate sample was small. Short-read sequencing might miss structural variants. A polygenic risk score was not analyzed. CONCLUSION The 1p36-35 haplotype sequence may be valuable for future BD variant studies. In particular, SPOCD1 is a promising candidate gene and should be validated.
Affiliation(s)
- Gakuya Takamatsu
- Department of Molecular and Cellular Physiology, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan; Department of Neuropsychiatry, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
- Kumiko Yanagi
- Department of Genome Medicine, National Center for Child Health and Development, Tokyo, Japan
- Kae Koganebuchi
- Advanced Medical Research Center, Faculty of Medicine, University of the Ryukyus, Okinawa, Japan; Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Fuyuko Yoshida
- Department of Mental Disorder Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan; Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Tokyo, Japan
- Jun-Seok Lee
- Department of Molecular and Cellular Physiology, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan; Advanced Medical Research Center, Faculty of Medicine, University of the Ryukyus, Okinawa, Japan
- Kanako Toyama
- Department of Molecular and Cellular Physiology, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
- Kotaro Hattori
- Department of Mental Disorder Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan; Department of Bioresources, Medical Genome Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Chiaki Katagiri
- Department of Molecular and Cellular Physiology, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan; Department of Synbiotics, Institute for Genetic Medicine, Hokkaido University, Hokkaido, Japan
- Tsuyoshi Kondo
- Department of Neuropsychiatry, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
- Hiroshi Kunugi
- Department of Mental Disorder Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan; Department of Psychiatry, Teikyo University School of Medicine, Tokyo, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
- Tadashi Kaname
- Department of Genome Medicine, National Center for Child Health and Development, Tokyo, Japan
- Masayuki Matsushita
- Department of Molecular and Cellular Physiology, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan.
|
34
|
Direct Chromosomal Phasing: An Easy and Fast Approach for Broadening Prenatal Diagnostic Applicability. THALASSEMIA REPORTS 2022. [DOI: 10.3390/thalassrep12030011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The assignment of alleles to haplotypes in prenatal diagnostic assays has traditionally depended on family study analyses. However, this prevents the wide application of prenatal diagnosis based on haplotype analysis, especially in countries with dispersed populations. Here, we present an easy and fast approach using Droplet Digital PCR for the direct determination of haplotype blocks, overcoming the necessity for acquiring other family members’ genetic samples. We demonstrate this approach on nine families that were referred to our center for a prenatal diagnosis of β-thalassaemia using four highly polymorphic single nucleotide variations and the most common pathogenic β-thalassaemia variation in our population. Our approach resulted in the successful direct chromosomal phasing and haplotyping for all nine of the families analyzed, demonstrating a complete agreement with the haplotypes that are ascertained based on family trios. The clinical utility of this approach is envisaged to open the application of prenatal diagnosis for β-thalassaemia to all cases, while simultaneously providing a model for extending the prenatal diagnostic application of other monogenic diseases as well.
|
35
|
Sakamoto Y, Miyake S, Oka M, Kanai A, Kawai Y, Nagasawa S, Shiraishi Y, Tokunaga K, Kohno T, Seki M, Suzuki Y, Suzuki A. Phasing analysis of lung cancer genomes using a long read sequencer. Nat Commun 2022; 13:3464. [PMID: 35710642 PMCID: PMC9203510 DOI: 10.1038/s41467-022-31133-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Accepted: 06/02/2022] [Indexed: 12/14/2022] Open
Abstract
Chromosomal backgrounds of cancerous mutations still remain elusive. Here, we conduct the phasing analysis of non-small cell lung cancer specimens of 20 Japanese patients. By the combinatory use of short and long read sequencing data, we obtain long phased blocks of 834 kb in N50 length with >99% concordance rate. By analyzing the obtained phasing information, we reveal that several cancer genomes harbor regions in which mutations are unevenly distributed to either of two haplotypes. Large-scale chromosomal rearrangement events, which resemble chromothripsis events but have smaller scales, occur on only one chromosome, and these events account for the observed biased distributions. Interestingly, the events are characteristic of EGFR mutation-positive lung adenocarcinomas. Further integration of long read epigenomic and transcriptomic data reveals that haploid chromosomes are not always at equivalent transcriptomic/epigenomic conditions. Distinct chromosomal backgrounds are responsible for later cancerous aberrations in a haplotype-specific manner.
Affiliation(s)
- Yoshitaka Sakamoto
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Shuhei Miyake
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Miho Oka
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Ono Pharmaceutical Co., Ltd, Ibaraki, Japan
- Akinori Kanai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yosuke Kawai
- Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan
- Satoi Nagasawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Katsushi Tokunaga
- Genome Medical Science Project (Toyama), National Center for Global Health and Medicine, Tokyo, Japan
- Takashi Kohno
- Division of Genome Biology, National Cancer Center Research Institute, Tokyo, Japan
- Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
- Ayako Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
|
36
|
Personalized Medicine for the Critically Ill Patient: A Narrative Review. Processes (Basel) 2022. [DOI: 10.3390/pr10061200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Personalized Medicine (PM) is rapidly advancing in everyday medical practice. Technological advances allow researchers to reach patients more than ever with their discoveries. The critically ill patient is probably the most complex of all, and personalized medicine must make serious efforts to fulfill the desire to “treat the individual, not the disease”. The complexity of critically ill pathologies arises from the severe state of these patients and from the deranged pathways of their diseases. PM constitutes the integration of basic research into clinical practice; however, to make this possible, complex and voluminous data require processing through even more complex mathematical models. The result of processing biodata is a digitized individual, from which fragments of information can be extracted for specific purposes. With this review, we aim to describe the current state of PM technologies and methods and explore its application in critically ill patients, as well as some of the challenges associated with PM in intensive care from the perspective of economic, approval, and ethical issues. This review can help in understanding the complexity of PM, the complex processes needed for its application in critically ill patients, the benefits that make the effort of implementation worthwhile, and the current challenges of PM.
|
37
|
Wakita S, Hara M, Kitabatake Y, Kawatani K, Kurahashi H, Hashizume R. Experimental method for haplotype phasing across the entire length of chromosome 21 in trisomy 21 cells using a chromosome elimination technique. J Hum Genet 2022; 67:565-572. [PMID: 35637312 PMCID: PMC9510051 DOI: 10.1038/s10038-022-01049-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Revised: 04/25/2022] [Accepted: 05/12/2022] [Indexed: 11/09/2022]
Abstract
Modern sequencing technologies produce a single consensus sequence without distinguishing between homologous chromosomes. Haplotype phasing solves this limitation by identifying alleles on the maternal and paternal chromosomes. This information is critical for understanding gene expression models in genetic disease research. Furthermore, the haplotype phasing of three homologous chromosomes in trisomy cells is more complicated than that in disomy cells. In this study, we attempted the accurate and complete haplotype phasing of chromosome 21 in trisomy 21 cells. To separate homologs, we established three corrected disomy cell lines (ΔPaternal chromosome, ΔMaternal chromosome 1, and ΔMaternal chromosome 2) from trisomy 21 induced pluripotent stem cells by eliminating one chromosome 21 utilizing the Cre-loxP system. These cells were then whole-genome sequenced by a next-generation sequencer. By simply comparing the base information of the whole-genome sequence data at the same position between each corrected disomy cell line, we determined the base on the eliminated chromosome and performed phasing. We phased 51,596 single nucleotide polymorphisms (SNPs) on chromosome 21, randomly selected seven SNPs spanning the entire length of the chromosome, and confirmed that there was no contradiction by direct sequencing.
Affiliation(s)
- Sachiko Wakita
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan
- Mari Hara
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan
- Yasuji Kitabatake
- Department of Pediatrics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan
- Keiji Kawatani
- Department of Pediatrics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan; Department of Neuroscience, Mayo Clinic, Scottsdale, AZ, USA
- Hiroki Kurahashi
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Toyoake, Japan
- Ryotaro Hashizume
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan; Department of Genomic Medicine, Mie University Hospital, Mie, Japan.
|
38
|
Implementation of CYP2D6 copy-number imputation panel and frequency of key pharmacogenetic variants in Finnish individuals with a psychotic disorder. THE PHARMACOGENOMICS JOURNAL 2022; 22:166-172. [PMID: 35197553 PMCID: PMC9151384 DOI: 10.1038/s41397-022-00270-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Revised: 02/01/2022] [Accepted: 02/08/2022] [Indexed: 11/08/2022]
Abstract
We demonstrate that CYP2D6 copy-number variation (CNV) can be imputed using existing imputation algorithms. Additionally, we report frequencies of key pharmacogenetic variants in individuals with a psychotic disorder from the genetically bottle-necked population of Finland. We combined GWAS chip and CYP2D6 CNV data from the Breast Cancer Pain Genetics study to construct an imputation panel (n = 902) for CYP2D6 CNV. The resulting data set was used as a CYP2D6 CNV imputation panel in 9262 non-related individuals from the SUPER-Finland study. Based on imputation of 9262 individuals we confirm the higher frequency of CYP2D6 ultrarapid metabolizers and a 22-fold enrichment of the UGT1A1 decreased function variant rs4148323 (UGT1A1*6) in Finland compared with non-Finnish Europeans. Similarly, the NUDT15 variant rs116855232 was highly enriched in Finland. We demonstrate that imputation of CYP2D6 CNV is possible and the methodology enables studying CYP2D6 in large biobanks with genome-wide data.
|
39
|
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023] Open
Abstract
Rare diseases affect 30 million people in the USA and more than 300–400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25–35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
|
40
|
Dilernia D, Amin P, Flores J, Stecenko A, Sorscher E. Mutation profiling of the c.1521_1523delCTT (p.Phe508del, F508del) CFTR allele using haplotype-resolved long-read next generation sequencing. Hum Mutat 2022; 43:595-603. [PMID: 35170824 DOI: 10.1002/humu.24352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Revised: 02/07/2022] [Accepted: 02/14/2022] [Indexed: 11/09/2022]
Abstract
Current approaches to characterize the mutational profile of the cystic fibrosis transmembrane conductance regulator (CFTR) gene are based on targeted mutation analysis (TMA) or whole gene studies derived from short-read next generation sequencing (NGS). However, these methods lack phasing capability which, in certain scenarios, can provide clinically valuable information. In the present work, we performed near-full length CFTR sequencing using Single-Molecule Real-Time Sequencing to produce haplotype-resolved data from both homozygous and heterozygous individuals for mutation c.1521_1523delCTT (p.Phe508del, F508del). This approach utilizes target enrichment of the CFTR gene using biotinylated probes, facilitates multiplexing samples in the same sequencing run, and utilizes fully-automated bioinformatics pipelines for error correction and variant calling. We show a remarkable conservation of F508del haplotype, consistent with the single gene founder effect, as well as diverse mutational profiles in non-F508del alleles. By the same method, 105 single nucleotide polymorphisms (SNPs) exhibiting invariant linkage to F508del CFTR (which better define the founder haplotype) were identified. High level homology between F508del sequences derived from heterozygotes, and those obtained from homozygous individuals, demonstrates accuracy of this method to produce haplotype resolved sequencing. The studies provide a new diagnostic technology for detailed analysis of complex CFTR alleles linked to disease severity.
Affiliation(s)
- Dario Dilernia
- Department of Pathology, School of Medicine, Emory University; Emory Vaccine Center, Emory University
- Julie Flores
- Department of Pediatrics, School of Medicine, Emory University, and the Emory + Children's Center for Cystic Fibrosis and Airways Disease Research
- Arlene Stecenko
- Department of Pediatrics, School of Medicine, Emory University, and the Emory + Children's Center for Cystic Fibrosis and Airways Disease Research
- Eric Sorscher
- Department of Pediatrics, School of Medicine, Emory University, and the Emory + Children's Center for Cystic Fibrosis and Airways Disease Research
|
41
|
Benchmarking phasing software with a whole-genome sequenced cattle pedigree. BMC Genomics 2022; 23:130. [PMID: 35164677 PMCID: PMC8845340 DOI: 10.1186/s12864-022-08354-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Accepted: 01/24/2022] [Indexed: 12/30/2022] Open
Abstract
Background Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We herein took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. This data represents a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium. Results After stringent filtering of our sequence data, we evaluated several population-based phasing programs including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end we used 98 individuals having both parents sequenced for validation. Their haplotypes reconstructed based on Mendelian segregation rules were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first one, only these 98 individuals were phased, while in the second one, all the 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. 
When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors. Conclusions We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08354-6.
|
42
|
Peltz G, Tan Y. What Have We Learned (or Expect to) From Analysis of Murine Genetic Models Related to Substance Use Disorders? Front Psychiatry 2022; 12:793961. [PMID: 35095607 PMCID: PMC8790171 DOI: 10.3389/fpsyt.2021.793961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 12/09/2021] [Indexed: 11/29/2022] Open
Abstract
The tremendous public health problem created by substance use disorders (SUDs) presents a major opportunity for mouse genetics. Inbred mouse strains exhibit substantial and heritable differences in their responses to drugs of abuse (DOA) and in many of the behaviors associated with susceptibility to SUD. Therefore, genetic discoveries emerging from analysis of murine genetic models can provide critically needed insight into the neurobiological effects of DOA, and they can reveal how genetic factors affect susceptibility to drug addiction. There are already indications, emerging from our prior analyses of murine genetic models of responses related to SUDs, that mouse genetic models of SUD can provide actionable information, which can lead to new approaches for alleviating SUDs. Lastly, we consider the features of murine genetic models that enable causative genetic factors to be successfully identified, and the methodologies that facilitate genetic discovery.
Affiliation(s)
- Gary Peltz
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA, United States
|
43
|
Yu Y, Chen L, Miao X, Li SC. SpecHap: a diploid phasing algorithm based on spectral graph theory. Nucleic Acids Res 2021; 49:e114. [PMID: 34403470 PMCID: PMC8565328 DOI: 10.1093/nar/gkab709] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Revised: 07/25/2021] [Accepted: 08/02/2021] [Indexed: 11/30/2022] Open
Abstract
Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few will work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved comparable accuracy with state-of-the-art methods across all the test instances, which comprise sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long-reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and 945MB peak memory usage, while other tools failed to yield results either due to memory overflow (40GB) or time limit exceeded (5 days). Our results demonstrated that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms.
Affiliation(s)
- Yonghan Yu
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Lingxi Chen
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Xinyao Miao
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Shuai Cheng Li
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
|
44
|
Bhat JA, Yu D, Bohra A, Ganie SA, Varshney RK. Features and applications of haplotypes in crop breeding. Commun Biol 2021; 4:1266. [PMID: 34737387 PMCID: PMC8568931 DOI: 10.1038/s42003-021-02782-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Accepted: 10/09/2021] [Indexed: 12/17/2022] Open
Abstract
Climate change with altered pest-disease dynamics and rising abiotic stresses threatens resource-constrained agricultural production systems worldwide. Genomics-assisted breeding (GAB) approaches have greatly contributed to enhancing crop breeding efficiency and delivering better varieties. Fast-growing capacity and affordability of DNA sequencing has motivated large-scale germplasm sequencing projects, thus opening exciting avenues for mining haplotypes for breeding applications. This review article highlights ways to mine haplotypes and apply them for complex trait dissection and in GAB approaches including haplotype-GWAS, haplotype-based breeding, haplotype-assisted genomic selection. Improvement strategies that efficiently deploy superior haplotypes to hasten breeding progress will be key to safeguarding global food security.
Affiliation(s)
- Javaid Akhter Bhat
- National Center for Soybean Improvement, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Deyue Yu
- National Center for Soybean Improvement, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Abhishek Bohra
- Crop Improvement Division, ICAR-Indian Institute of Pulses Research (ICAR-IIPR), Kanpur, India
- Showkat Ahmad Ganie
- Department of Biotechnology, Visva-Bharati, Santiniketan, 731235, WB, India
- Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, 502324, India
- State Agricultural Biotechnology Centre, Centre for Crop & Food Research Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
45
Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P, Jain M, Carroll A, Paten B. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods 2021; 18:1322-1332. [PMID: 34725481 PMCID: PMC8571015 DOI: 10.1038/s41592-021-01299-w] [Citation(s) in RCA: 106] [Impact Index Per Article: 35.3] [Received: 03/08/2021] [Accepted: 09/06/2021] [Indexed: 01/15/2023]
Abstract
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
Affiliation(s)
- Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Miten Jain
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
46
Luo X, Kang X, Schönhuth A. phasebook: haplotype-aware de novo assembly of diploid genomes from long reads. Genome Biol 2021; 22:299. [PMID: 34706745 PMCID: PMC8549298 DOI: 10.1186/s13059-021-02512-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 10/05/2021] [Indexed: 01/27/2023] Open
Abstract
Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly. However, current long-read assemblers are either reference based, so introduce biases, or fail to capture the haplotype diversity of diploid genomes. We present phasebook, a de novo approach for reconstructing the haplotypes of diploid genomes from long reads. phasebook outperforms other approaches in terms of haplotype coverage by large margins, in addition to achieving competitive performance in terms of assembly errors and assembly contiguity.
Affiliation(s)
- Xiao Luo
- Life Science & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Xiongbin Kang
- Life Science & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Alexander Schönhuth
- Life Science & Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
47
Wang M, Fang Z, Yoo B, Bejerano G, Peltz G. The Effect of Population Structure on Murine Genome-Wide Association Studies. Front Genet 2021; 12:745361. [PMID: 34589118 PMCID: PMC8475632 DOI: 10.3389/fgene.2021.745361] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/21/2021] [Accepted: 08/25/2021] [Indexed: 12/14/2022] Open
Abstract
The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it also should be utilized for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess the impact of PS on murine GWAS, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains and little impact on most datasets characterizing 21-40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by GWAS (the use of inbred strains, small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are evaluated.
Affiliation(s)
- Meiyue Wang
- Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
- Zhuoqing Fang
- Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
- Boyoung Yoo
- Department of Computer Science, Stanford University School of Engineering, Stanford, CA, United States
- Gill Bejerano
- Department of Computer Science, Stanford University School of Engineering, Stanford, CA, United States
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, United States
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, United States
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, United States
- Gary Peltz
- Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
48
Ewalt MD, Hsiao SJ. Molecular Methods: Clinical Utilization and Designing a Test Menu. Surg Pathol Clin 2021; 14:359-368. [PMID: 34373088 DOI: 10.1016/j.path.2021.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Pre-analytical factors in molecular oncology diagnostics are reviewed. Issues around sample collection, storage, and transport that might affect the stability of nucleic acids and the ability to perform molecular testing are addressed. In addition, molecular methods used commonly in clinical diagnostic laboratories, including newer technologies such as next-generation sequencing and digital droplet polymerase chain reaction, as well as their applications, are reviewed. Finally, we discuss considerations in designing a molecular test menu to deliver accurate and timely results in an efficient and cost-effective manner.
Affiliation(s)
- Mark D Ewalt
- Department of Pathology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, S-618, New York, NY 10065, USA
- Susan J Hsiao
- Department of Pathology & Cell Biology, Columbia University Medical Center, 630 West 168th Street, P&S11-453, New York, NY 10032, USA
49
Cooke DP, Wedge DC, Lunter G. A unified haplotype-based method for accurate and comprehensive variant calling. Nat Biotechnol 2021; 39:885-892. [PMID: 33782612 PMCID: PMC7611855 DOI: 10.1038/s41587-021-00861-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 10/29/2018] [Accepted: 02/18/2021] [Indexed: 01/31/2023]
Abstract
Almost all haplotype-based variant callers were designed specifically for detecting common germline variation in diploid populations, and give suboptimal results in other scenarios. Here we present Octopus, a variant caller that uses a polymorphic Bayesian genotyping model capable of modeling sequencing data from a range of experimental designs within a unified haplotype-aware framework. Octopus combines sequencing reads and prior information to phase-called genotypes of arbitrary ploidy, including those with somatic mutations. We show that Octopus accurately calls germline variants in individuals, including single nucleotide variants, indels and small complex replacements such as microinversions. Using a synthetic tumor data set derived from clean sequencing data from a sample with known germline haplotypes and observed mutations in a large cohort of tumor samples, we show that Octopus is more sensitive to low-frequency somatic variation, yet calls considerably fewer false positives than other methods. Octopus also outputs realigned evidence BAM files to aid validation and interpretation.
Affiliation(s)
- Daniel P Cooke
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- David C Wedge
- Manchester Cancer Research Centre, University of Manchester, Manchester, UK
- Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Department of Epidemiology, University Medical Centre Groningen, Groningen, the Netherlands
50
Haplotype tagging reveals parallel formation of hybrid races in two butterfly species. Proc Natl Acad Sci U S A 2021; 118:2015005118. [PMID: 34155138 PMCID: PMC8237668 DOI: 10.1073/pnas.2015005118] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 11/18/2022] Open
Abstract
A defining goal in genetics is linking variation in DNA sequence to trait evolution between populations and, ultimately, species. Genome sequencing efficiently captures such variation but typically in millions of tiny fragments that omit haplotype or linkage information. We present “haplotagging,” a simple, rapid linked-read sequencing technique that allows high-throughput sequencing without sacrificing haplotype information. We validated this affordable approach for whole-genome haplotyping in large populations. We used haplotagging to investigate the rise of a novel hybrid morph in parallel hybrid zones of two comimetic Heliconius butterfly species in Ecuador. Our results reveal that strikingly parallel divergences in their genomes produced coordinated shifts in haplotype frequencies across the hybrid zone, giving rise to comimetic hybrid morphs in each species. Genetic variation segregates as linked sets of variants or haplotypes. Haplotypes and linkage are central to genetics and underpin virtually all genetic and selection analysis. Yet, genomic data often omit haplotype information due to constraints in sequencing technologies. Here, we present “haplotagging,” a simple, low-cost linked-read sequencing technique that allows sequencing of hundreds of individuals while retaining linkage information. We apply haplotagging to construct megabase-size haplotypes for over 600 individual butterflies (Heliconius erato and H. melpomene), which form overlapping hybrid zones across an elevational gradient in Ecuador. Haplotagging identifies loci controlling distinctive high- and lowland wing color patterns. Divergent haplotypes are found at the same major loci in both species, while chromosome rearrangements show no parallelism. Remarkably, in both species, the geographic clines for the major wing-pattern loci are displaced by 18 km, leading to the rise of a novel hybrid morph in the center of the hybrid zone. 
We propose that shared warning signaling (Müllerian mimicry) may couple the cline shifts seen in both species and facilitate the parallel coemergence of a novel hybrid morph in both comimetic species. Our results show the power of efficient haplotyping methods when combined with large-scale sequencing data from natural populations.