1
Kinnersley B, Sud A, Everall A, Cornish AJ, Chubb D, Culliford R, Gruber AJ, Lärkeryd A, Mitsopoulos C, Wedge D, Houlston R. Analysis of 10,478 cancer genomes identifies candidate driver genes and opportunities for precision oncology. Nat Genet 2024:10.1038/s41588-024-01785-9. [PMID: 38890488 DOI: 10.1038/s41588-024-01785-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 05/01/2024] [Indexed: 06/20/2024]
Abstract
Tumor genomic profiling is increasingly seen as a prerequisite to guide the treatment of patients with cancer. To explore the value of whole-genome sequencing (WGS) in broadening the scope of cancers potentially amenable to a precision therapy, we analysed whole-genome sequencing data on 10,478 patients spanning 35 cancer types recruited to the UK 100,000 Genomes Project. We identified 330 candidate driver genes, including 74 that are new to any cancer. We estimate that approximately 55% of patients studied harbor at least one clinically relevant mutation, predicting either sensitivity or resistance to certain treatments or clinical trial eligibility. By performing computational chemogenomic analysis of cancer mutations we identify additional targets for compounds that represent attractive candidates for future clinical trials. This study represents one of the most comprehensive efforts thus far to identify cancer driver genes in the real world setting and assess their impact on informing precision oncology.
Affiliation(s)
- Ben Kinnersley
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - University College London Cancer Institute, University College London, London, UK
- Amit Sud
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Andrew Everall
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Alex J Cornish
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Daniel Chubb
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Richard Culliford
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Andreas J Gruber
  - Systems Biology & Biomedical Data Science Laboratory, University of Konstanz, Konstanz, Germany
- Adrian Lärkeryd
  - Division of Molecular Pathology, The Institute of Cancer Research, London, UK
- Costas Mitsopoulos
  - Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- David Wedge
  - Manchester Cancer Research Centre, University of Manchester, Manchester, UK
- Richard Houlston
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK.
2
Yang J, Wang DF, Huang JH, Zhu QH, Luo LY, Lu R, Xie XL, Salehian-Dehkordi H, Esmailizadeh A, Liu GE, Li MH. Structural variant landscapes reveal convergent signatures of evolution in sheep and goats. Genome Biol 2024; 25:148. [PMID: 38845023 PMCID: PMC11155191 DOI: 10.1186/s13059-024-03288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 05/21/2024] [Indexed: 06/10/2024] Open
Abstract
BACKGROUND: Sheep and goats have undergone domestication and improvement to produce similar phenotypes, which have been greatly impacted by structural variants (SVs). Here, we report a high-quality chromosome-level reference genome of Asiatic mouflon, and implement a comprehensive analysis of SVs in 897 genomes of worldwide wild and domestic populations of sheep and goats to reveal genetic signatures underlying convergent evolution.
RESULTS: We characterize the SV landscapes in terms of genetic diversity, chromosomal distribution and their links with genes, QTLs and transposable elements, and examine their impacts on regulatory elements. We identify several novel SVs and annotate corresponding genes (e.g., BMPR1B, BMPR2, RALYL, COL21A1, and LRP1B) associated with important production traits such as fertility, meat and milk production, and wool/hair fineness. We detect signatures of selection involving the parallel evolution of orthologous SV-associated genes during domestication, local environmental adaptation, and improvement. In particular, we find that fecundity traits experienced convergent selection targeting the gene BMPR1B, with the DEL00067921 deletion explaining ~10.4% of the phenotypic variation observed in goats.
CONCLUSIONS: Our results provide new insights into the convergent evolution of SVs and serve as a rich resource for the future improvement of sheep, goats, and related livestock.
Affiliation(s)
- Ji Yang
  - State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Dong-Feng Wang
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Jia-Hui Huang
  - State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Qiang-Hui Zhu
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ling-Yun Luo
  - State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ran Lu
  - State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Xing-Long Xie
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Hosein Salehian-Dehkordi
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ali Esmailizadeh
  - Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
- George E Liu
  - Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
- Meng-Hua Li
  - State Key Laboratory of Animal Biotech Breeding, China Agricultural University, Beijing, 100193, China.
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
3
Xie M, Zheng ZJ, Zhou Y, Zhang YX, Li Q, Tian LY, Cao J, Xu YT, Ren J, Yu Q, Wu SS, Fang S, Zhuang DY, Geng J, Chen CS, Li HB. Prospective Investigation of Optical Genome Mapping for Prenatal Genetic Diagnosis. Clin Chem 2024; 70:820-829. [PMID: 38517460 DOI: 10.1093/clinchem/hvae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 02/22/2024] [Indexed: 03/23/2024]
Abstract
BACKGROUND: Optical genome mapping (OGM) is a novel assay for detecting structural variants (SVs) and has been retrospectively evaluated for its performance. However, its prospective evaluation in prenatal diagnosis remains unreported. This study aimed to prospectively assess the technical concordance of OGM with standard of care (SOC) testing in prenatal diagnosis.
METHODS: A prospective cohort of 204 pregnant women was enrolled in this study. Amniotic fluid samples from these women were subjected to OGM and SOC testing, which included chromosomal microarray analysis (CMA) and karyotyping (KT) in parallel. The diagnostic yield of OGM was evaluated, and the technical concordance between OGM and SOC testing was assessed.
RESULTS: OGM successfully analyzed 204 cultured amniocyte samples, even with a cell count as low as 0.24 million. In total, 60 reportable SVs were identified through combined OGM and SOC testing, with 22 SVs detected by all 3 techniques. The diagnostic yield for OGM, CMA, and KT was 25% (51/204), 22.06% (45/204), and 18.14% (37/204), respectively. The highest diagnostic yield (29.41%, 60/204) was achieved when OGM and KT were used together. OGM demonstrated a concordance of 95.56% with CMA and 75.68% with KT in this cohort study.
CONCLUSIONS: Our findings suggest that OGM can be effectively applied in prenatal diagnosis using cultured amniocytes and exhibits high concordance with SOC testing. The combined use of OGM and KT appears to yield the most promising diagnostic outcomes.
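The diagnostic yields quoted in this abstract follow from dividing each assay's count of positive samples by the 204 samples analyzed. A minimal Python sketch of that arithmetic is given here, using only the counts stated in the abstract; the function and variable names are illustrative assumptions and do not come from the study.

```python
# Counts of samples with a reportable finding, as stated in the abstract.
TOTAL_SAMPLES = 204
positives = {"OGM": 51, "CMA": 45, "KT": 37, "OGM+KT": 60}


def diagnostic_yield(n_positive: int, n_total: int) -> float:
    """Return the diagnostic yield as a percentage, rounded to two decimals."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return round(100.0 * n_positive / n_total, 2)


# Yield per assay (illustrative name: `yields`).
yields = {assay: diagnostic_yield(n, TOTAL_SAMPLES) for assay, n in positives.items()}

for assay, pct in yields.items():
    print(f"{assay}: {pct}%")
```

The computed values reproduce the percentages reported in the abstract (25%, 22.06%, 18.14%, and 29.41%).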
Affiliation(s)
- Min Xie
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, Ningbo, China
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
- Zhao-Jing Zheng
  - Laboratory of Cytogenetics & Cytogenomics, Hangzhou Juno Genomics Inc., Hangzhou, China
- Ying Zhou
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, Ningbo, China
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
- Yu-Xin Zhang
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, Ningbo, China
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
- Qiong Li
  - Prenatal and Neonatal Screening Center, Ningbo Women and Children's Hospital, Ningbo, China
- Li-Yun Tian
  - Fetal Medicine Centre, Ningbo Women and Children's Hospital, Ningbo, China
- Juan Cao
  - Fetal Medicine Centre, Ningbo Women and Children's Hospital, Ningbo, China
- Yan-Ting Xu
  - Laboratory of Cytogenetics & Cytogenomics, Hangzhou Juno Genomics Inc., Hangzhou, China
- Jie Ren
  - Laboratory of Cytogenetics & Cytogenomics, Hangzhou Juno Genomics Inc., Hangzhou, China
- Qi Yu
  - Prenatal and Neonatal Screening Center, Ningbo Women and Children's Hospital, Ningbo, China
- Shan-Shan Wu
  - Paediatric Surgery Centre, Ningbo Women and Children's Hospital, Ningbo, China
- Shu Fang
  - Laboratory of Cytogenetics & Cytogenomics, Hangzhou Juno Genomics Inc., Hangzhou, China
- Dan-Yan Zhuang
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, Ningbo, China
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
- Juan Geng
  - Laboratory of Cytogenetics & Cytogenomics, Hangzhou Juno Genomics Inc., Hangzhou, China
- Chang-Shui Chen
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
- Hai-Bo Li
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, Ningbo, China
  - Ningbo Key Laboratory for the Prevention and Treatment of Embryogenic Diseases, Ningbo Women and Children's Hospital, Ningbo, China
4
Kernohan KD, Boycott KM. The expanding diagnostic toolbox for rare genetic diseases. Nat Rev Genet 2024; 25:401-415. [PMID: 38238519 DOI: 10.1038/s41576-023-00683-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 11/22/2023] [Indexed: 05/23/2024]
Abstract
Genomic technologies, such as targeted, exome and short-read genome sequencing approaches, have revolutionized the care of patients with rare genetic diseases. However, more than half of patients remain without a diagnosis. Emerging approaches from research-based settings such as long-read genome sequencing and optical genome mapping hold promise for improving the identification of disease-causal genetic variants. In addition, new omic technologies that measure the transcriptome, epigenome, proteome or metabolome are showing great potential for variant interpretation. As genetic testing options rapidly expand, the clinical community needs to be mindful of their individual strengths and limitations, as well as remaining challenges, to select the appropriate diagnostic test, correctly interpret results and drive innovation to address insufficiencies. If used effectively - through truly integrative multi-omics approaches and data sharing - the resulting large quantities of data from these established and emerging technologies will greatly improve the interpretative power of genetic and genomic diagnostics for rare diseases.
Affiliation(s)
- Kristin D Kernohan
  - CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada
  - Newborn Screening Ontario, CHEO, Ottawa, ON, Canada
- Kym M Boycott
  - CHEO Research Institute, University of Ottawa, Ottawa, ON, Canada.
  - Department of Genetics, CHEO, Ottawa, ON, Canada.
5
LeMaster C, Schwendinger-Schreck C, Ge B, Cheung WA, McLennan R, Johnston JJ, Pastinen T, Smail C. Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores. medRxiv 2024:2024.03.15.24304216. [PMID: 38562793 PMCID: PMC10984062 DOI: 10.1101/2024.03.15.24304216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Approaches for integrating rSVs to date have focused on targeted approaches in known Mendelian rare disease genes. This approach is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where a rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1×10^-03). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
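The abstract describes identifying rSVs that intersect genes in trait-relevant PGSs. The study's actual pipeline is not reproduced in this listing, so the following is only a minimal sketch of the interval-overlap test that such an intersection relies on. All gene names, SV names, and coordinates are hypothetical, and the half-open coordinate convention is an assumption made for the example.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit_by_svs(svs: List[Interval], genes: List[Interval]) -> Dict[str, List[str]]:
    """Map each SV name to the names of the genes it overlaps."""
    return {sv.name: [g.name for g in genes if overlaps(sv, g)] for sv in svs}


# Hypothetical toy data, not taken from the study.
genes = [
    Interval("chr1", 100, 200, "GENE_A"),
    Interval("chr1", 500, 800, "GENE_B"),
    Interval("chr2", 100, 200, "GENE_C"),
]
svs = [
    Interval("chr1", 150, 550, "DEL_1"),  # spans the end of GENE_A and start of GENE_B
    Interval("chr1", 200, 201, "INS_1"),  # starts exactly where GENE_A ends: no overlap
    Interval("chr2", 50, 120, "DUP_1"),   # overlaps the start of GENE_C
]

hits = genes_hit_by_svs(svs, genes)
print(hits)
```

This pairwise scan is quadratic in the number of intervals; for genome-scale callsets a sorted sweep or an interval tree would be the usual choice.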
Affiliation(s)
- Cas LeMaster
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Carl Schwendinger-Schreck
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Bing Ge
  - McGill University, Montreal, Quebec, Canada
- Warren A. Cheung
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Rebecca McLennan
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Jeffrey J. Johnston
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Tomi Pastinen
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
- Craig Smail
  - Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
6
Gunasekaran D, Ardell DH, Nobile CJ. SNP-SVant: A Computational Workflow to Predict and Annotate Genomic Variants in Organisms Lacking Benchmarked Variants. Curr Protoc 2024; 4:e1046. [PMID: 38717471 PMCID: PMC11081530 DOI: 10.1002/cpz1.1046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to tradeoff accuracy for sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK) and predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and it culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Variant calling is a computationally expensive procedure, and thus, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps where dependent files are available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with downstream tools to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest. 
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
- Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
- Support Protocol 1: Downloading publicly available sequencing data
- Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer
- Support Protocol 3: Converting between VCF and aligned FASTA formats
Affiliation(s)
- Deepika Gunasekaran
  - Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
  - Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
- David H. Ardell
  - Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
- Clarissa J. Nobile
  - Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
  - Health Science Research Institute, University of California, Merced, CA, USA
7
Ten Berk de Boer E, Ameur A, Bunikis I, Ek M, Stattin EL, Feuk L, Eisfeldt J, Lindstrand A. Long-read sequencing and optical mapping generates near T2T assemblies that resolves a centromeric translocation. Sci Rep 2024; 14:9000. [PMID: 38637641 PMCID: PMC11026446 DOI: 10.1038/s41598-024-59683-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/13/2024] [Indexed: 04/20/2024] Open
Abstract
Long-read genome sequencing (lrGS) is a promising method in genetic diagnostics. Here we investigate the potential of lrGS to detect a disease-associated chromosomal translocation between 17p13 and the 19 centromere. We constructed two sets of phased and non-phased de novo assemblies: (i) based on lrGS only and (ii) hybrid assemblies combining lrGS with optical mapping using lrGS reads with a median coverage of 34X. Variant calling detected both structural variants (SVs) and small variants, and the accuracy of the small variant calling was compared with those called with short-read genome sequencing (srGS). The de novo and hybrid assemblies had high quality and contiguity with N50 of 62.85 Mb, enabling a near telomere-to-telomere assembly with fewer than 100 contigs per haplotype. Notably, we successfully identified the centromeric breakpoint of the translocation. A concordance of 92% was observed when comparing small variant calling between srGS and lrGS. In summary, our findings underscore the remarkable potential of lrGS as a comprehensive and accurate solution for the analysis of SVs and small variants. Thus, lrGS could replace a large battery of genetic tests that were used for the diagnosis of a single symptomatic translocation carrier, highlighting the potential of lrGS in the realm of digital karyotyping.
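The abstract reports assembly contiguity as an N50 of 62.85 Mb. For readers unfamiliar with the metric, N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembled bases. The sketch below computes it for a hypothetical list of contig lengths; the toy values are assumptions for illustration and are unrelated to the assemblies in the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the running total always reaches the sum")


# Hypothetical contig lengths (arbitrary units); total is 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)  # 80 + 70 = 150 reaches half of 300, so this prints 70
```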
Affiliation(s)
- Esmee Ten Berk de Boer
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden
- Adam Ameur
  - Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
- Ignas Bunikis
  - Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
- Marlene Ek
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
- Eva-Lena Stattin
  - Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
- Lars Feuk
  - Department of Immunology, Genetics and Pathology, Uppsala University, 752 36, Uppsala, Sweden
- Jesper Eisfeldt
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden.
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden.
  - Science for Life Laboratory, Karolinska Institutet Science Park, 171 65, Solna, Sweden.
- Anna Lindstrand
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76, Stockholm, Sweden
8
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
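The comparison in this abstract is framed in terms of recall and precision. As a reminder of how those two quantities are derived from a callset compared against a truth set, here is a minimal sketch. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the study's own evaluation framework, which includes manual visual inspection, is not reproduced here.

```python
from typing import Dict


def precision_recall(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall, and F1 from true/false positive and
    false negative counts of a variant callset."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one call and one true variant")
    precision = tp / (tp + fp)  # fraction of calls that are real variants
    recall = tp / (tp + fn)     # fraction of real variants that were called
    f1 = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 correct calls, 10 spurious calls, 30 missed variants.
metrics = precision_recall(tp=90, fp=10, fn=30)
print(metrics)  # precision 0.9 (90/100), recall 0.75 (90/120)
```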
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan.
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan.
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan.
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
9
Du ZZ, He JB, Jiao WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 2024; 25:91. [PMID: 38589937 PMCID: PMC11003132 DOI: 10.1186/s13059-024-03239-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 04/04/2024] [Indexed: 04/10/2024] Open
Abstract
BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes.
RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions.
CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
Affiliation(s)
- Ze-Zhen Du
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Jia-Bao He
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Wen-Biao Jiao
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China.
  - Hubei Hongshan Laboratory, Wuhan, China.
10
David G, Bertolotti A, Layer R, Scofield D, Hayward A, Baril T, Burnett HA, Gudmunds E, Jensen H, Husby A. Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations. Genome Biol Evol 2024; 16:evae049. [PMID: 38489588 PMCID: PMC11018544 DOI: 10.1093/gbe/evae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/28/2024] [Accepted: 03/07/2024] [Indexed: 03/17/2024] Open
Abstract
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read-discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
Affiliation(s)
- Gabriel David
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Ryan Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
- Douglas Scofield
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Alexander Hayward
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
- Tobias Baril
- Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK
- Hamish A Burnett
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Erik Gudmunds
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Henrik Jensen
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Arild Husby
- Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
11
Li X, Liu Q, Fu C, Li M, Li C, Li X, Zhao S, Zheng Z. Characterizing structural variants based on graph-genotyping provides insights into pig domestication and local adaption. J Genet Genomics 2024; 51:394-406. [PMID: 38056526 DOI: 10.1016/j.jgg.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Structural variants (SVs), such as deletions (DELs) and insertions (INSs), contribute substantially to pig genetic diversity and phenotypic variation. Using a library of SVs discovered from long-read primary assemblies and short-read sequenced genomes, we map pig genomic SVs with a graph-based method for re-genotyping SVs in 402 genomes. Our results demonstrate that those SVs harboring specific trait-associated genes may greatly shape pig domestication and local adaptation. Further characterization of SVs reveals that some population-stratified SVs may alter the transcription of genes by affecting regulatory elements. We identify that the genotypes of two DELs (296-bp DEL, chr7: 52,172,101-52,172,397; 278-bp DEL, chr18: 23,840,143-23,840,421) located in muscle-specific enhancers are associated with the expression of target genes related to meat quality (FSD2) and muscle fiber hypertrophy (LMOD2 and WASL) in pigs. Our results highlight the role of SVs in domestic porcine evolution, and the identified candidate functional genes and SVs are valuable resources for future genomic research and breeding programs in pigs.
Affiliation(s)
- Xin Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Quan Liu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Chong Fu
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Mengxun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Changchun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China
- Xinyun Li
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Shuhong Zhao
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; The Cooperative Innovation Center for Sustainable Pig Production, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
- Zhuqing Zheng
- Key Lab of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education and Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Institute of Agricultural Biotechnology, Jingchu University of Technology, Jingmen, Hubei 448000, China.
12
Jiang J, Xu YC, Zhang ZQ, Chen JF, Niu XM, Hou XH, Li XT, Wang L, Zhang YE, Ge S, Guo YL. Forces driving transposable element load variation during Arabidopsis range expansion. Plant Cell 2024; 36:840-862. [PMID: 38036296 PMCID: PMC10980350 DOI: 10.1093/plcell/koad296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 12/02/2023]
Abstract
Genetic load refers to the accumulated and potentially life-threatening deleterious mutations in populations. Understanding the mechanisms underlying genetic load variation of transposable element (TE) insertion, a major large-effect mutation, during range expansion is an intriguing question in biology. Here, we used 1,115 global natural accessions of Arabidopsis (Arabidopsis thaliana) to study the driving forces of TE load variation during its range expansion. TE load increased with range expansion, especially in the recently established Yangtze River basin population. Effective population size, which explains 62.0% of the variance in TE load, high transposition rate, and selective sweeps contributed to TE accumulation in the expanded populations. We genetically mapped and identified multiple candidate causal genes and TEs, and revealed the genetic architecture of TE load variation. Overall, this study reveals the variation in TE genetic load during Arabidopsis expansion and highlights the causes of TE load variation from the perspectives of both population genetics and quantitative genetics.
Affiliation(s)
- Juan Jiang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Zhi-Qin Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jia-Fu Chen
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Min Niu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xing-Hui Hou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xin-Tong Li
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Li Wang
- Agricultural Synthetic Biology Center, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
- Yong E Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents & Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Song Ge
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
13
Joe S, Park JL, Kim J, Kim S, Park JH, Yeo MK, Lee D, Yang JO, Kim SY. Comparison of structural variant callers for massive whole-genome sequence data. BMC Genomics 2024; 25:318. [PMID: 38549092 PMCID: PMC10976732 DOI: 10.1186/s12864-024-10239-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] Open
Abstract
BACKGROUND: Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers.
RESULTS: Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient computing resources, and both Manta and MELT demonstrated relatively good precision regarding calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance in terms of the deletions and insertions.
CONCLUSIONS: Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.
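As a rough illustration of how caller accuracy of this kind is commonly scored, the sketch below computes precision and recall for deletion calls against a truth set. The matching rule (same chromosome and at least 50% reciprocal overlap), the function names and the coordinates are all assumptions made for illustration; the abstract does not state which matching criteria the authors used.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between (start, end) intervals a and b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def precision_recall(calls, truth, min_overlap=0.5):
    """Greedily match each call to at most one unmatched truth SV on the same chromosome."""
    matched_truth = set()
    true_positives = 0
    for chrom, start, end in calls:
        for i, (t_chrom, t_start, t_end) in enumerate(truth):
            if i in matched_truth or chrom != t_chrom:
                continue
            if reciprocal_overlap((start, end), (t_start, t_end)) >= min_overlap:
                matched_truth.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical deletion coordinates (chromosome, start, end); not data from the study.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
calls = [("chr1", 1050, 1980), ("chr1", 5400, 6400), ("chr2", 310, 890), ("chr3", 100, 400)]

precision, recall = precision_recall(calls, truth)
```

Greedy one-to-one matching keeps the example short; a real benchmark would typically also compare SV type and size and might resolve ambiguous matches differently.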
Grants
- NRF-2020M3E5D708517212, 2020M3A9I6A0103605713 Ministry of Science and ICT, South Korea
- NTIS-1711170620 KRIBB Research Initiative Program
Affiliation(s)
- Soobok Joe
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jong-Lyul Park
- Aging Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Functional Genomics, University of Science and Technology (UST), 34113, Daejeon, Republic of Korea
- Jun Kim
- Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Sangok Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Ji-Hwan Park
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Min-Kyung Yeo
- Department of Pathology, Chungnam National University School of Medicine, Daejeon, 35015, Republic of Korea
- Dongyoon Lee
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Jin Ok Yang
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea.
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Seon-Young Kim
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea.
- Department of Bioscience, University of Science and Technology (UST), Daejeon, 34113, Republic of Korea.
14
Wang S, Zhu X, Wang X, Liu Y, Zhao M, Chang Z, Wang X, Shao Y, Wang J. TMBstable: a variant caller controls performance variation across heterogeneous sequencing samples. Brief Bioinform 2024; 25:bbae159. [PMID: 38632951 PMCID: PMC11024516 DOI: 10.1093/bib/bbae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/31/2024] [Accepted: 03/25/2024] [Indexed: 04/19/2024] Open
Abstract
In cancer genomics, variant calling has advanced, but traditional mean accuracy evaluations are inadequate for biomarkers like tumor mutation burden, which vary significantly across samples, affecting immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, an innovative method that dynamically selects optimal variant calling strategies for specific genomic regions using a meta-learning framework, distinguishing it from traditional callers with uniform sample-wide strategies. The process begins with segmenting the sample into windows and extracting meta-features for clustering, followed by using a pre-trained meta-model to select suitable algorithms for each cluster, thereby addressing strategy-sample mismatches, reducing performance fluctuations and ensuring consistent performance across various samples. We evaluated TMBstable using both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with advanced callers. The assessment, focusing on stability measures, such as the variance and coefficient of variation in false positive rate, false negative rate, precision and recall, involved 300 simulated and 106 real tumor samples. Benchmark results showed TMBstable's superior stability with the lowest variance and coefficient of variation across performance metrics, highlighting its effectiveness in analyzing the counting-based biomarker. The TMBstable algorithm can be accessed at https://github.com/hello-json/TMBstable for academic usage only.
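The stability measures named in the abstract (variance and coefficient of variation of a per-sample metric) can be illustrated with a minimal sketch. The false positive rates below are invented values, not results from the paper, and the helper name `stability_summary` is hypothetical. The sketch assumes population variance and takes the coefficient of variation as the standard deviation divided by the mean.

```python
from statistics import mean, pstdev, pvariance


def stability_summary(values):
    """Return (variance, coefficient of variation) of a per-sample metric."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pvariance(values), pstdev(values) / mu


# Hypothetical false positive rates of two callers on the same five samples.
caller_a = [0.02, 0.10, 0.01, 0.15, 0.02]
caller_b = [0.05, 0.06, 0.07, 0.06, 0.06]

var_a, cv_a = stability_summary(caller_a)
var_b, cv_b = stability_summary(caller_b)
```

Both hypothetical callers have a similar mean false positive rate, so a mean-accuracy comparison would not separate them; the lower variance and coefficient of variation of `caller_b` are what a stability-focused evaluation is meant to capture.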
Affiliation(s)
- Shenjie Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Xiaoyan Zhu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Xuwen Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Yuqian Liu
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
- Minchao Zhao
- Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China
- Zhili Chang
- Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China
- Xiaonan Wang
- Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China
- Yang Shao
- Nanjing Geneseeq Technology Inc., Nanjing, Jiangsu, China
- School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Jiayin Wang
- School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an 710049, China
15
Olivucci G, Iovino E, Innella G, Turchetti D, Pippucci T, Magini P. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front Genet 2024; 15:1374860. [PMID: 38510277 PMCID: PMC10951082 DOI: 10.3389/fgene.2024.1374860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 02/26/2024] [Indexed: 03/22/2024] Open
Abstract
The clinical application of technological progress in the identification of DNA alterations has always led to improvements in diagnostic yield in genetic medicine. At the chromosomal level, from cytogenetic techniques evaluating numerical and gross structural defects to genomic microarrays detecting cryptic copy number variants, and at the molecular level, from the Sanger method for studying the nucleotide sequence of single genes to high-throughput next-generation sequencing (NGS) technologies, resolution and sensitivity have progressively increased, considerably expanding the range of detectable DNA anomalies and, with it, the number of Mendelian disorders with known genetic causes. However, particular genomic regions (e.g., repetitive and GC-rich sequences) are inefficiently analyzed by standard genetic tests, which still rely on laborious, time-consuming and low-sensitivity approaches (e.g., Southern blot for repeat expansions or long PCR for genes with highly homologous pseudogenes); this accounts for at least part of the patients with undiagnosed genetic disorders. Third-generation sequencing, which generates long reads with improved mappability, is better suited to the detection of structural alterations and defects in poorly accessible genomic regions. Although only recently implemented and not yet clinically available, long-read sequencing (LRS) technologies have already shown their potential in genetic medicine research, and their translation to clinical settings might greatly affect diagnostic yield and reporting times.
The most investigated LRS application is the identification of structural variants and repeat expansions, probably because techniques for their detection have not evolved as rapidly as those dedicated to single nucleotide variant (SNV) identification: the gold standard analyses are karyotyping and microarrays for balanced and unbalanced chromosome rearrangements, respectively, and Southern blot and repeat-primed PCR for the amplification and sizing of expanded alleles, all impaired by limited resolution and sensitivity that have not been significantly improved by the advent of NGS. More recently, with the increased accuracy provided by the latest product releases, LRS has also been tested for SNV detection, especially in genes with highly homologous pseudogenes, and for haplotype reconstruction to assess the parental origin of alleles with de novo pathogenic variants. We provide a review of relevant recent scientific papers exploring the potential of LRS in the diagnosis of genetic diseases and its possible future applications in routine genetic testing.
Affiliation(s)
- Giulia Olivucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Department of Surgical and Oncological Sciences, University of Palermo, Palermo, Italy
- Emanuela Iovino
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Giovanni Innella
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Daniela Turchetti
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Tommaso Pippucci
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Pamela Magini
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
16
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION: Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation.
RESULTS: NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs.
AVAILABILITY AND IMPLEMENTATION: Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.
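The idea of genotyping by comparing a real pileup image with simulated images for each candidate genotype can be sketched in a few lines. This is only a toy illustration: it substitutes a plain Euclidean distance for the learned similarity that the abstract describes, and the tiny "images", the genotype labels and the function names are all hypothetical rather than taken from the npsv2 code.

```python
import math


def euclidean_distance(image_a, image_b):
    """Euclidean distance between two equally sized 2D lists of numbers."""
    return math.sqrt(
        sum(
            (x - y) ** 2
            for row_a, row_b in zip(image_a, image_b)
            for x, y in zip(row_a, row_b)
        )
    )


def genotype_by_similarity(real_image, simulated_images):
    """Return the genotype label whose simulated image is closest to the real one."""
    return min(
        simulated_images,
        key=lambda genotype: euclidean_distance(real_image, simulated_images[genotype]),
    )


# Toy 2x4 "pileup images": each value is a normalised read depth in one window.
simulated = {
    "0/0": [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
    "0/1": [[1.0, 0.5, 0.5, 1.0], [1.0, 0.5, 0.5, 1.0]],
    "1/1": [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]],
}
real = [[0.9, 0.6, 0.4, 1.1], [1.0, 0.5, 0.6, 0.9]]

predicted = genotype_by_similarity(real, simulated)
```

In this toy case the real image shows roughly half depth across the middle windows, so the nearest simulation is the heterozygous deletion. A learned similarity function would replace `euclidean_distance` in an approach of the kind the abstract outlines.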
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Jacob Wallace
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Alderik van der Heyde
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Eliza Wieman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Daniel Brey
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Yiran Shi
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Peter Hansen
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
- Bruce D Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Ali Bashir
- Google, Mountain View, CA 94043, United States
17
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024:elae003. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2024] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
18
Singh A, Ramakrishna G, Singh NK, Abdin MZ, Gaikwad K. Genomic insight into variations associated with flowering-time and early-maturity in pigeonpea mutant TAT-10 and its wild type parent T21. Int J Biol Macromol 2024; 257:128559. [PMID: 38061506 DOI: 10.1016/j.ijbiomac.2023.128559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023]
Abstract
Pigeonpea [Cajanus cajan (L.) Millspaugh] is an important grain legume crop with a broad range of 90 to 300 days for maturity. To identify the genomic variations associated with the early maturity, we conducted whole-genome resequencing of an early-maturing pigeonpea mutant TAT-10 and its wild type parent T21. A total of 135.67 and 146.34 million sequencing reads were generated for T21 and TAT-10, respectively. From this resequencing data, 1,397,178 and 1,419,904 SNPs, 276,741 and 292,347 InDels, and 87,583 and 92,903 SVs were identified in T21 and TAT-10, respectively. We identified 203 genes in the pigeonpea genome that are homologs of flowering-related genes in Arabidopsis and found 791 genomic variations unique to TAT-10 linked to 94 flowering-related genes. We identified three candidate genes for early maturity in TAT-10; Suppressor of FRI 4 (SUF4), Early Flowering In Short Days (EFS), and Probable Lysine-Specific Demethylase ELF6. The variations in ELF6 were predicted to be possibly damaging and the expression profiles of EFS and ELF6 also supported their probable role during early flowering in TAT-10. The present study has generated information on genomic variations associated with candidate genes for early maturity, which can be further studied and exploited for developing the early-maturing pigeonpea cultivars.
Affiliation(s)
- Anupam Singh
- ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India; Centre for Transgenic Plant Development, Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi 110062, India
- Malik Zainul Abdin
- Centre for Transgenic Plant Development, Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi 110062, India.
- Kishor Gaikwad
- ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India.
19
Bagger FO, Borgwardt L, Jespersen AS, Hansen AR, Bertelsen B, Kodama M, Nielsen FC. Whole genome sequencing in clinical practice. BMC Med Genomics 2024; 17:39. [PMID: 38287327 PMCID: PMC10823711 DOI: 10.1186/s12920-024-01795-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 01/01/2024] [Indexed: 01/31/2024] Open
Abstract
Whole genome sequencing (WGS) is becoming the preferred method for molecular genetic diagnosis of rare and unknown diseases and for identification of actionable cancer drivers. Compared to other molecular genetic methods, WGS captures most genomic variation and eliminates the need for sequential genetic testing. Whereas the laboratory requirements are similar to those of conventional molecular genetics, the amount of data is large, and WGS requires a comprehensive computational and storage infrastructure to facilitate data processing within a clinically relevant timeframe. The output of a single WGS analysis is roughly 5 million variants, and data interpretation involves specialized staff collaborating with clinical specialists to provide standard-of-care reports. Although the field is continuously refining the standards for variant classification, there are still unresolved issues associated with clinical application. This review provides an overview of WGS in clinical practice, describing the technology and current applications as well as the challenges connected with data processing, interpretation and clinical reporting.
Affiliation(s)
- Frederik Otzen Bagger
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Line Borgwardt
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Andreas Sand Jespersen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Anna Reimer Hansen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Birgitte Bertelsen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Miyako Kodama
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Finn Cilius Nielsen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark.
20
Bailey SM, Cross EM, Kinner-Bibeau L, Sebesta HC, Bedford JS, Tompkins CJ. Monitoring Genomic Structural Rearrangements Resulting from Gene Editing. J Pers Med 2024; 14:110. [PMID: 38276232 PMCID: PMC10817574 DOI: 10.3390/jpm14010110] [Received: 11/30/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/27/2024]
Abstract
The cytogenomics-based methodology of directional genomic hybridization (dGH) enables the detection and quantification of a more comprehensive spectrum of genomic structural variants than any other approach currently available, and importantly, does so on a single-cell basis. Thus, dGH is well-suited for testing and/or validating new advancements in CRISPR-Cas9 gene editing systems. In addition to aberrations detected by traditional cytogenetic approaches, the strand specificity of dGH facilitates detection of otherwise cryptic intra-chromosomal rearrangements, specifically small inversions. As such, dGH represents a powerful, high-resolution approach for the quantitative monitoring of potentially detrimental genomic structural rearrangements resulting from exposure to agents that induce DNA double-strand breaks (DSBs), including restriction endonucleases and ionizing radiations. For intentional genome editing strategies, it is critical that any undesired effects of DSBs induced either by the editing system itself or by mis-repair with other endogenous DSBs are recognized and minimized. In this paper, we discuss the application of dGH for assessing gene editing-associated structural variants and the potential heterogeneity of such rearrangements among cells within an edited population, highlighting its relevance to personalized medicine strategies.
Affiliation(s)
- Susan M. Bailey
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA;
- KromaTiD, Inc., Longmont, CO 80501, USA; (E.M.C.); (L.K.-B.); (H.C.S.)
- Erin M. Cross
- KromaTiD, Inc., Longmont, CO 80501, USA; (E.M.C.); (L.K.-B.); (H.C.S.)
- Henry C. Sebesta
- KromaTiD, Inc., Longmont, CO 80501, USA; (E.M.C.); (L.K.-B.); (H.C.S.)
- Joel S. Bedford
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA;
- KromaTiD, Inc., Longmont, CO 80501, USA; (E.M.C.); (L.K.-B.); (H.C.S.)
21
Veldsman WP, Yang C, Zhang Z, Huang Y, Chowdhury D, Zhang L. Structural and Functional Disparities within the Human Gut Virome in Terms of Genome Topology and Representative Genome Selection. Viruses 2024; 16:134. [PMID: 38257834 PMCID: PMC10820185 DOI: 10.3390/v16010134] [Received: 12/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024]
Abstract
Circularity confers protection to viral genomes where linearity falls short, thereby fulfilling the "form follows function" aphorism. However, a shift away from morphology-based classification toward the molecular and ecological classification of viruses is currently underway within the field of virology. Recent years have seen drastic changes in the International Committee on Taxonomy of Viruses' operational definitions of viruses, particularly for the tailed phages that inhabit the human gut. After the abolition of the order Caudovirales, these tailed phages are best defined as members of the class Caudoviricetes. To determine the epistemological value of genome topology in the context of the human gut virome, we designed a set of seven experiments to assay the impact of genome topology and representative viral selection on biological interpretation. Using Oxford Nanopore long reads for viral genome assembly coupled with Illumina short-read polishing, we showed that circular and linear virus genomes differ remarkably in terms of genome quality, GC skew, transfer RNA gene frequency, structural variant frequency, cross-reference functional annotation (COG, KEGG, Pfam, and TIGRfam), state-of-the-art marker-based classification, and phage-host interaction. Furthermore, the disparity profile changes during dereplication. In particular, our phage-host interaction results demonstrated that proportional abundances cannot be meaningfully compared without due regard for genome topology and dereplication threshold, which necessitates standardized reporting. As a best practice guideline, we recommend that comparative studies of the human gut virome always report the ratio of circular to linear viral genomes along with the dereplication threshold so that structural and functional metrics can be placed into context when assessing biologically relevant metagenomic properties such as proportional abundance.
Affiliation(s)
- Werner P. Veldsman
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Debajyoti Chowdhury
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China;
- Computational Medicine Laboratory, Hong Kong Baptist University, Hong Kong SAR, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; (W.P.V.); (C.Y.); (Z.Z.)
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen 518057, China
22
Zhuang G, Zhang X, Du W, Xu L, Ma J, Luo H, Tang H, Wang W, Wang P, Li M, Yang X, Wu D, Fang S. A benchmarking framework for the accurate and cost-effective detection of clinically-relevant structural variants for cancer target identification and diagnosis. J Transl Med 2024; 22:65. [PMID: 38229122 DOI: 10.1186/s12967-024-04865-w] [Received: 09/28/2023] [Accepted: 01/06/2024] [Indexed: 01/18/2024]
Abstract
BACKGROUND Accurate clinical structural variant (SV) calling is essential for cancer target identification and diagnosis but has been historically challenging due to the lack of ground truth for clinical specimens. Meanwhile, reducing clinical-testing cost is key to widespread clinical utility. METHODS We analyzed a large dataset of tumor samples from 476 patients and developed a computational framework for accurate and cost-effective detection of clinically-relevant SVs. In addition, standard materials and classical experiments including immunohistochemistry and/or fluorescence in situ hybridization were used to validate the developed computational framework. RESULTS We systematically evaluated the common algorithms for SV detection and established an expert-reviewed SV call set of 1,303 tumor-specific SVs with high-evidence levels. Moreover, we developed a random-forest-based decision model to improve the true-positive rate of SV detection. To independently validate the tailored 'two-step' strategy, we utilized standard materials and classical experiments. The accuracy of the model was over 90% (92-99.78%) for all types of data. CONCLUSION Our study provides a valuable resource and an actionable guide to improve cancer-specific SV detection accuracy and clinical applicability.
Affiliation(s)
- Guiwu Zhuang
- Department of Gastrointestinal Surgery, The Eighth Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Xiaotao Zhang
- Department of Radiotherapy, Qingdao Central Hospital, University of Health and Rehabilitation Sciences, Qingdao, China
- Wenjing Du
- Department of Radiotherapy, Shanxi Province Cancer Hospital/Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences/Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Libin Xu
- Department of Orthopedic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiyong Ma
- Department of Respiration, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Haitao Luo
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Hongzhen Tang
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Wei Wang
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Peng Wang
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Miao Li
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Xu Yang
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Dongfang Wu
- Shenzhen Engineering Center for Translational Medicine of Precision Cancer Immunodiagnosis and Therapy, YuceBio Technology Co., Ltd., Shenzhen, China
- Shencun Fang
- Department of Respiratory Medicine, Nanjing Chest Hospital, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China.
23
Puckelwartz MJ, Pesce LL, Hernandez EJ, Webster G, Dellefave-Castillo LM, Russell MW, Geisler SS, Kearns SD, Karthik F, Etheridge SP, Monroe TO, Pottinger TD, Kannankeril PJ, Shoemaker MB, Fountain D, Roden DM, Faulkner M, MacLeod HM, Burns KM, Yandell M, Tristani-Firouzi M, George AL, McNally EM. The impact of damaging epilepsy and cardiac genetic variant burden in sudden death in the young. Genome Med 2024; 16:13. [PMID: 38229148 PMCID: PMC10792876 DOI: 10.1186/s13073-024-01284-w] [Received: 02/22/2023] [Accepted: 01/03/2024] [Indexed: 01/18/2024]
Abstract
BACKGROUND Sudden unexpected death in children is a tragic event. Understanding the genetics of sudden death in the young (SDY) enables family counseling and cascade screening. The objective of this study was to characterize genetic variation in an SDY cohort using whole genome sequencing. METHODS The SDY Case Registry is a National Institutes of Health/Centers for Disease Control and Prevention surveillance effort to discern the prevalence, causes, and risk factors for SDY. The SDY Case Registry prospectively collected clinical data and DNA biospecimens from SDY cases < 20 years of age. SDY cases were collected from medical examiner and coroner offices spanning 13 US jurisdictions from 2015 to 2019. The cohort included 211 children (median age 0.33 year; range 0-20 years), determined to have died suddenly and unexpectedly and from whom DNA biospecimens for DNA extractions and next-of-kin consent were ascertained. A control cohort consisted of 211 randomly sampled, sex- and ancestry-matched individuals from the 1000 Genomes Project. Genetic variation was evaluated in epilepsy, cardiomyopathy, and arrhythmia genes in the SDY and control cohorts. American College of Medical Genetics/Genomics guidelines were used to classify variants as pathogenic or likely pathogenic. Additionally, pathogenic and likely pathogenic genetic variation was identified using a Bayesian-based artificial intelligence (AI) tool. RESULTS The SDY cohort was 43% European, 29% African, 3% Asian, 16% Hispanic, and 9% with mixed ancestries and 39% female. Six percent of the cohort was found to harbor a pathogenic or likely pathogenic genetic variant in an epilepsy, cardiomyopathy, or arrhythmia gene. The genomes of SDY cases, but not controls, were enriched for rare, potentially damaging variants in epilepsy, cardiomyopathy, and arrhythmia-related genes. A greater number of rare epilepsy genetic variants correlated with younger age at death. 
CONCLUSIONS While damaging cardiomyopathy and arrhythmia genes are recognized contributors to SDY, we also observed an enrichment in epilepsy-related genes in the SDY cohort and a correlation between rare epilepsy variation and younger age at death. These findings emphasize the importance of considering epilepsy genes when evaluating SDY.
Affiliation(s)
- Megan J Puckelwartz
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
- Lorenzo L Pesce
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Gregory Webster
- Division of Cardiology, Department of Pediatrics, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
- Mark W Russell
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
- Sarah S Geisler
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
- Samuel D Kearns
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Felix Karthik
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Susan P Etheridge
- Division of Pediatric Cardiology, University of Utah, Salt Lake City, UT, USA
- Tanner O Monroe
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Tess D Pottinger
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Prince J Kannankeril
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- M Benjamin Shoemaker
- Department of Medicine, Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Darlene Fountain
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- Dan M Roden
- Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Kristin M Burns
- Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Alfred L George
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Elizabeth M McNally
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
24
Audet S, Triassi V, Gelinas M, Legault-Cadieux N, Ferraro V, Duquette A, Tetreault M. Integration of multi-omics technologies for molecular diagnosis in ataxia patients. Front Genet 2024; 14:1304711. [PMID: 38239855 PMCID: PMC10794629 DOI: 10.3389/fgene.2023.1304711] [Received: 09/29/2023] [Accepted: 11/27/2023] [Indexed: 01/22/2024]
Abstract
Background: Episodic ataxias are rare neurological disorders characterized by recurring episodes of imbalance and coordination difficulties. Obtaining definitive molecular diagnoses poses challenges, as clinical presentation is highly heterogeneous, and literature on the underlying genetics is limited. While the advent of high-throughput sequencing technologies has significantly contributed to the genetics of Mendelian disorders, interpretation of variants of uncertain significance and other limitations inherent to individual methods still leave many patients undiagnosed. This study aimed to investigate the utility of multi-omics for the identification and validation of molecular candidates in a cohort of complex cases of ataxia with episodic presentation. Methods: Eight patients lacking molecular diagnosis despite extensive clinical examination were recruited following standard genetic testing. Whole genome and RNA sequencing were performed on samples isolated from peripheral blood mononuclear cells. Integration of expression and splicing data facilitated genomic variant prioritization. Subsequently, long-read sequencing played a crucial role in the validation of those candidate variants. Results: Whole genome sequencing uncovered pathogenic variants in four genes (SPG7, ATXN2, ELOVL4, PMPCB). A missense and a nonsense variant, both previously reported as likely pathogenic, were configured in trans in individual #1 (SPG7: c.2228T>C/p.I743T, c.1861C>T/p.Q621*). An ATXN2 microsatellite expansion (CAG32) was found in another late-onset case. In two separate individuals, intronic variants near splice sites (ELOVL4: c.541+5G>A; PMPCB: c.1154+5G>C) were predicted to induce loss-of-function splicing, but had never been reported as disease-causing. Long-read sequencing confirmed the compound heterozygous variant configuration, repeat expansion length, as well as splicing landscape for those pathogenic variants.
A potential genetic modifier of the ATXN2 expansion was discovered in ZFYVE26 (c.3022C>T/p.R1008*). Conclusion: Despite failure to identify pathogenic variants through clinical genetic testing, the multi-omics approach enabled molecular diagnosis in 50% of patients, also giving valuable insights for variant prioritization in remaining cases. The findings demonstrate the value of long-read sequencing for the validation of candidate variants in various scenarios. Our study demonstrates the effectiveness of leveraging complementary omics technologies to unravel the underlying genetics in patients with unresolved rare diseases such as ataxia. Molecular diagnoses not only hold significant promise in improving patient care management, but also alleviate the burden of diagnostic odysseys, more broadly enhancing quality of life.
Affiliation(s)
- Sebastien Audet
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Valerie Triassi
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Myriam Gelinas
- Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Nab Legault-Cadieux
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Vincent Ferraro
- Department of Medicine, University of Montreal Hospital Centre (CHUM), Montreal, QC, Canada
- Antoine Duquette
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
- Neurology Service, Department of Medicine, André-Barbeau Movement Disorders Unit, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Genetic Service, Department of Medicine, University of Montreal Hospital (CHUM), Montreal, QC, Canada
- Martine Tetreault
- University of Montreal Hospital Research Center (CRCHUM), Montreal, QC, Canada
- Department of Neurosciences, University of Montreal, Montreal, QC, Canada
25
James KN, Chowdhury S, Ding Y, Batalov S, Watkins K, Kwon YH, Van Der Kraan L, Ellsworth K, Kingsmore SF, Guidugli L. Genome sequencing detects a wide range of clinically relevant copy-number variants and other genomic alterations. Genet Med 2024; 26:101006. [PMID: 37869996 DOI: 10.1016/j.gim.2023.101006] [Received: 04/04/2023] [Revised: 10/03/2023] [Accepted: 10/06/2023] [Indexed: 10/24/2023]
Abstract
PURPOSE Copy-number variants (CNVs) and other non-single nucleotide variant/indel variant types contribute an important proportion of diagnoses in individuals with suspected genetic disease. This study describes the range of such variants detected by genome sequencing (GS). METHODS For a pediatric cohort of 1032 participants undergoing clinical GS, we characterize the CNVs and other non-single nucleotide variant/indel variant types that were reported, including aneuploidies, mobile element insertions, and uniparental disomies, and we describe the bioinformatic pipeline used to detect these variants. RESULTS Together, these genetic alterations accounted for 15.8% of reported variants. Notably, 67.9% of these were deletions, 32.9% of which overlapped a single gene, and many deletions were reported together with a second variant in the same gene in cases of recessive disease. A retrospective medical record review in a subset of this cohort revealed that up to 6 additional genetic tests were ordered in 68% (26/38) of cases, some of which failed to report the CNVs/rare variants reported on GS. CONCLUSION GS detected a broad range of reported variant types, including CNVs ranging in size from 1 Kb to 46 Mb.
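The nested percentages in the RESULTS above can be combined into shares of all reported variants. The sketch below is only an arithmetic illustration: it assumes that "32.9% of which" refers to the deletions, and the variable names are hypothetical rather than taken from the paper.

```python
# Figures quoted in the abstract (as fractions).
non_snv_share = 0.158       # CNVs and other non-SNV/indel types among reported variants
deletion_share = 0.679      # deletions among those alterations
single_gene_share = 0.329   # assumed: single-gene overlaps among the deletions

# Chain the nested fractions to express each group as a share of ALL reported variants.
deletions_of_all = 100 * non_snv_share * deletion_share
single_gene_deletions_of_all = deletions_of_all * single_gene_share

# Re-derive the quoted 68% from the counts given in the abstract (26 of 38 cases).
additional_tests_pct = 100 * 26 / 38

print(f"Deletions: {deletions_of_all:.2f}% of reported variants")
print(f"Single-gene deletions: {single_gene_deletions_of_all:.2f}% of reported variants")
print(f"Cases with additional tests ordered: {additional_tests_pct:.1f}%")
```

Under that reading, deletions make up roughly a tenth of all reported variants, and the 26/38 count is consistent with the rounded 68% in the abstract.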
Affiliation(s)
- Kiely N James
- Rady Children's Institute for Genomic Medicine, San Diego, CA
- Yan Ding
- Rady Children's Institute for Genomic Medicine, San Diego, CA
- Sergey Batalov
- Rady Children's Institute for Genomic Medicine, San Diego, CA
- Kelly Watkins
- Rady Children's Institute for Genomic Medicine, San Diego, CA
- Yong Hyun Kwon
- Rady Children's Institute for Genomic Medicine, San Diego, CA
- Lucia Guidugli
- Rady Children's Institute for Genomic Medicine, San Diego, CA.
26
Hsu JS, Wu DC, Shih SH, Liu JF, Tsai YC, Lee TL, Chen WA, Tseng YH, Lo YC, Lin HY, Chen YC, Chen JY, Chou TH, Chang DTH, Su MW, Guo WH, Mao HH, Chen CY, Chen PL. Complete genomic profiles of 1496 Taiwanese reveal curated medical insights. J Adv Res 2023:S2090-1232(23)00405-8. [PMID: 38159844 DOI: 10.1016/j.jare.2023.12.018] [Received: 08/14/2023] [Revised: 12/03/2023] [Accepted: 12/27/2023] [Indexed: 01/03/2024]
Abstract
INTRODUCTION The population of Taiwan has a long history of ethno-cultural evolution. The Taiwanese population was isolated from other large populations such as the European, Han Chinese, and Japanese populations. The Taiwan Biobank (TWB) project has built a nationwide database, particularly for personal whole-genome sequences (WGS), to facilitate basic and clinical collaboration nationally and internationally, making it one of the most valuable public datasets of the East Asian population. OBJECTIVES This study provides comprehensive medical genomic findings from TWB WGS data, for better characterization of disease susceptibility and the choice of ideal treatment regimens in the Taiwanese population. METHODS We reanalyzed 1496 WGS datasets using Sentieon DNAscope, a PrecisionFDA Truth Challenge winning method. Single nucleotide variants (SNV) and small insertions/deletions (INDEL) were benchmarked. We also analyzed pharmacogenomic (PGx) drug-associated alleles and copy number variants (CNV). Multiple practicing clinicians reviewed and curated the clinically significant variants. Variant annotations can be browsed at TaiwanGenomes (https://genomes.tw). RESULTS We found that each participant had an average of 6,870.7 globally novel variants and 75.3% (831/1103) of the participants harbored at least one PharmGKB-selected high evidence level human leukocyte antigen (HLA) risk allele. We found 54 PharmGKB-reported high-evidence-level Cytochrome P450 variant-drug pairs with a population frequency of over 13.2%. We also identified 23 variants in the ACMG secondary finding V3 gene list from 25 participants, suggesting that 1.67% (25/1496) of the population harbors at least one medically actionable variant. Our carrier status analyses suggest that one in 25 couples (3.94%) would risk having offspring with at least one pathogenic variant, which is in line with rates found in Japan and Singapore.
For pathogenic CNV, we detected 6.88% and 2.02% carrier rates for alpha thalassemia and spinal muscular atrophy, respectively. CONCLUSION Our study highlights the overall medical insights of a complete Taiwanese genomic profile.
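As a quick consistency check, the percentages quoted in the RESULTS above can be recomputed from the counts given alongside them. This is a minimal sketch; the helper `pct` and the variable names are hypothetical and not part of the study.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)

# Counts quoted in the abstract.
hla_risk_allele_pct = pct(831, 1103)   # participants with a high-evidence HLA risk allele
actionable_pct = pct(25, 1496)         # participants with an ACMG secondary finding

# Convert the quoted couple-level risk (3.94%) into a "one in N couples" figure.
couples_per_at_risk_couple = round(100 / 3.94, 1)

print(hla_risk_allele_pct, actionable_pct, couples_per_at_risk_couple)
```

The recomputed values agree with the abstract after rounding: about 75.3%, 1.67%, and roughly one in 25 couples.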
Affiliation(s)
- Jacob Shujui Hsu
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei 100025, Taiwan; Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei 100233, Taiwan
- Dung-Chi Wu
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei 10617, Taiwan
- Shang-Hung Shih
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei 10617, Taiwan
- Jen-Feng Liu
- Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei 100233, Taiwan
- Ya-Chen Tsai
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
- Tung-Lin Lee
- Department of Medical Genetics, National Taiwan University Hospital, Taipei 100226, Taiwan
- Wei-An Chen
- Department of Medical Genetics, National Taiwan University Hospital, Taipei 100226, Taiwan
- Yi-Hsuan Tseng
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei 100025, Taiwan
- Yi-Chung Lo
- Department of Electrical Engineering, National Cheng-Kung University, Tainan 701401, Taiwan
- Hong-Ye Lin
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
- Yi-Chieh Chen
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei 100025, Taiwan
- Jing-Yi Chen
- Department of Electrical Engineering, National Cheng-Kung University, Tainan 701401, Taiwan
- Ting-Hsuan Chou
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei 100025, Taiwan
- Darby Tien-Hao Chang
- Department of Electrical Engineering, National Cheng-Kung University, Tainan 701401, Taiwan; Digital Technology Division, SinoPac Holdings, Taiwan
- Ming Wei Su
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115201, Taiwan
- Wei-Hong Guo
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
- Hsin-Hsiang Mao
- Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan
- Chien-Yu Chen
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei 10617, Taiwan; Department of Biomechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan.
- Pei-Lung Chen
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei 100025, Taiwan; Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei 100233, Taiwan; Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei 10617, Taiwan; Department of Medical Genetics, National Taiwan University Hospital, Taipei 100226, Taiwan; Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei 100233, Taiwan.
27
Liu S, Ebel ER, Luniewski A, Zulawinska J, Simpson ML, Kim J, Ene N, Braukmann TWA, Congdon M, Santos W, Yeh E, Guler JL. Direct long read visualization reveals metabolic interplay between two antimalarial drug targets. bioRxiv 2023:2023.02.13.528367. [PMID: 36824743 PMCID: PMC9948948 DOI: 10.1101/2023.02.13.528367] [Indexed: 02/15/2023]
Abstract
Increases in the copy number of large genomic regions, termed genome amplification, are an important adaptive strategy for malaria parasites. Numerous amplifications across the Plasmodium falciparum genome contribute directly to drug resistance or impact the fitness of this protozoan parasite. During the characterization of parasite lines with amplifications of the dihydroorotate dehydrogenase (DHODH) gene, we detected increased copies of an additional genomic region that encompassed 3 genes (~5 kb) including GTP cyclohydrolase I (GCH1 amplicon). While this gene is reported to increase the fitness of antifolate resistant parasites, GCH1 amplicons had not previously been implicated in any other antimalarial resistance context. Here, we further explored the association between GCH1 and DHODH copy number. Using long read sequencing and single read visualization, we directly observed a higher number of tandem GCH1 amplicons in parasites with increased DHODH copies (up to 9 amplicons) compared to parental parasites (3 amplicons). While all GCH1 amplicons shared a consistent structure, expansions arose in 2-unit steps (from 3 to 5 to 7, etc copies). Adaptive evolution of DHODH and GCH1 loci was further bolstered when we evaluated prior selection experiments; DHODH amplification was only successful in parasite lines with pre-existing GCH1 amplicons. These observations, combined with the direct connection between metabolic pathways that contain these enzymes, lead us to propose that the GCH1 locus is beneficial for the fitness of parasites exposed to DHODH inhibitors. This finding highlights the importance of studying variation within individual parasite genomes as well as biochemical connections of drug targets as novel antimalarials move towards clinical approval.
Affiliation(s)
- Shiwei Liu
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Current affiliation: Indiana University School of Medicine, Indianapolis, IN, USA
- Emily R. Ebel
- Stanford, Departments of Pediatrics and Microbiology & Immunology, Stanford, CA, USA
- Julia Zulawinska
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Jane Kim
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Nnenna Ene
- University of Virginia, Department of Biology, Charlottesville, VA, USA
- Molly Congdon
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
- Webster Santos
- Virginia Tech, Department of Chemistry, Blacksburg, VA, USA
- Ellen Yeh
- Stanford University, Departments of Pathology and Microbiology & Immunology, Stanford, CA, USA
- Jennifer L. Guler
- University of Virginia, Department of Biology, Charlottesville, VA, USA
|
28
|
Meng X, Wang M, Luo M, Sun L, Yan Q, Liu Y. Systematic evaluation of multiple NGS platforms for structural variants detection. J Biol Chem 2023; 299:105436. [PMID: 37944616 PMCID: PMC10724692 DOI: 10.1016/j.jbc.2023.105436] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/28/2023] [Revised: 10/29/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023] Open
Abstract
Structural variations (SVs) are critical genome changes affecting human diseases. Although many hybridization-based methods exist, evaluating SVs through next-generation sequencing (NGS) data is still necessary for broader research exploration. Here, we comprehensively compared the performance of 16 SV callers and multiple NGS platforms using NA12878 whole genome sequencing (WGS) datasets. The results indicated that several SV callers performed relatively well, such as Manta, GRIDSS, LUMPY, TARDIS, FermiKit, and Wham. Meanwhile, all NGS platforms performed similarly when the same software was used. Additionally, we found that the undetected SVs mostly originated from long-read datasets; therefore, a more appropriate strategy for accurate SV detection in the future will be to integrate long and short reads. At present, with NGS as a mainstream method in bioinformatics, our study provides helpful and comprehensive guidelines for specific categories of SV research.
Affiliation(s)
- Xuan Meng
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
| | - Miao Wang
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
| | - Mingjie Luo
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
| | - Lei Sun
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
| | - Qin Yan
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
| | - Yongfeng Liu
- Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
| |
|
29
|
Lemay MA, de Ronne M, Bélanger R, Belzile F. k-mer-based GWAS enhances the discovery of causal variants and candidate genes in soybean. Plant Genome 2023; 16:e20374. [PMID: 37596724 DOI: 10.1002/tpg2.20374] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Accepted: 07/19/2023] [Indexed: 08/20/2023]
Abstract
Genome-wide association studies (GWAS) are powerful statistical methods that detect associations between genotype and phenotype at genome scale. Despite their power, GWAS frequently fail to pinpoint the causal variant or the gene controlling a given trait in crop species. Assessing genetic variants other than single-nucleotide polymorphisms (SNPs) could alleviate this problem. In this study, we tested the potential of structural variant (SV)- and k-mer-based GWAS in soybean by applying these methods as well as conventional SNP/indel-based GWAS to 13 traits. We assessed the performance of each GWAS approach based on loci for which the causal genes or variants were known from previous genetic studies. We found that k-mer-based GWAS was the most versatile approach and the best at pinpointing causal variants or candidate genes. Moreover, k-mer-based analyses identified promising candidate genes for loci related to pod color, pubescence form, and resistance to Phytophthora sojae. In our dataset, SV-based GWAS did not add value compared to k-mer-based GWAS and may not be worth the time and computational resources invested. Despite promising results, significant challenges remain regarding the downstream analysis of k-mer-based GWAS. Notably, better methods are needed to associate significant k-mers with sequence variation. Our results suggest that coupling k-mer- and SNP/indel-based GWAS is a powerful approach for discovering candidate genes in crop species.
Affiliation(s)
- Marc-André Lemay
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
| | - Maxime de Ronne
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
| | - Richard Bélanger
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
| | - François Belzile
- Département de phytologie, Université Laval, Québec, QC, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Centre de recherche et d'innovation sur les végétaux, Université Laval, Québec, QC, Canada
| |
|
30
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023] Open
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| |
|
31
|
Magi A, Mattei G, Mingrino A, Caprioli C, Ronchini C, Frigè G, Semeraro R, Baragli M, Bolognini D, Colombo E, Mazzarella L, Pelicci PG. GASOLINE: detecting germline and somatic structural variants from long-reads data. Sci Rep 2023; 13:20817. [PMID: 38012350 PMCID: PMC10682169 DOI: 10.1038/s41598-023-48285-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences in the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by the size or type of SVs. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran codes; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4-5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-reads datasets. Notably, when applied on a pair of metastatic melanoma and matched-normal sample, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art WGS long-reads computational methods.
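The abstract says SV signatures are grouped with a "modified reciprocal overlap criterion" but does not spell out the modification. As a rough illustration only, the sketch below shows the standard (unmodified) reciprocal overlap between two intervals and a simple greedy grouping built on it. The function names, the 0.5 threshold, and the greedy strategy are assumptions made for this example and are not taken from GASOLINE.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Standard reciprocal overlap: the smaller of the two overlap fractions.

    Returns 0.0 when the intervals do not overlap.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def cluster_signatures(intervals, min_ro=0.5):
    """Greedy grouping of (start, end) SV signatures (hypothetical helper).

    Each signature joins the first cluster whose first member it overlaps
    reciprocally by at least min_ro; otherwise it starts a new cluster.
    """
    clusters = []
    for start, end in sorted(intervals):
        for cluster in clusters:
            rep_start, rep_end = cluster[0]
            if reciprocal_overlap(start, end, rep_start, rep_end) >= min_ro:
                cluster.append((start, end))
                break
        else:
            clusters.append([(start, end)])
    return clusters
```

For example, `cluster_signatures([(100, 200), (110, 205), (1000, 1100)])` puts the first two signatures in one cluster and the third in its own. Requiring the overlap to be large relative to both intervals keeps a small event from being merged into a much larger one that merely contains it.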
Affiliation(s)
- Alberto Magi
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
- Institute for Biomedical Technologies, National Research Council, Segrate, Milan, Italy
| | - Gianluca Mattei
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
| | - Alessandra Mingrino
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Chiara Caprioli
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Chiara Ronchini
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
| | - Gianmaria Frigè
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Marta Baragli
- Department of Information Engineering, University of Florence, 50100, Florence, Italy
| | - Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
| | - Emanuela Colombo
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| | - Luca Mazzarella
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
| | - Pier Giuseppe Pelicci
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
| |
|
32
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
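One plausible reading of the "built-in truth" of a parents-plus-monozygotic-twins design is that a call can be checked without a benchmark: the twins should share the same genotype, and that genotype should be inheritable from the parents. The sketch below illustrates this idea for a single diploid site. The function names and the tuple-of-alleles genotype encoding are assumptions made for this example; this is not the Quartet project's actual pipeline.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's two alleles can come one from each parent.

    Genotypes are pairs of allele codes, e.g. (0, 1) for a heterozygote.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def quartet_consistent(father, mother, twin1, twin2):
    """True if the twins' genotypes match and fit Mendelian inheritance."""
    return sorted(twin1) == sorted(twin2) and mendelian_consistent(father, mother, twin1)


def consistency_rate(sites):
    """Fraction of sites, given as (father, mother, twin1, twin2), that pass."""
    if not sites:
        raise ValueError("no sites provided")
    passed = sum(quartet_consistent(*site) for site in sites)
    return passed / len(sites)
```

A site that fails the check indicates an error in at least one of the four calls (true de novo mutations aside), so the pass rate gives a rough proxy for call precision in regions that lack a certified reference.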
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiang Fang
- National Institute of Metrology, Beijing, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
| |
|
33
|
Karunarathne P, Zhou Q, Schliep K, Milesi P. A comprehensive framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile R package for paralogue and CNV detection. Mol Ecol Resour 2023; 23:1772-1789. [PMID: 37515483 DOI: 10.1111/1755-0998.13843] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/17/2022] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent studies have highlighted the significant role of copy number variants (CNVs) in phenotypic diversity, environmental adaptation and species divergence across eukaryotes. The presence of CNVs also has the potential to introduce genotyping biases, which can pose challenges to accurate population and quantitative genetic analyses. However, detecting CNVs in genomes, particularly in non-model organisms, presents a formidable challenge. To address this issue, we have developed a statistical framework and an accompanying R software package that leverage allelic-read depth from single nucleotide polymorphism (SNP) data for accurate CNV detection. Our framework capitalises on two key principles. First, it exploits the distribution of allelic-read depth ratios in heterozygotes for individual SNPs by comparing it against an expected distribution based on binomial sampling. Second, it identifies SNPs exhibiting an apparent excess of heterozygotes under Hardy-Weinberg equilibrium. By employing multiple statistical tests, our method not only enhances sensitivity to sampling effects but also effectively addresses reference biases, resulting in optimised SNP classification. Our framework is compatible with various NGS technologies (e.g. RADseq, Exome-capture). This versatility enables CNV calling from genomes of diverse complexities. To streamline the analysis process, we have implemented our framework in the user-friendly R package 'rCNV', which automates the entire workflow seamlessly. We trained our models using simulated data and validated their performance on four datasets derived from different sequencing technologies, including RADseq (Chinook salmon-Oncorhynchus tshawytscha), Rapture (American lobster-Homarus americanus), Exome-capture (Norway spruce-Picea abies) and WGS (Malaria mosquito-Anopheles gambiae).
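The two principles named in the abstract can be illustrated with elementary statistics. The sketch below is written in Python rather than R and does not reproduce rCNV's actual tests, which the abstract does not detail. It assumes an exact two-sided binomial test of a heterozygote's alternate-allele read count against an expected ratio of 0.5, and a Hardy-Weinberg expected heterozygote frequency of 2pq. All function names are hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is no more likely than k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def expected_het_freq(n_hom_ref, n_het, n_hom_alt):
    """Expected heterozygote frequency (2pq) under Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    return 2 * p * (1 - p)


def het_excess(n_hom_ref, n_het, n_hom_alt):
    """Observed minus expected heterozygote frequency (positive means excess)."""
    n = n_hom_ref + n_het + n_hom_alt
    return n_het / n - expected_het_freq(n_hom_ref, n_het, n_hom_alt)
```

In a true single-copy heterozygote the two alleles should each receive about half the reads, so a very small p-value from `binom_two_sided_p` (for example, 0 alternate reads out of 10) flags a skewed ratio. Likewise, a SNP where every individual appears heterozygous gives `het_excess(0, 100, 0) == 0.5`, the pattern expected when reads from paralogous copies are collapsed onto one locus.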
Affiliation(s)
- Piyal Karunarathne
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
- Institute of Population Genetics, Heinrich Heine University, Düsseldorf, Germany
| | - Qiujie Zhou
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
| | - Klaus Schliep
- Institute of Computational Biotechnology, Graz University of Technology, Graz, Austria
| | - Pascal Milesi
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
| |
|
34
|
Pajuste FD, Remm M. GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads. Sci Rep 2023; 13:17765. [PMID: 37853040 PMCID: PMC10584998 DOI: 10.1038/s41598-023-44636-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 10/10/2023] [Indexed: 10/20/2023] Open
Abstract
Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method GeneToCN that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from digital droplet PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.99) between GeneToCN predictions and experimentally determined copy numbers. An additional validation on FCGR3 genes showed a higher concordance for FCGR3A compared to two other methods, but reduced accuracy for FCGR3B. We further tested the method on three different genomic regions (SMN, NPY4R, and LPA Kringle IV-2 domain). Predicted copy number distributions of these genes in a set of 500 individuals from the Estonian Biobank were in good agreement with the previously published studies. In addition, we investigated the possibility of using GeneToCN on sequencing data generated by different technologies by comparing copy number predictions from Illumina, PacBio, and Oxford Nanopore data of the same sample. Despite the differences in variability of k-mer frequencies, all three sequencing technologies give similar predictions with GeneToCN.
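The general idea of alignment-free copy number estimation from k-mer frequencies can be sketched in a few lines. The abstract does not describe how GeneToCN normalises its counts, so the example below assumes one simple approach: compare the median count of gene-specific k-mers with the median count of k-mers from a region taken to be present at the normal diploid copy number. The function names, the `reference_kmers` argument, and the `ploidy` default are assumptions for this illustration, and reverse-complement handling is omitted for brevity.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer occurring in an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_copy_number(counts, gene_kmers, reference_kmers, ploidy=2):
    """Estimate gene copy number from k-mer depth (hypothetical normalisation).

    The median depth of gene-specific k-mers is scaled by the median depth
    of k-mers from a region assumed to be present in `ploidy` copies.
    """
    gene_depth = median(counts[kmer] for kmer in gene_kmers)
    ref_depth = median(counts[kmer] for kmer in reference_kmers)
    if ref_depth == 0:
        raise ValueError("reference k-mers were not observed in the reads")
    return ploidy * gene_depth / ref_depth
```

With toy input such as `counts = count_kmers(["ACGT"] * 4 + ["TTGA"] * 2, 4)`, calling `estimate_copy_number(counts, ["ACGT"], ["TTGA"])` returns 4.0, because the gene k-mer is seen twice as often as the diploid reference k-mer. Using medians rather than means limits the influence of individual k-mers inflated by sequencing errors or repeats.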
Affiliation(s)
- Fanny-Dhelia Pajuste
- Institute of Molecular and Cell Biology, University of Tartu, 23 Riia Str., 51010, Tartu, Estonia
| | - Maido Remm
- Institute of Molecular and Cell Biology, University of Tartu, 23 Riia Str., 51010, Tartu, Estonia
| |
|
35
|
Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y, Liu W, Chen C, Song Y, Tan L, Yin K, Zhao Y, Yan Z, Lou S, Zan Y, Liu J. The pan-genome and local adaptation of Arabidopsis thaliana. Nat Commun 2023; 14:6259. [PMID: 37802986 PMCID: PMC10558531 DOI: 10.1038/s41467-023-42029-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 09/27/2023] [Indexed: 10/08/2023] Open
Abstract
Arabidopsis thaliana serves as a model species for investigating various aspects of plant biology. However, the contribution of genomic structural variations (SVs) and their associated genes to the local adaptation of this widely distributed species remains unclear. Here, we de novo assemble chromosome-level genomes of 32 A. thaliana ecotypes and determine that variable genes expand the gene pool in different ecotypes and thus assist local adaptation. We develop a graph-based pan-genome and identify 61,332 SVs that overlap with 18,883 genes, some of which are highly involved in ecological adaptation of this species. For instance, we observe a specific 332 bp insertion in the promoter region of the HPCA1 gene in the Tibet-0 ecotype that enhances gene expression, thereby promoting adaptation to alpine environments. These findings augment our understanding of the molecular mechanisms underlying the local adaptation of A. thaliana across diverse habitats.
Affiliation(s)
- Minghui Kang
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Haolin Wu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Huanhuan Liu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Wenyu Liu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Mingjia Zhu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Yu Han
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Wei Liu
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Chunlin Chen
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Yan Song
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Luna Tan
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Kangqun Yin
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Yusen Zhao
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Zhen Yan
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Shangling Lou
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| | - Yanjun Zan
- Key Laboratory of Tobacco Improvement and Biotechnology, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, 266000, China
| | - Jianquan Liu
- State Key Laboratory of Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
- Key Laboratory of Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, China
| |
|
36
|
Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform 2023; 24:bbad297. [PMID: 37587831 PMCID: PMC10516374 DOI: 10.1093/bib/bbad297] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/10/2023] [Revised: 07/05/2023] [Accepted: 07/23/2023] [Indexed: 08/18/2023] Open
Abstract
Structural variants (SVs) are genomic rearrangements that can take many different forms such as copy number alterations, inversions and translocations. During cell development and aging, somatic SVs accumulate in the genome with potentially neutral, deleterious or pathological effects. Generation of somatic SVs is a key mutational process in cancer development and progression. Despite their importance, the detection of somatic SVs is challenging, making them less studied than somatic single-nucleotide variants. In this review, we summarize recent advances in whole-genome sequencing (WGS)-based approaches for detecting somatic SVs at the tissue and single-cell levels and discuss their advantages and limitations. First, we describe the state-of-the-art computational algorithms for somatic SV calling using bulk WGS data and compare the performance of somatic SV detectors in the presence or absence of a matched-normal control. We then discuss the unique features of cutting-edge single-cell-based techniques for analyzing somatic SVs. The advantages and disadvantages of bulk and single-cell approaches are highlighted, along with a discussion of their sensitivity to copy-neutral SVs, usefulness for functional inferences and experimental and computational costs. Finally, computational approaches for linking somatic SVs to their functional readouts, such as those obtained from single-cell transcriptome and epigenome analyses, are illustrated, with a discussion of the promise of these approaches in health and diseases.
Affiliation(s)
- Dohun Yi
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Hyobin Jeong
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| |
Collapse
|
37
|
Xu X, Chen B, Zhang J, Lan S, Wu S. Whole-genome resequencing analysis of the medicinal plant Gardenia jasminoides. PeerJ 2023; 11:e16056. [PMID: 37744244 PMCID: PMC10512932 DOI: 10.7717/peerj.16056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/17/2023] [Indexed: 09/26/2023] Open
Abstract
Background: Gardenia jasminoides is a Chinese medicinal plant with high medicinal and economic value and rich genetic diversity, but its genetic diversity remains insufficiently studied. Methods: In this study, one wild and one cultivated gardenia sample were resequenced on the Illumina HiSeq platform, and the data were evaluated to characterize the genome of G. jasminoides. Results: Sequencing yielded 11.77 G of clean data, with Q30 reaching 90.96%. The average mapping rate between the samples and the reference genome was 96.08%, the average coverage depth was 15X, and the genome coverage was 85.93%. SNP calling identified 3,087,176 and 3,241,416 SNPs in FD and YP1, respectively. In addition, non-synonymous SNPs, InDels, SVs and CNVs were detected between the samples and the reference genome, and genes with DNA-level variation were annotated against the KEGG, GO and COG databases. Structural gene variation in the biosynthetic pathway of crocin and gardenia, the main medicinal substance of G. jasminoides, was further explored, providing basic data for future molecular breeding and genetic diversity studies of G. jasminoides.
Affiliation(s)
- Xinyu Xu
  - Fujian Academy of Forestry Sciences, Fuzhou, Fujian, China
  - College of Landscape and Architecture, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Bihua Chen
  - Fujian Academy of Forestry Sciences, Fuzhou, Fujian, China
- Juan Zhang
  - Fujian Academy of Forestry Sciences, Fuzhou, Fujian, China
- Siren Lan
  - College of Landscape and Architecture, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Shasha Wu
  - College of Landscape and Architecture, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
|
38
|
Liu Q, Xie B, Gao Y, Xu S, Lu Y. A protocol for applying low-coverage whole-genome sequencing data in structural variation studies. STAR Protoc 2023; 4:102433. [PMID: 37432854 PMCID: PMC10362160 DOI: 10.1016/j.xpro.2023.102433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 05/23/2023] [Accepted: 06/12/2023] [Indexed: 07/13/2023] Open
Abstract
Structural variations (SVs) have a great impact on various biological processes and influence physical traits in many species. Here, we present a protocol for applying the low-coverage next-generation sequencing data of Rhipicephalus microplus to detect high-differentiated SVs accurately. We also outline its use to investigate population/species-specific genetic structures, local adaptation, and transcriptional function. We describe steps for constructing variation maps and SV annotation. We then detail population genetic analysis and differential gene expression analysis. For complete details on the usage and execution of this protocol, please refer to Liu et al. (2023).
Affiliation(s)
- Qi Liu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China
- Bo Xie
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yang Gao
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China
- Shuhua Xu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China.
- Yan Lu
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China.
|
39
|
Rao H, Zhang H, Zou Y, Ma P, Huang T, Yuan H, Zhou J, Lu W, Li Q, Huang S, Liu Y, Yang B. Analysis of chromosomal structural variations in patients with recurrent spontaneous abortion using optical genome mapping. Front Genet 2023; 14:1248755. [PMID: 37732322 PMCID: PMC10507169 DOI: 10.3389/fgene.2023.1248755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023] Open
Abstract
Background and aims: Certain chromosomal structural variations (SVs) in biological parents can lead to recurrent spontaneous abortions (RSAs). Unequal crossing over during meiosis can result in the unbalanced rearrangement of gamete chromosomes such as duplication or deletion. Unfortunately, routine techniques such as karyotyping, fluorescence in situ hybridization (FISH), chromosomal microarray analysis (CMA), and copy number variation sequencing (CNV-seq) cannot detect all types of SVs. In this study, we show that optical genome mapping (OGM) quickly and accurately detects SVs for RSA patients with a high resolution and provides more information about the breakpoint regions at gene level. Methods: Seven couples who had suffered RSA with unbalanced chromosomal rearrangements of aborted embryos were recruited, and ultra-high molecular weight (UHMW) DNA was isolated from their peripheral blood. The consensus genome map was created by de novo assembly on the Bionano Solve data analysis software. SVs and breakpoints were identified via alignments of the reference genome GRCh38/hg38. The exact breakpoint sequences were verified using either Oxford Nanopore sequencing or Sanger sequencing. Results: Various SVs in the recruited couples were successfully detected by OGM. Also, additional complex chromosomal rearrangements (CCRs) and four cryptic balanced reciprocal translocations (BRTs) were revealed, further refining the underlying genetic causes of RSA. Two of the disrupted genes identified in this study, FOXK2 [46,XY,t(7;17)(q31.3;q25)] and PLXDC2 [46,XX,t(10;16)(p12.31;q23.1)], had been previously shown to be associated with male fertility and embryo transit. Conclusion: OGM accurately detects chromosomal SVs, especially cryptic BRTs and CCRs. It is a useful complement to routine human genetic diagnostics, such as karyotyping, and detects cryptic BRTs and CCRs more accurately than routine genetic diagnostics.
Affiliation(s)
- Huihua Rao
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Haoyi Zhang
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Yongyi Zou
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Pengpeng Ma
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Tingting Huang
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Huizhen Yuan
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Jihui Zhou
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Wan Lu
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Qiao Li
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Shuhui Huang
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Yanqiu Liu
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
- Bicheng Yang
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
  - Jiangxi Key Laboratory of Birth Defect Prevention and Control, Jiangxi Maternal and Child Health Hospital, Nanchang, Jiangxi, China
|
40
|
Antinucci M, Comas D, Calafell F. Population history modulates the fitness effects of Copy Number Variation in the Roma. Hum Genet 2023; 142:1327-1343. [PMID: 37311904 PMCID: PMC10449987 DOI: 10.1007/s00439-023-02579-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 06/02/2023] [Indexed: 06/15/2023]
Abstract
We provide the first whole genome Copy Number Variant (CNV) study addressing Roma, along with reference populations from South Asia, the Middle East and Europe. Using CNV calling software for short-read sequence data, we identified 3171 deletions and 489 duplications. Taking into account the known population history of the Roma, as inferred from whole genome nucleotide variation, we could discern how this history has shaped CNV variation. As expected, patterns of deletion variation, but not duplication, in the Roma followed those obtained from single nucleotide polymorphisms (SNPs). Reduced effective population size resulting in slightly relaxed natural selection may explain our observation of an increase in intronic (but not exonic) deletions within Loss of Function (LoF)-intolerant genes. Over-representation analysis for LoF-intolerant gene sets hosting intronic deletions highlights a substantial accumulation of shared biological processes in Roma, intriguingly related to signaling, nervous system and development features, which may be related to the known profile of private disease in the population. Finally, we show the link between deletions and known trait-related SNPs reported in the genome-wide association study (GWAS) catalog, which exhibited even frequency distributions among the studied populations. This suggests that, in general human populations, the strong association between deletions and SNPs associated to biomedical conditions and traits could be widespread across continental populations, reflecting a common background of potentially disease/trait-related CNVs.
Affiliation(s)
- Marco Antinucci
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- David Comas
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Francesc Calafell
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain.
|
41
|
Wang H, Makowski C, Zhang Y, Qi A, Kaufmann T, Smeland OB, Fiecas M, Yang J, Visscher PM, Chen CH. Chromosomal inversion polymorphisms shape human brain morphology. Cell Rep 2023; 42:112896. [PMID: 37505983 PMCID: PMC10508191 DOI: 10.1016/j.celrep.2023.112896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 06/27/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
The impact of chromosomal inversions on human brain morphology remains underexplored. We studied 35 common inversions classified from genotypes of 33,018 adults with European ancestry. The inversions at 2p22.3, 16p11.2, and 17q21.31 reach genome-wide significance, followed by 8p23.1 and 6p21.33, in their association with cortical and subcortical morphology. The 17q21.31, 8p23.1, and 16p11.2 regions comprise the LRRC37, OR7E, and NPIP duplicated gene families. We find the 17q21.31 MAPT inversion region, known for harboring neurological risk, to be the most salient locus among common variants for shaping and patterning the cortex. Overall, we observe the inverted orientations decreasing brain size, with the exception that the 2p22.3 inversion is associated with increased subcortical volume and the 8p23.1 inversion is associated with increased motor cortex. These significant inversions are in the genomic hotspots of neuropsychiatric loci. Our findings are generalizable to 3,472 children and demonstrate inversions as essential genetic variation to understand human brain phenotypes.
Affiliation(s)
- Hao Wang
  - Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Carolina Makowski
  - Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Yanxiao Zhang
  - Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Anna Qi
  - Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Tobias Kaufmann
  - Department of Psychiatry and Psychotherapy, Tübingen Center for Mental Health, University of Tübingen, 72076 Tübingen, Germany; Norwegian Centre for Mental Disorders Research, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway
- Olav B Smeland
  - Norwegian Centre for Mental Disorders Research, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway
- Mark Fiecas
  - Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA
- Jian Yang
  - School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Peter M Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Chi-Hua Chen
  - Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA.
|
42
|
Xie H, Li W, Guo Y, Su X, Chen K, Wen L, Tang F. Long-read-based single sperm genome sequencing for chromosome-wide haplotype phasing of both SNPs and SVs. Nucleic Acids Res 2023; 51:8020-8034. [PMID: 37351613 PMCID: PMC10450174 DOI: 10.1093/nar/gkad532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 06/01/2023] [Accepted: 06/09/2023] [Indexed: 06/24/2023] Open
Abstract
Although localized haploid phasing can be achieved using long read genome sequencing without parental data, reliable chromosome-scale phasing remains a great challenge. Given that sperm is a natural haploid cell, single-sperm genome sequencing can provide a chromosome-wide phase signal. Due to the limitation of read length, current short-read-based single-sperm genome sequencing methods can only achieve SNP haplotyping and come with difficulties in detecting and haplotyping structural variations (SVs) in complex genomic regions. To overcome these limitations, we developed a long-read-based single-sperm genome sequencing method and a corresponding data analysis pipeline that can accurately identify crossover events and chromosomal level aneuploidies in single sperm and efficiently detect SVs within individual sperm cells. Importantly, without parental genome information, our method can accurately conduct de novo phasing of heterozygous SVs as well as SNPs from male individuals at the whole chromosome scale. The accuracy for phasing of SVs was as high as 98.59% using 100 single sperm cells, and the accuracy for phasing of SNPs was as high as 99.95%. Additionally, our method reliably enabled deduction of the repeat expansions of haplotype-resolved STRs/VNTRs in single sperm cells. Our method provides a new opportunity for studying haplotype-related genetics in mammals.
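Why haploid sperm cells give a chromosome-wide phase signal can be shown with a toy majority-vote sketch. This is not the authors' pipeline: the function name, the 0/1/None allele encoding and the adjacent-site voting rule are assumptions chosen for illustration. Each cell carries one parental haplotype apart from occasional crossovers, so alleles at neighbouring heterozygous sites agree or disagree consistently across most cells.

```python
def phase_from_haploid_cells(cells):
    """Infer one parental haplotype from haploid single-cell genotypes.

    cells: list of equal-length lists; each entry is the allele (0 or 1)
    seen at a heterozygous site, or None when the site was not covered.
    Returns haplotype A as a list of 0/1; haplotype B is its complement.
    """
    n_sites = len(cells[0])
    hap = [0]  # arbitrarily anchor the first site to allele 0
    for i in range(1, n_sites):
        same = diff = 0
        for cell in cells:
            a, b = cell[i - 1], cell[i]
            if a is None or b is None:
                continue  # this cell is uninformative for the pair
            if a == b:
                same += 1
            else:
                diff += 1
        # Majority vote across cells; ties fall back to "same allele".
        hap.append(hap[-1] if same >= diff else 1 - hap[-1])
    return hap


# Toy data: true haplotypes are [0, 1, 1, 0] and [1, 0, 0, 1].
cells = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, None, 0],  # one site not covered
    [1, 0, 0, 0],     # crossover between the last two sites
    [0, 1, 1, 0],
]

haplotype_a = phase_from_haploid_cells(cells)
print(haplotype_a)  # [0, 1, 1, 0]
```

The cell with a crossover is outvoted by the others, which is why pooling many cells (100 in the study) keeps the phasing accurate.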
Affiliation(s)
- Haoling Xie
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
  - Changping Laboratory, Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
- Wen Li
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Yuqing Guo
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Xinjie Su
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Kexuan Chen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Lu Wen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
- Fuchou Tang
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China
  - Changping Laboratory, Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102206, China
|
43
|
Romagnoli S, Bartalucci N, Vannucchi AM. Resolving complex structural variants via nanopore sequencing. Front Genet 2023; 14:1213917. [PMID: 37674481 PMCID: PMC10479017 DOI: 10.3389/fgene.2023.1213917] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/28/2023] [Accepted: 07/26/2023] [Indexed: 09/08/2023] Open
Abstract
The recent development of high-throughput sequencing platforms provided impressive insights into the field of human genetics and contributed to considering structural variants (SVs) as the hallmark of genome instability, leading to the establishment of several pathologic conditions, including neoplasia and neurodegenerative and cognitive disorders. While SV detection is addressed by next-generation sequencing (NGS) technologies, more recent long-read sequencing technologies have already proven invaluable in overcoming the inaccuracy and limitations that NGS technologies show when applied to resolve wide and structurally complex SVs, which stem from the short length (100-500 bp) of the sequencing reads used. Among the long-read sequencing technologies, Oxford Nanopore Technologies developed a sequencing platform based on a protein nanopore that allows the sequencing of "native" long DNA molecules of virtually unlimited length (typical range 1-100 Kb). In this review, we focus on the bioinformatics methods that improve the identification and genotyping of known and novel SVs to investigate human pathological conditions, discussing the possibility of introducing nanopore sequencing technology into routine diagnostics.
Affiliation(s)
- Alessandro Maria Vannucchi
  - CRIMM, Center of Research and Innovation of Myeloproliferative Neoplasms, DENOTHE Excellence Center, Careggi University Hospital and Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
|
44
|
Ahsan MU, Liu Q, Perdomo JE, Fang L, Wang K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 2023; 20:1143-1158. [PMID: 37386186 DOI: 10.1038/s41592-023-01932-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/16/2021] [Accepted: 05/31/2023] [Indexed: 07/01/2023]
Abstract
As long-read sequencing technologies are becoming increasingly popular, a number of methods have been developed for the discovery and analysis of structural variants (SVs) from long reads. Long reads enable detection of SVs that could not be previously detected from short-read sequencing, but computational methods must adapt to the unique challenges and opportunities presented by long-read sequencing. Here, we summarize over 50 long-read-based methods for SV detection, genotyping and visualization, and discuss how new telomere-to-telomere genome assemblies and pangenome efforts can improve the accuracy and drive the development of SV callers in the future.
Affiliation(s)
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan Elliot Perdomo
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
|
45
|
Schmidt M, Kutzner A. MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads. Genome Biol 2023; 24:170. [PMID: 37461107 DOI: 10.1186/s13059-023-03009-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 07/06/2023] [Indexed: 07/20/2023] Open
Abstract
Structural variant (SV) calling belongs to the standard tools of modern bioinformatics for identifying and describing alterations in genomes. Initially, this work presents several complex genomic rearrangements that reveal conceptual ambiguities inherent to the representation via basic SV. We contextualize these ambiguities theoretically as well as practically and propose a graph-based approach for resolving them. For various yeast genomes, we practically compute adjacency matrices of our graph model and demonstrate that they provide highly accurate descriptions of one genome in terms of another. An open-source prototype implementation of our approach is available under the MIT license at https://github.com/ITBE-Lab/MA .
Affiliation(s)
- Markus Schmidt
  - Biomedical Center Munich, Department of Physiological Chemistry, Ludwig-Maximilians-Universität, Großhaderner Str. 9, 82152, Planegg-Martinsried, Germany
- Arne Kutzner
  - Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 133-791, Republic of Korea.
|
46
|
van Belzen IAEM, Cai C, van Tuil M, Badloe S, Strengman E, Janse A, Verwiel ETP, van der Leest DFM, Kester L, Molenaar JJ, Meijerink J, Drost J, Peng WC, Kerstens HHD, Tops BBJ, Holstege FCP, Kemmeren P, Hehir-Kwa JY. Systematic discovery of gene fusions in pediatric cancer by integrating RNA-seq and WGS. BMC Cancer 2023; 23:618. [PMID: 37400763 DOI: 10.1186/s12885-023-11054-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 03/08/2023] [Indexed: 07/05/2023] Open
Abstract
BACKGROUND Gene fusions are important cancer drivers in pediatric cancer and their accurate detection is essential for diagnosis and treatment. Clinical decision-making requires high confidence and precision of detection. Recent developments show RNA sequencing (RNA-seq) is promising for genome-wide detection of fusion products but hindered by many false positives that require extensive manual curation and impede discovery of pathogenic fusions. METHODS We developed Fusion-sq to overcome existing disadvantages of detecting gene fusions. Fusion-sq integrates and "fuses" evidence from RNA-seq and whole genome sequencing (WGS) using intron-exon gene structure to identify tumor-specific protein coding gene fusions. Fusion-sq was then applied to the data generated from a pediatric pan-cancer cohort of 128 patients by WGS and RNA sequencing. RESULTS In a pediatric pan-cancer cohort of 128 patients, we identified 155 high confidence tumor-specific gene fusions and their underlying structural variants (SVs). This includes all clinically relevant fusions known to be present in this cohort (30 patients). Fusion-sq distinguishes healthy-occurring from tumor-specific fusions and resolves fusions in amplified regions and copy number unstable genomes. A high gene fusion burden is associated with copy number instability. We identified 27 potentially pathogenic fusions involving oncogenes or tumor-suppressor genes characterized by underlying SVs, in some cases leading to expression changes indicative of activating or disruptive effects. CONCLUSIONS Our results indicate how clinically relevant and potentially pathogenic gene fusions can be identified and their functional effects investigated by combining WGS and RNA-seq. Integrating RNA fusion predictions with underlying SVs advances fusion detection beyond extensive manual filtering. Taken together, we developed a method for identifying candidate gene fusions that is suitable for precision oncology applications. 
Our method provides multi-omics evidence for assessing the pathogenicity of tumor-specific gene fusions for future clinical decision making.
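The idea of backing an RNA-level fusion prediction with a genomic SV can be sketched as a simple interval check. This is an illustration only and does not reproduce Fusion-sq, which additionally uses intron-exon gene structure; the `Gene` and `SV` classes, the `supporting_svs` helper, the gene names and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class SV:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def in_gene(chrom, pos, gene, pad=0):
    """True if a breakpoint lies inside the gene body (plus optional padding)."""
    return chrom == gene.chrom and gene.start - pad <= pos <= gene.end + pad


def supporting_svs(fusion, genes, svs, pad=0):
    """Return SVs whose two breakpoints fall in the two fusion partner genes.

    fusion: (name of 5' partner, name of 3' partner) predicted from RNA-seq.
    Either breakpoint order is accepted.
    """
    g5, g3 = genes[fusion[0]], genes[fusion[1]]
    hits = []
    for sv in svs:
        forward = in_gene(sv.chrom1, sv.pos1, g5, pad) and in_gene(sv.chrom2, sv.pos2, g3, pad)
        reverse = in_gene(sv.chrom1, sv.pos1, g3, pad) and in_gene(sv.chrom2, sv.pos2, g5, pad)
        if forward or reverse:
            hits.append(sv)
    return hits


# Made-up gene models and SV calls.
genes = {
    "GENE_A": Gene("GENE_A", "chr1", 1_000, 5_000),
    "GENE_B": Gene("GENE_B", "chr2", 20_000, 30_000),
}
svs = [
    SV("chr1", 3_200, "chr2", 25_000),  # both breakpoints inside the partners
    SV("chr1", 9_000, "chr2", 25_000),  # first breakpoint outside GENE_A
    SV("chr2", 21_000, "chr1", 1_500),  # supports the fusion, reversed order
]

hits = supporting_svs(("GENE_A", "GENE_B"), genes, svs)
print(len(hits))  # 2
```

An RNA prediction with no supporting SV would be a candidate false positive under this scheme, which mirrors the abstract's point that DNA-level evidence reduces manual filtering.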
Collapse
Affiliation(s)
- Casey Cai
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Marc van Tuil
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Shashi Badloe
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Eric Strengman
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Alex Janse
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Lennart Kester
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jan J Molenaar
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Department of Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
- Jules Meijerink
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jarno Drost
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Weng Chuan Peng
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Bastiaan B J Tops
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Center for Molecular Medicine, UMC Utrecht and Utrecht University, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
47
Sohn JI, Choi MH, Yi D, Menon VA, Kim YJ, Lee J, Park JW, Kyung S, Shin SH, Na B, Joung JG, Ju YS, Yeom MS, Koh Y, Yoon SS, Baek D, Kim TM, Nam JW. Ultrafast prediction of somatic structural variations by filtering out reads matched to pan-genome k-mer sets. Nat Biomed Eng 2023; 7:853-866. [PMID: 36536253 DOI: 10.1038/s41551-022-00980-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Accepted: 11/01/2022] [Indexed: 12/24/2022]
Abstract
Variant callers typically produce massive numbers of false positives for structural variations, such as cancer-relevant copy-number alterations and fusion genes resulting from genome rearrangements. Here we describe an ultrafast and accurate detector of somatic structural variations that reduces read-mapping costs by filtering out reads matched to pan-genome k-mer sets. The detector, which we named ETCHING (for efficient detection of chromosomal rearrangements and fusion genes), reduces the number of false positives by leveraging machine-learning classifiers trained with six breakend-related features (clipped-read count, split-reads count, supporting paired-end read count, average mapping quality, depth difference and total length of clipped bases). When benchmarked against six callers on reference cell-free DNA, validated biomarkers of structural variants, matched tumour and normal whole genomes, and tumour-only targeted sequencing datasets, ETCHING was 11-fold faster than the second-fastest structural-variant caller at comparable performance and memory use. The speed and accuracy of ETCHING may aid large-scale genome projects and facilitate practical implementations in precision medicine.
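The read-filtering step described in the abstract above can be illustrated with a short sketch. It is a simplification and not ETCHING's actual code: the function names are hypothetical, the k-mer set is built from toy strings rather than a pan-genome, reverse complements are ignored, and the machine-learning classification of breakends is omitted.

```python
from typing import Iterable, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(references: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers seen in the reference sequences."""
    ref: Set[str] = set()
    for ref_seq in references:
        ref.update(kmers(ref_seq, k))
    return ref


def keep_candidate_reads(reads: Iterable[str], ref_kmers: Set[str], k: int) -> List[str]:
    """Keep reads carrying at least one k-mer absent from the reference set.

    Reads made up entirely of reference k-mers are discarded before
    mapping; reads shorter than k have no k-mers and are also discarded.
    """
    kept = []
    for read in reads:
        if any(km not in ref_kmers for km in kmers(read, k)):
            kept.append(read)
    return kept


# Toy data, for illustration only.
K = 4
ref_kmers = build_kmer_set(["ACGTACGTTTGACCA"], K)
reads = ["ACGTACGT", "ACGTGGGG", "ACG"]
candidates = keep_candidate_reads(reads, ref_kmers, K)
```

Only the reads that survive this filter would go on to mapping and breakend calling, which is where the reduction in read-mapping cost would come from under these assumptions.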
Affiliation(s)
- Jang-Il Sohn
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, Republic of Korea
- Min-Hak Choi
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Dohun Yi
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Vipin A Menon
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Yeon Jeong Kim
- Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
- Junehawk Lee
- Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Jung Woo Park
- Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Byunggook Na
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Je-Gun Joung
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam, Republic of Korea
- Young Seok Ju
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Biomedical Science and Engineering Interdisciplinary Program, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Min Sun Yeom
- Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Youngil Koh
- College of Medicine, Seoul National University, Seoul, Republic of Korea
- Sung-Soo Yoon
- College of Medicine, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Tae-Min Kim
- Department of Medical Informatics and Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Jin-Wu Nam
- Department of Life Science, Hanyang University, Seoul, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea
48
Yahya S, Watson CM, Carr I, McKibbin M, Crinnion LA, Taylor M, Bonin H, Fletcher T, El-Asrag ME, Ali M, Toomes C, Inglehearn CF. Long-Read Nanopore Sequencing of RPGR ORF15 is Enhanced Following DNase I Treatment of MinION Flow Cells. Mol Diagn Ther 2023; 27:525-535. [PMID: 37284979 PMCID: PMC10299921 DOI: 10.1007/s40291-023-00656-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/18/2023] [Indexed: 06/08/2023]
Abstract
INTRODUCTION RPGR ORF15 is an exon present almost exclusively in the retinal transcript of RPGR. It is purine-rich, repetitive and notoriously hard to sequence, but is a hotspot for mutations causing X-linked retinitis pigmentosa. METHODS Long-read nanopore sequencing on MinION and Flongle flow cells was used to sequence RPGR ORF15 in genomic DNA from patients with inherited retinal dystrophy. A flow cell wash kit was used on a MinION flow cell to increase yield. Findings were confirmed by PacBio SMRT long-read sequencing. RESULTS We showed that long-read nanopore sequencing successfully reads through a 2 kb PCR-amplified fragment containing ORF15. We generated reads of sufficient quality and cumulative read-depth to detect pathogenic RP-causing variants. However, we observed that this G-rich, repetitive DNA segment rapidly blocks the available pores, resulting in sequence yields less than 5% of the expected output. This limited the extent to which samples could be pooled, increasing cost. We tested the utility of a MinION wash kit containing DNase I to digest DNA fragments remaining on the flow cell, regenerating the pores. Use of the DNase I treatment allowed repeated re-loading, increasing the sequence reads obtained. Our customised workflow was used to screen pooled amplification products from previously unsolved inherited retinal disease (IRD) in patients, identifying two new cases with pathogenic ORF15 variants. DISCUSSION We report the novel finding that long-read nanopore sequencing can read through RPGR-ORF15, a DNA sequence not captured by short-read next-generation sequencing (NGS), but with a more reduced yield. Use of a flow cell wash kit containing DNase I unblocks the pores, allowing reloading of further library aliquots over a 72-h period, increasing yield. The workflow we describe provides a novel solution to the need for a rapid, robust, scalable, cost-effective ORF15 screening protocol.
Affiliation(s)
- Samar Yahya
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Department of Medical Genetics, School of Medicine, King Abdulaziz University, Rabigh, Kingdom of Saudi Arabia
- Christopher M Watson
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- North East and Yorkshire Genomic Laboratory Hub, Central Lab, St. James's University Hospital, Leeds, UK
- Ian Carr
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Martin McKibbin
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Department of Ophthalmology, St. James's University Hospital, Leeds, UK
- Laura A Crinnion
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Morag Taylor
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Hope Bonin
- North West Genomic Laboratory Hub, Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester, UK
- Tracy Fletcher
- North West Genomic Laboratory Hub, Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester, UK
- Mohammed E El-Asrag
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Department of Zoology, Faculty of Science, Benha University, Banha, Egypt
- Institute of Cancer and Genomic Science, University of Birmingham, Birmingham, UK
- Manir Ali
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Carmel Toomes
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
- Chris F Inglehearn
- Leeds Institute of Medical Research, School of Medicine, University of Leeds, St James's University Hospital, Wellcome Trust Brenner Building, Beckett Street, Leeds, LS9 7TF, UK
49
Medvedev P. Theoretical Analysis of Sequencing Bioinformatics Algorithms and Beyond. Communications of the ACM 2023; 66:118-125. [PMID: 38736702 PMCID: PMC11087067 DOI: 10.1145/3571723] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/14/2024]
Abstract
A case study reveals the theoretical analysis of algorithms is not always as helpful as standard dogma might suggest.
Affiliation(s)
- Paul Medvedev
- Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at Pennsylvania State University, University Park, PA, USA
50
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, which is over ∼1.7-3.3-fold higher than previous large-scale projects while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
- Katsutoshi Harada
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Takayuki Morisaki
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan