51
Quan C, Li Y, Liu X, Wang Y, Ping J, Lu Y, Zhou G. Characterization of structural variation in Tibetans reveals new evidence of high-altitude adaptation and introgression. Genome Biol 2021; 22:159. [PMID: 34034800 PMCID: PMC8146648 DOI: 10.1186/s13059-021-02382-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/23/2020] [Accepted: 05/14/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Structural variation (SV) acts as an essential mutational force shaping the evolution and function of the human genome. However, few studies have examined the role of SVs in high-altitude adaptation, and little is known so far about adaptive introgressed SVs in Tibetans. RESULTS: Here, we generate a comprehensive catalog of SVs in a Chinese Tibetan (n = 15) and Han (n = 10) population using nanopore sequencing technology. Among a total of 38,216 unique SVs in the catalog, 27% are sequence-resolved for the first time. We systematically assess the distribution of these SVs across repeat sequences and functional genomic regions. Through genotyping in an additional 276 genomes, we identify 69 Tibetan-Han stratified SVs and 80 candidate adaptive genes. We also discover a few adaptive introgressed SV candidates and provide evidence for a deletion of 335 base pairs at 1p36.32. CONCLUSIONS: Overall, our results highlight the important role of SVs in the evolutionary processes of Tibetans' adaptation to the Qinghai-Tibet Plateau and provide a valuable resource for future high-altitude adaptation studies.
Affiliation(s)
- Cheng Quan
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yuanfeng Li
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Xinyi Liu
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yahui Wang
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Jie Ping
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
- Yiming Lu
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
  - Hebei University, Baoding, Hebei Province 071002 People’s Republic of China
- Gangqiao Zhou
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850 People’s Republic of China
  - Hebei University, Baoding, Hebei Province 071002 People’s Republic of China
  - Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166 People’s Republic of China
  - Medical College of Guizhou University, Guiyang, Guizhou Province 550025 People’s Republic of China
52
Valle-Inclan JE, Stangl C, de Jong AC, van Dessel LF, van Roosmalen MJ, Helmijr JCA, Renkens I, Janssen R, de Blank S, de Witte CJ, Martens JWM, Jansen MPHM, Lolkema MP, Kloosterman WP. Optimizing Nanopore sequencing-based detection of structural variants enables individualized circulating tumor DNA-based disease monitoring in cancer patients. Genome Med 2021; 13:86. [PMID: 34006333 PMCID: PMC8130429 DOI: 10.1186/s13073-021-00899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/27/2020] [Accepted: 04/27/2021] [Indexed: 12/18/2022] Open
Abstract
Here, we describe a novel approach for rapid discovery of a set of tumor-specific genomic structural variants (SVs), based on a combination of low coverage cancer genome sequencing using Oxford Nanopore with an SV calling and filtering pipeline. We applied the method to tumor samples of high-grade ovarian and prostate cancer patients and validated on average ten somatic SVs per patient with breakpoint-spanning PCR mini-amplicons. These SVs could be quantified in ctDNA samples of patients with metastatic prostate cancer using a digital PCR assay. The results suggest that SV dynamics correlate with and may improve existing treatment-response biomarkers such as PSA. https://github.com/UMCUGenetics/SHARC .
Affiliation(s)
- Jose Espejo Valle-Inclan
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Christina Stangl
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
  - Division of Molecular Oncology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Anouk C de Jong
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Lisanne F van Dessel
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Markus J van Roosmalen
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jean C A Helmijr
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Ivo Renkens
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Roel Janssen
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Sam de Blank
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Chris J de Witte
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- John W M Martens
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Maurice P H M Jansen
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Martijn P Lolkema
  - Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Wigard P Kloosterman
  - Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
  - Cyclomics, Utrecht, The Netherlands
  - Frame Cancer Therapeutics, Amsterdam, The Netherlands
53
Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 2021; 53:779-786. [PMID: 33972781 DOI: 10.1038/s41588-021-00865-4] [Citation(s) in RCA: 123] [Impact Index Per Article: 41.0] [Received: 11/18/2019] [Accepted: 04/05/2021] [Indexed: 01/05/2023]
Abstract
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.
54
Fujimoto A, Wong JH, Yoshii Y, Akiyama S, Tanaka A, Yagi H, Shigemizu D, Nakagawa H, Mizokami M, Shimada M. Whole-genome sequencing with long reads reveals complex structure and origin of structural variation in human genetic variations and somatic mutations in cancer. Genome Med 2021; 13:65. [PMID: 33910608 PMCID: PMC8082928 DOI: 10.1186/s13073-021-00883-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/30/2020] [Accepted: 04/06/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Identification of germline variation and somatic mutations is a major issue in human genetics. However, due to the limitations of DNA sequencing technologies and computational algorithms, our understanding of genetic variation and somatic mutations is far from complete. METHODS: In the present study, we performed whole-genome sequencing using long-read sequencing technology (Oxford Nanopore) for 11 Japanese liver cancers and matched normal samples which were previously sequenced for the International Cancer Genome Consortium (ICGC). We constructed an analysis pipeline for the long-read data and identified germline and somatic structural variations (SVs). RESULTS: In polymorphic germline SVs, our analysis identified 8004 insertions, 6389 deletions, 27 inversions, and 32 intra-chromosomal translocations. By comparing to the chimpanzee genome, we correctly inferred events that caused insertions and deletions and found that most insertions were caused by transposons, with Alu the predominant source, while other types of insertions, such as tandem duplications and processed pseudogenes, are rare. We inferred mechanisms of deletion generation and found that most non-allelic homologous recombination (NAHR) events were caused by recombination errors in SINEs. Analysis of somatic mutations in liver cancers showed that long reads could detect larger numbers of SVs than a previous short-read study and that mechanisms of cancer SV generation differed from those of germline deletions. CONCLUSIONS: Our analysis provides a comprehensive catalog of polymorphic and somatic SVs, as well as their possible causes. Our software is available at https://github.com/afujimoto/CAMPHOR and https://github.com/afujimoto/CAMPHORsomatic.
Affiliation(s)
- Akihiro Fujimoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Jing Hao Wong
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Yukiko Yoshii
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Shintaro Akiyama
  - Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Japan
  - Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Science, Yokohama, Japan
- Azusa Tanaka
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Hitomi Yagi
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Daichi Shigemizu
  - Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Japan
  - Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Science, Yokohama, Japan
- Hidewaki Nakagawa
  - Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Japan
  - Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Science, Yokohama, Japan
- Masashi Mizokami
  - Genome Medical Sciences Project, National Center for Global Health and Medicine, Tokyo, Japan
- Mihoko Shimada
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
55
Vollrath P, Chawla HS, Schiessl SV, Gabur I, Lee H, Snowdon RJ, Obermeier C. A novel deletion in FLOWERING LOCUS T modulates flowering time in winter oilseed rape. Theor Appl Genet 2021; 134:1217-1231. [PMID: 33471161 PMCID: PMC7973412 DOI: 10.1007/s00122-021-03768-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2020] [Accepted: 01/06/2021] [Indexed: 05/05/2023]
Abstract
A novel structural variant was discovered in the FLOWERING LOCUS T orthologue BnaFT.A02 by long-read sequencing. Nested association mapping in an elite winter oilseed rape population revealed that this 288 bp deletion associates with early flowering, putatively by modification of binding-sites for important flowering regulation genes. Perfect timing of flowering is crucial for optimal pollination and high seed yield. Extensive previous studies of flowering behavior in Brassica napus (canola, rapeseed) identified mutations in key flowering regulators which differentiate winter, semi-winter and spring ecotypes. However, because these are generally fixed in locally adapted genotypes, they have only limited relevance for fine adjustment of flowering time in elite cultivar gene pools. In crosses between ecotypes, the ecotype-specific major-effect mutations mask minor-effect loci of interest for breeding. Here, we investigated flowering time in a multiparental mapping population derived from seven elite winter oilseed rape cultivars which are fixed for major-effect mutations separating winter-type rapeseed from other ecotypes. Association mapping revealed eight genomic regions on chromosomes A02, C02 and C03 associating with fine modulation of flowering time. Long-read genomic resequencing of the seven parental lines identified seven structural variants coinciding with candidate genes for flowering time within chromosome regions associated with flowering time. Segregation patterns for these variants in the elite multiparental population and a diversity set of winter types using locus-specific assays revealed significant associations with flowering time for three deletions on chromosome A02. One of these was a previously undescribed 288 bp deletion within the second intron of FLOWERING LOCUS T on chromosome A02, emphasizing the advantage of long-read sequencing for detection of structural variants in this size range. 
Detailed analysis revealed the impact of this specific deletion on flowering-time modulation under extreme environments and varying day lengths in elite, winter-type oilseed rape.
Affiliation(s)
- Paul Vollrath
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Harmeet S Chawla
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Sarah V Schiessl
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Iulian Gabur
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- HueyTyng Lee
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Rod J Snowdon
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
56
Sailer S, Coassin S, Lackner K, Fischer C, McNeill E, Streiter G, Kremser C, Maglione M, Green CM, Moralli D, Moschen AR, Keller MA, Golderer G, Werner-Felmayer G, Tegeder I, Channon KM, Davies B, Werner ER, Watschinger K. When the genome bluffs: a tandem duplication event during generation of a novel Agmo knockout mouse model fools routine genotyping. Cell Biosci 2021; 11:54. [PMID: 33726865 PMCID: PMC7962373 DOI: 10.1186/s13578-021-00566-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/01/2020] [Accepted: 02/25/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Genome editing in mice using either classical approaches like homologous recombination or CRISPR/Cas9 has been reported to harbor off-target effects (insertion/deletion, frame shifts or gene segment duplications) that lead to mutations not only in close proximity to the target site but also outside. Only the genomes of a few engineered mouse strains have been sequenced. Since the role of the ether-lipid cleaving enzyme alkylglycerol monooxygenase (AGMO) in physiology and pathophysiology remains enigmatic, we created a knockout mouse model for AGMO using EUCOMM stem cells, but unforeseen genotyping issues that did not agree with Mendelian distribution and enzyme activity data prompted an in-depth genomic validation of the mouse model. RESULTS: We report a gene segment tandem duplication event that occurred during the generation of an Agmo knockout-first allele by homologous recombination. Only low homology was seen between the breakpoints. While a single copy of the recombinant 18 kb cassette was integrated correctly around exon 2 of the Agmo gene, whole genome nanopore sequencing revealed a 94 kb duplication in the Agmo locus that contains Agmo wild-type exons 1–3. The duplication fooled genotyping by routine PCR, but could be resolved using qPCR-based genotyping, targeted locus amplification sequencing and nanopore sequencing. Despite this event, this Agmo knockout mouse model lacks AGMO enzyme activity and can therefore be used to study its physiological role. CONCLUSIONS: A duplication event occurred at the exact locus of the homologous recombination and was not detected by conventional quality control filters such as FISH or long-range PCR over the recombination sites. Nanopore sequencing provides a cost-effective method to detect such underrated off-target effects, suggesting its use for additional quality assessment of gene editing in mice and also other model organisms.
Affiliation(s)
- Sabrina Sailer
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Stefan Coassin
  - Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Katharina Lackner
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Caroline Fischer
  - Institute of Clinical Pharmacology of the Medical Faculty, Goethe-University, Frankfurt (Main), Germany
- Eileen McNeill
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Division of Cardiovascular Medicine, British Heart Foundation Centre of Research Excellence, University of Oxford, Oxford, United Kingdom
- Gertraud Streiter
  - Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Christian Kremser
  - Department of Radiology, Medical University of Innsbruck, Innsbruck, Austria
- Manuel Maglione
  - Department of Visceral, Transplant and Thoracic Surgery, Medical University of Innsbruck, Innsbruck, Austria
- Catherine M Green
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Daniela Moralli
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Alexander R Moschen
  - Department of Internal Medicine I, Gastroenterology, Endocrinology and Metabolism, Medical University of Innsbruck, Innsbruck, Austria
- Markus A Keller
  - Institute of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria
- Georg Golderer
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Gabriele Werner-Felmayer
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Irmgard Tegeder
  - Institute of Clinical Pharmacology of the Medical Faculty, Goethe-University, Frankfurt (Main), Germany
- Keith M Channon
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Division of Cardiovascular Medicine, British Heart Foundation Centre of Research Excellence, University of Oxford, Oxford, United Kingdom
- Benjamin Davies
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Ernst R Werner
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
- Katrin Watschinger
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innsbruck, Austria
  - Institute of Biological Chemistry, Biocenter, Medical University of Innsbruck, Innrain 80, 6020, Innsbruck, Austria
57
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023] Open
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types and biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
Affiliation(s)
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
58
Mariani M, Zimmerman C, Rodriguez P, Hasenohr E, Aimola G, Gerrard DL, Richman A, Dest A, Flamand L, Kaufer B, Frietze S. Higher-Order Chromatin Structures of Chromosomally Integrated HHV-6A Predict Integration Sites. Front Cell Infect Microbiol 2021; 11:612656. [PMID: 33718266 PMCID: PMC7953476 DOI: 10.3389/fcimb.2021.612656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Accepted: 01/20/2021] [Indexed: 12/31/2022] Open
Abstract
Human herpesvirus-6A and -6B (HHV-6A/B) can integrate their genomes into the telomeres of human chromosomes. Viral integration can occur in several cell types, including germinal cells, resulting in individuals that harbor the viral genome in every cell of their body. The integrated genome is efficiently silenced but can sporadically reactivate, resulting in various clinical symptoms. To date, the integration mechanism and the subsequent silencing of HHV-6A/B genes remain poorly understood. Here we investigate the genome-wide chromatin contacts of the integrated HHV-6A in latently infected cells. We show that HHV-6A becomes transcriptionally silent upon infection of these cells over the course of seven days. In addition, we established an HHV-6-specific 4C-seq approach, revealing that the HHV-6A 3D interactome is associated with quiescent chromatin states in cells harboring integrated virus. Furthermore, we observed that the majority of virus chromatin interactions occur toward the distal ends of specific human chromosomes. Exploiting this finding, we established a 4C-seq method that accurately detects the chromosomal integration sites. We further implemented long-read MinION sequencing in the 4C-seq assay and developed a method to identify HHV-6A/B integration sites in clinical samples.
Affiliation(s)
- Michael Mariani
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Cosima Zimmerman
  - Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Princess Rodriguez
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Ellie Hasenohr
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Giulia Aimola
  - Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Diana Lea Gerrard
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Alyssa Richman
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Andrea Dest
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Louis Flamand
  - Division of Infectious Disease and Immunity, CHU de Québec Research Center-Université Laval, Quebec City, QC, Canada
- Benedikt Kaufer
  - Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Seth Frietze
  - Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
  - University of Vermont Cancer Center, Burlington, VT, United States
59
Akbari V, Garant JM, O'Neill K, Pandoh P, Moore R, Marra MA, Hirst M, Jones SJM. Megabase-scale methylation phasing using nanopore long reads and NanoMethPhase. Genome Biol 2021; 22:68. [PMID: 33618748 PMCID: PMC7898412 DOI: 10.1186/s13059-021-02283-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/01/2020] [Accepted: 01/29/2021] [Indexed: 02/08/2023] Open
Abstract
The ability of nanopore sequencing to simultaneously detect modified nucleotides while producing long reads makes it ideal for detecting and phasing allele-specific methylation. However, there is currently no complete software for detecting SNPs, phasing haplotypes, and mapping methylation to these from nanopore sequence data. Here, we present NanoMethPhase, a software tool to phase 5-methylcytosine from nanopore sequencing. We also present SNVoter, which can post-process nanopore SNV calls to improve accuracy in low coverage regions. Together, these tools can accurately detect allele-specific methylation genome-wide using nanopore sequence data with low coverage of about ten-fold redundancy.
Affiliation(s)
- Vahid Akbari
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Jean-Michel Garant
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Kieran O'Neill
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Pawan Pandoh
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Richard Moore
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
- Marco A Marra
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Martin Hirst
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
  - Department of Microbiology and Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
60
Zhou X, Zhang L, Weng Z, Dill DL, Sidow A. Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads. Nat Commun 2021; 12:1077. [PMID: 33597536 PMCID: PMC7889865 DOI: 10.1038/s41467-021-21395-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/09/2020] [Accepted: 01/20/2021] [Indexed: 01/19/2023] Open
Abstract
We introduce Aquila, a new approach to variant discovery in personal genomes, which is critical for uncovering the genetic contributions to health and disease. Aquila uses a reference sequence and linked-read data to generate a high quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. The contigs of the assemblies from our libraries cover >95% of the human reference genome, with over 98% of that in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased Variant Call Format (VCF) file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective approach that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
Affiliation(s)
- Xin Zhou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Lu Zhang
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Ziming Weng
  - Department of Pathology, Stanford University, Stanford, CA, USA
- David L Dill
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Arend Sidow
  - Department of Pathology, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
61
Mitsuhashi S, Frith MC, Matsumoto N. Genome-wide survey of tandem repeats by nanopore sequencing shows that disease-associated repeats are more polymorphic in the general population. BMC Med Genomics 2021; 14:17. [PMID: 33413375 PMCID: PMC7791882 DOI: 10.1186/s12920-020-00853-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Tandem repeats are highly mutable and contribute to the development of human disease by a variety of mechanisms. It is difficult to predict which tandem repeats may cause a disease. One hypothesis is that changeable tandem repeats are the source of genetic diseases, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats. METHODS We performed a genome-wide survey of the millions of human tandem repeats using publicly available long-read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared to other repeat loci. RESULTS We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5'UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with nearby GWAS SNP genotypes. CONCLUSIONS We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, from long-read sequencing data. This method, especially if used in genome-wide association studies, may indicate new candidate pathogenic or biologically important tandem repeats in human genomes.
Affiliation(s)
- Satomi Mitsuhashi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
  - Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
62
Abstract
Long DNA and RNA reads from nanopore and PacBio technologies have many applications, but the raw reads have a substantial error rate. More accurate sequences can be obtained by merging multiple reads from overlapping parts of the same sequence. lamassemble aligns up to ∼1000 reads to each other, and makes a consensus sequence, which is often much more accurate than the raw reads. It is useful for studying a region of interest such as an expanded tandem repeat or other disease-causing mutation.
63
Joshi D, Mao S, Kannan S, Diggavi S. QAlign: aligning nanopore reads accurately using current-level modeling. Bioinformatics 2020; 37:625-633. [PMID: 33051648 PMCID: PMC8097683 DOI: 10.1093/bioinformatics/btaa875] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/10/2020] [Revised: 07/17/2020] [Accepted: 09/29/2020] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running it through a sequence aligner. RESULTS We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2, 2.5 and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/joshidhaivat/QAlign.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dhaivat Joshi
  - Electrical & Computer Engineering, University of California, Los Angeles, CA 90095, USA
- Shunfu Mao
  - Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Sreeram Kannan
  - Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA (corresponding author)
- Suhas Diggavi
  - Electrical & Computer Engineering, University of California, Los Angeles, CA 90095, USA (corresponding author)
64
Xie Y, Zhong Y, Chang J, Kwan HS. Chromosome-level de novo assembly of Coprinopsis cinerea A43mut B43mut pab1-1 #326 and genetic variant identification of mutants using Nanopore MinION sequencing. Fungal Genet Biol 2020; 146:103485. [PMID: 33253902 DOI: 10.1016/j.fgb.2020.103485] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 10/22/2020] [Accepted: 11/13/2020] [Indexed: 11/26/2022]
Abstract
The homokaryotic Coprinopsis cinerea strain A43mut B43mut pab1-1 #326 is a widely used experimental model for developmental studies in mushroom-forming fungi. It can grow on defined artificial media and complete the whole lifecycle within two weeks. The mutations in mating type factors A and B result in the special feature of clamp formation and fruiting without mating. This feature allows investigations and manipulations with a homokaryotic genetic background. The current genome assembly of strain #326 was based on short-read sequencing data and was highly fragmented, leading to bias in gene annotation and downstream analyses. Here, we report a chromosome-level genome assembly of strain #326. Oxford Nanopore Technologies (ONT) MinION sequencing was used to obtain long reads. Illumina short reads were used to polish the sequences. A combined assembly yielded 13 chromosomes and a mitochondrial genome as individual scaffolds. The assembly has 15,250 annotated genes with high synteny with the C. cinerea strain Okayama-7 #130. This assembly greatly improves contiguity and annotation. It is a suitable reference for further genomic studies, especially for genetic, genomic and transcriptomic analyses using ONT long reads. Single nucleotide variants and structural variants in six mutagenized and cisplatin-screened mutants could be identified and validated. A 66 bp deletion in Ras GTPase-activating protein (RasGAP) was found in all mutants. To make better use of the ONT sequencing platform, we modified a high-molecular-weight genomic DNA isolation protocol based on magnetic beads for filamentous fungi. This study showed the use of MinION to construct a fungal reference genome and to perform downstream studies in an individual laboratory. An experimental workflow was proposed, from DNA isolation and whole genome sequencing, to genome assembly and variant calling. Our results provided solutions and parameters for fungal genomic analysis on the MinION sequencing platform.
Affiliation(s)
- Yichun Xie
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
- Yiyi Zhong
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
- Jinhui Chang
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
  - The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China
- Hoi Shan Kwan
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
65
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes (Basel) 2020; 11:E1444. [PMID: 33266238 PMCID: PMC7760597 DOI: 10.3390/genes11121444] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/12/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 01/23/2023] Open
Abstract
Long-read single molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual as compared to SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have a similar performance for SV detection in human whole genome sequencing data, and that both technologies are feasible for population-scale studies.
Affiliation(s)
- Nazeefa Fatima
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Anna Petri
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Ulf Gyllensten
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Lars Feuk
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Adam Ameur
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
  - Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Clayton, VIC 3800, Australia
66
Lecompte L, Peterlongo P, Lavenier D, Lemaitre C. SVJedi: genotyping structural variations with long reads. Bioinformatics 2020; 36:4568-4575. [PMID: 32437523 DOI: 10.1093/bioinformatics/btaa527] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/22/2019] [Revised: 03/27/2020] [Accepted: 05/18/2020] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnoses, it is important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas several SV genotypers have been developed for short read data, there is a lack of such dedicated tool to assess whether known SVs are present or not in a new long read sequenced sample, such as the one produced by Pacific Biosciences or Oxford Nanopore Technologies. RESULTS We present a novel method to genotype known SVs from long read sequencing data. The method is based on the generation of a set of representative allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. Alignments are then analyzed and filtered out to keep only informative ones, to quantify and estimate the presence of each SV allele and the allele frequencies. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi obtains better performances than other existing long read genotyping tools and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short read SV genotyping approaches. AVAILABILITY AND IMPLEMENTATION https://github.com/llecompte/SVJedi.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
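The counting step described in the abstract above can be illustrated with a minimal sketch. It assumes that each informative read has already been assigned to either the reference or the alternative allele sequence, and it picks the genotype with the highest binomial likelihood. The function names, the 5% misassignment rate, and the three-genotype model are illustrative assumptions, not SVJedi's actual implementation or API.

```python
from math import comb, log


def genotype_log_likelihoods(ref_reads, alt_reads, err=0.05):
    """Binomial log-likelihood of the read counts under each diploid genotype.

    ref_reads / alt_reads: numbers of informative reads supporting the
    reference and alternative allele sequences (hypothetical inputs).
    err: assumed probability that a read is assigned to the wrong allele.
    """
    n = ref_reads + alt_reads
    # Assumed probability that a read supports the alternative allele.
    p_alt = {"0/0": err, "0/1": 0.5, "1/1": 1.0 - err}
    return {
        gt: log(comb(n, alt_reads)) + alt_reads * log(p) + ref_reads * log(1.0 - p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref_reads, alt_reads, err=0.05):
    """Return the most likely genotype, or './.' when no read is informative."""
    if ref_reads + alt_reads == 0:
        return "./."
    likelihoods = genotype_log_likelihoods(ref_reads, alt_reads, err)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    for ref, alt in [(10, 0), (5, 5), (0, 12), (0, 0)]:
        print(ref, alt, call_genotype(ref, alt))
```

Under these assumptions, balanced support for both alleles yields a heterozygous call, while support for only one allele yields the corresponding homozygous call.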
67
Ma X, Fan J, Wu Y, Zhao S, Zheng X, Sun C, Tan L. Whole-genome de novo assemblies reveal extensive structural variations and dynamic organelle-to-nucleus DNA transfers in African and Asian rice. Plant J 2020; 104:596-612. [PMID: 32748498 PMCID: PMC7693357 DOI: 10.1111/tpj.14946] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/10/2020] [Revised: 07/17/2020] [Accepted: 07/22/2020] [Indexed: 05/05/2023]
Abstract
Asian cultivated rice (Oryza sativa) and African cultivated rice (Oryza glaberrima) originated from the wild rice species Oryza rufipogon and Oryza barthii, respectively. The genomes of both cultivated species have undergone profound changes during domestication. Whole-genome de novo assemblies of O. barthii, O. glaberrima, O. rufipogon and Oryza nivara, produced using PacBio single-molecule real-time (SMRT) and next-generation sequencing (NGS) technologies, showed that Gypsy-like retrotransposons are the major contributors to genome size variation in African and Asian rice. Through the detection of genome-wide structural variations (SVs), we observed that besides 28 shared SV hot spots, another 67 hot spots existed in either the Asian or African rice genomes. Based on gene annotation information of the SVs, we established that organelle-to-nucleus DNA transfers resulted in numerous SVs that participated in the nuclear genome divergence of rice species and subspecies. We detected 52 giant nuclear integrants of organelle DNA (NORGs, defined as >10 kb) in six Oryza AA genomes. In addition, we developed an effective method to genotype giant NORGs, based on genome assembly, and first showed the dynamic change in the distribution of giant NORGs in rice natural population. Interestingly, 16 highly differentiated giant NORGs tended to accumulate in natural populations of Asian rice from higher latitude regions, grown at lower temperatures and light intensities. Our study provides new insight into the genome divergence of African and Asian rice, and establishes that organelle-to-nucleus DNA transfers, as potentially powerful contributors to environmental adaptation during rice evolution, play a major role in producing SVs in rice genomes.
Affiliation(s)
- Xin Ma
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
- Jinjian Fan
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
- Yongzhen Wu
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Shuangshuang Zhao
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Xu Zheng
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Chuanqing Sun
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Plant Physiology and Biochemistry, China Agricultural University, Beijing 100193, China
- Lubin Tan
  - MOE Key Laboratory of Crop Heterosis and Utilization, National Center for Evaluation of Agricultural Wild Plants (Rice), Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, China
68
Xu Y, Yang-Turner F, Volk D, Crook D. NanoSPC: a scalable, portable, cloud compatible viral nanopore metagenomic data processing pipeline. Nucleic Acids Res 2020; 48:W366-W371. [PMID: 32442274 PMCID: PMC7319573 DOI: 10.1093/nar/gkaa413] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/05/2020] [Revised: 04/22/2020] [Accepted: 05/11/2020] [Indexed: 01/30/2023] Open
Abstract
Metagenomic sequencing combined with Oxford Nanopore Technologies sequencing has the potential to become a point-of-care test for infectious disease in public health and clinical settings, providing rapid diagnosis of infection, guiding individual patient management and treatment strategies, and informing infection prevention and control practices. However, publicly available, streamlined, and reproducible pipelines for analyzing Nanopore metagenomic sequencing data are still lacking. Here we introduce NanoSPC, a scalable, portable and cloud compatible pipeline for analyzing Nanopore sequencing data. NanoSPC can identify potentially pathogenic viruses and bacteria simultaneously to provide comprehensive characterization of individual samples. The pipeline can also detect single nucleotide variants and assemble high quality complete consensus genome sequences, permitting high-resolution inference of transmission. We implement NanoSPC using the Nextflow workflow manager within Docker images to allow reproducibility and portability of the analysis. Moreover, we deploy NanoSPC to our scalable pathogen pipeline platform, enabling elastic computing for high throughput Nanopore data on HPC clusters as well as multiple cloud platforms, such as Google Cloud, Amazon Elastic Compute Cloud, Microsoft Azure and OpenStack. Users can either access our web interface (https://nanospc.mmmoxford.uk) to run cloud-based analyses, monitor processes, and visualize results, or download Docker images and run the pipeline from the command line to analyse data locally.
Affiliation(s)
- Yifei Xu
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
  - NIHR Oxford Biomedical Research Centre, University of Oxford, UK
- Fan Yang-Turner
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
  - NIHR Oxford Biomedical Research Centre, University of Oxford, UK
- Denis Volk
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
  - NIHR Oxford Biomedical Research Centre, University of Oxford, UK
- Derrick Crook
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
  - NIHR Oxford Biomedical Research Centre, University of Oxford, UK
69
Zhao H, Chen Y, Shen C, Li L, Li Q, Tan K, Huang H, Hu G. Breakpoint mapping of a t(9;22;12) chronic myeloid leukaemia patient with e14a3 BCR-ABL1 transcript using Nanopore sequencing. J Gene Med 2020; 23:e3276. [PMID: 32949441 DOI: 10.1002/jgm.3276] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/15/2020] [Revised: 09/03/2020] [Accepted: 09/14/2020] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND The genetic changes in chronic myeloid leukaemia (CML) have been well established, although challenges persist in cases with rare fusion transcripts or complex variant translocations. Here, we present a CML patient with an e14a3 BCR-ABL1 transcript and a t(9;22;12) variant Philadelphia (Ph) chromosome. METHODS Cytogenetic analysis and fluorescence in situ hybridization (FISH) were performed to identify the chromosomal aberrations and gene fusions. The rare fusion transcript was verified by reverse transcription-polymerase chain reaction (RT-PCR). Breakpoints were characterized and validated using Oxford Nanopore Technologies (ONT) (Oxford, UK) and Sanger sequencing, respectively. RESULTS The karyotype showed the translocation t(9;22;12)(q34;q11.2;q24) [20] and FISH indicated 40% positive BCR-ABL1 fusion signals. RT-PCR suggested an e14a3-type fusion transcript. The ONT sequencing analysis identified specific positions of translocation breakpoints: chr22:23633040-chr9:133729579, chr12:121567595-chr22:24701405, which were confirmed using Sanger sequencing. The patient achieved molecular remission 3 months after imatinib therapy. CONCLUSIONS The present study indicates Nanopore sequencing as a valid strategy, which can characterize breakpoints precisely in special clinical cases with atypical structural variations. CML patients with e14a3 transcripts may have a good clinical course in the tyrosine kinase inhibitor era, as reviewed here.
Affiliation(s)
- Hu Zhao
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Yuan Chen
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Chanjuan Shen
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Lingshu Li
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Qingzhao Li
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Kui Tan
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Huang Huang
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
- Guoyu Hu
  - Department of Haematology, The Affiliated Zhuzhou Hospital, XiangYa Medical College, Central South University, Zhuzhou, Hunan, China
70
Yanagishita T, Imaizumi T, Yamamoto-Shimojima K, Yano T, Okamoto N, Nagata S, Yamamoto T. Breakpoint junction analysis for complex genomic rearrangements with the caldera volcano-like pattern. Hum Mutat 2020; 41:2119-2127. [PMID: 32906213 DOI: 10.1002/humu.24108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2020] [Revised: 08/25/2020] [Accepted: 09/06/2020] [Indexed: 12/16/2022]
Abstract
Chromosomal triplications can be classified into recurrent and nonrecurrent triplications. Most of the nonrecurrent triplications are embedded in duplicated segments, and duplication-inverted triplication-duplication (DUP-TRP/INV-DUP) has been established as one of the mechanisms of triplication. This study aimed to reveal the underlying mechanism of the TRP-DUP-TRP pattern of chromosomal aberrations, in which the appearance of moving averages obtained through array-based comparative genomic hybridization analysis is similar to the shadows of the caldera volcano-like pattern, which were first identified in two patients with neurodevelopmental disabilities. For this purpose, whole-genome sequencing using long-read Nanopore sequencing was carried out to confirm breakpoint junctions. Custom array analysis and Sanger sequencing were also used to detect all breakpoint junctions. As a result, the TRP-DUP-TRP pattern consisted of only two patterns of breakpoint junctions in both patients. In patient 1, microhomologies were identified in breakpoint junctions. In patient 2, more complex architectures with insertional segments were identified. Thus, replication-based mechanisms were considered as a mechanism of the TRP-DUP-TRP pattern.
Affiliation(s)
- Tomoe Yanagishita
  - Department of Pediatrics, Tokyo Women's Medical University, Tokyo, Japan
  - Department of Genomic Medicine, Tokyo Women's Medical University, Tokyo, Japan
- Taichi Imaizumi
  - Department of Genomic Medicine, Tokyo Women's Medical University, Tokyo, Japan
  - Department of Pediatrics, St. Marianna University School of Medicine, Kawasaki, Japan
- Tamami Yano
  - Department of Pediatrics, Akita University, Akita, Japan
- Nobuhiko Okamoto
  - Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
- Satoru Nagata
  - Department of Pediatrics, Tokyo Women's Medical University, Tokyo, Japan
- Toshiyuki Yamamoto
  - Department of Pediatrics, Tokyo Women's Medical University, Tokyo, Japan
  - Department of Genomic Medicine, Tokyo Women's Medical University, Tokyo, Japan
  - Department of Pediatrics, St. Marianna University School of Medicine, Kawasaki, Japan
  - Institute for Integrated Medical Sciences, Tokyo Women's Medical University, Tokyo, Japan
71
Spealman P, Burrell J, Gresham D. Inverted duplicate DNA sequences increase translocation rates through sequencing nanopores resulting in reduced base calling accuracy. Nucleic Acids Res 2020; 48:4940-4945. [PMID: 32255181 PMCID: PMC7229812 DOI: 10.1093/nar/gkaa206] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/25/2019] [Revised: 03/14/2020] [Accepted: 04/03/2020] [Indexed: 12/27/2022] Open
Abstract
Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low confidence, incorrect and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.
Affiliation(s)
- Pieter Spealman
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Jaden Burrell
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
72
Aganezov S, Goodwin S, Sherman RM, Sedlazeck FJ, Arun G, Bhatia S, Lee I, Kirsche M, Wappel R, Kramer M, Kostroff K, Spector DL, Timp W, McCombie WR, Schatz MC. Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing. Genome Res 2020; 30:1258-1273. [PMID: 32887686 PMCID: PMC7545150 DOI: 10.1101/gr.260497.119] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/22/2019] [Accepted: 08/07/2020] [Indexed: 12/14/2022]
Abstract
Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×–30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Gayatri Arun
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Sonam Bhatia
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Isac Lee
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | | | - David L Spector
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Winston Timp
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | | | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA.,Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.,Department of Biology, Johns Hopkins University, Baltimore, Maryland 21211, USA
| |
|
73
|
Yang L. A Practical Guide for Structural Variation Detection in the Human Genome. Curr Protoc Hum Genet 2020; 107:e103. [PMID: 32813322 PMCID: PMC7738216 DOI: 10.1002/cphg.103] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
Abstract
Profiling genetic variants, including single nucleotide variants, small insertions and deletions, copy number variations, and structural variations (SVs), from both healthy individuals and individuals with disease is a key component of genetic and biomedical research. SVs are large-scale changes in the genome and involve breakage and rejoining of DNA fragments. They may affect thousands to millions of nucleotides and can lead to loss, gain, and reshuffling of genes and regulatory elements. SVs are known to impact gene expression and potentially result in altered phenotypes and diseases. Therefore, identifying SVs in human genomes is particularly important. In this review, I describe advantages and disadvantages of the available high-throughput assays for the discovery of SVs, which are the most challenging genetic alterations to detect. A practical guide is offered to suggest the most suitable strategies for discovering different types of SVs including common germline, rare, somatic, and complex variants. I also discuss factors to be considered, such as cost and performance, for different strategies when designing experiments. Last, I present several approaches to identify potential SV artifacts caused by samples, experimental procedures, and computational analysis. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Lixing Yang
- Ben May Department for Cancer Research, Department of Human Genetics, University of Chicago, Chicago, Illinois
| |
|
74
|
Perumal S, Koh CS, Jin L, Buchwaldt M, Higgins EE, Zheng C, Sankoff D, Robinson SJ, Kagale S, Navabi ZK, Tang L, Horner KN, He Z, Bancroft I, Chalhoub B, Sharpe AG, Parkin IAP. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. Nat Plants 2020; 6:929-941. [PMID: 32782408 PMCID: PMC7419231 DOI: 10.1038/s41477-020-0735-y] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 02/03/2020] [Accepted: 06/28/2020] [Indexed: 05/19/2023]
Abstract
It is only recently, with the advent of long-read sequencing technologies, that we are beginning to uncover previously uncharted regions of complex and inherently recursive plant genomes. To comprehensively study and exploit the genome of the neglected oilseed Brassica nigra, we generated two high-quality nanopore de novo genome assemblies. The N50 contig lengths for the two assemblies were 17.1 Mb (12 contigs), one of the best among 324 sequenced plant genomes, and 0.29 Mb (424 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short-read assembly corroborated genome integrity and quantified sequence-related error rates (0.2%). The contiguity and coverage allowed unprecedented access to low-complexity regions of the genome. Pericentromeric regions and coincidence of hypomethylation enabled localization of active centromeres and identified centromere-associated ALE family retro-elements that appear to have proliferated through relatively recent nested transposition events (<1 Ma). Genomic distances calculated based on synteny relationships were used to define a post-triplication Brassica-specific ancestral genome, and to calculate the extensive rearrangements that define the evolutionary distance separating B. nigra from its diploid relatives.
Affiliation(s)
- Sampath Perumal
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Chu Shin Koh
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Lingling Jin
- Department of Computing Science, Thompson Rivers University, Kamloops, British Columbia, Canada
| | - Miles Buchwaldt
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Erin E Higgins
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Chunfang Zheng
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | | | - Sateesh Kagale
- National Research Council Canada, Saskatoon, Saskatchewan, Canada
| | - Zahra-Katy Navabi
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Lily Tang
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Kyla N Horner
- Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
| | - Zhesi He
- Department of Biology, University of York, York, UK
| | - Ian Bancroft
- Department of Biology, University of York, York, UK
| | - Boulos Chalhoub
- Institute of Crop Science, Zhejiang University, Hangzhou, China
| | - Andrew G Sharpe
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
| | | |
|
75
|
Yano Y, Chiba T, Asahara H. Analysis of the Mouse Y Chromosome by Single-Molecule Sequencing With Y Chromosome Enrichment. Front Genet 2020; 11:406. [PMID: 32457799 PMCID: PMC7221202 DOI: 10.3389/fgene.2020.00406] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/15/2019] [Accepted: 03/31/2020] [Indexed: 02/02/2023] Open
Abstract
Since human and mouse Y chromosomes contain repeated sequences, it is difficult to determine the precise sequences and analyze the function of individual Y chromosome genes. Therefore, the causes of many diseases and abnormalities related to Y chromosome genes, such as male infertility, remain unclear. In this study, to elucidate the mouse Y chromosome, we enriched the mouse Y chromosome using a fluorescence-activated cell sorter (FACS) equipped with commonly used UV and blue 488 nm lasers and read the nucleotides using the Oxford Nanopore MinION long-read sequencer. This sequencing strategy allows us to cover the whole known region as well as the potential undetermined region of the Y chromosome. FACS-based chromosome enrichment and long-read sequencing are suitable for analysis of the Y chromosome sequences and may lead to further understanding of the physiological role of Y chromosome genes.
Affiliation(s)
- Yuki Yano
- Department of Systems BioMedicine, Tokyo Medical and Dental University, Tokyo, Japan
| | - Tomoki Chiba
- Department of Systems BioMedicine, Tokyo Medical and Dental University, Tokyo, Japan
| | - Hiroshi Asahara
- Department of Systems BioMedicine, Tokyo Medical and Dental University, Tokyo, Japan.,Department of Molecular and Experimental Medicine, The Scripps Research Institute, San Diego, CA, United States
| |
|
76
|
De Coster W, Stovner EB, Strazisar M. Methplotlib: analysis of modified nucleotides from nanopore sequencing. Bioinformatics 2020; 36:3236-3238. [PMID: 32053166 PMCID: PMC7214038 DOI: 10.1093/bioinformatics/btaa093] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/31/2019] [Revised: 01/03/2020] [Accepted: 02/05/2020] [Indexed: 02/06/2023] Open
Abstract
SUMMARY Modified nucleotides play a crucial role in gene expression regulation. Here, we describe methplotlib, a tool developed for the visualization of modified nucleotides detected from Oxford Nanopore Technologies sequencing platforms, together with additional scripts for statistical analysis of allele-specific modification within-subjects and differential modification frequency across subjects. AVAILABILITY AND IMPLEMENTATION The methplotlib command-line tool is written in Python3, is compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux and released under the MIT license. The source code can be found at https://github.com/wdecoster/methplotlib and can be installed from PyPI and bioconda. Our repository includes test data, and the tool is continuously tested at travis-ci.com. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Endre Bakken Stovner
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim 7013, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim 7013, Norway
| | | |
|
77
|
Tham CY, Tirado-Magallanes R, Goh Y, Fullwood MJ, Koh BTH, Wang W, Ng CH, Chng WJ, Thiery A, Tenen DG, Benoukraf T. NanoVar: accurate characterization of patients' genomic structural variants using low-depth nanopore sequencing. Genome Biol 2020; 21:56. [PMID: 32127024 PMCID: PMC7055087 DOI: 10.1186/s13059-020-01968-7] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 07/03/2019] [Accepted: 02/21/2020] [Indexed: 12/19/2022] Open
Abstract
The recent advent of third-generation sequencing technologies brings promise for better characterization of genomic structural variants by virtue of having longer reads. However, long-read applications are still constrained by their high sequencing error rates and low sequencing throughput. Here, we present NanoVar, an optimized structural variant caller utilizing low-depth (8X) whole-genome sequencing data generated by Oxford Nanopore Technologies. NanoVar exhibits higher structural variant calling accuracy when benchmarked against current tools using low-depth simulated datasets. In patient samples, we successfully validate structural variants characterized by NanoVar and uncover normal alternative sequences or alleles which are present in healthy individuals.
Affiliation(s)
- Cheng Yong Tham
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore
| | - Roberto Tirado-Magallanes
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore
| | - Yufen Goh
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore
| | - Melissa J Fullwood
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore.,School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
| | - Bryan T H Koh
- Department of Orthopedic Surgery, National University Health Systems, Singapore, 119228, Singapore
| | - Wilson Wang
- Department of Orthopedic Surgery, National University Health Systems, Singapore, 119228, Singapore.,Department of Orthopaedic Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Singapore
| | - Chin Hin Ng
- Department of Hematology-Oncology, National University Cancer Institute of Singapore, National University Health System, Singapore, 119228, Singapore
| | - Wee Joo Chng
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore.,Department of Hematology-Oncology, National University Cancer Institute of Singapore, National University Health System, Singapore, 119228, Singapore.,Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Singapore
| | - Alexandre Thiery
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, Singapore
| | - Daniel G Tenen
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore.,Harvard Stem Cell Institute, Harvard Medical School, Boston, MA, 02115, USA
| | - Touati Benoukraf
- Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore.,Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, A1B 3V6, Canada
| |
|
78
|
De Coster W, Strazisar M, De Rijk P. Critical length in long-read resequencing. NAR Genom Bioinform 2020; 2:lqz027. [PMID: 33575574 PMCID: PMC7671308 DOI: 10.1093/nargab/lqz027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2019] [Revised: 12/06/2019] [Accepted: 01/02/2020] [Indexed: 12/25/2022] Open
Abstract
Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of minimally 20 kb. Haplotyping variants across genes only reaches its optimum from reads of 100 kb. These findings are important for the design of future long-read sequencing projects.
Affiliation(s)
- Wouter De Coster
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
| | - Mojca Strazisar
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
| | - Peter De Rijk
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
| |
|
79
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| |
|
80
|
Minervini CF, Cumbo C, Orsini P, Anelli L, Zagaria A, Specchia G, Albano F. Nanopore Sequencing in Blood Diseases: A Wide Range of Opportunities. Front Genet 2020; 11:76. [PMID: 32140171 PMCID: PMC7043087 DOI: 10.3389/fgene.2020.00076] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/06/2019] [Accepted: 01/23/2020] [Indexed: 12/20/2022] Open
Abstract
The molecular pathogenesis of hematological diseases is often driven by genetic and epigenetic alterations. Next-generation sequencing has considerably increased our genomic knowledge of these disorders and is becoming ever more widespread in clinical practice. In 2012 Oxford Nanopore Technologies (ONT) released the MinION, the first long-read nanopore-based sequencer, overcoming the main limits of short-read sequence generation. In recent years, several nanopore sequencing approaches have been applied in various "-omic" sciences; this review focuses on the challenge of introducing ONT devices in the hematological field, showing the advantages, disadvantages and future perspectives of this technology in the precision medicine era.
Affiliation(s)
- Francesco Albano
- Department of Emergency and Organ Transplantation (D.E.T.O.), Hematology Section, University of Bari, Bari, Italy
| |
|
81
|
Simon R, Lischer HEL, Pieńkowska-Schelling A, Keller I, Häfliger IM, Letko A, Schelling C, Lühken G, Drögemüller C. New genomic features of the polled intersex syndrome variant in goats unraveled by long-read whole-genome sequencing. Anim Genet 2020; 51:439-448. [PMID: 32060960 DOI: 10.1111/age.12918] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/03/2019] [Revised: 01/23/2020] [Accepted: 01/23/2020] [Indexed: 01/19/2023]
Abstract
In domestic goats, the polled intersex syndrome (PIS) refers to XX female-to-male sex reversal associated with the absence of horn growth (polled). The causal variant was previously reported as an 11.7 kb deletion at approximately 129 Mb on chromosome 1 that affects the transcription of both FOXL2 and several long non-coding RNAs. In the meantime, the presence of different versions of the PIS deletion was postulated, and trials to establish genetic testing with the existing molecular genetic information failed. Therefore, we revisited this variant by long-read whole-genome sequencing of two genetically female (XX) goats, a PIS-affected and a horned control. This revealed the presence of a more complex structural variant consisting of a deletion with a total length of 10 159 bp and an inversely inserted approximately 480 kb-sized duplicated segment of a region located approximately 21 Mb further downstream on chromosome 1 containing two genes, KCNJ15 and ERG. Publicly available short-read whole-genome sequencing data, Sanger sequencing of the breakpoints and FISH using BAC clones corresponding to both involved genome regions confirmed this structural variant. A diagnostic PCR was developed for simultaneous genotyping of carriers for this variant and determination of their genetic sex. We showed that the variant allele was present in all 334 genotyped polled goats of diverse breeds and that all analyzed 15 PIS-affected XX goats were homozygous. Our findings enable for the first time a precise genetic diagnosis for polledness and PIS in goats and add a further genomic feature to the complexity of the PIS phenomenon.
Affiliation(s)
- R Simon
- Institute of Animal Breeding and Genetics, Justus Liebig University, Giessen, 35390, Germany
| | - H E L Lischer
- Interfaculty Bioinformatics Unit, University of Bern, Bern, 3001, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
| | - A Pieńkowska-Schelling
- Institute of Genetics, University of Bern, Bern, 3001, Switzerland.,Clinic of Reproductive Medicine, Vetsuisse Faculty, University of Zürich, Zürich, 8057, Switzerland
| | - I Keller
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.,Department for BioMedical Research, University of Bern, Bern, 3001, Switzerland
| | - I M Häfliger
- Institute of Genetics, University of Bern, Bern, 3001, Switzerland
| | - A Letko
- Institute of Genetics, University of Bern, Bern, 3001, Switzerland
| | - C Schelling
- Clinic of Reproductive Medicine, Vetsuisse Faculty, University of Zürich, Zürich, 8057, Switzerland
| | - G Lühken
- Institute of Animal Breeding and Genetics, Justus Liebig University, Giessen, 35390, Germany
| | - C Drögemüller
- Institute of Genetics, University of Bern, Bern, 3001, Switzerland
| |
|
82
|
Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020; 21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 768] [Impact Index Per Article: 192.0] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022] Open
Abstract
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Affiliation(s)
- Shanika L. Amarasinghe
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Luke Zappia
- Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
- School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
| | - Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| |
|
83
|
Abstract
Existing long-read assemblers require thousands of central processing unit hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a long-read assembler wtdbg2 (https://github.com/ruanjue/wtdbg2) that is 2-17 times as fast as published tools while achieving comparable contiguity and accuracy. It paves the way for population-scale long-read assembly in future.
Affiliation(s)
- Jue Ruan
- Agricultural Genomics Institute, Chinese Academy of Agriculture Sciences, Shenzhen, China.
- Peng Cheng Laboratory, Shenzhen, China.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute, Cambridge, MA, USA.
| |
|
84
|
Guan DL, Hao XQ, Mi D, Peng J, Li Y, Xie JY, Huang H, Xu SQ. Draft Genome of a Blister Beetle Mylabris aulica. Front Genet 2020; 10:1281. [PMID: 32010178 PMCID: PMC6972506 DOI: 10.3389/fgene.2019.01281] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/06/2019] [Accepted: 11/21/2019] [Indexed: 11/13/2022] Open
Abstract
Mylabris aulica is a widely distributed blister beetle of the Meloidae family. It has the ability to synthesize a potent defensive secretion that includes cantharidin, a toxic compound used to treat many major illnesses. However, owing to the lack of genetic studies on cantharidin biosynthesis in M. aulica, the commercial use of this species is less extensive than that of other blister beetle species in China. This study reports a draft assembly and possible genes and pathways related to cantharidin biosynthesis for the M. aulica blister beetle using nanopore sequencing data. The draft genome assembly size was 288.5 Mb with a 467.8 kb N50 and a repeat content of 50.62%. An integrated gene-finding pipeline applied to the assembly identified 16,500 protein-coding genes. Benchmarking universal single-copy orthologs assessment showed that this gene set included 94.4% complete Insecta universal single-copy orthologs. Over 99% of these genes were assigned functional annotations in the gene ontology, Kyoto Encyclopedia of Genes and Genomes, or GenBank non-redundant databases. Comparative genomic analysis showed that the completeness and continuity of our assembly were better than those of the Hycleus cichorii and Hycleus phaleratus blister beetle genomes. The analysis of homologous orthologous genes and inference from evolutionary history imply that the Mylabris and Hycleus genera are genetically close, have a similar genetic background, and have differentiated within one million years. This M. aulica genome assembly provides a valuable resource for future blister beetle studies and will contribute to studies of cantharidin biosynthesis.
Affiliation(s)
- De-Long Guan
- College of Life Sciences, Shaanxi Normal University, Xi’an, China
| | - Xiao-Qian Hao
- College of Life Sciences, Shaanxi Normal University, Xi’an, China
| | - Da Mi
- NextOmics Biosciences Institute, Wuhan, China
| | - Jiong Peng
- NextOmics Biosciences Institute, Wuhan, China
| | - Yuan Li
- NextOmics Biosciences Institute, Wuhan, China
| | - Juan-Ying Xie
- College of Computer Science, Shaanxi Normal University, Xi’an, China
| | - Huateng Huang
- College of Life Sciences, Shaanxi Normal University, Xi’an, China
| | - Sheng-Quan Xu
- College of Life Sciences, Shaanxi Normal University, Xi’an, China
| |
|
85
|
Pereira R, Oliveira J, Sousa M. Bioinformatics and Computational Tools for Next-Generation Sequencing Analysis in Clinical Genetics. J Clin Med 2020; 9:E132. [PMID: 31947757 PMCID: PMC7019349 DOI: 10.3390/jcm9010132] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 11/18/2019] [Revised: 12/15/2019] [Accepted: 12/30/2019] [Indexed: 12/13/2022] Open
Abstract
Clinical genetics has an important role in the healthcare system in providing a definitive diagnosis for many rare syndromes. It can also influence genetic prevention and disease prognosis, and assist in selecting the best options of care/treatment for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at an unprecedented speed and at a lower price compared with conventional Sanger sequencing. Despite the growing literature concerning NGS in a clinical setting, this review aims to fill the gap that exists among (bio)informaticians, molecular geneticists and clinicians, by presenting a general overview of the NGS technology and workflow. First, we will review the current NGS platforms, focusing on the two main platforms Illumina and Ion Torrent, and discussing the major strong points and weaknesses intrinsic to each platform. Next, the NGS analytical bioinformatic pipelines are dissected, giving some emphasis to the algorithms commonly used to generate process data and to analyze sequence variants. Finally, the main challenges around NGS bioinformatics are placed in perspective for future developments. Even with the huge achievements made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.
Affiliation(s)
- Rute Pereira
- Laboratory of Cell Biology, Department of Microscopy, Institute of Biomedical Sciences Abel Salazar (ICBAS), University of Porto (UP), 4050-313 Porto, Portugal
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
| | - Jorge Oliveira
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
- UnIGENe and CGPP–Centre for Predictive and Preventive Genetics-Institute for Molecular and Cell Biology (IBMC), i3S-Institute for Research and Innovation in Health-UP, 4200-135 Porto, Portugal
| | - Mário Sousa
- Laboratory of Cell Biology, Department of Microscopy, Institute of Biomedical Sciences Abel Salazar (ICBAS), University of Porto (UP), 4050-313 Porto, Portugal
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
| |
|
86
|
Jenko Bizjan B, Katsila T, Tesovnik T, Šket R, Debeljak M, Matsoukas MT, Kovač J. Challenges in identifying large germline structural variants for clinical use by long read sequencing. Comput Struct Biotechnol J 2019; 18:83-92. [PMID: 32099591 PMCID: PMC7026727 DOI: 10.1016/j.csbj.2019.11.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/01/2019] [Revised: 11/07/2019] [Accepted: 11/21/2019] [Indexed: 12/30/2022] Open
Abstract
Genomic structural variations, previously considered rare events, are widely recognized as a major source of inter-individual variability and hence a major hurdle in optimum patient stratification and disease management. Herein, we focus on large complex germline structural variations and present challenges towards targeted treatment via the synergy of state-of-the-art approaches and information technology tools. Detection of complex structural variation remains challenging, as there is no gold standard for identifying such genomic variations with long reads, especially when the chromosomal rearrangement in question is a few Mb in length. A clinical case with a large complex chromosomal rearrangement serves as a paradigm. We feel that functional validation and data interpretation are of utmost importance for information growth to be translated into knowledge growth and hence, new working practices are highlighted.
Affiliation(s)
- Barbara Jenko Bizjan
- Clinical Institute of Special Laboratory Diagnostics, University Children’s Hospital, UMC, Ljubljana, Slovenia
| | - Theodora Katsila
- Institute of Chemical Biology, National Hellenic Research Centre, Athens, Greece
| | - Tine Tesovnik
- Clinical Institute of Special Laboratory Diagnostics, University Children’s Hospital, UMC, Ljubljana, Slovenia
| | - Robert Šket
- Clinical Institute of Special Laboratory Diagnostics, University Children’s Hospital, UMC, Ljubljana, Slovenia
| | - Maruša Debeljak
- Clinical Institute of Special Laboratory Diagnostics, University Children’s Hospital, UMC, Ljubljana, Slovenia
| | - Jernej Kovač
- Clinical Institute of Special Laboratory Diagnostics, University Children’s Hospital, UMC, Ljubljana, Slovenia
| |
|
87
|
De Roeck A, De Coster W, Bossaerts L, Cacace R, De Pooter T, Van Dongen J, D’Hert S, De Rijk P, Strazisar M, Van Broeckhoven C, Sleegers K. NanoSatellite: accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION. Genome Biol 2019; 20:239. [PMID: 31727106 PMCID: PMC6857246 DOI: 10.1186/s13059-019-1856-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/18/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022] Open
Abstract
Technological limitations have hindered the large-scale genetic investigation of tandem repeats in disease. We show that long-read sequencing with a single Oxford Nanopore Technologies PromethION flow cell per individual achieves 30× human genome coverage and enables accurate assessment of tandem repeats including the 10,000-bp Alzheimer's disease-associated ABCA7 VNTR. The Guppy "flip-flop" base caller and tandem-genotypes tandem repeat caller are efficient for large-scale tandem repeat assessment, but base calling and alignment challenges persist. We present NanoSatellite, which analyzes tandem repeats directly on electric current data and improves calling of GC-rich tandem repeats, expanded alleles, and motif interruptions.
Affiliation(s)
- Arne De Roeck
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Wouter De Coster
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Liene Bossaerts
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Rita Cacace
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Tim De Pooter
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
| | - Jasper Van Dongen
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Svenn D’Hert
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
| | - Peter De Rijk
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
| | - Mojca Strazisar
- Neuromics Support Facility, Center for Molecular Neurology, VIB - University of Antwerp, Antwerp, Belgium
| | - Christine Van Broeckhoven
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| | - Kristel Sleegers
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, University of Antwerp-CDE, Universiteitsplein 1, B-2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, Antwerp, Belgium
| |
|
88
|
Zhou A, Lin T, Xing J. Evaluating nanopore sequencing data processing pipelines for structural variation identification. Genome Biol 2019; 20:237. [PMID: 31727126 PMCID: PMC6857234 DOI: 10.1186/s13059-019-1858-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/18/2019] [Accepted: 10/10/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated. RESULTS Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall, the performance of SV callers varies depending on the SV type. For an initial data assessment, we recommend using the aligner minimap2 in combination with the SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve the SV call performance. CONCLUSIONS We present a workflow for evaluating aligners and SV callers for nanopore sequencing data and approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and an integrated call set can provide enhanced performance. The nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will be available to more accurately assess the performance of available tools and facilitate further tool development.
Affiliation(s)
- Anbo Zhou
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Timothy Lin
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
- Human Genetics Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
| |
|
89
|
Wijfjes RY, Smit S, de Ridder D. Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data. BMC Genomics 2019; 20:818. [PMID: 31699036 PMCID: PMC6836508 DOI: 10.1186/s12864-019-6153-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2019] [Accepted: 09/30/2019] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls. RESULTS To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions. CONCLUSIONS Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
Affiliation(s)
- Raúl Y Wijfjes
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands.
| | - Sandra Smit
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
| |
|
90
|
|
91
|
Sone J, Mitsuhashi S, Fujita A, Mizuguchi T, Hamanaka K, Mori K, Koike H, Hashiguchi A, Takashima H, Sugiyama H, Kohno Y, Takiyama Y, Maeda K, Doi H, Koyano S, Takeuchi H, Kawamoto M, Kohara N, Ando T, Ieda T, Kita Y, Kokubun N, Tsuboi Y, Katoh K, Kino Y, Katsuno M, Iwasaki Y, Yoshida M, Tanaka F, Suzuki IK, Frith MC, Matsumoto N, Sobue G. Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Nat Genet 2019; 51:1215-1221. [PMID: 31332381 DOI: 10.1038/s41588-019-0459-y] [Citation(s) in RCA: 295] [Impact Index Per Article: 59.0] [Received: 01/10/2019] [Accepted: 05/29/2019] [Indexed: 12/20/2022]
Abstract
. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5' region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.
Affiliation(s)
- Jun Sone
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan; Department of Neurology, National Hospital Organization Suzuka National Hospital, Suzuka, Japan
| | - Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Keiko Mori
- Department of Neurology, Oyamada Memorial Spa Hospital, Yokkaichi, Japan
| | - Haruki Koike
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Akihiro Hashiguchi
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Hiroshi Takashima
- Department of Neurology and Geriatrics, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
| | - Hiroshi Sugiyama
- Department of Neurology, National Hospital Organization Utano National Hospital, Kyoto, Japan
| | - Yutaka Kohno
- Department of Neurology, Ibaraki Prefectural University of Health Sciences, Ibaraki, Japan
| | - Yoshihisa Takiyama
- Department of Neurology, University of Yamanashi, Chuo, Yamanashi, Japan
| | - Kengo Maeda
- Department of Neurology, National Hospital Organization Higashi-Ohmi General Medical Center, Higashi-Ohmi, Japan
| | - Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Shigeru Koyano
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Hideyuki Takeuchi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Michi Kawamoto
- Department of Neurology, Kobe City Medical Center General Hospital, Kobe, Japan
| | - Nobuo Kohara
- Department of Neurology, Kobe City Medical Center General Hospital, Kobe, Japan
| | - Tetsuo Ando
- Department of Neurology, Anjo Kosei Hospital, Anjo, Japan
| | - Toshiaki Ieda
- Department of Neurology, Yokkaichi Municipal Hospital, Yokkaichi, Japan
| | - Yasushi Kita
- Department of Neurology, Hyogo Brain and Heart Center, Himeji, Japan
| | - Norito Kokubun
- Department of Neurology, Dokkyo Medical University, Tochigi, Japan
| | - Yoshio Tsuboi
- Department of Neurology, Fukuoka University, Fukuoka, Japan
| | - Kazutaka Katoh
- Research Institute for Microbial Diseases, Osaka University, Suita, Japan; Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
| | - Yoshihiro Kino
- Department of Bioinformatics and Molecular Neuropathology, Meiji Pharmaceutical University, Tokyo, Japan
| | - Masahisa Katsuno
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan
| | - Yasushi Iwasaki
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
| | - Mari Yoshida
- Department of Neuropathology, Institute for Medical Science of Aging, Aichi Medical University, Nagakute, Japan
| | - Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Ikuo K Suzuki
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan; Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
| | - Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.
| | - Gen Sobue
- Department of Neurology, Nagoya University Graduate School of Medicine, Nagoya, Japan; Department of Neurology, and Brain and Mind Research Center, Nagoya University Graduate School of Medicine, Nagoya, Japan; Aichi Medical University, Nagakute, Aichi, Japan
| |
|
92
|
Ping Z, Ma D, Huang X, Chen S, Liu L, Guo F, Zhu SJ, Shen Y. Carbon-based archiving: current progress and future prospects of DNA-based data storage. Gigascience 2019; 8:giz075. [PMID: 31220251 PMCID: PMC6586197 DOI: 10.1093/gigascience/giz075] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/22/2018] [Revised: 12/09/2018] [Accepted: 06/03/2019] [Indexed: 01/23/2023] Open
Abstract
The information explosion has led to a rapid increase in the amount of data requiring physical storage. However, in the near future, existing storage methods (i.e., magnetic and optical media) will be insufficient to store these exponentially growing data. Therefore, data scientists are continually looking for better, more stable, and space-efficient alternatives to store these huge datasets. Because of its unique biological properties, highly condensed DNA has great potential to become a storage material for the future. Indeed, DNA-based data storage has recently emerged as a promising approach for long-term digital information storage. This review summarizes state-of-the-art methods, including digital-to-DNA coding schemes and the media types used in DNA-based data storage, and provides an overview of recent progress achieved in this field and its exciting future.
Affiliation(s)
- Zhi Ping
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Dongzhao Ma
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Xiaoluo Huang
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Shihong Chen
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Longying Liu
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Fei Guo
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Sha Joe Zhu
- Big Data Institute, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford OX3 7LF, UK
| | - Yue Shen
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen 518083, China
| |
|