201
Oleksyk TK, Wolfsberger WW, Weber AM, Shchubelka K, Oleksyk OT, Levchuk O, Patrus A, Lazar N, Castro-Marquez SO, Hasynets Y, Boldyzhar P, Neymet M, Urbanovych A, Stakhovska V, Malyar K, Chervyakova S, Podoroha O, Kovalchuk N, Rodriguez-Flores JL, Zhou W, Medley S, Battistuzzi F, Liu R, Hou Y, Chen S, Yang H, Yeager M, Dean M, Mills RE, Smolanka V. Genome diversity in Ukraine. Gigascience 2021; 10:6079618. [PMID: 33438729 PMCID: PMC7804371 DOI: 10.1093/gigascience/giaa159] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/04/2020] [Revised: 08/21/2020] [Accepted: 12/15/2020] [Indexed: 01/21/2023]
Abstract
Background The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes from an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that was resequenced at high coverage on an Illumina NovaSeq6000 S4. Results The genome data were searched for genomic variation represented in this population, and several classes of variants are reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource intended to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data contribute a wealth of new information, bringing forth novel, endemic and medically relevant alleles.
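Cross-validating sequencing and array genotypes for the same individuals is often summarized as genotype concordance. The sketch below computes one simple version of it, the share of sample/site pairs called by both platforms whose genotypes agree. The data layout, sample and variant IDs are hypothetical, and this is not necessarily the exact procedure the authors used.

```python
from typing import Dict, Optional, Tuple

# (sample ID, variant ID) -> alternate-allele count (0, 1 or 2); None = no call.
Calls = Dict[Tuple[str, str], Optional[int]]


def genotype_concordance(seq_calls: Calls, chip_calls: Calls) -> float:
    """Fraction of sample/site pairs called by both platforms whose genotypes agree."""
    shared = [
        key
        for key in seq_calls.keys() & chip_calls.keys()
        if seq_calls[key] is not None and chip_calls[key] is not None
    ]
    if not shared:
        raise ValueError("no sites were called by both platforms")
    agreeing = sum(seq_calls[key] == chip_calls[key] for key in shared)
    return agreeing / len(shared)


# Hypothetical calls for one sample: sequencing vs. genotyping array.
seq = {("S1", "rs1"): 0, ("S1", "rs2"): 1, ("S1", "rs3"): 2, ("S1", "rs4"): None}
chip = {("S1", "rs1"): 0, ("S1", "rs2"): 2, ("S1", "rs3"): 2, ("S1", "rs4"): 1}
print(genotype_concordance(seq, chip))  # rs4 is skipped; 2 of 3 shared calls agree
```

Concordance is often also reported per genotype class, because agreement at homozygous-reference sites alone can inflate the overall figure.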
Affiliation(s)
- Taras K Oleksyk
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine; Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Walter W Wolfsberger
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine; Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Alexandra M Weber
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Khrystyna Shchubelka
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA; Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
- Olga T Oleksyk
- A. Novak Transcarpathian Regional Clinical Hospital, Uzhhorod 88000, Ukraine
- Stephanie O Castro-Marquez
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Yaroslava Hasynets
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine
- Patricia Boldyzhar
- Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
- Mikhailo Neymet
- Velyka Kopanya Family Hospital, Transcarpathia 90330, Ukraine
- Kateryna Malyar
- I.I. Mechnikov Dnipro Regional Clinical Hospital, Dnipro 49000, Ukraine
- Natalia Kovalchuk
- Rivne Regional Specialized Hospital of Radiation Protection, Rivne 33028, Ukraine
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Sarah Medley
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA
- Fabia Battistuzzi
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA
- Ryan Liu
- BGI Shenzhen, Shenzhen, 518083, China
- Yong Hou
- BGI Shenzhen, Shenzhen, 518083, China
- Siru Chen
- BGI Shenzhen, Shenzhen, 518083, China
- Meredith Yeager
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Volodymyr Smolanka
- Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
202
Bhattacharya S, Barseghyan H, Délot EC, Vilain E. nanotatoR: a tool for enhanced annotation of genomic structural variants. BMC Genomics 2021; 22:10. [PMID: 33407088 PMCID: PMC7789800 DOI: 10.1186/s12864-020-07182-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/11/2020] [Accepted: 10/22/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Whole genome sequencing is effective at identifying small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. RESULTS We developed an R-based package, nanotatoR, which provides comprehensive annotation to support SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient's phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release.
We evaluated nanotatoR's annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. CONCLUSIONS The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
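The overlap percentages and nearest-gene distances mentioned in the abstract can be illustrated with plain interval arithmetic. This is a hypothetical sketch, not nanotatoR's R implementation: it uses BED-style half-open coordinates, assumes overlap percentage means the share of each gene covered by the SV, and reports the nearest non-overlapping gene on each side in reference coordinates, ignoring strand. Gene names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive; intervals are assumed non-empty
    name: str = ""


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def annotate_sv(sv: Interval, genes: List[Interval]):
    """Return (% of each overlapping gene covered by the SV,
    nearest gene to the left, nearest gene to the right) as (name, distance)."""
    overlaps: Dict[str, float] = {}
    left: Optional[Tuple[str, int]] = None
    right: Optional[Tuple[str, int]] = None
    for gene in genes:
        if gene.chrom != sv.chrom:
            continue
        shared = overlap_bp(sv, gene)
        if shared > 0:
            overlaps[gene.name] = 100.0 * shared / (gene.end - gene.start)
        elif gene.end <= sv.start:  # gene lies entirely to the left of the SV
            dist = sv.start - gene.end
            if left is None or dist < left[1]:
                left = (gene.name, dist)
        else:  # no overlap and not to the left, so gene.start >= sv.end
            dist = gene.start - sv.end
            if right is None or dist < right[1]:
                right = (gene.name, dist)
    return overlaps, left, right


genes = [
    Interval("chr1", 100, 200, "GENE_A"),
    Interval("chr1", 250, 450, "GENE_B"),
    Interval("chr1", 900, 1000, "GENE_C"),
    Interval("chr2", 0, 50, "GENE_D"),
]
sv = Interval("chr1", 300, 500, "DEL1")
print(annotate_sv(sv, genes))
# ({'GENE_B': 75.0}, ('GENE_A', 100), ('GENE_C', 400))
```

Filtering then reduces to simple comparisons on these values, for example keeping neighbouring genes only within a chosen distance.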
Affiliation(s)
- Surajit Bhattacharya
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA
- Hayk Barseghyan
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA; Bionano Genomics Inc, San Diego, CA, 92121, USA
- Emmanuèle C Délot
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA
- Eric Vilain
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA
203
Emerging molecular subtypes and therapeutic targets in B-cell precursor acute lymphoblastic leukemia. Front Med 2021; 15:347-371. [PMID: 33400146 DOI: 10.1007/s11684-020-0821-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/07/2019] [Accepted: 09/04/2020] [Indexed: 12/13/2022]
Abstract
B-cell precursor acute lymphoblastic leukemia (BCP-ALL) is characterized by highly heterogeneous genetic alterations. Precise subtypes with distinct genomic and/or gene expression patterns have recently been revealed using high-throughput sequencing technology. Most of these profiles are associated with recurrent non-overlapping rearrangements or hotspot point mutations that are analogous to the established subtypes, such as DUX4 rearrangements, MEF2D rearrangements, ZNF384/ZNF362 rearrangements, NUTM1 rearrangements, BCL2/MYC and/or BCL6 rearrangements, ETV6-RUNX1-like gene expression, PAX5alt (diverse PAX5 alterations, including rearrangements, intragenic amplifications, or mutations), and hotspot mutations PAX5 (p.Pro80Arg) with biallelic PAX5 alterations, IKZF1 (p.Asn159Tyr), and ZEB2 (p.His1038Arg). These molecular subtypes can be classified by gene expression patterns using RNA-seq technology. Refined molecular classification has greatly improved treatment strategies. Multiagent therapy regimens, including targeted inhibitors (e.g., imatinib), immunomodulators, monoclonal antibodies, and chimeric antigen receptor T-cell (CAR-T) therapy, are shifting clinical practice from chemotherapy toward personalized medicine and risk-directed disease management. We provide an update on our knowledge of emerging molecular subtypes and therapeutic targets in BCP-ALL.
204
Della Coletta R, Qiu Y, Ou S, Hufford MB, Hirsch CN. How the pan-genome is changing crop genomics and improvement. Genome Biol 2021; 22:3. [PMID: 33397434 PMCID: PMC7780660 DOI: 10.1186/s13059-020-02224-8] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 08/27/2020] [Accepted: 12/07/2020] [Indexed: 01/13/2023]
Abstract
Crop genomics has seen dramatic advances in recent years due to improvements in sequencing technology, assembly methods, and computational resources. These advances have led to the development of new tools to facilitate crop improvement. The study of structural variation within species and the characterization of the pan-genome has revealed extensive genome content variation among individuals within a species that is paradigm shifting to crop genomics and improvement. Here, we review advances in crop genomics and how utilization of these tools is shifting in light of pan-genomes that are becoming available for many crop species.
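The genome content variation the review describes is commonly summarized from a gene presence/absence matrix built across accessions. The sketch below uses one widespread but not universal convention: core genes are present in every accession, dispensable genes in some, and private genes in only one. The matrix and gene names are hypothetical, and some studies use a relaxed "near-core" threshold instead of requiring presence in every accession.

```python
from typing import Dict, List


def classify_pan_genome(pav: Dict[str, List[int]]) -> Dict[str, str]:
    """Label each gene from its presence (1) / absence (0) row across accessions."""
    labels: Dict[str, str] = {}
    for gene, row in pav.items():
        present = sum(row)
        if present == len(row):
            labels[gene] = "core"          # in every accession
        elif present == 1:
            labels[gene] = "private"       # in exactly one accession
        elif present > 1:
            labels[gene] = "dispensable"   # in some, but not all, accessions
        else:
            labels[gene] = "absent"        # in none of the sampled accessions
    return labels


# Hypothetical matrix: 3 genes x 4 accessions.
pav = {
    "g1": [1, 1, 1, 1],
    "g2": [1, 0, 1, 0],
    "g3": [0, 0, 1, 0],
}
labels = classify_pan_genome(pav)
print(labels)  # {'g1': 'core', 'g2': 'dispensable', 'g3': 'private'}
```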
Affiliation(s)
- Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
205
Della Coletta R, Qiu Y, Ou S, Hufford MB, Hirsch CN. How the pan-genome is changing crop genomics and improvement. Genome Biol 2021. [PMID: 33397434 DOI: 10.1186/s13059-020-02224-2228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2023]
206
Qian Y, Li L, Sun Z, Liu J, Yuan W, Wang Z. A multi-omics view of the complex mechanism of vascular calcification. Biomed Pharmacother 2021; 135:111192. [PMID: 33401220 DOI: 10.1016/j.biopha.2020.111192] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/30/2020] [Revised: 12/19/2020] [Accepted: 12/26/2020] [Indexed: 02/07/2023]
Abstract
Vascular calcification is a common, high-risk disease with increasing morbidity and high mortality; it is considered a consequence of smooth muscle cell transdifferentiation, which initiates the accumulation of calcium hydroxyapatite. Vascular calcification is also thought to be strongly associated with poor outcomes in diabetes and chronic kidney disease. Numerous studies have been carried out; however, the specific mechanism of the disease remains unclear. The genome project enhanced the understanding of the life sciences and ushered in the post-genomic era, in which a variety of omics techniques are used and large amounts of data are available, opening a new perspective on data analysis. Because omics approaches take a broader view, they have an advantage over single-pathway analysis in studying the complex mechanisms of vascular calcification. This paper reviews in detail various omics studies of vascular calcification, including genomics, proteomics, transcriptomics, metabolomics and multi-omics studies. Advances and deficiencies in the use of omics to study vascular calcification are presented comprehensively. We also review the methodology of omics studies and of omics data analysis and processing; this methodology and data processing can also be applied to other areas. An omics landscape perspective that crosses the boundaries between genomics, transcriptomics, proteomics and metabolomics is used to examine the mechanisms of vascular calcification. Combined with various technologies, this perspective also provides a direction for subsequent exploration of clinical significance.
Affiliation(s)
- Yongjiang Qian
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Lihua Li
- Department of Pathology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Zhen Sun
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Jia Liu
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Wei Yuan
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Zhongqun Wang
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
207
Everhart S, Gambhir N, Stam R. Population Genomics of Filamentous Plant Pathogens: A Brief Overview of Research Questions, Approaches, and Pitfalls. PHYTOPATHOLOGY 2021; 111:12-22. [PMID: 33337245 DOI: 10.1094/phyto-11-20-0527-fi] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
With ever-decreasing sequencing costs, research on the population biology of plant pathogens is transitioning from population genetics (using dozens of genetic markers or polymorphism data from several genes) to population genomics (using several hundred to tens of thousands of markers or whole-genome sequence data). The field of population genomics is characterized by rapid theoretical and methodological advances and by numerous steps and pitfalls in its technical and analytical workflow. In this article, we aim to provide a brief overview of topics relevant to the study of population genomics of filamentous plant pathogens and direct readers to more extensive reviews for in-depth understanding. We briefly discuss different types of population genomics-inspired research questions and give insights into the sampling strategies that can be used to answer such questions. We then consider different sequencing strategies, the various options available for data processing, and some of the currently available tools for population genomic data analysis. We conclude by highlighting some of the hurdles along the population genomic workflow, providing cautionary warnings regarding assumptions and technical challenges, and presenting our own perspectives on the future of population genomics for filamentous plant pathogens.
Affiliation(s)
- Sydney Everhart
- Department of Plant Pathology, University of Nebraska, Lincoln, NE 68583, USA
- Nikita Gambhir
- Department of Plant Pathology, University of Nebraska, Lincoln, NE 68583, USA
- Remco Stam
- Phytopathology, School of Life Sciences Weihenstephan, Technical University Munich, Germany
208
Heller D, Vingron M. SVIM-asm: Structural variant detection from haploid and diploid genome assemblies. Bioinformatics 2020; 36:5519-5521. [PMID: 33346817 PMCID: PMC8016491 DOI: 10.1093/bioinformatics/btaa1034] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 10/09/2020] [Revised: 11/16/2020] [Accepted: 12/12/2020] [Indexed: 12/21/2022]
Abstract
MOTIVATION With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet existing SV callers are designed for haploid genome assemblies only, do not support genotyping, or detect only a limited set of SV classes. RESULTS We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. AVAILABILITY AND IMPLEMENTATION SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/svim-asm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
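For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, computed from true-positive, false-positive and false-negative counts after calls have been matched to a truth set. The counts in the sketch below are made up for illustration and are not results from SVIM-asm or DipCall.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical deletion benchmark: 90 matched calls, 10 spurious calls, 30 missed truth SVs.
print(round(f1_score(tp=90, fp=10, fn=30), 3))  # precision 0.9, recall 0.75 -> 0.818
```

Because F1 weights precision and recall equally, a caller can raise it either by reporting fewer false positives or by recovering more of the truth set.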
Affiliation(s)
- David Heller
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Martin Vingron
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
209
Accurate mapping of mitochondrial DNA deletions and duplications using deep sequencing. PLoS Genet 2020; 16:e1009242. [PMID: 33315859 PMCID: PMC7769605 DOI: 10.1371/journal.pgen.1009242] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 05/18/2020] [Revised: 12/28/2020] [Accepted: 11/02/2020] [Indexed: 12/21/2022]
Abstract
Deletions and duplications in mitochondrial DNA (mtDNA) cause mitochondrial disease and accumulate in conditions such as cancer and age-related disorders, but validated high-throughput methodology that can readily detect and discriminate between these two types of events is lacking. Here we establish a computational method, MitoSAlt, for accurate identification, quantification and visualization of mtDNA deletions and duplications from genomic sequencing data. Our method was tested on simulated sequencing reads and human patient samples with single deletions and duplications to verify its accuracy. Application to mouse models of mtDNA maintenance disease demonstrated the ability to detect deletions and duplications even at low levels of heteroplasmy.
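Heteroplasmy, the fraction of mtDNA molecules carrying a given deletion or duplication, is typically estimated by comparing reads that support the rearranged junction with reads showing the intact reference sequence at the same breakpoints. The function below is a simplified, hypothetical estimator of that kind; MitoSAlt's actual quantification may differ, and the read counts are invented.

```python
def heteroplasmy_fraction(junction_reads: int, reference_reads: int) -> float:
    """Estimated fraction of molecules carrying the rearrangement.

    junction_reads:  reads (e.g. split reads) spanning the deletion/duplication junction
    reference_reads: reads spanning the same breakpoints on the unrearranged molecule
    """
    total = junction_reads + reference_reads
    if total == 0:
        raise ValueError("no informative reads at this breakpoint")
    return junction_reads / total


# Hypothetical low-level event: 30 junction reads vs. 970 wild-type reads.
level = heteroplasmy_fraction(30, 970)
print(f"{level:.1%}")  # 3.0%
```

At low heteroplasmy the junction read count is small, so sequencing depth directly limits how low a level can be detected reliably.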
210
Identification and population genetic analyses of copy number variations in six domestic goat breeds and Bezoar ibexes using next-generation sequencing. BMC Genomics 2020; 21:840. [PMID: 33246410 PMCID: PMC7694352 DOI: 10.1186/s12864-020-07267-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/30/2020] [Accepted: 11/23/2020] [Indexed: 11/27/2022]
Abstract
Background Copy number variations (CNVs) are a major form of genetic variation and are involved in animal domestication and genetic adaptation to local environments. We investigated CNVs in the domestic goat (Capra hircus) using Illumina short-read sequencing data, by comparing our lab data for 38 goats from three Chinese breeds (Chengdu Brown, Jintang Black, and Tibetan Cashmere) with public data for 26 individuals from three other breeds (two Moroccan and one Chinese) and 21 samples from Bezoar ibexes. Results We obtained a total of 2394 CNV regions (CNVRs) by merging 208,649 high-confidence CNVs; these spanned ~ 267 Mb in total and accounted for 10.80% of the goat autosomal genome. Functional analyses showed that 2322 genes overlapping with the CNVRs were significantly enriched in 57 functional GO terms and KEGG pathways, mostly related to the nervous system, metabolic processes, and the reproductive system. Clustering patterns of all 85 samples, generated separately from duplications and deletions, were generally consistent with the results from SNPs and agreed with the geographical origins of these goats. Based on genome-wide FST at each CNV locus, some genes overlapping with the highly divergent CNVs between domestic and wild goats were mainly enriched for several immunity-related pathways, whereas the genes overlapping with the highly differentiated CNVs between highland and lowland goats were mainly related to vitamin and lipid metabolism. Remarkably, a 507-bp deletion ~ 14 kb downstream of FGF5 on chromosome 6 was highly divergent (FST = 0.973) between the highland and lowland goats. Given the enhancer activity previously shown for this sequence, the role of this CNV in regulating fiber growth deserves further detailed investigation. Conclusion We generated a comprehensive map of CNVs in goats.
Many genetically differentiated CNVs among various goat populations might be associated with the population characteristics of domestic goat breeds. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07267-6.
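The abstract reports merging 208,649 high-confidence CNVs into 2394 CNVRs. A common way to build CNVRs is to pool calls across individuals and merge overlapping intervals, as in the sketch below. Whether the authors required a minimum overlap or merged duplications and deletions separately is not stated here, so that detail, the coordinates and the chromosome names are assumptions.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open


def merge_cnvs(cnvs: List[Region]) -> List[Region]:
    """Merge overlapping or abutting CNV calls into CNV regions (CNVRs)."""
    merged: List[Region] = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # The call touches the current region on the same chromosome: extend it.
            _, region_start, region_end = merged[-1]
            merged[-1] = (chrom, region_start, max(region_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(regions: List[Region]) -> int:
    """Total length of non-overlapping regions."""
    return sum(end - start for _, start, end in regions)


# Hypothetical calls pooled from several animals.
calls = [("chr6", 100, 600), ("chr6", 500, 900), ("chr6", 2000, 2500),
         ("chr1", 10, 50), ("chr6", 880, 1000)]
cnvrs = merge_cnvs(calls)
print(cnvrs)  # [('chr1', 10, 50), ('chr6', 100, 1000), ('chr6', 2000, 2500)]
print(covered_bp(cnvrs))  # 1440
```

Dividing the covered length by the size of the autosomal assembly gives a genome fraction of the kind quoted in the abstract.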
211
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022]
Abstract
Genetic diseases are driven by aberrations of the human genome. Identifying such aberrations, including structural variations (SVs), is key to understanding these diseases. Conventional short-read whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but it uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) captures long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated them with an independent PCR-based approach. SVs identified by both technologies were highly specific, while the validation rate dropped for events found by only one of them. A particularly high FDR was observed for SVs found only by 10XWGS. To improve FDR and sensitivity, statistical models were trained for both technologies. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the genetics underlying various diseases. Cancer and many other diseases are often driven by structural rearrangements in the patient's genome. Their precise identification is necessary to understand how a disease evolves and how it might be cured. In this study, we compared two sequencing technologies for identifying structural variations, i.e., Illumina short-read and 10X Genomics linked-read sequencing. Short-read sequencing is already known to have a high false discovery rate for structural variations, while an unbiased performance evaluation of linked-read sequencing is missing.
Hence, we evaluate the performance of these two technologies using computational and PCR-based methodologies. Moreover, we also present a statistical approach to increase their performance, supporting better detection of structural variations and thus further research into disease biology.
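Separating SVs identified by both technologies from those found by only one requires a rule for deciding when two calls describe the same event. A widely used convention, assumed here and not necessarily the study's own, is at least 50% reciprocal overlap between calls of the same type. The greedy matcher below is a hypothetical sketch for interval-type SVs (deletions, duplications, inversions); translocations would instead need matching by breakpoint distance.

```python
from typing import List, Optional, Tuple

SV = Tuple[str, int, int, str]  # (chromosome, start, end, SV type), half-open


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 for different chromosomes or types."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def split_common(calls_a: List[SV], calls_b: List[SV], min_ro: float = 0.5):
    """Greedily pair calls by reciprocal overlap; return (common, only_a, only_b)."""
    common: List[Tuple[SV, SV]] = []
    only_a: List[SV] = []
    used_b = set()
    for a in calls_a:
        # First still-unmatched call in calls_b that passes the threshold, if any.
        match: Optional[int] = next(
            (i for i, b in enumerate(calls_b)
             if i not in used_b and reciprocal_overlap(a, b) >= min_ro),
            None,
        )
        if match is None:
            only_a.append(a)
        else:
            used_b.add(match)
            common.append((a, calls_b[match]))
    only_b = [b for i, b in enumerate(calls_b) if i not in used_b]
    return common, only_a, only_b


short_reads = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 5100, "DEL"),
               ("chr2", 300, 900, "INV")]
linked_reads = [("chr1", 1100, 2100, "DEL"), ("chr2", 300, 900, "DEL"),
                ("chr3", 10, 500, "DUP")]
common, only_short, only_linked = split_common(short_reads, linked_reads)
print(len(common), len(only_short), len(only_linked))  # 1 2 2
```

The shared and technology-specific lists produced this way are natural inputs for estimating per-category validation rates.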
212
Lomov N, Zerkalenkova E, Lebedeva S, Viushkov V, Rubtsov MA. Cytogenetic and molecular genetic methods for chromosomal translocations detection with reference to the KMT2A/MLL gene. Crit Rev Clin Lab Sci 2020; 58:180-206. [PMID: 33205680 DOI: 10.1080/10408363.2020.1844135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Acute leukemias (ALs) are often associated with chromosomal translocations, in particular, KMT2A/MLL gene rearrangements. Identification or confirmation of these translocations is carried out by a number of genetic and molecular methods, some of which are routinely used in clinical practice, while others are primarily used for research purposes. In the clinic, these methods serve to clarify diagnoses and monitor the course of disease and therapy. On the other hand, the identification of new translocations and the confirmation of known translocations are of key importance in the study of disease mechanisms and further molecular classification. There are multiple methods for the detection of rearrangements that differ in their principle of operation, the type of problem being solved, and the cost-result ratio. This review is intended to help researchers and clinicians studying AL and related chromosomal translocations to navigate this variety of methods. All methods considered in the review are grouped by their principle of action and include karyotyping, fluorescence in situ hybridization (FISH) with probes for whole chromosomes or individual loci, PCR and reverse transcription-based methods, and high-throughput sequencing. Another characteristic of the described methods is the type of problem being solved. This can be the discovery of new rearrangements, the determination of unknown partner genes participating in the rearrangement, or the confirmation of a proposed rearrangement between two genes. We consider the specifics of the application, the basic principle of each method, and its pros and cons. To illustrate their application, examples are drawn from studies of rearrangements of KMT2A/MLL, one of the genes frequently rearranged in AL.
Affiliation(s)
- Nikolai Lomov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Elena Zerkalenkova
- Laboratory of Cytogenetics and Molecular Genetics, Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Svetlana Lebedeva
- Laboratory of Cytogenetics and Molecular Genetics, Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Vladimir Viushkov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Mikhail A Rubtsov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia; Department of Biochemistry, Institute for Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
213
Dhiman H, Campbell M, Melcher M, Smith KD, Borth N. Predicting favorable landing pads for targeted integrations in Chinese hamster ovary cell lines by learning stability characteristics from random transgene integrations. Comput Struct Biotechnol J 2020; 18:3632-3648. [PMID: 33304461 PMCID: PMC7710658 DOI: 10.1016/j.csbj.2020.11.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/24/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 01/06/2023]
Abstract
Chinese Hamster Ovary (CHO) cell lines are considered the preferred platform for the production of biotherapeutics, but issues related to expression instability remain unresolved. In this study, we investigated potential causes of an unstable phenotype by comparing cell lines with stable expression to those that lose titer across 10 passages. Factors related to transgene integrity and copy number, as well as the genomic profile around the integration sites, were analyzed. Production cell lines derived from Horizon Discovery CHO-K1 (HD-BIOP3) and selected for low, medium or high copy number, each with stable and unstable transgene expression, were sequenced to capture changes at the genomic and transcriptomic levels. The exact sites of the random integration events in each cell line were also identified, followed by profiling of the genomic, transcriptomic and epigenetic patterns around them. Based on the information deduced from these random integration events, genomic loci that potentially favor reliable and stable transgene expression were reported for use as targeted transgene integration sites. By comparing stable and unstable phenotypes across these parameters, we could establish that expression stability may be controlled at three levels: 1) a good choice of integration site, 2) ensuring transgene integrity and observing the concatemerization pattern after integration, and 3) checking for potential stress-related cellular processes. Genome-wide favorable and unfavorable genomic loci for targeted transgene integration can be browsed at https://www.borthlabchoresources.boku.ac.at/
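Calling a cell line stable or unstable depends on how titer loss across the passages is scored. A minimal sketch, assuming the titer is measured at each passage and that a line counts as unstable when it loses more than 30% of its starting titer; the 30% cutoff and the function names are hypothetical and are not taken from the study.

```python
def titer_loss_fraction(titers):
    """Fraction of the starting titer lost by the last passage (negative if titer rose)."""
    if len(titers) < 2 or titers[0] <= 0:
        raise ValueError("need at least two passages and a positive starting titer")
    return (titers[0] - titers[-1]) / titers[0]


def classify_stability(titers, max_loss=0.30):
    """Label a cell line 'unstable' if it lost more than `max_loss` of its starting titer."""
    return "unstable" if titer_loss_fraction(titers) > max_loss else "stable"
```

For example, `classify_stability([100, 90, 80, 70, 65, 60, 55, 50, 45, 40])` labels a line that fell from 100 to 40 over 10 passages as unstable. Comparing only the first and last passage is sensitive to measurement noise; fitting a slope over all passages would be a more robust variant.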
Affiliation(s)
- Heena Dhiman
- University of Natural Resources and Life Sciences, Vienna, Austria.,Austrian Centre of Industrial Biotechnology, Vienna, Austria
- Michael Melcher
- University of Natural Resources and Life Sciences, Vienna, Austria
- Nicole Borth
- University of Natural Resources and Life Sciences, Vienna, Austria.,Austrian Centre of Industrial Biotechnology, Vienna, Austria
214
Rao J, Peng L, Liang X, Jiang H, Geng C, Zhao X, Liu X, Fan G, Chen F, Mu F. Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms. BMC Bioinformatics 2020; 21:518. [PMID: 33176676 PMCID: PMC7659224 DOI: 10.1186/s12859-020-03859-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/25/2020] [Accepted: 11/03/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Data generated on DNBSEQ™ platforms have proven quite effective for detecting single nucleotide variants (SNVs) and small insertions and deletions (indels), but the feasibility of detecting copy number variants (CNVs) is unclear. RESULTS Here, we first benchmarked different CNV detection tools on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected from DNBSEQ™ and Illumina data were similar in quantity, length and distribution, whereas results differed greatly between tools, even on data from a single platform. We further estimated CNV detection power against the available CNV benchmarks of NA12878 and found similar precision and sensitivity for the DNBSEQ™ and Illumina platforms. With Pindel, DELLY and LUMPY, precision for CNVs shorter than 1 kbp was higher on DNBSEQ™ data than on Illumina data. A careful comparison of the two available benchmarks showed that a large proportion of CNVs were specific to one benchmark or the other. We therefore constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. CONCLUSIONS We assessed and benchmarked CNV detection from WGS on DNBSEQ™ platforms and provide guidelines for future studies.
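Scoring precision and sensitivity against a CNV benchmark requires a rule for when a call matches a benchmark region. The sketch below assumes the widely used 50% reciprocal-overlap criterion on half-open `(chrom, start, end)` intervals; the abstract does not state the matching rule the authors used, so the threshold and function names are assumptions.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b, given as (chrom, start, end) with half-open
    coordinates, each have at least `min_frac` of their length in the overlap."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def precision_sensitivity(calls, truth, min_frac=0.5):
    """Precision = matched calls / all calls; sensitivity = matched truth CNVs / all truth CNVs."""
    matched_calls = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    matched_truth = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = matched_truth / len(truth) if truth else 0.0
    return precision, sensitivity
```

A fuller benchmark would also require the CNV type (deletion or duplication) to agree, and would use an interval index rather than this all-against-all loop for genome-scale call sets.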
Affiliation(s)
- Junhua Rao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Hui Jiang
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Xia Zhao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Xin Liu
- BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, Shandong, China.,IGDB-BGI Joint Center for Omics, BGI-Shenzhen, Shenzhen, 518083, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, Shandong, China.,State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Fang Chen
- MGI, BGI-Shenzhen, Shenzhen, 518083, China.,BGI-Shenzhen, Shenzhen, 518083, China.,China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Feng Mu
- MGI, BGI-Shenzhen, Shenzhen, 518083, China.,MGI-Wuhan, BGI-Shenzhen, Wuhan, 430074, China
215
Uchiyama Y, Yamaguchi D, Iwama K, Miyatake S, Hamanaka K, Tsuchida N, Aoi H, Azuma Y, Itai T, Saida K, Fukuda H, Sekiguchi F, Sakaguchi T, Lei M, Ohori S, Sakamoto M, Kato M, Koike T, Takahashi Y, Tanda K, Hyodo Y, Honjo RS, Bertola DR, Kim CA, Goto M, Okazaki T, Yamada H, Maegaki Y, Osaka H, Ngu LH, Siew CG, Teik KW, Akasaka M, Doi H, Tanaka F, Goto T, Guo L, Ikegawa S, Haginoya K, Haniffa M, Hiraishi N, Hiraki Y, Ikemoto S, Daida A, Hamano SI, Miura M, Ishiyama A, Kawano O, Kondo A, Matsumoto H, Okamoto N, Okanishi T, Oyoshi Y, Takeshita E, Suzuki T, Ogawa Y, Handa H, Miyazono Y, Koshimizu E, Fujita A, Takata A, Miyake N, Mizuguchi T, Matsumoto N. Efficient detection of copy-number variations using exome data: Batch- and sex-based analyses. Hum Mutat 2020; 42:50-65. [PMID: 33131168 DOI: 10.1002/humu.24129] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/20/2020] [Revised: 09/29/2020] [Accepted: 10/15/2020] [Indexed: 12/16/2022]
Abstract
Many algorithms for detecting copy number variations (CNVs) from exome sequencing (ES) data have been reported and evaluated for sensitivity, specificity, reproducibility, and precision. However, optimizing how such algorithms are run in practice to improve their performance has not been fully addressed. ES was performed on 1199 samples, including 763 patients with different disease profiles, and the data were analyzed for CNVs with both the eXome Hidden Markov Model (XHMM) and a modified version of Nord's method. To detect rare CNVs efficiently, we aimed to reduce sequencing biases by jointly analyzing, as one batch, the data of all unrelated samples sequenced in the same flow cell, and to eliminate sex effects on X-linked CNVs by analyzing female and male samples separately. We also applied several filtering steps to select CNVs more efficiently. On average, fewer than 5 CNVs were detected per sample. This optimization, together with targeted CNV analysis by Nord's method, identified pathogenic or likely pathogenic CNVs in 34 patients (4.5%, 34/763). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%) patients, whereas the previous protocol identified them in only 14 (9.9%). Thus, this batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
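The batch- and sex-based grouping can be illustrated with read depth at a single exome target: each sample is compared with the median depth of its group, where a group is all samples from the same flow cell and, on chrX only, of the same sex, so that a normal male with one X is not mistaken for a deletion. This is a deliberately simplified sketch that uses a median-depth ratio in place of XHMM's actual normalization and hidden Markov model; the data layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def batch_groups(samples, chrom):
    """Group sample IDs by flow cell; on chrX, also split each flow cell by sex."""
    groups = defaultdict(list)
    for sample_id, info in samples.items():
        if chrom == "chrX":
            key = (info["flowcell"], info["sex"])
        else:
            key = (info["flowcell"],)
        groups[key].append(sample_id)
    return groups


def depth_ratios(samples, depths, chrom):
    """Ratio of each sample's depth at one target to the median depth of its group.

    A ratio near 0.5 suggests a one-copy loss; near 1.5, a one-copy gain (diploid case).
    """
    ratios = {}
    for members in batch_groups(samples, chrom).values():
        ref = median(depths[s] for s in members)
        for s in members:
            ratios[s] = depths[s] / ref if ref > 0 else float("nan")
    return ratios
```

With three females (one carrying a heterozygous X deletion) and two males on the same flow cell, the sex-split grouping gives the carrier a ratio of 0.5 and the males 1.0; pooling all five on chrX would instead make the normal females look like duplications.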
Affiliation(s)
- Yuri Uchiyama
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan.,Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Kazuhiro Iwama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.,Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.,Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan
- Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomi Tsuchida
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan.,Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Aoi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.,Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiteru Azuma
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Toshiyuki Itai
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ken Saida
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Fukuda
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.,Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Futoshi Sekiguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohiro Sakaguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ming Lei
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Sachiko Ohori
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Masamune Sakamoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.,Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Mitsuhiro Kato
- Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Takayoshi Koike
- National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Yukitoshi Takahashi
- National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Koichi Tanda
- Department of Pediatrics, Japanese Red Cross Kyoto Daiichi Hospital, Kyoto, Japan
- Yuki Hyodo
- Department of Child Neurology, Okayama University Hospital, Okayama, Japan
- Rachel S Honjo
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Debora Romeo Bertola
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Chong Ae Kim
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Masahide Goto
- Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Tetsuya Okazaki
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hiroyuki Yamada
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Yoshihiro Maegaki
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hitoshi Osaka
- Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Lock-Hock Ngu
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Ch'ng G Siew
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Keng W Teik
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Manami Akasaka
- Department of Pediatrics, Iwate Medical University School of Medicine, Morioka, Japan
- Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohide Goto
- Division of Neurology, Kanagawa Children's Medical Center, Yokohama, Japan
- Long Guo
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Kazuhiro Haginoya
- Department of Pediatric Neurology, Miyagi Children's Hospital, Sendai, Japan
- Muzhirah Haniffa
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Nozomi Hiraishi
- Department of Pediatrics, Yokohama City University Medical Center, Yokohama, Japan
- Yoko Hiraki
- Hiroshima Municipal Center for Child Health and Development, Hiroshima, Japan
- Satoru Ikemoto
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Atsuro Daida
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Shin-Ichiro Hamano
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Masaki Miura
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan.,Department of Pediatrics, Nagaoka Red Cross Hospital, Nagaoka, Japan
- Akihiko Ishiyama
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Osamu Kawano
- Department of Pediatrics, Hokkaido University Hospital, Sapporo, Japan
- Akane Kondo
- Clinical Genetics Center, Shikoku Medical Center for Children and Adults, National Hospital Organization, Kagawa, Japan
- Hiroshi Matsumoto
- Department of Pediatrics, National Defense Medical College, Saitama, Japan
- Nobuhiko Okamoto
- Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
- Tohru Okanishi
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan.,Department of Child Neurology, Comprehensive Epilepsy Center, Seirei Hamamatsu General Hospital, Hamamatsu, Japan
- Yukimi Oyoshi
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Eri Takeshita
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Toshifumi Suzuki
- Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiyuki Ogawa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Hiroshi Handa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Yayoi Miyazono
- Department of Child Health, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Takata
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Noriko Miyake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
216
Delage WJ, Thevenon J, Lemaitre C. Towards a better understanding of the low recall of insertion variants with short-read based variant callers. BMC Genomics 2020; 21:762. [PMID: 33148192 PMCID: PMC7640490 DOI: 10.1186/s12864-020-07125-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/07/2020] [Accepted: 10/06/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Since 2009, numerous tools have been developed to detect structural variants using short-read technologies. Insertions longer than 50 bp are among the hardest types of variant to discover and are drastically underrepresented in gold standard variant callsets. The advent of long-read technologies has completely changed the situation. In 2019, two independent cross-technology studies published the most complete variant callsets, with sequence-resolved insertions, in human individuals. Of the reported insertions, only 17 to 28% could be discovered with short-read based tools. RESULTS In this work, we performed an in-depth analysis of these unprecedented insertion callsets in order to investigate the causes of such failures. We first established a precise classification of insertion variants according to four layers of characterization: the nature and size of the inserted sequence, the genomic context of the insertion site and the breakpoint junction complexity. Because these levels are intertwined, we then used simulations to characterize the impact of each complexity factor on the recall of several structural variant callers. We showed that most reported insertions exhibited characteristics that may interfere with their discovery: 63% were tandem repeat expansions, 38% contained homology larger than 10 bp within their breakpoint junctions and 70% were located in simple repeats. Consequently, the recall of short-read based variant callers was significantly lower for such insertions (6% for tandem repeats vs 56% for mobile element insertions). Simulations showed that the most influential factor was the insertion type rather than the genomic context, that the tested structural variant callers handled the various difficulties differently, and that most insertion calls lacked sequence resolution.
CONCLUSIONS Our results explain the low recall by pointing out several difficulty factors among the observed insertion features and provide avenues for improving SV caller algorithms and their combinations.
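Recall per insertion class, as in the tandem-repeat versus mobile-element comparison above, is the fraction of true insertions of that class that a caller recovers. A minimal sketch, assuming each benchmark insertion has already been labeled with a class and marked as matched or not by a caller's output; the class labels and toy data are hypothetical.

```python
from collections import defaultdict


def recall_by_category(insertions):
    """insertions: iterable of (category, detected) pairs, one per true insertion.

    Returns {category: fraction of true insertions of that category recovered}.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for category, detected in insertions:
        total[category] += 1
        hits[category] += bool(detected)  # True counts as 1, False as 0
    return {category: hits[category] / total[category] for category in total}
```

Usage with toy data: ten tandem-repeat insertions of which one is found, and four mobile element insertions of which three are found, give recalls of 0.1 and 0.75 respectively.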
Affiliation(s)
- Julien Thevenon
- Inserm U1209, CNRS UMR 5309, Univ. Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France & Genetics, Genomics and Reproduction Service, Centre Hospitalo-Universitaire Grenoble-Alpes, Grenoble, France
217
Buckley RM, Davis BW, Brashear WA, Farias FHG, Kuroki K, Graves T, Hillier LW, Kremitzki M, Li G, Middleton RP, Minx P, Tomlinson C, Lyons LA, Murphy WJ, Warren WC. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. PLoS Genet 2020; 16:e1008926. [PMID: 33090996 PMCID: PMC7581003 DOI: 10.1371/journal.pgen.1008926] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 03/02/2020] [Accepted: 06/10/2020] [Indexed: 12/30/2022] Open
Abstract
The domestic cat (Felis catus) numbers over 94 million in the USA alone, occupies households as a companion animal, and, like humans, suffers from cancer and common and rare diseases. However, genome-wide sequence variant information is limited for this species. To empower trait analyses, a new cat genome reference assembly was developed from PacBio long sequence reads, which significantly improves sequence representation and assembly contiguity. The whole genome sequences of 54 domestic cats were aligned to the reference to identify single nucleotide variants (SNVs) and structural variants (SVs). Across all cats, 16 SNVs predicted to have deleterious impacts and present in a singleton state were identified as high-priority candidates for causative mutations. One candidate was a stop gain in the tumor suppressor FBXW7. The SNV is found in cats segregating for feline mediastinal lymphoma and is a candidate for inherited cancer susceptibility. SV analysis revealed a complex deletion coupled with a nearby potential duplication event that was shared privately across three unrelated cats with dwarfism and is found within a known dwarfism-associated region on cat chromosome B1. This SV interrupted UDP-glucose 6-dehydrogenase (UGDH), a gene involved in the biosynthesis of glycosaminoglycans. Importantly, UGDH has not yet been associated with human dwarfism and should be screened in undiagnosed patients. The new high-quality cat genome reference and the compilation of sequence variation demonstrate the importance of these resources when searching for disease-causative alleles in the domestic cat and for identification of feline biomedical models. The practice of genomic medicine is predicated on the availability of a high-quality reference genome and an understanding of the impact of genome variation.
Such resources have led to countless discoveries in humans; however, if we work exclusively within the framework of human genetics, our potential for understanding disease biology is limited, as similar analyses in other species have often led to novel insights. The generation of Felis_catus_9.0, a new high-quality reference genome for the domestic cat, helps facilitate the expansion of genomic medicine into the Felis lineage. Using Felis_catus_9.0, we analyze the landscape of genomic variation from a collection of 54 cats within the context of human gene constraint. The distribution of variant impacts in cats is correlated with patterns of gene constraint in humans, indicating the utility of this reference for identifying novel mutations that cause phenotypes relevant to human and cat health. Moreover, structural variant analysis revealed a novel variant for feline dwarfism in UGDH, a gene that has not been associated with dwarfism in any other species, suggesting a role for UGDH in cases of undiagnosed dwarfism in humans.
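The prioritization step described above (SNVs predicted to be deleterious and present in a singleton state across all cats) can be sketched as a simple cohort filter. This assumes biallelic, VCF-style genotype strings and one impact label per variant, and treats "singleton" as exactly one ALT allele in the whole cohort. The SnpEff-style `"HIGH"` class and the dictionary layout are assumptions, since the abstract does not name the annotation tool.

```python
def alt_allele_count(genotype):
    """ALT alleles in a biallelic VCF-style genotype: '0/0' -> 0, '0|1' -> 1, '1/1' -> 2, './.' -> 0."""
    return sum(allele == "1" for allele in genotype.replace("|", "/").split("/"))


def singleton_deleterious(variants, deleterious=frozenset({"HIGH"})):
    """Return (variant_id, carrier) for variants whose ALT allele occurs exactly once
    in the cohort and whose impact class is in `deleterious`."""
    hits = []
    for v in variants:
        counts = {sample: alt_allele_count(gt) for sample, gt in v["genotypes"].items()}
        if v["impact"] in deleterious and sum(counts.values()) == 1:
            carrier = next(sample for sample, n in counts.items() if n == 1)
            hits.append((v["id"], carrier))
    return hits
```

Note that a single homozygous carrier (`1/1`) contributes two ALT alleles and is therefore not a singleton under this definition.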
Affiliation(s)
- Reuben M. Buckley
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Brian W. Davis
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley A. Brashear
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Fabiana H. G. Farias
- Department of Psychiatry, Washington University, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics, Washington University, St. Louis, Missouri, United States of America
- Kei Kuroki
- Veterinary Medical Diagnostic Laboratory, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Tina Graves
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- LaDeana W. Hillier
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Milinn Kremitzki
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Gang Li
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Patrick Minx
- Donald Danforth Plant Science Center, St Louis, Missouri, United States of America
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Leslie A. Lyons
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- William J. Murphy
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley C. Warren
- Division of Animal Sciences, School of Medicine, University of Missouri, Columbia, Missouri, United States of America
218
Yang L, Niu Q, Zhang T, Zhao G, Zhu B, Chen Y, Zhang L, Gao X, Gao H, Liu GE, Li J, Xu L. Genomic sequencing analysis reveals copy number variations and their associations with economically important traits in beef cattle. Genomics 2020; 113:812-820. [PMID: 33080318 DOI: 10.1016/j.ygeno.2020.10.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/11/2020] [Revised: 09/21/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
Copy number variation (CNV) represents a major source of genetic variation, with potentially large effects that include altering gene regulation and dosage as well as contributing to gene expression and normal phenotypic variability. We carried out a comprehensive analysis of CNVs based on whole genome sequencing of Chinese Simmental beef cattle. In total, we found 9313 deletion and 234 duplication events, covering 147.5 Mb of autosomal regions. Among them, 257 high-frequency deletion events overlapped with 193 known RefGenes, several of which were related to economically important traits such as residual feed intake, immune response, pregnancy rate and muscle differentiation. Using a locus-based analysis, we identified 11 deletions and 1 duplication significantly associated with three traits: carcass weight, tenderloin and longissimus muscle area. Our sequencing-based study provides important insights into the association of CNVs with important traits in beef cattle.
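In its simplest form, a locus-based association test of this kind regresses a trait on CNV dosage at one locus. The sketch below fits an ordinary least-squares line and reports the slope and its t-statistic using only the standard library. It assumes dosages coded 0/1/2 and omits the covariates (for example sex, farm and population structure) that a real cattle analysis would normally include; it is not the model used in the study, which the abstract does not specify.

```python
import math


def cnv_trait_association(dosage, trait):
    """Regress a trait on CNV dosage at one locus.

    dosage: copies of the CNV allele per animal (e.g. 0, 1, 2).
    Returns (slope, t_statistic); t is NaN when the fit is perfect (zero residual error).
    """
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need equal-length inputs for at least 3 animals")
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage does not vary at this locus")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, trait))
    se_slope = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se_slope if se_slope > 0 else math.nan
    return slope, t_stat
```

The t-statistic would be compared against a t distribution with n - 2 degrees of freedom, and with many loci tested a multiple-testing correction is needed before calling any association significant.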
Affiliation(s)
- Liu Yang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Qunhao Niu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tianliu Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Guoyao Zhao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Zhu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yan Chen
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lupei Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xue Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huijiang Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- George E Liu
- Animal Genomics and Improvement Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705, USA
- Junya Li
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lingyang Xu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
219
Bertolotti AC, Layer RM, Gundappa MK, Gallagher MD, Pehlivanoglu E, Nome T, Robledo D, Kent MP, Røsæg LL, Holen MM, Mulugeta TD, Ashton TJ, Hindar K, Sægrov H, Florø-Larsen B, Erkinaro J, Primmer CR, Bernatchez L, Martin SAM, Johnston IA, Sandve SR, Lien S, Macqueen DJ. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 2020; 11:5176. [PMID: 33056985 PMCID: PMC7560756 DOI: 10.1038/s41467-020-18972-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 05/26/2020] [Accepted: 09/23/2020] [Indexed: 12/25/2022] Open
Abstract
Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species. This study presents and validates a novel approach to reliably identify structural variations (SVs) in non-model genomes using whole genome sequencing, which was used to detect 15,483 SVs in 492 Atlantic salmon, shedding light on their roles in genome evolution and the genetic architecture of domestication.
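A first-pass screen for SVs whose frequency changed between wild and farmed fish can rank each SV by the absolute difference in ALT allele frequency between the two groups. The sketch below assumes diploid genotypes coded as ALT-allele counts (0, 1 or 2, with `None` for a missing call); it is a simple frequency-difference screen for illustration only, not the polygenic selection analysis used in the study, and the function names are hypothetical.

```python
import math


def alt_allele_frequency(genotypes):
    """ALT allele frequency from diploid ALT-allele counts (0, 1 or 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return math.nan
    return sum(called) / (2 * len(called))


def top_frequency_shifts(svs, k=10):
    """Rank SVs by the absolute change in ALT frequency from wild to farmed fish.

    svs: {sv_id: (wild_genotypes, farmed_genotypes)}.
    Returns up to k (sv_id, delta) pairs with delta = farmed - wild;
    SVs with no calls in either group are skipped.
    """
    deltas = []
    for sv_id, (wild, farmed) in svs.items():
        delta = alt_allele_frequency(farmed) - alt_allele_frequency(wild)
        if not math.isnan(delta):
            deltas.append((sv_id, delta))
    deltas.sort(key=lambda item: abs(item[1]), reverse=True)
    return deltas[:k]
```

Large single-locus shifts can also arise from drift or population structure, which is why a formal analysis would compare them against a genome-wide background rather than ranking raw differences.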
Affiliation(s)
- Alicia C Bertolotti
- School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen, UK.,The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Ryan M Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA.,Department of Computer Science, University of Colorado, Boulder, CO, USA
- Manu Kumar Gundappa
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Michael D Gallagher
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Ege Pehlivanoglu
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Torfinn Nome
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Diego Robledo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Matthew P Kent
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Line L Røsæg
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Matilde M Holen
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Teshome D Mulugeta
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Kjetil Hindar
- Norwegian Institute for Nature Research (NINA), P.O. Box 5685 Torgarden, 7485, Trondheim, Norway
- Bjørn Florø-Larsen
- Norwegian Veterinary Institute, P.O. Box 750 Sentrum, 0106, Oslo, Norway
- Jaakko Erkinaro
- Natural Resources Institute Finland (Luke), P.O. Box 413, FI-90014, Oulu, Finland
- Craig R Primmer
- Institute for Biotechnology, University of Helsinki, Helsinki, Finland
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS) Pavillon Charles-Eugène Marchand, Université Laval Québec, Québec, QC, Canada
- Samuel A M Martin
- School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen, UK
- Simen R Sandve
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Sigbjørn Lien
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway.
| | - Daniel J Macqueen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK.
220
Zhang Q, Zhang X, Liu J, Mao C, Chen S, Zhang Y, Leng L. Identification of copy number variation and population analysis of the sacred lotus (Nelumbo nucifera). Biosci Biotechnol Biochem 2020; 84:2037-2044. [PMID: 32594903 DOI: 10.1080/09168451.2020.1786351]
Abstract
The sacred lotus (Nelumbo nucifera) is widely cultivated in East Asia for its horticultural, agricultural, and medicinal value. Although many molecular markers have been used to infer the population genetics of the sacred lotus, large variations such as copy number variation (CNV) have not been studied until now. In this study, we applied whole-genome re-sequencing to 24 lotus accessions and used read-depth information to genotype and filter the original CNV calls. In total, 448 duplications and 4,267 deletions were identified in the final CNV set. Population structure analysis showed that the patterns revealed by CNVs and SNPs are largely consistent with each other. Our results indicate that deep sequencing followed by genotyping is a quick and straightforward way to mine CNVs from a population, and that CNVs, together with SNPs, can help us better understand the biology of the plant.
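The abstract uses read depth to genotype and filter CNV calls. A minimal sketch of the underlying read-depth idea follows, assuming a diploid genome, fixed-size windows, and that the median window depth corresponds to copy number 2. It is not the authors' pipeline and omits the GC-content and mappability corrections that practical callers apply; the function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def copy_number_states(depths: List[float]) -> List[int]:
    """Integer copy number per window, scaling depths so the median window equals 2.

    Note: Python's round() rounds halves to the nearest even integer.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(2 * d / baseline) for d in depths]


def call_segments(depths: List[float], window: int) -> List[Tuple[int, int, str]]:
    """Merge consecutive non-diploid windows of the same type into
    (start, end, type) segments; coordinates are 0-based and end-exclusive."""
    segments: List[Tuple[int, int, str]] = []
    current: Optional[List] = None  # [first_window, end_window, type]
    for i, cn in enumerate(copy_number_states(depths)):
        kind = "DEL" if cn < 2 else "DUP" if cn > 2 else None
        if current is not None and current[2] == kind:
            current[1] = i + 1  # extend the open segment
            continue
        if current is not None:  # close the open segment
            segments.append((current[0] * window, current[1] * window, current[2]))
        current = [i, i + 1, kind] if kind else None
    if current is not None:
        segments.append((current[0] * window, current[1] * window, current[2]))
    return segments


if __name__ == "__main__":
    # Hypothetical 1-kb windows: a one-copy loss, then a two-copy gain.
    print(call_segments([30, 30, 15, 15, 30, 60, 60, 30, 30], window=1000))
```

Genotyping a candidate CNV in other accessions can then reuse the same depth ratio over the candidate's interval, which is the general sense in which read depth can both genotype and filter calls.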
Affiliation(s)
- Qing Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Xueting Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Jing Liu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Chaoyi Mao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Sha Chen
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yujun Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Liang Leng
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
221
Zhuang X, Ye R, So MT, Lam WY, Karim A, Yu M, Ngo ND, Cherny SS, Tam PKH, Garcia-Barcelo MM, Tang CSM, Sham PC. A random forest-based framework for genotyping and accuracy assessment of copy number variations. NAR Genom Bioinform 2020; 2:lqaa071. [PMID: 33575619 PMCID: PMC7671382 DOI: 10.1093/nargab/lqaa071] [Received: 03/25/2020] [Revised: 08/18/2020] [Accepted: 08/26/2020]
Abstract
Detection of copy number variations (CNVs) is essential for uncovering genetic factors underlying human diseases. However, CNV detection by current methods is prone to error, and precisely identifying CNVs from paired-end whole genome sequencing (WGS) data is still challenging. Here, we present a framework, CNV-JACG, for Judging the Accuracy of CNVs and Genotyping using paired-end WGS data. CNV-JACG is based on a random forest model trained on 21 distinctive features characterizing the CNV region and its breakpoints. Using the data from the 1000 Genomes Project, Genome in a Bottle Consortium, the Human Genome Structural Variation Consortium and in-house technical replicates, we show that CNV-JACG has superior sensitivity over the latest genotyping method, SV2, particularly for the small CNVs (≤1 kb). We also demonstrate that CNV-JACG outperforms SV2 in terms of Mendelian inconsistency in trios and concordance between technical replicates. Our study suggests that CNV-JACG would be a useful tool in assessing the accuracy of CNVs to meet the ever-growing needs for uncovering the missing heritability linked to CNVs.
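CNV-JACG is evaluated partly by Mendelian inconsistency in trios and by concordance between technical replicates. The sketch below shows how those two metrics can be computed, assuming autosomal, biallelic CNVs genotyped as the number of variant alleles (0, 1 or 2). It does not reproduce the random forest model itself, and the names are hypothetical.

```python
from typing import Sequence, Tuple

# Alleles a diploid parent can transmit, given its variant-allele count.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can be built from one allele of each parent."""
    return any(a + b == child for a in TRANSMITTABLE[father] for b in TRANSMITTABLE[mother])


def mendelian_inconsistency_rate(trios: Sequence[Tuple[int, int, int]]) -> float:
    """Fraction of (father, mother, child) genotype triples that violate inheritance."""
    if not trios:
        raise ValueError("no trios supplied")
    bad = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return bad / len(trios)


def replicate_concordance(rep1: Sequence[int], rep2: Sequence[int]) -> float:
    """Fraction of CNVs given the same genotype in two technical replicates."""
    if len(rep1) != len(rep2) or not rep1:
        raise ValueError("replicates must be non-empty and the same length")
    return sum(a == b for a, b in zip(rep1, rep2)) / len(rep1)


if __name__ == "__main__":
    trios = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]
    print(mendelian_inconsistency_rate(trios))  # two of four triples are inconsistent
    print(replicate_concordance([0, 1, 2, 1], [0, 1, 1, 1]))
```

Lower inconsistency and higher concordance are only proxies for accuracy: a systematic error shared by parents and child, or by both replicates, would not be caught.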
Affiliation(s)
- Xuehan Zhuang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Rui Ye
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Man-Ting So
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Wai-Yee Lam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Anwarul Karim
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Michelle Yu
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Ngoc Diem Ngo
- National Hospital of Pediatrics, Ha Noi 100000, Vietnam
- Stacey S Cherny
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Paul Kwong-Hang Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Clara Sze-Man Tang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Pak Chung Sham
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
222
Implications of germline copy-number variations in psychiatric disorders: review of large-scale genetic studies. J Hum Genet 2020; 66:25-37. [PMID: 32958875 DOI: 10.1038/s10038-020-00838-1] [Received: 06/18/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020]
Abstract
Copy number variants (CNVs), defined as genome sequences of ≥50 bp that differ in copy number from that in a reference genome, are a common form of structural variation. Germline CNVs account for some of the missing heritability that single nucleotide polymorphisms could not account for. Recent technological advances have had a huge impact on CNV research. Microarray technology enables relatively low-cost, high-throughput, genome-wide measurements, and short-read sequencing technology enables the detection of short CNVs that cannot be detected by microarrays. As a result, large-scale genetic studies have been able to identify a variety of common and rare germline CNVs and their associations with diseases. Rare germline CNVs have been reported to be associated with neuropsychiatric disorders. In this review, we focused on germline CNVs and briefly described their functional characteristics, formation mechanisms, detection methods, related databases, and the latest findings. Finally, we introduced recent large-scale genetic studies to assess associations of CNVs with diseases, especially psychiatric disorders, and discussed the use of CNV-based animal models to investigate the molecular and cellular mechanisms underlying these disorders. The development and implementation of improved detection methods, such as long-read single-molecule sequencing, are expected to provide additional insight into the molecular basis of psychiatric disorders and other complex diseases, thus facilitating basic and clinical research on CNVs.
223
Yang L. A Practical Guide for Structural Variation Detection in the Human Genome. Curr Protoc Hum Genet 2020; 107:e103. [PMID: 32813322 PMCID: PMC7738216 DOI: 10.1002/cphg.103]
Abstract
Profiling genetic variants, including single nucleotide variants, small insertions and deletions, copy number variations, and structural variations (SVs), from both healthy individuals and individuals with disease is a key component of genetic and biomedical research. SVs are large-scale changes in the genome that involve breakage and rejoining of DNA fragments. They may affect thousands to millions of nucleotides and can lead to loss, gain, and reshuffling of genes and regulatory elements. SVs are known to affect gene expression and can potentially result in altered phenotypes and diseases. Identifying SVs in human genomes is therefore particularly important. In this review, I describe the advantages and disadvantages of the available high-throughput assays for discovering SVs, which are the most challenging genetic alterations to detect. A practical guide suggests the most suitable strategies for discovering different types of SVs, including common germline, rare, somatic, and complex variants. I also discuss factors to consider, such as cost and performance, when choosing among strategies during experimental design. Last, I present several approaches to identify potential SV artifacts caused by samples, experimental procedures, and computational analysis. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Lixing Yang
- Ben May Department for Cancer Research, Department of Human Genetics, University of Chicago, Chicago, Illinois
224
Luebeck J, Coruh C, Dehkordi SR, Lange JT, Turner KM, Deshpande V, Pai DA, Zhang C, Rajkumar U, Law JA, Mischel PS, Bafna V. AmpliconReconstructor integrates NGS and optical mapping to resolve the complex structures of focal amplifications. Nat Commun 2020; 11:4374. [PMID: 32873787 PMCID: PMC7463033 DOI: 10.1038/s41467-020-18099-z] [Received: 01/14/2020] [Accepted: 07/31/2020]
Abstract
Oncogene amplification, a major driver of cancer pathogenicity, is often mediated through focal amplification of genomic segments. Recent results implicate extrachromosomal DNA (ecDNA) as the primary driver of focal copy number amplification (fCNA), enabling gene amplification, rapid tumor evolution, and the rewiring of regulatory circuitry. Resolving an fCNA's structure is a first step in deciphering the mechanisms of its genesis and its subsequent biological consequences. We introduce a computational method, AmpliconReconstructor (AR), for integrating optical mapping (OM) of long DNA fragments (>150 kb) with next-generation sequencing (NGS) to resolve fCNAs at single-nucleotide resolution. AR uses an NGS-derived breakpoint graph alongside OM scaffolds to produce high-fidelity reconstructions. After validating its performance through multiple simulation strategies, we used AR to reconstruct fCNAs in seven cancer cell lines, revealing the complex architecture of ecDNA, a breakage-fusion-bridge, and other complex rearrangements. By reconstructing the rearrangement signatures associated with an fCNA's generative mechanism, AR enables a more thorough understanding of the origins of fCNAs.
Affiliation(s)
- Jens Luebeck
- Bioinformatics and Systems Biology Graduate Program, University of California at San Diego, La Jolla, CA, 92093, USA
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Ceyda Coruh
- Plant Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Siavash R Dehkordi
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Joshua T Lange
- Biomedical Sciences Graduate Program, University of California at San Diego, La Jolla, CA, 92093, USA
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, CA, 92093, USA
- Kristen M Turner
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, CA, 92093, USA
- Viraj Deshpande
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Dave A Pai
- Bionano Genomics, Inc., San Diego, CA, 92121, USA
- Chao Zhang
- Bioinformatics and Systems Biology Graduate Program, University of California at San Diego, La Jolla, CA, 92093, USA
- Utkrisht Rajkumar
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Julie A Law
- Plant Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Paul S Mischel
- Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, CA, 92093, USA
- Moores Cancer Center, University of California at San Diego, La Jolla, CA, 92093, USA
- Department of Pathology, University of California at San Diego, La Jolla, CA, 92093, USA
- Vineet Bafna
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
225
Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE, Bosworth C, Armstrong J, Tigyi K, Maurer N, Koren S, Sedlazeck FJ, Marschall T, Mayes S, Costa V, Zook JM, Liu KJ, Kilburn D, Sorensen M, Munson KM, Vollger MR, Monlong J, Garrison E, Eichler EE, Salama S, Haussler D, Green RE, Akeson M, Phillippy A, Miga KH, Carnevali P, Jain M, Paten B. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat Biotechnol 2020; 38:1044-1053. [PMID: 32686750 PMCID: PMC7483855 DOI: 10.1038/s41587-020-0503-6] [Received: 01/25/2020] [Accepted: 03/26/2020]
Abstract
De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
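The abstract summarizes the data with read N50, fold coverage (overall and in reads >100 kb), and a Phred-scaled quality value (QV). A short sketch of these standard definitions follows. QV is taken as -10 * log10(per-base error rate), so 99.9% identity corresponds to QV 30, matching the figures quoted above. The function names are illustrative, and coverage here is a simple total-bases-over-genome-size ratio.

```python
import math
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


def fold_coverage(lengths: Iterable[int], genome_size: int, min_length: int = 0) -> float:
    """Total bases in reads longer than min_length, divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sum(n for n in lengths if n > min_length) / genome_size


def phred_qv(identity: float) -> float:
    """Phred-scaled quality value for a per-base identity in [0, 1)."""
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must be in [0, 1)")
    return -10 * math.log10(1.0 - identity)


if __name__ == "__main__":
    reads = [2, 3, 4, 5, 6]  # toy read lengths
    print(n50(reads), fold_coverage(reads, genome_size=10))
    print(round(phred_qv(0.999), 6))
```

With real data, `min_length=100_000` would give the "coverage in reads >100 kb" figure.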
Affiliation(s)
- Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Hugh E Olsen
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Kristof Tigyi
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
- Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katy M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Erik Garrison
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Evan E Eichler
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sofie Salama
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- David Haussler
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Mark Akeson
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Adam Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Miten Jain
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
226
Meggendorfer M, Walter W, Haferlach T. WGS and WTS in leukaemia: A tool for diagnostics? Best Pract Res Clin Haematol 2020; 33:101190. [DOI: 10.1016/j.beha.2020.101190] [Received: 05/01/2020] [Accepted: 05/27/2020]
227
Shiny-SoSV: A web-based performance calculator for somatic structural variant detection. PLoS One 2020; 15:e0238108. [PMID: 32853264 PMCID: PMC7451576 DOI: 10.1371/journal.pone.0238108] [Received: 04/30/2020] [Accepted: 08/10/2020]
Abstract
Somatic structural variants are an important contributor to cancer development and evolution. Accurate detection of these complex variants from whole genome sequencing data is influenced by a multitude of parameters. However, there are currently no tools to guide study design, nor any applications that predict the performance of somatic structural variant detection. To address this gap, we developed Shiny-SoSV, a user-friendly web-based calculator for determining the impact of common variables on the sensitivity, precision and F1 score of somatic structural variant detection, including choice of variant detection tool, sequencing depth of coverage, variant allele fraction, and variant breakpoint resolution. Using simulation studies, we determined the individual and combined effects of these variables and modelled the results with a generalised additive model, allowing structural variant detection performance to be predicted for any combination of predictors. Shiny-SoSV provides an interactive and visual platform for users to easily compare the individual and combined impact of different parameters. It predicts the somatic structural variant detection performance of a proposed study design before benchwork begins. Shiny-SoSV is freely available at https://hcpcg.shinyapps.io/Shiny-SoSV with an accompanying user’s guide and example use-cases.
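Shiny-SoSV reports sensitivity, precision and F1 score. The sketch below shows the standard definitions, plus one simple way to count true positives by matching called breakpoints to known breakpoints within a tolerance, as a stand-in for breakpoint resolution. The greedy matching rule, single-chromosome positions and function names are assumptions for illustration, not the calculator's internals.

```python
from typing import Dict, List, Tuple


def match_breakpoints(truth: List[int], calls: List[int], tolerance: int) -> Tuple[int, int, int]:
    """Greedily match each call to at most one unmatched true breakpoint within
    `tolerance` bp (all positions on one chromosome). Returns (TP, FP, FN)."""
    unmatched = sorted(truth)
    tp = 0
    for call in sorted(calls):
        hit = next((t for t in unmatched if abs(t - call) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each true breakpoint may be matched only once
            tp += 1
    return tp, len(calls) - tp, len(truth) - tp


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity (recall), precision and their harmonic mean, the F1 score."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


if __name__ == "__main__":
    counts = match_breakpoints(truth=[100, 500, 900], calls=[105, 480, 2000], tolerance=10)
    print(counts, detection_metrics(*counts))
```

Widening `tolerance` turns near misses into true positives, which is one reason reported performance depends on the breakpoint resolution chosen.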
228
Detection of non-targeted transgenes by whole-genome resequencing for gene-doping control. Gene Ther 2020; 28:199-205. [PMID: 32770095 DOI: 10.1038/s41434-020-00185-y] [Received: 04/16/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020]
Abstract
Gene doping has raised concerns in human and equestrian sports and in the horseracing industry. There are two possible types of gene doping in the sports and racing industry: (1) administration of a gene-doping substance to postnatal animals and (2) generation of genetically engineered animals by modifying eggs. In this study, we aimed to identify genetically engineered animals by whole-genome resequencing (WGR) for gene-doping control. As a model of a genetically modified horse, we constructed transgenic cell lines in which the cDNA form of the erythropoietin gene (EPO) was inserted into the genome of horse fibroblasts. Genome-wide screening for non-targeted transgenes was performed to find structural variation using DELLY, which is based on split-read and paired-end algorithms, and Control-FREEC, which is based on a read-depth algorithm. The split-read algorithm of DELLY detected the EPO transgene as an intron deletion in the WGR data. In addition, single-nucleotide polymorphisms and insertions/deletions artificially introduced into the EPO transgene were identified by WGR. Therefore, genome-wide screening using WGR can contribute to gene-doping control even when the targets are unknown. This is the first study to detect transgenes as intron deletions for gene-doping detection.
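Because an inserted cDNA lacks introns, reads spanning an exon-exon junction of the transgene align to the reference as if the intron had been deleted, which is why the transgene can surface as an intron deletion. DELLY combines split-read and paired-end evidence; the sketch below illustrates only the paired-end part, using the textbook orientation and insert-size signatures for a forward-reverse (FR) library. It is a simplified illustration under stated assumptions, not DELLY's implementation: it ignores read length, mapping quality and clustering, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Label a read pair by its FR-library paired-end SV signature."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the mates by position so orientation is read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"  # ++ or -- pairs
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"  # RF pairs
    span = right_pos - left_pos  # approximate fragment span (read length ignored)
    if span > mean_insert + k * sd_insert:
        return "deletion"  # mates map too far apart
    if span < mean_insert - k * sd_insert:
        return "insertion"  # mates map too close together
    return "concordant"


if __name__ == "__main__":
    pair = ReadPair("chr1", 1000, "+", "chr1", 6000, "-")
    print(classify_pair(pair, mean_insert=400, sd_insert=50))
```

Practical callers cluster many discordant pairs supporting the same event before reporting it; a single discordant pair is usually treated as noise.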
229
Genomics of Clinal Local Adaptation in Pinus sylvestris Under Continuous Environmental and Spatial Genetic Setting. G3 (Bethesda) 2020; 10:2683-2696. [PMID: 32546502 PMCID: PMC7407466 DOI: 10.1534/g3.120.401285]
Abstract
Understanding the consequences of local adaptation for genomic diversity is a central goal in the evolutionary genetics of natural populations. In species with large, continuous geographical distributions the phenotypic signal of local adaptation is frequently clear, but the genetic basis often remains elusive. We examined the patterns of genetic diversity in Pinus sylvestris, a keystone species in many Eurasian ecosystems with a huge distribution range and decades of forestry research showing that it is locally adapted to a vast range of environmental conditions. P. sylvestris is an even more attractive subject for studying local adaptation because its population structure is weak, as shown previously and in this study. However, little is known about the molecular genetic basis of adaptation, as the massive size of gymnosperm genomes has prevented large-scale genomic surveys. Using a targeted sequencing approach, we generated a dataset that is extensive both geographically and genomically. By applying divergence-based and landscape genomics methods we identified several loci contributing to local adaptation, but only a few with large allele frequency changes across latitude. We also discovered a very large (ca. 300 Mbp) putative inversion potentially under selection, which to our knowledge is the first such discovery in conifers. Our results call for more detailed analysis of structural variation in relation to the genomic basis of local adaptation, emphasize the lack of large-effect loci contributing to local adaptation in the coding regions, and thus point to the need for more attention to multi-locus analysis of polygenic adaptation.
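As a rough illustration of what an allele frequency change across latitude means, the sketch below fits an ordinary least-squares slope of population allele frequency on latitude for one locus. This is an assumption-level stand-in: the study used divergence-based and landscape genomics methods, which are more elaborate than a raw linear cline. The function name and example values are hypothetical.

```python
from typing import Sequence


def clinal_slope(latitudes: Sequence[float], freqs: Sequence[float]) -> float:
    """OLS slope of allele frequency on latitude (frequency change per degree)."""
    if len(latitudes) != len(freqs) or len(latitudes) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(latitudes) / len(latitudes)
    mean_y = sum(freqs) / len(freqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(latitudes, freqs))
    sxx = sum((x - mean_x) ** 2 for x in latitudes)
    if sxx == 0:
        raise ValueError("latitudes must vary")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical populations: frequency rises by 0.1 every 2 degrees north.
    lat = [60.0, 62.0, 64.0, 66.0]
    freq = [0.2, 0.3, 0.4, 0.5]
    slope = clinal_slope(lat, freq)
    print(f"slope: {slope:.3f} per degree; change over range: {slope * (lat[-1] - lat[0]):.2f}")
```

Multiplying the slope by the sampled latitude range gives the total frequency change along the cline, which is the quantity that stays small for most of the adaptive loci described above.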
230
Ver Donck F, Downes K, Freson K. Strengths and limitations of high-throughput sequencing for the diagnosis of inherited bleeding and platelet disorders. J Thromb Haemost 2020; 18:1839-1845. [PMID: 32521110 DOI: 10.1111/jth.14945] [Received: 04/08/2020] [Revised: 05/27/2020] [Accepted: 05/31/2020]
Abstract
Inherited bleeding and platelet disorders (BPD) are highly heterogeneous, and their diagnosis involves a combination of clinical investigations, laboratory tests, and genetic screening. This review outlines some of the challenges that geneticists and experts in clinical hemostasis face when implementing high-throughput sequencing (HTS) for patient care. We provide an overview of the strengths and limitations of the different HTS techniques that can be used to diagnose BPD. An HTS test is cost-efficient and is expected to increase the diagnostic rate, make it possible to detect unexpected diagnoses, and shorten the turnaround time to diagnosis. On the other hand, technical shortcomings, variant interpretation difficulties, and ethical issues related to HTS for BPD are also documented. Delivering a genetic diagnosis to patients is highly desirable to improve clinical management and allow family counseling, but making incorrect assumptions about variants and providing insufficient information to patients before initiating the test could be harmful. Data sharing and improved HTS guidelines are essential to limit these major drawbacks of HTS.
Affiliation(s)
- Fabienne Ver Donck
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, Leuven, Belgium
- Kate Downes
- East Midlands and East of England Genomics Laboratory Hub, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Department of Haematology, University of Cambridge, Cambridge, UK
- Kathleen Freson
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, Leuven, Belgium
231
Rebollo R, Galvão-Ferrarini M, Gagnier L, Zhang Y, Ferraj A, Beck CR, Lorincz MC, Mager DL. Inter-Strain Epigenomic Profiling Reveals a Candidate IAP Master Copy in C3H Mice. Viruses 2020; 12:v12070783. [PMID: 32708087 PMCID: PMC7411935 DOI: 10.3390/v12070783] [Received: 06/11/2020] [Revised: 07/03/2020] [Accepted: 07/13/2020]
Abstract
Insertions of endogenous retroviruses cause a significant fraction of mutations in inbred mice, but not all strains are equally susceptible. Notably, most new Intracisternal A particle (IAP) ERV mutagenic insertions have occurred in C3H mice. We show here that strain-specific insertional polymorphic IAPs accumulate faster in C3H/HeJ mice than in other sequenced strains, and that IAP transcript levels are higher in C3H/HeJ embryonic stem (ES) cells than in other ES cells. To investigate the mechanism for high IAP activity in C3H mice, we identified 61 IAP copies in C3H/HeJ ES cells enriched with H3K4me3 (a mark of active promoters) and, among those tested, all are unmethylated in C3H/HeJ ES cells. Notably, 13 of the 61 are specific to C3H/HeJ and are members of the non-autonomous 1Δ1 IAP subfamily that is responsible for nearly all new insertions in C3H. One copy is full length with intact open reading frames and hence potentially capable of providing proteins in trans to other 1Δ1 elements. This potential “master copy” is present in other strains, including 129, but its 5’ long terminal repeat (LTR) is methylated in 129 ES cells. Thus, the unusual IAP activity in C3H may be due to reduced epigenetic repression coupled with the presence of a master copy.
Affiliation(s)
- Rita Rebollo
- Terry Fox Laboratory, British Columbia Cancer, Vancouver, BC V5Z1L3, Canada
- University of Lyon, INSA-Lyon, INRA, BF2i, UMR0203, F-69621 Villeurbanne, France
- Correspondence: R.R.; D.L.M.
- Liane Gagnier
- Terry Fox Laboratory, British Columbia Cancer, Vancouver, BC V5Z1L3, Canada
- Ying Zhang
- Terry Fox Laboratory, British Columbia Cancer, Vancouver, BC V5Z1L3, Canada
- Ardian Ferraj
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
- Christine R. Beck
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06030, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Matthew C. Lorincz
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T1Z3, Canada
- Dixie L. Mager
- Terry Fox Laboratory, British Columbia Cancer, Vancouver, BC V5Z1L3, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T1Z3, Canada
- Correspondence: R.R.; D.L.M.
232
Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, Chen Y, Hurles ME, Tyler-Smith C, Xue Y. Population Structure, Stratification, and Introgression of Human Structural Variation. Cell 2020; 182:189-199.e15. [PMID: 32531199 PMCID: PMC7369638 DOI: 10.1016/j.cell.2020.05.024] [Received: 01/10/2020] [Revised: 03/04/2020] [Accepted: 05/12/2020]
Abstract
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
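The abstract reports that 78% of variants were not identified in previous global sequencing projects, but it does not say how calls were matched between call sets. A common convention for interval SVs such as deletions and duplications is 50% reciprocal overlap, sketched below as an assumption. Insertions, which have no reference span, would need a breakpoint-distance rule instead. The names and threshold are illustrative.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Overlap as a fraction of the longer interval (i.e. the smaller of the two fractions)."""
    len_a, len_b = a_end - a_start, b_end - b_start
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(overlap / len_a, overlap / len_b)


def is_novel(call: Interval, known: Iterable[Interval], threshold: float = 0.5) -> bool:
    """True if no known SV on the same chromosome reaches the reciprocal-overlap threshold."""
    chrom, start, end = call
    return not any(
        k_chrom == chrom and reciprocal_overlap(start, end, k_start, k_end) >= threshold
        for k_chrom, k_start, k_end in known
    )


if __name__ == "__main__":
    catalogue = [("chr1", 150, 250), ("chr2", 1000, 5000)]
    calls = [("chr1", 100, 200), ("chr1", 100, 400), ("chr3", 10, 20)]
    novel = [c for c in calls if is_novel(c, catalogue)]
    print(f"{len(novel)} of {len(calls)} calls are novel: {novel}")
```

Requiring overlap in both directions stops a small call from being "matched" to a much larger known event that merely contains it.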
Affiliation(s)
- Anders Bergström
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; The Francis Crick Institute, London NW1 1AT, UK
- Beiyuan Fu
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Alistair S Dunham
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; EMBL-EBI, Hinxton CB10 1SD, UK
- Yuan Chen
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Yali Xue
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK.
233
Mehawej C, Khayat CD, Hamdan N, Chouery E, Platt CD. A family history of SCID and unrevealing WES: An approach to management and guidance of patients. Clin Immunol 2020; 218:108520. [PMID: 32629161 DOI: 10.1016/j.clim.2020.108520] [Received: 04/20/2020] [Revised: 06/12/2020] [Accepted: 06/27/2020]
Abstract
Severe Combined Immunodeficiency (SCID) is a genetically heterogeneous group of disorders characterized by severe T cell lymphopenia and defective T and B cell function. Without prompt diagnosis and early intervention, patients with SCID typically die from infection within the first year of life. Advances in molecular genetics have led to rapid and efficient diagnosis of SCID cases, particularly when paired with newborn screening. However, some cases remain unsolved, and this is of particular relevance to families that plan to have more children. Here we report a patient who died from complications of SCID in whom whole exome sequencing failed to reveal a candidate variant. We describe how Sanger sequencing of parents was used to study the genomic regions that were poorly covered by WES, and how immune phenotyping results were used in the setting of genetic counseling.
Affiliation(s)
- Cybel Mehawej
- Medical Genetics Unit, Faculty of Medicine, Saint Joseph University, Beirut, Lebanon.
- Nadine Hamdan
- Medical Genetics Unit, Faculty of Medicine, Saint Joseph University, Beirut, Lebanon
- Eliane Chouery
- Medical Genetics Unit, Faculty of Medicine, Saint Joseph University, Beirut, Lebanon
- Craig D Platt
- Division of Immunology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
234
Abstract
Atrial fibrillation is a common heart rhythm disorder that leads to an increased risk for stroke and heart failure. Atrial fibrillation is a complex disease with both environmental and genetic risk factors that contribute to the arrhythmia. Over the last decade, rapid progress has been made in identifying the genetic basis for this common condition. In this review, we provide an overview of the primary types of genetic analyses performed for atrial fibrillation, including linkage studies, genome-wide association studies, and studies of rare coding variation. With these results in mind, we aim to highlight the existing knowledge gaps and future directions for atrial fibrillation genetics research.
Affiliation(s)
- Carolina Roselli
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, MA, USA
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Patrick T. Ellinor
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA, USA
235
Jakubosky D, Smith EN, D'Antonio M, Jan Bonder M, Young Greenwald WW, D'Antonio-Chronowska A, Matsui H, Stegle O, Montgomery SB, DeBoever C, Frazer KA. Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nat Commun 2020; 11:2928. [PMID: 32522985 PMCID: PMC7287045 DOI: 10.1038/s41467-020-16481-5] [Received: 07/26/2019] [Accepted: 05/05/2020]
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42×) from 477 distinct individuals which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well characterized maps of SVs and STRs to date.
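The replicate-based reproducibility idea can be illustrated with a small sketch. It is not the authors' filtering strategy: it only computes genotype concordance between two replicate call sets, assuming each is a Python dict keyed by a hypothetical (chromosome, position, variant ID) tuple with VCF-style genotype strings. Phasing and allele order are ignored when comparing.

```python
from typing import Dict, Tuple

# Hypothetical site key: (chromosome, position, variant ID).
Site = Tuple[str, int, str]


def normalize_gt(gt: str) -> str:
    """Drop phasing and allele order so that '1|0' and '0/1' compare equal."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def replicate_concordance(rep_a: Dict[Site, str], rep_b: Dict[Site, str]) -> float:
    """Fraction of sites called in both replicates that have the same genotype."""
    shared = rep_a.keys() & rep_b.keys()
    if not shared:
        return float("nan")  # nothing to compare
    matches = sum(normalize_gt(rep_a[s]) == normalize_gt(rep_b[s]) for s in shared)
    return matches / len(shared)


# Example: two replicates of one individual (made-up calls).
rep_a = {("chr1", 1000, "DEL1"): "0/1", ("chr1", 5000, "STR7"): "1/1", ("chr2", 300, "DUP3"): "0/1"}
rep_b = {("chr1", 1000, "DEL1"): "1|0", ("chr1", 5000, "STR7"): "0/1", ("chr3", 10, "INS9"): "0/1"}
print(replicate_concordance(rep_a, rep_b))  # 0.5
```

Concordance here is computed only over sites called in both replicates; a stricter measure would also count sites present in just one replicate as discordant.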
Affiliation(s)
- David Jakubosky
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, 92093-0419, USA
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093-0419, USA
- Erin N Smith
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
- Matteo D'Antonio
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Marc Jan Bonder
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- William W Young Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Hiroko Matsui
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
- Stephen B Montgomery
- Department of Pathology, Stanford University, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Christopher DeBoever
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Kelly A Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA.
236
Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H, Watts NA, Solomonson M, O'Donnell-Luria A, Baumann A, Munshi R, Walker M, Whelan CW, Huang Y, Brookings T, Sharpe T, Stone MR, Valkanas E, Fu J, Tiao G, Laricchia KM, Ruano-Rubio V, Stevens C, Gupta N, Cusick C, Margolin L, Taylor KD, Lin HJ, Rich SS, Post WS, Chen YDI, Rotter JI, Nusbaum C, Philippakis A, Lander E, Gabriel S, Neale BM, Kathiresan S, Daly MJ, Banks E, MacArthur DG, Talkowski ME. A structural variation reference for medical and population genetics. Nature 2020; 581:444-451. [PMID: 32461652 PMCID: PMC7334194 DOI: 10.1038/s41586-020-2287-8] [Received: 03/02/2019] [Accepted: 03/31/2020]
Abstract
Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening. A large empirical assessment of sequence-resolved structural variants from 14,891 genomes across diverse global populations in the Genome Aggregation Database (gnomAD) provides a reference map for disease-association studies, population genetics, and diagnostic screening.
Affiliation(s)
- Ryan L Collins
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Jessica Alföldi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Amit V Khera
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Chelsea Lowther
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Laura D Gauthier
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harold Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Alexander Baumann
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ruchi Munshi
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Walker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yongqing Huang
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ted Brookings
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ted Sharpe
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew R Stone
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Elise Valkanas
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Jack Fu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Kristen M Laricchia
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Christine Stevens
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caroline Cusick
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lauren Margolin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Henry J Lin
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Wendy S Post
- Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Chad Nusbaum
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Cellarity Inc., Cambridge, MA, USA
- Anthony Philippakis
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Biology, MIT, Cambridge, MA, USA
- Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sekar Kathiresan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric Banks
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Australia
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
237
Sevim Bayrak C, Itan Y. Identifying disease-causing mutations in genomes of single patients by computational approaches. Hum Genet 2020; 139:769-776. [PMID: 32405658 DOI: 10.1007/s00439-020-02179-7] [Received: 03/13/2020] [Accepted: 05/05/2020]
Abstract
Over the last decade, next-generation sequencing (NGS) has been extensively used to identify new pathogenic mutations and genes causing rare genetic diseases. Efficient analysis of NGS data is not trivial and requires a technically and biologically rigorous pipeline that addresses data quality control, accurate variant filtration to minimize false positives and false negatives, and prioritization of the remaining genes based on disease genomics and physiological knowledge. This review provides a pipeline including all these steps, describes popular software for each step of the analysis, and proposes a general framework for the identification of causal mutations and genes in individual patients with rare genetic diseases.
Affiliation(s)
- Cigdem Sevim Bayrak
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, US.
- Yuval Itan
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, US; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, US
238
Lamb HJ, Ross EM, Nguyen LT, Lyons RE, Moore SS, Hayes BJ. Characterization of the poll allele in Brahman cattle using long-read Oxford Nanopore sequencing. J Anim Sci 2020; 98:5823688. [PMID: 32318708 DOI: 10.1093/jas/skaa127] [Received: 02/11/2020] [Accepted: 04/20/2020]
Abstract
Brahman cattle (Bos indicus) are well adapted to thrive in tropical environments. Since their introduction to Australia in 1933, Brahmans' ability to grow and reproduce on marginal lands has proven their value to the tropical beef industry. The poll phenotype, which describes the absence of horns, has become desirable in the cattle industry for animal welfare and handler safety reasons. The poll locus has been mapped to chromosome 1. Four alleles, each a copy number variant, have been reported at this locus in B. indicus and Bos taurus. However, the causative mutation in Brahman cattle has not been fully characterized. The Oxford Nanopore Technologies MinION sequencer was used to sequence four homozygous poll (PcPc), four homozygous horned (pp), and three heterozygous (Pcp) Brahmans to characterize the poll allele in Brahman cattle. A total of 98 Gb were sequenced, and an average coverage of 3.33× was achieved. Read N50 values ranged from 9.9 to 19 kb. Examination of the mapped reads across the poll locus revealed insertions approximately 200 bp in length in the poll animals that were absent in the horned animals. These results are consistent with the Celtic poll allele, a 212-bp duplication that replaces 10 bp. This provides direct evidence that the Celtic poll allele is segregating in the Australian Brahman population.
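Read N50, the summary statistic quoted above, can be computed with the generic definition in the following Python sketch (not code from the study): sort read lengths in descending order and return the length at which the running total first reaches half of all sequenced bases. The read lengths in the example are made up.

```python
from typing import Iterable


def read_n50(read_lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("no read lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # bases accumulated from the longest reads down
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-negative lengths


# Toy example (made-up lengths, in kb): total 38, half 19; 10 + 8 + 6 = 24 >= 19.
print(read_n50([2, 3, 4, 5, 6, 8, 10]))  # 6
```

Because N50 is weighted by bases rather than by read count, a few very long reads can raise it substantially even when most reads are short.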
Affiliation(s)
- Harrison J Lamb
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, Australia
- Elizabeth M Ross
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, Australia
- Loan T Nguyen
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, Australia
- Russell E Lyons
- Neogen Australasia, University of Queensland, Gatton, QLD, Australia
- Stephen S Moore
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, Australia
- Ben J Hayes
- Centre for Animal Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD, Australia
239
Köster J, Dijkstra LJ, Marschall T, Schönhuth A. Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery. Genome Biol 2020; 21:98. [PMID: 32345333 PMCID: PMC7187499 DOI: 10.1186/s13059-020-01993-6] [Received: 09/16/2019] [Accepted: 03/09/2020]
Affiliation(s)
- Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA; Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Louis J Dijkstra
- Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Alexander Schönhuth
- Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
240
Mérot C, Oomen RA, Tigano A, Wellenreuther M. A Roadmap for Understanding the Evolutionary Significance of Structural Genomic Variation. Trends Ecol Evol 2020; 35:561-572. [PMID: 32521241 DOI: 10.1016/j.tree.2020.03.002] [Received: 12/10/2019] [Revised: 02/25/2020] [Accepted: 03/03/2020]
Abstract
Structural genomic variants (SVs) are ubiquitous and play a major role in adaptation and speciation. Yet, comparative and population genomics have focused predominantly on gene duplications and large-effect inversions. The lack of a common framework for studying all SVs is hampering progress towards a more systematic assessment of their evolutionary significance. Here we (i) review how different types of SVs affect ecological and evolutionary processes; (ii) suggest unifying definitions and recommendations for future studies; and (iii) provide a roadmap for the integration of SVs in ecoevolutionary studies. In doing so, we lay the foundation for population genomics, theoretical, and experimental approaches to understand how the full spectrum of SVs impacts ecological and evolutionary processes.
Affiliation(s)
- Claire Mérot
- Université Laval, Institut de Biologie Intégrative des Systèmes, 1030 Avenue de la Médecine, G1V 0A6, Québec, QC, Canada.
- Rebekah A Oomen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Blindernveien 31, 0371 Oslo, Norway; Centre for Coastal Research, University of Agder, Universitetsveien 25, 4630 Kristiansand, Norway.
- Anna Tigano
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, USA.
- Maren Wellenreuther
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand; The New Zealand Institute for Plant & Food Research Ltd, Nelson, New Zealand.
241
Shin W, Kim H, Oh DY, Kim DH, Han K. Quantitative evaluation of the molecular marker using droplet digital PCR. Genomics Inform 2020; 18:e4. [PMID: 32224837 PMCID: PMC7120350 DOI: 10.5808/gi.2020.18.1.e4] [Received: 11/27/2019] [Accepted: 12/05/2019]
Abstract
Transposable elements (TEs) constitute approximately half of the bovine genome. Structural variants (SVs) created by TEs during genome evolution can serve as powerful species-specific markers because such events are rarely reversed by back mutation. In a previous study, we identified a Hanwoo-specific SV generated by a TE-associated deletion event, using conventional PCR and Sanger sequencing validation. It could be used as a molecular marker to distinguish cattle breeds (i.e., Hanwoo vs. Holstein). However, conventional PCR yields variable final copy numbers from sample to sample, which limits quantification. We therefore applied the droplet digital PCR (ddPCR) platform for accurate quantitative detection of the Hanwoo-specific SV. Although the samples showed low allele-frequency variation within the Hanwoo population, ddPCR achieved highly sensitive detection with absolute quantification. We aimed to use ddPCR for more accurate quantification than PCR. We suggest that the ddPCR platform is applicable for the quantitative evaluation of molecular markers.
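Absolute quantification in ddPCR generally rests on a Poisson correction: if a fraction p of droplets is positive, the mean number of target copies per droplet is λ = -ln(1 - p). The sketch below applies that standard correction; it is not the authors' analysis. The 0.85 nL default droplet volume is an assumption (a figure often quoted for Bio-Rad droplet systems) and should be replaced with the instrument's own value, and the allele_fraction helper, which compares an SV-allele assay with a reference assay read from the same droplets, is hypothetical.

```python
import math

DEFAULT_DROPLET_NL = 0.85  # assumed droplet volume in nanolitres; use your instrument's value


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    p = positive / total
    return -math.log1p(-p)


def copies_per_ul(positive: int, total: int, droplet_nl: float = DEFAULT_DROPLET_NL) -> float:
    """Target concentration in copies per µL of reaction."""
    return copies_per_droplet(positive, total) / (droplet_nl * 1e-3)  # nL -> µL


def allele_fraction(target_pos: int, ref_pos: int, total: int) -> float:
    """Hypothetical helper: SV-allele share when both assays are read from the same droplets."""
    lam_t = copies_per_droplet(target_pos, total)
    lam_r = copies_per_droplet(ref_pos, total)
    if lam_t + lam_r == 0:
        raise ValueError("no positive droplets in either channel")
    return lam_t / (lam_t + lam_r)  # droplet volume cancels out of the ratio


print(round(copies_per_droplet(5000, 10000), 4))  # 0.6931 (= ln 2)
print(allele_fraction(3000, 3000, 15000))  # 0.5
```

The correction matters because a positive droplet may hold more than one copy; simply counting positive droplets would underestimate concentration, increasingly so as p approaches 1.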
Affiliation(s)
- Wonseok Shin
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea; Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan 31116, Korea
- Haneul Kim
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea; Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan 31116, Korea
- Dong-Yep Oh
- Livestock Research Institute, Yeongju 36052, Korea
- Dong Hee Kim
- Department of Anesthesiology and Pain Management, Dankook University College of Medicine, Cheonan 31116, Korea
- Kyudong Han
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan 31116, Korea; Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan 31116, Korea
242
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. In the past decade, high-throughput genome-level sequencing of humans has enabled large-scale SV analyses. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
243
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
244
Liu Y, Zhang M, Sun J, Chang W, Sun M, Zhang S, Wu J. Comparison of multiple algorithms to reliably detect structural variants in pears. BMC Genomics 2020; 21:61. [PMID: 31959124 PMCID: PMC6972009 DOI: 10.1186/s12864-020-6455-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Accepted: 01/07/2020] [Indexed: 01/01/2023]
Abstract
Background Structural variations (SVs) have been reported to play an important role in genetic diversity and trait regulation. Many computer algorithms for detecting SVs have recently been developed, but the use of multiple algorithms to detect high-confidence SVs has not been studied. The most suitable sequencing depth for detecting SVs in pear is also unknown. Results In this study, a pipeline to detect SVs using next-generation and long-read sequencing data was constructed. We compared the performance of seven SV detection tools that use next-generation sequencing (NGS) data and two tools that use long-read sequencing data (SVIM and Sniffles), which are based on different algorithms. Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (> 90%). When the results of multiple SV detection tools were combined, the SVs identified by both MetaSV and IMR/DENOM, which use NGS data, were more accurate than those identified by both SVIM and Sniffles, with mean accuracies of 98.7 and 96.5%, respectively. The software packages using long-read sequencing data required fewer CPU cores and less memory and ran faster than those using NGS data. In addition, based on the performance of assembly-based algorithms using NGS data, we found that a sequencing depth of 50× is appropriate for detecting SVs in the pear genome. Conclusion This study provides strong evidence that more than one SV detection software package, each based on a different algorithm, should be used to detect SVs with higher confidence, and that long-read sequencing data are better than NGS data for SV detection. The SV detection pipeline that we have established will facilitate the study of diversity in other crops.
Affiliation(s)
- Yueyuan Liu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Mingyue Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jieying Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Wenjing Chang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Manyi Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Shaoling Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jun Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
245
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data. PeerJ 2020; 8:e8214. [PMID: 31934500 PMCID: PMC6951283 DOI: 10.7717/peerj.8214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/15/2019] [Accepted: 11/14/2019] [Indexed: 12/19/2022]
Abstract
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases, including cancer. Despite advances in whole-genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of the detected SV callsets in a Jupyter Notebook. The workflow supports easy deployment of software dependencies, configuration, and the addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
Affiliation(s)
- Luca Santuari
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Carl Shneider
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Wigard P Kloosterman
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
246
Salk JJ, Kennedy SR. Next-Generation Genotoxicology: Using Modern Sequencing Technologies to Assess Somatic Mutagenesis and Cancer Risk. Environ Mol Mutagen 2020; 61:135-151. [PMID: 31595553 PMCID: PMC7003768 DOI: 10.1002/em.22342] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 08/09/2019] [Revised: 09/20/2019] [Accepted: 09/25/2019] [Indexed: 05/09/2023]
Abstract
Mutations have a profound effect on human health, particularly through an increased risk of carcinogenesis and genetic disease. The strong correlation between mutagenesis and carcinogenesis has been a driving force behind genotoxicity research for more than 50 years. The stochastic and infrequent nature of mutagenesis makes it challenging to observe and to study. Indeed, decades have been spent developing increasingly sophisticated assays and methods to study these low-frequency genetic errors, in hopes of better predicting which chemicals may be carcinogens, understanding their mode of action, and informing guidelines to prevent undue human exposure. While effective, widely used genetic selection-based technologies have a number of limitations that have hampered major advancements in the field of genotoxicity. Emerging new tools, in the form of enhanced next-generation sequencing platforms and methods, are changing this paradigm. In this review, we discuss rapidly evolving sequencing tools and technologies, such as error-corrected sequencing and single-cell analysis, which we anticipate will fundamentally reshape the field. In addition, we consider a variety of emerging applications for these new technologies, including the detection of DNA adducts, inference of mutational processes based on genomic site and local sequence contexts, and evaluation of genome engineering fidelity, as well as other cutting-edge challenges for the next 50 years of environmental and molecular mutagenesis research. Environ. Mol. Mutagen. 61:135-151, 2020. © 2019 The Authors. Environmental and Molecular Mutagenesis published by Wiley Periodicals, Inc. on behalf of Environmental Mutagen Society.
Affiliation(s)
- Jesse J. Salk
- Department of Medicine, Division of Medical Oncology, University of Washington School of Medicine, Seattle, Washington
- TwinStrand Biosciences, Seattle, Washington
- Scott R. Kennedy
- Department of Pathology, University of Washington, Seattle, Washington
247
Abstract
PURPOSE OF REVIEW An update is presented regarding neural tube defects (NTDs), including spina bifida and anencephaly, which are among the most common serious birth defects worldwide. Decades of research suggest that no single factor is responsible for neurulation failure; rather, NTDs arise from a complex interplay of disrupted gene regulatory networks, environmental influences and epigenetic regulation. A comprehensive understanding of these dynamics is critical to advance NTD research and prevention. RECENT FINDINGS Next-generation sequencing has ushered in a new era of genomic insight into NTD pathophysiology, implicating novel gene associations with human NTD risk. Ongoing research is moving from a candidate gene approach toward genome-wide, systems-based investigations that are starting to uncover the genetic and epigenetic complexities that underlie NTD manifestation. SUMMARY Neural tube closure is critical for the formation of the human brain and spinal cord. Broader, more inclusive perspectives are emerging to identify the genetic determinants of human NTDs.
Affiliation(s)
- Paul Wolujewicz
- Center for Neurogenetics, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York, USA
248
Lindstrand A, Eisfeldt J, Pettersson M, Carvalho CMB, Kvarnung M, Grigelioniene G, Anderlid BM, Bjerin O, Gustavsson P, Hammarsjö A, Georgii-Hemming P, Iwarsson E, Johansson-Soller M, Lagerstedt-Robinson K, Lieden A, Magnusson M, Martin M, Malmgren H, Nordenskjöld M, Norling A, Sahlin E, Stranneheim H, Tham E, Wincent J, Ygberg S, Wedell A, Wirta V, Nordgren A, Lundin J, Nilsson D. From cytogenetics to cytogenomics: whole-genome sequencing as a first-line test comprehensively captures the diverse spectrum of disease-causing genetic variation underlying intellectual disability. Genome Med 2019; 11:68. [PMID: 31694722 PMCID: PMC6836550 DOI: 10.1186/s13073-019-0675-1] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 05/07/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022]
Abstract
Background Since different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test. Methods We analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n = 68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n = 156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). The bioinformatic tools we developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure. Results First, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850 kb (min 500 bp, max 155 Mb). All variants were detected. Second, we applied the same pipeline to cohort 2 and analyzed the data with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data were processed for large (> 10 kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability, as well as in genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), of which 12 were also detected by CMA. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data. Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively.
Conclusion The overall diagnostic rate of 27% was more than double that of clinical microarray analysis (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, WGS represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.
Affiliation(s)
- Anna Lindstrand
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Jesper Eisfeldt
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Maria Pettersson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Malin Kvarnung
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Giedre Grigelioniene
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Britt-Marie Anderlid
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Olof Bjerin
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Peter Gustavsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Hammarsjö
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Erik Iwarsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Maria Johansson-Soller
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Kristina Lagerstedt-Robinson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Agne Lieden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Måns Magnusson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Marcel Martin
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Helena Malmgren
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Magnus Nordenskjöld
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Ameli Norling
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Ellika Sahlin
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Henrik Stranneheim
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Emma Tham
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Josephine Wincent
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Sofia Ygberg
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anna Wedell
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Valtteri Wirta
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Science for Life Laboratory, Department of Microbiology, Tumor and Cell biology, Karolinska Institutet, Stockholm, Sweden
- Ann Nordgren
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Johanna Lundin
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Daniel Nilsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
249
|
Wijfjes RY, Smit S, de Ridder D. Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data. BMC Genomics 2019; 20:818. [PMID: 31699036 PMCID: PMC6836508 DOI: 10.1186/s12864-019-6153-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2019] [Accepted: 09/30/2019] [Indexed: 01/27/2023]
Abstract
Background Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls. Results To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions. Conclusions Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
Affiliation(s)
- Raúl Y Wijfjes
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands.
- Sandra Smit
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
250
Robinson MD, Vitek O. Benchmarking comes of age. Genome Biol 2019; 20:205. [PMID: 31597556 PMCID: PMC6785869 DOI: 10.1186/s13059-019-1846-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 11/25/2022]
Affiliation(s)
- Mark D Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland.
| | - Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.