151
James KN, Lau M, Shayan K, Lenberg J, Mardach R, Ignacio R, Halbach J, Choi L, Kumar S, Ellsworth KA. Expanding the genotypic spectrum of ACTG2-related visceral myopathy. Cold Spring Harb Mol Case Stud 2021; 7:mcs.a006085. [PMID: 33883208 PMCID: PMC8208046 DOI: 10.1101/mcs.a006085] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/11/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022]
Abstract
Visceral myopathies (VMs) encompass a spectrum of disorders characterized by chronic disruption of gastrointestinal function, with or without urinary system involvement. Pathogenic missense variation in the smooth muscle γ-actin gene (ACTG2) is associated with autosomal dominant VM. Whole-genome sequencing of an infant presenting with chronic intestinal pseudo-obstruction revealed a homozygous 187 bp (c.589_613+163del188) deletion spanning the exon 6–intron 6 boundary within ACTG2. The patient's clinical course was marked by prolonged hospitalizations, multiple surgeries, and intermittent dependence on total parenteral nutrition. This case supports the emerging understanding of allelic heterogeneity in ACTG2-related VM, in which both biallelic and monoallelic ACTG2 variants are associated with gastrointestinal dysfunction of similar severity and overlapping clinical presentations. Moreover, it illustrates the clinical utility of rapid whole-genome sequencing, which can comprehensively and precisely detect diverse types of genomic variants, including small deletions, and thereby guide clinical care decisions.
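The extent of the deletion can be checked from its coding-DNA coordinates: in HGVS-style notation, c.613+163 denotes the 163rd intronic nucleotide after exonic position c.613. The helper below is hypothetical and deliberately narrow (it is not a general HGVS parser; it accepts only an exonic start and an intronic end downstream of the same exon). It adds up the exonic and intronic parts of the deleted span.

```python
import re

# Matches only the pattern c.<start>_<exon_end>+<intron_offset>del<length>,
# e.g. "c.589_613+163del188": the deletion starts at exonic c.589 and ends
# 163 nt into the intron that follows exonic position c.613.
PATTERN = re.compile(r"c\.(\d+)_(\d+)\+(\d+)del(\d*)")


def deletion_span(hgvs: str) -> dict:
    """Split an exon-into-intron deletion into exonic and intronic lengths."""
    m = PATTERN.fullmatch(hgvs.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported notation: {hgvs!r}")
    start, exon_end, intron_offset, stated = m.groups()
    exonic = int(exon_end) - int(start) + 1  # c.589..c.613 inclusive
    intronic = int(intron_offset)            # intronic positions +1..+163
    return {
        "exonic_nt": exonic,
        "intronic_nt": intronic,
        "total_nt": exonic + intronic,
        "stated_nt": int(stated) if stated else None,  # length written after "del"
    }


print(deletion_span("c.589_613+163del188"))
# {'exonic_nt': 25, 'intronic_nt': 163, 'total_nt': 188, 'stated_nt': 188}
```

For the variant reported above, the coordinates give 25 exonic plus 163 intronic nucleotides, i.e. 188 nt, which is the length encoded by the del188 suffix.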
Affiliation(s)
- Kiely N James
- Rady Children's Institute for Genomic Medicine, San Diego, California 92123, USA
- Megan Lau
- UC San Diego School of Medicine, La Jolla, California 92093, USA
- Katayoon Shayan
- Pathology Department, Hepatology and Nutrition, Rady Children's Hospital, San Diego, California 92123, USA
- Jerica Lenberg
- Rady Children's Institute for Genomic Medicine, San Diego, California 92123, USA
- Rebecca Mardach
- Rady Children's Institute for Genomic Medicine, San Diego, California 92123, USA
- Romeo Ignacio
- Division of Pediatric Surgery, Hepatology and Nutrition, Rady Children's Hospital, San Diego, California 92123, USA
- Jonathan Halbach
- Division of Pediatric Surgery, Hepatology and Nutrition, Rady Children's Hospital, San Diego, California 92123, USA
- Lillian Choi
- Division of Gastroenterology, Hepatology and Nutrition, Rady Children's Hospital, San Diego, California 92123, USA
- Soma Kumar
- Division of Gastroenterology, Hepatology and Nutrition, Rady Children's Hospital, San Diego, California 92123, USA
152
Guo M, Li S, Zhou Y, Li M, Wen Z. Comparative Analysis for the Performance of Long-Read-Based Structural Variation Detection Pipelines in Tandem Repeat Regions. Front Pharmacol 2021; 12:658072. [PMID: 34163355 PMCID: PMC8215501 DOI: 10.3389/fphar.2021.658072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/14/2021] [Indexed: 12/04/2022]
Abstract
There has been growing recognition of the vital links between structural variations (SVs) and diverse diseases. Research suggests that, by reading much longer DNA fragments with abundant contextual information, long-read technologies have advantages in SV detection even in complex repetitive regions. Several pipelines for calling SVs from long-read sequencing data have been proposed and used in human genome research, but their performance has not yet been thoroughly explored or adequately compared. In this study, we comprehensively evaluated three commonly used long-read SV detection pipelines, namely PBSV, Sniffles and PBHoney, with particular attention to their performance in detecting SVs in tandem repeat regions (TRRs). Using a robust benchmark for germline SV detection as the gold standard, we estimated the precision, recall and F1 score of the insertions and deletions detected by each pipeline. All of these pipelines performed clearly better outside TRRs than inside them. The F1 scores of Sniffles inside and outside TRRs were 0.60 and 0.76, respectively. PBSV performed similarly to Sniffles and generally better than PBHoney. In conclusion, our findings can help in choosing appropriate pipelines in practice and complement the application of long-read sequencing technologies in rare-disease research.
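Precision, recall and F1 follow standard definitions once each call has been classified against the benchmark. The sketch below assumes that the matching step (breakpoint tolerance, size similarity and so on) has already produced true-positive, false-positive and false-negative counts; that step is tool-specific and not shown. The `Counts` type is a hypothetical helper, and the numbers in the usage line are made up, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # calls that match a benchmark SV
    fp: int  # calls with no matching benchmark SV
    fn: int  # benchmark SVs that no call matched


def precision_recall_f1(c: Counts) -> tuple:
    """Return (precision, recall, F1); each is 0.0 when its denominator is 0."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical counts for one stratum (e.g. calls inside TRRs)
print(precision_recall_f1(Counts(tp=60, fp=20, fn=60)))  # (0.75, 0.5, 0.6)
```

Reporting the metrics separately inside and outside TRRs, as the abstract does for Sniffles, amounts to running this on two sets of counts, one per stratum.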
Affiliation(s)
- Mingkun Guo
- College of Chemistry, Sichuan University, Chengdu, China
- Shihai Li
- College of Chemistry, Sichuan University, Chengdu, China
- Yifan Zhou
- College of Chemistry, Sichuan University, Chengdu, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, China
- Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, China; Medical Big Data Center, Sichuan University, Chengdu, China
153
Kobren SN, Baldridge D, Velinder M, Krier JB, LeBlanc K, Esteves C, Pusey BN, Züchner S, Blue E, Lee H, Huang A, Bastarache L, Bican A, Cogan J, Marwaha S, Alkelai A, Murdock DR, Liu P, Wegner DJ, Paul AJ, Sunyaev SR, Kohane IS. Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases. Genet Med 2021; 23:1075-1085. [PMID: 33580225 PMCID: PMC8187147 DOI: 10.1038/s41436-020-01084-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/11/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/31/2022]
Abstract
PURPOSE Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. METHODS We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. RESULTS We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. CONCLUSION The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.
Affiliation(s)
- Dustin Baldridge
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Matt Velinder
- Center for Genomic Discovery, University of Utah, Salt Lake City, UT, USA
- Joel B Krier
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kimberly LeBlanc
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cecilia Esteves
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Barbara N Pusey
- National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH), Bethesda, MD, USA
- Stephan Züchner
- Department of Human Genetics and Hussman Institute for Human Genomics, University of Miami Health System, Miami, FL, USA
- Elizabeth Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Hane Lee
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
- Alden Huang
- Department of Human Genetics, David Geffen School of Medicine at the University of California, Los Angeles, CA, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Anna Bican
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Joy Cogan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Shruti Marwaha
- Stanford Center for Undiagnosed Diseases, Stanford, CA, USA
- Anna Alkelai
- Institute for Genomic Medicine, Columbia University Medical Center, New York City, NY, USA
- David R Murdock
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Baylor Genetics, Houston, TX, USA
- Daniel J Wegner
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Alexander J Paul
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
154
Tang H, He Z. Advances and challenges in quantitative delineation of the genetic architecture of complex traits. QUANTITATIVE BIOLOGY 2021; 9:168-184. [PMID: 35492964 PMCID: PMC9053444 DOI: 10.15302/j-qb-021-0249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. Results This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. Conclusion GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights on the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA
155
Nandolo W, Mészáros G, Wurzinger M, Banda LJ, Gondwe TN, Mulindwa HA, Nakimbugwe HN, Clark EL, Woodward-Greene MJ, Liu M, Liu GE, Van Tassell CP, Rosen BD, Sölkner J. Detection of copy number variants in African goats using whole genome sequence data. BMC Genomics 2021; 22:398. [PMID: 34051743 PMCID: PMC8164248 DOI: 10.1186/s12864-021-07703-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2020] [Accepted: 05/11/2021] [Indexed: 12/21/2022]
Abstract
Background Copy number variations (CNVs) are a significant source of variation in the genome and are therefore important for genetic characterization. The aim of this study was to develop a fine-scale copy number variation map for African goats, using sequence data from multiple breeds and multiple African countries. Results A total of 253,553 CNVs (244,876 deletions and 8677 duplications) were identified, corresponding to an overall average of 1393 CNVs per animal. The mean CNV length was 3.3 kb, with a median of 1.3 kb. Some CNVs were substantially differentiated between populations, suggesting population-specific selective pressures. A total of 6231 global CNV regions (CNVRs) were found across all animals, covering 59.2 Mb (2.4%) of the goat genome. About 1.6% of the CNVRs were present in all 34 breeds, and 28.7% were present in all 5 sampled geographical areas across Africa. Genes within the CNVRs were highly enriched for important biological functions, molecular functions and cellular components, including retrograde endocannabinoid signaling, glutamatergic synapse and circadian entrainment. Conclusions This study presents the first fine-scale CNV map of African goats based on WGS data and adds to the growing body of knowledge on the genetic characterization of goats. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07703-1.
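Several of the reported figures can be cross-checked with simple arithmetic. The snippet below uses only numbers stated in the abstract; because the percentages are rounded, the CNVR counts it derives are approximations and are not figures reported by the authors.

```python
# Figures as stated in the abstract
deletions = 244_876
duplications = 8_677
reported_total = 253_553
n_cnvr = 6_231
share_all_breeds = 0.016   # "About 1.6% of the CNVR were present in all 34 breeds"
share_all_regions = 0.287  # "28.7% were present in all 5 geographical areas"

# The two CNV classes should add up to the reported total
total = deletions + duplications
assert total == reported_total

deletion_share = deletions / total
# Approximate CNVR counts implied by the rounded percentages
cnvr_all_breeds = round(n_cnvr * share_all_breeds)
cnvr_all_regions = round(n_cnvr * share_all_regions)

print(f"deletions: {deletion_share:.1%} of all CNV")           # 96.6%
print(f"CNVRs shared by all 34 breeds: ~{cnvr_all_breeds}")    # ~100
print(f"CNVRs shared by all 5 regions: ~{cnvr_all_regions}")   # ~1788
```

The mean length (3.3 kb) exceeding the median (1.3 kb) likewise indicates a right-skewed length distribution, in which a minority of long CNVs pulls the mean upward.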
Affiliation(s)
- Wilson Nandolo
- University of Natural Resources and Life Sciences, Vienna, Austria; Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
- Gábor Mészáros
- University of Natural Resources and Life Sciences, Vienna, Austria
- Maria Wurzinger
- University of Natural Resources and Life Sciences, Vienna, Austria
- Liveness J Banda
- Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
- Timothy N Gondwe
- Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
- Emily L Clark
- The Roslin Institute, University of Edinburgh, Edinburgh, Scotland, UK
- M Jennifer Woodward-Greene
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA; National Agricultural Library, USDA-ARS, Beltsville, MD, USA
- Mei Liu
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
- George E Liu
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
- Johann Sölkner
- University of Natural Resources and Life Sciences, Vienna, Austria
156
Mechanisms of Immune Escape and Resistance to Checkpoint Inhibitor Therapies in Mismatch Repair Deficient Metastatic Colorectal Cancers. Cancers (Basel) 2021; 13:cancers13112638. [PMID: 34072037 PMCID: PMC8199207 DOI: 10.3390/cancers13112638] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/14/2021] [Revised: 05/20/2021] [Accepted: 05/21/2021] [Indexed: 02/06/2023]
Abstract
Simple Summary A subset of colorectal cancers (CRCs) is characterized by mismatch repair deficiency, which is frequently associated with microsatellite instability (MSI). The compromised DNA repair machinery leads to the accumulation of tumor neoantigens, which affects the sensitivity of MSI metastatic CRC to immune checkpoint inhibitors (CPIs), both upfront and in later lines of treatment. However, up to 30% of MSI CRCs exhibit primary resistance to frontline immune-based therapy, and an additional subset develops acquired resistance. Here, we first discuss the clinical and molecular features of MSI CRCs and then review how loss of antigenicity and immunogenicity and a hostile tumor microenvironment could influence primary and acquired resistance to CPIs. Finally, we describe strategies to improve the outcome of MSI CRC patients treated with CPIs.
Abstract Immune checkpoint inhibitors (CPIs) represent an effective therapeutic strategy for several types of solid tumors and are remarkably effective in mismatch repair deficient (MMRd) tumors, including colorectal cancer (CRC). The prevalent view is that the elevated and dynamic neoantigen burden associated with the mutator phenotype of MMRd fosters enhanced immune surveillance of these cancers. In addition, recent findings suggest that MMRd tumors have increased cytosolic DNA, which triggers the cGAS-STING pathway, leading to an interferon-mediated immune response. Unfortunately, approximately 30% of MMRd CRCs exhibit primary resistance to CPIs, while a substantial fraction of tumors acquire resistance after an initial benefit. Profiling of clinical samples and preclinical studies suggest that alterations in the Wnt and JAK-STAT signaling pathways are associated with refractoriness to CPIs. Intriguingly, mutations in the antigen presentation machinery, such as loss of MHC or beta-2 microglobulin (B2M), are implicated in initial immune evasion but do not impair response to CPIs.
In this review, we outline how understanding the mechanistic basis of immune evasion and CPI resistance in MMRd CRC provides the rationale for innovative strategies to increase the subset of patients benefiting from CPIs.
157
Belyeu JR, Chowdhury M, Brown J, Pedersen BS, Cormier MJ, Quinlan AR, Layer RM. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol 2021; 22:161. [PMID: 34034781 PMCID: PMC8145817 DOI: 10.1186/s13059-021-02380-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/01/2020] [Accepted: 05/10/2021] [Indexed: 12/15/2022]
Abstract
Visual validation is an important step in minimizing false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments needed to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems, such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.
Affiliation(s)
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Murad Chowdhury
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Michael J Cormier
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
158
Valle-Inclan JE, Stangl C, de Jong AC, van Dessel LF, van Roosmalen MJ, Helmijr JCA, Renkens I, Janssen R, de Blank S, de Witte CJ, Martens JWM, Jansen MPHM, Lolkema MP, Kloosterman WP. Optimizing Nanopore sequencing-based detection of structural variants enables individualized circulating tumor DNA-based disease monitoring in cancer patients. Genome Med 2021; 13:86. [PMID: 34006333 PMCID: PMC8130429 DOI: 10.1186/s13073-021-00899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/27/2020] [Accepted: 04/27/2021] [Indexed: 12/18/2022]
Abstract
Here, we describe a novel approach for rapid discovery of a set of tumor-specific genomic structural variants (SVs), based on a combination of low-coverage cancer genome sequencing using Oxford Nanopore and an SV calling and filtering pipeline. We applied the method to tumor samples of patients with high-grade ovarian or prostate cancer and validated, on average, ten somatic SVs per patient with breakpoint-spanning PCR mini-amplicons. These SVs could be quantified in ctDNA samples of patients with metastatic prostate cancer using a digital PCR assay. The results suggest that SV dynamics correlate with, and may improve on, existing treatment-response biomarkers such as PSA. https://github.com/UMCUGenetics/SHARC
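Digital PCR quantification of the kind mentioned above conventionally applies a Poisson correction, because a positive partition may hold more than one template copy. The sketch below shows that standard calculation only; the abstract does not describe the authors' exact computation, and the partition volume parameter and the idea of normalising SV-junction copies to a reference assay are assumptions for illustration. All function names are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per partition: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_ul(positive: int, total: int, partition_nl: float) -> float:
    """Concentration in copies per microlitre; partition_nl is assay-specific (assumed)."""
    return copies_per_partition(positive, total) / (partition_nl * 1e-3)  # nL -> uL


def junction_to_reference_ratio(sv_pos: int, ref_pos: int, total: int) -> float:
    """Hypothetical normalisation: SV-junction copies relative to a reference
    locus measured over the same number of partitions."""
    ref = copies_per_partition(ref_pos, total)
    if ref == 0:
        raise ValueError("reference assay has no positive partitions")
    return copies_per_partition(sv_pos, total) / ref
```

With half of 20,000 partitions positive, the corrected estimate is ln 2, about 0.693 copies per partition, rather than the 0.5 a naive positive fraction would suggest; the correction matters most when many partitions are positive.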
Affiliation(s)
- Jose Espejo Valle-Inclan
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Christina Stangl
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Division of Molecular Oncology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Anouk C de Jong
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Lisanne F van Dessel
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Markus J van Roosmalen
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jean C A Helmijr
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Ivo Renkens
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Roel Janssen
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Sam de Blank
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Chris J de Witte
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- John W M Martens
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Maurice P H M Jansen
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Martijn P Lolkema
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
- Wigard P Kloosterman
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands; Cyclomics, Utrecht, The Netherlands; Frame Cancer Therapeutics, Amsterdam, The Netherlands
159
Cameron DL, Jacobs N, Roepman P, Priestley P, Cuppen E, Papenfuss AT. VIRUSBreakend: Viral Integration Recognition Using Single Breakends. Bioinformatics 2021; 37:3115-3119. [PMID: 33973999 PMCID: PMC8504616 DOI: 10.1093/bioinformatics/btab343] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/28/2020] [Revised: 03/25/2021] [Accepted: 05/03/2021] [Indexed: 12/17/2022]
Abstract
Motivation Integration of viruses into infected host cell DNA can cause DNA damage and disrupt genes. Recent cost reductions and the growth of whole genome sequencing have produced a wealth of data in which viral presence and integration can be detected. While key research and clinically relevant insights can be uncovered, existing software has not achieved widespread adoption, limited in part by high computational cost, an inability to detect a wide range of viruses, and insufficient precision and sensitivity. Results Here, we describe VIRUSBreakend, a high-speed tool that identifies viral DNA presence and genomic integration. It uses single breakends (breakpoints in which only one side can be unambiguously placed) in a novel virus-centric variant calling and assembly approach to identify viral integrations with high sensitivity and a near-zero false discovery rate. VIRUSBreakend detects viral integrations anywhere in the host genome, including regions such as centromeres and telomeres that existing tools cannot call. Applying VIRUSBreakend to a large metastatic cancer cohort, we demonstrate that it can reliably detect clinically relevant viral presence and integration, including HPV, HBV, MCPyV, EBV and HHV-8. Availability and implementation VIRUSBreakend is part of the Genomic Rearrangement IDentification Software Suite (GRIDSS). It is available under a GPLv3 license from https://github.com/PapenfussLab/VIRUSBreakend. Supplementary information Supplementary data are available at Bioinformatics online.
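In VCF notation (version 4.2 and later), a single breakend is written with a dot marking the side that cannot be placed, optionally with inserted bases between the anchoring reference base and the dot: for example `G.` or `GACGT.` for sequence continuing after the position, and `.G` or `.TTG` for sequence preceding it. The hypothetical helper below recognises that form and extracts the inserted bases. In a virus-centric approach such as the one described above, it is plausibly this unplaceable sequence that gets compared against viral references; the code illustrates the notation only and is not VIRUSBreakend's implementation.

```python
from typing import NamedTuple, Optional


class SingleBreakend(NamedTuple):
    side: str      # "after": unplaced sequence follows POS; "before": it precedes POS
    inserted: str  # bases between the anchoring reference base and the unplaced side


def parse_single_breakend(ref: str, alt: str) -> Optional[SingleBreakend]:
    """Return the single-breakend reading of a VCF REF/ALT pair, or None."""
    if "[" in alt or "]" in alt or len(alt) < len(ref) + 1:
        return None  # paired breakends and ordinary alleles are not single breakends
    if alt.endswith(".") and alt.startswith(ref):
        return SingleBreakend("after", alt[len(ref):-1])
    if alt.startswith(".") and alt.endswith(ref):
        return SingleBreakend("before", alt[1:len(alt) - len(ref)])
    return None


print(parse_single_breakend("G", "GACGT."))       # SingleBreakend(side='after', inserted='ACGT')
print(parse_single_breakend("G", ".TTG"))         # SingleBreakend(side='before', inserted='TT')
print(parse_single_breakend("G", "G[2:321681["))  # None
```

Keeping the inserted bases separate from the side matters because, for a single breakend, those bases are the only evidence about what lies on the unplaced side.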
Affiliation(s)
- Daniel L Cameron
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, University of Melbourne, Australia; Hartwig Medical Foundation Australia, Sydney, Australia
- Nina Jacobs
- Hartwig Medical Foundation, Amsterdam, The Netherlands
- Paul Roepman
- Hartwig Medical Foundation, Amsterdam, The Netherlands
- Edwin Cuppen
- Hartwig Medical Foundation, Amsterdam, The Netherlands; Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, Utrecht, The Netherlands
- Anthony T Papenfuss
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, University of Melbourne, Australia; Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Australia
160
Liao Z, Zhang X, Zhang S, Lin Z, Zhang X, Ming R. Structural variations in papaya genomes. BMC Genomics 2021; 22:335. [PMID: 33971825 PMCID: PMC8108470 DOI: 10.1186/s12864-021-07665-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 04/29/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND Structural variations (SVs) are a type of mutation that has not been widely detected in plant genomes, whereas studies in animals have shown their role in domestication. An in-depth study of SVs will help us further understand their impact on phenotype and environmental adaptability during papaya domestication and will provide genomic resources for developing molecular markers. RESULTS We detected a total of 8083 SVs, including 5260 deletions, 552 tandem duplications and 2271 insertions; deletions predominated, indicating the universality of deletion in the evolution of the papaya genome. The distribution of these SVs along each chromosome is non-random. A total of 1794 genes overlap with SVs, of which 1350 are expressed in at least one tissue. Weighted correlation network analysis (WGCNA) of these expressed genes reveals co-expression relationships between SV-overlapping genes and different tissues, and functional enrichment analysis shows their roles in biological growth and environmental responses. We also identified domestication-associated SV genes related to environmental adaptability, sexual reproduction and important agronomic traits. Analysis of artificially selected copy number variant genes (CNV-genes) also revealed genes associated with plant growth and environmental stress. CONCLUSIONS SVs played an indispensable role in papaya domestication, especially in the reproductive traits of hermaphrodite plants. The detection of genome-wide SVs and CNV-genes between cultivated gynodioecious populations and wild dioecious populations provides a reference for further understanding the evolutionary transition from male to hermaphrodite in papaya.
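Counting genes that overlap at least one SV, as in the 1794-gene figure above, is an interval-overlap query. The abstract does not say how the overlap was computed; the sketch below is one generic approach, assuming half-open [start, end) coordinates and hypothetical gene and SV records. It sorts each chromosome's SVs by start and keeps a running maximum of their ends, so each gene needs only one binary search.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def genes_overlapping_svs(genes, svs):
    """Return names of genes overlapping at least one SV.

    genes: iterable of (name, chrom, start, end); svs: iterable of (chrom, start, end).
    All coordinates are half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in svs:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()  # order SVs by start
        starts = [s for s, _ in ivs]
        # prefix_max_end[i] = largest end among the first i + 1 SVs
        prefix_max_end = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, prefix_max_end)

    hits = set()
    for name, chrom, g_start, g_end in genes:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        i = bisect.bisect_left(starts, g_end)  # number of SVs with start < g_end
        # among those, any SV ending after g_start overlaps the gene
        if i and prefix_max_end[i - 1] > g_start:
            hits.add(name)
    return hits


# Hypothetical records
svs = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 50, 60)]
genes = [("g1", "chr1", 150, 300), ("g2", "chr1", 200, 400),
         ("g3", "chr1", 850, 1000), ("g4", "chr2", 60, 70)]
print(sorted(genes_overlapping_svs(genes, svs)))  # ['g1', 'g3']
```

Here g2 and g4 only abut an SV, so under half-open coordinates they are not counted; a study using closed coordinates would need a different boundary test.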
Affiliation(s)
- Zhenyang Liao
- College of Life Science, Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Xunxiao Zhang
- College of Life Science, Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Shengcheng Zhang
- College of Life Science, Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Zhicong Lin
- College of Life Science, Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Xingtan Zhang
- College of Life Science, Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Ray Ming
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
161
Zhao X, Collins RL, Lee WP, Weber AM, Jun Y, Zhu Q, Weisburd B, Huang Y, Audano PA, Wang H, Walker M, Lowther C, Fu J, Gerstein MB, Devine SE, Marschall T, Korbel JO, Eichler EE, Chaisson MJP, Lee C, Mills RE, Brand H, Talkowski ME. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am J Hum Genet 2021; 108:919-928. [PMID: 33789087 PMCID: PMC8206509 DOI: 10.1016/j.ajhg.2021.03.014] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 06/24/2020] [Accepted: 03/12/2021] [Indexed: 12/13/2022]
Abstract
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
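Statements such as "9.7% of the current GRCh38 reference is defined by SD and SR" rest on a coverage calculation over annotation tracks that can overlap one another. A minimal sketch, assuming half-open (start, end) intervals and hypothetical toy coordinates (not the study's annotations), merges overlapping intervals before summing so that bases annotated as both SD and SR are counted once.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open (start, end) intervals on one sequence."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])             # start a new block
    return [(s, e) for s, e in merged]


def covered_fraction(annotations, chrom_sizes):
    """annotations: chrom -> list of (start, end) pooled from one or more tracks
    (e.g. SD and SR); chrom_sizes: chrom -> length.
    Returns the fraction of all bases covered at least once."""
    covered = sum(
        end - start
        for ivs in annotations.values()
        for start, end in merge_intervals(ivs)
    )
    return covered / sum(chrom_sizes.values())


# Hypothetical toy tracks: an SD at [0, 50) overlapping a simple repeat at [40, 100)
toy = {"chrA": [(0, 50), (40, 100), (300, 350)]}
sizes = {"chrA": 1_000, "chrB": 1_000}
print(covered_fraction(toy, sizes))  # 0.075 (naive summing would give 0.08)
```

The complementary fraction (the analogue of the 90.3% in the abstract) is one minus this value, and deletions can then be assigned to a stratum by overlap with the merged intervals before comparing technologies within each stratum.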
Affiliation(s)
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Wan-Ping Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Alexandra M Weber
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
- Yukyung Jun
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Ben Weisburd
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Yongqing Huang
- Data Sciences Platform, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Mark Walker
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Mark B Gerstein
- Yale University Medical School, Computational Biology and Bioinformatics Program, New Haven, CT 06520, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Department of Graduate Studies - Life Sciences, Ewha Womans University, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, South Korea; Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an 710061, Shaanxi, People's Republic of China
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA.
162
Abe‐Hatano C, Iida A, Kosugi S, Momozawa Y, Terao C, Ishikawa K, Okubo M, Hachiya Y, Nishida H, Nakamura K, Miyata R, Murakami C, Takahashi K, Hoshino K, Sakamoto H, Ohta S, Kubota M, Takeshita E, Ishiyama A, Nakagawa E, Sasaki M, Kato M, Matsumoto N, Kamatani Y, Kubo M, Takahashi Y, Natsume J, Inoue K, Goto Y. Whole genome sequencing of 45 Japanese patients with intellectual disability. Am J Med Genet A 2021; 185:1468-1480. [PMID: 33624935 PMCID: PMC8247954 DOI: 10.1002/ajmg.a.62138] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/28/2020] [Revised: 12/23/2020] [Accepted: 02/06/2021] [Indexed: 02/06/2023]
Abstract
Intellectual disability (ID) is characterized by significant limitations in both intellectual functioning and adaptive behaviors, originating before the age of 18 years. However, the genetic etiologies of ID are still incompletely elucidated due to the wide range of clinical and genetic heterogeneity. Whole genome sequencing (WGS) has been applied as a single-step clinical diagnostic tool for ID because it detects genetic variations with a wide range of resolution from single nucleotide variants (SNVs) to structural variants (SVs). To explore the causative genes for ID, we employed WGS in 45 patients from 44 unrelated Japanese families and performed a stepwise screening approach focusing on the coding variants in the genes. Here, we report 12 pathogenic and likely pathogenic variants: seven heterozygous variants of ADNP, SATB2, ANKRD11, PTEN, TCF4, SPAST, and KCNA2, three hemizygous variants of SMS, SLC6A8, and IQSEC2, and one homozygous variant in AGTPBP1. Of these, four were considered novel. Furthermore, a novel 76 kb deletion containing exons 1 and 2 in DYRK1A was identified. We confirmed the clinical and genetic heterogeneity and high frequency of de novo causative variants (8/12, 66.7%). This is the first report of WGS analysis in Japanese patients with ID. Our results would provide insight into the correlation between novel variants and expanded phenotypes of the disease.
Affiliation(s)
- Chihiro Abe‐Hatano
- Department of Mental Retardation and Birth Defect Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
- Department of Pediatrics, Nagoya University Graduate School of Medicine, Aichi, Japan
- Aritoshi Iida
- Medical Genome Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Keiko Ishikawa
- Medical Genome Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Mariko Okubo
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Yasuo Hachiya
- Department of Neuropediatrics, Tokyo Metropolitan Neurological Hospital, Tokyo, Japan
- Hiroya Nishida
- Department of Neuropediatrics, Tokyo Metropolitan Neurological Hospital, Tokyo, Japan
- Kazuyuki Nakamura
- Department of Pediatrics, Yamagata University Faculty of Medicine, Yamagata, Japan
- Rie Miyata
- Department of Pediatrics, Tokyo‐Kita Medical Center, Tokyo, Japan
- Chie Murakami
- Department of Pediatrics, Kitakyusyu Children's Rehabilitation Center, Fukuoka, Japan
- Kan Takahashi
- Department of Pediatrics, Ome Municipal General Hospital, Tokyo, Japan
- Kyoko Hoshino
- Department of Pediatrics, Minami Wakayama Medical Center, Wakayama, Japan
- Haruko Sakamoto
- Department of Neonatology, Japanese Red Cross Osaka Hospital, Osaka, Japan
- Sayaka Ohta
- Division of Neurology, National Center for Child Health and Development, Tokyo, Japan
- Masaya Kubota
- Division of Neurology, National Center for Child Health and Development, Tokyo, Japan
- Eri Takeshita
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Akihiko Ishiyama
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Eiji Nakagawa
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Masayuki Sasaki
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Mitsuhiro Kato
- Department of Pediatrics, Yamagata University Faculty of Medicine, Yamagata, Japan
- Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Kanagawa, Japan
- Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Michiaki Kubo
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Yoshiyuki Takahashi
- Department of Pediatrics, Nagoya University Graduate School of Medicine, Aichi, Japan
- Jun Natsume
- Department of Pediatrics, Nagoya University Graduate School of Medicine, Aichi, Japan
- Ken Inoue
- Department of Mental Retardation and Birth Defect Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
- Yu‐Ichi Goto
- Department of Mental Retardation and Birth Defect Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
- Medical Genome Center, National Center of Neurology and Psychiatry, Tokyo, Japan
163
Yan C, He J, Luo J, Wang J, Zhang G, Luo H. SIns: A Novel Insertion Detection Approach Based on Soft-Clipped Reads. Front Genet 2021; 12:665812. [PMID: 33995493 PMCID: PMC8120196 DOI: 10.3389/fgene.2021.665812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 04/06/2021] [Indexed: 11/13/2022]
Abstract
As a common type of structural variation, an insertion is the addition of a DNA sequence into an individual genome, and insertions have been associated with a number of inherited diseases. Many methods for detecting insertions have been proposed in recent years, but accurate insertion calling remains a challenging task. In this study, we propose SIns, a novel insertion detection approach based on soft-clipped reads. First, based on the alignments of paired reads to the reference genome, SIns extracts breakpoints from soft-clipped reads and determines insertion locations. The insert-size information of the paired reads is then clustered to determine the genotype, and SIns subsequently uses Minia to assemble the insertion sequences. Experimental results show that SIns achieves a higher F-score than other methods on both simulated and real datasets.
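The breakpoint-extraction step described above can be illustrated with a toy sketch. This is not SIns's code: the `Read` record, the `min_clip`, `window` and `min_support` parameters, and the greedy clustering are hypothetical choices made for illustration, and positions are assumed to be 0-based alignment starts as in SAM/BAM-derived records. A right-clipped read contributes the position just past its last aligned base, a left-clipped read contributes its alignment start, and nearby positions are grouped into candidate insertion sites.

```python
import re
from collections import namedtuple

# Hypothetical read record: 0-based leftmost aligned reference position + CIGAR string.
Read = namedtuple("Read", ["pos", "cigar"])

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clip_breakpoints(read, min_clip=5):
    """Reference positions at which `read` is soft-clipped by at least `min_clip` bases."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(read.cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no sequence; drop them
    if not ops:
        return []
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    first_len, first_op = ops[0]
    last_len, last_op = ops[-1]
    points = []
    if first_op == "S" and first_len >= min_clip:
        points.append(read.pos)  # left clip: junction lies just before the first aligned base
    if len(ops) > 1 and last_op == "S" and last_len >= min_clip:
        points.append(read.pos + ref_len)  # right clip: junction lies just after the last aligned base
    return points


def cluster_breakpoints(reads, window=10, min_support=3, min_clip=5):
    """Greedily group sorted clip positions that lie within `window` bp of the previous one.

    Returns (upper-median position, number of supporting clips) for each cluster
    with at least `min_support` clips.
    """
    points = sorted(p for r in reads for p in soft_clip_breakpoints(r, min_clip))
    clusters, current = [], []
    for p in points:
        if current and p - current[-1] > window:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return [(c[len(c) // 2], len(c)) for c in clusters if len(c) >= min_support]


reads = [
    Read(1000, "70M30S"),  # right-clipped at 1000 + 70 = 1070
    Read(1010, "60M40S"),  # right-clipped at 1010 + 60 = 1070
    Read(1070, "30S70M"),  # left-clipped at 1070
    Read(1071, "25S75M"),  # left-clipped at 1071
    Read(5000, "100M"),    # fully aligned: no clip
]
print(cluster_breakpoints(reads))  # [(1070, 4)]
```

An insertion typically produces right-clipped reads on its left flank and left-clipped reads on its right flank that meet at the same junction, which is why both clip orientations are pooled here. A practical caller would also weigh mapping quality, clipped-sequence content and discordant pairs; per the abstract, SIns additionally clusters insert sizes for genotyping and assembles the inserted sequence with Minia, steps not modeled in this sketch.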
Affiliation(s)
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junyi He
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
164
Sonehara K, Okada Y. Obelisc: an identical-by-descent mapping tool based on SNP streak. Bioinformatics 2021; 36:5567-5570. [PMID: 33135050 PMCID: PMC8023673 DOI: 10.1093/bioinformatics/btaa940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2020] [Revised: 09/30/2020] [Accepted: 10/26/2020] [Indexed: 11/19/2022]
Abstract
Motivation: Genetic linkage analysis has made a major contribution to the genetic mapping of Mendelian diseases. However, most previously available linkage analysis methods have limited applicability. Because parametric linkage analysis requires a predefined inheritance model with a fixed set of parameters, it cannot be applied without fully structured pedigree information, and its results depend on the specified model parameters. Non-parametric linkage analysis avoids these problems, but runs of homozygosity (ROH) mapping, a widely used non-parametric method, can only handle recessive inheritance. A non-parametric linkage analysis method that can handle both dominant and recessive inheritance has therefore been needed. Results: We have developed Obelisc (Observational linkage scan), a flexibly applicable, user-friendly non-parametric linkage analysis tool that also provides an intuitive visualization of the analytical results. Obelisc is based on the SNP streak approach, which does not require any predefined inheritance model with parameters. In contrast to ROH mapping, the SNP streak approach is applicable to both dominant and recessive traits. To illustrate the performance of Obelisc, we generated a pseudo-pedigree from the publicly available BioBank Japan Project genome-wide genotype dataset (n > 180 000). By applying Obelisc to this pseudo-pedigree, we identified regions with inherited identical-by-descent haplotypes shared among the members of the pseudo-pedigree, and these regions were validated by a population-based haplotype phasing approach. Availability and implementation: Obelisc is freely available at https://github.com/qsonehara/Obelisc as a python package with example datasets. Supplementary information: Supplementary data are available at Bioinformatics online.
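The SNP streak idea can be sketched in a few lines. In this simplified reading (an assumption on our part, not Obelisc's implementation or API), genotypes are coded as alternate-allele counts 0/1/2, a SNP is compatible with a shared identical-by-descent haplotype among a group of individuals if no two of them are opposite homozygotes (0 versus 2), and a streak is a maximal run of consecutive compatible SNPs. Sharing a haplotype excludes opposite homozygosity whether the trait is dominant or recessive, so unusually long streaks flag candidate IBD segments. The functions `compatible` and `snp_streaks` and the toy genotype matrix are hypothetical.

```python
from typing import List, Sequence


def compatible(column: Sequence[int]) -> bool:
    """True if no two individuals are opposite homozygotes (genotype 0 vs 2) at this SNP."""
    return not (0 in column and 2 in column)


def snp_streaks(genotypes: List[List[int]]) -> List[int]:
    """For each SNP, the length of the run of consecutive compatible SNPs that contains it.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of individual i at SNP j,
    with SNPs in chromosomal order. Incompatible SNPs get a streak length of 0.
    """
    n_snps = len(genotypes[0])
    ok = [compatible([row[j] for row in genotypes]) for j in range(n_snps)]
    streaks = [0] * n_snps
    j = 0
    while j < n_snps:
        if not ok[j]:
            j += 1
            continue
        start = j
        while j < n_snps and ok[j]:
            j += 1
        for k in range(start, j):
            streaks[k] = j - start  # every SNP in the run is assigned the run's length
    return streaks


# Toy group of three individuals genotyped at eight SNPs.
group = [
    [0, 1, 1, 2, 1, 0, 2, 1],
    [2, 1, 2, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 2, 1, 1, 0],
]
print(snp_streaks(group))  # [0, 5, 5, 5, 5, 5, 0, 1]
```

On real data one would also need to handle missing genotypes, measure streaks in physical or genetic distance rather than SNP counts, and compare streak lengths against what unrelated individuals produce by chance; none of that is attempted here.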
Affiliation(s)
- Kyuto Sonehara
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan; Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita 565-0871, Japan; Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
165
Feng Y, McQuillan MA, Tishkoff SA. Evolutionary genetics of skin pigmentation in African populations. Hum Mol Genet 2021; 30:R88-R97. [PMID: 33438000 PMCID: PMC8117430 DOI: 10.1093/hmg/ddab007] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/23/2020] [Revised: 01/07/2021] [Accepted: 01/07/2021] [Indexed: 12/14/2022]
Abstract
Skin color is a highly heritable human trait, and global variation in skin pigmentation has been shaped by natural selection, migration and admixture. Ethnically diverse African populations harbor extremely high levels of genetic and phenotypic diversity, and skin pigmentation varies widely across Africa. Recent genome-wide genetic studies of skin pigmentation in African populations have advanced our understanding of pigmentation biology and human evolutionary history. For example, novel roles in skin pigmentation for loci near MFSD12 and DDB1 have recently been identified in African populations. However, due to an underrepresentation of Africans in human genetic studies, there is still much to learn about the evolutionary genetics of skin pigmentation. Here, we summarize recent progress in skin pigmentation genetics in Africans and discuss the importance of including more ethnically diverse African populations in future genetic studies. In addition, we discuss methods for functional validation of adaptive variants related to skin pigmentation.
Affiliation(s)
- Yuanqing Feng
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michael A McQuillan
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A Tishkoff
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
166
Göktay M, Fulgione A, Hancock AM. A New Catalog of Structural Variants in 1,301 A. thaliana Lines from Africa, Eurasia, and North America Reveals a Signature of Balancing Selection at Defense Response Genes. Mol Biol Evol 2021; 38:1498-1511. [PMID: 33247723 PMCID: PMC8042739 DOI: 10.1093/molbev/msaa309] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/22/2022]
Abstract
Genomic variation in the model plant Arabidopsis thaliana has been extensively used to understand evolutionary processes in natural populations, mainly focusing on single-nucleotide polymorphisms. Conversely, structural variation has been largely ignored in spite of its potential to dramatically affect phenotype. Here, we identify 155,440 indels and structural variants ranging in size from 1 bp to 10 kb, including presence/absence variants (PAVs), inversions, and tandem duplications in 1,301 A. thaliana natural accessions from Morocco, Madeira, Europe, Asia, and North America. We show evidence for strong purifying selection on PAVs in genes, in particular for housekeeping genes and homeobox genes, and we find that PAVs are concentrated in defense-related genes (R-genes, secondary metabolites) and F-box genes. This implies the presence of a "core" genome underlying basic cellular processes and a "flexible" genome that includes genes that may be important in spatially or temporally varying selection. Further, we find an excess of intermediate frequency PAVs in defense response genes in nearly all populations studied, consistent with a history of balancing selection on this class of genes. Finally, we find that PAVs in genes involved in the cold requirement for flowering (vernalization) and drought response are strongly associated with temperature at the sites of origin.
Affiliation(s)
- Mehmet Göktay
- Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Andrea Fulgione
- Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Angela M Hancock
- Max Planck Institute for Plant Breeding Research, Cologne, Germany
167
Belyeu JR, Brand H, Wang H, Zhao X, Pedersen BS, Feusier J, Gupta M, Nicholas TJ, Brown J, Baird L, Devlin B, Sanders SJ, Jorde LB, Talkowski ME, Quinlan AR. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families. Am J Hum Genet 2021; 108:597-607. [PMID: 33675682 PMCID: PMC8059337 DOI: 10.1016/j.ajhg.2021.02.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/04/2020] [Accepted: 02/12/2021] [Indexed: 01/05/2023]
Abstract
Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.
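The relative figures implied by the abstract can be made explicit with simple arithmetic (our own calculation on the reported point estimates; it ignores confidence intervals and assumes that the de novo SVs not assigned to the father, about 27%, are all of maternal origin):

$$\frac{0.206}{0.160} \approx 1.29, \qquad \frac{0.73}{1 - 0.73} = \frac{0.73}{0.27} \approx 2.7$$

That is, ASD-affected individuals carry about 29% more detected de novo SVs per genome than unaffected individuals, and paternally derived de novo SVs outnumber maternally derived ones by roughly 2.7 to 1.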
Affiliation(s)
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
- Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Julie Feusier
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
- Meenal Gupta
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Lisa Baird
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA
- Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA.
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA.
168
Vaughn JN, Korani W, Stein JC, Edwards JD, Peterson DG, Simpson SA, Youngblood RC, Grimwood J, Chougule K, Ware DH, McClung AM, Scheffler BE. Gene disruption by structural mutations drives selection in US rice breeding over the last century. PLoS Genet 2021; 17:e1009389. [PMID: 33735256 PMCID: PMC7971508 DOI: 10.1371/journal.pgen.1009389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2020] [Accepted: 01/28/2021] [Indexed: 12/30/2022]
Abstract
The genetic basis of general plant vigor is of major interest to food producers, yet the trait is recalcitrant to genetic mapping because of the number of loci involved, their small effects, and linkage. Observations of heterosis in many crops suggest that recessive, malfunctioning versions of genes are a major cause of poor performance, yet we have little information on the mutational spectrum underlying these disruptions. To address this question, we generated a long-read assembly of a tropical japonica rice (Oryza sativa) variety, Carolina Gold, which allowed us to identify structural mutations (>50 bp) and orient them with respect to their ancestral state using the outgroup Oryza glaberrima. Supporting prior work, we find substantial genome expansion in the sativa branch. While transposable elements (TEs) account for the largest share of size variation, the majority of events are not directly TE-mediated. Tandem duplications are the most common source of insertions and are highly enriched among 50-200 bp mutations. To explore the relative impact of various mutational classes on crop fitness, we then track these structural events over the last century of US rice improvement using 101 resequenced varieties. Within this material, a pattern of temporary hybridization between medium- and long-grain varieties was followed by recent divergence. During this long-term selection, structural mutations that impact gene exons have been removed at a greater rate than intronic indels and single-nucleotide mutations. These results support the use of ab initio estimates of mutational burden, based on structural data, as an orthogonal predictor in genomic selection.
Some crop varieties have superior performance across years and environments. In hybrids, harmful mutations in one parent are masked by the ancestral alleles in the other parent, resulting in increased vigor. Unfortunately, these mutations are very difficult to identify precisely because, individually, they have only a small effect. In this study, we use long-read sequencing to characterize the entire mutational spectrum between two rice varieties. We then track these mutations through the last century of rice breeding. We show that large structural mutations in exons are selected against at a greater rate than any other mutational class. These findings illuminate the nature of deleterious alleles and will guide attempts to predict variety vigor based solely on genomic information.
Affiliation(s)
- Justin N. Vaughn
- USDA-ARS, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi, United States of America
- University of Georgia, Athens, Institute of Plant Breeding, Genetics, and Genomics, Athens, Georgia, United States of America
- Walid Korani
- University of Georgia, Athens, Institute of Plant Breeding, Genetics, and Genomics, Athens, Georgia, United States of America
- Joshua C. Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Jeremy D. Edwards
- USDA-ARS, Dale Bumpers National Rice Research Center, Stuttgart, Arkansas, United States of America
- Daniel G. Peterson
- Mississippi State University, Institute for Genomics, Biocomputing & Biotechnology, Starkville, Mississippi, United States of America
- Sheron A. Simpson
- USDA-ARS, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi, United States of America
- Ramey C. Youngblood
- Mississippi State University, Institute for Genomics, Biocomputing & Biotechnology, Starkville, Mississippi, United States of America
- Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America
- Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen H. Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
- Anna M. McClung
- USDA-ARS, Dale Bumpers National Rice Research Center, Stuttgart, Arkansas, United States of America
- Brian E. Scheffler
- USDA-ARS, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi, United States of America
169
Savarese M, Välipakka S, Johari M, Hackman P, Udd B. Is Gene-Size an Issue for the Diagnosis of Skeletal Muscle Disorders? J Neuromuscul Dis 2021; 7:203-216. [PMID: 32176652 PMCID: PMC7369045 DOI: 10.3233/jnd-190459] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022]
Abstract
Human genes vary in length. Genes with an extraordinarily long coding sequence and a high number of exons were almost impossible to sequence using the traditional Sanger-based gene-by-gene approach. High-throughput sequencing has partly overcome the size-related technical issues, enabling a straightforward, rapid and relatively inexpensive analysis of large genes. Several large genes (e.g. TTN, NEB, RYR1, DMD) are recognized as disease-causing in patients with skeletal muscle diseases. However, because of their sheer size, the clinical interpretation of variants in these genes is probably the most challenging aspect of high-throughput genetic investigation in the field of skeletal muscle diseases. The main aim of this review is to discuss the technical and interpretative issues related to the diagnostic investigation of large genes and to reflect upon the current state of the art and the future advancements in the field.
Affiliation(s)
- Marco Savarese
- Folkhälsan Research Center, Helsinki, Finland; Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Salla Välipakka
- Folkhälsan Research Center, Helsinki, Finland; Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Mridul Johari
- Folkhälsan Research Center, Helsinki, Finland; Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Peter Hackman
- Folkhälsan Research Center, Helsinki, Finland; Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland
- Bjarne Udd
- Folkhälsan Research Center, Helsinki, Finland; Department of Medical Genetics, Medicum, University of Helsinki, Helsinki, Finland; Neuromuscular Research Center, Tampere University and University Hospital, Tampere, Finland; Department of Neurology, Vaasa Central Hospital, Vaasa, Finland
170
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023]
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types and biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets to enable the use of tumor-specific SVs in precision oncology.
Affiliation(s)
- Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
171
Robust Benchmark Structural Variant Calls of An Asian Using the State-of-art Long Fragment Sequencing Technologies. Genomics Proteomics Bioinformatics 2021; 20:192-204. [PMID: 33662625 PMCID: PMC9510867 DOI: 10.1016/j.gpb.2020.10.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/22/2020] [Revised: 09/17/2020] [Accepted: 12/26/2020] [Indexed: 12/12/2022]
Abstract
The importance of structural variants (SVs) for human phenotypes and diseases is now recognized. Although a variety of SV detection platforms and strategies that vary in sensitivity and specificity have been developed, few benchmarking procedures are available to confidently assess their performances in biological and clinical research. To facilitate the validation and application of these SV detection approaches, we established an Asian reference material by characterizing the genome of an Epstein-Barr virus (EBV)-immortalized B lymphocyte line along with identified benchmark regions and high-confidence SV calls. We established a high-confidence SV callset with 8938 SVs by integrating four alignment-based SV callers, applied to 109× Pacific Biosciences (PacBio) continuous long reads (CLRs), 22× PacBio circular consensus sequencing (CCS) reads, 104× Oxford Nanopore Technologies (ONT) long reads, and 114× Bionano optical mapping data, together with one de novo assembly-based SV caller using CCS reads. A total of 544 randomly selected SVs were validated by PCR amplification and Sanger sequencing, demonstrating the robustness of our SV calls. Combining trio-binning-based haplotype assemblies, we established an SV benchmark for identifying false negatives and false positives by constructing the continuous high-confidence regions (CHCRs), which covered 1.46 gigabase pairs (Gb) and 6882 SVs supported by at least one diploid haplotype assembly. Establishing high-confidence SV calls for a benchmark sample that has been characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical research.
172
Weisheit I, Kroeger JA, Malik R, Wefers B, Lichtner P, Wurst W, Dichgans M, Paquet D. Simple and reliable detection of CRISPR-induced on-target effects by qgPCR and SNP genotyping. Nat Protoc 2021; 16:1714-1739. [PMID: 33597771 DOI: 10.1038/s41596-020-00481-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/23/2020] [Accepted: 12/09/2020] [Indexed: 01/31/2023]
Abstract
The recent CRISPR revolution has provided researchers with powerful tools to perform genome editing in a variety of organisms. However, recent reports indicate widespread occurrence of unintended CRISPR-induced on-target effects (OnTEs) at the edited site in mice and human induced pluripotent stem cells (iPSCs) that escape standard quality controls. By altering gene expression of targeted or neighbouring genes, OnTEs can severely affect phenotypes of CRISPR-edited cells and organisms and thus lead to data misinterpretation, which can undermine the reliability of CRISPR-based studies. Here we describe a broadly applicable framework for detecting OnTEs in genome-edited cells and organisms after non-homologous end joining-mediated and homology-directed repair-mediated editing. Our protocol enables identification of OnTEs such as large deletions, large insertions, rearrangements or loss of heterozygosity (LOH). This is achieved by subjecting genomic DNA first to quantitative genotyping PCR (qgPCR), which determines the number of intact alleles at the target site using the same PCR amplicon that has been optimized for genotyping. This combination of genotyping and quantitation makes it possible to exclude clones with monoallelic OnTEs and hemizygous editing, which are often mischaracterized as correctly edited in standard Sanger sequencing. Second, occurrence of LOH around the edited locus is detected by genotyping neighbouring single-nucleotide polymorphisms (SNPs), using either a Sanger sequencing-based method or SNP microarrays. All steps are optimized to maximize simplicity and minimize cost to promote wide dissemination and applicability across the field. The entire protocol from genomic DNA extraction to OnTE exclusion can be performed in 6-9 d.
Affiliation(s)
- Isabel Weisheit
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany
- Graduate School of Systemic Neurosciences, LMU Munich, Planegg-Martinsried, Germany
- Joseph A Kroeger
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany
- Graduate School of Systemic Neurosciences, LMU Munich, Planegg-Martinsried, Germany
- Rainer Malik
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany
- Benedikt Wefers
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany
- Institute of Developmental Genetics (IDG), HelmholtzZentrum München, Neuherberg, Germany
- Peter Lichtner
- Core Facility NGS, HelmholtzZentrum München, Neuherberg, Germany
- Wolfgang Wurst
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany
- Institute of Developmental Genetics (IDG), HelmholtzZentrum München, Neuherberg, Germany
- Technische Universität München-Weihenstephan, Neuherberg, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
- Martin Dichgans
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany
- German Center for Neurodegenerative Diseases (DZNE) Munich, Munich, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
- Dominik Paquet
- Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany
- Graduate School of Systemic Neurosciences, LMU Munich, Planegg-Martinsried, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
173
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing. Methods Mol Biol 2021; 2243:1-25. [PMID: 33606250 DOI: 10.1007/978-1-0716-1103-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is now feasible to interrogate all major classes of variation in an individual across the entire genome for less than $1000 USD. While the generation of patient sequence information using these technologies has become routine, the analysis and interpretation of these data remain the greatest obstacle to widespread clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant information required for clinical report generation. We discuss methods to detect each variant class and describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we describe a sample workflow for synthesizing large amounts of genetic information into concise clinical reports.
174
Wang C, Wallerman O, Arendt ML, Sundström E, Karlsson Å, Nordin J, Mäkeläinen S, Pielberg GR, Hanson J, Ohlsson Å, Saellström S, Rönnberg H, Ljungvall I, Häggström J, Bergström TF, Hedhammar Å, Meadows JRS, Lindblad-Toh K. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun Biol 2021; 4:185. [PMID: 33568770 PMCID: PMC7875987 DOI: 10.1038/s42003-021-01698-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/12/2020] [Accepted: 12/17/2020] [Indexed: 12/13/2022]
Abstract
We present GSD_1.0, a high-quality domestic dog reference genome with chromosome-length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with newly generated and existing long- and short-read RNA-seq, miRNA-seq and ATAC-seq data revealed that 32.1% of lifted-over CanFam3.1 gaps harboured previously hidden functional elements in GSD_1.0, including promoters, genes and miRNAs. A catalogue of canine "dark" regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein-coding genes showed that 1.4% of these could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.
Affiliation(s)
- Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
- Ola Wallerman
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Maja-Louise Arendt
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Veterinary Clinical Sciences, University of Copenhagen, Frederiksberg D, Denmark
- Elisabeth Sundström
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Jessika Nordin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Suvi Mäkeläinen
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Gerli Rosengren Pielberg
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Jeanette Hanson
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Åsa Ohlsson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Sara Saellström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Henrik Rönnberg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ingrid Ljungvall
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jens Häggström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Tomas F Bergström
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Åke Hedhammar
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
175
Xu P, Chen Y, Gao M, Chong Z. ClipSV: improving structural variation detection by read extension, spliced alignment and tree-based decision rules. NAR Genom Bioinform 2021; 3:lqab003. [PMID: 33554118 PMCID: PMC7850140 DOI: 10.1093/nargab/lqab003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2020] [Revised: 12/02/2020] [Accepted: 01/05/2021] [Indexed: 11/14/2022]
Abstract
Structural variation (SV), genomic variation ranging from 50 to millions of base pairs, has considerable impact on human diseases, complex traits and evolution. Accurately detecting SV is a fundamental step toward characterizing the features of individual genomes. Several methods have been proposed to detect SVs from next-generation sequencing (NGS) data. However, because of short read lengths and the complexity of SV content, SV detection tools are still limited by low sensitivity, especially for insertions. In this study, we developed a novel tool, ClipSV, to improve SV discovery. ClipSV uses a read extension and spliced alignment approach to overcome the limitation of read length. By reconstructing long sequences from SV-associated short reads, ClipSV discovers deletions and short insertions from the long sequence alignments. To comprehensively characterize insertions, ClipSV implements tree-based decision rules that can efficiently utilize SV-containing reads. In evaluations on both simulated and real sequencing data, ClipSV showed better overall performance than currently popular tools, especially for insertion detection. As NGS platforms represent the mainstream sequencing capacity for routine genomic applications, we anticipate that ClipSV will serve as an important tool for SV characterization in future genomic studies.
Affiliation(s)
- Peng Xu
- Department of Genetics, the University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Yu Chen
- Department of Genetics, the University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Min Gao
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Zechen Chong
- Department of Genetics, the University of Alabama at Birmingham, Birmingham, AL, 35294, USA
176
Ohori S, Tsuburaya RS, Kinoshita M, Miyagi E, Mizuguchi T, Mitsuhashi S, Frith MC, Matsumoto N. Long-read whole-genome sequencing identified a partial MBD5 deletion in an exome-negative patient with neurodevelopmental disorder. J Hum Genet 2021; 66:697-705. [PMID: 33510365 DOI: 10.1038/s10038-020-00893-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/02/2020] [Revised: 11/02/2020] [Accepted: 11/09/2020] [Indexed: 12/14/2022]
Abstract
Whole-exome sequencing (WES) can detect not only single-nucleotide variants in causal genes but also, using several methods, pathogenic copy-number variations. However, pathogenic variants in genomic regions outside the WES target (e.g., promoters) may be overlooked, leaving many patients undiagnosed. Whole-genome sequencing (WGS) can potentially analyze such regions. We applied long-read nanopore WGS and our recently developed analysis pipeline "dnarrange" to a patient who was undiagnosed by trio-based WES analysis, and identified a heterozygous 97-kb deletion partially involving 5'-untranslated exons of MBD5, which lay outside the WES target regions. The phenotype of the patient, a 32-year-old male, was consistent with haploinsufficiency of MBD5, and the MBD5 transcript level in the patient's lymphoblastoid cells was reduced. We therefore concluded that the partial MBD5 deletion is the cause of this patient's disorder. We also found other rare structural variations (SVs) in this patient, i.e., a large inversion and a retrotransposon insertion, which were not seen in 33 controls. Although we considered these SVs benign, this finding suggests that our pipeline using long-read WGS is useful for investigating various types of potentially pathogenic SVs. In conclusion, we identified a 97-kb deletion that causes haploinsufficiency of MBD5 in a patient with neurodevelopmental disorder, demonstrating that long-read WGS is a powerful technique for discovering pathogenic SVs.
Affiliation(s)
- Sachiko Ohori
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan; Department of Obstetrics and Gynecology, Yokohama City University School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa, 236-0004, Japan
- Rie S Tsuburaya
- Department of Pediatric Neurology, National Hospital Organization Utano National Hospital, 8 Ondoyamacho, Ukyo-ku, Kyoto, 616-8255, Japan
- Masako Kinoshita
- Department of Neurology, National Hospital Organization Utano National Hospital, 8 Ondoyamacho, Ukyo-ku, Kyoto, 616-8255, Japan
- Etsuko Miyagi
- Department of Obstetrics and Gynecology, Yokohama City University School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa, 236-0004, Japan
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
- Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Kashiwa-city, Chiba, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, 236-0004, Japan
177
Vijg J, Dong X. Pathogenic Mechanisms of Somatic Mutation and Genome Mosaicism in Aging. Cell 2021; 182:12-23. [PMID: 32649873 DOI: 10.1016/j.cell.2020.06.024] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 02/03/2020] [Revised: 06/03/2020] [Accepted: 06/11/2020] [Indexed: 12/17/2022]
Abstract
Age-related accumulation of postzygotic DNA mutations results in tissue genetic heterogeneity known as somatic mosaicism. Although implicated in aging as early as the 1950s, somatic mutations in normal tissue have been difficult to study because of their low allele fractions. With the recent emergence of cost-effective high-throughput sequencing down to the single-cell level, enormous progress has been made in our capability to quantitatively analyze somatic mutations in human tissue in relation to aging and disease. Here we first review how recent technological progress has opened up this field, providing the first broad sets of quantitative information on somatic mutations in vivo necessary to gain insight into their possible causal role in human aging and disease. We then propose three major mechanisms that can lead from accumulated de novo mutations across tissues to cell functional loss and human disease.
Affiliation(s)
- Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, New York, NY 10461, USA; Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
- Xiao Dong
- Department of Genetics, Albert Einstein College of Medicine, New York, NY 10461, USA
178
Abstract
Gains and losses of large segments of genomic DNA, known as copy number variants (CNVs), have recently attracted considerable interest in clinical diagnostics, as particular forms may lead to inherited genetic diseases. In recent decades, researchers have developed a wide variety of cytogenetic and molecular methods with different detection capabilities to detect clinically relevant CNVs. In this review, we summarize methodological progress from conventional approaches to current state-of-the-art techniques capable of detecting CNVs from a few bases up to several megabases. Although the recent rapid progress of sequencing methods has enabled precise detection of CNVs, determining their functional effect on cellular and whole-body physiology remains a challenge. Here, we provide a comprehensive list of databases and bioinformatics tools that may serve as useful assets for researchers, laboratory diagnosticians, and clinical geneticists facing the challenge of CNV detection and interpretation.
179
Oleksyk TK, Wolfsberger WW, Weber AM, Shchubelka K, Oleksyk OT, Levchuk O, Patrus A, Lazar N, Castro-Marquez SO, Hasynets Y, Boldyzhar P, Neymet M, Urbanovych A, Stakhovska V, Malyar K, Chervyakova S, Podoroha O, Kovalchuk N, Rodriguez-Flores JL, Zhou W, Medley S, Battistuzzi F, Liu R, Hou Y, Chen S, Yang H, Yeager M, Dean M, Mills RE, Smolanka V. Genome diversity in Ukraine. Gigascience 2021; 10:6079618. [PMID: 33438729 PMCID: PMC7804371 DOI: 10.1093/gigascience/giaa159] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/04/2020] [Revised: 08/21/2020] [Accepted: 12/15/2020] [Indexed: 01/21/2023]
Abstract
Background The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes from an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that had been resequenced at high coverage on an Illumina NovaSeq6000 S4. Results The genome data were searched for genomic variation represented in this population, and a number of variants are reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource that aims to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth novel, endemic and medically related alleles.
Affiliation(s)
- Taras K Oleksyk
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine; Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Walter W Wolfsberger
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine; Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Alexandra M Weber
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Khrystyna Shchubelka
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA; Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
- Olga T Oleksyk
- A. Novak Transcarpathian Regional Clinical Hospital, Uzhhorod 88000, Ukraine
- Stephanie O Castro-Marquez
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA; Departamento de Biología, Universidad de Puerto Rico, Mayagüez, PR 00682, USA
- Yaroslava Hasynets
- Department of Biological Sciences, Uzhhorod National University, 32 Voloshyna Str., Uzhhorod 88000, Ukraine
- Patricia Boldyzhar
- Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
- Mikhailo Neymet
- Velyka Kopanya Family Hospital, Transcarpathia 90330, Ukraine
- Kateryna Malyar
- I.I. Mechnikov Dnipro Regional Clinical Hospital, Dnipro 49000, Ukraine
- Natalia Kovalchuk
- Rivne Regional Specialized Hospital of Radiation Protection, Rivne 33028, Ukraine
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Sarah Medley
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA
- Fabia Battistuzzi
- Department of Biological Sciences, Oakland University, Dodge Hall, 118 Library Dr., Rochester, MI 48309, USA
- Ryan Liu
- BGI Shenzhen, Shenzhen, 518083, China
- Yong Hou
- BGI Shenzhen, Shenzhen, 518083, China
- Siru Chen
- BGI Shenzhen, Shenzhen, 518083, China
- Meredith Yeager
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Volodymyr Smolanka
- Department of Medicine, Uzhhorod National University, Uzhhorod 88000, Ukraine
180
Bhattacharya S, Barseghyan H, Délot EC, Vilain E. nanotatoR: a tool for enhanced annotation of genomic structural variants. BMC Genomics 2021; 22:10. [PMID: 33407088 PMCID: PMC7789800 DOI: 10.1186/s12864-020-07182-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/11/2020] [Accepted: 10/22/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. RESULTS We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient's phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. 
We evaluated nanotatoR's annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. CONCLUSIONS The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
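The gene-overlap percentages and nearest-gene distances mentioned above can be illustrated with a minimal Python sketch. This is not nanotatoR's R code. It assumes half-open, BED-style coordinates, defines overlap percentage as the share of a gene's length covered by the SV, and reports the nearest non-overlapping gene by coordinate distance only (strand-aware upstream/downstream labelling is omitted). The gene names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """A genomic interval on one chromosome, half-open [start, end) as in BED files."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def gene_overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length covered by the SV (assumed definition)."""
    return 100.0 * overlap_bp(sv, gene) / (gene.end - gene.start)


def distance_bp(sv: Interval, gene: Interval) -> Optional[int]:
    """Gap in bases between SV and gene; 0 if they overlap or touch, None across chromosomes."""
    if sv.chrom != gene.chrom:
        return None
    if gene.start >= sv.end:
        return gene.start - sv.end  # gene lies to the right of the SV
    if sv.start >= gene.end:
        return sv.start - gene.end  # gene lies to the left of the SV
    return 0  # intervals overlap


def annotate_sv(sv: Interval, genes: Dict[str, Interval]
                ) -> Tuple[Dict[str, float], Optional[Tuple[int, str]]]:
    """Return overlapping genes with percent covered, plus (distance, name) of the
    closest non-overlapping gene on the same chromosome, or None if there is none."""
    overlaps = {name: gene_overlap_percent(sv, g)
                for name, g in genes.items() if overlap_bp(sv, g) > 0}
    candidates = [(distance_bp(sv, g), name) for name, g in genes.items()
                  if g.chrom == sv.chrom and overlap_bp(sv, g) == 0]
    nearest = min(candidates) if candidates else None
    return overlaps, nearest


if __name__ == "__main__":
    sv = Interval("chrX", 31_000_000, 31_500_000)
    genes = {
        "GENE_A": Interval("chrX", 30_900_000, 31_100_000),  # hypothetical, half covered
        "GENE_B": Interval("chrX", 31_600_000, 31_700_000),  # hypothetical, 100 kb away
        "GENE_C": Interval("chr1", 0, 1_000),                # other chromosome, ignored
    }
    print(annotate_sv(sv, genes))  # ({'GENE_A': 50.0}, (100000, 'GENE_B'))
```

Filtering on these two numbers (for example, keeping only genes above an overlap threshold) is the kind of user-set filtration the abstract describes; the thresholds themselves would be the analyst's choice.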
Affiliation(s)
- Surajit Bhattacharya
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA
- Hayk Barseghyan
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA; Bionano Genomics Inc, San Diego, CA, 92121, USA
- Emmanuèle C Délot
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA
- Eric Vilain
- Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA
181
Emerging molecular subtypes and therapeutic targets in B-cell precursor acute lymphoblastic leukemia. Front Med 2021; 15:347-371. [PMID: 33400146 DOI: 10.1007/s11684-020-0821-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/07/2019] [Accepted: 09/04/2020] [Indexed: 12/13/2022]
Abstract
B-cell precursor acute lymphoblastic leukemia (BCP-ALL) is characterized by highly heterogeneous genetic alterations. High-throughput sequencing has recently revealed precise subtypes with distinct genomic and/or gene expression patterns. Most of these profiles are associated with recurrent non-overlapping rearrangements or hotspot point mutations that are analogous to the established subtypes, such as DUX4 rearrangements, MEF2D rearrangements, ZNF384/ZNF362 rearrangements, NUTM1 rearrangements, BCL2/MYC and/or BCL6 rearrangements, ETV6-RUNX1-like gene expression, PAX5alt (diverse PAX5 alterations, including rearrangements, intragenic amplifications, or mutations), and the hotspot mutations PAX5 (p.Pro80Arg) with biallelic PAX5 alterations, IKZF1 (p.Asn159Tyr), and ZEB2 (p.His1038Arg). These molecular subtypes can be classified by their gene expression patterns using RNA-seq. Refined molecular classification has greatly improved treatment strategies. Multiagent therapy regimens, including targeted inhibitors (e.g., imatinib), immunomodulators, monoclonal antibodies, and chimeric antigen receptor T-cell (CAR-T) therapy, are shifting clinical practice from chemotherapy toward personalized, risk-directed disease management. We provide an update on our knowledge of emerging molecular subtypes and therapeutic targets in BCP-ALL.
182
Della Coletta R, Qiu Y, Ou S, Hufford MB, Hirsch CN. How the pan-genome is changing crop genomics and improvement. Genome Biol 2021; 22:3. [PMID: 33397434 PMCID: PMC7780660 DOI: 10.1186/s13059-020-02224-8] [Citation(s) in RCA: 92] [Impact Index Per Article: 30.7] [Received: 08/27/2020] [Accepted: 12/07/2020] [Indexed: 01/13/2023]
Abstract
Crop genomics has seen dramatic advances in recent years due to improvements in sequencing technology, assembly methods, and computational resources. These advances have led to the development of new tools to facilitate crop improvement. The study of structural variation within species and the characterization of the pan-genome has revealed extensive genome content variation among individuals within a species that is paradigm shifting to crop genomics and improvement. Here, we review advances in crop genomics and how utilization of these tools is shifting in light of pan-genomes that are becoming available for many crop species.
Affiliation(s)
- Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
- Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
- Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
- Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108 USA
184
Qian Y, Li L, Sun Z, Liu J, Yuan W, Wang Z. A multi-omics view of the complex mechanism of vascular calcification. Biomed Pharmacother 2021; 135:111192. [PMID: 33401220 DOI: 10.1016/j.biopha.2020.111192] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/30/2020] [Revised: 12/19/2020] [Accepted: 12/26/2020] [Indexed: 02/07/2023]
Abstract
Vascular calcification is a high-incidence, high-risk disease with increasing morbidity and high mortality. It is considered a consequence of smooth muscle cell transdifferentiation, which initiates the accumulation of hydroxyapatite (calcium phosphate). Vascular calcification is also thought to be strongly associated with poor outcomes in diabetes and chronic kidney disease. Despite numerous studies, the specific mechanism of the disease remains unclear. Genome projects have deepened understanding in the life sciences and ushered in the post-genomic era, in which a variety of omics techniques are used and large amounts of data are available, opening a new perspective on data analysis. Because omics takes a broader view, it has an advantage over single-pathway analysis in studying the complex mechanisms of vascular calcification. This paper reviews in detail omics studies of vascular calcification, including genomics, proteomics, transcriptomics, metabolomics and multi-omics studies, and presents a comprehensive view of the advances and limitations of using omics to study the disease. We also review the methodology of omics studies and of omics data analysis and processing; this methodology and data processing can also be applied to other areas. An omics landscape perspective that crosses the boundaries between genomics, transcriptomics, proteomics and metabolomics is used to examine the mechanisms of vascular calcification. Combined with various technologies, this perspective also points to directions for future exploration of clinical significance.
Affiliation(s)
- Yongjiang Qian
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Lihua Li
- Department of Pathology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Zhen Sun
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Jia Liu
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Wei Yuan
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China
- Zhongqun Wang
- Department of Cardiology, Affiliated Hospital of Jiangsu University, 212000, Zhenjiang, China.
185
Everhart S, Gambhir N, Stam R. Population Genomics of Filamentous Plant Pathogens: A Brief Overview of Research Questions, Approaches, and Pitfalls. Phytopathology 2021; 111:12-22. [PMID: 33337245 DOI: 10.1094/phyto-11-20-0527-fi] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
With ever-decreasing sequencing costs, research on the population biology of plant pathogens is transitioning from population genetics (using dozens of genetic markers or polymorphism data from several genes) to population genomics (using several hundred to tens of thousands of markers or whole-genome sequence data). The field of population genomics is characterized by rapid theoretical and methodological advances and by numerous steps and pitfalls in its technical and analytical workflow. In this article, we aim to provide a brief overview of topics relevant to the study of population genomics of filamentous plant pathogens and direct readers to more extensive reviews for in-depth understanding. We briefly discuss different types of population genomics-inspired research questions and give insights into the sampling strategies that can be used to answer such questions. We then consider different sequencing strategies, the various options available for data processing, and some of the currently available tools for population genomic data analysis. We conclude by highlighting some of the hurdles along the population genomic workflow, providing cautionary warnings regarding assumptions and technical challenges, and presenting our own future perspectives on the field of population genomics for filamentous plant pathogens.
Affiliation(s)
- Sydney Everhart
- Department of Plant Pathology, University of Nebraska, Lincoln, NE 68583, U.S.A
- Nikita Gambhir
- Department of Plant Pathology, University of Nebraska, Lincoln, NE 68583, U.S.A
- Remco Stam
- Phytopathology, School of Life Sciences Weihenstephan, Technical University Munich, Germany
186
Heller D, Vingron M. SVIM-asm: Structural variant detection from haploid and diploid genome assemblies. Bioinformatics 2020; 36:5519-5521. [PMID: 33346817 PMCID: PMC8016491 DOI: 10.1093/bioinformatics/btaa1034] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 10/09/2020] [Revised: 11/16/2020] [Accepted: 12/12/2020] [Indexed: 12/21/2022]
Abstract
MOTIVATION With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet existing SV callers are designed for haploid genome assemblies only, do not support genotyping, or detect only a limited set of SV classes. RESULTS We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared with DipCall, the only other existing SV caller for diploid assemblies, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. AVAILABILITY AND IMPLEMENTATION SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/svim-asm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
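The abstract notes that diploid assemblies enable direct genotyping and phasing of SVs. The idea can be sketched in Python, assuming SVs have already been called separately against each haplotype: an event found on both haplotypes is homozygous, and one found on only one haplotype is heterozygous with a known phase. This is not SVIM-asm's algorithm; the greedy first-match pairing and the thresholds `max_dist` and `min_size_ratio` are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL" or "INS"
    length: int  # assumed > 0


def same_event(a: SV, b: SV, max_dist: int = 100, min_size_ratio: float = 0.9) -> bool:
    """Treat two haplotype calls as the same allele if chromosome and type agree,
    breakpoints lie within max_dist bp, and sizes are similar (illustrative thresholds)."""
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.pos - b.pos) <= max_dist
            and min(a.length, b.length) / max(a.length, b.length) >= min_size_ratio)


def genotype_diploid(hap1: List[SV], hap2: List[SV]) -> List[Tuple[SV, str]]:
    """Merge per-haplotype calls into phased genotypes:
    on both haplotypes -> "1|1", only haplotype 1 -> "1|0", only haplotype 2 -> "0|1"."""
    results: List[Tuple[SV, str]] = []
    unmatched2 = list(hap2)
    for sv1 in hap1:
        # greedy: take the first unused haplotype-2 call that looks like the same event
        partner = next((sv2 for sv2 in unmatched2 if same_event(sv1, sv2)), None)
        if partner is not None:
            unmatched2.remove(partner)  # each haplotype-2 call is used at most once
            results.append((sv1, "1|1"))
        else:
            results.append((sv1, "1|0"))
    results.extend((sv2, "0|1") for sv2 in unmatched2)
    return results


if __name__ == "__main__":
    hap1 = [SV("chr1", 1000, "DEL", 500), SV("chr1", 8000, "INS", 300)]
    hap2 = [SV("chr1", 1020, "DEL", 490), SV("chr2", 5000, "DEL", 1000)]
    for sv, gt in genotype_diploid(hap1, hap2):
        print(sv.chrom, sv.pos, sv.svtype, gt)
```

With these hypothetical calls, the nearby 500 bp and 490 bp deletions are merged into one homozygous `1|1` call, while the insertion and the chr2 deletion are reported as heterozygous `1|0` and `0|1`.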
Affiliation(s)
- David Heller
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Martin Vingron
- Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Berlin, Germany
187
Accurate mapping of mitochondrial DNA deletions and duplications using deep sequencing. PLoS Genet 2020; 16:e1009242. [PMID: 33315859 PMCID: PMC7769605 DOI: 10.1371/journal.pgen.1009242] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/18/2020] [Revised: 12/28/2020] [Accepted: 11/02/2020] [Indexed: 12/21/2022]
Abstract
Deletions and duplications in mitochondrial DNA (mtDNA) cause mitochondrial disease and accumulate in conditions such as cancer and age-related disorders, but validated high-throughput methodology that can readily detect and discriminate between these two types of events is lacking. Here we establish a computational method, MitoSAlt, for accurate identification, quantification and visualization of mtDNA deletions and duplications from genomic sequencing data. Our method was tested on simulated sequencing reads and human patient samples with single deletions and duplications to verify its accuracy. Application to mouse models of mtDNA maintenance disease demonstrated the ability to detect deletions and duplications even at low levels of heteroplasmy.
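Heteroplasmy, the fraction of mtDNA molecules carrying a rearrangement, is typically estimated from read counts at the breakpoints. The sketch below assumes one simple estimator: split reads supporting the junction, divided by those split reads plus the mean number of wild-type reads spanning the two breakpoints unbroken. It is illustrative only and not necessarily the formula MitoSAlt uses.

```python
def heteroplasmy(junction_reads: int, wt_reads_5p: int, wt_reads_3p: int) -> float:
    """Estimate the fraction of molecules carrying a deletion or duplication.

    junction_reads: split reads joining the two breakpoints (the rearranged allele).
    wt_reads_5p / wt_reads_3p: reads spanning each breakpoint unbroken (wild-type molecules).
    The wild-type count is taken as the mean over the two breakpoints (an assumption).
    """
    if min(junction_reads, wt_reads_5p, wt_reads_3p) < 0:
        raise ValueError("read counts must be non-negative")
    wild_type = (wt_reads_5p + wt_reads_3p) / 2
    total = junction_reads + wild_type
    return junction_reads / total if total > 0 else 0.0


if __name__ == "__main__":
    # 30 junction reads against about 270 wild-type reads: roughly 10% heteroplasmy
    print(f"{heteroplasmy(30, 280, 260):.2f}")  # 0.10
```

Low heteroplasmy levels, as in the mouse models mentioned above, correspond to small junction counts relative to wild-type coverage, which is why adequate sequencing depth matters for detection.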
188
Identification and population genetic analyses of copy number variations in six domestic goat breeds and Bezoar ibexes using next-generation sequencing. BMC Genomics 2020; 21:840. [PMID: 33246410 PMCID: PMC7694352 DOI: 10.1186/s12864-020-07267-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/30/2020] [Accepted: 11/23/2020] [Indexed: 11/27/2022]
Abstract
Background Copy number variations (CNVs) are a major form of genetic variation and are involved in animal domestication and genetic adaptation to local environments. We investigated CNVs in the domestic goat (Capra hircus) using Illumina short-read sequencing data, comparing our lab data for 38 goats from three Chinese breeds (Chengdu Brown, Jintang Black, and Tibetan Cashmere) with public data for 26 individuals from three other breeds (two Moroccan and one Chinese) and 21 samples from Bezoar ibexes. Results We obtained a total of 2394 CNV regions (CNVRs) by merging 208,649 high-confidence CNVs; these spanned ~267 Mb in total and accounted for 10.80% of the goat autosomal genome. Functional analyses showed that 2322 genes overlapping the CNVRs were significantly enriched in 57 functional GO terms and KEGG pathways, most of them related to the nervous system, metabolic processes, and the reproductive system. Clustering patterns of all 85 samples, generated separately from duplications and deletions, were generally consistent with the results from SNPs and agreed with the geographical origins of these goats. Based on genome-wide FST at each CNV locus, some genes overlapping the highly divergent CNVs between domestic and wild goats were mainly enriched in several immunity-related pathways, whereas genes overlapping the highly differentiated CNVs between highland and lowland goats were mainly related to vitamin and lipid metabolism. Remarkably, a 507-bp deletion ~14 kb downstream of FGF5 on chromosome 6 was highly divergent (FST = 0.973) between highland and lowland goats. Given the enhancer activity previously shown for this sequence, the role of this CNV in regulating fiber growth deserves further detailed investigation. Conclusion We generated a comprehensive map of CNVs in goats.
Many genetically differentiated CNVs among various goat populations might be associated with the population characteristics of domestic goat breeds. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07267-6.
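To make the per-locus FST values above concrete, the sketch below computes FST for a single biallelic deletion locus between two populations. The abstract does not state which estimator the study used; this assumes a simple Nei-style (HT - HS) / HT with equal population weights, whereas published analyses often use Weir and Cockerham's estimator, which corrects for sample size. The genotypes in the example are hypothetical.

```python
from typing import Sequence


def deletion_allele_freq(genotypes: Sequence[int]) -> float:
    """Deletion-allele frequency from per-individual counts of that allele (0, 1 or 2)."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2 * len(genotypes))


def fst_two_pops(p1: float, p2: float) -> float:
    """Nei-style FST = (HT - HS) / HT for a biallelic locus, weighting both populations equally.

    HS is the mean expected heterozygosity within the two populations; HT is the
    expected heterozygosity at the pooled allele frequency. Returns 0.0 if monomorphic.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    highland = deletion_allele_freq([2, 2, 1, 2])  # hypothetical genotypes
    lowland = deletion_allele_freq([0, 0, 1, 0])
    print(f"{fst_two_pops(highland, lowland):.3f}")
```

Values near 1 arise only when the deletion is close to fixed in one population and nearly absent in the other, which is the pattern implied by a highly divergent highland/lowland CNV.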
189
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 2020; 16:e1008397. [PMID: 33226985 PMCID: PMC7721175 DOI: 10.1371/journal.pcbi.1008397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/10/2020] [Revised: 12/07/2020] [Accepted: 09/24/2020] [Indexed: 11/19/2022]
Abstract
Genetic diseases are driven by aberrations of the human genome. Identifying such aberrations, including structural variations (SVs), is key to understanding these diseases. Conventional short-read whole genome sequencing (cWGS) can identify SVs at base-pair resolution, but it uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) captures long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of SVs of different types and sizes predicted by both technologies and validated them with an independent PCR-based approach. SVs identified by both technologies were highly specific, while the validation rate dropped for events found by only one of them. A particularly high FDR was observed for SVs found only by 10XWGS. To improve FDR and sensitivity, we trained statistical models for both technologies. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the genetics underlying various diseases.
Cancer and many other diseases are often driven by structural rearrangements in the patient's genome. Their precise identification is necessary to understand disease evolution and to find cures. In this study, we compared two sequencing technologies for the identification of structural variations, namely Illumina short-read and 10X Genomics linked-read sequencing. Short-read sequencing is already known to have a high false discovery rate for structural variations, whereas an unbiased performance evaluation of linked-read sequencing is missing. Hence, we evaluate the performance of these two technologies using computational and PCR-based methodologies. Moreover, we present a statistical approach to improve their performance, supporting better detection of structural variations and thus further research into disease biology.
190
Lomov N, Zerkalenkova E, Lebedeva S, Viushkov V, Rubtsov MA. Cytogenetic and molecular genetic methods for chromosomal translocations detection with reference to the KMT2A/MLL gene. Crit Rev Clin Lab Sci 2020; 58:180-206. [PMID: 33205680 DOI: 10.1080/10408363.2020.1844135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Acute leukemias (ALs) are often associated with chromosomal translocations, in particular KMT2A/MLL gene rearrangements. These translocations are identified or confirmed by a number of genetic and molecular methods, some of which are routinely used in clinical practice, while others are used primarily for research. In the clinic, these methods serve to clarify diagnoses and monitor the course of disease and therapy. In research, the identification of new translocations and the confirmation of known ones are of key importance for studying disease mechanisms and for further molecular classification. The many methods for detecting rearrangements differ in their principle of operation, the type of problem they solve, and their cost-effectiveness. This review is intended to help researchers and clinicians studying AL and related chromosomal translocations navigate this variety of methods. All methods considered in the review are grouped by their principle of action and include karyotyping, fluorescence in situ hybridization (FISH) with probes for whole chromosomes or individual loci, PCR and reverse transcription-based methods, and high-throughput sequencing. The methods are also characterized by the type of problem they solve: discovering new rearrangements, determining unknown partner genes participating in a rearrangement, or confirming a proposed rearrangement between two genes. We consider the specifics of application, the basic principle of each method, and its pros and cons. To illustrate their application, we cite examples of studies of rearrangements of the KMT2A/MLL gene, one of the genes frequently rearranged in AL.
Affiliation(s)
- Nikolai Lomov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Elena Zerkalenkova
- Laboratory of Cytogenetics and Molecular Genetics, Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Svetlana Lebedeva
- Laboratory of Cytogenetics and Molecular Genetics, Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
- Vladimir Viushkov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Mikhail A Rubtsov
- Department of Molecular Biology, Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia; Department of Biochemistry, Institute for Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
191
Dhiman H, Campbell M, Melcher M, Smith KD, Borth N. Predicting favorable landing pads for targeted integrations in Chinese hamster ovary cell lines by learning stability characteristics from random transgene integrations. Comput Struct Biotechnol J 2020; 18:3632-3648. [PMID: 33304461 PMCID: PMC7710658 DOI: 10.1016/j.csbj.2020.11.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/24/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 01/06/2023]
Abstract
Chinese Hamster Ovary (CHO) cell lines are considered the preferred platform for the production of biotherapeutics, but issues related to expression instability remain unresolved. In this study, we investigated potential causes of an unstable phenotype by comparing cell lines with stable expression to those that lose titer over 10 passages. Factors related to transgene integrity and copy number, as well as the genomic profile around the integration sites, were analyzed. Production cell lines derived from Horizon Discovery CHO-K1 (HD-BIOP3), selected for low, medium or high copy number phenotypes, each with stable and unstable transgene expression, were sequenced to capture changes at the genomic and transcriptomic levels. The exact sites of the random integration events in each cell line were also identified, followed by profiling of the genomic, transcriptomic and epigenetic patterns around them. Based on the information deduced from these random integration events, we report genomic loci that potentially favor reliable and stable transgene expression for use as targeted transgene integration sites. By comparing stable and unstable phenotypes across these parameters, we established that expression stability may be controlled at three levels: 1) a good choice of integration site, 2) ensuring transgene integrity and observing the concatemerization pattern after integration, and 3) checking for potential stress-related cellular processes. Genome-wide favorable and unfavorable loci for targeted transgene integration can be browsed at https://www.borthlabchoresources.boku.ac.at/
Affiliation(s)
- Heena Dhiman
- University of Natural Resources and Life Sciences, Vienna, Austria; Austrian Centre of Industrial Biotechnology, Vienna, Austria
- Michael Melcher
- University of Natural Resources and Life Sciences, Vienna, Austria
- Nicole Borth
- University of Natural Resources and Life Sciences, Vienna, Austria; Austrian Centre of Industrial Biotechnology, Vienna, Austria
192
Rao J, Peng L, Liang X, Jiang H, Geng C, Zhao X, Liu X, Fan G, Chen F, Mu F. Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms. BMC Bioinformatics 2020; 21:518. [PMID: 33176676 PMCID: PMC7659224 DOI: 10.1186/s12859-020-03859-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/25/2020] [Accepted: 11/03/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Data generated on DNBSEQ™ platforms have proven effective for detecting single nucleotide variants (SNVs) and small insertions and deletions (indels), but the feasibility of detecting copy number variants (CNVs) is unclear. RESULTS Here, we first benchmarked different CNV detection tools on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected from DNBSEQ™ and Illumina data were similar in quantity, length and distribution, whereas results differed greatly between tools, even on data from a single platform. We further estimated CNV detection power using the available CNV benchmarks of NA12878 and found similar precision and sensitivity for the DNBSEQ™ and Illumina platforms. With Pindel, DELLY and LUMPY, we also found higher precision for CNVs shorter than 1 kbp on DNBSEQ™ data than on Illumina data. On careful comparison of the two available benchmarks, we found that a large proportion of CNVs were specific to one or the other. We therefore constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. CONCLUSIONS We assessed and benchmarked CNV detection based on WGS with DNBSEQ™ platforms and provide guidelines for future studies.
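Precision and sensitivity against a CNV benchmark require a rule for when a call matches a benchmark region. A common choice, assumed here rather than taken from the study, is 50% reciprocal overlap: the intersection must cover at least half of each region. The sketch counts a call as correct if it matches any benchmark CNV and a benchmark CNV as found if any call matches it; it ignores CNV type (deletion vs duplication) for brevity.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def reciprocal_overlap(a: Region, b: Region) -> float:
    """Smaller of the two fractions of each region covered by their intersection."""
    if a[0] != b[0]:
        return 0.0
    len_a, len_b = a[2] - a[1], b[2] - b[1]
    if len_a <= 0 or len_b <= 0:
        return 0.0  # empty regions never match
    shared = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return min(shared / len_a, shared / len_b)


def precision_sensitivity(calls: List[Region], benchmark: List[Region],
                          min_ro: float = 0.5) -> Tuple[float, float]:
    """Precision: share of calls matching any benchmark CNV.
    Sensitivity: share of benchmark CNVs matched by any call."""
    matched_calls = sum(any(reciprocal_overlap(c, t) >= min_ro for t in benchmark)
                        for c in calls)
    found_truth = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls)
                      for t in benchmark)
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = found_truth / len(benchmark) if benchmark else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    calls = [("chr1", 1000, 2000), ("chr1", 5000, 5200), ("chr2", 0, 1000)]
    benchmark = [("chr1", 1100, 2100), ("chr1", 9000, 9500)]
    p, s = precision_sensitivity(calls, benchmark)
    print(f"precision={p:.2f} sensitivity={s:.2f}")  # precision=0.33 sensitivity=0.50
```

Because the score depends on the matching rule, the same call set can look better or worse against two benchmarks that disagree, which is one reason a more complete benchmark matters.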
Affiliation(s)
- Junhua Rao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Hui Jiang
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Xia Zhao
- MGI, BGI-Shenzhen, Shenzhen, 518083, China
- Xin Liu
- BGI-Shenzhen, Shenzhen, 518083, China; BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, Shandong, China; IGDB-BGI Joint Center for Omics, BGI-Shenzhen, Shenzhen, 518083, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, Shandong, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
- Fang Chen
- MGI, BGI-Shenzhen, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Feng Mu
- MGI, BGI-Shenzhen, Shenzhen, 518083, China; MGI-Wuhan, BGI-Shenzhen, Wuhan, 430074, China
193
Uchiyama Y, Yamaguchi D, Iwama K, Miyatake S, Hamanaka K, Tsuchida N, Aoi H, Azuma Y, Itai T, Saida K, Fukuda H, Sekiguchi F, Sakaguchi T, Lei M, Ohori S, Sakamoto M, Kato M, Koike T, Takahashi Y, Tanda K, Hyodo Y, Honjo RS, Bertola DR, Kim CA, Goto M, Okazaki T, Yamada H, Maegaki Y, Osaka H, Ngu LH, Siew CG, Teik KW, Akasaka M, Doi H, Tanaka F, Goto T, Guo L, Ikegawa S, Haginoya K, Haniffa M, Hiraishi N, Hiraki Y, Ikemoto S, Daida A, Hamano SI, Miura M, Ishiyama A, Kawano O, Kondo A, Matsumoto H, Okamoto N, Okanishi T, Oyoshi Y, Takeshita E, Suzuki T, Ogawa Y, Handa H, Miyazono Y, Koshimizu E, Fujita A, Takata A, Miyake N, Mizuguchi T, Matsumoto N. Efficient detection of copy-number variations using exome data: Batch- and sex-based analyses. Hum Mutat 2020; 42:50-65. [PMID: 33131168 DOI: 10.1002/humu.24129] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/20/2020] [Revised: 09/29/2020] [Accepted: 10/15/2020] [Indexed: 12/16/2022]
Abstract
Many algorithms for detecting copy number variations (CNVs) from exome sequencing (ES) data have been reported and evaluated for sensitivity, specificity, reproducibility, and precision. However, operational optimization of such algorithms for better performance has not been fully addressed. We performed ES on 1199 samples, including 763 patients with different disease profiles. ES data were analyzed to detect CNVs by both the eXome Hidden Markov Model (XHMM) and a modified Nord's method. To efficiently detect rare CNVs, we aimed to reduce sequencing biases by analyzing together, as a batch, the data of all unrelated samples sequenced in the same flow cell, and to eliminate sex effects on X-linked CNVs by analyzing female and male samples separately. We also applied several filtering steps for more efficient CNV selection. The average number of CNVs detected per sample was <5. This optimization, together with targeted CNV analysis by Nord's method, identified pathogenic/likely pathogenic CNVs in 34 patients (4.5%, 34/763). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%) patients, whereas the previous protocol identified them in only 14 (9.9%). Thus, this batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
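The batch- and sex-based grouping described above can be sketched in Python. This is not the authors' pipeline and reproduces neither XHMM's PCA normalization nor its HMM calling. It assumes per-sample metadata in which a hypothetical `family_id` field stands in for relatedness, forms one batch of unrelated samples per flow cell for autosomes, splits chrX batches by sex, and z-scores read depth at a target within a batch as a simplified stand-in for batch-level normalization.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    sample_id: str
    flow_cell: str
    sex: str        # "F" or "M"
    family_id: str  # hypothetical field: samples sharing it are treated as related


def make_batches(samples: List[Sample]) -> Dict[Tuple[str, str], List[str]]:
    """Group samples into analysis batches.

    Autosomes: one batch per flow cell, keyed (flow_cell, "auto").
    chrX: each flow-cell batch is further split by sex, keyed (flow_cell, "X-F") or (flow_cell, "X-M").
    Only the first sample seen from each family within a flow cell is kept,
    so every batch holds unrelated samples.
    """
    batches: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    seen = set()
    for s in samples:
        if (s.flow_cell, s.family_id) in seen:
            continue  # skip relatives of a sample already in this flow-cell batch
        seen.add((s.flow_cell, s.family_id))
        batches[(s.flow_cell, "auto")].append(s.sample_id)
        batches[(s.flow_cell, f"X-{s.sex}")].append(s.sample_id)
    return dict(batches)


def batch_z_scores(depths: Dict[str, float]) -> Dict[str, float]:
    """Z-score each sample's read depth at one target against its batch's mean and SD."""
    values = list(depths.values())
    mu, sd = mean(values), pstdev(values)
    return {sid: (d - mu) / sd if sd > 0 else 0.0 for sid, d in depths.items()}


if __name__ == "__main__":
    samples = [
        Sample("s1", "FC1", "F", "fam1"),
        Sample("s2", "FC1", "M", "fam2"),
        Sample("s3", "FC1", "F", "fam1"),  # relative of s1, excluded from FC1 batches
        Sample("s4", "FC2", "M", "fam3"),
    ]
    print(make_batches(samples)[("FC1", "auto")])  # ['s1', 's2']
```

Analyzing chrX separately by sex avoids the expected twofold depth difference between female and male X chromosomes being read as CNVs, which is the sex effect the abstract refers to.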
Affiliation(s)
- Yuri Uchiyama
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan; Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Kazuhiro Iwama
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan; Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan; Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan
- Kohei Hamanaka
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomi Tsuchida
- Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan; Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Aoi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan; Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiteru Azuma
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Toshiyuki Itai
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ken Saida
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Fukuda
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan; Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Futoshi Sekiguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohiro Sakaguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ming Lei
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Sachiko Ohori
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Masamune Sakamoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan; Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Mitsuhiro Kato
- Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Takayoshi Koike
- National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Yukitoshi Takahashi
- National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Koichi Tanda
- Department of Pediatrics, Japanese Red Cross Kyoto Daiichi Hospital, Kyoto, Japan
- Yuki Hyodo
- Department of Child Neurology, Okayama University Hospital, Okayama, Japan
- Rachel S Honjo
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Debora Romeo Bertola
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Chong Ae Kim
- Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Masahide Goto
- Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Tetsuya Okazaki
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hiroyuki Yamada
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Yoshihiro Maegaki
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hitoshi Osaka
- Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Lock-Hock Ngu
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Ch'ng G Siew
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Keng W Teik
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Manami Akasaka
- Department of Pediatrics, Iwate Medical University School of Medicine, Morioka, Japan
- Hiroshi Doi
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Fumiaki Tanaka
- Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohide Goto
- Division of Neurology, Kanagawa Children's Medical Center, Yokohama, Japan
- Long Guo
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Kazuhiro Haginoya
- Department of Pediatric Neurology, Miyagi Children's Hospital, Sendai, Japan
- Muzhirah Haniffa
- Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Nozomi Hiraishi
- Department of Pediatrics, Yokohama City University Medical Center, Yokohama, Japan
- Yoko Hiraki
- Hiroshima Municipal Center for Child Health and Development, Hiroshima, Japan
- Satoru Ikemoto
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Atsuro Daida
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Shin-Ichiro Hamano
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Masaki Miura
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan; Department of Pediatrics, Nagaoka Red Cross Hospital, Nagaoka, Japan
- Akihiko Ishiyama
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Osamu Kawano
- Department of Pediatrics, Hokkaido University Hospital, Sapporo, Japan
- Akane Kondo
- Clinical Genetics Center, Shikoku Medical Center for Children and Adults, National Hospital Organization, Kagawa, Japan
- Hiroshi Matsumoto
- Department of Pediatrics, National Defense Medical College, Saitama, Japan
- Nobuhiko Okamoto
- Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
- Tohru Okanishi
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan; Department of Child Neurology, Comprehensive Epilepsy Center, Seirei Hamamatsu General Hospital, Hamamatsu, Japan
- Yukimi Oyoshi
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Eri Takeshita
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Toshifumi Suzuki
- Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiyuki Ogawa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Hiroshi Handa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Yayoi Miyazono
- Department of Child Health, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Takata
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Noriko Miyake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
194
Delage WJ, Thevenon J, Lemaitre C. Towards a better understanding of the low recall of insertion variants with short-read based variant callers. BMC Genomics 2020; 21:762. [PMID: 33148192 PMCID: PMC7640490 DOI: 10.1186/s12864-020-07125-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/07/2020] [Accepted: 10/06/2020] [Indexed: 01/22/2023]
Abstract
BACKGROUND Since 2009, numerous tools have been developed to detect structural variants using short-read technologies. Insertions >50 bp are among the hardest types to discover and are drastically underrepresented in gold-standard variant callsets. The advent of long-read technologies has completely changed the situation. In 2019, two independent cross-technology studies published the most complete variant callsets to date, with sequence-resolved insertions in human individuals. Only 17 to 28% of the reported insertions could be discovered with short-read based tools.
RESULTS In this work, we performed an in-depth analysis of these unprecedented insertion callsets to investigate the causes of such failures. We first established a precise classification of insertion variants according to four layers of characterization: the nature and size of the inserted sequence, the genomic context of the insertion site, and the breakpoint junction complexity. Because these levels are intertwined, we then used simulations to characterize the impact of each complexity factor on the recall of several structural variant callers. We showed that most reported insertions exhibited characteristics that may interfere with their discovery: 63% were tandem repeat expansions, 38% contained homology longer than 10 bp within their breakpoint junctions, and 70% were located in simple repeats. Consequently, the recall of short-read based variant callers was significantly lower for such insertions (6% for tandem repeats vs 56% for mobile element insertions). Simulations showed that the most influential factor was the insertion type rather than the genomic context, that the tested structural variant callers handled the various difficulties differently, and that most insertion calls lacked sequence resolution.
CONCLUSIONS Our results explain the low recall by pointing out several difficulty factors among the observed insertion features and provide avenues for improving SV caller algorithms and their combinations.
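The figures above are per-category recall values. Recall is the fraction of truth-set insertions that a caller recovers. Stratifying it by insertion type exposes gaps that a single pooled number hides. The sketch below is minimal and hedged. It assumes each truth insertion has already been labeled with a category and marked as matched or unmatched by some call. The matching criteria (breakpoint distance, size similarity) are deliberately left out, and the toy records are invented rather than taken from the paper.

```python
from collections import defaultdict


def recall_by_category(truth_calls):
    """Return {category: recall}, where recall = matched / total truth insertions."""
    found = defaultdict(int)
    total = defaultdict(int)
    for category, detected in truth_calls:
        total[category] += 1
        found[category] += int(detected)
    return {cat: found[cat] / total[cat] for cat in total}


# Invented toy truth set: (insertion category, matched by the caller?)
truth = [
    ("tandem_repeat", False), ("tandem_repeat", False),
    ("tandem_repeat", True), ("tandem_repeat", False),
    ("mobile_element", True), ("mobile_element", True),
    ("mobile_element", False), ("mobile_element", True),
]

per_category = recall_by_category(truth)
overall = sum(detected for _, detected in truth) / len(truth)
print(per_category)  # {'tandem_repeat': 0.25, 'mobile_element': 0.75}
print(overall)       # 0.5
```

The pooled recall of 0.5 depends entirely on how the truth set mixes categories. Reporting it alone would hide the gap between the two classes.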
Affiliation(s)
- Julien Thevenon
- Inserm U1209, CNRS UMR 5309, Univ. Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France & Genetics, Genomics and Reproduction Service, Centre Hospitalo-Universitaire Grenoble-Alpes, Grenoble, France
195
Buckley RM, Davis BW, Brashear WA, Farias FHG, Kuroki K, Graves T, Hillier LW, Kremitzki M, Li G, Middleton RP, Minx P, Tomlinson C, Lyons LA, Murphy WJ, Warren WC. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. PLoS Genet 2020; 16:e1008926. [PMID: 33090996 PMCID: PMC7581003 DOI: 10.1371/journal.pgen.1008926] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 03/02/2020] [Accepted: 06/10/2020] [Indexed: 12/30/2022]
Abstract
The domestic cat (Felis catus) numbers over 94 million in the USA alone, occupies households as a companion animal, and, like humans, suffers from cancer and common and rare diseases. However, genome-wide sequence variant information is limited for this species. To empower trait analyses, a new cat genome reference assembly was developed from PacBio long sequence reads; it significantly improves sequence representation and assembly contiguity. The whole genome sequences of 54 domestic cats were aligned to the reference to identify single nucleotide variants (SNVs) and structural variants (SVs). Across all cats, 16 SNVs that were predicted to have deleterious impacts and were present in a singleton state were identified as high-priority candidates for causative mutations. One candidate was a stop gain in the tumor suppressor FBXW7. This SNV is found in cats segregating for feline mediastinal lymphoma and is a candidate for inherited cancer susceptibility. SV analysis revealed a complex deletion coupled with a nearby potential duplication event that was shared privately by three unrelated cats with dwarfism and lies within a known dwarfism-associated region on cat chromosome B1. This SV interrupted UDP-glucose 6-dehydrogenase (UGDH), a gene involved in the biosynthesis of glycosaminoglycans. Importantly, UGDH has not yet been associated with human dwarfism and should be screened in undiagnosed patients. The new high-quality cat genome reference and the compilation of sequence variation demonstrate the importance of these resources when searching for disease-causative alleles in the domestic cat and for identifying feline biomedical models.
The practice of genomic medicine is predicated on the availability of a high-quality reference genome and an understanding of the impact of genome variation. Such resources have led to countless discoveries in humans. However, working exclusively within the framework of human genetics limits our potential for understanding disease biology, as similar analyses in other species have often led to novel insights. The generation of Felis_catus_9.0, a new high-quality reference genome for the domestic cat, helps extend genomic medicine into the Felis lineage. Using Felis_catus_9.0, we analyze the landscape of genomic variation in a collection of 54 cats within the context of human gene constraint. The distribution of variant impacts in cats is correlated with patterns of gene constraint in humans, indicating that this reference is useful for identifying novel mutations that cause phenotypes relevant to human and cat health. Moreover, structural variant analysis revealed a novel variant for feline dwarfism in UGDH, a gene that has not been associated with dwarfism in any other species, suggesting a role for UGDH in cases of undiagnosed dwarfism in humans.
Affiliation(s)
- Reuben M. Buckley
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Brian W. Davis
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley A. Brashear
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Fabiana H. G. Farias
- Department of Psychiatry, Washington University, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics, Washington University, St. Louis, Missouri, United States of America
- Kei Kuroki
- Veterinary Medical Diagnostic Laboratory, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Tina Graves
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- LaDeana W. Hillier
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Milinn Kremitzki
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Gang Li
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Patrick Minx
- Donald Danforth Plant Science Center, St Louis, Missouri, United States of America
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Leslie A. Lyons
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- William J. Murphy
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley C. Warren
- Division of Animal Sciences, School of Medicine, University of Missouri, Columbia, Missouri, United States of America
196
Yang L, Niu Q, Zhang T, Zhao G, Zhu B, Chen Y, Zhang L, Gao X, Gao H, Liu GE, Li J, Xu L. Genomic sequencing analysis reveals copy number variations and their associations with economically important traits in beef cattle. Genomics 2020; 113:812-820. [PMID: 33080318 DOI: 10.1016/j.ygeno.2020.10.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/11/2020] [Revised: 09/21/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
Copy number variation (CNV) is a major source of genetic variation. It can have large effects, including altered gene regulation and dosage, and may contribute to differences in gene expression and to normal phenotypic variability. We carried out a comprehensive whole-genome-sequencing-based analysis of CNVs in Chinese Simmental beef cattle. In total, we found 9313 deletion and 234 duplication events, covering 147.5 Mb of autosomal regions. Among them, 257 high-frequency deletion events overlapped with 193 known RefGenes. Several of these genes are related to economically important traits, such as residual feed intake, immune response, pregnancy rate and muscle differentiation. Using a locus-based analysis, we identified 11 deletions and 1 duplication that were significantly associated with three traits: carcass weight, tenderloin and longissimus muscle area. Our sequencing-based study provides important insights into the association of CNVs with important traits in beef cattle.
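The abstract does not say which statistical model the locus-based analysis used. As a hedged illustration only, one simple form of a per-locus test is an ordinary least-squares regression of the trait on copy number (0, 1 or 2 copies at a deletion locus), reporting the slope and its t-statistic. The carcass weights below are invented. A real analysis would add covariates, a correction for relatedness or population structure, and multiple-testing control.

```python
from math import sqrt


def cnv_trait_regression(copy_numbers, phenotypes):
    """Simple OLS of a phenotype on copy number at one CNV locus.

    Returns (slope, t_statistic); no covariates or relatedness correction.
    """
    n = len(copy_numbers)
    mean_x = sum(copy_numbers) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in copy_numbers)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(copy_numbers, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(copy_numbers, phenotypes)]
    residual_var = sum(r * r for r in residuals) / (n - 2)
    std_err = sqrt(residual_var / sxx)
    return slope, slope / std_err


# Invented data: 0 = homozygous deletion, 1 = heterozygous, 2 = no deletion.
copy_number = [0, 0, 1, 1, 1, 2, 2, 2, 2]
carcass_weight_kg = [280, 290, 300, 305, 295, 320, 315, 325, 310]

slope, t_stat = cnv_trait_regression(copy_number, carcass_weight_kg)
print(round(slope, 1), round(t_stat, 1))  # 16.4 6.8
```

Coding copy number additively as 0/1/2 treats each lost copy as having the same effect. That is a modeling choice, not something the abstract specifies.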
Affiliation(s)
- Liu Yang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Qunhao Niu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tianliu Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Guoyao Zhao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Zhu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yan Chen
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lupei Zhang
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xue Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huijiang Gao
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- George E Liu
- Animal Genomics and Improvement Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705, USA
- Junya Li
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lingyang Xu
- Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
197
Bertolotti AC, Layer RM, Gundappa MK, Gallagher MD, Pehlivanoglu E, Nome T, Robledo D, Kent MP, Røsæg LL, Holen MM, Mulugeta TD, Ashton TJ, Hindar K, Sægrov H, Florø-Larsen B, Erkinaro J, Primmer CR, Bernatchez L, Martin SAM, Johnston IA, Sandve SR, Lien S, Macqueen DJ. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 2020; 11:5176. [PMID: 33056985 PMCID: PMC7560756 DOI: 10.1038/s41467-020-18972-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 05/26/2020] [Accepted: 09/23/2020] [Indexed: 12/25/2022]
Abstract
Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species. This study presents and validates a novel approach to reliably identify structural variations (SVs) in non-model genomes using whole genome sequencing, which was used to detect 15,483 SVs in 492 Atlantic salmon, shedding light on their roles in genome evolution and the genetic architecture of domestication.
Affiliation(s)
- Alicia C Bertolotti
- School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen, UK; The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Ryan M Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA; Department of Computer Science, University of Colorado, Boulder, CO, USA
- Manu Kumar Gundappa
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Michael D Gallagher
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Ege Pehlivanoglu
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Torfinn Nome
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Diego Robledo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Matthew P Kent
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Line L Røsæg
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Matilde M Holen
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Teshome D Mulugeta
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Kjetil Hindar
- Norwegian Institute for Nature Research (NINA), P.O. Box 5685 Torgarden, 7485, Trondheim, Norway
- Bjørn Florø-Larsen
- Norwegian Veterinary Institute, P.O. Box 750 Sentrum, 0106, Oslo, Norway
- Jaakko Erkinaro
- Natural Resources Institute Finland (Luke), P.O. Box 413, FI-90014, Oulu, Finland
- Craig R Primmer
- Institute for Biotechnology, University of Helsinki, Helsinki, Finland
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS) Pavillon Charles-Eugène Marchand, Université Laval Québec, Québec, QC, Canada
- Samuel A M Martin
- School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen, UK
- Simen R Sandve
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Sigbjørn Lien
- Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Daniel J Macqueen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
198
Zhang Q, Zhang X, Liu J, Mao C, Chen S, Zhang Y, Leng L. Identification of copy number variation and population analysis of the sacred lotus (Nelumbo nucifera). Biosci Biotechnol Biochem 2020; 84:2037-2044. [PMID: 32594903 DOI: 10.1080/09168451.2020.1786351] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
The sacred lotus (Nelumbo nucifera) is widely cultivated in East Asia for its horticultural, agricultural, and medicinal value. Although many molecular markers have been used to study the population genetics of the sacred lotus, large variants such as copy number variations (CNVs) have not been studied until now. In this study, we applied whole-genome re-sequencing to 24 lotus accessions and used read-depth information to genotype and filter the initial CNV calls. In total, 448 duplications and 4,267 deletions were identified in the final CNV set. Population structure analysis showed that the patterns revealed by CNVs and SNPs are largely consistent with each other. Our results indicate that deep sequencing followed by genotyping is a quick and straightforward way to identify CNVs in a population, and that CNVs together with SNPs can help us better understand the biology of the plant.
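Read-depth genotyping of a candidate CNV rests on a simple ratio. Copy number is estimated as the ploidy times the mean depth over the region divided by the genome-wide mean depth. The sketch below is hedged and minimal. It assumes a diploid genome, uniform coverage, and a hypothetical rule that rounds the estimate to the nearest integer. Real pipelines add GC-content correction, mappability filters, and statistical models. The depths are invented.

```python
def estimate_copy_number(region_depth, genome_depth, ploidy=2):
    """Copy number implied by the region's depth relative to genome-wide depth."""
    return ploidy * region_depth / genome_depth


def genotype_cnv(region_depth, genome_depth, ploidy=2):
    """Round the estimate to an integer copy number and label the call.

    The rounding rule is a hypothetical simplification.
    """
    copies = round(estimate_copy_number(region_depth, genome_depth, ploidy))
    if copies < ploidy:
        return copies, "deletion"
    if copies > ploidy:
        return copies, "duplication"
    return copies, "reference"


# Invented mean depths over one candidate region in four accessions,
# each assumed to be sequenced to about 30x genome-wide.
for depth in (0.8, 14.6, 29.5, 44.1):
    print(depth, genotype_cnv(depth, 30.0))
```

A candidate call that genotypes as "reference" in every accession could then be discarded. That is one plausible reading of "genotype and filter"; the abstract does not give the paper's actual filtering criteria.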
Affiliation(s)
- Qing Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Xueting Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Jing Liu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Chaoyi Mao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Sha Chen
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yujun Zhang
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Liang Leng
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
199
Zhuang X, Ye R, So MT, Lam WY, Karim A, Yu M, Ngo ND, Cherny SS, Tam PKH, Garcia-Barcelo MM, Tang CSM, Sham PC. A random forest-based framework for genotyping and accuracy assessment of copy number variations. NAR Genom Bioinform 2020; 2:lqaa071. [PMID: 33575619 PMCID: PMC7671382 DOI: 10.1093/nargab/lqaa071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/25/2020] [Revised: 08/18/2020] [Accepted: 08/26/2020] [Indexed: 12/24/2022]
Abstract
Detection of copy number variations (CNVs) is essential for uncovering genetic factors underlying human diseases. However, current CNV detection methods are error-prone, and precisely identifying CNVs from paired-end whole genome sequencing (WGS) data remains challenging. Here, we present a framework, CNV-JACG, for Judging the Accuracy of CNVs and Genotyping using paired-end WGS data. CNV-JACG is based on a random forest model trained on 21 distinctive features characterizing the CNV region and its breakpoints. Using data from the 1000 Genomes Project, the Genome in a Bottle Consortium, the Human Genome Structural Variation Consortium and in-house technical replicates, we show that CNV-JACG has higher sensitivity than the latest genotyping method, SV2, particularly for small CNVs (≤1 kb). We also demonstrate that CNV-JACG outperforms SV2 in terms of Mendelian inconsistency in trios and concordance between technical replicates. Our study suggests that CNV-JACG could be a useful tool for assessing the accuracy of CNVs, helping to meet the growing need to uncover the missing heritability linked to CNVs.
Affiliation(s)
- Xuehan Zhuang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Rui Ye
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Man-Ting So
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Wai-Yee Lam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Anwarul Karim
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Michelle Yu
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Ngoc Diem Ngo
- National Hospital of Pediatrics, Ha Noi 100000, Vietnam
- Stacey S Cherny
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Paul Kwong-Hang Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Clara Sze-Man Tang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Pak Chung Sham
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
200
Implications of germline copy-number variations in psychiatric disorders: review of large-scale genetic studies. J Hum Genet 2020; 66:25-37. [PMID: 32958875 DOI: 10.1038/s10038-020-00838-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/18/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020] [Indexed: 02/07/2023]
Abstract
Copy number variants (CNVs), defined as genomic segments of ≥50 bp whose copy number differs from that in a reference genome, are a common form of structural variation. Germline CNVs explain part of the missing heritability that single nucleotide polymorphisms do not account for. Recent technological advances have had a major impact on CNV research: microarray technology enables relatively low-cost, high-throughput, genome-wide measurements, and short-read sequencing technology enables the detection of short CNVs that microarrays cannot detect. As a result, large-scale genetic studies have identified a variety of common and rare germline CNVs and their associations with diseases. Rare germline CNVs have been reported to be associated with neuropsychiatric disorders. In this review, we focus on germline CNVs and briefly describe their functional characteristics, formation mechanisms, detection methods, related databases, and the latest findings. Finally, we introduce recent large-scale genetic studies assessing associations of CNVs with diseases, especially psychiatric disorders, and discuss the use of CNV-based animal models to investigate the molecular and cellular mechanisms underlying these disorders. The development and implementation of improved detection methods, such as long-read single-molecule sequencing, are expected to provide further insight into the molecular basis of psychiatric disorders and other complex diseases, thus facilitating basic and clinical research on CNVs.