1. D'Amico A, Sung H, Arbona-Lampaya A, Freifeld A, Hosey K, Garcia J, Lacbawan L, Besançon E, Kassem L, Akula N, Knowles EEM, Dickinson D, McMahon FJ. Independent inheritance of cognition and bipolar disorder in a family sample. Am J Med Genet B Neuropsychiatr Genet 2024:e33001. [PMID: 39011872 DOI: 10.1002/ajmg.b.33001] [Received: 12/05/2023] [Revised: 05/28/2024] [Accepted: 06/17/2024] [Indexed: 07/17/2024]
Abstract
Cognitive deficits in people with bipolar disorder (BD) may be the result of the illness or its treatment, but they could also reflect genetic risk factors shared between BD and cognition. We investigated this question using empirical genetic relationships within a sample of patients with BD and their unaffected relatives. Participants with bipolar I, II, or schizoaffective disorder ("narrow" BD, n = 69), related mood disorders ("broad" BD, n = 135), and their clinically unaffected relatives (n = 227) completed five cognitive tests. General cognitive function (g) was quantified via principal components analysis (PCA). Heritability and genetic correlations were estimated with SOLAR-Eclipse. Participants with "narrow" or "broad" diagnoses showed deficits in g, although affect recognition was unimpaired. Cognitive performance was significantly heritable (h² = 0.322 for g, p < 0.005). Coheritability between psychopathology and g was small (0.0184 for narrow and 0.0327 for broad BD), and healthy relatives of participants with BD were cognitively unimpaired. In this family sample, cognitive deficits were present in participants with BD but were not explained by substantial overlap in the genetic determinants of mood and cognition. These findings support the view that cognitive deficits in BD are largely the result of the illness or its treatment.
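The abstract derives g from five tests with PCA. The sketch below makes a simplifying assumption: g is the first principal component of the z-scored test scores. The abstract does not say whether the authors adjusted for covariates first. The function names and the 6 x 5 score matrix are hypothetical. The code finds the component by power iteration on the correlation matrix, so it needs only the standard library.

```python
import math
from statistics import mean, stdev


def standardize(rows):
    """Convert each test (column) to z-scores: mean 0, sample SD 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def correlation_matrix(z_rows):
    """Pearson correlations between columns that are already z-scored."""
    n, k = len(z_rows), len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_eigenvector(matrix, iterations=500):
    """First principal axis of a symmetric matrix, found by power iteration."""
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are sign-ambiguous; orient so that higher scores mean higher g.
    return v if sum(v) >= 0 else [-x for x in v]


def g_scores(rows):
    """Project each participant's z-scored tests onto the first component."""
    z_rows = standardize(rows)
    loadings = leading_eigenvector(correlation_matrix(z_rows))
    scores = [sum(w * x for w, x in zip(loadings, row)) for row in z_rows]
    return scores, loadings


# Hypothetical raw scores: 6 participants x 5 cognitive tests.
rows = [
    [10, 12, 9, 11, 20],
    [12, 14, 10, 12, 22],
    [14, 15, 12, 14, 25],
    [9, 10, 8, 10, 18],
    [16, 18, 14, 15, 27],
    [11, 13, 11, 12, 21],
]
scores, loadings = g_scores(rows)
print([round(s, 2) for s in scores])
```

If all tests correlate positively, every loading has the same sign. In that case g works as a weighted composite and centers on zero across the sample.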
Affiliation(s)
- Alexander D'Amico
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Heejong Sung
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Alejandro Arbona-Lampaya
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Ally Freifeld
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Katie Hosey
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Joshua Garcia
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Ley Lacbawan
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Emily Besançon
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Layla Kassem
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Nirmala Akula
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Dwight Dickinson
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
- Francis J McMahon
  - Intramural Research Program, National Institute of Mental Health, NIH, DHHS, Bethesda, Maryland, USA
2. Kim J, Haley J, Hatton JN, Mirshahi UL, Rao HS, Ramos MF, Smelser D, Urban G, Schultz KAP, Carey DJ, Stewart DR. A genome-first approach to characterize DICER1 pathogenic variant prevalence, penetrance and cancer, thyroid, and other phenotypes in 2 population-scale cohorts. Genetics in Medicine Open 2024; 2:101846. [PMID: 39070603 PMCID: PMC11271802 DOI: 10.1016/j.gimo.2024.101846] [Indexed: 07/30/2024]
Abstract
Purpose Population-scale, exome-sequenced cohorts with linked electronic health records (EHR) permit genome-first exploration of phenotype. Phenotype and cancer risk are well characterized in children with a pathogenic DICER1 (HGNC ID:17098) variant. Here, the prevalence, penetrance, and phenotype of pathogenic germline DICER1 variants in adults were investigated in two population-scale cohorts. Methods Variant pathogenicity was classified using published DICER1 ClinGen criteria in the UK Biobank (469,787 exomes; unrelated: 437,663) and Geisinger (170,503 exomes; unrelated: 109,789) cohorts. In the UK Biobank cohort, cancer diagnoses in the EHR and in the cancer and death registries were queried. For the Geisinger cohort, the Geisinger Cancer Registry and EHR were queried. Results In the UK Biobank, there were 46 unique pathogenic DICER1 variants in 57 individuals (1:8,242; 95% CI: 1:6,362-1:10,677). In Geisinger, there were 16 unique pathogenic DICER1 variants (including one microdeletion) in 21 individuals (1:8,119; 95% CI: 1:5,310-1:12,412). The cohorts were well powered to detect larger effect sizes for common cancers. Cancers were not significantly enriched in DICER1 heterozygotes; however, there was an approximately 4-fold increased risk of thyroid disease in both cohorts. Multiple ICD-10 codes were enriched >2-fold in both cohorts. Conclusion Estimates of pathogenic germline DICER1 prevalence, thyroid disease penetrance, and cancer phenotype in genomically ascertained adults are reported from two large cohorts.
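The "1:N" prevalence figures follow from dividing the cohort size by the carrier count. The abstract does not state how the intervals were computed. As an assumption, the sketch below uses a Wald interval on log(p), with a delta-method standard error, to show how such bounds arise. The function name is hypothetical.

```python
import math


def one_in_n_prevalence(carriers, total, z=1.96):
    """Carrier prevalence expressed as '1 in N', with an approximate 95% CI.

    Assumption: a Wald interval on log(p); the abstract does not say which
    interval method the authors used.
    """
    p = carriers / total
    se_log_p = math.sqrt(1 / carriers - 1 / total)  # delta-method SE of log(p)
    p_low = p * math.exp(-z * se_log_p)
    p_high = p * math.exp(z * se_log_p)
    # A higher prevalence means a smaller N, so the bounds swap when inverted.
    return 1 / p, 1 / p_high, 1 / p_low


ukb = one_in_n_prevalence(57, 469_787)
geisinger = one_in_n_prevalence(21, 170_503)
for name, (n, n_low, n_high) in [("UK Biobank", ukb), ("Geisinger", geisinger)]:
    print(f"{name}: 1:{n:,.0f} (95% CI 1:{n_low:,.0f}-1:{n_high:,.0f})")
```

The point estimates match the abstract (1:8,242 and 1:8,119). The log-scale bounds fall within about 0.5% of the published intervals but are not identical, so the authors likely used a different interval method.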
Affiliation(s)
- Jung Kim
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Jeremy Haley
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- Jessica N. Hatton
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Uyenlinh L. Mirshahi
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- H. Shanker Rao
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- Mark F. Ramos
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
- Diane Smelser
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- Gretchen Urban
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- Kris Ann P. Schultz
  - International Pleuropulmonary Blastoma (PPB)/DICER1 Registry, Children’s Minnesota, Minneapolis, MN
  - Cancer and Blood Disorders, Children’s Minnesota, Minneapolis, MN
  - International Ovarian and Testicular Stromal Tumor Registry, Children’s Minnesota, Minneapolis, MN
- David J. Carey
  - Department of Genomic Health, Weis Center for Research, Geisinger Medical Center, Danville, PA, USA
- Douglas R. Stewart
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD
3. Mougeot JLC, Beckman MF, Alexander AS, Hovan AJ, Hasséus B, Legert KG, Johansson JE, von Bültzingslöwen I, Brennan MT, Mougeot FB. Single nucleotide polymorphisms conferring susceptibility to leukemia and oral mucositis: a multi-center pilot study of patients prior to conditioning therapy for hematopoietic cell transplant. Support Care Cancer 2024; 32:220. [PMID: 38467943 DOI: 10.1007/s00520-024-08408-3] [Received: 12/07/2023] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
PURPOSE Leukemias have been associated with oral manifestations, reflecting susceptibility to cancer therapy-induced oral mucositis. We sought to identify SNPs associated with both leukemia and oral mucositis (OM). METHODS Whole exome sequencing was performed on saliva samples from patients with leukemia or non-cancer blood disorders (ncBD) (N = 50) prior to conditioning therapy. WHO OM grades were dichotomized as moderate to severe (OM2-4) vs. none to mild (OM0-1). Reads were processed using Trim Galore v0.6.7, Bowtie2 v2.4.1, Samtools v1.10, Genome Analysis Toolkit (GATK) v4.2.6.1, and DeepVariant v1.4.0. We used three pipelines: P1, analysis with PLINK2 v3.7, SNP2GENE v1.4.1, and MAGMA v1.07b; P2 [leukemia (N = 42) vs. ncBDs (N = 8)]; and P3 [leukemia + OM2-4 (N = 18) vs. leukemia + OM0-1 (N = 24)]. P2 and P3 used Z-tests of genotypes and protein-protein interaction determination. GeneCards Suite v5.14 was used to identify phenotypes (P1 and P2, leukemia; P3, oral mucositis) and average disease-causing likelihood, and DGIdb was used to identify drug interactions. P1 and P2 genes were analyzed with the Cytoscape plugin BiNGO v3.0.3 to retrieve overrepresented Gene Ontology (GO) terms, and with Ensembl's VEP to predict SNP outcomes. RESULTS In P1, 457 candidate SNPs (28 genes) were identified, and MAGMA v1.07b identified 21,604 SNPs (1016 genes). Eighteen genes were associated with "leukemia" per VarElect v5.14 analysis and predicted to be deleterious. In P2 and P3, 353 and 174 SNPs were significant, respectively. STRING v12.0 returned 77 and 32 genes (C.L. = 0.7) for P2 and P3, respectively. VarElect v5.14 associated 60 genes from P2 with "leukemia" and 11 genes from P3 with "oral mucositis." Overrepresented GO terms included "cellular process," "signaling," "hemopoiesis," and "regulation of immune response." CONCLUSIONS We identified candidate SNPs that may confer susceptibility to leukemia and oral mucositis.
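The abstract mentions "Z-tests of genotypes" for the P2 and P3 comparisons but does not give their exact form. This sketch assumes a pooled two-proportion z-test that compares how often a genotype is carried in the two P3 groups (N = 18 vs. N = 24). The function name and the carrier counts are hypothetical. With groups this small the normal approximation is rough, and Fisher's exact test is a common alternative.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between x1/n1 and x2/n2."""
    if x1 + x2 in (0, n1 + n2):
        raise ValueError("pooled proportion is 0 or 1; the z statistic is undefined")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under N(0, 1)
    return z, p_value


# Hypothetical counts: carriers of one genotype in each P3 group.
z, p = two_proportion_z_test(12, 18, 6, 24)  # leukemia + OM2-4 vs leukemia + OM0-1
print(f"z = {z:.2f}, p = {p:.4f}")
```

In an exome-wide scan, a test like this would run once per SNP, so the resulting p-values would need a multiple-testing correction.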
Affiliation(s)
- Jean-Luc C Mougeot
  - Translational Research Laboratories, Department of Oral Medicine/Oral & Maxillofacial Surgery, Atrium Health Carolinas Medical Center, Charlotte, NC, USA
  - Department of Otolaryngology/Head & Neck Surgery, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Micaela F Beckman
  - Translational Research Laboratories, Department of Oral Medicine/Oral & Maxillofacial Surgery, Atrium Health Carolinas Medical Center, Charlotte, NC, USA
  - Department of Otolaryngology/Head & Neck Surgery, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Adam S Alexander
  - Translational Research Laboratories, Department of Oral Medicine/Oral & Maxillofacial Surgery, Atrium Health Carolinas Medical Center, Charlotte, NC, USA
  - Department of Otolaryngology/Head & Neck Surgery, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Allan J Hovan
  - BC Cancer, Oral Oncology and Dentistry, Vancouver, BC, Canada
- Bengt Hasséus
  - Department of Oral Medicine and Pathology, University of Gothenburg, Gothenburg, Sweden
- Karin Garming Legert
  - Department of Dental Medicine, University Dental Clinic, Karolinska Institutet, Huddinge, Sweden
- Jan-Erik Johansson
  - Department of Hematology and Coagulation, Sahlgrenska University Hospital, Gothenburg, Sweden
- Michael T Brennan
  - Department of Otolaryngology/Head & Neck Surgery, Wake Forest University School of Medicine, Winston-Salem, NC, USA
  - Department of Oral Medicine/Oral & Maxillofacial Surgery, Atrium Health Carolinas Medical Center, Charlotte, NC, USA
- Farah Bahrani Mougeot
  - Translational Research Laboratories, Department of Oral Medicine/Oral & Maxillofacial Surgery, Atrium Health Carolinas Medical Center, Charlotte, NC, USA
  - Department of Otolaryngology/Head & Neck Surgery, Wake Forest University School of Medicine, Winston-Salem, NC, USA
4. Du H, Dardas Z, Jolly A, Grochowski CM, Jhangiani SN, Li H, Muzny D, Fatih JM, Yesil G, Elçioglu NH, Gezdirici A, Marafi D, Pehlivan D, Calame DG, Carvalho CMB, Posey JE, Gambin T, Coban-Akdemir Z, Lupski JR. HMZDupFinder: a robust computational approach for detecting intragenic homozygous duplications from exome sequencing data. Nucleic Acids Res 2024; 52:e18. [PMID: 38153174 PMCID: PMC10899794 DOI: 10.1093/nar/gkad1223] [Received: 07/12/2023] [Revised: 11/18/2023] [Accepted: 12/13/2023] [Indexed: 12/29/2023]
Abstract
Homozygous duplications contribute to genetic disease by altering gene dosage or disrupting gene regulation, and can be more deleterious to organismal biology than heterozygous duplications. Intragenic exonic duplications can result in loss-of-function (LoF) or gain-of-function (GoF) alleles that, when homozygosed (i.e. brought to a homozygous state at a locus by identity by descent or by state), could result in autosomal recessive (AR) rare disease traits. However, detecting and functionally interpreting homozygous duplications from exome sequencing data remains a challenge. We developed a framework algorithm, HMZDupFinder, designed to detect exonic homozygous duplications from exome sequencing (ES) data. HMZDupFinder efficiently processes large datasets and accurately identifies small intragenic duplications, including those associated with rare disease traits. HMZDupFinder called 965 homozygous duplications spanning three or fewer exons from 8,707 exomes, with a recall of 70.9% and a precision of 16.1%. We experimentally confirmed 8 of 10 rare homozygous duplications. Pathogenicity assessment of these copy number variant alleles enabled clinical genomic contextualization of three homozygous duplication alleles: two affecting the known OMIM disease genes EDAR (MIM# 224900) and TNNT1 (MIM# 605355), and one in a novel candidate disease gene, PAAF1.
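Read-depth detection of a homozygous duplication relies on a simple expectation. A diploid exon duplicated on both alleles carries four copies, so its normalized depth is about twice the reference. A heterozygous duplication gives about 1.5 times. The toy sketch below illustrates only that expectation. It is not HMZDupFinder's algorithm, and it omits the normalization and statistical modeling a real caller needs. The function names, depths and the three-exon cap (echoing the size range quoted above) are hypothetical.

```python
def estimate_copy_number(sample_depth, reference_depth, max_cn=4):
    """Nearest integer copy number assuming a diploid baseline: CN ~ 2 * ratio."""
    ratio = sample_depth / reference_depth
    return min(max_cn, round(2 * ratio))


def homozygous_duplication_runs(sample, reference, max_exons=3):
    """Return (first, last) exon index runs called CN = 4 spanning <= max_exons."""
    runs, start = [], None
    for i, (s, r) in enumerate(zip(sample, reference)):
        if estimate_copy_number(s, r) == 4:
            if start is None:
                start = i  # open a new CN = 4 run
        elif start is not None:
            runs.append((start, i - 1))  # close the current run
            start = None
    if start is not None:
        runs.append((start, len(sample) - 1))
    return [(a, b) for a, b in runs if b - a + 1 <= max_exons]


# Hypothetical per-exon mean depths for one gene.
reference = [100, 120, 80, 90, 110, 100]  # e.g. median across a cohort
sample = [98, 118, 165, 185, 108, 102]
print(homozygous_duplication_runs(sample, reference))
```

On the hypothetical data, exons 2 and 3 show roughly double the reference depth, so they form one CN = 4 run.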
Affiliation(s)
- Haowei Du
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zain Dardas
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angad Jolly
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Shalini N Jhangiani
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- He Li
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Donna Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jawid M Fatih
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Gozde Yesil
  - Department of Medical Genetics, Istanbul Medical Faculty, Istanbul 34093, Turkey
- Nursel H Elçioglu
  - Department of Pediatric Genetics, Marmara University Medical Faculty, Istanbul and Eastern Mediterranean University Faculty of Medicine, Mersin 10, Turkey
- Alper Gezdirici
  - Department of Medical Genetics, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, 34480 Istanbul, Turkey
- Dana Marafi
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pediatrics, Faculty of Medicine, Kuwait University, Kuwait
- Davut Pehlivan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital, Houston, TX 77030, USA
- Daniel G Calame
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital, Houston, TX 77030, USA
- Claudia M B Carvalho
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Pacific Northwest Research Institute, Seattle, WA 98122, USA
- Jennifer E Posey
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Tomasz Gambin
  - Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
  - Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
- Zeynep Coban-Akdemir
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- James R Lupski
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Texas Children's Hospital, Houston, TX 77030, USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
5. Zhao H, Baudis M. labelSeg: segment annotation for tumor copy number alteration profiles. Brief Bioinform 2024; 25:bbad541. [PMID: 38300514 PMCID: PMC10833088 DOI: 10.1093/bib/bbad541] [Received: 08/31/2023] [Revised: 11/09/2023] [Accepted: 12/28/2023] [Indexed: 02/02/2024]
Abstract
Somatic copy number alterations (SCNAs) are a predominant type of oncogenomic alteration, affecting a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations and often generate large sets of SCNA segments. However, the automated annotation and integration of such data are particularly challenging because the measured signals reflect biased, relative copy number ratios. In this study, we introduce labelSeg, an algorithm designed for rapid and accurate annotation of CNA segments, with the aim of enhancing the interpretation of tumor SCNA profiles. Leveraging density-based clustering and exploiting the length-amplitude relationships of SCNAs, our algorithm identifies distinct relative copy number states from individual segment profiles. Its compatibility with most CNA measurement platforms makes it suitable for large-scale integrative data analysis. We confirmed its performance on both simulated and sample-derived data from The Cancer Genome Atlas reference dataset, and we demonstrated its utility in integrating heterogeneous segment profiles from different data sources and measurement platforms. Our comparative and integrative analysis revealed common SCNA patterns in cancer, as well as protein-coding genes with a strong correlation between SCNA and messenger RNA expression, supporting further investigation into the role of SCNAs in cancer development.
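Density-based clustering suits this task because segments in the same relative copy number state pile up around a shared amplitude. The sketch below runs plain one-dimensional DBSCAN on segment log2 ratios only. It is not labelSeg's method. It ignores the length-amplitude relationship the abstract describes, and it assumes the most populous cluster is the baseline state. The function names, eps, min_pts and the segment values are hypothetical.

```python
from statistics import mean

NOISE = -1


def neighbors(values, i, eps):
    """Indices whose value lies within eps of values[i] (including i itself)."""
    return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]


def dbscan_1d(values, eps, min_pts):
    """Plain DBSCAN on scalar values; returns a cluster id per value, or NOISE."""
    labels = [None] * len(values)  # None = not yet visited
    cluster = 0
    for i in range(len(values)):
        if labels[i] is not None:
            continue
        seeds = neighbors(values, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(values, j, eps)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep expanding
        cluster += 1
    return labels


def annotate_segments(log2_ratios, eps=0.1, min_pts=2):
    """Label segments as baseline / relative gain / relative loss / unassigned."""
    labels = dbscan_1d(log2_ratios, eps, min_pts)
    members = {}
    for value, label in zip(log2_ratios, labels):
        if label != NOISE:
            members.setdefault(label, []).append(value)
    # Assumption: the most populous cluster is the relative baseline state.
    baseline = max(members, key=lambda c: len(members[c]))
    base_level = mean(members[baseline])
    state = {
        c: "baseline" if c == baseline
        else ("relative gain" if mean(vals) > base_level else "relative loss")
        for c, vals in members.items()
    }
    return [state.get(label, "unassigned") for label in labels]


# Hypothetical segment log2 ratios from one tumor profile.
segments = [0.02, -0.01, 0.0, 0.03, 0.58, 0.61, -0.95, -1.0, 0.3]
print(annotate_segments(segments))
```

The isolated segment at 0.3 has no dense neighborhood, so it stays unassigned rather than being forced into a state.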
Affiliation(s)
- Hangjia Zhao
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
  - Computational Oncogenomics Group, Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Michael Baudis
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
  - Computational Oncogenomics Group, Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057, Zurich, Switzerland
6. Louw N, Carstens N, Lombard Z. Incorporating CNV analysis improves the yield of exome sequencing for rare monogenic disorders-an important consideration for resource-constrained settings. Front Genet 2023; 14:1277784. [PMID: 38155715 PMCID: PMC10753787 DOI: 10.3389/fgene.2023.1277784] [Received: 08/15/2023] [Accepted: 11/22/2023] [Indexed: 12/30/2023]
Abstract
Exome sequencing (ES) is a recommended first-tier diagnostic test for many rare monogenic diseases. It can detect both single-nucleotide variants (SNVs) and copy number variants (CNVs) in the coding exonic regions of the genome in a single test, and this dual analysis is valuable, especially in resource-limited settings. SNVs are well studied; however, CNV analysis tools have not yet been routinely incorporated into diagnostic variant calling pipelines, and chromosomal microarray is still more widely used to detect CNVs. Research shows that combined SNV and CNV analysis can achieve a diagnostic yield of up to 58%, increasing the yield by as much as 18% over an SNV-only pipeline. Importantly, this gain incurs computational costs only, with no additional sequencing costs. This mini review provides an overview of CNV analysis from exome data and of current recommendations for this type of analysis. We also present an overview of standard practices in rare monogenic disease research in resource-limited settings. We present evidence that integrating CNV detection tools into a standard exome sequencing analysis pipeline improves diagnostic yield and should be considered a significantly beneficial addition at relatively low cost. Routine implementation in underrepresented populations and resource-limited settings will promote the generation and sharing of CNV datasets and provide momentum to build core centers for this niche within genomic medicine.
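Combined yield is the share of cases solved by either analysis. A case solved by both counts once. The sketch below computes that yield for a hypothetical cohort. The function name, case IDs and solved sets are invented for illustration.

```python
def diagnostic_yield(cases, snv_solved, cnv_solved):
    """Yield of an SNV-only pipeline and of a combined SNV + CNV pipeline.

    A case counts as solved by the combined pipeline if either analysis
    found a causal variant, so cases solved by both are not double-counted.
    """
    cases = set(cases)
    snv = cases & set(snv_solved)
    combined = cases & (set(snv_solved) | set(cnv_solved))
    return len(snv) / len(cases), len(combined) / len(cases)


# Hypothetical cohort of 20 cases; P08 is solved by both analyses.
cases = [f"P{i:02d}" for i in range(1, 21)]
snv_solved = [f"P{i:02d}" for i in range(1, 9)]  # P01-P08
cnv_solved = ["P08", "P09", "P10"]
snv_yield, combined_yield = diagnostic_yield(cases, snv_solved, cnv_solved)
absolute_gain = combined_yield - snv_yield  # difference in proportions
relative_gain = absolute_gain / snv_yield   # proportional increase
print(f"SNV only: {snv_yield:.0%}, SNV + CNV: {combined_yield:.0%}, "
      f"gain: {absolute_gain:.0%} absolute, {relative_gain:.0%} relative")
```

An absolute gain in percentage points and a relative gain can differ a lot (here 10 points vs. 25%). Reported yield increases are easiest to compare when they say which one is meant.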
Affiliation(s)
- Nadja Louw
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Nadia Carstens
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Genomics Platform, South African Medical Research Council, Cape Town, South Africa
- Zané Lombard
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
7. Yang H, Shen H, Zhu G, Shao X, Chen Q, Yang F, Zhang Y, Zhang Y, Zhao K, Luo M, Zhou Z, Shu C. Molecular characterization and clinical investigation of patients with heritable thoracic aortic aneurysm and dissection. J Thorac Cardiovasc Surg 2023; 166:1594-1603.e5. [PMID: 36517271 DOI: 10.1016/j.jtcvs.2022.11.004] [Received: 09/25/2022] [Revised: 10/28/2022] [Accepted: 11/07/2022] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Thoracic aortic aneurysm and dissection has a genetic predisposition and a variety of clinical manifestations. This study aimed to investigate the clinical and molecular characteristics of patients with thoracic aortic aneurysm and dissection, to explore the relationship between genotype and phenotype, and to assess postoperative outcomes. METHODS A total of 1095 individuals with thoracic aortic aneurysm and dissection admitted to our hospital between 2013 and 2022 were included. Next-generation sequencing and multiplex ligation-dependent probe amplification were performed, and mosaicism analysis was additionally implemented to identify the genetic causes. RESULTS A total of 376 causative variants, including 8 copy number variations and 2 mosaic variants, were identified in 83.5% of patients with syndromic thoracic aortic aneurysm and dissection and 18.7% of patients with nonsyndromic thoracic aortic aneurysm and dissection. Patients in the "pathogenic" and "variant of uncertain significance" groups were younger at the time of aortic events and had a higher risk of aortic reintervention than genetically negative cases. In addition, patients with FBN1 haploinsufficiency variants had shorter reintervention-free survival than those with FBN1 dominant negative variants. CONCLUSIONS Our data expand the genetic spectrum of heritable thoracic aortic aneurysm and dissection and indicate that copy number variations and mosaic variants account for a small proportion of disease-causing alterations. Moreover, positive genetic results may have predictive value for aortic event severity and postoperative risk stratification.
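The abstract compares reintervention-free survival between FBN1 variant groups but does not name the estimator. The Kaplan-Meier estimator is the standard way to estimate event-free survival when follow-up is censored, so this sketch assumes it. The function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each event time.

    times: follow-up per patient; events: 1 = reintervention observed,
    0 = censored (follow-up ended without reintervention).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Everyone with follow-up >= t is at risk at t, including those censored at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up in years for two genotype groups.
haploinsufficiency = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
dominant_negative = kaplan_meier([4, 6, 9, 10, 12], [0, 1, 0, 1, 0])
print(haploinsufficiency)
print(dominant_negative)
```

A formal comparison of two such curves would typically add a log-rank test or a Cox model on top of these estimates.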
Affiliation(s)
- Hang Yang
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Huayan Shen
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Guoyan Zhu
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xinyang Shao
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Qianlong Chen
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Fangfang Yang
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yinhui Zhang
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yujing Zhang
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Kun Zhao
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mingyao Luo
  - State Key Laboratory of Cardiovascular Disease, Center of Vascular Surgery, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - Department of Vascular Surgery, Fuwai Yunnan Cardiovascular Hospital, Affiliated Cardiovascular Hospital of Kunming Medical University, Kunming, Yunnan, China
- Zhou Zhou
  - State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, Diagnostic Laboratory Service, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chang Shu
  - State Key Laboratory of Cardiovascular Disease, Center of Vascular Surgery, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
8. Tian Q, Tang J, Wang L, Liu J, Li X, Cao Z, Tian Z. Idiopathic hypogonadotropic hypogonadism caused by compound heterozygosity for two novel mutations in the GNRH1 gene: a case report. BMC Endocr Disord 2023; 23:213. [PMID: 37798680 PMCID: PMC10557371 DOI: 10.1186/s12902-023-01455-7] [Received: 03/18/2022] [Accepted: 09/11/2023] [Indexed: 10/07/2023]
Abstract
BACKGROUND Idiopathic hypogonadotropic hypogonadism (IHH) is a rare congenital or acquired genetic disorder caused by gonadotropin-releasing hormone (GnRH) deficiency. IHH patients are divided into two major groups according to whether their sense of smell is intact: hyposmic or anosmic IHH (Kallmann syndrome) and normosmic IHH (nIHH). Here we report novel compound heterozygous mutations in the GNRH1 gene in a 15-year-old male with nIHH. CASE PRESENTATION The patient presented with typical clinical features of delayed testicular development, with testosterone < 3.5 nmol/L and reduced gonadotropin (follicle-stimulating hormone, luteinizing hormone) levels. Two heterozygous GNRH1 variants were detected: a nonsense variant, c.85G>T (p.G29*), and a variant that disrupts the start codon, c.1A>G (p.M1V). CONCLUSIONS This study identifies two GNRH1 mutations responsible for nIHH. Our findings extend the mutational spectrum of GNRH1 by revealing novel causative mutations of nIHH.
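HGVS coding-DNA numbering starts at the A of the ATG start codon (c.1). The codon number and the base within the codon therefore follow from integer division by 3. The sketch below uses that rule to check that both reported protein changes are consistent with their cDNA changes. It also shows that p.G29* arising from c.85G>T implies a GGA reference codon under the standard genetic code. The codon table is deliberately partial (only the entries needed here), and the helper names are hypothetical.

```python
# Deliberately partial standard-code table: only the codons used below.
CODONS = {
    "ATG": "M", "GTG": "V",
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",
    "TGA": "*", "TGC": "C", "TGG": "W", "TGT": "C",
}


def codon_position(c_pos):
    """Map an HGVS coding position c.N (N >= 1) to (codon number, base 1-3)."""
    if c_pos < 1:
        raise ValueError("only positions within the coding sequence (c.1 onward)")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon, base_in_codon, ref, alt):
    """Apply a single-base substitution to a codon, checking the reference base."""
    if codon[base_in_codon - 1] != ref:
        raise ValueError(f"reference base {ref} not found in {codon}")
    return codon[: base_in_codon - 1] + alt + codon[base_in_codon:]


# c.85G>T -> codon 29, first base; which glycine codons become a stop?
codon_29, base_29 = codon_position(85)
stop_sources = [c for c, aa in CODONS.items()
                if aa == "G" and CODONS[substitute(c, base_29, "G", "T")] == "*"]

# c.1A>G -> codon 1, first base of the ATG start codon.
codon_1, base_1 = codon_position(1)
start_change = CODONS[substitute("ATG", base_1, "A", "G")]
print(codon_29, base_29, stop_sources, codon_1, start_change)
```

Of the four glycine codons, only GGA becomes a stop (TGA) after a G>T change at the first base. The other three become TGC and TGT (Cys) or TGG (Trp).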
Affiliation(s)
- Qingqing Tian
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
  - Medical School of Yan'an University, Yan'an, 716000, Shaanxi, China
- Jingjing Tang
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
- Lihong Wang
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
  - Medical School of Yan'an University, Yan'an, 716000, Shaanxi, China
- Jiaojiao Liu
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
  - Medical School of Yan'an University, Yan'an, 716000, Shaanxi, China
- Xiangshan Li
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
  - Medical School of Yan'an University, Yan'an, 716000, Shaanxi, China
- Zhuozhuo Cao
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
  - Medical School of Yan'an University, Yan'an, 716000, Shaanxi, China
- Zhufang Tian
  - Department of Endocrinology, Xi'an Central Hospital, No. 161 Xiwu Road, Xi'an, 710003, Shaanxi, China
9. Babadi M, Fu JM, Lee SK, Smirnov AN, Gauthier LD, Walker M, Benjamin DI, Zhao X, Karczewski KJ, Wong I, Collins RL, Sanchis-Juan A, Brand H, Banks E, Talkowski ME. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat Genet 2023; 55:1589-1597. [PMID: 37604963 PMCID: PMC10904014 DOI: 10.1038/s41588-023-01449-0] [Received: 08/30/2022] [Accepted: 06/16/2023] [Indexed: 08/23/2023]
Abstract
Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
Affiliation(s)
- Mehrtash Babadi
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Jack M Fu
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Samuel K Lee
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrey N Smirnov
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura D Gauthier
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Walker
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- David I Benjamin
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Isaac Wong
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Ryan L Collins
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alba Sanchis-Juan
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Eric Banks
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
10
Safonov A, Nomakuchi TT, Chao E, Horton C, Dolinsky JS, Yussuf A, Richardson M, Speare V, Li S, Bogus ZC, Bonanni M, Raper A, Kallish S, Ritchie MD, Nathanson KL, Drivas TG. A genotype-first approach identifies high incidence of NF1 pathogenic variants with distinct disease associations. medRxiv: The Preprint Server for Health Sciences 2023:2023.08.08.23293676. [PMID: 37609227 PMCID: PMC10441497 DOI: 10.1101/2023.08.08.23293676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Loss of function variants in the NF1 gene cause neurofibromatosis type 1 (NF1), a genetic disorder characterized by complete penetrance, prevalence of 1 in 3,000, characteristic physical exam findings, and a substantially increased risk for malignancy. However, our understanding of the disorder is entirely based on patients ascertained through phenotype-first approaches. Leveraging a genotype-first approach in two large patient cohorts, we demonstrate unexpectedly high prevalence (1 in 450-750) of NF1 pathogenic variants. Half were identified in individuals lacking clinical features of NF1, with many appearing to have post-zygotic mosaicism for the identified variant. Incidentally discovered variants were not associated with classic NF1 features but were associated with an increased incidence of malignancy compared to a control population. Our findings suggest that NF1 pathogenic variants are substantially more common than previously thought, often characterized by somatic mosaicism and reduced penetrance, and are important contributors to cancer risk in the general population.
11
Yu J, Chen N, Zheng Z, Gao M, Liang N, Wong KC. Chromothripsis detection with multiple myeloma patients based on deep graph learning. Bioinformatics 2023; 39:btad422. [PMID: 37399092 PMCID: PMC10343948 DOI: 10.1093/bioinformatics/btad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/20/2023] [Accepted: 06/30/2023] [Indexed: 07/05/2023]
Abstract
MOTIVATION Chromothripsis, which is associated with poor clinical outcomes, is prognostically vital in multiple myeloma. The catastrophic event is reported to be detectable before multiple myeloma progresses. As a result, chromothripsis detection can contribute to risk estimation and early treatment guidelines for multiple myeloma patients. However, manual diagnosis remains the gold-standard approach for detecting chromothripsis events, using whole-genome sequencing to retrieve both copy number variation (CNV) and structural variation data. Meanwhile, CNV data are much easier to obtain than structural variation data. Hence, to reduce the reliance on human experts' efforts and on structural variation data extraction, a reliable and accurate chromothripsis detection method based on CNV data is needed. RESULTS To address these issues, we propose a method to detect chromothripsis based solely on CNV data. With the help of structure learning, the intrinsic relationships among CNV features are inferred as a directed acyclic graph, from which a CNV embedding graph (CNV-DAG) is derived. Subsequently, a neural network based on a Graph Transformer, local feature extraction, and non-linear feature interaction is proposed; it takes the embedding graph as input and determines whether a chromothripsis event has occurred. Ablation experiments, clustering, and feature importance analysis are also conducted to make the model interpretable by capturing mechanistic insights. AVAILABILITY AND IMPLEMENTATION The source code and data are freely available at https://github.com/luvyfdawnYu/CNV_chromothripsis.
Affiliation(s)
- Jixiang Yu
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Nanjun Chen
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Zetian Zheng
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Ming Gao
- School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian 116025, China
- Ning Liang
- University of Michigan, Ann Arbor, MI 48105, United States
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen 518057, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon, 999077, Hong Kong
12
Xun Z, Gao P, Du Y, Yan X, Yang J, Wang Z. Novel Intronic Mutations of the SLC12A3 Gene in Patients with Gitelman Syndrome. Int J Gen Med 2023; 16:1797-1806. [PMID: 37197138 PMCID: PMC10184854 DOI: 10.2147/ijgm.s408631] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/03/2023] [Accepted: 05/01/2023] [Indexed: 05/19/2023]
Abstract
Aim Mutations in the SLC12A3 gene have been reported to cause Gitelman syndrome (GS), which is characterized by hypokalemic metabolic alkalosis. This research aimed to investigate the genetic mutations and clinical features of patients with clinically suspected GS. Methods Six families were enrolled. The symptoms, clinical examination, laboratory results, genotypes, and effect of mutations on mRNA splicing were analyzed. Genomic DNA was screened for gene variants using whole-exome sequencing and Sanger sequencing, and the DNA sequences were compared with reference sequences. Results Genetic analysis revealed nine genetic variants of SLC12A3, including three novel heterozygous mutations (c.1096-2A>G, c.1862A>G, and c.2747+4del) and six previously characterized mutations (c.965-1_976delinsACCGAAAATTTT, c.506-1G>A, c.602-16G>A, c.533C>T, c.1456G>A, and c.1108G>C). Probands presented with hypokalemia, increased plasma renin, hypocalciuria, and hypokalemic alkalosis. Conclusion These clinical manifestations and genotypes were consistent with the diagnostic criteria for GS. The study described the phenotypes and genotypes of six pedigrees involving GS patients, demonstrating the importance of SLC12A3 gene screening for GS, and expands the mutation spectrum of the SLC12A3 gene in GS.
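In HGVS coding-DNA notation, an intronic position is written relative to the nearest exonic base. c.1096-2 is two nucleotides upstream of c.1096, on the splice-acceptor side. c.2747+4 is four nucleotides downstream of c.2747, on the donor side. The minimal sketch below parses these offsets and flags the canonical ±1/±2 splice dinucleotides. It assumes a deliberately tiny subset of HGVS: single substitutions and simple deletions only, so the delins allele c.965-1_976delinsACCGAAAATTTT is out of scope. The function names are hypothetical, and the sketch is no substitute for a real HGVS parser or splicing predictor.

```python
import re

# Tiny subset of HGVS c. notation: position, optional intronic offset,
# then a single-base substitution or a simple deletion (hypothetical helper).
HGVS_C = re.compile(
    r"^c\.(?P<anchor>\d+)(?P<offset>[+-]\d+)?(?P<change>[ACGT]>[ACGT]|del)$"
)


def parse_c(variant: str) -> dict:
    """Split e.g. 'c.1096-2A>G' into anchor 1096, offset -2, change 'A>G'."""
    m = HGVS_C.match(variant.replace(" ", ""))  # tolerate 'c.1456 G>A' spacing
    if m is None:
        raise ValueError(f"unsupported notation: {variant}")
    offset = int(m["offset"]) if m["offset"] else 0
    return {"anchor": int(m["anchor"]), "offset": offset, "change": m["change"]}


def classify(variant: str) -> str:
    """Coarse location class based only on the intronic offset."""
    offset = parse_c(variant)["offset"]
    if offset == 0:
        return "exonic"
    if abs(offset) <= 2:
        # +1/+2 (donor GT) and -1/-2 (acceptor AG) are the canonical splice sites
        return "canonical splice site"
    return "intronic (non-canonical)"


for v in ["c.1096-2A>G", "c.1862A>G", "c.2747+4del", "c.506-1G>A", "c.602-16G>A"]:
    print(v, "->", classify(v))
```

By this rule, c.1096-2A>G and c.506-1G>A fall in canonical acceptor dinucleotides. c.2747+4del and c.602-16G>A lie deeper in the intron, where a splicing effect is harder to infer from position alone. The mRNA splicing analysis described in the abstract addresses exactly this kind of variant.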
Affiliation(s)
- Zeli Xun
- Department of Endocrinology, Xi’an Children’s Hospital, Shaanxi, People’s Republic of China
- Pengfei Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai WeHealth Biomedical Technology Co, Ltd, Shanghai, People’s Republic of China
- Yanan Du
- Department of Endocrinology, Xi’an Children’s Hospital, Shaanxi, People’s Republic of China
- Xue Yan
- Shanghai WeHealth Biomedical Technology Co, Ltd, Shanghai, People’s Republic of China
- Jingmin Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, People’s Republic of China
- Shanghai WeHealth Biomedical Technology Co, Ltd, Shanghai, People’s Republic of China
- Key Laboratory of Birth Defects and Reproductive Health of National Health and Family Planning Commission (Chongqing Key Laboratory of Birth Defects and Reproductive Health, Chongqing Population and Family Planning, Science and Technology Research Institute), Chongqing, People’s Republic of China
- Zhihua Wang
- Department of Endocrinology, Xi’an Children’s Hospital, Shaanxi, People’s Republic of China
- Correspondence: Zhihua Wang, Xi’an Children’s Hospital, Shaanxi, 710002, People’s Republic of China, Email
13
Zhou X, Feliciano P, Shu C, Wang T, Astrovskaya I, Hall JB, Obiajulu JU, Wright JR, Murali SC, Xu SX, Brueggeman L, Thomas TR, Marchenko O, Fleisch C, Barns SD, Snyder LG, Han B, Chang TS, Turner TN, Harvey WT, Nishida A, O'Roak BJ, Geschwind DH, Michaelson JJ, Volfovsky N, Eichler EE, Shen Y, Chung WK. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet 2022; 54:1305-1319. [PMID: 35982159 PMCID: PMC9470534 DOI: 10.1038/s41588-022-01148-2] [Citation(s) in RCA: 135] [Impact Index Per Article: 67.5] [Received: 09/26/2021] [Accepted: 06/28/2022] [Indexed: 12/16/2022]
Abstract
To capture the full spectrum of genetic risk for autism, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 autism cases, including 35,130 new cases recruited online by SPARK. We identified 60 genes with exome-wide significance (P < 2.5 × 10⁻⁶), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1 and HNRNPUL2). The association of NAV3 with autism risk is primarily driven by rare inherited loss-of-function (LoF) variants, with an estimated relative risk of 4, consistent with a moderate effect. Autistic individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1 and HNRNPUL2; n = 95) have less cognitive impairment than 129 autistic individuals with LoF variants in highly penetrant genes (CHD8, SCN2A, ADNP, FOXP1 and SHANK3) (59% vs 88%, P = 1.9 × 10⁻⁶). Power calculations suggest that much larger numbers of autism cases are needed to identify additional moderate-risk genes.
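The exome-wide threshold of 2.5 × 10⁻⁶ matches a Bonferroni correction of α = 0.05 across roughly 20,000 protein-coding genes. The abstract does not state how the threshold was derived, so treat this as an assumed rationale that only makes the arithmetic explicit:

```latex
\alpha_{\text{exome-wide}} = \frac{\alpha}{N_{\text{genes}}} = \frac{0.05}{20{,}000} = 2.5 \times 10^{-6}
```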
Affiliation(s)
- Xueya Zhou
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Chang Shu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Tianyun Wang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Medical Genetics, Center for Medical Genetics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Neuroscience Research Institute, Department of Neurobiology, School of Basic Medical Sciences, Peking University Health Science Center; Key Laboratory for Neuroscience, Ministry of Education of China & National Health Commission of China, Beijing, China
- Joseph U Obiajulu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Shwetha C Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Leo Brueggeman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Taylor R Thomas
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Bing Han
- Simons Foundation, New York, NY, USA
- Timothy S Chang
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Tychele N Turner
- Department of Genetics, Washington University, St. Louis, MO, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrew Nishida
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Brian J O'Roak
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Daniel H Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jacob J Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA.
- Simons Foundation, New York, NY, USA.
- Department of Medicine, Columbia University Medical Center, New York, NY, USA.
14
Kuśmirek W. Different Strategies for Counting the Depth of Coverage in Copy Number Variation Calling Tools. Bioinform Biol Insights 2022; 16:11779322221115534. [PMID: 35935530 PMCID: PMC9354125 DOI: 10.1177/11779322221115534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 07/02/2022] [Indexed: 12/04/2022]
Abstract
There are many copy number variation (CNV) detection tools based on the depth of coverage. A characteristic feature of all such tools is the first stage of data processing: counting the depth of coverage in the investigated sequencing regions. However, each tool implements this stage in a slightly different way. Herein, we used data from the 1000 Genomes Project to show the impact of different depth-of-coverage counting strategies on the results of CNV detection. In the study, we used 7 CNV calling tools: CODEX, CANOES, exomeCopy, ExomeDepth, CLAMMS, CNVkit, and CNVind; from each of these applications, we separated the depth-of-coverage counting step into an independent module. We then counted the depth of coverage with each of these modules, and finally used the resulting depth-of-coverage tables as input data sets for the other CNV calling tools. The experiments showed that the best methods of counting the depth of coverage are the algorithms implemented in CLAMMS and CNVkit. Both yield much better sets of detected CNVs than the counting methods implemented in the other tools. Moreover, some CNV detection tools are reasonably robust to changes in the input depth-of-coverage table. In particular, exomeCopy gives approximately the same set of rare CNVs regardless of the method used to count the depth of coverage.
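The abstract does not describe how each tool counts coverage. The sketch below only illustrates why counting strategies can disagree: it contrasts three generic rules for a single target, with reads reduced to 0-based, half-open alignment intervals. The function names are hypothetical and are not the code of CLAMMS, CNVkit, or any other listed tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def count_read_starts(reads, target):
    """Rule A: count reads whose leftmost aligned base lies inside the target."""
    return sum(target.start <= r.start < target.end for r in reads)


def count_overlapping_reads(reads, target):
    """Rule B: count every read overlapping the target by at least one base."""
    return sum(r.start < target.end and r.end > target.start for r in reads)


def mean_base_depth(reads, target):
    """Rule C: average per-base coverage across the target."""
    covered = 0
    for r in reads:
        overlap = min(r.end, target.end) - max(r.start, target.start)
        covered += max(overlap, 0)  # reads outside the target contribute nothing
    return covered / (target.end - target.start)


# Hypothetical 100-bp exon target and three 50-bp reads.
target = Interval(1000, 1100)
reads = [Interval(980, 1030), Interval(1040, 1090), Interval(1080, 1130)]
print(count_read_starts(reads, target),
      count_overlapping_reads(reads, target),
      mean_base_depth(reads, target))
```

For the same three reads, the rules give 2, 3 and 1.0. Read-start counting ignores the read that begins upstream of the target. Overlap counting credits the partly overlapping reads in full. Mean depth weights each read by the bases it actually covers.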
Affiliation(s)
- Wiktor Kuśmirek
- Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
15
Deaton AM, Dubey A, Ward LD, Dornbos P, Flannick J, Yee E, Ticau S, Noetzli L, Parker MM, Hoffing RA, Willis C, Plekan ME, Holleman AM, Hinkle G, Fitzgerald K, Vaishnaw AK, Nioi P. Rare loss of function variants in the hepatokine gene INHBE protect from abdominal obesity. Nat Commun 2022; 13:4319. [PMID: 35896531 PMCID: PMC9329324 DOI: 10.1038/s41467-022-31757-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/20/2021] [Accepted: 07/01/2022] [Indexed: 02/07/2023]
Abstract
Identifying genetic variants associated with lower waist-to-hip ratio can reveal new therapeutic targets for abdominal obesity. We use exome sequences from 362,679 individuals to identify genes associated with waist-to-hip ratio adjusted for BMI (WHRadjBMI), a surrogate for abdominal fat that is causally linked to type 2 diabetes and coronary heart disease. Predicted loss of function (pLOF) variants in INHBE associate with lower WHRadjBMI, and this association replicates in data from AMP-T2D-GENES. INHBE encodes a secreted protein, the hepatokine activin E. In vitro characterization of the most common INHBE pLOF variant in our study indicates an in-frame deletion resulting in a 90% reduction in secreted protein levels. We detect associations with lower WHRadjBMI for variants in ACVR1C, encoding an activin receptor, further highlighting the involvement of activins in regulating fat distribution. These findings highlight activin E as a potential therapeutic target for abdominal obesity, a phenotype linked to cardiometabolic disease.
Affiliation(s)
- Peter Dornbos
- Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Jason Flannick
- Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Elaine Yee
- Alnylam Pharmaceuticals, Cambridge, MA, USA
- Paul Nioi
- Alnylam Pharmaceuticals, Cambridge, MA, USA
16
O'Fallon B, Durtschi J, Kellogg A, Lewis T, Close D, Best H. Algorithmic improvements for discovery of germline copy number variants in next-generation sequencing data. BMC Bioinformatics 2022; 23:285. [PMID: 35854218 PMCID: PMC9297596 DOI: 10.1186/s12859-022-04820-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 06/06/2022] [Indexed: 12/03/2022]
Abstract
Background Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from NGS data has remained challenging, particularly for hybridization-capture data in which read counts are the primary source of copy number information. Results We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes ranging from a single exon to a full chromosome. Conclusions In both the simulation and previously detected CNV studies, Cobalt shows similar sensitivity but significantly fewer false-positive detections compared to other callers. Overall sensitivity is 80–90% for deletion CNVs spanning 1–4 targets and 90–100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.
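The difference between the two decoders shows up even on a toy model. Viterbi returns the single most probable state path. Pointwise MAP (posterior) decoding runs the forward-backward algorithm and picks the most probable state at each position independently. The sketch below is a generic, unscaled two-state HMM in plain Python with made-up parameters and deliberately uninformative emissions. It does not reproduce Cobalt's target- and copy number-specific emission model.

```python
def viterbi(obs, start, trans, emit):
    """Most probable joint state path (max-product recursion with backpointers)."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    backptr = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * trans[i][j])
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][o])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    path = [max(range(n), key=lambda s: delta[s])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]


def posterior_decode(obs, start, trans, emit):
    """Pointwise MAP: argmax of each position's posterior from forward-backward."""
    n = len(start)
    alpha = [[start[s] * emit[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][o]
                      for j in range(n)])
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[i][j] * emit[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return [max(range(n), key=lambda s: alpha[t][s] * beta[t][s])
            for t in range(len(obs))]


# Made-up two-state model; emissions are uninformative on purpose, so the
# transition structure alone decides the paths.
start = [0.4, 0.6]
trans = [[0.875, 0.125], [0.5, 0.5]]
emit = [[0.5, 0.5], [0.5, 0.5]]
obs = [0, 0]
print(viterbi(obs, start, trans, emit), posterior_decode(obs, start, trans, emit))
```

Here Viterbi returns [0, 0], the best single path (0.35 of the prior path mass). Posterior decoding returns [1, 0], because state 1 carries 60% of the posterior mass at the first position. Real callers work in log space or rescale the forward and backward variables to avoid underflow on long sequences.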
Affiliation(s)
- Brendan O'Fallon
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA.
- Jacob Durtschi
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
- Ana Kellogg
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
- Tracey Lewis
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
- Devin Close
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
- Hunter Best
- ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
17
Gardner EJ, Neville MDC, Samocha KE, Barclay K, Kolk M, Niemi MEK, Kirov G, Martin HC, Hurles ME. Reduced reproductive success is associated with selective constraint on human genes. Nature 2022; 603:858-863. [PMID: 35322230 DOI: 10.1038/s41586-022-04549-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 05/21/2020] [Accepted: 02/07/2022] [Indexed: 12/22/2022]
Abstract
Genome-wide sequencing of human populations has revealed substantial variation among genes in the intensity of purifying selection acting on damaging genetic variants [1]. Although genes under the strongest selective constraint are highly enriched for associations with Mendelian disorders, most of these genes are not associated with disease and therefore the nature of the selection acting on them is not known [2]. Here we show that genetic variants that damage these genes are associated with markedly reduced reproductive success, primarily owing to increased childlessness, with a stronger effect in males than in females. We present evidence that increased childlessness is probably mediated by genetically associated cognitive and behavioural traits, which may mean that male carriers are less likely to find reproductive partners. This reduction in reproductive success may account for 20% of purifying selection against heterozygous variants that ablate protein-coding genes. Although this genetic association may only account for a very minor fraction of the overall likelihood of being childless (less than 1%), especially when compared to more influential sociodemographic factors, it may influence how genes evolve over time.
Affiliation(s)
- Eugene J Gardner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- Medical Research Council (MRC) Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, UK
- Kaitlin E Samocha
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- Kieron Barclay
- Max Planck Institute for Demographic Research, Rostock, Germany
- Demography Unit, Department of Sociology, Stockholm University, Stockholm, Sweden
- Swedish Collegium for Advanced Study, Uppsala, Sweden
- Martin Kolk
- Demography Unit, Department of Sociology, Stockholm University, Stockholm, Sweden
- Mari E K Niemi
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- George Kirov
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Hilary C Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK
- Matthew E Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, UK.
18
Gordeeva V, Sharova E, Babalyan K, Sultanov R, Govorun VM, Arapidi G. Benchmarking germline CNV calling tools from exome sequencing data. Sci Rep 2021; 11:14416. [PMID: 34257369 PMCID: PMC8277855 DOI: 10.1038/s41598-021-93878-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/15/2020] [Accepted: 06/29/2021] [Indexed: 02/06/2023]
Abstract
Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and potential ability to detect copy number variations (CNVs) of various sizes (from 1-2 exons to several Mb). Previous comparisons of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, owing to the lack of a gold-standard CNV set, their results were limited and not comparable. Here, we aimed to perform a comprehensive analysis of currently available tools capable of germline CNV calling, using a single CNV standard and reference sample set. By compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (pilot National Institute of Standards and Technology Reference Material) comprising 110,050 CNV or non-CNV exons. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample, with 10 correlated exomes as a reference set, with respect to length distribution, concordance, and efficiency. Each algorithm had a characteristic range of detected lengths and showed low concordance with other tools. Most tools focus on detecting a limited number of CNVs one to seven exons long, with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV aimed to detect a wide range of variations but showed low precision. Upon unified comparison, the tools were not equivalent. The analysis supports choosing the algorithms, or ensembles of algorithms, most suitable for a specific goal, e.g. population studies or medical genetics.
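The standard labels individual exons as CNV or non-CNV, so a per-exon comparison is one natural way to score a caller against it. The minimal sketch below assumes that calls have already been mapped to exon identifiers. The identifiers and function name are hypothetical, and the abstract does not describe the paper's exact scoring procedure.

```python
def exon_level_metrics(truth: dict[str, bool], calls: set[str]) -> dict[str, float]:
    """Score exon-level CNV calls against a standard.

    truth maps exon id -> True (CNV exon) or False (non-CNV exon); exons
    missing from `truth` are ignored because the standard cannot judge them.
    """
    tp = sum(1 for exon, is_cnv in truth.items() if is_cnv and exon in calls)
    fp = sum(1 for exon, is_cnv in truth.items() if not is_cnv and exon in calls)
    fn = sum(1 for exon, is_cnv in truth.items() if is_cnv and exon not in calls)
    called = tp + fp
    positives = tp + fn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": tp / called if called else float("nan"),
        "recall": tp / positives if positives else float("nan"),
        # false discovery proportion among calls the standard can judge
        "fdp": fp / called if called else float("nan"),
    }


truth = {"e1": True, "e2": True, "e3": False, "e4": False}
calls = {"e1", "e3", "e9"}  # e9 is not in the standard and is ignored
print(exon_level_metrics(truth, calls))
```

The false discovery proportion FP / (TP + FP) is one plausible reading of the "false-positive rate" in the abstract. Other definitions, such as FP over all non-CNV exons, would give very different numbers on a standard dominated by non-CNV exons.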
Affiliation(s)
- Veronika Gordeeva
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia.
- Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia.
- Elena Sharova
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
- Konstantin Babalyan
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
- Rinat Sultanov
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
- Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of the Federal Medical and Biological Agency, Moscow, Russia
- Vadim M Govorun
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
- Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia
- Georgij Arapidi
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow, Russia
- Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of the Federal Medical and Biological Agency, Moscow, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
19
Wright CF, Quaife NM, Ramos-Hernández L, Danecek P, Ferla MP, Samocha KE, Kaplanis J, Gardner EJ, Eberhardt RY, Chao KR, Karczewski KJ, Morales J, Gallone G, Balasubramanian M, Banka S, Gompertz L, Kerr B, Kirby A, Lynch SA, Morton JEV, Pinz H, Sansbury FH, Stewart H, Zuccarelli BD, Cook SA, Taylor JC, Juusola J, Retterer K, Firth HV, Hurles ME, Lara-Pezzi E, Barton PJR, Whiffin N. Non-coding region variants upstream of MEF2C cause severe developmental disorder through three distinct loss-of-function mechanisms. Am J Hum Genet 2021; 108:1083-1094. [PMID: 34022131 PMCID: PMC8206381 DOI: 10.1016/j.ajhg.2021.04.025] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/08/2021] [Accepted: 04/29/2021] [Indexed: 02/08/2023]
Abstract
Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.
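The abstract states that some of these variants disrupt translation but does not spell out how. One well-documented way for a 5' UTR variant to impair translation of the main protein is by creating an upstream ATG (uAUG) that starts a competing reading frame. The hypothetical helper below (its name and labels are ours, not the authors') is a minimal sketch that scans a 5' UTR for ATG codons and reports whether each frame stops inside the UTR (a uORF), runs into the coding sequence out of frame (an overlapping ORF), or continues in frame with it (an N-terminal extension). It assumes DNA strings written 5' to 3', with `cds` starting at the annotated start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_uaugs(utr: str, cds: str) -> list[tuple[int, str]]:
    """Classify every ATG in a 5' UTR by where its reading frame ends.

    Hypothetical helper. Assumes `utr` and `cds` are DNA strings written
    5'->3' and that `cds` begins at the annotated start codon.
    """
    utr, cds = utr.upper(), cds.upper()
    seq = utr + cds
    results = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != "ATG":
            continue
        # Walk codons in this ATG's frame until the first stop codon.
        stop = next((j for j in range(i, len(seq) - 2, 3)
                     if seq[j:j + 3] in STOP_CODONS), None)
        if stop is not None and stop + 3 <= len(utr):
            label = "uORF"                # terminates before the CDS start
        elif (len(utr) - i) % 3 == 0:
            label = "in_frame_extension"  # reads into the CDS in frame
        else:
            label = "overlapping_ORF"     # runs into the CDS out of frame
        results.append((i, label))
    return results


print(classify_uaugs("GGATGAAATAGCC", "ATGCCC"))
```

Comparing the output for the reference and the variant UTR would show whether a candidate variant creates a new uAUG. This is only a screening sketch; it does not replace the functional assays the authors describe.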
Affiliation(s)
- Caroline F Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter EX2 5DW, UK
- Nicholas M Quaife
- National Heart & Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London W12 0NN, UK; Cardiovascular Research Centre, Royal Brompton & Harefield Hospitals NHS Trust, London SW3 6NP, UK
- Laura Ramos-Hernández
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Petr Danecek
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Matteo P Ferla
- National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Kaitlin E Samocha
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Joanna Kaplanis
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Eugene J Gardner
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Ruth Y Eberhardt
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Katherine R Chao
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joannella Morales
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
- Giuseppe Gallone
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Meena Balasubramanian
- Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield S10 2TH, UK; Academic Unit of Child Health, Department of Oncology & Metabolism, University of Sheffield, Sheffield S10 2TH, UK
- Siddharth Banka
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK; Division of Evolution and Genomic Sciences, School of Biological Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Lianne Gompertz
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University Hospitals NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
- Bronwyn Kerr
- Division of Evolution and Genomic Sciences, School of Biological Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Amelia Kirby
- Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Sally A Lynch
- UCD Academic Centre on Rare Diseases, School of Medicine and Medical Sciences, University College Dublin, and Clinical Genetics, Temple Street Children's University Hospital, Dublin D01 XD99, Ireland
- Jenny E V Morton
- West Midlands Regional Clinical Genetics Service and Birmingham Health Partners, Birmingham Women's and Children's Hospitals NHS Foundation Trust, Birmingham B4 6NH, UK
- Hailey Pinz
- Department of Pediatrics, Saint Louis University School of Medicine, Saint Louis, MO 63104, USA
- Francis H Sansbury
- All Wales Medical Genomics Service, NHS Wales Cardiff and Vale University Health Board, Institute of Medical Genetics, University Hospital of Wales, Cardiff CF14 4AY, UK
- Helen Stewart
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford OX3 7LE, UK
- Britton D Zuccarelli
- Department of Neurology, University of Kansas School of Medicine-Salina Campus, Salina, KS 67401, USA
- Stuart A Cook
- National Heart & Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London W12 0NN, UK
- Jenny C Taylor
- National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Helen V Firth
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK; East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Matthew E Hurles
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK
- Enrique Lara-Pezzi
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain; CIBER de enfermedades CardioVasculares (CIBERCV), 28029 Madrid, Spain
- Paul J R Barton
- National Heart & Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London W12 0NN, UK; Cardiovascular Research Centre, Royal Brompton & Harefield Hospitals NHS Trust, London SW3 6NP, UK
- Nicola Whiffin
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1RQ, UK; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
20
Bigio B, Seeleuthner Y, Kerner G, Migaud M, Rosain J, Boisson B, Nasca C, Puel A, Bustamante J, Casanova JL, Abel L, Cobat A. Detection of homozygous and hemizygous complete or partial exon deletions by whole-exome sequencing. NAR Genom Bioinform 2021; 3:lqab037. [PMID: 34046589 PMCID: PMC8140739 DOI: 10.1093/nargab/lqab037] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/20/2020] [Revised: 03/19/2021] [Accepted: 05/03/2021] [Indexed: 12/11/2022]
Abstract
The detection of copy number variations (CNVs) in whole-exome sequencing (WES) data is important, as CNVs may underlie a number of human genetic disorders. The recently developed HMZDelFinder algorithm can detect rare homozygous and hemizygous (HMZ) deletions in WES data more effectively than other widely used tools. Here, we present HMZDelFinder_opt, an approach that outperforms HMZDelFinder for the detection of HMZ deletions, in particular partial exon deletions, in WES data from laboratory patient collections that were generated over time under different experimental conditions. We show that using an optimized reference control set of WES data, based on a PCA-derived Euclidean distance for coverage, strongly improves the detection of HMZ complete exon deletions both in real patients carrying validated disease-causing deletions and in simulated data. Furthermore, we develop a sliding window approach enabling HMZDelFinder_opt to identify HMZ partial exon deletions that HMZDelFinder misses. HMZDelFinder_opt is a timely and powerful approach for detecting HMZ deletions, particularly partial exon deletions, in WES data from inherently heterogeneous laboratory patient collections.
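The abstract selects reference controls by a "PCA-derived Euclidean distance for coverage" but gives no further detail. The minimal sketch below shows one way such a selection could work: log-transform a samples-by-exons coverage matrix, compute principal-component scores by SVD, and pick the k samples nearest to the test sample in that space. The function name, the log transform, the number of components and k are all assumptions for illustration, not HMZDelFinder_opt's actual settings.

```python
import numpy as np


def select_reference_controls(coverage: np.ndarray, sample_idx: int,
                              n_components: int = 3, k: int = 5) -> np.ndarray:
    """Return indices of the k samples nearest to `sample_idx` in PCA space.

    coverage: samples x exons matrix of read depth. The log transform,
    number of components and k are illustrative assumptions.
    """
    x = np.log1p(np.asarray(coverage, dtype=float))
    x = x - x.mean(axis=0)                            # center each exon
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # PC scores per sample
    dists = np.linalg.norm(scores - scores[sample_idx], axis=1)
    dists[sample_idx] = np.inf                        # exclude the sample itself
    return np.argsort(dists)[:k]


# Two simulated capture batches with different per-exon coverage profiles.
rng = np.random.default_rng(0)
batch_a = np.tile([100.0, 50.0, 200.0, 80.0], 5)
batch_b = np.tile([40.0, 150.0, 60.0, 220.0], 5)
coverage = np.vstack([batch_a * rng.uniform(0.9, 1.1, (10, 20)),
                      batch_b * rng.uniform(0.9, 1.1, (10, 20))])
print(select_reference_controls(coverage, 0, n_components=2, k=5))
```

In this toy example the selected controls for sample 0 all come from its own batch, which is the behavior the abstract relies on when collections were generated under different experimental conditions.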
Affiliation(s)
- Benedetta Bigio
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Yoann Seeleuthner
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France
- Gaspard Kerner
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France
- Mélanie Migaud
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France
- Jérémie Rosain
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France
- Bertrand Boisson
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Carla Nasca
- Laboratory of Neuroendocrinology, The Rockefeller University, New York, NY 10065, USA
- Anne Puel
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Jacinta Bustamante
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Jean-Laurent Casanova
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Laurent Abel
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA
- Aurelie Cobat
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Necker Hospital for Sick Children, 75015 Paris, France
21
Copy Number Variant Detection with Low-Coverage Whole-Genome Sequencing Represents a Viable Alternative to the Conventional Array-CGH. Diagnostics (Basel) 2021; 11:diagnostics11040708. [PMID: 33920867 PMCID: PMC8071346 DOI: 10.3390/diagnostics11040708] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/20/2021] [Revised: 04/09/2021] [Accepted: 04/13/2021] [Indexed: 12/13/2022]
Abstract
Copy number variations (CNVs) represent a type of structural variant involving alterations in the number of copies of specific regions of DNA that can either be deleted or duplicated. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. At present, several methods for CNV detection are applied, ranging from conventional cytogenetic analysis, through microarray-based methods (aCGH), to next-generation sequencing (NGS). In this paper, we present GenomeScreen, an NGS-based CNV detection method for low-coverage, whole-genome sequencing. We determined the theoretical limits of its accuracy and confirmed them in an extensive in silico study and in real patient samples with known genotypes. In theory, at least 6 M uniquely mapped reads are required to detect a CNV of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis required at least 8 M reads to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a mean resolution of 200 kb. GenomeScreen and aCGH both detected 59 deviations, and GenomeScreen additionally detected 134 other, usually smaller, variations. Compared to aCGH, the overall performance of the proposed GenomeScreen tool is comparable or superior in terms of accuracy, turn-around time, and cost-effectiveness, providing reasonable benefits, particularly in a prenatal diagnosis setting.
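The 6 M-read figure can be roughly reproduced with a back-of-the-envelope model. The sketch below is our assumption, not necessarily the GenomeScreen derivation: reads fall uniformly over a ~3 Gb genome, so the count in a window is Poisson with mean λ and standard deviation √λ, and a heterozygous deletion halves the expected count. The Z-score is then |ratio − 1|·λ/√λ. The function name and defaults are hypothetical.

```python
import math


def cnv_z_score(total_reads: float, cnv_length: float,
                genome_length: float = 3.0e9, copy_ratio: float = 0.5) -> float:
    """Z-score of a CNV window under a uniform-coverage Poisson model.

    Assumptions (ours, not GenomeScreen's): reads fall uniformly over
    `genome_length` bases, so the count in a window is Poisson with mean
    `lam` and standard deviation sqrt(lam); a CNV multiplies the mean by
    `copy_ratio` (0.5 heterozygous deletion, 1.5 duplication).
    """
    lam = total_reads * cnv_length / genome_length
    return abs(copy_ratio - 1.0) * lam / math.sqrt(lam)


for reads in (4e6, 6e6, 8e6):
    print(f"{reads / 1e6:.0f} M reads: Z = {cnv_z_score(reads, 1e5):.2f}")
```

Under these assumptions, 6 M reads give λ = 200 reads per 100 kb window and Z = 100/√200 ≈ 7.07, just above the abstract's threshold. Real coverage is overdispersed relative to Poisson (GC bias, mappability), which may help explain why the in silico analysis needed about 8 M reads in practice.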
22
Cristiano S, McKean D, Carey J, Bracci P, Brennan P, Chou M, Du M, Gallinger S, Goggins MG, Hassan MM, Hung RJ, Kurtz RC, Li D, Lu L, Neale R, Olson S, Petersen G, Rabe KG, Fu J, Risch H, Rosner GL, Ruczinski I, Klein AP, Scharpf RB. Bayesian copy number detection and association in large-scale studies. BMC Cancer 2020; 20:856. [PMID: 32894098 PMCID: PMC7487704 DOI: 10.1186/s12885-020-07304-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2020] [Accepted: 08/17/2020] [Indexed: 11/20/2022]
Abstract
BACKGROUND Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. METHODS We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. RESULTS Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). CONCLUSIONS Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
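CNPBayes estimates batches and copy-number probabilities hierarchically and propagates that uncertainty into a Bayesian regression. The sketch below is a much simpler stand-in, not CNPBayes code: given already-estimated, batch-specific Gaussian mixture parameters (hypothetical values), it computes the posterior probability of each integer copy-number state for one sample and the posterior-mean copy number, which could be used as a soft "dosage" in an association model instead of a hard call.

```python
import math


def state_posteriors(x, weights, means, sds):
    """Posterior probability of each copy-number state for one summary value x,
    under a univariate Gaussian mixture (one component per state)."""
    dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]


def expected_copy_number(x, batch, params, states=(0, 1, 2)):
    """Posterior-mean copy number, using the mixture fitted to `batch`.

    `params` maps a batch label to (weights, means, sds); these values are
    hypothetical placeholders for what a batch-aware model would estimate.
    """
    weights, means, sds = params[batch]
    post = state_posteriors(x, weights, means, sds)
    return sum(p * c for p, c in zip(post, states))


# Hypothetical fits: batch "B" is shifted upward by a technical artifact.
params = {
    "A": ([0.1, 0.2, 0.7], [-2.0, -1.0, 0.0], [0.2, 0.2, 0.2]),
    "B": ([0.1, 0.2, 0.7], [-1.6, -0.6, 0.4], [0.2, 0.2, 0.2]),
}
for batch in ("A", "B"):
    print(batch, round(expected_copy_number(-0.5, batch, params), 2))
```

The same observed value (-0.5) is ambiguous between one and two copies in batch A (posterior mean about 1.78) but is clearly a one-copy deletion in batch B, which illustrates why ignoring batch structure can distort copy-number inference.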
Affiliation(s)
- Stephen Cristiano
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- David McKean
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jacob Carey
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Paige Bracci
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Paul Brennan
- Genetics Section, International Agency for Research on Cancer, Lyon, France
- Michael Chou
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Mengmeng Du
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Michael G Goggins
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Manal M Hassan
- Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Rayjean J Hung
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Robert C Kurtz
- Department of Gastroenterology, Hepatology, and Nutrition Service, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Donghui Li
- Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Lingeng Lu
- Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center, New Haven, CT, USA
- Rachel Neale
- Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane 4029, Australia
- Sara Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, USA
- Kari G Rabe
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, USA
- Jack Fu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Harvey Risch
- Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center, New Haven, CT, USA
- Gary L Rosner
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center, Houston, TX 77030, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Alison P Klein
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Robert B Scharpf
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
23
Rajagopalan R, Murrell JR, Luo M, Conlin LK. A highly sensitive and specific workflow for detecting rare copy-number variants from exome sequencing data. Genome Med 2020; 12:14. [PMID: 32000839 PMCID: PMC6993336 DOI: 10.1186/s13073-020-0712-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/31/2019] [Accepted: 01/13/2020] [Indexed: 12/12/2022]
Abstract
Background Exome sequencing (ES) is a first-tier diagnostic test for many suspected Mendelian disorders. While it is routine to detect small sequence variants, it is not a standard practice in clinical settings to detect germline copy-number variants (CNVs) from ES data due to several reasons relating to performance. In this work, we comprehensively characterized one of the most sensitive ES-based CNV tools, ExomeDepth, against SNP array, a standard of care test in clinical settings to detect genome-wide CNVs. Methods We propose a modified ExomeDepth workflow by excluding exons with low mappability prior to variant calling to drastically reduce the false positives originating from the repetitive regions of the genome, and an iterative variant calling framework to assess the reproducibility. We used a cohort of 307 individuals with clinical ES data and clinical SNP array to estimate the sensitivity and false discovery rate of the CNV detection using exome sequencing. Further, we performed targeted testing of the STRC gene in 1972 individuals. To reduce the number of variants for downstream analysis, we performed a large-scale iterative variant calling process with random control cohorts to assess the reproducibility of the CNVs. Results The modified workflow presented in this paper reduced the number of total variants identified by one third while retaining a higher sensitivity of 97% and resulted in an improved false discovery rate of 11.4% compared to the default ExomeDepth pipeline. The exclusion of exons with low mappability removes 4.5% of the exons, including a subset of exons (0.6%) in disease-associated genes which are intractable by short-read next-generation sequencing (NGS). Results from the reproducibility analysis showed that the clinically reported variants were reproducible 100% of the time and that the modified workflow can be used to rank variants from high to low confidence. 
Targeted testing of 30 CNVs identified in STRC, a challenging gene to ascertain by NGS, showed a 100% validation rate. Conclusions In summary, we introduced a modification to the default ExomeDepth workflow to reduce the false positives originating from the repetitive regions of the genome, created a large-scale iterative variant calling framework for reproducibility, and provided recommendations for implementation in clinical settings.
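The workflow above is judged by its sensitivity (97%) and false discovery rate (11.4%) against SNP-array calls, after low-mappability exons are removed. The sketch below shows one plausible way to compute those two quantities. It does not call ExomeDepth (an R package); the mappability cutoff, the matching rule (same type and at least 1 bp of overlap) and all names are our assumptions. Clinical comparisons often use stricter rules, such as 50% reciprocal overlap.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open interval
    end: int
    kind: str    # "DEL" or "DUP"


def drop_low_mappability(exons, mappability, min_score=0.9):
    """Keep exon IDs whose mappability score is at least `min_score` (hypothetical cutoff)."""
    return [e for e in exons if mappability.get(e, 0.0) >= min_score]


def same_event(a: CNV, b: CNV) -> bool:
    """Treat two CNVs as matching if they share type and overlap by >= 1 bp."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start < b.end and b.start < a.end)


def sensitivity_and_fdr(calls, truth):
    """Sensitivity = detected truth CNVs / all truth CNVs;
    FDR = calls matching no truth CNV / all calls."""
    detected = sum(any(same_event(t, c) for c in calls) for t in truth)
    false_calls = sum(not any(same_event(c, t) for t in truth) for c in calls)
    sensitivity = detected / len(truth) if truth else float("nan")
    fdr = false_calls / len(calls) if calls else 0.0
    return sensitivity, fdr


truth = [CNV("1", 100, 200, "DEL"), CNV("2", 500, 900, "DUP")]
calls = [CNV("1", 150, 250, "DEL"), CNV("3", 0, 50, "DEL")]
print(sensitivity_and_fdr(calls, truth))
```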
Affiliation(s)
- Ramakrishnan Rajagopalan
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA; School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Jill R Murrell
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Minjie Luo
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Laura K Conlin
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
24
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022]
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
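The review names tumor mutation burden (TMB) as a criterion for immunotherapy selection without defining it. TMB is commonly reported as somatic mutations per megabase of sequence in which mutations could be called. The minimal sketch below assumes that definition; the set of counted variant classes is a hypothetical choice, since which consequences are counted varies between assays.

```python
# Hypothetical class labels; which consequences count varies between assays.
COUNTED_CLASSES = {"missense", "nonsense", "frameshift", "inframe_indel", "splice_site"}


def tumor_mutation_burden(variant_classes, callable_bases):
    """Somatic mutations of counted classes per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    counted = sum(1 for c in variant_classes if c in COUNTED_CLASSES)
    return counted / (callable_bases / 1_000_000)


calls = ["missense"] * 28 + ["frameshift"] * 2 + ["synonymous"] * 10
print(tumor_mutation_burden(calls, callable_bases=30_000_000))
```

Dividing by callable bases rather than the nominal exome size keeps scores comparable between samples with different coverage, which matters given the coverage limitations of WES noted above.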
Affiliation(s)
- Áron Bartha
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2, H-1117 Budapest, Hungary
- Balázs Győrffy
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2, H-1117 Budapest, Hungary
25
Oetjens MT, Kelly MA, Sturm AC, Martin CL, Ledbetter DH. Quantifying the polygenic contribution to variable expressivity in eleven rare genetic disorders. Nat Commun 2019; 10:4897. [PMID: 31653860 PMCID: PMC6814771 DOI: 10.1038/s41467-019-12869-0] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 03/14/2019] [Accepted: 10/03/2019] [Indexed: 12/02/2022]
Abstract
Rare genetic disorders (RGDs) often exhibit significant clinical variability among affected individuals, a disease characteristic termed variable expressivity. Recently, the aggregate effect of common variation, quantified as polygenic scores (PGSs), has emerged as an effective tool for predictions of disease risk and trait variation in the general population. Here, we measure the effect of PGSs on 11 RGDs including four sex-chromosome aneuploidies (47,XXX; 47,XXY; 47,XYY; 45,X) that affect height; two copy-number variant (CNV) disorders (16p11.2 deletions and duplications) and a Mendelian disease (melanocortin 4 receptor deficiency (MC4R)) that affect BMI; and two Mendelian diseases affecting cholesterol: familial hypercholesterolemia (FH; LDLR and APOB) and familial hypobetalipoproteinemia (FHBL; PCSK9 and APOB). Our results demonstrate that common, polygenic factors of relevant complex traits frequently contribute to variable expressivity of RGDs and that PGSs may be a useful metric for predicting clinical severity in affected individuals and for risk stratification.
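A polygenic score is conventionally the sum of an individual's effect-allele dosages weighted by per-variant effect sizes from a GWAS of the relevant trait (here height, BMI or LDL cholesterol). The abstract does not state how the study chose variants or weights, so the sketch below only shows the generic calculation with made-up numbers, followed by within-sample standardization so that scores can be compared across traits.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, effect_sizes: np.ndarray) -> np.ndarray:
    """Weighted sum of effect-allele dosages (0, 1 or 2) per individual.

    dosages: individuals x variants; effect_sizes: per-variant weights,
    e.g. GWAS betas for height, BMI or LDL cholesterol (assumed inputs).
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(effect_sizes, dtype=float)


def standardize(scores: np.ndarray) -> np.ndarray:
    """Convert raw scores to z-scores within the sample."""
    return (scores - scores.mean()) / scores.std()


dosages = np.array([[0, 1, 2],
                    [2, 2, 2]])
effect_sizes = np.array([0.1, 0.2, 0.05])
raw = polygenic_score(dosages, effect_sizes)
print(raw, standardize(raw))
```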
MESH Headings
- Apolipoproteins B/genetics
- Autistic Disorder/genetics
- Body Height/genetics
- Body Mass Index
- Cholesterol, LDL/blood
- Cholesterol, LDL/genetics
- Chromosome Deletion
- Chromosome Disorders/genetics
- Chromosome Duplication/genetics
- Chromosomes, Human, Pair 16/genetics
- Chromosomes, Human, X/genetics
- Female
- Humans
- Hyperlipoproteinemia Type II/genetics
- Hypobetalipoproteinemias/genetics
- Intellectual Disability/genetics
- Klinefelter Syndrome/genetics
- Male
- Middle Aged
- Multifactorial Inheritance
- Obesity/genetics
- Proprotein Convertase 9/genetics
- Rare Diseases/genetics
- Receptor, Melanocortin, Type 4/deficiency
- Receptor, Melanocortin, Type 4/genetics
- Receptors, LDL/genetics
- Sex Chromosome Aberrations
- Sex Chromosome Disorders of Sex Development/genetics
- Trisomy/genetics
- Turner Syndrome/genetics
- XYY Karyotype/genetics
Affiliation(s)
- M A Kelly
- Geisinger Health System, Danville, PA, USA
- A C Sturm
- Geisinger Health System, Danville, PA, USA
- C L Martin
- Geisinger Health System, Danville, PA, USA
26
Linderman MD, Chia D, Wallace F, Nothaft FA. DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark. BMC Bioinformatics 2019; 20:493. [PMID: 31604420 PMCID: PMC6787990 DOI: 10.1186/s12859-019-3108-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/24/2019] [Accepted: 09/20/2019] [Indexed: 11/16/2022]
Abstract
Background XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results DECA is a horizontally scalable implementation of the XHMM algorithm using the ADAM framework and Apache Spark that incorporates novel algorithmic optimizations to eliminate unneeded computation. DECA parallelizes XHMM on both multi-core shared memory computers and large shared-nothing Spark clusters. We performed CNV discovery from the read-depth matrix in 2535 exomes in 9.3 min on a 16-core workstation (35.3× speedup vs. XHMM), 12.7 min using 10 executor cores on a Spark cluster (18.8× speedup vs. XHMM), and 9.8 min using 32 executor cores on Amazon AWS’ Elastic MapReduce. We performed CNV discovery from the original BAM files in 292 min using 640 executor cores on a Spark cluster. Conclusions We describe DECA’s performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark’s configuration parameters.
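At its core, XHMM assigns each exome target a deletion, diploid or duplication state with a hidden Markov model over normalized read-depth z-scores. The sketch below is a simplified three-state Viterbi decoder for one sample, not DECA's ADAM/Spark implementation or XHMM's code. Its defaults are loosely modeled on commonly used XHMM settings (emission means of -3, 0 and +3, about 6 targets per CNV, a 1e-8 CNV rate), and, as a stated simplification, it ignores the distance-dependent transitions XHMM uses.

```python
import numpy as np

STATES = ("DEL", "DIP", "DUP")


def viterbi_cnv(z, p_cnv=1e-8, mean_targets=6.0,
                means=(-3.0, 0.0, 3.0), sd=1.0):
    """Most likely DEL/DIP/DUP state per exome target for one sample.

    z: normalized read-depth z-scores ordered along a chromosome.
    Simplifications (ours): transitions ignore the distance between
    targets, and every state shares the same emission SD.
    """
    z = np.asarray(z, dtype=float)
    q = 1.0 / mean_targets                            # chance a CNV ends at each target
    trans = np.array([[1 - q, q, 0.0],                # from DEL
                      [p_cnv, 1 - 2 * p_cnv, p_cnv],  # from DIP
                      [0.0, q, 1 - q]])               # from DUP
    with np.errstate(divide="ignore"):                # log(0) -> -inf is intended
        log_trans = np.log(trans)
    log_init = np.log([p_cnv, 1 - 2 * p_cnv, p_cnv])
    mu = np.asarray(means)
    # Gaussian log-density of each observation under each state, shape (n, 3).
    log_emit = (-0.5 * ((z[:, None] - mu[None, :]) / sd) ** 2
                - np.log(sd * np.sqrt(2 * np.pi)))

    n = len(z)
    score = np.empty((n, 3))
    back = np.zeros((n, 3), dtype=int)
    score[0] = log_init + log_emit[0]
    for i in range(1, n):
        cand = score[i - 1][:, None] + log_trans      # cand[from_state, to_state]
        back[i] = np.argmax(cand, axis=0)
        score[i] = cand[back[i], np.arange(3)] + log_emit[i]

    path = [int(np.argmax(score[-1]))]                # trace back from the best end state
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return [STATES[s] for s in reversed(path)]


z = [0.1, -0.2, 0.0, -4.0, -3.8, -4.2, -3.9, -4.1, 0.2, 0.0, 0.1]
print(viterbi_cnv(z))
```

Because each sample is decoded independently, per-sample calls like this are the natural unit of parallel work, which is consistent with the abstract's report that DECA scales across cores and Spark executors.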
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT 05753, USA
- Davin Chia
- Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT 05753, USA
- Forrest Wallace
- Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT 05753, USA
- Frank A Nothaft
- AMPLab, University of California, Berkeley, Berkeley, CA, USA; Databricks, Inc., San Francisco, CA, USA
27
Zhang Y, Zafar W, Hartzel DN, Williams MS, Tin A, Chang AR, Lee MTM. GSTM1 Copy Number Is Not Associated With Risk of Kidney Failure in a Large Cohort. Front Genet 2019; 10:765. [PMID: 31555322 PMCID: PMC6728412 DOI: 10.3389/fgene.2019.00765] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/11/2019] [Accepted: 07/19/2019] [Indexed: 12/15/2022]
Abstract
Deletion of glutathione S-transferase µ1 (GSTM1) is common in populations and has been asserted to associate with chronic kidney disease progression in some research studies. The association needs to be validated. We estimated GSTM1 copy number using whole exome sequencing data in the DiscovEHR cohort. Kidney failure was defined as requiring dialysis or receiving kidney transplant using data from the electronic health record and linkage to the United States Renal Data System, or the most recent eGFR < 15 ml/min/1.73 m2. In a cohort of 46,983 unrelated participants, 28.8% of blacks and 52.1% of whites had 0 copies of GSTM1. Over a mean of 9.2 years follow-up, 645 kidney failure events were observed in 46,187 white participants, and 28 in 796 black participants. No significant association was observed between GSTM1 copy number and kidney failure in Cox regression adjusting for age, sex, BMI, smoking status, genetic principal components, or comorbid conditions (hypertension, diabetes, heart failure, coronary artery disease, and stroke), whether using a genotypic, dominant, or recessive model. In sensitivity analyses, GSTM1 copy number was not associated with kidney failure in participants that were 45 years or older at baseline, had baseline eGFR < 60 ml/min/1.73 m2, or with baseline year between 1996 and 2002. In conclusion, we found no association between GSTM1 copy number and kidney failure in a large cohort study.
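The abstract says GSTM1 copy number was estimated from whole exome sequencing data and tested under genotypic, dominant and recessive models, but it does not describe the estimation method. The sketch below uses a generic depth-ratio approach (the gene's depth relative to the sample median, scaled by the ratio expected for a two-copy sample) and then codes the result under the three models. The function names, the reference ratio and our reading of "dominant" as "at least one deleted copy" are assumptions.

```python
def estimate_copy_number(gene_depth, sample_median_depth, diploid_ratio, max_cn=4):
    """Integer copy number from the gene's depth relative to the sample median.

    diploid_ratio: the gene/median depth ratio expected for a 2-copy sample,
    e.g. estimated from reference samples (hypothetical input).
    """
    ratio = (gene_depth / sample_median_depth) / diploid_ratio
    return min(max(round(2 * ratio), 0), max_cn)


def genetic_model_codes(copy_number):
    """Code GSTM1 deletion under the three models named in the abstract."""
    return {
        "genotypic": min(copy_number, 2),     # 0, 1 or >=2 copies as categories
        "dominant": int(copy_number <= 1),    # at least one deleted copy
        "recessive": int(copy_number == 0),   # both copies deleted (null genotype)
    }


cn = estimate_copy_number(gene_depth=24, sample_median_depth=100, diploid_ratio=0.5)
print(cn, genetic_model_codes(cn))
```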
Affiliation(s)
- Yanfei Zhang
- Genomic Medicine Institute, Geisinger, Danville, PA, United States
- Waleed Zafar
- Kidney Institute, Geisinger, Danville, PA, United States
- Dustin N Hartzel
- Phenomic Analytics & Clinical Data Core, Geisinger, Danville, PA, United States
- Marc S Williams
- Genomic Medicine Institute, Geisinger, Danville, PA, United States
- Adrienne Tin
- Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
- Alex R Chang
- Kidney Institute, Geisinger, Danville, PA, United States

28
Feliciano P, Zhou X, Astrovskaya I, Turner TN, Wang T, Brueggeman L, Barnard R, Hsieh A, Snyder LG, Muzny DM, Sabo A, Gibbs RA, Eichler EE, O’Roak BJ, Michaelson JJ, Volfovsky N, Shen Y, Chung WK. Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes. NPJ Genom Med 2019; 4:19. [PMID: 31452935 PMCID: PMC6707204 DOI: 10.1038/s41525-019-0093-8] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 12/20/2018] [Accepted: 07/11/2019] [Indexed: 12/30/2022]
Abstract
Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. We conducted a pilot study for SPARK (SPARKForAutism.org) of 457 families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. We identified variants in genes and loci that are clinically recognized causes or significant contributors to ASD in 10.4% of families without previous genetic findings. We also identified variants possibly associated with ASD in an additional 3.4% of families. A meta-analysis using the TADA framework at a false discovery rate (FDR) of 0.1 provides statistical support for 26 ASD risk genes. While most of these genes are already known ASD risk genes, BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD (p-value = 2.3e-06). Future studies leveraging the thousands of individuals with ASD who have enrolled in SPARK are likely to further clarify the genetic risk factors associated with ASD and to accelerate ASD research that incorporates genetic etiology.
Affiliation(s)
- Xueya Zhou
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Tychele N. Turner
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Tianyun Wang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Leo Brueggeman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Rebecca Barnard
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Alexander Hsieh
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195 USA
- Brian J. O’Roak
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Jacob J. Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Wendy K. Chung
- Simons Foundation, New York, NY 10010 USA
- Department of Pediatrics, Columbia University Medical Center, New York, NY 10032 USA

29
Pounraja VK, Jayakar G, Jensen M, Kelkar N, Girirajan S. A machine-learning approach for accurate detection of copy number variants from exome sequencing. Genome Res 2019; 29:1134-1143. [PMID: 31171634 PMCID: PMC6633262 DOI: 10.1101/gr.245928.118] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/02/2018] [Accepted: 06/04/2019] [Indexed: 11/25/2022]
Abstract
Copy number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome-sequencing data are limited by high false-positive rates and low concordance because of inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn diagram approaches to identify "high-confidence" CNVs. However, this approach is inadequate, because it misses potentially true calls that do not have consensus from multiple callers. Here, we present CN-Learn, a machine-learning framework that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM, and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs at higher precision (∼90%) and recall (∼85%) rates while maintaining robust performance even when trained with minimal data (∼30 samples). CN-Learn recovers twice as many CNVs compared to individual callers or Venn diagram-based approaches, with features such as exome capture probe count, caller concordance, and GC content providing the most discriminatory power. In fact, ∼58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
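The abstract contrasts Venn-diagram intersection of callers with CN-Learn, and it names "caller concordance" as one of the most informative features. The sketch below shows how concordance might be computed and how a Venn-style filter uses it. It is not CN-Learn itself, which trains a classifier on such features. The 50% reciprocal-overlap criterion, the record layout and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A CNV call from one caller (hypothetical record layout)."""
    caller: str
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int
    cnv_type: str  # "DEL" or "DUP"


def reciprocal_overlap(a: Call, b: Call, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type and overlap by >= min_frac of each."""
    if a.chrom != b.chrom or a.cnv_type != b.cnv_type:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap / (a.end - a.start) >= min_frac
            and overlap / (b.end - b.start) >= min_frac)


def caller_concordance(call: Call, calls: list[Call], min_frac: float = 0.5) -> int:
    """Count distinct callers (including call.caller) that made a matching call."""
    supporting = {call.caller}
    for other in calls:
        if other.caller != call.caller and reciprocal_overlap(call, other, min_frac):
            supporting.add(other.caller)
    return len(supporting)


def venn_high_confidence(calls: list[Call], min_support: int = 2) -> list[Call]:
    """Venn-style filter: keep only calls supported by >= min_support callers."""
    return [c for c in calls if caller_concordance(c, calls) >= min_support]


calls = [
    Call("CANOES", "chr1", 1_000, 5_000, "DEL"),
    Call("XHMM", "chr1", 1_500, 5_200, "DEL"),
    Call("CODEX", "chr2", 100, 900, "DUP"),  # singleton: no other caller agrees
]
print([caller_concordance(c, calls) for c in calls])
print(venn_high_confidence(calls))
```

The intersection discards the singleton CODEX call even though it may be real. This is the limitation the abstract describes, and it is why a learned model that treats concordance as one feature among several can recover more true calls.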
Collapse
Affiliation(s)
- Vijay Kumar Pounraja
- Bioinformatics and Genomics Graduate Program of the Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Gopal Jayakar
- The Schreyer Honors College, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Matthew Jensen
- Bioinformatics and Genomics Graduate Program of the Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Neil Kelkar
- The Schreyer Honors College, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Santhosh Girirajan
- Bioinformatics and Genomics Graduate Program of the Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

30
Kuśmirek W, Szmurło A, Wiewiórka M, Nowak R, Gambin T. Comparison of kNN and k-means optimization methods of reference set selection for improved CNV callers performance. BMC Bioinformatics 2019; 20:266. [PMID: 31138108 PMCID: PMC6537193 DOI: 10.1186/s12859-019-2889-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/24/2018] [Accepted: 05/09/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND There are over 25 tools dedicated to the detection of copy number variants (CNVs) from whole exome sequencing (WES) data based on read depth analysis. These tools consist of several steps: (i) calculation of read depth for each sequencing target, (ii) normalization, (iii) segmentation and (iv) actual CNV calling. The essential aspect of the entire process is the normalization stage, in which systematic errors and biases are removed and the reference sample set is used to increase the signal-to-noise ratio. Although some CNV calling tools use dedicated algorithms to obtain the optimal reference sample set, most advanced CNV callers do not include this feature. To our knowledge, this work is the first attempt to assess the impact of reference sample set selection on CNV detection performance. METHODS We used WES data from the 1000 Genomes Project to evaluate the impact of various methods of reference sample set selection on the CNV calling performance of three state-of-the-art tools: CODEX, CNVkit and exomeCopy. Two naive solutions (all samples as the reference set, and random selection) and two clustering methods (k-means and k nearest neighbours (kNN) with a variable number of clusters or group sizes) were evaluated to identify the best-performing sample selection method. RESULTS AND CONCLUSIONS The experiments showed that appropriate selection of the reference sample set may greatly improve the CNV detection rate. In particular, we found that smart reduction of reference sample size may significantly increase the algorithms' precision while having a negligible negative effect on sensitivity. We observed that a complete CNV calling process with the k-means algorithm as the selection method has significantly better time complexity than the kNN-based solution.
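Below is a minimal sketch of the kNN option compared above. It assumes each sample's per-target read depths are normalized to proportions of total depth and compared by Euclidean distance. These choices and the function names are illustrative assumptions, not necessarily the exact procedure of Kuśmirek et al. The selected samples would then act as the reference set for a caller such as CODEX, CNVkit or exomeCopy.

```python
import math


def normalize(depths: list[float]) -> list[float]:
    """Scale a sample's per-target read depths to proportions of its total depth."""
    total = sum(depths)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [d / total for d in depths]


def knn_reference_set(test_id: str, depth_by_sample: dict[str, list[float]],
                      k: int) -> list[str]:
    """Return the k samples whose normalized coverage profiles are closest to test_id."""
    test_profile = normalize(depth_by_sample[test_id])
    ranked = sorted(
        (math.dist(test_profile, normalize(depths)), sample_id)
        for sample_id, depths in depth_by_sample.items()
        if sample_id != test_id  # a sample is never its own reference
    )
    return [sample_id for _, sample_id in ranked[:k]]


depth_by_sample = {
    "S1": [100, 200, 300],
    "S2": [110, 190, 310],  # similar shape to S1
    "S3": [300, 200, 100],  # very different shape
    "S4": [50, 100, 150],   # same proportions as S1, half the depth
}
print(knn_reference_set("S1", depth_by_sample, k=2))
```

Because depths are normalized by the total, S4 is ranked as the closest reference to S1 despite having half the coverage. A k-means variant would instead cluster all samples once and use the test sample's cluster as its reference set.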
Affiliation(s)
- Wiktor Kuśmirek
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
- Agnieszka Szmurło
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
- Marek Wiewiórka
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
- Robert Nowak
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warsaw, 00-665, Poland
- Tomasz Gambin
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warsaw, 00-665, Poland

31
Sadler B, Haller G, Antunes L, Bledsoe X, Morcuende J, Giampietro P, Raggio C, Miller N, Kidane Y, Wise CA, Amarillo I, Walton N, Seeley M, Johnson D, Jenkins C, Jenkins T, Oetjens M, Tong RS, Druley TE, Dobbs MB, Gurnett CA. Distal chromosome 16p11.2 duplications containing SH2B1 in patients with scoliosis. J Med Genet 2019; 56:427-433. [PMID: 30803986 DOI: 10.1136/jmedgenet-2018-105877] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/15/2018] [Revised: 01/18/2019] [Accepted: 01/25/2019] [Indexed: 12/31/2022]
Abstract
INTRODUCTION Adolescent idiopathic scoliosis (AIS) is a common musculoskeletal disorder with strong evidence for a genetic contribution. CNVs play an important role in congenital scoliosis, but their role in idiopathic scoliosis has been largely unexplored. METHODS Exome sequence data from 1197 AIS cases and 1664 in-house controls were analysed using coverage data to identify rare CNVs. CNV calls were filtered to include only high-confidence CNVs with >10 average reads per region and a mean log-ratio of coverage consistent with single-copy duplication or deletion. The frequency of 55 common recurrent CNVs was determined and correlated with clinical characteristics. RESULTS Distal chromosome 16p11.2 microduplications containing the gene SH2B1 were found in 0.7% of AIS cases (8/1197). We replicated this finding in two additional AIS cohorts (8/1097 and 2/433), resulting in 0.7% (18/2727) of all AIS cases harbouring a chromosome 16p11.2 microduplication, compared with 0.06% of local controls (1/1664) and 0.04% of published controls (8/19584) (p=2.28×10⁻¹¹, OR=16.15). Furthermore, examination of electronic health records of 92 455 patients from the Geisinger health system showed scoliosis in 30% (20/66) of patients with chromosome 16p11.2 microduplications containing SH2B1, compared with 7.6% (10/132) of controls (p=5.6×10⁻⁴, OR=3.9). CONCLUSIONS Recurrent distal chromosome 16p11.2 duplications explain nearly 1% of AIS. Distal chromosome 16p11.2 duplications may contribute to scoliosis pathogenesis by directly impairing growth or by altering expression of nearby genes, such as TBX6. Individuals with distal chromosome 16p11.2 microduplications should be screened for scoliosis to facilitate early treatment.
Affiliation(s)
- Brooke Sadler
- Department of Neurology, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Gabe Haller
- Department of Orthopedic Surgery, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Lilian Antunes
- Department of Neurology, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Xavier Bledsoe
- Department of Neurology, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Jose Morcuende
- Department of Orthopaedic Surgery and Rehabilitation, University of Iowa Roy J and Lucille A Carver College of Medicine, Iowa City, Iowa, USA
- Philip Giampietro
- Department of Genetics, St. Christopher's Hospital for Children, Philadelphia, Pennsylvania, USA
- Cathleen Raggio
- Orthopedic Surgery, Pediatrics, Hospital for Special Surgery, New York City, New York, USA
- Nancy Miller
- Department of Orthopedics, University of Colorado at Denver - Anschutz Medical Campus, Aurora, Colorado, USA
- Yared Kidane
- Sarah M. and Charles E. Seay Center for Musculoskeletal Research, Texas Scottish Rite Hospital for Children, Dallas, Texas, USA
- Carol A Wise
- Sarah M. and Charles E. Seay Center for Musculoskeletal Research, Texas Scottish Rite Hospital for Children, Dallas, Texas, USA
- Ina Amarillo
- Department of Pathology and Immunology, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Nephi Walton
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- Mark Seeley
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- Darren Johnson
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- Conner Jenkins
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- Troy Jenkins
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- Matthew Oetjens
- Genomic Medicine, Geisinger Health System, Danville, Pennsylvania, USA
- R Spencer Tong
- Department of Pediatrics, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Todd E Druley
- Department of Pediatrics, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Matthew B Dobbs
- Department of Orthopedic Surgery, Washington University in Saint Louis School of Medicine, St. Louis, Missouri, USA
- Christina A Gurnett
- Department of Neurology, Division of Pediatric Neurology, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA

32
Fu JM, Leslie EJ, Scott AF, Murray JC, Marazita ML, Beaty TH, Scharpf RB, Ruczinski I. Detection of de novo copy number deletions from targeted sequencing of trios. Bioinformatics 2019; 35:571-578. [PMID: 30084993 PMCID: PMC6378941 DOI: 10.1093/bioinformatics/bty677] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/28/2018] [Revised: 07/25/2018] [Accepted: 08/01/2018] [Indexed: 11/25/2022]
Abstract
MOTIVATION De novo copy number deletions have been implicated in many diseases, but there is no formal method to date that identifies de novo deletions in parent-offspring trios from capture-based sequencing platforms. RESULTS We developed Minimum Distance for Targeted Sequencing (MDTS) to fill this void. MDTS has similar sensitivity (recall), but a much lower false positive rate compared to less specific CNV callers, resulting in a much higher positive predictive value (precision). MDTS also exhibited much better scalability. AVAILABILITY AND IMPLEMENTATION MDTS is freely available as open source software from the Bioconductor repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jack M Fu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Alan F Scott
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Jeffrey C Murray
- Department of Pediatrics, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Mary L Marazita
- Department of Oral Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Terri H Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Robert B Scharpf
- Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

33
Jiang Y, Wang R, Urrutia E, Anastopoulos IN, Nathanson KL, Zhang NR. CODEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing. Genome Biol 2018; 19:202. [PMID: 30477554 PMCID: PMC6260772 DOI: 10.1186/s13059-018-1578-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 11/14/2017] [Accepted: 11/02/2018] [Indexed: 12/04/2022]
Abstract
High-throughput DNA sequencing enables detection of copy number variations (CNVs) on a genome-wide scale with finer resolution than array-based methods, but it suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive to variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.
Affiliation(s)
- Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, 27599, USA
- Rujin Wang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27599, USA
- Eugene Urrutia
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27599, USA
- Ioannis N Anastopoulos
- Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Katherine L Nathanson
- Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA

34
Spinal muscular atrophy within Amish and Mennonite populations: Ancestral haplotypes and natural history. PLoS One 2018; 13:e0202104. [PMID: 30188899 PMCID: PMC6126807 DOI: 10.1371/journal.pone.0202104] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/26/2018] [Accepted: 07/29/2018] [Indexed: 12/27/2022]
Abstract
We correlate chromosome 5 haplotypes and SMN2 copy number with disease expression in 42 Mennonite and 14 Amish patients with spinal muscular atrophy (SMA). A single haplotype (A1) with 1 copy of SMN2 segregated among all Amish patients. SMN1 deletions segregated on four different Mennonite haplotypes that carried 1 (M1a, M1b, M1c) or 2 (M2) copies of SMN2. DNA microsatellite and microarray data revealed structural similarities among A1, M1a, M1b, and M2. Clinical data were parsed according to both SMN1 genotype and SMN2 copy number (2 copies, n = 44; 3 copies, n = 9; or 4 copies, n = 3). No infant with 2 copies of SMN2 sat unassisted. In contrast, all 9 Mennonites with the M1a/M2 genotype (3 copies of SMN2) sat during infancy at a median age of 7 months, and 5 (56%) walked and dressed independently at median ages of 18 and 36 months, respectively. All are alive at a median age of 11 (range 2–31) years without ventilatory support. Among 13 Amish and 26 Mennonite patients with 2 copies of SMN2 who did not receive feeding or ventilatory support, A1/A1 as compared to M1a/M1a genotype was associated with earlier clinical onset (p = 0.0040) and shorter lifespan (median survival 3.9 versus 5.7 months, p = 0.0314). These phenotypic differences were not explained by variation in SMN1 deletion size or SMN2 coding sequence, which were conserved across haplotypes. Distinctive features of SMA within Plain communities provide a population-specific framework to study variations of disease expression and the impact of disease-modifying therapies administered early in life.

35
Zhu N, Gonzaga-Jauregui C, Welch C, Ma L, Qi H, King AK, Krishnan U, Rosenzweig EB, Ivy DD, Austin ED, Hamid R, Nichols WC, Pauciulo MW, Lutz KA, Sawle A, Reid JG, Overton JD, Baras A, Dewey F, Shen Y, Chung WK. Exome Sequencing in Children With Pulmonary Arterial Hypertension Demonstrates Differences Compared With Adults. Circ Genom Precis Med 2018; 11:e001887. [PMID: 29631995 PMCID: PMC5896781 DOI: 10.1161/circgen.117.001887] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 07/17/2017] [Accepted: 01/31/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND Pulmonary arterial hypertension (PAH) is a rare disease characterized by pulmonary arteriole remodeling, elevated arterial pressure and resistance, and subsequent heart failure. Compared with adult-onset disease, pediatric-onset PAH is more heterogeneous and often associated with worse prognosis. Although BMPR2 mutations underlie ≈70% of adult familial PAH (FPAH) cases, the genetic basis of PAH in children is less understood. METHODS We performed genetic analysis of 155 pediatric- and 257 adult-onset PAH patients, including both FPAH and sporadic, idiopathic PAH (IPAH). After screening for 2 common PAH risk genes, mutation-negative FPAH and all IPAH cases were evaluated by exome sequencing. RESULTS We observed similar frequencies of rare, deleterious BMPR2 mutations in pediatric- and adult-onset patients: ≈55% in FPAH and 10% in IPAH patients in both age groups. However, there was significant enrichment of TBX4 mutations in pediatric- compared with adult-onset patients (IPAH: 10/130 pediatric versus 0/178 adult-onset), and TBX4 carriers had younger mean age-of-onset compared with BMPR2 carriers. Mutations in other known PAH risk genes were infrequent in both age groups. Notably, among pediatric IPAH patients without mutations in known risk genes, exome sequencing revealed a 2-fold enrichment of de novo likely gene-damaging and predicted deleterious missense variants. CONCLUSIONS Mutations in known PAH risk genes accounted for ≈70% to 80% of FPAH in both age groups, 21% of pediatric-onset IPAH, and 11% of adult-onset IPAH. Rare, predicted deleterious variants in TBX4 are enriched in pediatric patients and de novo variants in novel genes may explain ≈19% of pediatric-onset IPAH cases.
Affiliation(s)
- Na Zhu
- Department of Pediatrics, Columbia University Medical Center, New York
- Department of Systems Biology, Columbia University, New York, NY
- Carrie Welch
- Department of Pediatrics, Columbia University Medical Center, New York
- Lijiang Ma
- Department of Pediatrics, Columbia University Medical Center, New York
- Hongjian Qi
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY
- Department of Systems Biology, Columbia University, New York, NY
- Usha Krishnan
- Department of Pediatrics, Columbia University Medical Center, New York
- Erika B. Rosenzweig
- Department of Pediatrics, Columbia University Medical Center, New York
- Department of Medicine, Columbia University Medical Center, New York
- D. Dunbar Ivy
- Children’s Hospital Colorado, Department of Pediatric Cardiology, Denver, CO
- Eric D. Austin
- Department of Pediatrics, Vanderbilt University School of Medicine, Nashville, TN
- Rizwan Hamid
- Department of Pediatrics, Vanderbilt University School of Medicine, Nashville, TN
- William C. Nichols
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center & Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH
- Michael W. Pauciulo
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center & Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH
- Katie A. Lutz
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center & Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH
- Ashley Sawle
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York
- Jeffrey G. Reid
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown
- John D. Overton
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown
- Aris Baras
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown
- Frederick Dewey
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY
- Department of Biomedical Informatics, Columbia University, New York, NY
- Wendy K. Chung
- Department of Pediatrics, Columbia University Medical Center, New York
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York
- Department of Medicine, Columbia University Medical Center, New York

36
WISExome: a within-sample comparison approach to detect copy number variations in whole exome sequencing data. Eur J Hum Genet 2017; 25:1354-1363. [PMID: 29255179 PMCID: PMC5865163 DOI: 10.1038/s41431-017-0005-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/21/2017] [Revised: 07/01/2017] [Accepted: 08/01/2017] [Indexed: 01/21/2023]
Abstract
In clinical genetics, detection of single nucleotide variants (SNVs) as well as copy number variations (CNVs) is essential for patient genotyping. Obtaining both CNV and SNV information from WES data would significantly simplify the clinical workflow. Unfortunately, the sequence reads obtained with WES vary between samples, complicating accurate CNV detection with WES. To avoid dependence on other samples, we developed a within-sample comparison approach (WISExome). For every WES target region on the genome, we identified a set of reference target regions elsewhere on the genome with similar read frequency behavior. For a new sample, aberrations are detected by comparing the read frequency of a target region with the distribution of read frequencies in the reference set. WISExome correctly identifies known pathogenic CNVs (range 4 kb–5.2 Mb). Moreover, WISExome prioritizes pathogenic CNVs by sorting them on quality and on annotations of overlapping genes in OMIM. When comparing WISExome with four existing CNV detection tools, we found that CoNIFER detects far fewer CNVs and that XHMM breaks calls made by other tools into smaller calls (fragmentation). CODEX and CLAMMS seem to perform more similarly to WISExome. CODEX finds all known pathogenic CNVs, but makes many more calls than any other method. CLAMMS and WISExome agree the most. CLAMMS does, however, miss one of the known CNVs and shows slightly more fragmentation. Taken together, WISExome is a promising tool for genome diagnostics laboratories, as the workflow can be based solely on WES data.
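The within-sample idea described above can be sketched as follows. This is a simplified illustration, not the WISExome implementation. It makes three assumptions. Reference targets are chosen by Euclidean distance between read-frequency profiles across a small panel of control samples. Targets on the same chromosome are excluded, as one reading of "elsewhere on the genome". Each target is scored with a z-score against its references. The function names and toy counts are hypothetical.

```python
import math
import statistics


def read_frequencies(counts: list[int]) -> list[float]:
    """Fraction of a sample's reads falling in each target region."""
    total = sum(counts)
    return [c / total for c in counts]


def build_reference_sets(control_counts: list[list[int]], chroms: list[str],
                         n_ref: int) -> list[list[int]]:
    """For each target, pick the n_ref targets on other chromosomes whose
    read-frequency profiles across control samples are most similar."""
    freqs = [read_frequencies(sample) for sample in control_counts]
    # profiles[t] = read frequency of target t in each control sample
    profiles = [[sample[t] for sample in freqs] for t in range(len(chroms))]
    reference_sets = []
    for t, profile in enumerate(profiles):
        candidates = sorted(
            (math.dist(profile, profiles[u]), u)
            for u in range(len(chroms))
            if chroms[u] != chroms[t]  # assumption: references lie on other chromosomes
        )
        reference_sets.append([u for _, u in candidates[:n_ref]])
    return reference_sets


def within_sample_z(test_counts: list[int],
                    reference_sets: list[list[int]]) -> list[float]:
    """z-score of each target's read frequency against its reference targets
    in the same test sample (needs at least 2 references per target)."""
    freqs = read_frequencies(test_counts)
    scores = []
    for t, refs in enumerate(reference_sets):
        values = [freqs[u] for u in refs]
        sd = statistics.stdev(values)
        scores.append((freqs[t] - statistics.mean(values)) / sd if sd > 0 else math.nan)
    return scores


controls = [
    [100, 102, 98, 101, 99, 100],
    [99, 101, 100, 98, 102, 100],
    [101, 99, 102, 100, 98, 100],
]
chroms = ["1", "1", "2", "2", "3", "3"]
refs = build_reference_sets(controls, chroms, n_ref=3)
test_sample = [50, 101, 99, 100, 102, 98]  # target 0 has roughly half the expected reads
z = within_sample_z(test_sample, refs)
print([round(v, 1) for v in z])
```

In this sketch the control panel is used only once, to choose the reference targets. A new sample is then scored from its own reads alone, so no matched control sample is needed at calling time.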

37
Gambin T, Akdemir ZC, Yuan B, Gu S, Chiang T, Carvalho CMB, Shaw C, Jhangiani S, Boone PM, Eldomery MK, Karaca E, Bayram Y, Stray-Pedersen A, Muzny D, Charng WL, Bahrambeigi V, Belmont JW, Boerwinkle E, Beaudet AL, Gibbs RA, Lupski JR. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort. Nucleic Acids Res 2017; 45:1633-1648. [PMID: 27980096 PMCID: PMC5389578 DOI: 10.1093/nar/gkw1237] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 07/16/2015] [Accepted: 11/29/2016] [Indexed: 11/14/2022]
Abstract
We developed an algorithm, HMZDelFinder, that uses whole exome sequencing (WES) data to identify rare and intragenic homozygous and hemizygous (HMZ) deletions that may represent complete loss-of-function of the indicated gene. HMZDelFinder was applied to 4866 samples in the Baylor–Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous and 206 hemizygous) with an estimated sensitivity of 86.5% (82% for single-exonic and 88% for multi-exonic calls) and precision of 78% (53% for single-exonic and 96% for multi-exonic calls). Of the 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR, and 64 were confirmed. These include 18 single-exon deletions, of which 8 were detected exclusively by HMZDelFinder and not by any of seven other CNV detection tools examined. Further investigation of the 64 validated deletion calls revealed at least 15 pathogenic HMZ deletions. Of those, 7 accounted for 17–50% of pathogenic CNVs in different disease cohorts, in which 7.1–11% of molecular diagnoses were attributed to CNVs. In summary, we present an algorithm to detect rare, intragenic, single-exon deletion CNVs using WES data; this tool can be useful for disease gene discovery efforts and clinical WES analyses.
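The abstract does not describe HMZDelFinder's internals. The code below illustrates the general idea behind calling rare homozygous or hemizygous deletions from exome coverage. It flags exons where one sample has near-zero normalized coverage while the exon is well covered in the rest of the cohort. It is not HMZDelFinder's implementation. The RPKM normalization, thresholds and function names are assumptions chosen for illustration.

```python
import statistics


def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def candidate_hmz_deletions(rpkm_matrix: list[list[float]], sample_ids: list[str],
                            exon_ids: list[str], low_rpkm: float = 0.5,
                            max_cohort_freq: float = 0.005,
                            min_median_rpkm: float = 5.0) -> list[tuple[str, str]]:
    """Flag (sample, exon) pairs with near-zero coverage at well-covered, rarely
    deleted exons. rpkm_matrix[i][j] is the RPKM of exon j in sample i.
    All thresholds are hypothetical defaults for illustration."""
    calls = []
    for j, exon in enumerate(exon_ids):
        column = [row[j] for row in rpkm_matrix]
        if statistics.median(column) < min_median_rpkm:
            continue  # poorly captured exon: low values are not informative
        low = [i for i, value in enumerate(column) if value < low_rpkm]
        if len(low) / len(sample_ids) > max_cohort_freq:
            continue  # too frequent in the cohort to count as a rare deletion
        calls.extend((sample_ids[i], exon) for i in low)
    return calls


samples = ["A", "B", "C", "D", "E"]
exons = ["e1", "e2", "e3"]
# Rows are samples; sample B has almost no reads at e2,
# and e3 is poorly covered in every sample.
matrix = [
    [20.0, 30.0, 1.0],
    [22.0, 0.1, 1.2],
    [19.0, 28.0, 0.0],
    [21.0, 31.0, 0.9],
    [20.5, 29.0, 1.1],
]
# In a 5-sample toy cohort one carrier is 20%, so the frequency cap is relaxed here.
print(candidate_hmz_deletions(matrix, samples, exons, max_cohort_freq=0.2))
```

The median-coverage filter is the key design choice. Without it, exons that are poorly captured in every sample (e3 here) would produce spurious deletion calls.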
Affiliation(s)
- Tomasz Gambin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Institute of Computer Science, Warsaw University of Technology, Warsaw, 00-665 Warsaw, Poland
- Zeynep C Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bo Yuan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Shen Gu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Theodore Chiang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Chad Shaw
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Shalini Jhangiani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Philip M Boone
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Mohammad K Eldomery
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Ender Karaca
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yavuz Bayram
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Asbjørg Stray-Pedersen
- Norwegian National Unit for Newborn Screening, Division for Pediatric and Adolescent Medicine, Oslo University Hospital, N-0424 Oslo, Norway
- Donna Muzny
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Wu-Lin Charng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Vahid Bahrambeigi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Graduate Program in Diagnostic Genetics, School of Health Professions, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- John W Belmont
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Arthur L Beaudet
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA
38
Tom JA, Reeder J, Forrest WF, Graham RR, Hunkapiller J, Behrens TW, Bhangale TR. Identifying and mitigating batch effects in whole genome sequencing data. BMC Bioinformatics 2017; 18:351. [PMID: 28738841 PMCID: PMC5525370 DOI: 10.1186/s12859-017-1756-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/12/2017] [Accepted: 07/12/2017] [Indexed: 12/21/2022]
Abstract
BACKGROUND Large whole genome sequencing sample sets with deep coverage are being generated; however, assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or in the bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects, or to remove associations affected by batch effects, in whole genome sequencing data. RESULTS We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that principal components analysis of these metrics aids the identification of batch effects. To mitigate batch effects, we developed new site-specific filters that identified and removed variants falsely associated with the phenotype because of batch effects. These filters are based on three criteria. The first is a haplotype-based genotype correction, and the second is a differential genotype quality test. The third removes sites with a missing genotype rate greater than 30% after genotypes with quality scores less than 20 have been set to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. Further analyses demonstrated three points. 1) The filters affected variants known to be disease-associated: 2 of 16 confirmed associations in an AMD candidate SNP analysis were filtered, a 12.5% reduction in power. 2) In the absence of batch effects, the filters removed only a small proportion of variants across the genome (type I error rate of 3%). 3) In an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. CONCLUSIONS Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.
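The third filter described in this abstract works in two steps. First, genotypes with a quality score below 20 are set to missing. Then any site whose missing genotype rate exceeds 30% is removed. The sketch below implements only that rule. It is not the authors' software package. The per-site representation, a list of (genotype, GQ) pairs, is an assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one call: (genotype such as "0/1", or None if
# already missing; genotype quality GQ). Not the authors' data format.
Call = Tuple[Optional[str], int]

GQ_THRESHOLD = 20        # calls with GQ below this are set to missing
MAX_MISSING_RATE = 0.30  # sites with a missing rate above this are removed


def mask_low_quality(calls: List[Call], gq_threshold: int = GQ_THRESHOLD) -> List[Optional[str]]:
    """Set genotypes with quality below the threshold to missing (None)."""
    return [gt if gt is not None and gq >= gq_threshold else None for gt, gq in calls]


def missing_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of samples whose genotype is missing at this site."""
    if not genotypes:
        return 1.0  # treat a site with no samples as entirely missing
    return sum(gt is None for gt in genotypes) / len(genotypes)


def passes_missingness_filter(
    calls: List[Call],
    gq_threshold: int = GQ_THRESHOLD,
    max_missing: float = MAX_MISSING_RATE,
) -> bool:
    """Keep a site only if, after masking low-GQ calls, at most 30% are missing."""
    return missing_rate(mask_low_quality(calls, gq_threshold)) <= max_missing


if __name__ == "__main__":
    sites = {
        # No call falls below GQ 20, so 0% missing: kept.
        "site_A": [("0/1", 45), ("0/0", 60), ("1/1", 30), ("0/1", 25)],
        # Two low-GQ calls plus one already missing, so 75% missing: removed.
        "site_B": [("0/1", 12), ("0/0", 8), ("0/1", 50), (None, 0)],
    }
    kept = [name for name, calls in sites.items() if passes_missingness_filter(calls)]
    print(kept)  # ['site_A']
```

Masking comes before the missingness count. As a result, a site where many calls are technically present can still be removed if those calls are of low quality.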
Affiliation(s)
- Jennifer A Tom
- Bioinformatics and Computational Biology Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Jens Reeder
- Bioinformatics and Computational Biology Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- William F Forrest
- Bioinformatics and Computational Biology Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Robert R Graham
- Human Genetics Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Julie Hunkapiller
- Human Genetics Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Timothy W Behrens
- Human Genetics Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- Tushar R Bhangale
- Bioinformatics and Computational Biology Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA; Human Genetics Department, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
39
Fu J, Beaty TH, Scott AF, Hetmanski J, Parker MM, Wilson JEB, Marazita ML, Mangold E, Albacha-Hejazi H, Murray JC, Bureau A, Carey J, Cristiano S, Ruczinski I, Scharpf RB. Whole exome association of rare deletions in multiplex oral cleft families. Genet Epidemiol 2017; 41:61-69. [PMID: 27910131 PMCID: PMC5154821 DOI: 10.1002/gepi.22010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/22/2016] [Revised: 09/21/2016] [Accepted: 09/21/2016] [Indexed: 11/11/2022]
Abstract
By sequencing the exomes of distantly related individuals in multiplex families, rare mutational and structural changes to coding DNA can be characterized and their relationship to disease risk can be assessed. Recently, several rare single nucleotide variants (SNVs) were associated with an increased risk of nonsyndromic oral cleft. These findings highlight the importance of rare sequence variants in oral clefts and illustrate the strength of family-based study designs. However, the extent to which rare deletions occur in coding regions of the genome and contribute to the risk of nonsyndromic clefts is not well understood. To identify putative structural variants underlying risk, we developed a pipeline for detecting rare hemizygous deletions in families from whole exome sequencing data. We combined it with statistical inference based on rare variant sharing. Among 56 multiplex families comprising 115 individuals, we identified 53 regions with one or more rare hemizygous deletions. Of these 53 regions, 45 contained rare deletions occurring in only one family member. Members of the same family shared a rare deletion in only eight regions. We also devised a scalable global test for enrichment of shared rare deletions.
Affiliation(s)
- Jack Fu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Terri H. Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Alan F. Scott
- Center for Inherited Disease Research and Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore MD, USA
- Jacqueline Hetmanski
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Margaret M. Parker
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston MA, USA
- Joan E. Bailey Wilson
- Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore MD, USA
- Mary L. Marazita
- Department of Oral Biology, Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, PA, USA
- Jeffrey C. Murray
- Department of Pediatrics, School of Medicine, University of Iowa, IA, USA
- Alexandre Bureau
- Centre de Recherche de l’Institut Universitaire en Santé Mentale de Québec and Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada
- Jacob Carey
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Stephen Cristiano
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Robert B. Scharpf
- Department of Oncology, Johns Hopkins School of Medicine, Baltimore MD, USA
40
Abul-Husn NS, Manickam K, Jones LK, Wright EA, Hartzel DN, Gonzaga-Jauregui C, O’Dushlaine C, Leader JB, Lester Kirchner H, Lindbuchler DM, Barr ML, Giovanni MA, Ritchie MD, Overton JD, Reid JG, Metpally RPR, Wardeh AH, Borecki IB, Yancopoulos GD, Baras A, Shuldiner AR, Gottesman O, Ledbetter DH, Carey DJ, Dewey FE, Murray MF. Genetic identification of familial hypercholesterolemia within a single U.S. health care system. Science 2016; 354:aaf7000. [DOI: 10.1126/science.aaf7000] [Citation(s) in RCA: 264] [Impact Index Per Article: 33.0] [Received: 03/17/2016] [Accepted: 11/16/2016] [Indexed: 12/12/2022]