1
Zanardo ÉA, Monteiro FP, Chehimi SN, Oliveira YG, Dias AT, Costa LA, Ramos LL, Novo-Filho GM, Montenegro MM, Nascimento AM, Kitajima JP, Kok F, Kulikowski LD. Application of Whole-Exome Sequencing in Detecting Copy Number Variants in Patients with Developmental Delay and/or Multiple Congenital Malformations. J Mol Diagn 2020; 22:1041-1049. [PMID: 32497716 DOI: 10.1016/j.jmoldx.2020.05.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/03/2019] [Revised: 04/25/2020] [Accepted: 05/07/2020] [Indexed: 02/01/2023]
Abstract
Overcoming challenges for the unambiguous detection of copy number variations is essential to broaden our understanding of the role of genomic variants in the clinical phenotype. With the improvement of software and databases, whole-exome sequencing quickly can become an excellent strategy in the routine diagnosis of patients with a developmental delay and/or multiple congenital malformations. However, even after a detailed analysis of pathogenic single-nucleotide variants and indels in known disease genes, using whole-exome sequencing, some patients with suspected syndromic conditions are left without a conclusive diagnosis. These negative results could be the result of different factors including nongenetic etiologies, lack of knowledge about the genes that cause different disease phenotypes, or, in some cases, a deletion or duplication of genomic information not routinely detectable by whole-exome sequencing variant calling. Although copy number variant detection is possible using whole-exome sequencing data, such analysis presents significant challenges and cannot yet be used to replace chromosomal arrays for identification of deletions or duplications.
Affiliation(s)
- Évelin A Zanardo
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Samar N Chehimi
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Yanca G Oliveira
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Alexandre T Dias
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Gil M Novo-Filho
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Marília M Montenegro
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Amom M Nascimento
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Fernando Kok
- Mendelics Análise Genômica, São Paulo, Brazil; Departamento de Neurologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Leslie D Kulikowski
- Laboratório de Citogenômica, Departamento de Patologia, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
2
Moon S, Kim YJ, Han S, Hwang MY, Shin DM, Park MY, Lu Y, Yoon K, Jang HM, Kim YK, Park TJ, Song DS, Park JK, Lee JE, Kim BJ. The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits. Sci Rep 2019; 9:1382. [PMID: 30718733 PMCID: PMC6361960 DOI: 10.1038/s41598-018-37832-9] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 07/26/2018] [Accepted: 12/11/2018] [Indexed: 02/05/2023]
Abstract
We introduce the design and implementation of a new array, the Korea Biobank Array (referred to as KoreanChip), optimized for the Korean population and demonstrate findings from GWAS of blood biochemical traits. KoreanChip comprised >833,000 markers including >247,000 rare-frequency or functional variants estimated from >2,500 sequencing data in Koreans. Of the 833 K markers, 208 K functional markers were directly genotyped. Particularly, >89 K markers were presented in East Asians. KoreanChip achieved higher imputation performance owing to the excellent genomic coverage of 95.38% for common and 73.65% for low-frequency variants. From GWAS (Genome-wide association study) using 6,949 individuals, 28 associations were successfully recapitulated. Moreover, 9 missense variants were newly identified, of which we identified new associations between a common population-specific missense variant, rs671 (p.Glu457Lys) of ALDH2, and two traits including aspartate aminotransferase (P = 5.20 × 10⁻¹³) and alanine aminotransferase (P = 4.98 × 10⁻⁸). Furthermore, two novel missense variants of GPT with rare frequency in East Asians but extreme rarity in other populations were associated with alanine aminotransferase (rs200088103; p.Arg133Trp, P = 2.02 × 10⁻⁹ and rs748547625; p.Arg143Cys, P = 1.41 × 10⁻⁶). These variants were successfully replicated in 6,000 individuals (P = 5.30 × 10⁻⁸ and P = 1.24 × 10⁻⁶). GWAS results suggest the promising utility of KoreanChip with a substantial number of damaging variants to identify new population-specific disease-associated rare/functional variants.
Affiliation(s)
- Sanghoon Moon
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Young Jin Kim
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Sohee Han
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Mi Yeong Hwang
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Dong Mun Shin
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Kyungheon Yoon
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Hye-Mi Jang
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Yun Kyoung Kim
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Tae-Joon Park
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Dae Sub Song
- Division of Epidemiology and Health Index, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Jae Kyung Park
- Division of Epidemiology and Health Index, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Jong-Eun Lee
- DNA link, Incorporated, Seoul, 03759, Republic of Korea
- Bong-Jo Kim
- Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
3
Copy Number Variations in Amyotrophic Lateral Sclerosis: Piecing the Mosaic Tiles Together through a Systems Biology Approach. Mol Neurobiol 2017; 55:1299-1322. [PMID: 28120152 PMCID: PMC5820374 DOI: 10.1007/s12035-017-0393-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/30/2016] [Accepted: 01/06/2017] [Indexed: 12/11/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a devastating and still untreatable motor neuron disease. Despite the molecular mechanisms underlying ALS pathogenesis that are still far from being understood, several studies have suggested the importance of a genetic contribution in both familial and sporadic forms of the disease. In addition to single-nucleotide polymorphisms (SNPs), which account for only a limited number of ALS cases, a consistent number of common and rare copy number variations (CNVs) have been associated to ALS. Most of the CNV-based association studies use a traditional candidate-gene approach that is inadequate for uncovering the genetic architectures of complex traits like ALS. The emergent paradigm of “systems biology” may offer a new perspective to better interpret the wide spectrum of CNVs in ALS, enabling the characterization of the complex network of gene products underlying ALS pathogenesis. In this review, we will explore the landscape of CNVs in ALS, putting specific emphasis on the functional impact of common CNV regions and genes consistently associated with increased risk of developing disease. In addition, we will discuss the potential contribution of multiple rare CNVs in ALS pathogenesis, focusing our attention on the complex mechanisms by which these proteins might impact, individually or in combination, the genetic susceptibility of ALS. The comprehensive detection and functional characterization of common and rare candidate risk CNVs in ALS susceptibility may bring new pieces into the intricate mosaic of ALS pathogenesis, providing interesting and important implications for a more precise molecular biomarker-assisted diagnosis and more effective and personalized treatments.
4
Whole-exome sequencing study reveals common copy number variants in protocadherin genes associated with childhood obesity in Koreans. Int J Obes (Lond) 2017; 41:660-663. [PMID: 28100915 DOI: 10.1038/ijo.2017.12] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/11/2016] [Revised: 11/29/2016] [Accepted: 12/18/2016] [Indexed: 01/25/2023]
Abstract
Recently, the prevalence of childhood obesity has significantly increased in industrialized countries, including Korea, and now controlling obesity is becoming an economic burden. However, knowledge of the risk factors associated with obesity is still limited. In this study, we aimed to discover additional obesity-associated loci in children. To achieve this, we conducted an exome-wide association analysis of copy number variation (CNV) using whole-exome sequencing (WES) data from a total of 102 cases and 86 controls. We newly identified a CNV locus that overlapped two protocadherin genes, PCDHB7 and PCDHB8, which are brain function-related genes (P-value = 6.40 × 10⁻⁴, odds ratio = 2.2189). A subsequent replication analysis using WES data from 203 obese and 291 normal weight children showed that this CNV region satisfied the genome-wide significance standard (Fisher's combined P-value = 3.76 × 10⁻⁵). Moreover, correlation test using 199 additional samples supported significant association between CNV and increased body mass index. This region also showed a meaningful association with 273 cases and 2596 controls in adult samples. Our findings suggest that differences in the common CNV region at 5q31.3 may have an impact on the pathophysiology of obesity.
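Editorial note on the "Fisher's combined P-value" mentioned in this abstract: the sketch below shows how Fisher's method pools independent p-values from a discovery and a replication stage. It is a minimal illustration, not the authors' pipeline; the function name `fisher_combined_p` and the example inputs are hypothetical, and the abstract does not report the replication-stage p-value, so the published figure cannot be reproduced from it.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p_i) follows a chi-square
    # distribution with 2k degrees of freedom under the null hypothesis.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    print(fisher_combined_p([0.05, 0.05]))
```

The closed form avoids a dependency on a statistics library; with SciPy available, `scipy.stats.combine_pvalues` offers the same method.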
5
Wenric S, Sticca T, Caberg JH, Josse C, Fasquelle C, Herens C, Jamar M, Max S, Gothot A, Caers J, Bours V. Exome copy number variation detection: Use of a pool of unrelated healthy tissue as reference sample. Genet Epidemiol 2016; 41:35-40. [PMID: 27862228 DOI: 10.1002/gepi.22019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2015] [Revised: 08/09/2016] [Accepted: 09/21/2016] [Indexed: 11/06/2022]
Abstract
An increasing number of bioinformatic tools designed to detect CNVs (copy number variants) in tumor samples based on paired exome data where a matched healthy tissue constitutes the reference have been published in the recent years. The idea of using a pool of unrelated healthy DNA as reference has previously been formulated but not thoroughly validated. As of today, the gold standard for CNV calling is still aCGH but there is an increasing interest in detecting CNVs by exome sequencing. We propose to design a metric allowing the comparison of two CNV profiles, independently of the technique used and assessed the validity of using a pool of unrelated healthy DNA instead of a matched healthy tissue as reference in exome-based CNV detection. We compared the CNV profiles obtained with three different approaches (aCGH, exome sequencing with a matched healthy tissue as reference, exome sequencing with a pool of eight unrelated healthy tissue as reference) on three multiple myeloma samples. We show that the usual analyses performed to compare CNV profiles (deletion/amplification ratios and CNV size distribution) lack in precision when confronted with low LRR values, as they only consider the binary status of each CNV. We show that the metric-based distance constitutes a more accurate comparison of two CNV profiles. Based on these analyses, we conclude that a reliable picture of CNV alterations in multiple myeloma samples can be obtained from whole-exome sequencing in the absence of a matched healthy sample.
Affiliation(s)
- Stephane Wenric
- Laboratory of Human Genetics, GIGA-Research, University of Liège, Liège, Belgium
- Tiberio Sticca
- Laboratory of Human Genetics, GIGA-Research, University of Liège, Liège, Belgium
- Claire Josse
- Laboratory of Human Genetics, GIGA-Research, University of Liège, Liège, Belgium
- Corinne Fasquelle
- Laboratory of Human Genetics, GIGA-Research, University of Liège, Liège, Belgium
- Christian Herens
- Department of Human Genetics, University Hospital (CHU), Liège, Belgium
- Mauricette Jamar
- Department of Human Genetics, University Hospital (CHU), Liège, Belgium
- Stéphanie Max
- Department of Haematology and Immuno-haematology, University Hospital (CHU), Liège, Belgium
- André Gothot
- Department of Haematology and Immuno-haematology, University Hospital (CHU), Liège, Belgium
- Jo Caers
- Laboratory of Haematology, GIGA-Research, University of Liège, Liège, Belgium; Department of Clinical Haematology, University Hospital (CHU), Liège, Belgium
- Vincent Bours
- Laboratory of Human Genetics, GIGA-Research, University of Liège, Liège, Belgium; Department of Human Genetics, University Hospital (CHU), Liège, Belgium
6
Hong CS, Singh LN, Mullikin JC, Biesecker LG. Assessing the reproducibility of exome copy number variations predictions. Genome Med 2016; 8:82. [PMID: 27503473 PMCID: PMC4976506 DOI: 10.1186/s13073-016-0336-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 03/31/2016] [Accepted: 07/13/2016] [Indexed: 11/28/2022]
Abstract
Background: Reproducibility is receiving increased attention across many domains of science and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes and a major challenge is the reproducibility in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology.
Methods: Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine the reproducibility, we ran the callers on four datasets, varying sample sizes of N = 10, 30, 75, 100, 300, and data with different capture methodology. We examined the false negative (FN) calls and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking the CNV call concordance against single nucleotide polymorphism array.
Results: Using independently generated datasets, we examined the PPV for each dataset and observed wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample sizes and capture method analyses, we tested the callers in triplicates. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, negative correlations between the PPV and the sample sizes were observed for CoNIFER (ρ = –0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
Conclusions: Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to include wider spectrum of CNVs (including the small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV are essential to support wide utility and applicability of CNV discovery in exome studies.
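Editorial note on the PPV measure described in this abstract: the sketch below shows one way a positive predictive value could be computed by checking exome CNV calls against array-derived CNVs. It is a minimal illustration under stated assumptions, not the authors' procedure. The 50% overlap threshold, the interval representation, and the names `overlap_fraction` and `ppv` are all hypothetical; the abstract does not specify the concordance criterion.

```python
from typing import List, Tuple

# (chromosome, start, end) with end > start; coordinates are half-open.
Interval = Tuple[str, int, int]


def overlap_fraction(call: Interval, ref: Interval) -> float:
    """Fraction of the call's length covered by the reference interval."""
    if call[0] != ref[0]:
        return 0.0
    if call[2] <= call[1]:
        raise ValueError("call must have end > start")
    shared = min(call[2], ref[2]) - max(call[1], ref[1])
    return max(shared, 0) / (call[2] - call[1])


def ppv(calls: List[Interval], array_cnvs: List[Interval],
        min_fraction: float = 0.5) -> float:
    """PPV = true positives / all calls.

    A call counts as a true positive when at least `min_fraction` of it is
    covered by some array CNV (an assumed criterion, not from the paper).
    """
    if not calls:
        return float("nan")
    true_positives = sum(
        1 for call in calls
        if any(overlap_fraction(call, ref) >= min_fraction for ref in array_cnvs)
    )
    return true_positives / len(calls)


if __name__ == "__main__":
    # Toy data for illustration only.
    exome_calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    array_truth = [("chr1", 150, 400), ("chr2", 0, 1000)]
    print(ppv(exome_calls, array_truth))
```

Because the threshold is applied to the call's own length, short calls inside a large array CNV count as concordant; a reciprocal-overlap rule would be stricter.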
Affiliation(s)
- Celine S Hong
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Larry N Singh
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- James C Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA; Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA
- Leslie G Biesecker
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA; NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA
7
Mason-Suares H, Landry L, Lebo MS. Detecting Copy Number Variation via Next Generation Technology. Current Genetic Medicine Reports 2016. [DOI: 10.1007/s40142-016-0091-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]