101
SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples. Genes Genomics 2019; 41:529-536. [PMID: 30779024 DOI: 10.1007/s13258-019-00788-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/19/2018] [Accepted: 01/21/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Copy number variation (CNV) is an important form of genomic structural variation and is linked to dozens of human diseases. Using next-generation sequencing (NGS) data and developing computational methods to characterize such structural variants is important for understanding disease mechanisms. OBJECTIVE The objective of this study is to develop a new statistical method for detecting recurrent CNVs across multiple samples from genomic sequences. METHODS We propose a statistical method for detecting recurrent CNVs, referred to as SM-RCNV. The method assigns a statistic to each genomic location by combining the frequency of variation at that location across all samples with the correlation among consecutive locations. The weights of the frequency and correlation terms are trained on real datasets with known CNVs. A P-value is assessed for each location on the genome by permutation testing. RESULTS Compared with six peer methods, SM-RCNV performs best in terms of receiver operating characteristic (ROC) curves. SM-RCNV identifies many consistent recurrent CNVs, most of which are known to be biologically significant and associated with disease genes. The validation rates of SM-RCNV against the Database of Genomic Variants were 258/328 (79%) for the CEU call set and 157/309 (51%) for the YRI call set. CONCLUSION SM-RCNV is a well-grounded statistical framework for detecting recurrent CNVs from multiple genomic sequences, providing valuable information for studying genomes in human diseases. The source code is freely available at https://sourceforge.net/projects/sm-rcnv/.
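The scoring idea behind SM-RCNV can be illustrated with a short Python sketch. It combines per-location CNV frequency with neighbouring-location correlation in a weighted sum, then assigns permutation P-values. This is not the SM-RCNV source code, and several details are assumptions:

- CNV calls are given as a samples × locations 0/1 matrix.
- The correlation term is the mean Pearson correlation, computed across samples, between a location and its immediate neighbours.
- The weights `w_freq` and `w_corr` are fixed by hand instead of being trained on known CNVs.
- The permutation null circularly shifts each sample's profile.

All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation across samples; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def location_scores(calls: List[List[int]], w_freq: float, w_corr: float) -> List[float]:
    """Hypothetical score: w_freq * CNV frequency + w_corr * mean correlation
    with neighbouring locations. `calls` is a samples x locations 0/1 matrix."""
    n_loc = len(calls[0])
    cols = [[row[j] for row in calls] for j in range(n_loc)]
    freq = [sum(c) / len(c) for c in cols]
    # adj[j] = correlation between locations j and j + 1 across samples
    adj = [pearson(cols[j], cols[j + 1]) for j in range(n_loc - 1)]
    scores = []
    for j in range(n_loc):
        neigh = [adj[k] for k in (j - 1, j) if 0 <= k < n_loc - 1]
        corr = sum(neigh) / len(neigh) if neigh else 0.0
        scores.append(w_freq * freq[j] + w_corr * corr)
    return scores


def permutation_pvalues(calls: List[List[int]], w_freq: float, w_corr: float,
                        n_perm: int = 200, seed: int = 0) -> List[float]:
    """Per-location permutation P-values under a circular-shift null."""
    rng = random.Random(seed)
    observed = location_scores(calls, w_freq, w_corr)
    n_loc = len(observed)
    exceed = [0] * n_loc
    for _ in range(n_perm):
        # Shift each sample independently: its own CNV segments stay intact,
        # but their alignment across samples (the recurrence) is destroyed.
        shifted = []
        for row in calls:
            s = rng.randrange(n_loc)
            shifted.append(row[s:] + row[:s])
        null = location_scores(shifted, w_freq, w_corr)
        for j in range(n_loc):
            if null[j] >= observed[j]:
                exceed[j] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]


# Toy data: 10 samples, 20 locations; 8 samples share a CNV at locations 8-11.
calls = [[1 if i < 8 and 8 <= j <= 11 else 0 for j in range(20)] for i in range(10)]
pvals = permutation_pvalues(calls, w_freq=0.5, w_corr=0.5)
print([round(p, 3) for p in pvals])
```

Circular shifting is one common way to build a null that preserves each sample's segment lengths. Other permutation schemes are possible.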
102
Lofgren LA, Uehling JK, Branco S, Bruns TD, Martin F, Kennedy PG. Genome‐based estimates of fungal rDNA copy number variation across phylogenetic scales and ecological lifestyles. Mol Ecol 2019; 28:721-730. [DOI: 10.1111/mec.14995] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 07/10/2018] [Revised: 11/22/2018] [Accepted: 11/27/2018] [Indexed: 12/28/2022]
Affiliation(s)
- Lotus A. Lofgren
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota
- Jessie K. Uehling
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California
- Sara Branco
  - Department of Microbiology and Immunology, Montana State University, Bozeman, Montana
- Thomas D. Bruns
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California
- Francis Martin
  - Laboratoire d'Excellence ARBRE, Interactions Arbres/Micro‐organismes, INRA UMR1136 INRA‐Université de Lorraine, Champenoux, France
- Peter G. Kennedy
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota
  - Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, Minnesota

103
Walker L, Watson CM, Hewitt S, Crinnion LA, Bonthron DT, Cohen KE. An alternative to array-based diagnostics: a prospectively recruited cohort, comparing arrayCGH to next-generation sequencing to evaluate foetal structural abnormalities. J Obstet Gynaecol 2019; 39:328-334. [PMID: 30714504 DOI: 10.1080/01443615.2018.1522529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
Abstract
Molecular diagnostic investigations following the identification of foetal abnormalities are routinely performed using array comparative genomic hybridisation (aCGH). Despite the utility of this technique, contemporary approaches to detecting copy number variation are typically based on next-generation sequencing (NGS). We compared an in-house NGS-based workflow (CNVseq) with aCGH for invasively obtained foetal samples from pregnancies complicated by foetal structural abnormality. DNA from 40 foetuses was screened using both 8 × 60 K aCGH oligoarrays and low-coverage whole genome sequencing. Sequencer-compatible libraries were combined in ten-sample multiplexes and sequenced on an Illumina HiSeq2500. The mean resolution of CNVseq was 29 kb, compared with 60 kb for aCGH. Four clinically significant, concordant copy number imbalances were detected by both techniques; however, genomic breakpoints were more precisely defined by CNVseq. These data indicate that CNVseq is a robust and sensitive alternative to aCGH for the prenatal investigation of foetuses with structural abnormalities.
Impact statement
What is already known about this subject? Copy number variant analysis using next-generation sequencing has been successfully applied to investigations of tumour specimens and of patients with developmental delays. The application of our approach to a prospective prenatal diagnosis cohort has not previously been assessed.
What do the results of this study add? Next-generation sequencing has a turnaround time and assay sensitivity comparable to copy number variant analysis performed using array CGH. We demonstrate that, once a next-generation sequencing facility is established, high-throughput CNVseq sample processing and analysis can be undertaken within the framework of a regional diagnostic service.
What are the implications of these findings for clinical practice and/or further research? Array CGH is a legacy technology that is likely to be superseded by low-coverage whole genome sequencing for the detection of copy number variants in the prenatal diagnosis of structural abnormalities.
Affiliation(s)
- Lesley Walker
  - Department of Fetal Medicine, Leeds General Infirmary, Leeds, United Kingdom
- Christopher M Watson
  - Yorkshire Regional Genetics Service, St. James's University Hospital, Leeds, United Kingdom
  - School of Medicine, University of Leeds, St. James's University Hospital, Leeds, United Kingdom
- Sarah Hewitt
  - Yorkshire Regional Genetics Service, St. James's University Hospital, Leeds, United Kingdom
- Laura A Crinnion
  - Yorkshire Regional Genetics Service, St. James's University Hospital, Leeds, United Kingdom
  - School of Medicine, University of Leeds, St. James's University Hospital, Leeds, United Kingdom
- David T Bonthron
  - Yorkshire Regional Genetics Service, St. James's University Hospital, Leeds, United Kingdom
  - School of Medicine, University of Leeds, St. James's University Hospital, Leeds, United Kingdom
- Kelly E Cohen
  - Department of Fetal Medicine, Leeds General Infirmary, Leeds, United Kingdom

104
Haas J, Mester S, Lai A, Frese KS, Sedaghat-Hamedani F, Kayvanpour E, Rausch T, Nietsch R, Boeckel JN, Carstensen A, Völkers M, Dietrich C, Pils D, Amr A, Holzer DB, Martins Bordalo D, Oehler D, Weis T, Mereles D, Buss S, Riechert E, Wirsz E, Wuerstle M, Korbel JO, Keller A, Katus HA, Posch AE, Meder B. Genomic structural variations lead to dysregulation of important coding and non-coding RNA species in dilated cardiomyopathy. EMBO Mol Med 2018; 10:107-120. [PMID: 29138229 PMCID: PMC5760848 DOI: 10.15252/emmm.201707838] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 12/17/2022]
Abstract
The transcriptome must be tightly regulated by mechanisms that include transcription factors, enhancers, and repressors as well as non‐coding RNAs. Besides this dynamic regulation, a large part of the phenotypic variability of eukaryotes is expressed through changes in gene transcription caused by genetic variation. In this study, we evaluate genome‐wide structural genomic variants (SVs) and their association with gene expression in the human heart. We detected 3,898 individual SVs affecting all classes of gene transcripts (e.g., mRNA, miRNA, lncRNA) and regulatory genomic regions (e.g., enhancers or TFBSs). In a cohort of patients (n = 50) with dilated cardiomyopathy (DCM), 80,635 non‐protein‐coding elements of the genome are deleted or duplicated by SVs, containing 3,758 long non‐coding RNAs and 1,756 protein‐coding transcripts. Of the SV‐eQTLs, 65.3% do not harbor a significant SNV‐eQTL, and for the regions with both classes of association, we find similar effect sizes. For deleted protein‐coding exons, we find downregulation of the associated transcripts; duplication events, however, do not show significant changes across all events. In summary, we are the first to describe the genomic variability associated with SVs in heart failure due to DCM and to dissect their impact on the transcriptome. Overall, SVs explain up to 7.5% of the variation in cardiac gene expression, underlining the importance of studying human myocardial gene expression in the context of the individual genome. This has immediate implications for studies on basic mechanisms of cardiac maladaptation, biomarkers, and (gene) therapeutic studies alike.
Affiliation(s)
- Jan Haas
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Stefan Mester
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Alan Lai
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Karen S Frese
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Farbod Sedaghat-Hamedani
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Elham Kayvanpour
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Tobias Rausch
  - EMBL (European Molecular Biology Laboratory), Heidelberg, Germany
- Rouven Nietsch
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
- Jes-Niels Boeckel
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Avisha Carstensen
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
- Mirko Völkers
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Carsten Dietrich
  - Strategy and Innovation, Siemens Healthcare GmbH, Erlangen, Germany
- Dietmar Pils
  - Siemens AG, Corporate Technology, Vienna, Austria
  - Section for Clinical Biometrics, Center for Medical Statistics, Informatics, and Intelligent Systems (CeMSIIS), Medical University of Vienna, Vienna, Austria
- Ali Amr
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
- Daniel B Holzer
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
- Diana Martins Bordalo
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Daniel Oehler
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Tanja Weis
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Derliz Mereles
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Sebastian Buss
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
- Eva Riechert
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Emil Wirsz
  - Strategy and Innovation, Siemens Healthcare GmbH, Erlangen, Germany
- Jan O Korbel
  - EMBL (European Molecular Biology Laboratory), Heidelberg, Germany
- Andreas Keller
  - Department of Bioinformatics, University of Saarland, Saarbrücken, Germany
- Hugo A Katus
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany
- Andreas E Posch
  - Strategy and Innovation, Siemens Healthcare GmbH, Erlangen, Germany
- Benjamin Meder
  - Department of Internal Medicine III, University of Heidelberg, Heidelberg, Germany
  - DZHK (German Centre for Cardiovascular Research), Heidelberg, Germany

105
Wang X, Zhang H, Liu X. Defind: Detecting Genomic Deletions by Integrating Read Depth, GC Content, Mapping Quality and Paired-end Mapping Signatures of Next Generation Sequencing Data. Curr Bioinform 2019. [DOI: 10.2174/1574893613666180703110126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]
Abstract
Background:
Accurate and exhaustive identification of genomic deletion events is the basis for understanding their roles in phenotype variation. Developing effective algorithms to identify deletions using next-generation sequencing (NGS) data remains a challenge.
Objective:
Because accurate and exhaustive identification of genomic deletion events is important, we present a new approach, Defind, to detect deletions using NGS data from a single sample mapped to the reference genome sequence.
Method:
Defind runs on Linux and is implemented in Perl and R. It detects medium- and large-sized deletions by simultaneously inspecting the depth of coverage, GC content, mapping quality, and paired-end information of NGS data. We carried out detailed comparisons between Defind and other deletion detection methods using both simulated and real data.
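The abstract does not give Defind's decision rules. The following Python sketch therefore only illustrates one generic way the four signals can be combined. It is not Defind's algorithm, which is written in Perl and R. The sketch assumes the following:

- Per-window summaries of depth, GC fraction and mean mapping quality are precomputed.
- GC bias is removed by dividing each window's depth by the median depth of windows with similar GC content.
- A deletion candidate is a run of adjacent windows with low corrected depth and adequate mapping quality.
- Paired-end support is counted as discordant read pairs overlapping the candidate.

`Window`, `call_deletions` and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Tuple


@dataclass
class Window:
    start: int    # window start (bp, 0-based)
    end: int      # window end (bp, exclusive)
    depth: float  # mean read depth in the window
    gc: float     # GC fraction of the reference in the window (0..1)
    mapq: float   # mean mapping quality of reads in the window


def gc_corrected_ratio(windows: List[Window], n_bins: int = 20) -> List[float]:
    """Depth divided by the median depth of all windows in the same GC bin."""
    keys = [min(int(w.gc * n_bins), n_bins - 1) for w in windows]
    by_bin: Dict[int, List[float]] = {}
    for k, w in zip(keys, windows):
        by_bin.setdefault(k, []).append(w.depth)
    med = {k: median(v) for k, v in by_bin.items()}
    return [w.depth / med[k] if med[k] > 0 else 1.0 for k, w in zip(keys, windows)]


def call_deletions(windows: List[Window], discordant_pairs: List[Tuple[int, int]],
                   max_ratio: float = 0.6, min_mapq: float = 20.0,
                   min_support: int = 2) -> List[Tuple[int, int]]:
    """Merge adjacent low-depth, well-mapped windows into candidates, then keep
    candidates overlapped by at least `min_support` discordant read pairs.
    Windows must be sorted and contiguous."""
    ratios = gc_corrected_ratio(windows)
    candidates: List[Tuple[int, int]] = []
    run: Optional[Tuple[int, int]] = None
    for w, r in zip(windows, ratios):
        if r <= max_ratio and w.mapq >= min_mapq:
            run = (run[0], w.end) if run else (w.start, w.end)
        elif run:
            candidates.append(run)
            run = None
    if run:
        candidates.append(run)
    # Each discordant pair is (left read position, right read position).
    return [(s, e) for s, e in candidates
            if sum(1 for a, b in discordant_pairs if a < e and b > s) >= min_support]


# Toy data: 30 windows of 1 kb; windows 10-14 have ~10% of the normal depth.
windows = [Window(i * 1000, (i + 1) * 1000, 3.0 if 10 <= i <= 14 else 30.0, 0.4, 60.0)
           for i in range(30)]
print(call_deletions(windows, [(9500, 15200), (9800, 15100)]))  # [(10000, 15000)]
```

Requiring agreement between depth and paired-end evidence is a simple way to trade sensitivity for fewer false positives. A real caller would also refine breakpoints below window resolution.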
Results:
In simulation studies, Defind retrieved more deletions than other methods at low to medium sequencing coverage (e.g., 5 to 10×) with no false positives. On real data, 94% of the deletions detected by at least two other methods were also detected by Defind. In addition, 90% of the deletions detected by Defind in the real data were supported by comparative genomic hybridization results, demonstrating the effectiveness of Defind.
Conclusion:
Defind performed robustly across different sequencing coverages and read lengths in the simulation study. Our studies also provide practical guidance for selecting appropriate methods to detect genomic deletions using NGS data.
Affiliation(s)
- Xin Wang
  - College of Life Science, Nanchang University, Nanchang 330031, China
- Huan Zhang
  - College of Life Science, Nanchang University, Nanchang 330031, China
- Xiaojing Liu
  - College of Life Science, Nanchang University, Nanchang 330031, China

106
Pingel J, Andersen JD, Christiansen SL, Børsting C, Morling N, Lorentzen J, Kirk H, Doessing S, Wong C, Nielsen JB. Sequence variants in muscle tissue-related genes may determine the severity of muscle contractures in cerebral palsy. Am J Med Genet B Neuropsychiatr Genet 2019; 180:12-24. [PMID: 30467950 DOI: 10.1002/ajmg.b.32693] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/09/2018] [Revised: 07/20/2018] [Accepted: 09/20/2018] [Indexed: 12/30/2022]
Abstract
Muscle contractures are a common complication of cerebral palsy (CP). The purpose of this study was to evaluate whether individuals with CP carry specific variants of important structural genes that might explain the severity of muscle contractures. Next-generation sequencing (NGS) data for 96 candidate genes associated with muscle structure and metabolism were analyzed in 43 individuals with CP (Gross Motor Function Classification System [GMFCS] I, n = 10; GMFCS II, n = 14; GMFCS III, n = 19) and four control participants. In silico analysis of the identified variants was performed. The variants were classified into four categories ranging from likely benign (VUS0) to highly likely functional effect (VUS3). All individuals with CP were classified and grouped according to their GMFCS level, and statistical comparisons were made between GMFCS groups. Kruskal-Wallis tests showed significantly more VUS2 variants in the genes COL4 (GMFCS I-III: 1, 1, 5, respectively [p < .04]), COL5 (GMFCS I-III: 1, 1, 5 [p < .04]), COL6 (GMFCS I-III: 0, 4, 7 [p < .003]), and COL9 (GMFCS I-III: 1, 1, 5 [p < .04]) in individuals with CP at GMFCS Level III than at the other GMFCS levels. Furthermore, significantly more VUS3 variants in COL6 (GMFCS I-III: 0, 5, 2 [p < .01]) and COL7 (GMFCS I-III: 0, 3, 0 [p < .04]) were identified at GMFCS Level II than at the other GMFCS levels. The present results highlight several candidate gene variants in different collagen types with likely functional effects in individuals with CP.
Affiliation(s)
- Jessica Pingel
  - Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jeppe Dyrberg Andersen
  - Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Sofie Lindgren Christiansen
  - Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Claus Børsting
  - Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Niels Morling
  - Department of Forensic Medicine, Section of Forensic Genetics, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jakob Lorentzen
  - Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Helene Elsass Center, Charlottenlund, Denmark
- Henrik Kirk
  - Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Helene Elsass Center, Charlottenlund, Denmark
- Simon Doessing
  - Department of Orthopedic Surgery, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
- Christian Wong
  - Department of Orthopedic Surgery, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
- Jens Bo Nielsen
  - Department of Neuroscience and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Helene Elsass Center, Charlottenlund, Denmark

107
Hepatoid adenocarcinoma of the stomach: a unique subgroup with distinct clinicopathological and molecular features. Gastric Cancer 2019; 22:1183-1192. [PMID: 30989433 PMCID: PMC6811386 DOI: 10.1007/s10120-019-00965-5] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 01/03/2019] [Accepted: 04/06/2019] [Indexed: 02/07/2023]
Abstract
OBJECTIVES Hepatoid adenocarcinoma of the stomach (HAS) is characterized by histological resemblance to hepatocellular carcinoma and a poor prognosis. The aim of this study is to elucidate the clinicopathological and molecular characteristics of HAS. METHODS Forty-two patients with HAS who underwent gastrectomy were enrolled in this study. Targeted sequencing with a panel of 483 cancer-related genes was performed on 24 HAS samples and 22 clinical parameter-matched common gastric cancer (CGC) samples. Prognostic factors for overall survival (OS) and disease-free survival (DFS) were analysed with the Kaplan-Meier method. RESULTS The most frequently mutated gene in both HAS and CGC was TP53, with a mutation rate of 30%. Additionally, CEBPA, RPTOR, WISP3, MARK1, and CD3EAP were identified as genes with high-frequency mutations in HAS (10-20%). Copy number gains (CNGs) at 20q11.21-13.12 occurred frequently in HAS; nearly 50% of HAS tumours harboured at least one gene with a CNG at 20q11.21-13.12. This CNG tended to be related to more adverse biobehaviour, including poorer differentiation, greater vascular and nerve invasion, and more liver metastasis. Pathway enrichment analysis revealed that the HIF-1 signalling pathway and signalling pathways regulating stem cell pluripotency were specifically enriched in HAS. Survival analysis showed that a preoperative serum AFP level ≥ 500 ng/ml was significantly associated with poorer OS (p = 0.007) and tended to be associated with poorer DFS (p = 0.05). CONCLUSION CNGs at 20q11.21-13.12 occurred frequently in HAS and tended to be related to more adverse biobehaviour. The preoperative serum AFP level was a sensitive prognostic biomarker for DFS and OS.
108
Owens GL, Baute GJ, Hubner S, Rieseberg LH. Genomic sequence and copy number evolution during hybrid crop development in sunflowers. Evol Appl 2019; 12:54-65. [PMID: 30622635 PMCID: PMC6304689 DOI: 10.1111/eva.12603] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/11/2017] [Accepted: 01/18/2018] [Indexed: 01/21/2023]
Abstract
Hybrid crops, an important part of modern agriculture, rely on the development of male and female heterotic gene pools. In sunflowers, heterotic gene pools were developed by using crop-wild relatives to produce cytoplasmic male sterile female lines and branching, fertility-restoring male lines. Here, we use genomic data from a diversity panel of male, female, and open-pollinated lines to explore the genetic changes brought about during modern improvement. We find that the male lines have diverged most from their open-pollinated progenitors and that genetic differentiation is concentrated on chromosomes 8, 10, and 13, due to introgressions from wild relatives. Ancestral variation from open-pollinated varieties almost universally evolved in parallel in both male and female lines, suggesting little or no selection for heterotic overdominance. Furthermore, we show that gene content differs between the male and female lines and that differentiation in gene content is concentrated in high-FST regions. This means that the introgressions that brought branching and fertility restoration to the male lines also brought gene content that differs from the ancestral haplotypes, including the removal of some genes. Although we find no evidence that genome-wide gene complementation is responsible for heterosis between male and female lines, several of the genes that are largely absent in either the male or female lines are associated with pathogen defense, suggesting that complementation may be functionally relevant for crop breeders.
Affiliation(s)
- Gregory L. Owens
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Gregory J. Baute
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Sariel Hubner
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
  - Department of Biotechnology, Tel‐Hai Academic College, Upper Galilee, Israel
  - MIGAL ‐ Galilee Research Institute, Kiryat Shmona, Israel
- Loren H. Rieseberg
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada

109
Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. Mutat Res Rev Mutat Res 2019; 779:114-125. [DOI: 10.1016/j.mrrev.2019.02.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/13/2018] [Revised: 12/25/2018] [Accepted: 02/22/2019] [Indexed: 01/23/2023]
110
Owens GL, Baute GJ, Hubner S, Rieseberg LH. Genomic sequence and copy number evolution during hybrid crop development in sunflowers. Evol Appl 2019. [PMID: 30622635 DOI: 10.1111/eva.12603] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 03/24/2023]
111
Desvillechabrol D, Bouchier C, Kennedy S, Cokelaer T. Sequana coverage: detection and characterization of genomic variations using running median and mixture models. Gigascience 2018; 7:5091804. [PMID: 30192951 PMCID: PMC6275460 DOI: 10.1093/gigascience/giy110] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/21/2017] [Accepted: 08/23/2018] [Indexed: 11/30/2022]
Abstract
Background In addition to mapping quality information, genome coverage contains valuable biological information, such as the presence of repetitive regions, deleted genes, or copy number variations (CNVs). It is essential to take into account atypical regions, trends (e.g., due to the origin of replication), and known and unknown biases that influence coverage. It is also important that reported events have robust statistics (e.g., z-scores) associated with their detection, as well as precise locations. Results We provide a stand-alone application, sequana_coverage, that reports genomic regions of interest (ROIs) that are significantly over- or underrepresented in high-throughput sequencing data. Significance is associated with each event, together with characteristics such as the length of the region. The algorithm first detrends the data using an efficient running median algorithm. It then estimates the distribution of the normalized genome coverage with a Gaussian mixture model. Finally, a z-score statistic is assigned to each base position and used to separate the central distribution from the ROIs (i.e., under- and overcovered regions). A double-threshold mechanism is used to cluster the genomic ROIs. HTML reports provide a summary with interactive visual representations of the genomic ROIs, including standard plots and metrics. Genomic variations such as single-nucleotide variants or CNVs can be effectively identified at the same time.
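The three stages described in this abstract can be sketched in Python: running-median detrending, per-base z-scores, and double-threshold clustering. This is an illustration, not the sequana_coverage implementation. The Gaussian mixture model is swapped for a simpler robust estimate of the central distribution (median and median absolute deviation). The window length and the `high`/`low` thresholds are hypothetical defaults. A cluster grows from positions whose |z| exceeds the low threshold. It is reported only if at least one of its positions also exceeds the high threshold.

```python
from statistics import median
from typing import List, Tuple


def running_median(x: List[float], window: int) -> List[float]:
    """Centred running median; the window is truncated at the sequence ends."""
    half = window // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def robust_z(x: List[float]) -> List[float]:
    """z-scores from the median and MAD, standing in for the mixture-model fit."""
    m = median(x)
    sigma = 1.4826 * median(abs(v - m) for v in x)  # MAD scaled to a Gaussian sigma
    if sigma == 0:
        return [0.0] * len(x)
    return [(v - m) / sigma for v in x]


def find_rois(coverage: List[float], window: int = 101, high: float = 4.0,
              low: float = 2.0) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) ROIs with 0-based, end-exclusive positions."""
    trend = running_median(coverage, window)
    normed = [c / t if t > 0 else 0.0 for c, t in zip(coverage, trend)]
    z = robust_z(normed)
    rois = []
    i, n = 0, len(z)
    while i < n:
        if abs(z[i]) < low:
            i += 1
            continue
        # Grow a cluster of same-sign positions beyond the low threshold ...
        sign = 1 if z[i] > 0 else -1
        j, peak = i, 0.0
        while j < n and sign * z[j] >= low:
            peak = max(peak, sign * z[j])
            j += 1
        # ... and keep it only if some position also passes the high threshold.
        if peak >= high:
            rois.append((i, j, "over" if sign > 0 else "under"))
        i = j
    return rois


# Toy coverage around 100x with a deterministic ripple and a 20-bp deletion.
coverage = [100.0 + (i * 37) % 11 - 5 for i in range(500)]
for i in range(200, 220):
    coverage[i] = 0.0
print(find_rois(coverage))  # [(200, 220, 'under')]
```

With a running median, events longer than roughly half the window are absorbed into the trend. The window length therefore bounds the size of the events that can be detected.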
Affiliation(s)
- Christiane Bouchier
  - Institut Pasteur - Pole Biomics - 25-28 Rue du Docteur Roux, 75015 Paris, France
- Sean Kennedy
  - Institut Pasteur - Pole Biomics - 25-28 Rue du Docteur Roux, 75015 Paris, France
- Thomas Cokelaer
  - Institut Pasteur - Pole Biomics - 25-28 Rue du Docteur Roux, 75015 Paris, France
  - Institut Pasteur - Bioinformatics and Biostatistics Hub - C3BI, USR 3756 IP CNRS - Paris, France

112
McKenna B, Koomar T, Vervier K, Kremsreiter J, Michaelson JJ. Whole-genome sequencing in a family with twin boys with autism and intellectual disability suggests multimodal polygenic risk. Cold Spring Harb Mol Case Stud 2018; 4:a003285. [PMID: 30559312 PMCID: PMC6318775 DOI: 10.1101/mcs.a003285] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/18/2018] [Accepted: 10/10/2018] [Indexed: 01/02/2023]
Abstract
Over the past decade, a focus on de novo mutations has rapidly accelerated gene discovery in autism spectrum disorder (ASD), intellectual disability (ID), and other neurodevelopmental disorders (NDDs). However, recent studies suggest that only a minority of cases are attributable to de novo mutations and that these disorders instead often result from an accumulation of various forms of genetic risk. Consequently, we adopted an inclusive approach to investigate the genetic risk contributing to a case of male monozygotic twins with ASD and ID. At the time of the study, the probands were 7 yr old and largely nonverbal. Medical records indicated a history of motor delays, sleep difficulties, and significant cognitive deficits. Through whole-genome sequencing of the probands and their parents, we uncovered elevated common polygenic risk, a coding de novo point mutation in CENPE, an ultra-rare homozygous regulatory variant in ANK3, inherited rare variants in NRXN3, and a maternally inherited X-linked deletion situated in a noncoding regulatory region between ZNF81 and ZNF182. Although each of these genes has been directly or indirectly associated with NDDs, evidence suggests that no single variant adequately explains the probands' phenotype. Instead, we propose that the probands' condition is due to the confluence of multiple rare variants in the context of a high-risk genetic background. This case emphasizes the multifactorial nature of genetic risk underlying most instances of NDDs and aligns with the "female protective model" of ASD.
Affiliation(s)
- Brooke McKenna
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242, USA
- Department of Psychology, Emory University, Atlanta, Georgia 30322, USA
- Tanner Koomar
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242, USA
- Kevin Vervier
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242, USA
- Host-Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom
- Jamie Kremsreiter
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242, USA
- Jacob J Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, Iowa 52242, USA
113
Banuelos M, Sindi S, Marcia R. Structural Variant Prediction in Extended Pedigrees Through Sparse Negative Binomial Genome Signal Recovery. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2018:1311-1314. [PMID: 30440632 DOI: 10.1109/embc.2018.8512519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Structural variants (SVs) are rearrangements, such as deletions, insertions, duplications, inversions, and translocations, in an individual's genome relative to a reference. SV detection is often marred by high false-positive rates due to errors in sequencing and mapping. In previous work, we proposed a maximum likelihood approach to SV prediction that incorporated low-coverage sequencing data and coverage distribution. In particular, we developed a negative binomial framework to provide a more realistic representation of DNA fragment distributions sampled from an individual's genome. In this paper, we leverage relationships between an offspring and both parents, in addition to the negative binomial framework, to improve SV identification accuracy. We present numerical results on both simulated genomes and two sequenced parent-child trios from the 1000 Genomes Project.
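To make the negative binomial idea concrete, here is a minimal Python sketch (not the authors' method) that scores copy-number hypotheses for a window of per-bin read counts by maximum likelihood. It assumes the expected count scales as cn/2 times a diploid baseline, uses a fixed size (dispersion) parameter `r`, and floors the mean so a homozygous deletion still has a valid distribution; `nb_logpmf` and `most_likely_copy_number` are hypothetical helper names, and the paper's sparse signal recovery and parent-offspring constraints are not modeled.

```python
import math


def nb_logpmf(k: int, mean: float, r: float) -> float:
    """Log-probability of k reads under a negative binomial with the given
    mean and size parameter r (variance = mean + mean**2 / r)."""
    p = r / (r + mean)  # success probability in the (r, p) parameterization
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))


def most_likely_copy_number(counts, baseline, r=10.0, max_cn=4, floor=0.05):
    """Return the copy number in 0..max_cn that maximizes the summed NB
    log-likelihood of the per-bin counts.

    Assumption: expected count = (cn / 2) * baseline, with `floor` keeping
    the mean positive when cn = 0.
    """
    best_cn, best_ll = None, -math.inf
    for cn in range(max_cn + 1):
        mean = max(cn / 2.0, floor) * baseline
        ll = sum(nb_logpmf(k, mean, r) for k in counts)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn


# Bins at roughly half the diploid baseline point to a one-copy deletion.
print(most_likely_copy_number([14, 16, 15, 13, 17], baseline=30.0))
```

Under this parameterization the variance grows as mean + mean²/r, so a smaller `r` tolerates more overdispersed coverage than a Poisson model would.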
114
Computational Analysis of Structural Variation in Cancer Genomes. Methods Mol Biol 2018. [PMID: 30378069 DOI: 10.1007/978-1-4939-8868-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Cancer onset and progression are often triggered by the accumulation of structural abnormalities in the genome. Somatically acquired large structural variants (SVs) are one class of abnormalities that can lead to cancer onset by, for example, deactivating tumor suppressor genes and upregulating oncogenes. Detecting and classifying these variants can lead to improved therapies and diagnostics for cancer patients. This chapter provides an overview of the problem of computational genomic SV detection using next-generation sequencing (NGS) platforms, along with a brief survey of typical approaches for addressing this problem. It also discusses the general protocol that should be followed to analyze a cancer genome for SV detection in NGS data.
115
Hartasánchez DA, Brasó-Vives M, Heredia-Genestar JM, Pybus M, Navarro A. Effect of Collapsed Duplications on Diversity Estimates: What to Expect. Genome Biol Evol 2018; 10:2899-2905. [PMID: 30364947 PMCID: PMC6239678 DOI: 10.1093/gbe/evy223] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 10/08/2018] [Indexed: 12/19/2022]
Abstract
The study of segmental duplications (SDs) and copy-number variants (CNVs) is of great importance in the fields of genomics and evolution. However, SDs and CNVs are usually excluded from genome-wide scans for natural selection. Because of high identity between copies, SDs and CNVs that are not included in reference genomes are prone to being collapsed (that is, mistakenly aligned to the same region) when sequence data from single individuals are aligned to the reference. Such collapsed duplications are additionally challenging because concerted evolution between duplications alters their site frequency spectrum and linkage disequilibrium patterns. To investigate the potential effect of collapsed duplications on natural selection scans, we obtained expectations for four summary statistics from simulations of duplications evolving under a range of interlocus gene conversion and crossover rates. We confirm that summary statistics traditionally used to detect the action of natural selection on DNA sequences cannot be applied to SDs and CNVs, since in some cases values for known duplications mimic selective signatures. As a proof of concept of the pervasiveness of collapsed duplications, we analyzed data from the 1000 Genomes Project. We find that, within regions identified as variable in copy number, diversity between individuals with the duplication is consistently higher than between individuals without the duplication. Furthermore, the frequency of single nucleotide variants (SNVs) deviating from Hardy-Weinberg Equilibrium is higher in individuals with the duplication, which strongly suggests that higher diversity is a consequence of collapsed duplications and incorrect evaluation of SNVs within these CNV regions.
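Collapsed paralogs can show up as an excess of heterozygous calls at SNVs. As a minimal sketch of how such deviations from Hardy-Weinberg equilibrium can be flagged per site, the hypothetical Python helper below applies the standard chi-square goodness-of-fit test with one degree of freedom; it is not taken from the paper, and exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a
    biallelic SNV. Returns (statistic, p_value) with 1 degree of freedom."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# 80 heterozygotes out of 100 where HWE expects 50: a strong heterozygote excess.
print(hwe_chi_square(10, 80, 10))
```

Here the expected genotype counts are 25, 50 and 25, giving a statistic of 36, far beyond any usual significance threshold.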
Affiliation(s)
- Diego A Hartasánchez
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
- Marina Brasó-Vives
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Jose Maria Heredia-Genestar
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Marc Pybus
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Arcadi Navarro
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain; Centre for Genomic Regulation (CRG), Barcelona, Catalonia, Spain
116
Zare F, Hosny A, Nabavi S. Noise cancellation using total variation for copy number variation detection. BMC Bioinformatics 2018; 19:361. [PMID: 30343665 PMCID: PMC6196408 DOI: 10.1186/s12859-018-2332-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
BACKGROUND Due to recent advances in sequencing technologies, sequence-based analysis has been widely applied to detecting copy number variations (CNVs). There are several techniques for identifying CNVs using next-generation sequencing (NGS) data; however, methods employing depth of coverage or read depth (RD) have recently become one of the main techniques to identify CNVs. The main assumption of RD-based CNV detection methods is that the readcount value at a specific genomic location is correlated with the copy number at that location. However, noise and biases in readcount data distort the association between readcounts and copy numbers. For more accurate CNV identification, these biases and noise need to be mitigated. In this work, to detect CNVs more precisely and efficiently, we propose a novel denoising method based on the total variation approach and the Taut String algorithm. RESULTS To investigate the performance of the proposed denoising method, we computed sensitivities, false discovery rates and specificities of CNV detection when employing denoising, using both simulated and real data. We also compared the performance of the proposed denoising method, Taut String, with that of commonly used approaches such as moving average (MA) and discrete wavelet transforms (DWT) in terms of sensitivity of detecting true CNVs and time complexity. The results show that Taut String works better than DWT and MA and has greater power to identify very narrow CNVs. The ability of Taut String denoising to preserve CNV segments' breakpoints and narrow CNVs increases the detection accuracy of segmentation algorithms, resulting in higher sensitivities and lower false discovery rates. CONCLUSIONS In this study, we proposed a new denoising method for sequence-based CNV detection based on a signal processing technique. Existing CNV detection algorithms identify many false CNV segments and fail to detect short CNV segments due to noise and biases. 
Employing an effective and efficient denoising method can significantly enhance the detection accuracy of the CNV segmentation algorithms. Advanced denoising methods from the signal processing field can be employed to implement such algorithms. We showed that non-linear denoising methods that consider sparsity and piecewise constant characteristics of CNV data result in better performance in CNV detection.
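Taut String computes the minimizer of the one-dimensional total-variation objective 0.5 * sum((x_i - y_i)^2) + lam * sum(|x_{i+1} - x_i|), which favors piecewise-constant read-depth profiles. The Python sketch below is not the Taut String algorithm: as a simpler stand-in, it solves the same objective by projected gradient steps on the dual problem, trading speed for brevity. The function name, the step size, and the fixed iteration count are illustrative assumptions.

```python
def tv_denoise_1d(y, lam, iters=2000):
    """Approximately minimize 0.5 * sum((x - y)**2) + lam * sum(|x[i+1] - x[i]|)
    using projected gradient steps on the dual variable u (one entry per jump)."""
    n = len(y)
    if n < 2:
        return list(y)
    u = [0.0] * (n - 1)

    def primal(u):
        # x = y - D^T u, where D is the first-difference operator (D x)[i] = x[i+1] - x[i]
        return [y[j] - ((u[j - 1] if j > 0 else 0.0) - (u[j] if j < n - 1 else 0.0))
                for j in range(n)]

    tau = 0.25  # step size 1/L with L = ||D D^T|| <= 4
    for _ in range(iters):
        x = primal(u)
        # Gradient step along D x, then project each dual entry onto [-lam, lam]
        u = [min(lam, max(-lam, u[i] + tau * (x[i + 1] - x[i]))) for i in range(n - 1)]
    return primal(u)


# A clean step from 0 to 4: the jump survives, only its height shrinks.
print([round(v, 3) for v in tv_denoise_1d([0.0] * 10 + [4.0] * 10, lam=1.0)])
```

Because the penalty acts on jumps rather than on values, a wide step keeps its breakpoint, and each side only moves toward the other by lam divided by its segment length (0.1 here), which is the breakpoint-preserving behavior the abstract attributes to total-variation denoising.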
Affiliation(s)
- Fatima Zare
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Abdelrahman Hosny
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Sheida Nabavi
- Computer Science and Engineering Department and Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
117
Sato K, Kawazu M, Yamamoto Y, Ueno T, Kojima S, Nagae G, Abe H, Soda M, Oga T, Kohsaka S, Sai E, Yamashita Y, Iinuma H, Fukayama M, Aburatani H, Watanabe T, Mano H. Fusion Kinases Identified by Genomic Analyses of Sporadic Microsatellite Instability-High Colorectal Cancers. Clin Cancer Res 2018; 25:378-389. [PMID: 30279230 DOI: 10.1158/1078-0432.ccr-18-1574] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 05/20/2018] [Revised: 07/31/2018] [Accepted: 09/27/2018] [Indexed: 11/16/2022]
Abstract
PURPOSE Colorectal cancers with microsatellite instability-high (MSI-H) status, due to mismatch repair deficiency, are associated with poor patient outcomes after relapse. We aimed to identify novel therapeutic targets for these cancers. EXPERIMENTAL DESIGN We performed MSI analyses of over 2,800 surgically resected colorectal tumors obtained from consecutive patients treated in Japan from 1998 through June 2016. Whole-exome sequencing, transcriptome sequencing, and methylation analyses were performed on 149 of 162 tumors showing MSI at the BAT25 and BAT26 loci. We analyzed patient survival times using Bonferroni-adjusted log-rank tests. RESULTS Sporadic MSI-H colorectal cancers with promoter methylation of MLH1 (called MM) had a clinicopathological profile distinct from that of colorectal cancers in patients with germline mutations (Lynch syndrome, LS-associated) or somatic, Lynch-like mutations in mismatch repair genes. MM tumors had more insertions and deletions and more recurrent mutations in BRAF and RNF43 than LS-associated or Lynch-like MSI-H tumors. Eleven fusion kinases were detected exclusively in MM MSI-H colorectal cancers lacking oncogenic KRAS/BRAF missense mutations and were associated with worse post-relapse prognosis. We developed a simple method to identify MM tumors and applied it to a validation cohort of 28 MSI-H colorectal cancers, identifying 16 MM tumors and 2 fusion kinases. CONCLUSIONS We discovered that fusion kinases are frequently observed among sporadic MM MSI-H colorectal cancers. The new method to identify MM tumors makes it straightforward to group MSI-H patients into candidates for LS or fusion kinase carriers.
Affiliation(s)
- Kazuhito Sato
- Department of Surgical Oncology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Masahito Kawazu
- Department of Medical Genomics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yoko Yamamoto
- Department of Surgical Oncology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Toshihide Ueno
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shinya Kojima
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Genta Nagae
- Genome Science Division, Research Center for Advanced Science and Technologies, The University of Tokyo, Tokyo, Japan
- Hiroyuki Abe
- Department of Pathology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Manabu Soda
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Takafumi Oga
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shinji Kohsaka
- Department of Medical Genomics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Eirin Sai
- Department of Medical Genomics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yoshihiro Yamashita
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hisae Iinuma
- Department of Surgery, Teikyo University School of Medicine, Tokyo, Japan
- Masashi Fukayama
- Department of Pathology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroyuki Aburatani
- Genome Science Division, Research Center for Advanced Science and Technologies, The University of Tokyo, Tokyo, Japan
- Toshiaki Watanabe
- Department of Surgical Oncology and Vascular Surgery, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroyuki Mano
- Department of Cellular Signaling, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; National Cancer Center Research Institute, Tokyo, Japan
118
Li J, Lu N, Tao Y, Duan M, Qiao Y, Xu Y, Ge Q, Bi C, Fu J, Tu J, Lu Z. Accurate and sensitive single-cell-level detection of copy number variations by micro-channel multiple displacement amplification (μcMDA). NANOSCALE 2018; 10:17933-17941. [PMID: 30226245 DOI: 10.1039/c8nr04917c] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Whole genome amplification (WGA) has laid the foundation for investigating complex genomic alterations with single-cell or even single-molecule resolution. Coupled with sequencing-based copy number variation (CNV) analysis, it promotes understanding of the nature of commonly existing genetic heterogeneity by constructing sequencing profiles for every single cell. However, prevailing methods provide insights into only limited aspects because of their intrinsic technical challenges, and as a result their output data fail to provide comprehensive information. Here, we describe CNV detection analysis based on micro-channel multiple displacement amplification (μcMDA), a protocol that provides optimized amplification uniformity while retaining the advantages of MDA chemistry. We demonstrate the analysis of both the normal diploid YH-1 cell line and the aneuploid K562 cancer cell line. In the detection of simulated CNVs ranging from 300 kb to 2 Mb, μcMDA can increase the detection rates of copy number loss and gain by 28.8% and 40.2% on average, respectively, using only 0.2× sequencing data. When detecting the inherent CNVs in tumor cells, the resolution of CNV recognition can be improved to 250 kb. Starting from either superabundant template copies or minute single-cell-level input, this easily accessible approach is capable of providing quantitatively reliable coverage as well as more robust GC-content regression for CNV detection.
Affiliation(s)
- Junji Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
119
Burbulis IE, Wierman MB, Wolpert M, Haakenson M, Lopes MB, Schiff D, Hicks J, Loe J, Ratan A, McConnell MJ. Improved molecular karyotyping in glioblastoma. Mutat Res 2018; 811:16-26. [PMID: 30055482 DOI: 10.1016/j.mrfmmm.2018.06.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/05/2018] [Revised: 06/22/2018] [Accepted: 06/24/2018] [Indexed: 06/08/2023]
Abstract
Uneven replication creates artifacts during whole genome amplification (WGA) that confound molecular karyotype assignment in single cells. Here, we present an improved WGA recipe that increased coverage and detection of copy number variants (CNVs) in single cells. We examined serial resections of glioblastoma (GBM) tumor from the same patient and found low-abundance clones containing CNVs in clinically relevant loci that were not observable using bulk DNA sequencing. We discovered extensive genomic variability in this class of tumor and provide a practical approach for investigating somatic mosaicism.
Affiliation(s)
- Ian E Burbulis
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States; Escuela de Medicina, Universidad San Sebastian, Puerto Montt, Chile
- Margaret B Wierman
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States
- Matt Wolpert
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States
- Mark Haakenson
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States
- Maria-Beatriz Lopes
- Department of Pathology, University of Virginia, School of Medicine, Charlottesville, VA, United States
- David Schiff
- Department of Neurology, University of Virginia, School of Medicine, Charlottesville, VA, United States
- James Hicks
- Michelson Center, University of Southern California, Los Angeles, CA, United States; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
- Justin Loe
- Full Genomes Corp, Inc., Rockville, MD, United States
- Aakrosh Ratan
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States; Center for Public Health Genomics, University of Virginia, School of Medicine, Charlottesville, VA, United States
- Michael J McConnell
- Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, VA, United States; Department of Neuroscience, University of Virginia, School of Medicine, Charlottesville, VA, United States; Center for Public Health Genomics, University of Virginia, School of Medicine, Charlottesville, VA, United States; Center for Brain Immunology and Glia, University of Virginia, School of Medicine, Charlottesville, VA, United States
120
Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res 2018; 46:7236-7249. [PMID: 30137632 PMCID: PMC6101599 DOI: 10.1093/nar/gky538] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 02/20/2018] [Revised: 05/04/2018] [Accepted: 06/12/2018] [Indexed: 12/18/2022]
Abstract
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplications and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellites and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
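The core idea of controlling for technical variation with multiple samples is to judge each genomic bin against the same bin in other genomes, so that bins with noisy coverage (as in low-mappability regions) are held to a looser standard. The Python sketch below is a simplified, hypothetical version of that idea: it assumes coverage has already been normalized across samples and uses a plain per-bin mean and sample standard deviation, whereas PopSV's own normalization and estimators differ in detail.

```python
import statistics


def population_z_scores(test_bins, reference_bins):
    """Z-score each bin of a test sample against the same bin across a panel
    of reference samples (each reference is a list of normalized bin values)."""
    z = []
    for i, x in enumerate(test_bins):
        ref = [sample[i] for sample in reference_bins]
        mu = statistics.mean(ref)
        sd = statistics.stdev(ref)  # per-bin spread absorbs bin-specific noise
        z.append((x - mu) / sd if sd > 0 else 0.0)
    return z


# Bin 0 is stable across references; bin 1 varies widely (e.g. low mappability).
references = [[100.0, 50.0], [102.0, 70.0], [98.0, 30.0]]
print(population_z_scores([50.0, 20.0], references))
```

Both bins lose coverage in the test sample, but only the first stands out (z = -25); the drop in the second (z = -1.5) is within the spread the reference panel already shows.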
Affiliation(s)
- Jean Monlong
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- Patrick Cossette
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
- Caroline Meloche
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
- Guy Rouleau
- Montreal Neurological Institute, McGill University, Montréal H3A 2B4, Canada
- Simon L Girard
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi G7H 2B1, Canada
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- McGill University and Génome Québec Innovation Center, Montréal H3A 1A4, Canada
121
Smadbeck JB, Johnson SH, Smoley SA, Gaitatzes A, Drucker TM, Zenka RM, Kosari F, Murphy SJ, Hoppman N, Aypar U, Sukov WR, Jenkins RB, Kearney HM, Feldman AL, Vasmatzis G. Copy number variant analysis using genome-wide mate-pair sequencing. Genes Chromosomes Cancer 2018; 57:459-470. [PMID: 29726617 DOI: 10.1002/gcc.5] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 06/02/2017] [Revised: 04/23/2018] [Accepted: 04/29/2018] [Indexed: 02/06/2023]
Abstract
Copy number variation (CNV) is a common form of structural variation detected in human genomes, occurring as both constitutional and somatic events. Cytogenetic techniques like chromosomal microarray (CMA) are widely used in analyzing CNVs. However, CMA techniques cannot resolve the full nature of these structural variations (i.e., the orientation and location of associated breakpoint junctions) and must be combined with other cytogenetic techniques, such as karyotyping or FISH, to do so. This makes the development of a next-generation sequencing (NGS) approach capable of resolving both CNVs and breakpoint junctions desirable. Mate-pair sequencing (MPseq) is an NGS technology designed to find large structural rearrangements across the entire genome. Here we present an algorithm capable of performing copy number analysis from mate-pair sequencing data. The algorithm uses a step-wise procedure involving normalization, segmentation, and classification of the sequencing data. The segmentation technique combines both read depth and discordant mate-pair reads to increase the sensitivity and resolution of CNV calls. The method is particularly suited to MPseq, which is designed to detect breakpoint junctions at high resolution. This allows the classification step to accurately calculate copy number levels at the relatively low read depth of MPseq. Here we compare results for a series of hematological cancer samples that were tested with CMA and MPseq. We demonstrate sensitivity comparable to that of state-of-the-art CMA technology, with the benefit of improved breakpoint resolution. The algorithm provides a powerful analytical tool for the analysis of MPseq results in cancer.
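The segmentation and classification steps can be illustrated with a small, hypothetical Python sketch. It assumes depth has already been normalized so that a ratio of 1.0 means two copies, and that breakpoint positions (for example, inferred from discordant mate pairs) are supplied as bin indices; each segment then receives the nearest integer copy number. This is not the published algorithm: the paper combines read depth and discordant mate-pair reads during segmentation, whereas this sketch only splits at the given breakpoints.

```python
def segment_and_classify(ratios, breakpoints, ploidy=2):
    """Split per-bin depth ratios at the given breakpoint indices and assign
    each segment an integer copy number by rounding ploidy * mean ratio.

    Returns a list of (start, end, copy_number) with half-open [start, end).
    """
    inner = sorted(b for b in set(breakpoints) if 0 < b < len(ratios))
    edges = [0] + inner + [len(ratios)]
    segments = []
    for start, end in zip(edges, edges[1:]):
        mean_ratio = sum(ratios[start:end]) / (end - start)
        # round() picks the nearest integer (exact .5 ties go to the even value)
        segments.append((start, end, round(ploidy * mean_ratio)))
    return segments


ratios = [1.0, 0.95, 1.05, 0.5, 0.45, 0.55, 1.5, 1.45, 1.55]
print(segment_and_classify(ratios, breakpoints=[3, 6]))
```

Letting breakpoints fix the segment boundaries is what lets a low-depth assay still report precise CNV edges: depth only has to decide the level of each segment, not where it starts and ends.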
Affiliation(s)
- James B Smadbeck
- Center for Individualized Medicine - Biomarker Discovery, Mayo Clinic, Rochester, Minnesota
- Sarah H Johnson
- Center for Individualized Medicine - Biomarker Discovery, Mayo Clinic, Rochester, Minnesota
- Stephanie A Smoley
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Roman M Zenka
- Bioinformatics Systems, Mayo Clinic, Rochester, Minnesota
- Farhad Kosari
- Center for Individualized Medicine - Biomarker Discovery, Mayo Clinic, Rochester, Minnesota
- Stephen J Murphy
- Center for Individualized Medicine - Biomarker Discovery, Mayo Clinic, Rochester, Minnesota
- Nicole Hoppman
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Umut Aypar
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- William R Sukov
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Robert B Jenkins
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Hutton M Kearney
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Andrew L Feldman
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- George Vasmatzis
- Center for Individualized Medicine - Biomarker Discovery, Mayo Clinic, Rochester, Minnesota; Department of Molecular Medicine, Mayo Clinic, Rochester, Minnesota
122
Liu X, Li A, Xi J, Feng H, Wang M. Detection of copy number variants and loss of heterozygosity from impure tumor samples using whole exome sequencing data. Oncol Lett 2018; 16:4713-4720. [PMID: 30214605 DOI: 10.3892/ol.2018.9150] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/16/2016] [Accepted: 06/02/2017] [Indexed: 01/07/2023]
Abstract
Using whole-exome sequencing (WES) to detect chromosomal aberrations in tumor samples has become increasingly popular because it is cost-effective and time-efficient. However, factors present in WES tumor samples, including diversity in exon size, batch effects and tumor impurity, can complicate the identification of somatic mutations in each exonic region. To address these issues, the authors of the present study developed a novel method, PECNV, for the detection of genomic copy number variants and loss of heterozygosity in WES datasets. PECNV combines the normalized logarithm ratio of read counts (Log Ratio) and the B allele frequency (BAF), and then employs the expectation-maximization (EM) algorithm to estimate the parameters involved in the models. A comprehensive assessment of PECNV was performed by analyzing simulated datasets contaminated with different normal cell proportions and eight real primary triple-negative breast cancer samples. PECNV demonstrated superior results compared with ExomeCNV and EXCAVATOR for the detection of genomic aberrations in WES data.
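As a much-simplified illustration of the EM step, the hypothetical Python sketch below fits a one-dimensional Gaussian mixture to log2 read-count ratios only, with one component per copy-number state and a shared, fixed standard deviation. It ignores BAF and tumor impurity, both of which PECNV models, so it is not PECNV's model. In a pure diploid background, one, two, and three copies correspond to log2 ratios of about -1, 0, and 0.58.

```python
import math


def em_gaussian_mixture(xs, means, sd=0.2, iters=50):
    """Estimate mixture weights and component means of a 1-D Gaussian mixture
    with a shared, fixed standard deviation by expectation-maximization."""
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation
        # (the Gaussian normalizing constant cancels because sd is shared)
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / sd) ** 2)
                    for w, m in zip(weights, means)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights and means from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj > 0:
                weights[j] = nj / len(xs)
                means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
    return weights, means


log_ratios = [-1.0, -0.95, -1.05, 0.0, 0.05, -0.05, 0.02, 0.58, 0.6, 0.55]
print(em_gaussian_mixture(log_ratios, means=[-1.0, 0.0, 0.58]))
```

Initializing the means at the theoretical copy-number levels keeps each component tied to one state; in an impure tumor sample those levels are compressed toward 0, which is one reason a dedicated model such as PECNV estimates purity as well.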
Affiliation(s)
- Xiaocheng Liu
- Department of Electronic Science and Technology, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
- Ao Li
- Department of Electronic Science and Technology, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China; Center for Biomedical Engineering, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
- Jianing Xi
- Department of Electronic Science and Technology, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
- Huanqing Feng
- Department of Electronic Science and Technology, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
- Minghui Wang
- Department of Electronic Science and Technology, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China; Center for Biomedical Engineering, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
123
Sun L, Ge Y, Bancroft AC, Cheng X, Wen J. FNBtools: A Software to Identify Homozygous Lesions in Deletion Mutant Populations. FRONTIERS IN PLANT SCIENCE 2018; 9:976. [PMID: 30042776 PMCID: PMC6048286 DOI: 10.3389/fpls.2018.00976] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/01/2018] [Accepted: 06/15/2018] [Indexed: 05/27/2023]
Abstract
Deletion mutagenesis such as fast neutron bombardment (FNB) has been widely used for forward and reverse genetics studies in functional genomics. Traditionally, time-consuming map-based cloning is used to locate causal deletions in deletion mutants. In recent years, comparative genomic hybridization (CGH) has been used to speed up and scale up the lesion identification process in deletion mutants. However, the CGH approach has apparent limitations: low accuracy and low sensitivity for small deletions. With next-generation sequencing (NGS) becoming affordable for most users, NGS-based bioinformatics tools are more appealing. Although several deletion callers are available, these tools are not efficient at detecting small deletions, and population-scale deletion callers that can identify both small and large deletions are rare. We were therefore motivated to create a population-scale deletion detection tool, called FNBtools, to identify homozygous causal deletions in mutant populations using NGS data. FNBtools calls deletions at population scale and achieves high accuracy at different levels of coverage. In addition, FNBtools can detect both small and large deletions, and it can identify unique deletions in a mutant pool by filtering out deletions that also exist in a wild-type or control pool. Furthermore, FNBtools can visualize all identified deletions genome-wide using Circos. In simulated data analysis, FNBtools outperforms four existing popular deletion callers in detecting small deletions at different coverage levels. To test the usefulness of FNBtools in real biological applications, we used it to analyze a salt-tolerant mutant in Medicago truncatula and identified the unique deletion locus that is tightly linked with this trait. The causal deletion in the mutant was confirmed by PCR amplification, sequencing and genetic linkage analyses. 
FNBtools can be used for homozygous deletion identification in any species with reference genome sequences. FNBtools is publicly available at: https://github.com/noble-research-institute/fnbtools.
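The filtering step described above keeps only mutant-pool deletions that are absent from the wild-type or control pool. It can be illustrated as simple interval subtraction. The following is a minimal sketch and not FNBtools' own code. It assumes deletions are given as half-open (chromosome, start, end) intervals, and that any shared base with a control deletion disqualifies a mutant deletion. The helper names and the overlap rule are hypothetical.

```python
from typing import List, Tuple

# A deletion as (chromosome, start, end) in half-open coordinates [start, end).
Deletion = Tuple[str, int, int]


def overlaps(a: Deletion, b: Deletion) -> bool:
    """True if two deletions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unique_deletions(mutant: List[Deletion], control: List[Deletion]) -> List[Deletion]:
    """Keep mutant-pool deletions that overlap no deletion in the control pool.

    Hypothetical helper: the "any shared base" rule is an assumption, not
    necessarily the criterion FNBtools applies.
    """
    return [d for d in mutant if not any(overlaps(d, c) for c in control)]


if __name__ == "__main__":
    mutant_pool = [("chr1", 100, 250), ("chr2", 5_000, 5_040), ("chr4", 900, 20_000)]
    control_pool = [("chr1", 200, 300)]
    # The chr1 deletion is shared with the control pool and is dropped.
    print(unique_deletions(mutant_pool, control_pool))
```

The pairwise scan is quadratic. For genome-wide call sets, one would sort the intervals per chromosome or use an interval tree instead.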
Affiliation(s)
- Liang Sun
- Correspondence: Liang Sun, Jiangqi Wen
124
Cai YH, Yao GY, Chen LJ, Gan HY, Ye CS, Yang XX. The Combining Effects of Cell-Free Circulating Tumor DNA of Breast Tumor to the Noninvasive Prenatal Testing Results: A Simulating Investigation. DNA Cell Biol 2018; 37:626-633. [PMID: 29957029 DOI: 10.1089/dna.2017.4112] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Massively parallel sequencing of circulating fetal DNA in the plasma of pregnant women is a common method for noninvasive prenatal testing (NIPT) of fetal trisomy 13, 18, and 21. However, circulating DNA is not restricted to pregnant women: increased levels of plasma DNA are also frequently detected in cancer patients. Among pregnant women whose NIPT results were inconsistent with the fetal karyotype, a small number have subsequently been diagnosed with a previously undetected malignancy. However, the extent to which circulating tumor DNA (ctDNA) affects NIPT results is still unclear. We examined serum from 50 nonpregnant women with breast tumors by NIPT. These samples were then added to serum containing trisomy 13, 18, and 21 fetal DNA to determine the extent to which maternal tumors can interfere with NIPT results in pregnant women with breast tumors. Concentrations of cell-free DNA (cfDNA) were higher in both pregnant women and breast tumor patients than in nonpregnant healthy controls. Among the 50 samples evaluated, 3 produced false positive NIPT results for trisomy 13, 18, or 21, indicating the presence of genomic copy number variations (CNVs). Simulation testing also showed that ctDNA can increase the standard deviation of the associated z-scores. It can also lower absolute z-scores by decreasing the proportion of circulating fetal DNA relative to total DNA. Of the 50 samples tested, 9 fell within the equivocal range and 8 produced false negative results for trisomy 13, 18, or 21. These data show for the first time that ctDNA can affect NIPT results in two ways. First, ctDNA can lead to false positive results due to the detection of genomic CNVs in tumor DNA. Second, ctDNA can increase the likelihood of a false negative by decreasing the proportion of circulating fetal DNA in serum.
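The false-negative mechanism (dilution of the fetal fraction) follows from how a chromosome z-score is formed. In the sketch below, the tested chromosome's read share rises by a factor of (1 + f/2) when a fraction f of cfDNA comes from a trisomic fetus. Extra non-fetal cfDNA shrinks f and therefore shrinks z. This is an illustrative model and not the study's pipeline. It assumes that the added tumour-derived DNA is euploid for the tested chromosome, and that the reference mean and SD come from euploid samples. The function names and numbers are hypothetical.

```python
def expected_chrom_fraction(ref_fraction: float, fetal_fraction: float) -> float:
    """Expected share of reads on the tested chromosome when the fetus is trisomic.

    Assumes all non-fetal DNA (maternal and any tumour-derived DNA) is euploid for
    this chromosome, so only the fetal share f carries 3/2 of the normal dose:
    the read share rises by a factor (1 + f/2).
    """
    return ref_fraction * (1.0 + fetal_fraction / 2.0)


def diluted_fetal_fraction(fetal_fraction: float, added_dna_ratio: float) -> float:
    """Fetal fraction after adding non-fetal cfDNA.

    added_dna_ratio is the added cfDNA as a multiple of the original total cfDNA
    (1.0 doubles the total and halves the fetal fraction).
    """
    return fetal_fraction / (1.0 + added_dna_ratio)


def z_score(observed: float, ref_mean: float, ref_sd: float) -> float:
    """Standard z-score of a sample's chromosome fraction against euploid references."""
    return (observed - ref_mean) / ref_sd


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    ref_mean, ref_sd, ff = 0.013, 0.0002, 0.10
    for extra in (0.0, 1.0):
        f = diluted_fetal_fraction(ff, extra)
        z = z_score(expected_chrom_fraction(ref_mean, f), ref_mean, ref_sd)
        print(f"added cfDNA x{extra:.1f}: fetal fraction {f:.3f}, z = {z:.2f}")
```

In this model, halving the fetal fraction halves the z-score. Under a cutoff such as z ≥ 3 (an assumption here), a true trisomy can therefore drop into the equivocal or negative range. A larger ref_sd, which the abstract reports ctDNA can cause, lowers z further.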
Affiliation(s)
- Ya-Hong Cai
- 1 Department of Breast Surgery, Zhuhai Hospital of Traditional Chinese and Western Medicine (The Second People's Hospital of Zhuhai), Zhuhai City, Guangdong, People's Republic of China
- Guang-Yu Yao
- 2 Department of Breast Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Lu-Jia Chen
- 2 Department of Breast Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Hai-Yan Gan
- 3 Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, Guangdong, People's Republic of China
- Chang-Sheng Ye
- 2 Department of Breast Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
- Xue-Xi Yang
- 4 School of Laboratory Medical and Biotechnology, Southern Medical University, Guangzhou, Guangdong, People's Republic of China
125
126
Knaus BJ, Grünwald NJ. Inferring Variation in Copy Number Using High Throughput Sequencing Data in R. Front Genet 2018; 9:123. [PMID: 29706990 PMCID: PMC5909048 DOI: 10.3389/fgene.2018.00123] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/31/2018] [Accepted: 03/26/2018] [Indexed: 12/30/2022]
Abstract
Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized in three steps: grouping them into arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequencies at which alleles were sequenced. The method is applicable to organisms whose reference genomes consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated this approach with the model system Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects.
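The window, bin and peak summary described above can be sketched in a few lines. This sketch is not the vcfR implementation. It assumes that each heterozygous position contributes the frequency of one allele (for example, reference reads divided by total reads), and that frequencies are folded onto [0.5, 1] so that complementary peaks coincide. The window and bin sizes are arbitrary choices. Under these assumptions, a peak near 0.5 fits a diploid (or balanced tetraploid) region, near 0.67 a triploid one, and near 0.75 a 3:1 tetraploid one.

```python
import numpy as np


def window_peaks(allele_freqs, window_size: int = 50, n_bins: int = 25) -> list:
    """Most populated allele-frequency bin for consecutive windows of heterozygous sites.

    allele_freqs: frequency of one allele at each heterozygous position, in genomic
    order. Frequencies are folded onto [0.5, 1] so that complementary peaks
    (e.g. 1/3 and 2/3) fall into the same bin. A trailing partial window is dropped.
    """
    freqs = np.asarray(allele_freqs, dtype=float)
    folded = np.maximum(freqs, 1.0 - freqs)
    edges = np.linspace(0.5, 1.0, n_bins + 1)  # bin width 0.02 with the defaults
    peaks = []
    for start in range(0, len(folded) - window_size + 1, window_size):
        counts, _ = np.histogram(folded[start:start + window_size], bins=edges)
        i = int(np.argmax(counts))  # bin holding the most positions
        peaks.append(float((edges[i] + edges[i + 1]) / 2))  # report the bin midpoint
    return peaks
```

Taking the single most populated bin keeps the summary non-parametric, as the abstract describes. The trade-off is that windows with too few heterozygous sites give noisy peaks.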
Affiliation(s)
- Brian J Knaus
- Horticultural Crops Research Unit, United States Department of Agriculture-Agricultural Research Service, Corvallis, OR, United States
- Niklaus J Grünwald
- Horticultural Crops Research Unit, United States Department of Agriculture-Agricultural Research Service, Corvallis, OR, United States
127
Bianconi ME, Dunning LT, Moreno-Villena JJ, Osborne CP, Christin PA. Gene duplication and dosage effects during the early emergence of C4 photosynthesis in the grass genus Alloteropsis. J Exp Bot 2018; 69:1967-1980. [PMID: 29394370 PMCID: PMC6018922 DOI: 10.1093/jxb/ery029] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/25/2017] [Accepted: 01/17/2018] [Indexed: 05/04/2023]
Abstract
The importance of gene duplication for evolutionary diversification has mainly been discussed in terms of genetic redundancy allowing neofunctionalization. In the case of C4 photosynthesis, which evolved via the co-option of multiple enzymes to boost carbon fixation in tropical conditions, the importance of genetic redundancy has not been consistently supported by genomic studies. Here, we test for a different role for gene duplication in the early evolution of C4 photosynthesis: dosage effects creating rapid step changes in expression levels. Using genome-wide data for accessions of the grass genus Alloteropsis that recently diversified into different photosynthetic types, we estimate gene copy numbers. We demonstrate that recurrent duplications in two important families of C4 genes coincided with increases in transcript abundance along the phylogeny, in some cases via a pure dosage effect. Increased gene copy number during the initial emergence of C4 photosynthesis probably offered a rapid route to enhanced expression. However, we also find losses of duplicates following the acquisition of genes encoding better-suited isoforms. The dosage effect of gene duplication might therefore act as a transient process during the evolution of a C4 biochemistry, rendered obsolete by the fixation of regulatory mutations that increase expression levels.
Affiliation(s)
- Matheus E Bianconi
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Luke T Dunning
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Colin P Osborne
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
128
Nishio S, Moteki H, Usami S. Simple and efficient germline copy number variant visualization method for the Ion AmpliSeq™ custom panel. Mol Genet Genomic Med 2018; 6:678-686. [PMID: 29633566 PMCID: PMC6081219 DOI: 10.1002/mgg3.399] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/20/2017] [Revised: 03/02/2018] [Accepted: 03/06/2018] [Indexed: 12/18/2022]
Abstract
BACKGROUND Recent advances in molecular genetic analysis using next-generation sequencing (NGS) have drastically accelerated the identification of disease-causing gene mutations. Most NGS analyses of inherited diseases have focused on single-nucleotide variants and short indels. Recently, however, structural variations, including copy number variations, have come to be considered an important cause of many different diseases. Only a limited number of copy number analysis tools are available for multiplex PCR-based target genome enrichment. METHODS In this paper, we report a simple and efficient copy number variation visualization method for Ion AmpliSeq™ target resequencing data. Unlike hybridization capture-based target enrichment systems, Ion AmpliSeq™ reads are multiplex PCR products, and reads generated from the same amplicon are quite uniform in length and position. Based on this feature, we used the depth of coverage of each amplicon, taken from the barcode/amplicon coverage matrix file, for copy number analysis. We investigated the utility of this method using positive controls and a large Japanese hearing loss cohort. RESULTS Using this method, we successfully confirmed previously reported copy number loss cases involving the STRC gene and copy number gain in trisomy 21 cases. Copy number analysis of the large Japanese hearing loss cohort (2,475 patients) identified many gene copy number variants. The most prevalent copy number variation was STRC gene copy number loss, carried by 129 patients. CONCLUSION Our copy number visualization method for Ion AmpliSeq™ data can be used for efficient copy number analysis when comparing large numbers of samples. The method is simple and requires only easy calculations in standard spreadsheet software.
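The per-amplicon depth approach can be illustrated with a normalization in the same spirit. First, each sample is scaled by its total reads. Then each amplicon is compared with its median share across the batch. This is a sketch, not the paper's exact spreadsheet formulas. It assumes that most samples in a batch are copy-neutral at any given amplicon. The `classify` helper and its 0.75/1.25 thresholds are hypothetical.

```python
import numpy as np


def relative_copy_ratio(depth) -> np.ndarray:
    """Per-amplicon copy ratio from an amplicon x sample read-depth matrix."""
    depth = np.asarray(depth, dtype=float)
    # 1) Correct for sequencing yield: each sample's amplicon depths sum to 1.
    share = depth / depth.sum(axis=0, keepdims=True)
    # 2) Typical share of each amplicon across the batch (assumes most samples are copy-neutral).
    reference = np.median(share, axis=1, keepdims=True)
    # 3) Ratio ~1.0 = two copies, ~0.5 = one-copy loss, ~1.5 = one-copy gain.
    return share / reference


def classify(ratio, loss: float = 0.75, gain: float = 1.25) -> np.ndarray:
    """Label each amplicon/sample cell; the thresholds are illustrative assumptions."""
    ratio = np.asarray(ratio)
    return np.where(ratio < loss, "loss", np.where(ratio > gain, "gain", "normal"))
```

Plotting the ratio matrix per sample gives the kind of visualization the abstract describes. Because every step is a sum, median or division, it maps directly onto spreadsheet formulas.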
Affiliation(s)
- Shin‐ya Nishio
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
- Hideaki Moteki
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
- Shin‐ichi Usami
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
129
Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022]
Abstract
The discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provided an alternative paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events. Their genome-wide detection is now possible using high-throughput, low-cost next-generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward because various systematic biases make read coverage non-uniform. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection from whole-genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module, which enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform usable even on a desktop computer. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approach on accurate detection of the complete spectrum of CNVs. The performance of iCopyDAV is evaluated on both simulated and real data at different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.
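The segmentation stage splits a per-bin read-depth signal into runs of constant copy number. As a stand-in, the sketch below uses plain recursive binary segmentation. This is a deliberately simple technique swapped in for illustration and is not the algorithm iCopyDAV implements. At each step it places one breakpoint where the size-weighted difference between the left and right means is largest. It accepts that breakpoint only if the score exceeds a threshold. The function names, `threshold` and `min_size` are hypothetical parameters; on real data, the threshold should scale with the noise level of the signal.

```python
import numpy as np


def split_score(x: np.ndarray, k: int) -> float:
    """Size-weighted difference between the means of x[:k] and x[k:]."""
    n = len(x)
    diff = abs(x[:k].mean() - x[k:].mean())
    return float(diff * np.sqrt(k * (n - k) / n))


def binary_segmentation(x, threshold: float, min_size: int = 5, offset: int = 0) -> list:
    """Breakpoint indices found by recursive binary segmentation of a 1-D signal.

    x: per-bin signal such as log2 read-depth ratios.
    threshold: minimum split score needed to accept a breakpoint.
    min_size: minimum number of bins on each side of a breakpoint.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2 * min_size:
        return []
    candidates = range(min_size, n - min_size + 1)
    scores = [split_score(x, k) for k in candidates]
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return []  # no convincing change in mean: treat as one segment
    k = candidates[best]
    left = binary_segmentation(x[:k], threshold, min_size, offset)
    right = binary_segmentation(x[k:], threshold, min_size, offset + k)
    return left + [offset + k] + right  # already in ascending order
```

For example, a signal of 50 bins at 0, 30 bins at 1 and 20 bins at 0 yields breakpoints at bins 50 and 80. The segment means between breakpoints would then feed a calling step.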
Affiliation(s)
- Prashanthi Dharanipragada
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Sriharsha Vogeti
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
130
Semeraro R, Orlandini V, Magi A. Xome-Blender: A novel cancer genome simulator. PLoS One 2018; 13:e0194472. [PMID: 29621252 PMCID: PMC5886411 DOI: 10.1371/journal.pone.0194472] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/06/2017] [Accepted: 02/05/2018] [Indexed: 11/18/2022]
Abstract
The adoption of next-generation sequencing based methods in cancer research has enabled investigation of the complex genetic structure of tumor samples. In the last few years, considerable attention has been given to research on somatic variants, and several computational approaches have been developed for this purpose. Despite continuous improvements to these programs, validating their results is a hard challenge because of multiple sources of error. To overcome this drawback, different simulation approaches are used to generate synthetic samples. However, they are often based on adding artificial mutations that mimic the complexity of genomic variations. For these reasons, we developed a novel tool, Xome-Blender. It generates synthetic cancer genomes with user-defined features, such as the number of subclones, the number of somatic variants and the presence of copy number alterations (CNAs), without adding any synthetic element. The distinctive feature of our method is the “morphological approach” used to generate mutation events. To demonstrate the power of our tool, we used it to address the hard challenge of evaluating the performance of nine state-of-the-art somatic variant calling methods for small and large variants (VarScan2, MuTect, Shimmer, BCFtools, Strelka, EXCAVATOR2, Control-FREEC and CopywriteR). These analyses showed that Xome-Blender data make it possible to appraise small differences in performance between the tools. We designated VarScan2 and EXCAVATOR2 as the best tools for this kind of application. Xome-Blender is Unix-based, licensed under the GPLv3 and freely available at https://github.com/rsemeraro/XomeBlender.
Affiliation(s)
- Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Valerio Orlandini
- Medical Genetics Unit, Meyer Children’s University Hospital, Florence, Italy
- Alberto Magi
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
131
Eisfeldt J, Nilsson D, Andersson-Assarsson JC, Lindstrand A. AMYCNE: Confident copy number assessment using whole genome sequencing data. PLoS One 2018; 13:e0189710. [PMID: 29579039 PMCID: PMC5868770 DOI: 10.1371/journal.pone.0189710] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/12/2017] [Accepted: 11/30/2017] [Indexed: 11/18/2022]
Abstract
Copy number variations (CNVs) within the human genome have been linked to a diversity of inherited diseases and phenotypic traits. Current methods for measuring copy number have limited resolution and/or precision, especially for regions with more than 4 copies. Whole genome sequencing (WGS) offers an alternative data source that allows the copy number of different genomic regions to be detected and characterized in a single experiment. A plethora of tools have been developed to use WGS data for CNV detection. However, none of them is designed specifically to accurately estimate copy numbers of complex regions in a small cohort or clinical setting. Herein, we present AMYCNE (automatic modeling functionality for copy number estimation), a CNV analysis tool using WGS data. AMYCNE is multifunctional and performs copy number estimation of complex regions, annotation of VCF files, and CNV detection on individual samples. The performance of AMYCNE was evaluated using AMY1A ddPCR measurements from 86 unrelated individuals. In addition, we validated the accuracy of AMYCNE copy number predictions on two additional genes (FCGR3A and FCGR3B) using datasets available through the 1000 Genomes consortium. Finally, we simulated levels of mosaic loss and gain of chromosome X and used this dataset to benchmark AMYCNE. The results show high concordance between AMYCNE and ddPCR, validating the use of AMYCNE to measure tandem AMY1 repeats with high accuracy. This opens up new possibilities for using WGS for accurate copy number determination of other complex regions in the genome in small cohorts or single individuals.
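At its simplest, read-depth copy number estimation of a complex region compares the coverage of the region with coverage from a reference at known ploidy. The sketch below makes that comparison and also converts a fractional copy number into a mosaic cell fraction. It is a simplified illustration and not AMYCNE's model. It assumes that the coverage values are already GC-corrected and that the reference region is present at `ploidy` copies. The mosaic conversion assumes a single-copy change in a fraction m of cells, so that CN = 2 ± m. Both function names are hypothetical.

```python
from statistics import mean


def estimate_copy_number(region_depths, reference_depths, ploidy: int = 2) -> float:
    """Copy number of a region as ploidy x (region coverage / reference coverage)."""
    ref = mean(reference_depths)
    if ref <= 0:
        raise ValueError("reference coverage must be positive")
    return ploidy * mean(region_depths) / ref


def mosaic_fraction(copy_number: float, ploidy: int = 2) -> float:
    """Fraction of cells with a one-copy loss or gain, given a fractional copy number.

    With a single-copy change in a fraction m of cells, CN = ploidy - m (loss)
    or ploidy + m (gain), so m = |CN - ploidy|.
    """
    return abs(copy_number - ploidy)
```

For instance, a region covered at 150x against a 30x genome-wide reference gives an estimated copy number of 10. A chromosome X at 27x against 30x in a female sample gives 1.8 copies, which corresponds to a mosaic loss in about 20% of cells.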
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, and Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden
- Daniel Nilsson
- Department of Molecular Medicine and Surgery, and Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Anna Lindstrand
- Department of Molecular Medicine and Surgery, and Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
132
Yuan X, Zhang J, Yang L, Bai J, Fan P. Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data. IEEE Trans Nanobioscience 2018; 17:12-20. [PMID: 29570071 DOI: 10.1109/tnb.2017.2783910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022]
Abstract
Analyzing copy number variations (CNVs) from next-generation sequencing (NGS) data has become a common approach to detecting disease susceptibility genes. The main challenge is how to use NGS data with limited coverage depth to detect significant CNVs. Here, we introduce a new statistical method, the derivative of correlation coefficient (DCC), which uses read depth signals to detect significant CNVs that recur in multiple samples. We use a sliding window to calculate a correlation coefficient for each genome bin, and compute the corresponding derivatives by fitting curves to the correlation coefficients. The detection of significant CNVs is thereby transformed into the problem of detecting significant derivatives, which reflect genome breakpoints. This problem can be solved using statistical hypothesis testing. We tested and compared the performance of DCC against several peer methods using a large number of simulated data sets. We validated DCC using several real sequencing data sets derived from the European Genome-Phenome Archive, the DNA Data Bank of Japan, and the 1000 Genomes Project. Experimental results suggest that DCC is an effective approach for identifying CNVs, outperforming peer methods in terms of detection power and accuracy. DCC can be used to detect significant or recurrent CNVs in various NGS data sets, thus providing useful information for studying genomic mutations and finding disease susceptibility genes.
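One plausible reading of the correlation-then-derivative idea is shown below. The sketch is under stated assumptions and is not DCC's published statistic. For each bin, it correlates the read-depth vector across samples with that of the next bin. Inside a region deleted in many samples, adjacent bins rise and fall together, so the correlation is high. Outside such a region, it hovers near zero. A moving average followed by a finite difference stands in for the paper's curve fitting. The window of two adjacent bins, the smoothing width and the function names are hypothetical, and no hypothesis test is included.

```python
import numpy as np


def adjacent_bin_correlation(depth) -> np.ndarray:
    """depth: bins x samples matrix. r[i] = Pearson correlation of bins i and i+1 across samples."""
    depth = np.asarray(depth, dtype=float)
    return np.array([np.corrcoef(depth[i], depth[i + 1])[0, 1]
                     for i in range(depth.shape[0] - 1)])


def correlation_derivative(r: np.ndarray, smooth: int = 3) -> np.ndarray:
    """Smooth the correlation track with a moving average, then take its finite-difference slope."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(r, kernel, mode="same")
    return np.gradient(smoothed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.normal(30.0, 2.0, size=(100, 20))  # 100 bins, 20 samples
    depth[40:60, :10] -= 15.0                       # recurrent deletion in half of the samples
    r = adjacent_bin_correlation(depth)
    d = correlation_derivative(r)
    print("mean r inside:", r[40:59].mean().round(2), "outside:", r[:35].mean().round(2))
```

Large positive and negative derivative values then mark candidate breakpoints at the edges of the recurrent CNV. Assessing their significance is the hypothesis-testing step the abstract describes.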
133
Malekpour SA, Pezeshk H, Sadeghi M. MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples. Sci Rep 2018; 8:4009. [PMID: 29507384 PMCID: PMC5838159 DOI: 10.1038/s41598-018-22323-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/12/2017] [Accepted: 02/16/2018] [Indexed: 01/23/2023]
Abstract
Currently, only a few tools are capable of detecting genome-wide copy number variations (CNVs) based on sequencing of multiple samples. Aberrations in mate pair insertion sizes provide additional hints for multi-sample CNV detection, yet most current tools rely only on depth of coverage. Here, we propose a new algorithm (MSeq-CNV) that detects common CNVs across multiple samples. MSeq-CNV applies a mixture density to model aberrations in depth of coverage and abnormalities in mate pair insertion sizes. At each genomic position, each component of this mixture density uses a Binomial distribution to model the number of mate pairs with aberrant insertion sizes, and a Poisson distribution to emit the read counts. MSeq-CNV is applied to simulated data. It is also applied to real data from six HapMap individuals sequenced at high coverage in the 1000 Genomes Project: a CEU trio of European ancestry and a YRI trio of Nigerian ancestry. The ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied to detect CNVs in two low-coverage samples from the 1000 Genomes Project and in six samples from the Simons Genome Diversity Project.
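The Binomial × Poisson mixture described above can be illustrated at a single genomic position. The sketch multiplies a Poisson likelihood for the read count by a Binomial likelihood for the number of aberrant mate pairs. It weights each copy-number component and normalizes to get posteriors. This is a minimal sketch and not MSeq-CNV's implementation. It assumes that the expected read count scales linearly with copy number around a diploid rate, and that each component has its own probability of aberrant insert sizes. All parameter values and function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def log_poisson(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_binomial(k: int, n: int, p: float) -> float:
    """log P(K = k) for a Binomial(n, p) distribution with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def component_posteriors(
    read_count: int,
    n_pairs: int,
    n_aberrant: int,
    diploid_rate: float,
    states: Dict[int, Tuple[float, float]],
) -> Dict[int, float]:
    """Posterior probability of each copy-number component at one position.

    states maps copy number -> (mixture weight, probability that a mate pair has
    an aberrant insertion size). The Poisson mean scales with copy number.
    """
    log_joint = {}
    for cn, (weight, p_aberrant) in states.items():
        lam = max(diploid_rate * cn / 2.0, 1e-9)  # guard against copy number 0
        log_joint[cn] = (
            math.log(weight)
            + log_poisson(read_count, lam)
            + log_binomial(n_aberrant, n_pairs, p_aberrant)
        )
    top = max(log_joint.values())  # subtract the max before exp for numerical stability
    unnorm = {cn: math.exp(v - top) for cn, v in log_joint.items()}
    total = sum(unnorm.values())
    return {cn: v / total for cn, v in unnorm.items()}
```

With hypothetical states {1: (0.05, 0.20), 2: (0.90, 0.01), 3: (0.05, 0.10)} and a diploid rate of 30 reads, a position with 15 reads and 6 of 30 pairs aberrant is assigned mostly to the one-copy component. A position with 30 reads and no aberrant pairs is assigned to the two-copy component. Fitting the weights and probabilities across samples, for example with EM, is left out of the sketch.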
Affiliation(s)
- Seyed Amir Malekpour
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Department of Mathematics and Statistics, Concordia University, Montreal, Canada
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
134
Zou J, Liu Y, Wang J, Liu Z, Lu Z, Chen Z, Li Z, Dong B, Huang W, Li Y, Gao J, Shen L. Establishment and genomic characterizations of patient-derived esophageal squamous cell carcinoma xenograft models using biopsies for treatment optimization. J Transl Med 2018; 16:15. [PMID: 29370817 PMCID: PMC5785825 DOI: 10.1186/s12967-018-1379-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/05/2017] [Accepted: 01/05/2018] [Indexed: 12/15/2022]
Abstract
Background Squamous cell carcinoma is the dominant type of esophageal cancer in China, and many patients are initially diagnosed at an advanced stage. Patient-derived xenograft (PDX) models have become an important platform for preclinical research. This study aimed to establish and characterize PDX models using biopsy tissue from patients with advanced esophageal cancer, laying the foundation for preclinical application. Methods Fresh endoscopic biopsy tissues were harvested from patients with advanced esophageal cancer and implanted subcutaneously into NOD/SCID mice. The PDXs were then serially passaged for up to four generations. Transplantation outcomes were analyzed, and genomic characteristics of the xenografts were profiled using next-generation sequencing. Results Twenty-five PDX models were established (13.3%, 25/188). The latency period was 75.12 ± 19.87 days (range, 50–120 days) for the first passage and decreased with further passaging. Apart from tumor stage, no patient characteristics differed between successful and unsuccessful xenograft transplantations, irrespective of chemotherapy. Histopathological features and chemosensitivity of the PDXs closely matched those of the primary patient tumors. Each PDX was assessed for molecular characteristics, including copy number variations, somatic mutations, and signaling pathway abnormalities, and these were similar to the patient results. Conclusions Our PDX models were established from real-time biopsies and molecularly profiled. They may be promising for drug development and individualized therapy.
Affiliation(s)
- Jianling Zou
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Ying Liu
- Laboratory of Genetics, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Jingyuan Wang
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Zhentao Liu
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Zhihao Lu
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Zuhua Chen
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Zhongwu Li
- Department of Pathology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Bin Dong
- Department of Pathology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Beijing, China
- Wenwen Huang
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Yanyan Li
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Jing Gao
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
- Lin Shen
- Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital & Institute, Fu-Cheng Road 52, Hai-Dian District, Beijing, 100142, China
135
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2018; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022]
Abstract
The personalized approach to health care, and to cancer care in particular, is becoming increasingly popular and is taking an important place in the translational medicine paradigm. In some cases, detecting patient-specific mutations that point to a targeted therapy has already become routine practice for clinical oncologists. Wider panels of genetic markers covering a greater number of possible oncogenes are also on the market, including markers that support less reliable medical conclusions. Given the wide availability of high-throughput technologies, it is very tempting to use complete patient-specific next-generation sequencing (NGS) or other "omics" data to guide cancer treatment. However, there are still no gold-standard methods or protocols for evaluating such data. Here we discuss the clinical utility of each data type and describe a systems biology approach adapted for single-patient measurements. We summarize the current state of the field, focusing on clinically relevant case studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
- Personal Biomedicine, Moscow, Russia
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
- Mikhail Pyatnitskiy
- Personal Biomedicine, Moscow, Russia
- Orekhovich Institute of Biomedical Chemistry, Moscow, Russia
- Pirogov Russian National Research Medical University, Moscow, Russia
- Olga Kremenetskaya
- Personal Biomedicine, Moscow, Russia
- Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy Vinogradov
- Personal Biomedicine, Moscow, Russia
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Lomonosov Moscow State University, Moscow, Russia
136
Trost B, Walker S, Wang Z, Thiruvahindrapuram B, MacDonald JR, Sung WWL, Pereira SL, Whitney J, Chan AJS, Pellecchia G, Reuter MS, Lok S, Yuen RKC, Marshall CR, Merico D, Scherer SW. A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data. Am J Hum Genet 2018; 102:142-155. [PMID: 29304372 DOI: 10.1016/j.ajhg.2017.12.007] [Citation(s) in RCA: 122] [Impact Index Per Article: 20.3] [Received: 08/24/2017] [Accepted: 12/07/2017] [Indexed: 12/30/2022]
Abstract
A remaining hurdle to whole-genome sequencing (WGS) becoming a first-tier genetic test has been the accurate detection of copy-number variations (CNVs). Here, we used several datasets to empirically develop a detailed workflow for identifying germline CNVs >1 kb from short-read WGS data using read depth-based algorithms. Our workflow is comprehensive in that it addresses all stages of the CNV-detection process, including DNA library preparation, sequencing, quality control, reference mapping, and computational CNV identification. We used our workflow to detect rare, genic CNVs in individuals with autism spectrum disorder (ASD), and 120/120 such CNVs tested using orthogonal methods were successfully confirmed. We also identified 71 putative genic de novo CNVs in this cohort, which had a confirmation rate of 70%; the remainder were incorrectly identified as de novo due to false positives in the proband (7%) or parental false negatives (23%). In individuals with an ASD diagnosis for whom both microarray and WGS experiments were performed, our workflow detected all clinically relevant CNVs identified by microarrays, as well as additional potentially pathogenic CNVs <20 kb. Thus, CNVs of clinical relevance can be discovered from WGS with a detection rate exceeding that of microarrays, positioning WGS as a single assay for genetic variation detection.
Affiliation(s)
- Brett Trost: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Susan Walker: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Zhuozhi Wang: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Bhooma Thiruvahindrapuram: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Jeffrey R MacDonald: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Wilson W L Sung: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Sergio L Pereira: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Joe Whitney: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Ada J S Chan: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Giovanna Pellecchia: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Miriam S Reuter: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Si Lok: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Ryan K C Yuen: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Christian R Marshall: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Daniele Merico: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Deep Genomics Inc., Toronto, ON M5G 1L7, Canada
- Stephen W Scherer: The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; McLaughlin Centre, University of Toronto, Toronto, ON M5G 0A4, Canada
137
DNA sequence-level analyses reveal potential phenotypic modifiers in a large family with psychiatric disorders. Mol Psychiatry 2018; 23:2254-2265. [PMID: 29880880 PMCID: PMC6294736 DOI: 10.1038/s41380-018-0087-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/02/2017] [Revised: 03/30/2018] [Accepted: 04/09/2018] [Indexed: 02/07/2023]
Abstract
Psychiatric disorders are a group of genetically related diseases with highly polygenic architectures. Genome-wide association analyses have made substantial progress towards understanding the genetic architecture of these disorders. More recently, exome and whole-genome sequencing of cases and families have identified rare, highly penetrant variants that provide direct functional insight. There remains, however, a gap in the heritability explained by these complementary approaches. To understand how multiple genetic variants combine to modify both the severity and the penetrance of a highly penetrant variant, we sequenced 48 whole genomes from a family with a high loading of psychiatric disorder linked to a balanced chromosomal translocation. The t(1;11)(q42;q14.3) translocation directly disrupts three genes, DISC1, DISC2 and DISC1FP, and has been linked to multiple brain imaging and neurocognitive outcomes in the family. Using DNA sequence-level linkage analysis, functional annotation and population-based association, we identified common and rare variants in GRM5 (minor allele frequency (MAF) > 0.05), PDE4D (MAF > 0.2) and CNTN5 (MAF < 0.01) that may help explain the individual differences in phenotypic expression in the family. We suggest that whole-genome sequencing in large families will improve understanding of the combined effects of rare and common sequence variation underlying psychiatric phenotypes.
138
Abstract
Whole-genome sequencing with short-read technologies is well suited for calling single nucleotide polymorphisms, but has major problems detecting structural variants larger than the read length. One such type of variation is copy number variation (CNV), which entails the deletion or duplication of genomic regions and the expansion or contraction of repeated elements. Duplicated and deleted regions will typically be collapsed during de novo assembly of sequence data, or ignored when mapping reads to a reference. However, signatures of copy number variation can be detected in the resulting read depth at each position in the genome. Here we provide instructions on how to analyze this read-depth signal with the R package CNOGpro, allowing copy numbers to be estimated, with uncertainty, for each feature in a genome.
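The read-depth idea described above can be illustrated with a minimal Python sketch. It is not CNOGpro's algorithm or API (CNOGpro is an R package); it assumes a hypothetical per-base depth list, a feature given as half-open coordinates, and a baseline depth for a region present at the reference ploidy (here the genome-wide median), and it attaches uncertainty with a simple percentile bootstrap over the feature's per-base depths.

```python
import random
import statistics


def estimate_copy_number(depth, start, end, baseline, ploidy=1, n_boot=1000, seed=0):
    """Estimate the copy number of the feature depth[start:end] from read depth.

    depth    -- per-base read depth along the genome (hypothetical input format)
    baseline -- typical depth of a region present at `ploidy` copies,
                e.g. the genome-wide median depth
    Returns (point_estimate, (low, high)); the interval is a 95% percentile
    bootstrap over the per-base depths inside the feature.
    """
    window = depth[start:end]
    if not window or baseline <= 0:
        raise ValueError("feature must be non-empty and baseline must be positive")
    point = ploidy * statistics.mean(window) / baseline
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    boots = sorted(
        ploidy * statistics.mean(rng.choices(window, k=len(window))) / baseline
        for _ in range(n_boot)
    )
    low = boots[round(0.025 * (n_boot - 1))]
    high = boots[round(0.975 * (n_boot - 1))]
    return point, (low, high)


# Toy haploid genome: a 200-bp feature with doubled depth.
depth = [30] * 1000 + [60] * 200 + [30] * 1000
baseline = statistics.median(depth)
cn, (lo, hi) = estimate_copy_number(depth, 1000, 1200, baseline)
print(cn, lo, hi)  # prints 2.0 2.0 2.0
```

Resampling individual bases ignores the strong correlation between neighbouring positions, so real intervals would be wider; the sketch only shows where an uncertainty estimate enters the calculation.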
139
Abstract
Differences between genomes can be due to single nucleotide polymorphisms (SNPs), translocations, inversions and copy number variants (CNVs, gains or losses of DNA). CNVs can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign, but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, very few CNV genome-wide association studies have been published to date, probably because CNV analysis remains a somewhat more complex task than SNP analysis, both in terms of bioinformatics workflow and of uncertainty in CNV calling, which leads to high false-positive rates and unknown false-negative rates. This chapter explains computational methods for the analysis of CNVs, from study design, data processing and quality control up to genome-wide association studies with clinical traits.
Affiliation(s)
- Aurélien Macé: Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Zoltán Kutalik: Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
140
DNase-capture reveals differential transcription factor binding modalities. PLoS One 2017; 12:e0187046. [PMID: 29284001 PMCID: PMC5746236 DOI: 10.1371/journal.pone.0187046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/02/2017] [Accepted: 10/12/2017] [Indexed: 11/19/2022]
Abstract
We describe DNase-capture, an assay that increases the analytical resolution of DNase-seq by focusing its sequencing phase on selected genomic regions. We introduce a new method to compensate for capture bias called BaseNormal that allows for accurate recovery of transcription factor protection profiles from DNase-capture data. We show that these normalized data allow for nuanced detection of transcription factor binding heterogeneity with as few as dozens of sites.
141
Li L, Leung AKY, Kwok TP, Lai YYY, Pang IK, Chung GTY, Mak ACY, Poon A, Chu C, Li M, Wu JJK, Lam ET, Cao H, Lin C, Sibert J, Yiu SM, Xiao M, Lo KW, Kwok PY, Chan TF, Yip KY. OMSV enables accurate and comprehensive identification of large structural variations from nanochannel-based single-molecule optical maps. Genome Biol 2017; 18:230. [PMID: 29195502 PMCID: PMC5709945 DOI: 10.1186/s13059-017-1356-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/14/2017] [Accepted: 11/03/2017] [Indexed: 12/20/2022]
Abstract
We present a new method, OMSV, for accurately and comprehensively identifying structural variations (SVs) from optical maps. OMSV detects both homozygous and heterozygous SVs, SVs of various types and sizes, and SVs that do or do not create or destroy restriction sites. We show that OMSV has high sensitivity and specificity, with clear performance gains over the latest existing method. Applying OMSV to a human cell line, we identified hundreds of SVs >2 kbp, 68% of which were missed by sequencing-based callers. Independent experimental validation confirmed the high accuracy of these SVs. The OMSV software is available at http://yiplab.cse.cuhk.edu.hk/omsv/ .
Affiliation(s)
- Le Li: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Alden King-Yung Leung: School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Tsz-Piu Kwok: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Yvonne Y Y Lai: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA
- Iris K Pang: School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Grace Tin-Yun Chung: Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Angel C Y Mak: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA
- Annie Poon: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA
- Catherine Chu: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA
- Menglu Li: Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
- Jacob J K Wu: Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
- Han Cao: BioNano Genomics, San Diego, California, USA
- Chin Lin: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA
- Justin Sibert: School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, USA
- Siu-Ming Yiu: Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
- Ming Xiao: School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, USA
- Kwok-Wai Lo: Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Pui-Yan Kwok: Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
- Ting-Fung Chan: School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Kevin Y Yip: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
142
Eitan R, Shamir R. Reconstructing cancer karyotypes from short read data: the half empty and half full glass. BMC Bioinformatics 2017; 18:488. [PMID: 29141589 PMCID: PMC5688766 DOI: 10.1186/s12859-017-1929-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/20/2017] [Accepted: 11/06/2017] [Indexed: 02/01/2023]
Abstract
Background During cancer progression, genomes undergo point mutations as well as larger segmental changes. The latter include, among others, segmental deletions, duplications, translocations and inversions. The result is a highly complex, patient-specific cancer karyotype. Using high-throughput technologies of deep sequencing and microarrays, it is possible to interrogate a cancer genome and produce chromosomal copy number profiles and a list of breakpoints (“jumps”) relative to the normal genome. This information is very detailed but local, and does not give the overall picture of the cancer genome. One of the basic challenges in cancer genome research is to use such information to infer the cancer karyotype. We present here an algorithmic approach, based on graph theory and integer linear programming, that receives segmental copy number and breakpoint data as input and produces a cancer karyotype that is most concordant with them. We used simulations to evaluate the utility of our approach, and applied it to real data. Results By using a simulation model, we were able to estimate the correctness and robustness of the algorithm in a spectrum of scenarios. Under our base scenario, designed according to observations in real data, the algorithm correctly inferred 69% of the karyotypes. However, when using less stringent correctness metrics that account for incomplete and noisy data, 87% of the reconstructed karyotypes were correct. Furthermore, in scenarios where the data were very clean and complete, accuracy rose to 90%–100%. Some examples of analysis of real data, and the reconstructed karyotypes suggested by our algorithm, are also presented. Conclusion While reconstruction of a complete, perfect karyotype based on short-read data is very hard, a large fraction of the reconstruction will still be correct and can provide useful information.
Electronic supplementary material The online version of this article (10.1186/s12859-017-1929-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rami Eitan: Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel
- Ron Shamir: Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel
143
WISExome: a within-sample comparison approach to detect copy number variations in whole exome sequencing data. Eur J Hum Genet 2017; 25:1354-1363. [PMID: 29255179 PMCID: PMC5865163 DOI: 10.1038/s41431-017-0005-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/21/2017] [Revised: 07/01/2017] [Accepted: 08/01/2017] [Indexed: 01/21/2023]
Abstract
In clinical genetics, detection of single nucleotide variants (SNVs) as well as copy number variations (CNVs) is essential for patient genotyping. Obtaining both CNV and SNV information from whole-exome sequencing (WES) data would significantly simplify the clinical workflow. Unfortunately, the sequence reads obtained with WES vary between samples, which complicates accurate CNV detection. To avoid depending on other samples, we developed a within-sample comparison approach (WISExome). For every WES target region on the genome, we identified a set of reference target regions elsewhere on the genome with similar read frequency behavior. For a new sample, aberrations are detected by comparing the read frequency of a target region with the distribution of read frequencies in its reference set. WISExome correctly identifies known pathogenic CNVs (range 4 kb–5.2 Mb). Moreover, WISExome prioritizes pathogenic CNVs by sorting them on quality and on OMIM annotations of overlapping genes. When comparing WISExome with four existing CNV detection tools, we found that CoNIFER detects far fewer CNVs and XHMM breaks calls made by other tools into smaller calls (fragmentation). CODEX and CLAMMS seem to perform more similarly to WISExome. CODEX finds all known pathogenic CNVs but makes many more calls than all other methods. CLAMMS and WISExome agree the most; CLAMMS does, however, miss one of the known CNVs and shows slightly more fragmentation. Taken together, WISExome is a promising tool for genome diagnostics laboratories, as the workflow can be based solely on WES data.
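The within-sample comparison described above can be sketched in Python. This is an illustrative simplification, not WISExome's code: it assumes normalized per-target read frequencies from a set of training samples, picks as references the k targets on other chromosomes whose training profiles are closest in Euclidean distance (an assumed similarity measure), and scores a target in a new sample with a z-score against its reference targets in that same sample. The toy data and the very small k are hypothetical.

```python
import math
import statistics


def build_reference_sets(train, chrom, k=2):
    """train[s][t]: normalized read frequency of target t in training sample s.
    chrom[t]: chromosome of target t.
    Returns {t: indices of the k targets on other chromosomes whose
    read-frequency profiles across training samples are closest to t's}."""
    n_targets = len(chrom)
    profiles = [[sample[t] for sample in train] for t in range(n_targets)]
    refs = {}
    for t in range(n_targets):
        candidates = sorted(
            (math.dist(profiles[t], profiles[u]), u)
            for u in range(n_targets)
            if chrom[u] != chrom[t]
        )
        refs[t] = [u for _, u in candidates[:k]]
    return refs


def within_sample_z(sample, target, refs):
    """z-score of a target's read frequency against its reference targets,
    measured within the same (new) sample."""
    ref_values = [sample[u] for u in refs[target]]
    sd = statistics.stdev(ref_values)  # needs at least two reference targets
    if sd == 0:
        raise ValueError("reference targets have zero spread")
    return (sample[target] - statistics.mean(ref_values)) / sd


train = [
    [1.0, 1.01, 2.0, 0.98, 3.0],
    [1.2, 1.19, 2.4, 1.22, 3.1],
    [0.8, 0.81, 1.6, 0.78, 2.9],
    [1.0, 1.02, 2.0, 0.97, 3.0],
]
chrom = ["1", "2", "2", "3", "3"]
refs = build_reference_sets(train, chrom)
new_sample = [0.5, 1.0, 2.0, 1.02, 3.0]  # target 0 looks like a heterozygous loss
print(refs[0], within_sample_z(new_sample, 0, refs))
```

Because the reference targets are measured in the same sample, sample-wide effects such as overall capture efficiency largely cancel, which is the motivation the abstract gives for avoiding other samples at test time.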
144
Lin JY, Le BH, Chen M, Henry KF, Hur J, Hsieh TF, Chen PY, Pelletier JM, Pellegrini M, Fischer RL, Harada JJ, Goldberg RB. Similarity between soybean and Arabidopsis seed methylomes and loss of non-CG methylation does not affect seed development. Proc Natl Acad Sci U S A 2017; 114:E9730-E9739. [PMID: 29078418 PMCID: PMC5692608 DOI: 10.1073/pnas.1716758114] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Indexed: 12/16/2022]
Abstract
We profiled soybean and Arabidopsis methylomes from the globular stage through dormancy and germination to understand the role of methylation in seed formation. CHH methylation increases significantly during development throughout the entire seed, targets primarily transposable elements (TEs), is maintained during endoreduplication, and drops precipitously within the germinating seedling. By contrast, no significant global changes in CG- and CHG-context methylation occur during the same developmental period. An Arabidopsis ddcc mutant lacking CHH and CHG methylation does not affect seed development, germination, or major patterns of gene expression, implying that CHH and CHG methylation does not play a significant role in seed development or in regulating seed gene activity. By contrast, over 100 TEs are transcriptionally de-repressed in ddcc seeds, suggesting that the increase in CHH-context methylation may be a failsafe mechanism to reinforce transposon silencing. Many genes encoding important classes of seed proteins, such as storage proteins, oil biosynthesis enzymes, and transcription factors, reside in genomic regions devoid of methylation at any stage of seed development. Many other genes in these classes have similar methylation patterns, whether the genes are active or repressed. Our results suggest that methylation does not play a significant role in regulating large numbers of genes important for programming seed development in both soybean and Arabidopsis. We conclude that understanding the mechanisms controlling seed development will require determining how cis-regulatory elements and their cognate transcription factors are organized in genetic regulatory networks.
Affiliation(s)
- Jer-Young Lin: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Brandon H Le: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Min Chen: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Kelli F Henry: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Jungim Hur: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Tzung-Fu Hsieh: Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720
- Pao-Yang Chen: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Julie M Pelletier: Section of Plant Biology, Division of Biological Sciences, University of California, Davis, CA 95616
- Matteo Pellegrini: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
- Robert L Fischer: Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720
- John J Harada: Section of Plant Biology, Division of Biological Sciences, University of California, Davis, CA 95616
- Robert B Goldberg: Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA 90095
145
do Nascimento F, Guimaraes KS. Copy Number Variations Detection: Unravelling the Problem in Tangible Aspects. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1237-1250. [PMID: 27295681 DOI: 10.1109/tcbb.2016.2576441] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
Among the important genomic variants associated with susceptibility and resistance to complex diseases, copy number variations (CNVs) have emerged as a prevalent class of structural variation. Following the flood of next-generation sequencing data, numerous publicly available tools have been developed that provide computational strategies to identify CNVs with improved accuracy. This review goes beyond surveying the main approaches widely used for structural variant detection in general, namely split-read, paired-end mapping, read-depth and assembly-based methods. In this paper, (1) we characterize the relevant technical details of CNV detection that can affect the estimation of breakpoints and copy numbers, (2) we pinpoint the most important insights related to GC-content and mappability biases, and (3) we discuss the most important caveats in the tool evaluation process. The points raised in this study highlight common assumptions, a variety of possible limitations, valuable insights, and directions for desirable contributions to the state of the art in CNV detection tools.
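As a concrete illustration of the GC-content bias mentioned above, the Python sketch below applies one common median-based correction for read-depth data: bins are grouped by GC fraction, and each bin's count is rescaled by the ratio of the overall median count to the median count of its GC stratum. It is a simplified stand-in rather than a method taken from the review; the bin counts, GC values and 1% stratum width are hypothetical.

```python
import statistics
from collections import defaultdict


def gc_correct(counts, gc, n_strata=100):
    """Median-based GC correction of per-bin read counts.

    counts -- read count per genomic bin
    gc     -- GC fraction (0..1) of each bin
    Bins are grouped into strata by GC fraction rounded to 1/n_strata, and each
    count is multiplied by overall_median / stratum_median.
    """
    overall = statistics.median(counts)
    keys = [round(g * n_strata) for g in gc]
    strata = defaultdict(list)
    for key, c in zip(keys, counts):
        strata[key].append(c)
    stratum_median = {key: statistics.median(v) for key, v in strata.items()}
    return [
        c * overall / stratum_median[key] if stratum_median[key] > 0 else 0.0
        for key, c in zip(keys, counts)
    ]


# GC-rich bins are over-sequenced; the last bin carries a real one-copy loss.
counts = [100, 100, 150, 150, 50]
gc = [0.40, 0.40, 0.60, 0.60, 0.40]
print(gc_correct(counts, gc))  # [100.0, 100.0, 100.0, 100.0, 50.0]
```

The stratum medians here include the CNV bins themselves, which is harmless with thousands of bins per stratum but can distort the correction on small inputs like this toy one.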
146
Tan R, Wang J, Wu X, Juan L, Zheng L, Ma R, Zhan Q, Wang T, Jin S, Jiang Q, Wang Y. ERDS-exome: a Hybrid Approach for Copy Number Variant Detection from Whole-exome Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2017; 17:796-803. [PMID: 28981421 DOI: 10.1109/tcbb.2017.2758779] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
Copy number variants (CNVs) play important roles in human disease and evolution. With the rapid development of next-generation sequencing technologies, many tools have been developed for inferring CNVs from whole-exome sequencing (WES) data. However, because of the sparse distribution of exons in the genome, the limitations of the WES technique, and the high level of signal noise in WES data, the efficacy of these tools remains less than desirable. Thus, there is a need for an effective tool that achieves considerable power in WES CNV discovery. In the present study, we describe a novel method, Estimation by Read Depth (RD) with Single-nucleotide variants from exome sequencing data (ERDS-exome). ERDS-exome employs a hybrid normalization approach to normalize WES data and incorporates RD and single-nucleotide variation information together as a hybrid signal into a paired hidden Markov model to infer CNVs from WES data. In systematic evaluations against other state-of-the-art tools on real data from the 1000 Genomes Project, we observed that ERDS-exome demonstrates higher sensitivity and provides comparable or even better specificity. ERDS-exome is publicly available at: https://erds-exome.github.io.
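ERDS-exome's paired hidden Markov model combines read depth with SNV information; the Python sketch below shows only the simpler core idea of HMM-based CNV calling and is not ERDS-exome's model. It assumes a hypothetical sequence of per-target log2 depth ratios, three copy-number states with Gaussian emissions centred on log2(1/2), 0 and log2(3/2), an assumed noise level and stay probability, and decodes the most likely state path with the Viterbi algorithm in log space.

```python
import math

STATES = ("del", "normal", "dup")
# Expected log2 depth ratio for 1, 2 and 3 copies relative to a diploid baseline.
MEANS = {"del": math.log2(1 / 2), "normal": 0.0, "dup": math.log2(3 / 2)}
SD = 0.3  # assumed noise level of the log2 ratios


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(log2_ratios, p_stay=0.99):
    """Most likely copy-number state per target under a 3-state HMM."""
    p_switch = (1 - p_stay) / (len(STATES) - 1)
    log_trans = {
        a: {b: math.log(p_stay if a == b else p_switch) for b in STATES}
        for a in STATES
    }
    # Uniform start probabilities.
    score = {s: math.log(1 / len(STATES)) + log_gauss(log2_ratios[0], MEANS[s], SD)
             for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for x in log2_ratios[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[best] + log_trans[best][s] + log_gauss(x, MEANS[s], SD)
            ptr[s] = best
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


ratios = [0.0] * 10 + [-1.0] * 5 + [0.0] * 10
print(viterbi(ratios))
```

The stay probability acts as a segmentation penalty: a short run of shifted ratios is only called as a CNV when its emission gain outweighs the cost of two state switches.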
147
Sohrabi SS, Mohammadabadi M, Wu DD, Esmailizadeh A. Detection of breed-specific copy number variations in domestic chicken genome. Genome 2017; 61:7-14. [PMID: 28961404 DOI: 10.1139/gen-2017-0016] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022]
Abstract
Copy number variations (CNVs) are important large-scale variants. They are widespread in the genome and may contribute to phenotypic variation. Detection and characterization of CNVs can provide new insights into the genetic basis of important traits. Here, we perform whole-genome short read sequence analysis to identify CNVs in two indigenous and commercial chicken breeds to evaluate the impact of the identified CNVs on breed-specific traits. After filtration, a total of 12 955 CNVs spanning (on average) about 9.42% of the chicken genome were found that made up 5467 CNV regions (CNVRs). Chicken quantitative trait loci (QTL) datasets and Ensembl gene annotations were used as resources for the estimation of potential phenotypic effects of our CNVRs on breed-specific traits. In total, 34% of our detected CNVRs were also detected in earlier CNV studies. These CNVRs partly overlap several previously reported QTL and gene ontology terms associated with some important traits, including shank length QTL in Creeper-specific CNVRs and body weight and egg production characteristics, as well as muscle and body organ growth, in the Arian commercial breed. Our findings provide new insights into the genomic structure of the chicken genome for an improved understanding of the potential roles of CNVRs in differentiating between breeds or lines.
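CNV regions (CNVRs) such as those above are commonly formed by merging overlapping CNV calls pooled across individuals. The Python sketch below shows that merge under stated assumptions: calls are hypothetical (chromosome, start, end) tuples with half-open coordinates, and only calls that overlap by at least one base are merged. Individual studies may add further rules (for example, a minimum reciprocal overlap) that this sketch omits.

```python
def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNV regions.

    cnvs -- iterable of (chrom, start, end) with half-open coordinates.
    Returns merged (chrom, start, end) regions sorted by chromosome name and
    start; bookended calls (end == next start) are kept separate.
    """
    regions = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            # Overlaps the current region on the same chromosome: extend it.
            _, region_start, region_end = regions[-1]
            regions[-1] = (chrom, region_start, max(region_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


calls = [("chr1", 100, 500), ("chr1", 400, 900), ("chr2", 50, 80), ("chr1", 1000, 1200)]
print(merge_cnvs(calls))
# [('chr1', 100, 900), ('chr1', 1000, 1200), ('chr2', 50, 80)]
```

Sorting first lets a single linear pass do the merge, so the cost is dominated by the O(n log n) sort.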
Affiliation(s)
- Saeed S Sohrabi: Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran; Young Researchers Society, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
- Mohammadreza Mohammadabadi: Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
- Dong-Dong Wu: State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China
- Ali Esmailizadeh: Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran; State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
148
Cao Y, Jin Y, Yu J, Wang J, Yan J, Zhao Q. Research progress of neuroblastoma related gene variations. Oncotarget 2017; 8:18444-18455. [PMID: 28055978 PMCID: PMC5392342 DOI: 10.18632/oncotarget.14408] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/12/2016] [Accepted: 12/27/2016] [Indexed: 01/08/2023]
Abstract
Neuroblastoma, the most common extracranial solid tumor among children, is an embryonal tumor originating from undifferentiated neural crest cells. Neuroblastomas are highly heterogeneous, as reflected by the wide range of clinical presentations and likelihood of cure, ranging from spontaneous regression to relentless progression despite rigorous multimodal treatments. Approximately 50% of cases are high-risk, with overall survival rates below 40%. Through efforts to collect large numbers of clinically annotated specimens and through technological advances, researchers have revealed numerous genetic alterations that may drive tumor growth. However, most neuroblastomas lack mutations in recurrently mutated genes, which has led researchers to look for disrupted pathways instead of single mutated genes to unearth the biological systems perturbed in neuroblastoma. Stratification of patients and targeted therapy based on their molecular signatures have been the center of focus. This review provides a comprehensive summary of recent advances in the identification of candidate gene variations and targeted approaches to high-risk neuroblastoma, and evaluates the methods used for their detection, which will provide new avenues for developing therapies and for further genetic research.
Collapse
Affiliation(s)
- Yanna Cao
  - Department of Pediatric Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
- Yan Jin
  - Department of Pediatric Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
- Jinpu Yu
  - Department of Cancer Molecular Diagnostic Center, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
- Jingfu Wang
  - Department of Pediatric Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
- Jie Yan
  - Department of Pediatric Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
- Qiang Zhao
  - Department of Pediatric Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy of Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin, P.R. China
149
XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments. BMC Genomics 2017; 18:747. [PMID: 28934930 PMCID: PMC5609061 DOI: 10.1186/s12864-017-4137-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 08/21/2017] [Accepted: 09/11/2017] [Indexed: 11/10/2022]
Abstract
Background: We developed a novel software package, XCAVATOR, for identifying genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments. Results: Using simulated and real datasets, we showed that our tool, which is based on a read-count approach, can predict the boundaries and the absolute DNA copy number of CNVs/CNAs at high resolution. To demonstrate the power of our software, we applied it to Illumina and Pacific Biosciences data and compared its performance with ten other state-of-the-art tools. Conclusion: All of our analyses demonstrate that XCAVATOR detects germline and somatic CNVs/CNAs, outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4137-0) contains supplementary material, which is available to authorized users.
150
Gao J, Wang H, Zang W, Li B, Rao G, Li L, Yu Y, Li Z, Dong B, Lu Z, Jiang Z, Shen L. Circulating tumor DNA functions as an alternative for tissue to overcome tumor heterogeneity in advanced gastric cancer. Cancer Sci 2017; 108:1881-1887. [PMID: 28677165 PMCID: PMC5581520 DOI: 10.1111/cas.13314] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/12/2017] [Revised: 06/28/2017] [Accepted: 07/02/2017] [Indexed: 12/11/2022]
Abstract
Overcoming tumor heterogeneity is a major challenge for personalized treatment of gastric cancer, especially for human epidermal growth factor receptor 2 (HER2)-targeted therapy. In lung and breast cancer, analysis of circulating tumor DNA (ctDNA) allows a more comprehensive assessment of tumor heterogeneity than traditional biopsies, but little is known about its utility in gastric cancer. We assessed the mutation profiles of ctDNA and primary tumors from 30 patients with advanced gastric cancer. We then performed a comprehensive analysis of tumor mutations using multiple biopsies from five patients. Finally, we analyzed the concordance of HER2 amplification between ctDNA and paired tumor tissues in 70 patients. Compared with a single tumor sample, ctDNA showed low concordance in mutation profile: only approximately 50% (138/275) of somatic mutations were found in the paired tissue samples. However, when compared with multiple biopsies, most mutations detected in ctDNA were also present in the paired tumor tissues. ctDNA showed high concordance with tumor tissue for HER2 amplification (91.4%, Kappa index = 0.784, P < 0.001), suggesting that it might serve as an alternative to tissue. These findings imply that ctDNA-based assessment could partially overcome tumor heterogeneity and might serve as a surrogate for HER2 analysis in gastric cancer.
Affiliation(s)
- Jing Gao
  - Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Haixing Wang
  - Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Wanchun Zang
  - Novogene Bioinformatics Institute, Beijing, China
- Beifang Li
  - Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Guanhua Rao
  - Novogene Bioinformatics Institute, Beijing, China
- Lei Li
  - Novogene Bioinformatics Institute, Beijing, China
- Yang Yu
  - Novogene Bioinformatics Institute, Beijing, China
- Zhongwu Li
  - Department of Pathology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Bin Dong
  - Department of Pathology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Zhihao Lu
  - Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China
- Zhi Jiang
  - Novogene Bioinformatics Institute, Beijing, China
- Lin Shen
  - Department of Gastrointestinal Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, Beijing, China