201
|
Liu L, Huang J, Wang K, Li L, Li Y, Yuan J, Wei S. Identification of hallmarks of lung adenocarcinoma prognosis using whole genome sequencing. Oncotarget 2016; 6:38016-28. [PMID: 26497366 PMCID: PMC4741981 DOI: 10.18632/oncotarget.5697] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/25/2015] [Accepted: 09/30/2015] [Indexed: 11/25/2022] Open
Abstract
In conjunction with clinical characteristics, prognostic biomarkers are essential for choosing optimal therapies to lower the mortality of lung adenocarcinoma. Whole genome sequencing (WGS) of 7 cancerous-noncancerous tissue pairs was performed to explore the comparative copy number variations (CNVs) associated with lung adenocarcinoma. The frequencies of the top-ranked CNVs were verified in an independent set of 114 patients, and the roles of target CNVs in disease prognosis were then assessed in 313 patients. The WGS yielded 2604 CNVs. After frequency validation and biological function screening of the top 10 CNVs, 9 mutant driver genes from 7 CNVs were further analyzed for an association with survival. Compared with carriers of an amplified PBXIP1 copy number, unamplified carriers had a 0.62-fold (95% CI = 0.43–0.91) decreased risk of death. Compared with those with an amplified TERT, those with an unamplified TERT had a 35% reduction (95% CI = 3%–56%) in the risk of lung adenocarcinoma progression. Cases with both PBXIP1 and TERT unamplified had a median 34.32-month extension of overall survival and a 34.55-month delay in disease progression compared with cases in which both were amplified. This study demonstrates that CNVs of TERT and PBXIP1 have the potential to translate into the clinic and be used to improve outcomes for patients with this fatal disease.
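The two effect sizes above are reported on different scales: a ratio for PBXIP1 and a percentage reduction for TERT. The sketch below puts them on a common scale, assuming (this is not stated in the abstract) that the "0.62-fold" figure is a hazard-ratio-like quantity, so that a ratio r corresponds to a (1 − r) × 100% reduction. The function names are illustrative.

```python
def percent_reduction(ratio: float) -> float:
    """Convert a risk ratio below 1 into a percentage risk reduction."""
    return (1.0 - ratio) * 100.0


def ratio_from_reduction(percent: float) -> float:
    """Convert a percentage risk reduction back into a risk ratio."""
    return 1.0 - percent / 100.0


# PBXIP1: ratio 0.62 with 95% CI 0.43-0.91 (assumed to be hazard-ratio-like).
pbxip1_reduction = [round(percent_reduction(r)) for r in (0.62, 0.43, 0.91)]

# TERT: 35% reduction with 95% CI 3%-56%.
tert_ratio = [round(ratio_from_reduction(p), 2) for p in (35, 3, 56)]

print(pbxip1_reduction)  # [38, 57, 9]
print(tert_ratio)        # [0.65, 0.97, 0.44]
```

Note that the interval bounds swap order under the conversion: a 3%–56% reduction corresponds to a ratio between 0.44 and 0.97.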
Affiliation(s)
- Li Liu
- Department of Epidemiology and Biostatistics, and the Ministry of Education Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
- Jiao Huang
- Department of Epidemiology and Biostatistics, and the Ministry of Education Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
- Ke Wang
- Department of Epidemiology and Biostatistics, and the Ministry of Education Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
- Li Li
- Department of Epidemiology and Biostatistics, and the Ministry of Education Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
- Yangkai Li
- Department of Thoracic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
- Jingsong Yuan
- Department of Radiation Oncology, Center for Radiological Research, Columbia University Medical Center, New York, NY, USA
- Sheng Wei
- Department of Epidemiology and Biostatistics, and the Ministry of Education Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, PR China
|
202
|
Zhang C, Cai H, Huang J, Song Y. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data. BMC Bioinformatics 2016; 17:384. [PMID: 27639558 PMCID: PMC5027123 DOI: 10.1186/s12859-016-1239-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/03/2016] [Accepted: 09/04/2016] [Indexed: 02/02/2023] Open
Abstract
Background: Variations in DNA copy number contribute substantially to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires amplification of the whole genome of a single cell to accumulate enough material for sequencing. However, the amplification process inevitably introduces amplification bias, leaving a portion of the sequencing data over-dispersed. A recent study has shown that the over-dispersed portion of single-cell sequencing data can be modelled well by negative binomial distributions. Results: We developed a read-depth-based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Conclusions: Extensive experiments comparing nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1239-7) contains supplementary material, which is available to authorized users.
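To make the idea of a negative binomial fit under sparsity and smoothness constraints concrete, the following is a minimal sketch of one such objective. It is not the nbCNV formulation: the parameterisation (a log2 copy ratio per bin), the L1 form of both penalties, and every name and number below are assumptions chosen for illustration only.

```python
import math


def nb_neg_log_likelihood(count, mean, size):
    """Negative log-likelihood of one read count under a negative binomial
    distribution with the given mean and dispersion ("size") parameter."""
    log_p = (
        math.lgamma(count + size) - math.lgamma(size) - math.lgamma(count + 1)
        + size * math.log(size / (size + mean))
        + count * math.log(mean / (size + mean))
    )
    return -log_p


def penalized_objective(reads, log2_ratio, baseline, size, lam_sparse, lam_smooth):
    """Data fit plus a sparsity penalty (most bins are copy-neutral) and a
    smoothness penalty (neighbouring bins tend to share a copy number)."""
    fit = sum(
        nb_neg_log_likelihood(k, baseline * 2.0 ** x, size)
        for k, x in zip(reads, log2_ratio)
    )
    sparsity = sum(abs(x) for x in log2_ratio)
    smoothness = sum(abs(b - a) for a, b in zip(log2_ratio, log2_ratio[1:]))
    return fit + lam_sparse * sparsity + lam_smooth * smoothness


# Hypothetical read counts per bin: a gain spanning bins 3-5.
reads = [100, 95, 105, 200, 210, 190, 100, 98]
flat = [0.0] * len(reads)
segment = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

score_flat = penalized_objective(reads, flat, 100.0, 10.0, 0.1, 0.1)
score_segment = penalized_objective(reads, segment, 100.0, 10.0, 0.1, 0.1)
print(score_segment < score_flat)
```

With these illustrative penalties the piecewise-constant profile scores lower than the all-neutral profile, because the improvement in likelihood over the three amplified bins outweighs the cost of three non-zero bins and two breakpoints.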
Affiliation(s)
- Changsheng Zhang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
- Hongmin Cai
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China.
- Jingying Huang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
- Yan Song
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
|
203
|
Liang Y, Qiu K, Liao B, Zhu W, Huang X, Li L, Chen X, Li K. Seeksv: an accurate tool for somatic structural variation and virus integration detection. Bioinformatics 2016; 33:184-191. [DOI: 10.1093/bioinformatics/btw591] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 01/21/2016] [Revised: 07/29/2016] [Accepted: 09/06/2016] [Indexed: 01/31/2023] Open
|
204
|
Lih CJ, Si H, Das B, Harrington RD, Harper KN, Sims DJ, McGregor PM, Camalier CE, Kayserian AY, Williams PM, He HJ, Almeida JL, Lund SP, Choquette S, Cole KD. Certified DNA Reference Materials to Compare HER2 Gene Amplification Measurements Using Next-Generation Sequencing Methods. J Mol Diagn 2016; 18:753-761. [PMID: 27455875 PMCID: PMC5397679 DOI: 10.1016/j.jmoldx.2016.05.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 03/14/2016] [Revised: 05/12/2016] [Accepted: 05/27/2016] [Indexed: 01/29/2023] Open
Abstract
The National Institute of Standards and Technology (NIST) Standard Reference Materials 2373 is a set of genomic DNA samples prepared from five breast cancer cell lines with certified values for the ratio of the HER2 gene copy number to the copy numbers of reference genes determined by real-time quantitative PCR and digital PCR. Targeted-amplicon, whole-exome, and whole-genome sequencing measurements were used with the reference material to compare the performance of both the laboratory steps and the bioinformatic approaches of the different methods using a range of amplification ratios. Although good reproducibility was observed in each next-generation sequencing method, slightly different HER2 copy numbers associated with platform-specific biases were obtained. This study clearly demonstrates the value of Standard Reference Materials 2373 as reference material and as a calibrator for evaluating assay performance as well as for increasing confidence in reporting HER2 amplification for clinical applications.
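The quantity being certified and compared here is a ratio: HER2 copy number relative to reference genes. A minimal sketch of how a sequencing-derived ratio might be computed from read depths and checked against a certified value follows. The depths, the certified value of 2.9, and the function names are all hypothetical and are not taken from the study or from the reference material certificate.

```python
from statistics import fmean


def copy_number_ratio(target_depth, reference_depths):
    """Ratio of the target gene's read depth to the mean depth of the reference genes."""
    return target_depth / fmean(reference_depths)


def relative_bias(measured, certified):
    """Signed deviation of a measured ratio from a certified ratio, as a fraction."""
    return (measured - certified) / certified


# Hypothetical mean read depths for one sample on one platform.
ratio = copy_number_ratio(300.0, [98.0, 102.0, 100.0])

# Hypothetical certified ratio for the same sample.
bias = relative_bias(ratio, 2.9)

print(ratio)            # 3.0
print(round(bias, 3))   # 0.034
```

Computing the same bias per platform is one simple way to express the "slightly different HER2 copy numbers associated with platform-specific biases" described above.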
Affiliation(s)
- Chih-Jian Lih
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Han Si
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Biswajit Das
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Robin D Harrington
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Kneshay N Harper
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- David J Sims
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Paul M McGregor
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Corinne E Camalier
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Andrew Y Kayserian
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- P Mickey Williams
- Molecular Characterization and Clinical Assay Development Laboratory, Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Hua-Jun He
- Division of Biosystems and Biomaterials, National Institute of Standards and Technology, Gaithersburg, Maryland
- Jamie L Almeida
- Division of Biosystems and Biomaterials, National Institute of Standards and Technology, Gaithersburg, Maryland
- Steve P Lund
- Division of Statistical Engineering, National Institute of Standards and Technology, Gaithersburg, Maryland
- Steve Choquette
- Division of Biosystems and Biomaterials, National Institute of Standards and Technology, Gaithersburg, Maryland
- Kenneth D Cole
- Division of Biosystems and Biomaterials, National Institute of Standards and Technology, Gaithersburg, Maryland.
|
205
|
Choi JW, Chung WH, Lim KS, Lim WJ, Choi BH, Lee SH, Kim HC, Lee SS, Cho ES, Lee KT, Kim N, Kim JD, Kim JB, Chai HH, Cho YM, Kim TH, Lim D. Copy number variations in Hanwoo and Yanbian cattle genomes using the massively parallel sequencing data. Gene 2016; 589:36-42. [PMID: 27188257 DOI: 10.1016/j.gene.2016.05.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/04/2015] [Revised: 04/28/2016] [Accepted: 05/12/2016] [Indexed: 11/29/2022]
Abstract
Hanwoo is an indigenous Korean beef cattle breed that, until the last century, shared an ancestor with Yanbian cattle, which are found in the northeastern provinces of China. During recent decades, the two breeds have experienced different selection pressures. Here, we present genome-wide copy number variations (CNVs) identified by comparing Hanwoo and Yanbian cattle sequencing data. We used ~3.12 and ~3.07 billion sequence reads from Hanwoo and Yanbian cattle, respectively. A total of 901 putative CNV regions (CNVRs) were identified throughout the genome, representing 5,513,340 bp. This is a smaller number than has been reported in previous studies, indicating that Hanwoo are genetically close to Yanbian cattle. Of the CNVRs, 53.2% and 46.8% were found to be gains and losses in Hanwoo, respectively. Potential functional roles of each CNVR were assessed by annotating all CNVRs and by gene ontology (GO) enrichment analysis. We found that 278 CNVRs overlapped with cattle gene-sets (genic-CNVRs) that could be promising candidates to account for economically important traits in cattle. The enrichment analysis indicated that genes were significantly over-represented in GO terms including developmental process, multicellular organismal process, reproduction, and response to stimulus. These results provide a valuable genomic resource for determining how CNVs are associated with cattle traits.
Affiliation(s)
- Jung-Woo Choi
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea; College of Animal Life Science, Kangwon National University, Chuncheon 24341, Republic of Korea
- Won-Hyong Chung
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Kyu-Sang Lim
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Won-Jun Lim
- Personalized Genomic Medicine Research Center, Division of Strategic Research Groups, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon 34141, Republic of Korea
- Bong-Hwan Choi
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Seung-Hwan Lee
- Division of Animal and Dairy Science, Chung Nam National University, Daejeon 305-764, Republic of Korea
- Hyeong-Cheol Kim
- Hanwoo Experiment Station, National Institute of Animal Science, RDA, Pyeongchang 232-950, Korea
- Seung-Soo Lee
- Animal Genetic and Breeding Division, National Institute of Animal Science, Cheon-An 331-808, Korea
- Eun-Seok Cho
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Kyung-Tai Lee
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Namshin Kim
- Personalized Genomic Medicine Research Center, Division of Strategic Research Groups, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon 34141, Republic of Korea
- Jeong-Dae Kim
- College of Animal Life Science, Kangwon National University, Chuncheon 24341, Republic of Korea
- Jong-Bok Kim
- College of Animal Life Science, Kangwon National University, Chuncheon 24341, Republic of Korea
- Han-Ha Chai
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Yong-Min Cho
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Tae-Hun Kim
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea
- Dajeong Lim
- Division of Animal Genomics & Bioinformatics, National Institute of Animal Science, RDA, Jeonju 565-851, Republic of Korea.
|
206
|
Implementation of next-generation sequencing for molecular diagnosis of hereditary breast and ovarian cancer highlights its genetic heterogeneity. Breast Cancer Res Treat 2016; 159:245-56. [PMID: 27553368 DOI: 10.1007/s10549-016-3948-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/06/2016] [Accepted: 08/16/2016] [Indexed: 01/13/2023]
Abstract
Molecular diagnosis of hereditary breast and ovarian cancer (HBOC) by standard methodologies has been limited to the BRCA1 and BRCA2 genes. With the recent development of new sequencing methodologies, the speed and efficiency of DNA testing have dramatically improved. The aim of this work was to validate the use of next-generation sequencing (NGS) for the detection of BRCA1/BRCA2 point mutations in a diagnostic setting and to study the role of other genes associated with HBOC in Portuguese families. A cohort of 94 high-risk families was included in the study, and they were initially screened for the two common founder mutations with variant-specific methods. Fourteen index patients were shown to carry the Portuguese founder mutation BRCA2 c.156_157insAlu, and the remaining 80 were analyzed in parallel by Sanger sequencing for the BRCA1/BRCA2 genes and by NGS for a panel of 17 genes that have been described as involved in predisposition to breast and/or ovarian cancer. A total of 506 variants in the BRCA1/BRCA2 genes were detected by both methodologies, with a 100 % concordance between them. This strategy allowed the detection of a total of 39 deleterious mutations in the 94 index patients, namely 10 in BRCA1 (25.6 %), 21 in BRCA2 (53.8 %), four in PALB2 (10.3 %), two in ATM (5.1 %), one in CHEK2 (2.6 %), and one in TP53 (2.6 %), with 20.5 % of the deleterious mutations being found in genes other than BRCA1/BRCA2. These results demonstrate the efficiency of NGS for the detection of BRCA1/BRCA2 point mutations and highlight the genetic heterogeneity of HBOC.
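The per-gene percentages quoted above can be reproduced from the counts in the abstract. The short check below uses only those counts; the variable names are illustrative.

```python
# Deleterious mutations per gene, as listed in the abstract.
counts = {"BRCA1": 10, "BRCA2": 21, "PALB2": 4, "ATM": 2, "CHEK2": 1, "TP53": 1}

total = sum(counts.values())

# Share of each gene among all deleterious mutations, in percent.
shares = {gene: round(100 * n / total, 1) for gene, n in counts.items()}

# Share of mutations found outside BRCA1/BRCA2.
non_brca = sum(n for gene, n in counts.items() if gene not in ("BRCA1", "BRCA2"))
non_brca_share = round(100 * non_brca / total, 1)

print(total)           # 39
print(shares)
print(non_brca_share)  # 20.5
```

The counts sum to the stated 39 mutations, and the 8 mutations in PALB2, ATM, CHEK2 and TP53 account for the stated 20.5%.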
|
207
|
Hirakawa H, Kaur P, Shirasawa K, Nichols P, Nagano S, Appels R, Erskine W, Isobe SN. Draft genome sequence of subterranean clover, a reference for genus Trifolium. Sci Rep 2016; 6:30358. [PMID: 27545089 PMCID: PMC4992838 DOI: 10.1038/srep30358] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 03/21/2016] [Accepted: 07/04/2016] [Indexed: 11/26/2022] Open
Abstract
Clovers (genus Trifolium) are widely cultivated across the world as forage legumes and make a large contribution to livestock feed production and soil improvement. Subterranean clover (T. subterraneum L.) is well suited for genomic and genetic studies as a reference species in the Trifolium genus, because it is an annual with a simple genome structure (autogamous and diploid), unlike the other economically important perennial forage clovers, red clover (T. pratense) and white clover (T. repens). This report represents the first draft genome sequence of subterranean clover. The 471.8 Mb assembled sequence covers 85.4% of the subterranean clover genome and contains 42,706 genes. Eight pseudomolecules of 401.1 Mb in length were constructed, based on a linkage map consisting of 35,341 SNPs. The comparative genomic analysis revealed that different clover chromosomes showed different degrees of conservation with other Papilionoideae species. These results provide a reference for genetic and genomic analyses in the genus Trifolium and new insights into evolutionary divergence in Papilionoideae species.
Affiliation(s)
- Hideki Hirakawa
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba 292-0818, Japan
- Parwinder Kaur
- Centre for Plant Genetics and Breeding, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
- Kenta Shirasawa
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba 292-0818, Japan
- Phillip Nichols
- Department of Agriculture and Food Western Australia, 3 Baron-Hay Court, South Perth, WA 6151, Australia; School of Plant Biology, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
- Soichiro Nagano
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba 292-0818, Japan
- Rudi Appels
- Murdoch University, 90 South Street, Murdoch, WA 6150, Australia
- William Erskine
- Centre for Plant Genetics and Breeding, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
- Sachiko N Isobe
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba 292-0818, Japan
|
208
|
Malekpour SA, Pezeshk H, Sadeghi M. MGP-HMM: Detecting genome-wide CNVs using an HMM for modeling mate pair insertion sizes and read counts. Math Biosci 2016; 279:53-62. [PMID: 27424951 DOI: 10.1016/j.mbs.2016.07.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/26/2016] [Revised: 06/12/2016] [Accepted: 07/10/2016] [Indexed: 01/02/2023]
Abstract
MOTIVATION: The association of copy number variation (CNV) with schizophrenia, autism, developmental disabilities and fatal diseases such as cancer has been verified. Recent developments in next-generation sequencing (NGS) have facilitated CNV studies. However, many current CNV detection tools cannot discriminate tandem duplications from non-tandem duplications. RESULTS: In this study, we propose MGP-HMM, a tool that, besides detecting genome-wide deletions, discriminates tandem duplications from non-tandem duplications. MGP-HMM takes mate pair abnormalities into account and predicts the digitized number of tandem or non-tandem copies. Abnormalities in mate pair directions and insertion sizes, after mapping to the reference genome, are elucidated using a Hidden Markov Model (HMM). For this purpose, a Gaussian mixture density with time-dependent parameters is used to emit mate pair insertion sizes from the HMM states. Depending on the observed abnormalities in mate pair insertion size or orientation, each component of the mixture density has different parameters. MGP-HMM also applies a Poisson distribution to model read depth data. This parametric modeling of the mate pair reads makes it possible to estimate the length of CNVs precisely, which is an advantage over methods that rely only on the read depth approach for CNV detection. The hidden state of the proposed HMM is the digitized copy number of a genomic segment, and the states correspond to the multipliers of the Gaussian mixture components. The accuracy of the model is validated on real and simulated next-generation sequencing data and is compared with that of other tools.
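As an illustration of the read-depth half of such a model, the sketch below decodes a copy number path with the Viterbi algorithm, using copy number as the hidden state and Poisson-distributed read counts as emissions. It deliberately omits the mate pair insertion-size emissions that distinguish MGP-HMM, and it is not the authors' implementation: the transition scheme, the rate floor for copy number 0, and all parameter values are assumptions.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(read_counts, depth_per_copy, states=(0, 1, 2, 3, 4), stay_prob=0.95):
    """Most likely copy number per bin, given read counts per bin."""
    n = len(states)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def emit(k, copies):
        # Floor the rate so that copy number 0 still tolerates stray reads.
        return poisson_logpmf(k, max(copies * depth_per_copy, 0.1))

    def step(i, j):
        return log_stay if i == j else log_switch

    score = [math.log(1.0 / n) + emit(read_counts[0], c) for c in states]
    back = []
    for k in read_counts[1:]:
        new_score, pointers = [], []
        for j, c in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + step(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + step(best_i, j) + emit(k, c))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


# Hypothetical read counts per bin; about 25 reads per copy, so diploid bins sit near 50.
counts = [48, 52, 50, 101, 97, 103, 49, 51, 2, 0, 1]
print(viterbi_copy_number(counts, depth_per_copy=25.0))
# [2, 2, 2, 4, 4, 4, 2, 2, 0, 0, 0]
```

A read-depth-only decoder like this one can flag that a segment is duplicated but cannot say whether the extra copies are tandem, which is the gap the mate pair emissions are described as filling.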
Affiliation(s)
- Seyed Amir Malekpour
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran.
|
209
|
Hibsh D, Buetow KH, Yaari G, Efroni S. Quantification of read species behavior within whole genome sequencing of cancer genomes for the stratification and visualization of genomic variation. Nucleic Acids Res 2016; 44:e81. [PMID: 26809676 PMCID: PMC4872078 DOI: 10.1093/nar/gkw031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2015] [Accepted: 01/11/2016] [Indexed: 11/13/2022] Open
Abstract
The cancer genome is an abnormal genome, and the ability to monitor its sequence has undergone a technological revolution. Yet prognosis and diagnosis remain expert-based decisions, with only limited abilities to provide machine-based decisions. We introduce a heterogeneity-based method for stratifying and visualizing whole-genome sequencing (WGS) reads. This method uses the heterogeneity within WGS reads to markedly reduce the dimensionality of next-generation sequencing data; it is available through the tool HiBS (Heterogeneity-Based Subclassification), which allows cancer sample classification. We validated HiBS using >200 WGS samples from nine different cancer types from The Cancer Genome Atlas (TCGA). With HiBS, we show progress on two WGS-related issues: (i) differentiation between normal (NB) and tumor (TP) samples based solely on the information structure of their WGS data, and (ii) identification of specific regions of chromosomal amplification/deletion and their association with tumor stage. By comparing results to those obtained through available WGS analysis tools, we demonstrate some of the novelties obtained by the approach implemented in HiBS and also show nearly perfect normal/tumor classification, used to identify known and unknown chromosomal aberrations. Finally, the HiBS index has been associated with breast cancer tumor stage.
Affiliation(s)
- Dror Hibsh
- Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
- Kenneth H Buetow
- Computational Sciences and Informatics Program, Complex Adaptive Systems Initiative, Arizona State University, Tempe AZ 85281, USA
- Gur Yaari
- Faculty of Engineering, Bar-Ilan University, Ramat Gan 52900, Israel
- Sol Efroni
- Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
|
210
|
Kruglyak KM, Lin E, Ong FS. Next-Generation Sequencing and Applications to the Diagnosis and Treatment of Lung Cancer. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 890:123-36. [PMID: 26703802 DOI: 10.1007/978-3-319-24932-2_7] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Abstract
Cancer is a genetic disease characterized by uncontrolled growth of abnormal cells. Over time, somatic mutations accumulate in the cells of an individual due to replication errors, chromosome segregation errors, or DNA damage. When not caught by traditional mechanisms, these somatic mutations can lead to cellular proliferation, the hallmark of cancer. Lung cancer is the leading cause of cancer-related mortality in the United States, accounting for approximately 160,000 deaths annually. Five-year survival rates for lung cancer remain low (<50 %) for all stages, with an even worse prognosis (<15 %) in late-stage cases. Technological advances, including advances in next-generation sequencing (NGS), offer the vision of personalized medicine or precision oncology, wherein an individual's treatment can be based on his or her individual molecular profile rather than on historical population-based medicine. Towards this end, NGS has already been used to identify new biomarker candidates for the early diagnosis of lung cancer and is increasingly used to guide personalized treatment decisions. In this review we provide a high-level overview of NGS technology and summarize its application to the diagnosis and treatment of lung cancer. We also describe how NGS can drive advances that bring us closer to precision oncology and discuss some of the technical challenges that must be overcome to realize this ultimate goal.
Affiliation(s)
- Erick Lin
- Medical Affairs, Ambry Genetics, Inc., Aliso Viejo, CA, USA
- Frank S Ong
- Medical Affairs and Clinical Development, NantHealth, LLC, Culver City, CA, USA.
|
211
|
Roquis D, Rognon A, Chaparro C, Boissier J, Arancibia N, Cosseau C, Parrinello H, Grunau C. Frequency and mitotic heritability of epimutations in Schistosoma mansoni. Mol Ecol 2016; 25:1741-58. [DOI: 10.1111/mec.13555] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/28/2015] [Revised: 01/22/2016] [Accepted: 01/23/2016] [Indexed: 12/28/2022]
Affiliation(s)
- David Roquis
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Anne Rognon
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Cristian Chaparro
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Jerome Boissier
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Nathalie Arancibia
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Celine Cosseau
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
- Hugues Parrinello
- MGX - Montpellier GenomiX IBiSA, Institut de Génomique Fonctionnelle; 141, rue de la Cardonille F-34094 Montpellier Cedex 05 France
- Christoph Grunau
- Université de Perpignan Via Domitia; Perpignan F-66860 France
- CNRS; UMR 5244; Interactions Hôtes-Pathogènes-Environnements (IHPE); Perpignan F-66860 France
|
212
|
Beghain J, Langlois AC, Legrand E, Grange L, Khim N, Witkowski B, Duru V, Ma L, Bouchier C, Ménard D, Paul RE, Ariey F. Plasmodium copy number variation scan: gene copy numbers evaluation in haploid genomes. Malar J 2016; 15:206. [PMID: 27066902 PMCID: PMC4828863 DOI: 10.1186/s12936-016-1258-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 12/08/2015] [Accepted: 03/31/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In eukaryotic genomes, deletions or amplifications have been estimated to be a thousand times more frequent than single nucleotide variations. In Plasmodium falciparum, relatively few transcription factors have been identified, and the regulation of transcription is seemingly largely influenced by gene amplification events. Thus copy number variation (CNV) is a major mechanism enabling parasite genomes to adapt to new environmental changes. METHODS: Currently, the detection of CNVs is based on quantitative PCR (qPCR), which is significantly limited by the relatively small number of genes that can be analysed at any one time. Technological advances that facilitate whole-genome sequencing, such as next generation sequencing (NGS), enable deeper analyses of genomic variation to be performed. Because the characteristics of Plasmodium CNVs need special consideration in algorithms and strategies, for which classical CNV detection programs are not suited, a dedicated algorithm to detect CNVs across the entire exome of P. falciparum was developed. This algorithm, called PlasmoCNVScan, is based on a custom read depth strategy applied to NGS data. RESULTS: The analysis of CNV identification on three genes known to have different levels of amplification, and located in the nuclear, apicoplast or mitochondrial genomes, is presented. The results are correlated with the qPCR experiments usually used for identification of locus-specific amplification/deletion. CONCLUSIONS: This tool will facilitate the study of P. falciparum genomic adaptation in response to ecological changes: drug pressure, decreased transmission, and reduction of the parasite population size (transition to pre-elimination endemic area).
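In a haploid genome, a read depth strategy can in principle be as simple as comparing each gene's depth with a genome-wide single-copy baseline. The sketch below illustrates that general idea only; it is not the PlasmoCNVScan algorithm, and the gene labels, depths, and the choice of the median as baseline are assumptions.

```python
from statistics import median


def haploid_copy_numbers(gene_depths):
    """Estimate integer copy numbers in a haploid genome by dividing each
    gene's mean read depth by the median depth across all genes, which is
    taken here as the single-copy baseline."""
    baseline = median(gene_depths.values())
    return {gene: round(depth / baseline) for gene, depth in gene_depths.items()}


# Hypothetical mean read depths per gene for one sample.
depths = {"geneA": 40.0, "geneB": 42.0, "geneC": 38.0, "geneD": 121.0, "geneE": 41.0}

print(haploid_copy_numbers(depths))
# {'geneA': 1, 'geneB': 1, 'geneC': 1, 'geneD': 3, 'geneE': 1}
```

A single baseline of this kind would not suit genes on the apicoplast or mitochondrial genomes, whose depth per copy can differ from that of nuclear genes, which is one reason a dedicated strategy may be needed.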
Affiliation(s)
- Johann Beghain
- Institut Pasteur, Génome et Génomique des Insectes Vecteurs, Paris, France.
| | - Anne-Claire Langlois
- Institut Pasteur du Cambodge, Epidémiologie Moléculaire du Paludisme, Phnom Penh, Cambodia
| | - Eric Legrand
- Institut Pasteur, Génome et Génomique des Insectes Vecteurs, Paris, France
| | - Laura Grange
- Institut Pasteur, Génétique Fonctionnelle des Maladies Infectieuses, Paris, France
| | - Nimol Khim
- Institut Pasteur du Cambodge, Epidémiologie Moléculaire du Paludisme, Phnom Penh, Cambodia
| | - Benoit Witkowski
- Institut Pasteur du Cambodge, Epidémiologie Moléculaire du Paludisme, Phnom Penh, Cambodia
| | - Valentine Duru
- Institut Pasteur du Cambodge, Epidémiologie Moléculaire du Paludisme, Phnom Penh, Cambodia
| | - Laurence Ma
- Institut Pasteur, Plate Forme Génomique, Paris, France
| | | | - Didier Ménard
- Institut Pasteur du Cambodge, Epidémiologie Moléculaire du Paludisme, Phnom Penh, Cambodia
| | - Richard E Paul
- Institut Pasteur, Génétique Fonctionnelle des Maladies Infectieuses, Paris, France
| | - Frédéric Ariey
- INSERM U 1016, Institut Cochin, Université Paris Descartes Sorbonne Paris Cité, Faculté de Médecine, Paris, France
|
213
|
Yang S, Fang Z. Beta approximation of ratio distribution and its application to next generation sequencing read counts. J Appl Stat 2016; 44:57-70. [PMID: 29456282 DOI: 10.1080/02664763.2016.1158798] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
Abstract
Paired sequencing data are commonly collected in genomic studies to control biological variation. However, existing data processing strategies suffer at low coverage regions, which are unavoidable due to the limitation of current sequencing technology. Furthermore, information contained in the absolute values of the read counts is commonly ignored. We propose a read count ratio processing/modification method, to not only incorporate information contained in the absolute values of paired counts into one variable, but also mitigate the discrete artifact, especially when both counts are small. Simulation shows that the processed variable fits well with a Beta distribution, thus providing an easy tool for down-stream inference analysis.
Affiliation(s)
- Shengping Yang
- Department of Pathology, School of Medicine, Texas Tech University Health Science Center, Lubbock, Texas, USA
| | - Zhide Fang
- Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, Louisiana, USA
|
214
|
Zhu X, Li J, Ru T, Wang Y, Xu Y, Yang Y, Wu X, Cram DS, Hu Y. Identification of copy number variations associated with congenital heart disease by chromosomal microarray analysis and next-generation sequencing. Prenat Diagn 2016; 36:321-7. [PMID: 26833920 DOI: 10.1002/pd.4782] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 12/26/2015] [Revised: 01/23/2016] [Accepted: 01/28/2016] [Indexed: 12/25/2022]
Abstract
OBJECTIVE To determine the type and frequency of pathogenic chromosomal abnormalities in fetuses diagnosed with congenital heart disease (CHD) using chromosomal microarray analysis (CMA) and validate next-generation sequencing as an alternative diagnostic method. METHOD Chromosomal aneuploidies and submicroscopic copy number variations (CNVs) were identified in amniocytes DNA samples from CHD fetuses using high-resolution CMA and copy number variation sequencing (CNV-Seq). RESULT Overall, 21 of 115 CHD fetuses (18.3%) referred for CMA had a pathogenic chromosomal anomaly. In six of 73 fetuses (8.2%) with an isolated CHD, CMA identified two cases of DiGeorge syndrome, and one case each of 1q21.1 microdeletion, 16p11.2 microdeletion and Angelman/Prader Willi syndromes, and 22q11.21 microduplication syndrome. In 12 of 42 fetuses (28.6%) with CHD and additional structural abnormalities, CMA identified eight whole or partial trisomies (19.0%), five CNVs (11.9%) associated with DiGeorge, Wolf-Hirschhorn, Miller-Dieker, Cri du Chat and Blepharophimosis, Ptosis, and Epicanthus Inversus syndromes and four other rare pathogenic CNVs (9.5%). Overall, there was a 100% diagnostic concordance between CMA and CNV-Seq for detecting all 21 pathogenic chromosomal abnormalities associated with CHD. CONCLUSION CMA and CNV-Seq are reliable and accurate prenatal techniques for identifying pathogenic fetal chromosomal abnormalities associated with cardiac defects. © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Xiangyu Zhu
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Jie Li
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Tong Ru
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Yaping Wang
- Department of Medical Genetics of Nanjing University Medical School, Nanjing, China
| | - Yan Xu
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Ying Yang
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | - Xing Wu
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
| | | | - Yali Hu
- Department of Obstetrics and Gynecology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
|
215
|
Camiolo S, Sablok G, Porceddu A. Altools: a user friendly NGS data analyser. Biol Direct 2016; 11:8. [PMID: 26883204 PMCID: PMC4756442 DOI: 10.1186/s13062-016-0110-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/22/2015] [Accepted: 02/09/2016] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Genotyping by re-sequencing has become a standard approach to estimate single nucleotide polymorphism (SNP) diversity, haplotype structure and the biodiversity and has been defined as an efficient approach to address geographical population genomics of several model species. To access core SNPs and insertion/deletion polymorphisms (indels), and to infer the phyletic patterns of speciation, most such approaches map short reads to the reference genome. Variant calling is important to establish patterns of genome-wide association studies (GWAS) for quantitative trait loci (QTLs), and to determine the population and haplotype structure based on SNPs, thus allowing content-dependent trait and evolutionary analysis. Several tools have been developed to investigate such polymorphisms as well as more complex genomic rearrangements such as copy number variations, presence/absence variations and large deletions. The programs available for this purpose have different strengths (e.g. accuracy, sensitivity and specificity) and weaknesses (e.g. low computation speed, complex installation procedure and absence of a user-friendly interface). Here we introduce Altools, a software package that is easy to install and use, which allows the precise detection of polymorphisms and structural variations. RESULTS Altools uses the BWA/SAMtools/VarScan pipeline to call SNPs and indels, and the dnaCopy algorithm to achieve genome segmentation according to local coverage differences in order to identify copy number variations. It also uses insert size information from the alignment of paired-end reads and detects potential large deletions. A double mapping approach (BWA/BLASTn) identifies precise breakpoints while ensuring rapid elaboration. Finally, Altools implements several processes that yield deeper insight into the genes affected by the detected polymorphisms. 
Altools was used to analyse both simulated and real next-generation sequencing (NGS) data and performed satisfactorily in terms of positive predictive values, sensitivity, the identification of large deletion breakpoints and copy number detection. CONCLUSIONS Altools is fast, reliable and easy to use for the mining of NGS data. The software package also attempts to link identified polymorphisms and structural variants to their biological functions thus providing more valuable information than similar tools.
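The use of insert size information to detect potential large deletions can be sketched as follows. This is a generic illustration, not Altools code: it assumes that a read pair spanning a deletion maps with an apparent insert size much larger than the library's typical value, and it flags such pairs with a median/MAD threshold. The function name, the `n_mad` parameter and the toy sizes are hypothetical.

```python
from statistics import median


def flag_deletion_pairs(insert_sizes, n_mad=5.0):
    """Return indices of read pairs whose insert size is unusually large.

    The cutoff is median + n_mad * MAD (median absolute deviation); using
    robust statistics keeps the outliers themselves from inflating the cutoff.
    """
    if not insert_sizes:
        return []
    center = median(insert_sizes)
    mad = median([abs(size - center) for size in insert_sizes])
    # Floor the MAD at 1 bp so a perfectly uniform library still gets a margin.
    threshold = center + n_mad * max(mad, 1.0)
    return [i for i, size in enumerate(insert_sizes) if size > threshold]


# Toy library with ~300 bp inserts and two pairs spanning a ~2.7 kb deletion.
sizes = [300, 310, 295, 305, 2900, 298, 302, 3100]
print(flag_deletion_pairs(sizes))
```

Flagged pairs are only candidates; in the workflow described above, breakpoints are then refined by a second mapping step.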
Affiliation(s)
- Salvatore Camiolo
- Università degli studi di Sassari, Dipartimento di Agraria, SACEG, Via Enrico De Nicola 1, Sassari, 07100, Italy.
| | - Gaurav Sablok
- Plant Functional Biology and Climate Change Cluster (C3), University of Technology Sydney, PO Box 123 Broadway, NSW 2007, Sydney, Australia.
| | - Andrea Porceddu
- Università degli studi di Sassari, Dipartimento di Agraria, SACEG, Via Enrico De Nicola 1, Sassari, 07100, Italy.
|
216
|
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022] Open
Abstract
Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to the phenotypic differences among healthy individuals, cause severe diseases and even cancers by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. Besides, the noises in the data (both platform-specific sequencing error and artificial chimeric reads) impede the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact the reads mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, processing speed of SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study are essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
- School of Computing, National University of Singapore, 117543, Singapore
| | - Wing-Kin Sung
- School of Computing, National University of Singapore, 117543, Singapore; Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore.
|
217
|
Duan J, Soussen C, Brie D, Idier J, Wang YP, Wan M. An optimal method to segment piecewise Poisson distributed signals with application to sequencing data. Annu Int Conf IEEE Eng Med Biol Soc 2016; 2015:6465-8. [PMID: 26737773 DOI: 10.1109/embc.2015.7319873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
To analyze the next generation sequencing data, the so-called read depth signal is often segmented with standard segmentation tools. However, these tools usually assume the signal to be a piecewise constant signal and contaminated with zero mean Gaussian noise, and therefore modeling error occurs. This paper models the read depth signal with piecewise Poisson distribution, which is more appropriate to the next generation sequencing mechanism. Based on the proposed model, an optimal dynamic programming algorithm with parallel computing is proposed to segment the piecewise signal, and furthermore detect the copy number variation.
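A piecewise Poisson segmentation by dynamic programming can be sketched as below. This is a textbook optimal-partitioning scheme written under stated assumptions, not the authors' implementation: the number of segments is given in advance, each segment gets its maximum-likelihood Poisson rate (the segment mean), and the computation is serial rather than parallel. The function name `poisson_segment` is hypothetical.

```python
import math


def poisson_segment(counts, n_segments):
    """Split counts into n_segments contiguous pieces maximizing the
    piecewise Poisson likelihood. Returns half-open (start, end) pairs."""
    n = len(counts)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(counts)")

    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    def cost(i, j):
        # Negative Poisson log-likelihood of counts[i:j] at the MLE rate,
        # dropping the log(y!) terms, which do not depend on the split.
        s = prefix[j] - prefix[i]
        if s == 0:
            return 0.0
        return s - s * math.log(s / (j - i))

    best = [[math.inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds = []
    j = n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds


# Toy read depth signal with an amplified region in the middle.
depth = [10, 12, 9, 11, 30, 29, 31, 28, 10, 11]
print(poisson_segment(depth, 3))
```

The triple loop costs O(K·n²) time, which is why practical tools bin the genome first or, as the abstract mentions, parallelize the computation.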
|
218
|
Next Generation Sequencing Data and Proteogenomics. Adv Exp Med Biol 2016; 926:11-19. [DOI: 10.1007/978-3-319-42316-6_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
|
219
|
Greenman CD, Cooke SL, Marshall J, Stratton MR, Campbell PJ. Modeling the evolution space of breakage fusion bridge cycles with a stochastic folding process. J Math Biol 2016; 72:47-86. [PMID: 25833184 PMCID: PMC4702116 DOI: 10.1007/s00285-015-0875-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/11/2012] [Revised: 03/04/2015] [Indexed: 12/11/2022]
Abstract
Breakage-fusion-bridge cycles in cancer arise when a broken segment of DNA is duplicated and an end from each copy joined together. This structure then 'unfolds' into a new piece of palindromic DNA. This is one mechanism responsible for the localised amplicons observed in cancer genome data. Here we study the evolution space of breakage-fusion-bridge structures in detail. We firstly consider discrete representations of this space with 2-d trees to demonstrate that there are [Formula: see text] qualitatively distinct evolutions involving [Formula: see text] breakage-fusion-bridge cycles. Secondly we consider the stochastic nature of the process to show these evolutions are not equally likely, and also describe how amplicons become localized. Finally we highlight these methods by inferring the evolution of breakage-fusion-bridge cycles with data from primary tissue cancer samples.
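The folding step described in this abstract can be illustrated with a toy model. It is a deliberate simplification and not the paper's stochastic folding process: a chromosome arm is written as a string of segment labels, the arm is broken at a chosen point, and the retained part is duplicated and joined to its own mirror image at the broken end. The function name `bfb_cycle` and the labels are hypothetical, and segment orientation is represented only by reading order.

```python
def bfb_cycle(segments, break_index):
    """Apply one breakage-fusion-bridge cycle to a string of segment labels.

    segments[:break_index] is the part retained after the break; it is
    duplicated and the two copies are fused at the broken end, which
    unfolds into a palindrome.
    """
    if not 0 < break_index <= len(segments):
        raise ValueError("break_index must be within the segment string")
    kept = segments[:break_index]
    return kept + kept[::-1]


arm = "ABCD"
after_one = bfb_cycle(arm, 3)        # break between C and D
after_two = bfb_cycle(after_one, 4)  # second break inside the palindrome
print(after_one, after_two)
```

In this toy run segment C goes from one copy to four after two cycles while D is lost, which is the kind of localized amplicon the abstract refers to.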
Affiliation(s)
- C D Greenman
- School of Computing Sciences, University of East Anglia, Norwich, UK.
- The Genome Analysis Centre, Norwich Research Park, Norwich, UK.
| | - S L Cooke
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK
| | - J Marshall
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK
| | - M R Stratton
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK
| | - P J Campbell
- Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK
- Department of Haematology, University of Cambridge, Cambridge, UK
|
220
|
Shirasawa K, Hirakawa H, Nunome T, Tabata S, Isobe S. Genome-wide survey of artificial mutations induced by ethyl methanesulfonate and gamma rays in tomato. Plant Biotechnol J 2016; 14:51-60. [PMID: 25689669 PMCID: PMC5023996 DOI: 10.1111/pbi.12348] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 10/20/2014] [Revised: 12/15/2014] [Accepted: 12/26/2014] [Indexed: 05/20/2023]
Abstract
Genome-wide mutations induced by ethyl methanesulfonate (EMS) and gamma irradiation in the tomato Micro-Tom genome were identified by a whole-genome shotgun sequencing analysis to estimate the spectrum and distribution of whole-genome DNA mutations and the frequency of deleterious mutations. A total of ~370 Gb of paired-end reads for four EMS-induced mutants and three gamma-ray-irradiated lines as well as a wild-type line were obtained by next-generation sequencing technology. Using bioinformatics analyses, we identified 5920 induced single nucleotide variations and insertion/deletion (indel) mutations. The predominant mutations in the EMS mutants were C/G to T/A transitions, while in the gamma-ray mutants, C/G to T/A transitions, A/T to T/A transversions, A/T to G/C transitions and deletion mutations were equally common. Biases in the base composition flanking mutations differed between the mutagenesis types. Regarding the effects of the mutations on gene function, >90% of the mutations were located in intergenic regions, and only 0.2% were deleterious. In addition, we detected 1,140,687 spontaneous single nucleotide polymorphisms and indel polymorphisms in wild-type Micro-Tom lines. We also found copy number variation, deletions and insertions of chromosomal segments in both the mutant and wild-type lines. The results provide helpful information not only for mutation research, but also for mutant screening methodology with reverse-genetic approaches.
Affiliation(s)
| | | | - Tsukasa Nunome
- NARO Institute of Vegetable and Tea Sciences, Tsu, Japan
|
221
|
Keith N, Tucker AE, Jackson CE, Sung W, Lucas Lledó JI, Schrider DR, Schaack S, Dudycha JL, Ackerman M, Younge AJ, Shaw JR, Lynch M. High mutational rates of large-scale duplication and deletion in Daphnia pulex. Genome Res 2016; 26:60-9. [PMID: 26518480 PMCID: PMC4691751 DOI: 10.1101/gr.191338.115] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 02/20/2015] [Accepted: 10/13/2015] [Indexed: 02/06/2023]
Abstract
Knowledge of the genome-wide rate and spectrum of mutations is necessary to understand the origin of disease and the genetic variation driving all evolutionary processes. Here, we provide a genome-wide analysis of the rate and spectrum of mutations obtained in two Daphnia pulex genotypes via separate mutation-accumulation (MA) experiments. Unlike most MA studies that utilize haploid, homozygous, or self-fertilizing lines, D. pulex can be propagated ameiotically while maintaining a naturally heterozygous, diploid genome, allowing the capture of the full spectrum of genomic changes that arise in a heterozygous state. While base-substitution mutation rates are similar to those in other multicellular eukaryotes (about 4 × 10⁻⁹ per site per generation), we find that the rates of large-scale (>100 kb) de novo copy-number variants (CNVs) are significantly elevated relative to those seen in previous MA studies. The heterozygosity maintained in this experiment allowed for estimates of gene-conversion processes. While most of the conversion tract lengths we report are similar to those generated by meiotic processes, we also find larger tract lengths that are indicative of mitotic processes. Comparison of MA lines to natural isolates reveals that a majority of large-scale CNVs in natural populations are removed by purifying selection. The mutations observed here share similarities with disease-causing, complex, large-scale CNVs, thereby demonstrating that MA studies in D. pulex serve as a system for studying the processes leading to such alterations.
Affiliation(s)
- Nathan Keith
- School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405, USA
| | - Abraham E Tucker
- Biology Department, Southern Arkansas University, Magnolia, Arkansas 71753, USA
| | - Craig E Jackson
- School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405, USA
| | - Way Sung
- Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
| | | | - Daniel R Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA
| | - Sarah Schaack
- Biology Department, Reed College, Portland, Oregon 97202, USA
| | - Jeffry L Dudycha
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208, USA
| | - Matthew Ackerman
- Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
| | - Andrew J Younge
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, USA
| | - Joseph R Shaw
- School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405, USA; School of Biosciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
| | - Michael Lynch
- Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
|
222
|
García-Chequer AJ, Méndez-Tenorio A, Olguín-Ruiz G, Sánchez-Vallejo C, Isa P, Arias CF, Torres J, Hernández-Angeles A, Ramírez-Ortiz MA, Lara C, Cabrera-Muñoz ML, Sadowinski-Pine S, Bravo-Ortiz JC, Ramón-García G, Diegopérez-Ramírez J, Ramírez-Reyes G, Casarrubias-Islas R, Ramírez J, Orjuela MA, Ponce-Castañeda MV. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing. Cancer Genet 2015; 209:57-69. [PMID: 26883451 DOI: 10.1016/j.cancergen.2015.12.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/25/2015] [Revised: 09/01/2015] [Accepted: 12/03/2015] [Indexed: 12/12/2022]
Abstract
Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome-wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphical 40 Kb window analysis approach allowed us to identify, with high resolution, previously reported nonrandom recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development.
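A fixed-window comparison of a sample against a reference profile can be sketched as follows. This is a generic illustration only: the abstract does not define the chromosome integrity score or how the in silico reference profile was built, so the sketch simply counts read start positions in 40 kb windows and reports a log2 ratio per window. The function names, the 0.5 pseudocount and the toy positions are assumptions.

```python
import math
from collections import Counter

WINDOW = 40_000  # window size in bp, matching the 40 Kb windows in the text


def window_counts(positions, window=WINDOW):
    """Count read start positions per fixed-size window on one chromosome."""
    return Counter(pos // window for pos in positions)


def log2_ratios(sample_positions, reference_positions, window=WINDOW):
    """Per-window log2(sample / reference) after scaling by total reads."""
    sample = window_counts(sample_positions, window)
    reference = window_counts(reference_positions, window)
    sample_total = sum(sample.values())
    reference_total = sum(reference.values())
    ratios = {}
    for win, ref_count in reference.items():
        # The 0.5 pseudocount (an assumption) keeps empty windows finite.
        s = (sample.get(win, 0) + 0.5) / sample_total
        r = (ref_count + 0.5) / reference_total
        ratios[win] = math.log2(s / r)
    return ratios


# Toy data: the sample has lost most reads in the second window.
reference_reads = [w * WINDOW + i * 1000 for w in range(3) for i in range(10)]
sample_reads = (
    [i * 1000 for i in range(10)]
    + [WINDOW + i * 1000 for i in range(2)]
    + [2 * WINDOW + i * 1000 for i in range(10)]
)
ratios = log2_ratios(sample_reads, reference_reads)
losses = [win for win, value in sorted(ratios.items()) if value < -1.0]
print(losses)
```

With pooled low coverage data, a window flagged this way suggests a loss that recurs across the pooled patients rather than a call for any individual tumor.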
Affiliation(s)
- A J García-Chequer
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - A Méndez-Tenorio
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
| | - G Olguín-Ruiz
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
| | - C Sánchez-Vallejo
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
| | - P Isa
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - C F Arias
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - J Torres
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - A Hernández-Angeles
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | | | - C Lara
- Hospital Infantil de México Federico Gómez, México D.F., Mexico
| | | | | | - J C Bravo-Ortiz
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - G Ramón-García
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - J Diegopérez-Ramírez
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - G Ramírez-Reyes
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - R Casarrubias-Islas
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
| | - J Ramírez
- Unidad de Microarreglos, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México D.F., Mexico
| | | | - M V Ponce-Castañeda
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico.
|
223
|
Gao X. Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations. BMC Bioinformatics 2015; 16:407. [PMID: 26652207 PMCID: PMC4676147 DOI: 10.1186/s12859-015-0835-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/22/2015] [Accepted: 11/23/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected for multiple arrays. It is natural to realize the co-existence of both recurrent and individual-specific CNVs, together with the possible data contamination during the data generation process. Therefore, there is a great need for an efficient and robust statistical model for simultaneous recovery of both recurrent and individual-specific CNVs. RESULT We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we formulate multiple aCGH arrays into a realization of a hidden low-rank matrix with some random noises and let an additional weight matrix account for those individual-specific effects. Thus, we do not restrict the random noise to be normally distributed, or even homogeneous. We show its performance through three real datasets and twelve synthetic datasets from different types of recurrent CNV regions associated with either normal random errors or heavily contaminated errors. CONCLUSION Our numerical experiments have demonstrated that the WPLA can successfully recover the recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs the best regarding its ability to simultaneously detect both recurrent and individual-specific CNVs under normal random errors. More importantly, the WPLA is the only method which can effectively recover the recurrent CNVs region when the data is heavily contaminated.
Affiliation(s)
- Xiaoli Gao
- Department of Mathematics and Statistics, University of North Carolina at Greensboro, 1400 Spring Garden St, Greensboro, NC, USA.
|
224
|
Zhang Z, Hao K. SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data. PLoS Comput Biol 2015; 11:e1004618. [PMID: 26583378 DOI: 10.1371/journal.pcbi.1004618] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 03/26/2015] [Accepted: 10/20/2015] [Indexed: 11/18/2022] Open
Abstract
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
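Step 1 of the method, comparing total read depth and alternative allele proportion between tumor and matched normal at each locus, can be sketched as below. The exact transformations used by SAAS-CNV are not stated in this abstract, so the formulas here are assumptions: a log2 ratio of total depth, and a log2 ratio of the "mirrored" alternative allele proportion (folded around 0.5 so that it does not matter which allele is gained). The type `LocusCounts` and the function names are hypothetical, and library-size normalization is omitted.

```python
import math
from typing import NamedTuple


class LocusCounts(NamedTuple):
    """Reads supporting the reference and alternative allele at one locus."""
    ref: int
    alt: int


def mirrored_alt_proportion(counts):
    """Alternative allele proportion folded around 0.5 (range 0.5 to 1)."""
    proportion = counts.alt / (counts.ref + counts.alt)
    return 0.5 + abs(proportion - 0.5)


def locus_signals(tumor, normal):
    """Return (log2 depth ratio, log2 mirrored allele proportion ratio)."""
    depth_signal = math.log2((tumor.ref + tumor.alt) / (normal.ref + normal.alt))
    allele_signal = math.log2(
        mirrored_alt_proportion(tumor) / mirrored_alt_proportion(normal)
    )
    return depth_signal, allele_signal


normal = LocusCounts(ref=20, alt=20)          # heterozygous, balanced
print(locus_signals(LocusCounts(20, 20), normal))  # no alteration
print(locus_signals(LocusCounts(30, 10), normal))  # allelic imbalance only
print(locus_signals(LocusCounts(40, 20), normal))  # gain of one allele
```

The second toy locus shows why the abstract argues that allele-specific signals are undervalued: its total depth is unchanged, so a depth-only method sees nothing, while the allele signal is clearly shifted. Joint segmentation in step 2 would then run on both signal dimensions.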
Affiliation(s)
- Zhongyang Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Ke Hao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Department of Respiratory Medicine, Shanghai Tenth People's Hospital, Tongji University, Shanghai, China
|
225
|
Sillo F, Garbelotto M, Friedman M, Gonthier P. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure. Genome Biol Evol 2015; 7:3190-206. [PMID: 26527650 PMCID: PMC4700942 DOI: 10.1093/gbe/evv209] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Accepted: 10/26/2015] [Indexed: 12/27/2022] Open
Abstract
It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34-41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts.
Affiliation(s)
- Fabiano Sillo
- Department of Agricultural, Forest and Food Sciences, University of Torino, Grugliasco, Italy
| | - Matteo Garbelotto
- Department of Environmental Science, Policy and Management, University of California, Berkeley
| | - Maria Friedman
- Department of Environmental Science, Policy and Management, University of California, Berkeley
| | - Paolo Gonthier
- Department of Agricultural, Forest and Food Sciences, University of Torino, Grugliasco, Italy
|
226
|
Abstract
With the rapid development of readily accessible molecular diagnostic tools, a growing number of patients and families with craniofacial anomalies will have access to a confirmed molecular diagnosis. This chapter provides an overview of the current clinical and molecular resources and approaches used by diagnosticians today. Clarifying the underlying cause of a congenital defect is necessary to provide proper counseling, identify the carrier/risk status of family members, inform prognosis, and direct appropriate management, treatment, and surveillance recommendations. Molecular testing is now used to confirm a suspected clinical diagnosis, establish a diagnosis in an unclear condition, and end a diagnostic odyssey for many children with underlying syndromes, but the use of these techniques to understand common nonsyndromic malformations such as clefts and craniosynostosis is still an active area of research that will contribute to clinical care in the future.
|
227
|
Palomares-Rius JE, Tsai IJ, Karim N, Akiba M, Kato T, Maruyama H, Takeuchi Y, Kikuchi T. Genome-wide variation in the pinewood nematode Bursaphelenchus xylophilus and its relationship with pathogenic traits. BMC Genomics 2015; 16:845. [PMID: 26493074 PMCID: PMC4619224 DOI: 10.1186/s12864-015-2085-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 05/13/2015] [Accepted: 10/14/2015] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Bursaphelenchus xylophilus is an emerging pathogenic nematode that is responsible for a devastating epidemic of pine wilt disease across Asia and Europe. In this study, we report the first genome-wide variation analysis of the nematode, with the aim of obtaining a full picture of its diversity. METHODS We sequenced six key B. xylophilus strains using an Illumina HiSeq sequencer. All the strains were isolated in Japan and have been widely used in previous studies. Genomic variations were detected by mapping the reads to the reference genome. RESULTS Over 3 Mb of genetic variations, accounting for 4.1% of the total genome, were detected as single nucleotide polymorphisms or small indels, suggesting multiple introductions of this invasive species from its native area into the country. The high level of genetic diversity of the pine wood nematode was related to differences in its pathogenicity and ecological traits. Moreover, we identified a gene set affected by genomic variation, and functional annotation of those genes indicated that some of them had potential roles in pathogenesis. CONCLUSIONS This study provides an important resource for understanding the population structure, pathogenicity and evolutionary ecology of the nematode, and further analysis based on this study with geographically diverse B. xylophilus populations will greatly accelerate our understanding of the complex evolutionary/epidemic history of this emerging pathogen.
Affiliation(s)
- Juan E Palomares-Rius
- Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki, 889-1692, Japan
| | - Isheng J Tsai
- Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki, 889-1692, Japan
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Nurul Karim
- Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki, 889-1692, Japan
- Department of Biochemistry and Molecular Biology, Jahangirnagar University, Savar, Dhaka, 1342, Bangladesh
| | - Mitsuteru Akiba
- Forestry and Forest Products Research Institute, Tsukuba, 305-8689, Japan
| | - Tetsuro Kato
- Laboratory of Terrestrial Microbial Ecology, Graduate School of Agriculture, Kyoto University, Kyoto, 606-8502, Japan
| | - Haruhiko Maruyama
- Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki, 889-1692, Japan
| | - Yuko Takeuchi
- Laboratory of Terrestrial Microbial Ecology, Graduate School of Agriculture, Kyoto University, Kyoto, 606-8502, Japan
| | - Taisei Kikuchi
- Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki, 889-1692, Japan.
|
228
|
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations have set a compelling landmark in life science and changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of NGS tools and pipelines, organized into diverse classes, together with cancer pathways, databases and literature information from PubMed. The literature data were restricted to the 18 most common cancer types worldwide, such as breast cancer and colon cancer. For convenience, NGS-cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide an out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge lies in finding and using the right tool for the right application. The objective of this work is to collect the tools in each category from the various places where they are available and to arrange the tools and other data in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available for NGS data analysis in cancer. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
| | - Ramesh Kumar Gopal
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
|
229
|
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023] Open
Abstract
"A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? The life sciences are among the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that could enhance visualization further.
Affiliation(s)
- Georgios A Pavlopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | | | - Nikolas Papanikolaou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | - Theodosis Theodosiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | - Anton J Enright
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD UK
| | - Ioannis Iliopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
|
230
|
Duan J, Wan M, Deng HW, Wang YP. A Sparse Model Based Detection of Copy Number Variations From Exome Sequencing Data. IEEE Trans Biomed Eng 2015; 63:496-505. [PMID: 26258935 DOI: 10.1109/tbme.2015.2464674] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
GOAL Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing for detecting genetic variants, such as copy number variations (CNVs). Although a number of approaches have been proposed to detect CNVs from whole-genome sequencing, a direct adoption of these approaches to whole-exome sequencing will often fail because exons are separately located along a genome. Therefore, an appropriate method is needed to target the specific features of exome sequencing data. METHODS In this paper, a novel sparse model based method is proposed to discover CNVs from multiple exome sequencing data. First, exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively reweighted least squares algorithm is used to estimate the solution. RESULTS The method is tested and validated on both synthetic and real data, and compared with other approaches including CoNIFER, XHMM, and cn.MOPS. The tests demonstrate that the proposed method outperforms the other approaches. CONCLUSION The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. SIGNIFICANCE The sparse model can target the specific features of exome sequencing data. The software codes are freely available at http://www.tulane.edu/ wyp/software/Exon_CNV.m.
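The iteratively reweighted least squares (IRLS) step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' penalized matrix approximation: it only shows, on a toy location problem, how IRLS minimises the loss sum(|r|^p) that corresponds to generalized Gaussian noise with shape parameter p. The function name `irls_lp`, the toy depth values and the `eps` guard are assumptions made for illustration.

```python
import numpy as np

def irls_lp(X, y, p=1.0, n_iter=50, eps=1e-8):
    """Minimise sum(|y - X @ b|**p) by iteratively reweighted least squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ b
        # Weights |r|^(p-2) make the weighted squared loss equal to |r|^p;
        # eps (an assumed guard) avoids division by zero when p < 2.
        w = np.maximum(np.abs(r), eps) ** (p - 2)
        sw = np.sqrt(w)
        b = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return b

# Hypothetical normalised read depths for five exons; the last is aberrant.
depth = np.array([1.0, 1.1, 0.9, 1.0, 5.0])
X = np.ones((depth.size, 1))            # intercept-only design matrix
ls_fit = irls_lp(X, depth, p=2.0)[0]    # p = 2 reduces to the plain mean
l1_fit = irls_lp(X, depth, p=1.0)[0]    # p = 1 is robust to the outlier
print(ls_fit, l1_fit)
```

With p = 2 the baseline is pulled towards the aberrant exon, whereas with p = 1 it stays near the typical depth, which is why a heavy-tailed error model can help separate a baseline from sparse copy number signals.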
|
231
|
Wallace-Salinas V, Brink DP, Ahrén D, Gorwa-Grauslund MF. Cell periphery-related proteins as major genomic targets behind the adaptive evolution of an industrial Saccharomyces cerevisiae strain to combined heat and hydrolysate stress. BMC Genomics 2015; 16:514. [PMID: 26156140 PMCID: PMC4496855 DOI: 10.1186/s12864-015-1737-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/19/2014] [Accepted: 06/29/2015] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Laboratory evolution is an important tool for developing robust yeast strains for bioethanol production since the biological basis behind combined tolerance requires complex alterations whose proper regulation is difficult to achieve by rational metabolic engineering. Previously, we reported on the evolved industrial Saccharomyces cerevisiae strain ISO12 that had acquired an improved ability to grow and ferment in the presence of lignocellulose-derived inhibitors at high temperature (39 °C). In the current study, we used comparative genomics to uncover the extent of the genomic alterations that occurred during the evolution process and investigated possible associations between the mutations and the phenotypic traits in ISO12. RESULTS Through whole-genome sequencing and variant calling we identified a high number of strain-unique SNPs and INDELs in both ISO12 and the parental strain Ethanol Red. The variants were predicted to have 760 non-synonymous effects in both strains combined and were significantly enriched in Gene Ontology terms related to cell periphery, membranes and cell wall. Eleven genes, including MTL1, FLO9/FLO11, and CYC3, were found to be under positive selection in ISO12. Additionally, the FLO genes exhibited changes in copy number, and the alterations to this gene family were correlated with experimental results of multicellularity and invasive growth in the adapted strain. An independent lipidomic analysis revealed further differences between the strains in the content of nine lipid species. Finally, ISO12 displayed improved viability in undiluted spruce hydrolysate that was unrelated to reduction of inhibitors and changes in cell wall integrity, as shown by HPLC and lyticase assays. CONCLUSIONS Together, the results of the sequence comparison and the physiological characterisations indicate that cell-periphery proteins (e.g. extracellular sensors such as MTL1) and peripheral lipids/membranes are important evolutionary targets in the process of adaptation to the combined stresses. The capacity of ISO12 to develop complex colony formation also revealed multicellularity as a possible evolutionary strategy to improve competitiveness and tolerance to environmental stresses (also reflected by the FLO genes). Although a panel of altered genes with high relevance to the novel phenotype was detected, this study also demonstrates that the observed long-term molecular effects of thermal and inhibitor stress have a polygenic basis.
Affiliation(s)
- Valeria Wallace-Salinas
- Applied Microbiology, Department of Chemistry, Lund University, P.O. Box 124, Lund, SE-22100, Sweden.
| | - Daniel P Brink
- Applied Microbiology, Department of Chemistry, Lund University, P.O. Box 124, Lund, SE-22100, Sweden.
| | - Dag Ahrén
- Microbial Ecology Group, Department of Biology, Lund University, Ecology Building, Lund, Sweden.
| | - Marie F Gorwa-Grauslund
- Applied Microbiology, Department of Chemistry, Lund University, P.O. Box 124, Lund, SE-22100, Sweden.
|
232
|
Wang H, Wang C, Yang K, Liu J, Zhang Y, Wang Y, Xu X, Michal JJ, Jiang Z, Liu B. Genome Wide Distributions and Functional Characterization of Copy Number Variations between Chinese and Western Pigs. PLoS One 2015; 10:e0131522. [PMID: 26154170 PMCID: PMC4496047 DOI: 10.1371/journal.pone.0131522] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/29/2015] [Accepted: 06/03/2015] [Indexed: 01/02/2023] Open
Abstract
Copy number variations (CNVs) refer to large insertions, deletions and duplications in the genomic structure ranging from one thousand to several million bases in size. Since the development of next generation sequencing technology, several methods have been established for detecting copy number variations with high credibility and accuracy. Evidence has shown that CNVs occurring in gene regions can lead to phenotypic changes through alterations in gene structure and dosage. However, it remains unexplored whether CNVs underlie the phenotypic differences between Chinese and Western domestic pigs. Using read-depth methods, we investigated copy number variations in 49 individuals derived from both Chinese and Western pig breeds. A total of 3,131 copy number variation regions (CNVRs) were identified with an average size of 13.4 Kb in all individuals during domestication, harboring 1,363 genes. Among them, 129 and 147 CNVRs were Chinese and Western pig specific, respectively. Gene functional enrichments revealed that these CNVRs contribute to strong disease resistance and high prolificacy in Chinese domestic pigs, but strong muscle tissue development in Western domestic pigs. This finding is strongly consistent with the morphologic characteristics of Chinese and Western pigs, indicating that these group-specific CNVRs might have been preserved by artificial selection for the favored phenotypes during independent domestication of Chinese and Western pigs. In this study, we built high-resolution CNV maps in several domestic pig breeds and discovered the group-specific CNVs by comparing Chinese and Western pigs, which could provide new insight into genomic variations during pigs’ independent domestication, and facilitate further functional studies of CNV-associated genes.
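The step from per-individual CNV calls to CNVRs and group-specific CNVRs can be sketched as follows. This assumes, as is common practice but not stated in the abstract, that a CNVR is the union of overlapping CNV calls across individuals; the function name `merge_into_cnvrs`, the tuple layout and the coordinates are hypothetical.

```python
def merge_into_cnvrs(cnvs):
    """Merge overlapping (chrom, start, end, group) CNV calls into CNVRs."""
    cnvrs = []
    for chrom, start, end, group in sorted(cnvs):
        last = cnvrs[-1] if cnvrs else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current region: extend it and record the group.
            last["end"] = max(last["end"], end)
            last["groups"].add(group)
        else:
            cnvrs.append({"chrom": chrom, "start": start, "end": end,
                          "groups": {group}})
    return cnvrs

# Hypothetical calls from individuals of the two breed groups.
calls = [
    ("chr1", 1000, 5000, "Chinese"),
    ("chr1", 4000, 9000, "Western"),
    ("chr1", 20000, 26000, "Chinese"),
    ("chr2", 500, 3000, "Western"),
]
cnvrs = merge_into_cnvrs(calls)
chinese_specific = [r for r in cnvrs if r["groups"] == {"Chinese"}]
western_specific = [r for r in cnvrs if r["groups"] == {"Western"}]
print(len(cnvrs), len(chinese_specific), len(western_specific))
```

A region counts as group specific here only if every contributing call comes from one group; a real analysis would probably also require a minimum number of supporting individuals.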
Affiliation(s)
- Hongyang Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Chao Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Kui Yang
- Modern Educational & Technology Centre of Huazhong Agricultural University, Wuhan, PR China
| | - Jing Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Yu Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Yanan Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Xuewen Xu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
| | - Jennifer J. Michal
- Department of Animal Sciences, Washington State University, Pullman, WA, United States of America
| | - Zhihua Jiang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- Department of Animal Sciences, Washington State University, Pullman, WA, United States of America
| | - Bang Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
|
233
|
Yiğiter A, Chen J, An L, Danacioğlu N. An online copy number variant detection method for short sequencing reads. J Appl Stat 2015. [DOI: 10.1080/02664763.2014.1001330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
234
|
Microevolution of Duplications and Deletions and Their Impact on Gene Expression in the Nematode Pristionchus pacificus. PLoS One 2015; 10:e0131136. [PMID: 26125626 PMCID: PMC4488370 DOI: 10.1371/journal.pone.0131136] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/12/2015] [Accepted: 05/27/2015] [Indexed: 11/19/2022] Open
Abstract
The evolution of diversity across the animal kingdom has been accompanied by tremendous gene loss and gain. While comparative genomics has been fruitful in characterizing differences in gene content across highly diverged species, little is known about the microevolution of structural variations that cause these differences in the first place. In order to investigate the genomic impact of structural variations, we made use of genomic and transcriptomic data from the nematode Pristionchus pacificus, which has been established as a satellite model to Caenorhabditis elegans for comparative biology. We exploit the fact that P. pacificus is a highly diverse species for which various genomic data, including the draft genome of a sister species, P. exspectatus, are available. Based on resequencing coverage data for two natural isolates, we identified large (>2 kb) deletions and duplications relative to the reference strain. By restricting the analysis to completely syntenic regions between P. pacificus and P. exspectatus, we were able to polarize the comparison and to assess the impact of structural variations on expression levels. We found that while loss of genes correlates with lack of expression, duplication of genes has virtually no effect on gene expression. Further investigating expression of individual copies at sites that segregate between the duplicates, we found in the majority of cases only one of the copies to be expressed. Nevertheless, we still find that certain gene classes are strongly depleted in deletions as well as duplications, suggesting evolutionary constraint acting on synteny. In summary, our results are consistent with a model in which most structural variations are either deleterious or neutral, and they provide first insights into the microevolution of structural variations in the P. pacificus genome.
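Identifying large deletions and duplications from resequencing coverage can be sketched as a scan for long runs of windows with aberrant coverage. The sketch below is only an illustration of the idea, not the pipeline used in the study: the window size, the ratio thresholds and the function name `call_segments` are assumptions, and only the 2 kb minimum length comes from the abstract.

```python
def call_segments(ratios, window_bp=500, min_len_bp=2000, low=0.25, high=1.75):
    """Return (state, start_bp, end_bp) for long runs of aberrant coverage.

    ratios holds per-window coverage of an isolate divided by the coverage
    expected from the reference strain (about 1.0 when copy number is equal).
    """
    def state(r):
        return "deletion" if r < low else "duplication" if r > high else None

    segments, run_state, run_start = [], None, 0
    # A trailing None sentinel closes a run that reaches the last window.
    for i, s in enumerate([state(r) for r in ratios] + [None]):
        if s != run_state:
            if run_state and (i - run_start) * window_bp > min_len_bp:
                segments.append((run_state, run_start * window_bp, i * window_bp))
            run_state, run_start = s, i
    return segments

# Hypothetical coverage ratios in consecutive 500 bp windows.
ratios = [1.0, 1.1, 0.0, 0.1, 0.0, 0.05, 0.1, 1.0,
          2.1, 2.0, 0.9, 2.0, 2.2, 1.9, 2.1, 2.0]
segments = call_segments(ratios)
print(segments)
```

The two-window duplication signal in the middle of the toy data is discarded because it spans only 1 kb, which mirrors the restriction to large events.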
|
235
|
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
- Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
| | - Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
| | - Alberto Magi
- Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
|
236
|
Dorshorst B, Henegar C, Liao X, Sällman Almén M, Rubin CJ, Ito S, Wakamatsu K, Stothard P, Van Doormaal B, Plastow G, Barsh GS, Andersson L. Dominant Red Coat Color in Holstein Cattle Is Associated with a Missense Mutation in the Coatomer Protein Complex, Subunit Alpha (COPA) Gene. PLoS One 2015; 10:e0128969. [PMID: 26042826 PMCID: PMC4456281 DOI: 10.1371/journal.pone.0128969] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 01/13/2015] [Accepted: 05/01/2015] [Indexed: 12/31/2022] Open
Abstract
Coat color in Holstein dairy cattle is primarily controlled by the melanocortin 1 receptor (MC1R) gene, a central determinant of black (eumelanin) vs. red/brown (pheomelanin) synthesis across animal species. The major MC1R alleles in Holsteins are Dominant Black (MC1RD) and Recessive Red (MC1Re). A novel form of dominant red coat color was first observed in an animal born in 1980. The mutation underlying this phenotype was named Dominant Red and is epistatic to the constitutively activated MC1RD. Here we show that a missense mutation in the coatomer protein complex, subunit alpha (COPA), a gene with no previously known role in pigment synthesis, is completely associated with Dominant Red in Holstein dairy cattle. The mutation results in an arginine to cysteine substitution at an amino acid residue completely conserved across eukaryotes. Despite this high level of conservation, we show that both heterozygotes and homozygotes are healthy and viable. Analysis of hair pigment composition shows that the Dominant Red phenotype is similar to the MC1R Recessive Red phenotype, although less effective at reducing eumelanin synthesis. RNA-seq data similarly show that Dominant Red animals achieve predominantly pheomelanin synthesis by downregulating genes normally required for eumelanin synthesis. COPA is a component of the coat protein I seven-subunit complex that is involved in retrograde and cis-Golgi intracellular coated vesicle transport of both protein and RNA cargo. This suggests that Dominant Red may be caused by aberrant MC1R protein or mRNA trafficking within the highly compartmentalized melanocyte, mimicking the effect of the Recessive Red loss-of-function MC1R allele.
Affiliation(s)
- Ben Dorshorst
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Corneliu Henegar
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America
| | - Xiaoping Liao
- Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
| | - Markus Sällman Almén
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Carl-Johan Rubin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Shosuke Ito
- Department of Chemistry, Fujita Health University School of Health Sciences, Toyoake, Aichi, Japan
| | - Kazumasa Wakamatsu
- Department of Chemistry, Fujita Health University School of Health Sciences, Toyoake, Aichi, Japan
| | - Paul Stothard
- Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
| | | | - Graham Plastow
- Livestock Gentec, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
| | - Gregory S. Barsh
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
| | - Leif Andersson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
|
237
|
Espinosa JRF, Ayub Q, Chen Y, Xue Y, Tyler-Smith C. Structural variation on the human Y chromosome from population-scale resequencing. Croat Med J 2015; 56:194-207. [PMID: 26088844 PMCID: PMC4500966 DOI: 10.3325/cmj.2015.56.194] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/27/2014] [Accepted: 05/24/2015] [Indexed: 11/05/2022] Open
Abstract
AIM To investigate the information about Y-structural variants (SVs) in the general population that could be obtained by low-coverage whole-genome sequencing. METHODS We investigated SVs on the male-specific portion of the Y chromosome in the 70 individuals from Africa, Europe, or East Asia sequenced as part of the 1000 Genomes Pilot project, using data from this project and from additional studies on the same samples. We applied a combination of read-depth and read-pair methods to discover candidate Y-SVs, followed by validation using information from the literature, independent sequence and single nucleotide polymorphism-chip data sets, and polymerase chain reaction experiments. RESULTS We validated 19 Y-SVs, 2 of which were novel. Non-reference allele counts ranged from 1 to 64. The regions richest in variation were the heterochromatic segments near the centromere or the DYZ19 locus, followed by the ampliconic regions, but some Y-SVs were also present in the X-transposed and X-degenerate regions. In all, 5 of the 27 protein-coding gene families on the Y chromosome varied in copy number. CONCLUSIONS We confirmed that Y-SVs were readily detected from low-coverage sequence data and were abundant on the chromosome. We also reported both common and rare Y-SVs that are novel.
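The read-pair half of the discovery strategy described in this abstract can be illustrated with a short sketch that flags read pairs whose mapped insert size deviates strongly from the library's typical value. This is a generic illustration under stated assumptions, not the procedure used in the study: the function name `classify_pairs`, the cutoff `k` and the insert sizes are hypothetical, and a real analysis would combine such evidence with read depth and require several supporting pairs per candidate.

```python
from statistics import median

def classify_pairs(insert_sizes, k=5.0):
    """Label each read pair by comparing its mapped insert size to the library.

    Pairs mapping much farther apart than expected suggest a deletion in the
    sample relative to the reference; pairs mapping much closer suggest an
    insertion. The cutoff of k median absolute deviations is an assumption.
    """
    med = median(insert_sizes)
    mad = median(abs(s - med) for s in insert_sizes) or 1.0
    labels = []
    for s in insert_sizes:
        if s > med + k * mad:
            labels.append("deletion_candidate")
        elif s < med - k * mad:
            labels.append("insertion_candidate")
        else:
            labels.append("concordant")
    return labels

# Hypothetical mapped insert sizes (bp) for eight read pairs.
sizes = [300, 310, 295, 305, 290, 1500, 300, 120]
labels = classify_pairs(sizes)
print(labels)
```

Using the median and the median absolute deviation keeps the thresholds from being distorted by the discordant pairs themselves.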
Affiliation(s)
- Chris Tyler-Smith
- The Wellcome Trust Sanger Institute, Hinxton, Cambs. CB10 1SA, UK
|
238
|
Mayer MG, Rödelsperger C, Witte H, Riebesell M, Sommer RJ. The Orphan Gene dauerless Regulates Dauer Development and Intraspecific Competition in Nematodes by Copy Number Variation. PLoS Genet 2015; 11:e1005146. [PMID: 26087034 PMCID: PMC4473527 DOI: 10.1371/journal.pgen.1005146] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 01/09/2015] [Accepted: 03/13/2015] [Indexed: 12/04/2022] Open
Abstract
Many nematodes form dauer larvae when exposed to unfavorable conditions, representing an example of phenotypic plasticity and a major survival and dispersal strategy. In Caenorhabditis elegans, the regulation of dauer induction is a model for pheromone, insulin, and steroid-hormone signaling. Recent studies in Pristionchus pacificus revealed substantial natural variation in various aspects of dauer development, i.e. pheromone production and sensing and dauer longevity and fitness. One intriguing example is a strain from Ohio, having extremely long-lived dauers associated with very high fitness and often forming the most dauers in response to other strains' pheromones, including the reference strain from California. While such examples have been suggested to represent intraspecific competition among strains, the molecular mechanisms underlying these dauer-associated patterns are currently unknown. We generated recombinant-inbred-lines between the Californian and Ohioan strains and used quantitative-trait-loci analysis to investigate the molecular mechanism determining natural variation in dauer development. Surprisingly, we discovered that the orphan gene dauerless controls dauer formation by copy number variation. The Ohioan strain has one dauerless copy causing high dauer formation, whereas the Californian strain has two copies, resulting in strongly reduced dauer formation. Transgenic animals expressing multiple copies do not form dauers. dauerless is exclusively expressed in CAN neurons, and both CAN ablation and dauerless mutations increase dauer formation. Strikingly, dauerless underwent several duplications and acts in parallel or downstream of steroid-hormone signaling but upstream of the nuclear-hormone-receptor daf-12. We identified the novel or fast-evolving gene dauerless as inhibitor of dauer development. 
Our findings reveal the importance of gene duplications and copy number variations for orphan gene function and suggest daf-12 as major target for dauer regulation. We discuss the consequences of the novel vs. fast-evolving nature of orphans for the evolution of developmental networks and their role in natural variation and intraspecific competition.
Affiliation(s)
- Melanie G. Mayer
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Christian Rödelsperger
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Hanh Witte
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Metta Riebesell
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ralf J. Sommer
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
239
CONSERTING: integrating copy-number analysis with structural-variation detection. Nat Methods 2015; 12:527-30. [PMID: 25938371 DOI: 10.1038/nmeth.3394] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 07/16/2014] [Accepted: 03/22/2015] [Indexed: 12/26/2022]
Abstract
We developed Copy Number Segmentation by Regression Tree in Next Generation Sequencing (CONSERTING), an algorithm for detecting somatic copy-number alteration (CNA) using whole-genome sequencing (WGS) data. CONSERTING performs iterative analysis of segmentation on the basis of changes in read depth and the detection of localized structural variations, with high accuracy and sensitivity. Analysis of 43 cancer genomes from both pediatric and adult patients revealed novel oncogenic CNAs, complex rearrangements and subclonal CNAs missed by alternative approaches.
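As an illustration only of what regression-tree style segmentation of read depth can look like, the minimal Python sketch below recursively splits a depth vector at the point that most reduces the squared error. It is not the CONSERTING implementation: the function names `sse` and `segment`, the `min_gain` and `min_size` parameters, and the toy depths are assumptions made here, and the structural-variation breakpoints that CONSERTING integrates are left out entirely.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def segment(depths, start=0, min_gain=50.0, min_size=2):
    """Recursively split `depths` into half-open (start, end) segments.

    Hypothetical helper: a split is accepted only if it lowers the total
    squared error by at least `min_gain` (an illustrative threshold).
    """
    n = len(depths)
    total = sse(depths)
    best_gain, best_split = 0.0, None
    for k in range(min_size, n - min_size + 1):
        gain = total - sse(depths[:k]) - sse(depths[k:])
        if gain > best_gain:
            best_gain, best_split = gain, k
    if best_split is None or best_gain < min_gain:
        return [(start, start + n)]
    left = segment(depths[:best_split], start, min_gain, min_size)
    right = segment(depths[best_split:], start + best_split, min_gain, min_size)
    return left + right


# Toy read depths with a gain in the middle third (made-up numbers).
toy_depths = [30, 31, 29, 30, 60, 61, 59, 60, 30, 29, 31, 30]
segments = segment(toy_depths)
```

On the toy vector the recursion should stop at three segments, because further splits inside each flat stretch gain far less than `min_gain`.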
240
Hybrid algorithms for multiple change-point detection in biological sequences. Adv Exp Med Biol 2015; 823:41-61. [PMID: 25381101 DOI: 10.1007/978-3-319-10984-8_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/10/2023]
Abstract
Array comparative genomic hybridization (aCGH) is one of the techniques that can be used to detect copy number variations in DNA sequences in high resolution. It has been identified that abrupt changes in the human genome play a vital role in the progression and development of many complex diseases. In this study we propose two distinct hybrid algorithms that combine efficient sequential change-point detection procedures (the Shiryaev-Roberts procedure and the cumulative sum control chart (CUSUM) procedure) with the Cross-Entropy method, which is an evolutionary stochastic optimization technique to estimate both the number of change-points and their corresponding locations in aCGH data. The proposed hybrid algorithms are applied to both artificially generated data and real aCGH experimental data to illustrate their usefulness. Our results show that the proposed methodologies are effective in detecting multiple change-points in biological sequences of continuous measurements.
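The CUSUM building block named in this abstract can be sketched as follows. This is a generic two-sided CUSUM for a shift in mean, not the authors' hybrid with the Cross-Entropy method; the function name `first_cusum_alarm`, the `drift` and `threshold` values, and the toy log-ratio series are assumptions chosen for illustration.

```python
def first_cusum_alarm(values, target_mean=0.0, drift=0.5, threshold=4.0):
    """Return the first index at which a two-sided CUSUM statistic exceeds
    `threshold`, or None if it never does.

    `drift` is the allowance subtracted at every step so that small
    fluctuations around `target_mean` do not accumulate.
    """
    pos, neg = 0.0, 0.0  # evidence for an upward / downward mean shift
    for i, x in enumerate(values):
        pos = max(0.0, pos + (x - target_mean - drift))
        neg = max(0.0, neg + (target_mean - x - drift))
        if pos > threshold or neg > threshold:
            return i
    return None


# Toy aCGH-like series: ten probes at baseline, then ten probes shifted up.
toy_series = [0.0] * 10 + [2.0] * 10
alarm = first_cusum_alarm(toy_series)
```

With these toy settings each shifted probe adds 1.5 to the upward statistic, so the alarm fires on the third shifted probe (index 12), a short delay after the true change at index 10. Such a delay is characteristic of sequential procedures.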
241
Margarido GRA, Heckerman D. ConPADE: genome assembly ploidy estimation from next-generation sequencing data. PLoS Comput Biol 2015; 11:e1004229. [PMID: 25880203 PMCID: PMC4400156 DOI: 10.1371/journal.pcbi.1004229] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 05/28/2014] [Accepted: 03/09/2015] [Indexed: 01/08/2023] Open
Abstract
As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and also insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed. Diploid organisms, such as human beings, have two “copies” of each chromosome, whereas polyploid organisms have multiple “copies” (we use quotes to stress that the “copies” are not identical). A key difference between diploid and polyploid organisms is that the “copies” tend to be less similar in polyploid organisms. 
This difference leads to important differences in the process of de novo genome assembly from short fragments of DNA. In particular, when assembling polyploid organisms, contigs corresponding to different copies of the chromosomes can be quite different, and merging them leads to loss of information. Thus, it is important to maintain distinct contigs, even though they correspond to copies of the same chromosomal region. An important step in doing so is to determine how many truly distinct copies of a chromosomal region are found in a single contig. For example, if there are 12 copies of a particular chromosome, the possible number of distinct copies could be anywhere from 1 to 12. We call this task “contig ploidy estimation”, and present a method for accomplishing it. This set of methods is useful for the de novo assembly of complex, polyploid genomes such as sugarcane, switchgrass, and wheat.
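To make the idea of estimating contig ploidy from allele proportions concrete, here is a deliberately simplified Python sketch. It is not ConPADE: it assumes error-free reads, a binomial read-count model, and a uniform prior over allele dosages, and the names `binom_pmf` and `estimate_ploidy` as well as the toy counts are hypothetical. For each candidate ploidy P, a variable site is expected to show an alternate-allele fraction of d/P for some dosage d between 1 and P-1.

```python
from math import comb, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def estimate_ploidy(sites, max_ploidy=6):
    """Pick the ploidy with the highest log-likelihood.

    `sites` is a list of (alt_reads, total_reads) at variable positions.
    Averaging over dosages acts as a penalty on higher ploidies, which
    would otherwise always fit at least as well as their divisors.
    """
    best_ploidy, best_ll = None, float("-inf")
    for ploidy in range(2, max_ploidy + 1):
        ll = 0.0
        for alt, total in sites:
            site_lik = sum(
                binom_pmf(alt, total, d / ploidy) for d in range(1, ploidy)
            ) / (ploidy - 1)
            ll += log(site_lik)
        if ll > best_ll:
            best_ploidy, best_ll = ploidy, ll
    return best_ploidy


# Toy data: allele fractions near 1/2 versus near 1/4 and 3/4.
diploid_like = [(20, 40)] * 5
tetraploid_like = [(10, 40), (30, 40), (10, 40), (30, 40)]
```

In this toy setting the first data set favours ploidy 2 and the second favours ploidy 4. As the abstract notes, real data need enough coverage for the allele fractions to separate, and a realistic model must also account for sequencing errors.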
Affiliation(s)
- Gabriel R. A. Margarido
- Microsoft Research, Los Angeles, California, United States of America
- Departamento de Genética, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo, Piracicaba, Brazil
- David Heckerman
- Microsoft Research, Los Angeles, California, United States of America
242
Wang W, Wang W, Sun W, Crowley JJ, Szatkiewicz JP. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing. Nucleic Acids Res 2015; 43:e90. [PMID: 25883151 PMCID: PMC4538801 DOI: 10.1093/nar/gkv319] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/07/2014] [Accepted: 03/27/2015] [Indexed: 11/14/2022] Open
Abstract
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/.
Affiliation(s)
- WeiBo Wang
- Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, USA
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, CA 90095, USA
- Wei Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599-7400, USA
- James J Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
- Jin P Szatkiewicz
- Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
243
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023] Open
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
244
Cohen K, Tzika A, Wood H, Berri S, Roberts P, Mason G, Sheridan E. Diagnosis of fetal submicroscopic chromosomal abnormalities in failed array CGH samples: copy number by sequencing as an alternative to microarrays for invasive fetal testing. Ultrasound Obstet Gynecol 2015; 45:394-401. [PMID: 25510919 DOI: 10.1002/uog.14767] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/14/2014] [Revised: 11/11/2014] [Accepted: 12/10/2014] [Indexed: 06/04/2023]
Abstract
OBJECTIVES: Array comparative genomic hybridization (CGH) has become the technology of choice for high-resolution prenatal whole genome analysis. Limitations of microarrays are mainly related to the analog nature of the analysis, and poor-quality DNA can result in failed quality metrics with these platforms. We examined a cohort of abnormal fetuses with failed array CGH results using a next-generation sequencing algorithm, CNV-Seq. We assessed the ability of the platform to handle suboptimal prenatal samples and generate interpretable molecular karyotypes. METHODS: Nine samples obtained from abnormal fetuses and one from a normal control fetus were sequenced using an Illumina GAIIx. A segmentation algorithm for sequencing data was used to determine regional copy number data on the sequencing datasets. RESULTS: Phred quality scores were satisfactory for analysis of all samples. CNV-Seq identified both large- and small-scale abnormalities in the cohort, and normal results were obtained for fetuses for which microarray data were previously uninterpretable. No variants of uncertain significance were detected. Analysis of the digital sequencing datasets offered some advantages over array CGH output. CONCLUSIONS: Using next-generation sequencing for the detection of genomic copy number variants may be advantageous for poor-quality, invasively-acquired prenatal samples. CNV-Seq could become a potential alternative to array CGH in this setting.
Affiliation(s)
- K Cohen
- Department of Fetal Medicine, Leeds General Infirmary, Leeds, UK; Leeds Institute of Cancer and Pathology, Leeds, UK
245
Hirakawa H, Okada Y, Tabuchi H, Shirasawa K, Watanabe A, Tsuruoka H, Minami C, Nakayama S, Sasamoto S, Kohara M, Kishida Y, Fujishiro T, Kato M, Nanri K, Komaki A, Yoshinaga M, Takahata Y, Tanaka M, Tabata S, Isobe SN. Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Res 2015; 22:171-9. [PMID: 25805887 PMCID: PMC4401327 DOI: 10.1093/dnares/dsv002] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/25/2014] [Accepted: 02/17/2015] [Indexed: 12/21/2022] Open
Abstract
Ipomoea trifida (H. B. K.) G. Don. is the most likely diploid ancestor of the hexaploid sweet potato, I. batatas (L.) Lam. To assist in analysis of the sweet potato genome, de novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1, using the Illumina HiSeq platform. We classified the sequences thus obtained as either ‘core candidates’ (common to the two lines) or ‘line specific’. The total lengths of the assembled sequences of Mx23Hm (ITR_r1.0) was 513 Mb, while that of 0431-1 (ITRk_r1.0) was 712 Mb. Of the assembled sequences, 240 Mb (Mx23Hm) and 353 Mb (0431-1) were classified into core candidate sequences. A total of 62,407 (62.4 Mb) and 109,449 (87.2 Mb) putative genes were identified, respectively, in the genomes of Mx23Hm and 0431-1, of which 11,823 were derived from core sequences of Mx23Hm, while 28,831 were from the core candidate sequence of 0431-1. There were a total of 1,464,173 single-nucleotide polymorphisms and 16,682 copy number variations (CNVs) in the two assembled genomic sequences (under the condition of log2 ratio of >1 and CNV size >1,000 bases). The results presented here are expected to contribute to the progress of genomic and genetic studies of I. trifida, as well as studies of the sweet potato and the genus Ipomoea in general.
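The CNV condition quoted in this abstract (log2 ratio >1 and size >1,000 bases) can be read as a simple filter over candidate segments. The Python sketch below shows one such reading; the segment tuple layout, the function name `filter_cnv_segments`, and the depth values are hypothetical, and the abstract does not say whether the ratio was taken in one direction only or by magnitude, so the one-directional form used here is an assumption.

```python
from math import log2


def filter_cnv_segments(segments, min_log2_ratio=1.0, min_size=1000):
    """Keep segments whose depth log2 ratio and length exceed the thresholds.

    Each segment is (start, end, depth_line_a, depth_line_b), with
    half-open coordinates in bases. Returns (start, end, log2_ratio).
    """
    kept = []
    for start, end, depth_a, depth_b in segments:
        size = end - start
        ratio = log2(depth_a / depth_b)
        if ratio > min_log2_ratio and size > min_size:
            kept.append((start, end, ratio))
    return kept


# Made-up candidates: only the first passes both thresholds.
candidates = [
    (0, 5000, 90.0, 30.0),     # log2(3) ~ 1.58, 5,000 bases
    (5000, 5500, 120.0, 30.0), # log2(4) = 2, but only 500 bases
    (6000, 9000, 40.0, 30.0),  # 3,000 bases, but log2 ratio ~ 0.42
]
cnvs = filter_cnv_segments(candidates)
```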
Affiliation(s)
- Hideki Hirakawa
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Yoshihiro Okada
- Crop and Agribusiness Research Division, Kyushu Okinawa Agricultural Research Center, National Agriculture and Food Research Organization (NARO/KARC), Itoman, Okinawa 901-0336, Japan
- Hiroaki Tabuchi
- Upland Farming Research Division, Kyushu Okinawa Agricultural Research Center, National Agriculture and Food Research Organization (NARO/KARC), Miyakonojo, Miyazaki 885-0091, Japan
- Kenta Shirasawa
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Akiko Watanabe
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Hisano Tsuruoka
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Chiharu Minami
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Mitsuyo Kohara
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Yoshie Kishida
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Midori Kato
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Keiko Nanri
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Akiko Komaki
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Masaru Yoshinaga
- Upland Farming Research Division, Kyushu Okinawa Agricultural Research Center, National Agriculture and Food Research Organization (NARO/KARC), Miyakonojo, Miyazaki 885-0091, Japan
- Yasuhiro Takahata
- Upland Farming Research Division, Kyushu Okinawa Agricultural Research Center, National Agriculture and Food Research Organization (NARO/KARC), Miyakonojo, Miyazaki 885-0091, Japan
- Masaru Tanaka
- Upland Farming Research Division, Kyushu Okinawa Agricultural Research Center, National Agriculture and Food Research Organization (NARO/KARC), Miyakonojo, Miyazaki 885-0091, Japan
- Satoshi Tabata
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Sachiko N Isobe
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
246
Smith SD, Kawash JK, Grigoriev A. GROM-RD: resolving genomic biases to improve read depth detection of copy number variants. PeerJ 2015; 3:e836. [PMID: 25802807 PMCID: PMC4369336 DOI: 10.7717/peerj.836] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/05/2014] [Accepted: 02/23/2015] [Indexed: 12/21/2022] Open
Abstract
Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilities of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after using existing GC bias correction methods and developed a novel approach to normalize such variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution. We compared our method to two widely used programs based on read depth methods, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/.
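The point about non-uniform variance across GC regions can be illustrated with a generic per-GC-bin standardization. The Python sketch below is not GROM-RD's algorithm; it only shows why correcting the mean alone is not enough when the spread of read depth also changes with GC content. The function name `gc_normalized_z_scores`, the bin width, and the toy windows are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def gc_normalized_z_scores(windows, gc_bin_width=0.05):
    """Standardize read depth within bins of similar GC content.

    `windows` is a list of (gc_fraction, read_depth). Each depth is
    converted to a z-score using the mean and standard deviation of its
    own GC bin, so bins with larger spread are scaled down accordingly.
    """
    bins = defaultdict(list)
    for gc, depth in windows:
        bins[int(gc / gc_bin_width)].append(depth)
    stats = {b: (mean(d), pstdev(d)) for b, d in bins.items()}
    z_scores = []
    for gc, depth in windows:
        mu, sigma = stats[int(gc / gc_bin_width)]
        z_scores.append((depth - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


# Toy windows: the high-GC bin has 10x the depth and 10x the spread.
toy_windows = [
    (0.31, 10), (0.32, 12), (0.33, 14),
    (0.61, 100), (0.62, 120), (0.63, 140),
]
z = gc_normalized_z_scores(toy_windows)
```

After standardization the lowest window in each bin gets the same z-score, even though the raw deviations differ by a factor of ten.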
Affiliation(s)
- Sean D Smith
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Joseph K Kawash
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Andrey Grigoriev
- Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
247
Manconi A, Manca E, Moscatelli M, Gnocchi M, Orro A, Armano G, Milanesi L. G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods. Front Bioeng Biotechnol 2015; 3:28. [PMID: 25806367 PMCID: PMC4354384 DOI: 10.3389/fbioe.2015.00028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/10/2014] [Accepted: 02/19/2015] [Indexed: 11/23/2022] Open
Abstract
Copy number variations (CNVs) are the most prevalent types of structural variations (SVs) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SVs and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) are increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions with significantly different read-depth from the other ones. The pipeline analysis of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV regions identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short-reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals.
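The read-depth pipeline described in this abstract (building the signal, normalizing it, then estimating copy number) can be sketched on a CPU in a few lines of Python. This is a toy illustration and not G-CNV, which runs these steps on GPUs and also handles read filtering and mapping; the function names, the fixed window size, the median normalization, and the diploid baseline are all assumptions made here.

```python
from statistics import median


def read_depth_signal(read_starts, chrom_length, window=100):
    """Count mapped read start positions per fixed-size window."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def normalize_by_median(counts):
    """Scale counts so that the median window has value 1.0."""
    med = median(counts)
    if med == 0:
        raise ValueError("median read depth is zero; cannot normalize")
    return [c / med for c in counts]


def estimate_copy_number(normalized, baseline_ploidy=2):
    """Round the normalized signal to integer copies (diploid baseline)."""
    return [round(baseline_ploidy * r) for r in normalized]


# Toy mapped read starts on a 400-base region (made-up positions).
toy_reads = [5, 50, 150, 160, 170, 180, 250, 260, 350, 399]
signal = read_depth_signal(toy_reads, chrom_length=400)
copies = estimate_copy_number(normalize_by_median(signal))
```

In the toy data the second window holds twice as many reads as the others, so it is reported with twice the baseline copy number.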
Affiliation(s)
- Andrea Manconi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Emanuele Manca
- Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
- Marco Moscatelli
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Matteo Gnocchi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Alessandro Orro
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
- Luciano Milanesi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
248
Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022] Open
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
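The two steps named in this abstract, normalizing coverage to a reference profile and then segmenting with a hidden Markov model, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' tool: the functions `normalize_to_profile` and `viterbi_copy_number`, the three copy-number states, the Gaussian emission width `sigma`, the transition probability `stay_prob`, and the toy coverage values are all hypothetical.

```python
from math import log
from statistics import median


def normalize_to_profile(sample_cov, reference_profile):
    """Divide sample coverage by the reference profile, then rescale so
    that the median ratio is 1.0 (taken here as the diploid level)."""
    raw = [c / ref for c, ref in zip(sample_cov, reference_profile)]
    scale = median(raw)
    return [r / scale for r in raw]


def viterbi_copy_number(ratios, states=(1, 2, 3), sigma=0.15, stay_prob=0.98):
    """Most likely copy-number path; state cn emits ratios around cn / 2."""
    n = len(states)
    switch = (1.0 - stay_prob) / (n - 1)

    def log_emit(r, cn):
        # Gaussian log-density up to a constant shared by all states.
        return -((r - cn / 2.0) ** 2) / (2 * sigma ** 2)

    score = [log(1.0 / n) + log_emit(ratios[0], s) for s in states]
    back = []
    for r in ratios[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            cands = [
                score[i] + log(stay_prob if i == j else switch)
                for i in range(n)
            ]
            best_i = max(range(n), key=lambda i: cands[i])
            new_score.append(cands[best_i] + log_emit(r, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]


# Toy windows: the reference fluctuates; the sample is halved in windows 4-7.
reference = [30, 40, 20, 30, 40, 20, 30, 40, 20, 30, 30, 30]
sample = [30, 40, 20, 30, 20, 10, 15, 20, 20, 30, 30, 30]
calls = viterbi_copy_number(normalize_to_profile(sample, reference))
```

Dividing by the reference removes the position-specific fluctuations, so the hemizygous deletion in windows 4 to 7 stands out as a run of ratio 0.5 that the HMM labels as one copy.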
Affiliation(s)
- Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Joseph G Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- John E Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, WA, USA
249
Terol J, Ibañez V, Carbonell J, Alonso R, Estornell LH, Licciardello C, Gut IG, Dopazo J, Talon M. Involvement of a citrus meiotic recombination TTC-repeat motif in the formation of gross deletions generated by ionizing radiation and MULE activation. BMC Genomics 2015; 16:69. [PMID: 25758634 PMCID: PMC4334395 DOI: 10.1186/s12864-015-1280-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/22/2014] [Accepted: 01/26/2015] [Indexed: 02/07/2023] Open
Abstract
Background: Transposable-element mediated chromosomal rearrangements require the involvement of two transposons and two double-strand breaks (DSB) located in close proximity. In radiobiology, DSB proximity is also a major factor contributing to rearrangements. However, the whole issue of DSB proximity remains virtually unexplored. Results: Based on DNA sequencing analysis we show that the genomes of 2 derived mutations, Arrufatina (sport) and Nero (irradiation), share a similar 2 Mb deletion of chromosome 3. A 7 kb Mutator-like element found in Clemenules was present in Arrufatina in inverted orientation flanking the 5′ end of the deletion. The Arrufatina Mule displayed “dissimilar” 9-bp target site duplications separated by 2 Mb. Fine-scale single nucleotide variant analyses of the deleted fragments identified a TTC-repeat sequence motif located in the center of the deletion responsible for a meiotic crossover detected in the citrus reference genome. Conclusions: Taken together, this information is compatible with the proposal that in both mutants, the TTC-repeat motif formed a triplex DNA structure generating a loop that brought in close proximity the originally distinct reactive ends. In Arrufatina, the loop brought the Mule ends near the 2 distinct insertion target sites and the inverted insertion of the transposable element between these target sites provoked the release of the in-between fragment. This proposal requires the involvement of a unique transposon and sheds light on the unresolved question of how two distinct sites become located in close proximity. These observations confer a crucial role to the TTC-repeats in fundamental plant processes such as meiotic recombination and chromosomal rearrangements.
Affiliation(s)
- Javier Terol
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, 46113, Valencia, Spain.
- Victoria Ibañez
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, 46113, Valencia, Spain.
- José Carbonell
- Centro de Investigación Principe Felipe (CIPF), Avda, Autopista del Saler, 16-3, 46012, Valencia, Spain.
- Roberto Alonso
- Centro de Investigación Principe Felipe (CIPF), Avda, Autopista del Saler, 16-3, 46012, Valencia, Spain.
- Leandro H Estornell
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, 46113, Valencia, Spain.
- Concetta Licciardello
- CRA-ACM, Consiglio per la Ricerca e la Sperimentazione in Agricoltura, Corso Savoia 190, 95024, Acireale, Catania, Italy.
- Ivo G Gut
- Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, 08028, Barcelona, Spain.
- Joaquín Dopazo
- Centro de Investigación Principe Felipe (CIPF), Avda, Autopista del Saler, 16-3, 46012, Valencia, Spain.
- Manuel Talon
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, 46113, Valencia, Spain.
250
Brynildsrud O, Snipen LG, Bohlin J. CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data. Bioinformatics 2015; 31:1708-15. [DOI: 10.1093/bioinformatics/btv070] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 04/22/2014] [Accepted: 01/28/2015] [Indexed: 01/22/2023] Open