1
Berton MP, de Lemos MVA, Stafuzza NB, Simielli Fonseca LF, Silva DBDS, Peripolli E, Pereira ASC, Magalhães AFB, Albuquerque LG, Baldi F. Integration analyses of structural variations and differential gene expression associated with beef fatty acid profile in Nellore cattle. Anim Genet 2022; 53:570-582. [PMID: 35811456 DOI: 10.1111/age.13242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 06/06/2022] [Accepted: 06/22/2022] [Indexed: 11/26/2022]
Abstract
This study aimed to integrate analyses of structural variations and differentially expressed genes (DEGs) associated with the beef fatty acid (FA) profile in Nellore cattle. Copy number variation (CNV) detection was performed using the PennCNV algorithm and CNVRuler software in 3794 animals genotyped with the High-Density Bovine BeadChip. For the genome-wide association study (GWAS), a total of 963 genotyped animals were selected to obtain the intramuscular lipid concentration and quantify the beef FA profile. A total of 48 animals belonging to the same farm and management lot were drawn from the 963 genotyped and phenotyped animals for the transcriptomic and differential gene expression analyses. The GWAS with extreme groups of FA profiles was performed using a logistic model. A total of 43, 42, 66 and 35 significant CNV regions (p < 0.05) were identified for saturated, monounsaturated, polyunsaturated and omega 3 and 6 fatty acids, respectively. Paired-end sequencing of the 48 samples was performed using the Illumina HiSeq2500 platform. Real-time quantitative PCR was used to validate the DEGs identified by RNA-seq analysis. The results showed several DEGs associated with the FA profile of Longissimus thoracis, such as BSCL2 and SAMD8. Enriched terms such as cellular response to corticosteroid stimulus (GO:0071384) and glucocorticoid stimulus (GO:0071385) were highlighted. The identification of structural variations harboring candidate genes for beef FA should contribute to the elucidation of the genetic basis that determines the FA composition of intramuscular fat in Nellore cattle. Our results will contribute to the identification of potential biomarkers for complex phenotypes, such as the FA profile, and to improving the reliability of genomic predictions that include pre-selected variants with differentiated weighting in the genomic models.
Affiliation(s)
- Mariana Piatto Berton
- Departamento de Zootecnia, Universidade Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, Brazil
- Elisa Peripolli
- Departamento de Zootecnia, Universidade Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, Brazil
- Angélica S C Pereira
- Departamento de Nutrição e Produção Animal, Universidade de São Paulo, Faculdade de Medicina Veterinária e Zootecnia, Pirassununga, Brazil
- Ana Fabricia Braga Magalhães
- Departamento de Zootecnia, Universidade Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, Brazil
- Lucia G Albuquerque
- Departamento de Zootecnia, Universidade Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, Brazil
- Fernando Baldi
- Departamento de Zootecnia, Universidade Estadual Paulista, Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal, Brazil
2
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. PLoS Comput Biol 2016; 12:e1004871. [PMID: 27177143 PMCID: PMC4866742 DOI: 10.1371/journal.pcbi.1004871] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/26/2015] [Accepted: 03/14/2016] [Indexed: 11/22/2022] Open
Abstract
By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings.
3
Sykulski M, Gambin T, Bartnik M, Derwińska K, Wiśniowiecka-Kowalnik B, Stankiewicz P, Gambin A. Multiple samples aCGH analysis for rare CNVs detection. J Clin Bioinforma 2013; 3:12. [PMID: 23758813 PMCID: PMC3691624 DOI: 10.1186/2043-9113-3-12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/14/2012] [Accepted: 05/23/2013] [Indexed: 11/20/2022] Open
Abstract
Background DNA copy number variations (CNV) constitute an important source of genetic variability. The standard method used for CNV detection is array comparative genomic hybridization (aCGH). Results We propose a novel multiple-sample aCGH analysis methodology aimed at detecting rare CNVs. In contrast to the majority of previous approaches, which deal with cancer datasets, we focus on constitutional genomic abnormalities identified in a diverse spectrum of human diseases. Our method is tested on an exon-targeted aCGH array of 366 patients affected with developmental delay/intellectual disability, epilepsy, or autism. The proposed algorithms can be applied as a post-processing filter to any given segmentation method. Conclusions Thanks to the additional information obtained from multiple samples, we could efficiently detect significant segments corresponding to rare CNVs responsible for pathogenic changes. The robust statistical framework applied in our method makes it possible to eliminate the influence of a widespread technical artifact termed ‘waves’.
Affiliation(s)
- Maciej Sykulski
- Institute of Informatics, University of Warsaw, Warsaw, Poland.
4
Functional performance of aCGH design for clinical cytogenetics. Comput Biol Med 2013; 43:775-85. [PMID: 23668354 DOI: 10.1016/j.compbiomed.2013.02.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/17/2011] [Revised: 02/03/2013] [Accepted: 02/05/2013] [Indexed: 12/30/2022]
Abstract
Array-comparative genomic hybridization (aCGH) technology enables rapid, high-resolution analysis of genomic rearrangements. With it, genome copy number changes and rearrangement breakpoints can be detected and analyzed at resolutions down to a few kilobases. A recently proposed exon array CGH approach accurately measures copy-number changes of individual exons in the human genome. The crucial and highly non-trivial starting task is the design of an array, i.e. the choice of an appropriate (multi)set of oligos. The success of the whole high-level analysis depends on the quality of the design. The comparison of several alternative array CGH designs is also an important step in the development of a new diagnostic chip. In this paper, we deal with these two often neglected issues. We propose a new approach to measure the quality of array CGH designs. Our measures reflect the robustness of rearrangement detection to noise (mostly experimental measurement error). The method is parametrized by the segmentation algorithm used to identify aberrations. We implemented an efficient Monte Carlo method for testing noise robustness within the DNAcopy procedure. The developed framework has been applied to evaluate the functional quality of several optimized array designs.
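The idea of Monte Carlo testing of noise robustness can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian noise model, and the toy mean-threshold detector (standing in for a real segmenter such as DNAcopy) are all assumptions made for illustration.

```python
import random
import statistics


def detect_gain(log_ratios, start, end, threshold=0.3):
    """Toy detector (an assumption, not DNAcopy): call the aberration
    if the mean log ratio over probes [start, end) exceeds threshold."""
    return statistics.fmean(log_ratios[start:end]) > threshold


def noise_robustness(n_probes, start, end, shift, noise_sd, trials=1000, seed=0):
    """Estimate the fraction of noisy replicates in which a simulated
    gain of height `shift` on probes [start, end) is still detected."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    hits = 0
    for _ in range(trials):
        # Piecewise-constant true profile plus independent Gaussian noise.
        profile = [
            (shift if start <= i < end else 0.0) + rng.gauss(0.0, noise_sd)
            for i in range(n_probes)
        ]
        if detect_gain(profile, start, end):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for sd in (0.1, 0.5, 1.0, 2.0):
        rate = noise_robustness(100, 40, 50, shift=0.58, noise_sd=sd)
        print(f"noise sd={sd}: detection rate={rate:.3f}")
```

A design whose probes cover a region more densely averages out more noise, so its detection rate should degrade more slowly as `noise_sd` grows; comparing such curves is one way to rank alternative designs.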
5
Pronold M, Vali M, Pique-Regi R, Asgharzadeh S. Copy number variation signature to predict human ancestry. BMC Bioinformatics 2012; 13:336. [PMID: 23270563 PMCID: PMC3598683 DOI: 10.1186/1471-2105-13-336] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/19/2012] [Accepted: 12/06/2012] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. RESULTS We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73 caCNV signature using a training set of 225 healthy individuals from European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73 caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry. CONCLUSIONS We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Affiliation(s)
- Melissa Pronold
- Department of Pediatrics, Children's Hospital Los Angeles and The Saban Research Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
6
Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I. Fast detection of de novo copy number variants from SNP arrays for case-parent trios. BMC Bioinformatics 2012; 13:330. [PMID: 23234608 PMCID: PMC3576329 DOI: 10.1186/1471-2105-13-330] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/15/2011] [Accepted: 12/07/2012] [Indexed: 11/10/2022] Open
Abstract
Background In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo and of interest for their potential functional role in disease. Among the leading array-based methods for discovery of de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios. Results Our analysis of the oral cleft trios reveals that genomic waves represent a substantial source of false positive identifications in the joint HMM, despite a wave-correction implementation in PennCNV. In addition, the noise of low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number referred to as the minimum distance that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. Compared to PennCNV on simulated data, MinimumDistance identifies fewer false positives on average and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance of PennCNV and MinimumDistance for high coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR. Computationally, MinimumDistance provides a nearly 8-fold increase in speed relative to the joint HMM in a study of oral cleft trios. 
Conclusions Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM with MinimumDistance being much faster.
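As a rough illustration of the minimum distance statistic, the sketch below takes, at each marker, the signed offspring-minus-parent difference in log R ratio that is smaller in absolute value. This reading of the definition is an assumption based on the abstract, and the function name is hypothetical; the published MinimumDistance software may differ in detail.

```python
def minimum_distance(offspring, father, mother):
    """Per-marker signed difference between the offspring log R ratio and
    the closer of the two parental log R ratios (assumed definition)."""
    distances = []
    for o, f, m in zip(offspring, father, mother):
        d_father = o - f
        d_mother = o - m
        # Keep whichever signed difference is smaller in magnitude.
        distances.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return distances


if __name__ == "__main__":
    # Marker 1: deletion shared with the mother (inherited) gives a value near 0.
    # Marker 2: deletion seen only in the offspring (candidate de novo) stays large.
    offspring = [-0.50, -0.60]
    father = [0.00, 0.00]
    mother = [-0.45, 0.05]
    print(minimum_distance(offspring, father, mother))
```

Because an inherited CNV is matched by at least one parent, it collapses toward zero, and probe effects or genomic waves shared across the trio tend to cancel in the subtraction; only offspring-specific shifts remain for segmentation.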
Affiliation(s)
- Robert B Scharpf
- Department of Oncology, Johns Hopkins University, Baltimore, MD, USA.
7
Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM). Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. 
Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
Affiliation(s)
- Michael Seifert
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
8
Wartman LD, Larson DE, Xiang Z, Ding L, Chen K, Lin L, Cahan P, Klco JM, Welch JS, Li C, Payton JE, Uy GL, Varghese N, Ries RE, Hoock M, Koboldt DC, McLellan MD, Schmidt H, Fulton RS, Abbott RM, Cook L, McGrath SD, Fan X, Dukes AF, Vickery T, Kalicki J, Lamprecht TL, Graubert TA, Tomasson MH, Mardis ER, Wilson RK, Ley TJ. Sequencing a mouse acute promyelocytic leukemia genome reveals genetic events relevant for disease progression. J Clin Invest 2011; 121:1445-55. [PMID: 21436584 DOI: 10.1172/jci45284] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 09/30/2010] [Accepted: 01/19/2011] [Indexed: 01/12/2023] Open
Abstract
Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML). It is characterized by the t(15;17)(q22;q11.2) chromosomal translocation that creates the promyelocytic leukemia-retinoic acid receptor α (PML-RARA) fusion oncogene. Although this fusion oncogene is known to initiate APL in mice, other cooperating mutations, as yet ill defined, are important for disease pathogenesis. To identify these, we used a mouse model of APL, whereby PML-RARA expressed in myeloid cells leads to a myeloproliferative disease that ultimately evolves into APL. Sequencing of a mouse APL genome revealed 3 somatic, nonsynonymous mutations relevant to APL pathogenesis, of which 1 (Jak1 V657F) was found to be recurrent in other affected mice. This mutation was identical to the JAK1 V658F mutation previously found in human APL and acute lymphoblastic leukemia samples. Further analysis showed that JAK1 V658F cooperated in vivo with PML-RARA, causing a rapidly fatal leukemia in mice. We also discovered a somatic 150-kb deletion involving the lysine (K)-specific demethylase 6A (Kdm6a, also known as Utx) gene, in the mouse APL genome. Similar deletions were observed in 3 out of 14 additional mouse APL samples and 1 out of 150 human AML samples. In conclusion, whole genome sequencing of mouse cancer genomes can provide an unbiased and comprehensive approach for discovering functionally relevant mutations that are also present in human leukemias.
Affiliation(s)
- Lukas D Wartman
- Department of Internal Medicine, Division of Oncology, Stem Cell Biology Section, Washington University School of Medicine, Siteman Cancer Center, St. Louis, Missouri, USA
9
Norton N, Li D, Rieder M, Siegfried J, Rampersaud E, Züchner S, Mangos S, Gonzalez-Quintana J, Wang L, McGee S, Reiser J, Martin E, Nickerson D, Hershberger R. Genome-wide studies of copy number variation and exome sequencing identify rare variants in BAG3 as a cause of dilated cardiomyopathy. Am J Hum Genet 2011; 88:273-82. [PMID: 21353195 DOI: 10.1016/j.ajhg.2011.01.016] [Citation(s) in RCA: 227] [Impact Index Per Article: 17.5] [Received: 01/10/2011] [Revised: 01/26/2011] [Accepted: 01/29/2011] [Indexed: 12/18/2022] Open
Abstract
Dilated cardiomyopathy commonly causes heart failure and is the most frequent precipitating cause of heart transplantation. Familial dilated cardiomyopathy has been shown to be caused by rare variant mutations in more than 30 genes but only ~35% of its genetic cause has been identified, principally by using linkage-based or candidate gene discovery approaches. In a multigenerational family with autosomal dominant transmission, we employed whole-exome sequencing in a proband and three of his affected family members, and genome-wide copy number variation analysis in the proband and his affected father and unaffected mother. Exome sequencing identified 428 single point variants resulting in missense, nonsense, or splice site changes. Genome-wide copy number analysis identified 51 insertion deletions and 440 copy number variants > 1 kb. Of these, an 8733 bp deletion, encompassing exon 4 of the heat shock protein cochaperone BCL2-associated athanogene 3 (BAG3), was found in seven affected family members and was absent in 355 controls. To establish the relevance of variants in this protein class in genetic DCM, we sequenced the coding exons in BAG3 in 311 other unrelated DCM probands and identified one frameshift, two nonsense, and four missense rare variants absent in 355 control DNAs, four of which were familial and segregated with disease. Knockdown of bag3 in a zebrafish model recapitulated DCM and heart failure. We conclude that new comprehensive genomic approaches have identified rare variants in BAG3 as causative of DCM.
10
Vogler C, Gschwind L, Röthlisberger B, Huber A, Filges I, Miny P, Auschra B, Stetak A, Demougin P, Vukojevic V, Kolassa IT, Elbert T, de Quervain DJF, Papassotiropoulos A. Microarray-based maps of copy-number variant regions in European and sub-Saharan populations. PLoS One 2010; 5:e15246. [PMID: 21179565 PMCID: PMC3002949 DOI: 10.1371/journal.pone.0015246] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 08/19/2010] [Accepted: 11/16/2010] [Indexed: 02/03/2023] Open
Abstract
The genetic basis of phenotypic variation can be partially explained by the presence of copy-number variations (CNVs). Currently available methods for CNV assessment include high-density single-nucleotide polymorphism (SNP) microarrays that have become an indispensable tool in genome-wide association studies (GWAS). However, insufficient concordance rates between different CNV assessment methods call for cautious interpretation of results from CNV-based genetic association studies. Here we provide a cross-population, microarray-based map of copy-number variant regions (CNVRs) to enable reliable interpretation of CNV association findings. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to scan the genomes of 1167 individuals from two ethnically distinct populations (Europe, N = 717; Rwanda, N = 450). Three different CNV-finding algorithms were tested and compared for sensitivity, specificity, and feasibility. Two algorithms were subsequently used to construct CNVR maps, which were also validated by processing subsamples with additional microarray platforms (Illumina 1M-Duo BeadChip, Nimblegen 385K aCGH array) and by comparing our data with publicly available information. Both algorithms detected a total of 42669 CNVs, 74% of which clustered in 385 CNVRs of a cross-population map. These CNVRs overlap with 862 annotated genes and account for approximately 3.3% of the haploid human genome. We created comprehensive cross-populational CNVR-maps. They represent an extendable framework that can leverage the detection of common CNVs and additionally assist in interpreting CNV-based association studies.
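One common way to cluster individual CNV calls into CNVRs is to take the union of overlapping calls on the same chromosome. The sketch below shows that approach only; the abstract does not state which overlap criterion the study used, so the merging rule and the function name are assumptions.

```python
def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into copy-number variant regions.

    cnvs: iterable of (chromosome, start, end) tuples with start <= end.
    Returns a sorted list of (chromosome, start, end) regions, each the
    union of all calls that overlap transitively (assumed criterion).
    """
    regions = []
    for chrom, start, end in sorted(cnvs):
        last = regions[-1] if regions else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Overlaps the current region: extend its right edge if needed.
            last[2] = max(last[2], end)
        else:
            regions.append([chrom, start, end])
    return [tuple(region) for region in regions]


if __name__ == "__main__":
    calls = [
        ("chr1", 150, 300),
        ("chr1", 100, 200),
        ("chr1", 400, 500),
        ("chr2", 100, 200),
    ]
    print(merge_cnvs(calls))
```

Stricter definitions, such as requiring a minimum reciprocal overlap or a minimum number of supporting samples, would yield fewer and tighter regions than this simple union.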
Affiliation(s)
- Christian Vogler
- Department of Psychology, University of Basel, and Department of Biomedicine, University Children's Hospital, Basel, Switzerland.
11
Integrated genomics of susceptibility to alkylator-induced leukemia in mice. BMC Genomics 2010; 11:638. [PMID: 21080971 PMCID: PMC3018144 DOI: 10.1186/1471-2164-11-638] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/29/2010] [Accepted: 11/17/2010] [Indexed: 11/10/2022] Open
Abstract
Background Therapy-related acute myeloid leukemia (t-AML) is a secondary, generally incurable, malignancy attributable to chemotherapy exposure. Although there is a genetic component to t-AML susceptibility in mice, the relevant loci and the mechanism(s) by which they contribute to t-AML are largely unknown. An improved understanding of susceptibility factors and the biological processes in which they act may lead to the development of t-AML prevention strategies. Results In this work we applied an integrated genomics strategy in inbred strains of mice to find novel factors that might contribute to susceptibility. We found that the pre-exposure transcriptional state of hematopoietic stem/progenitor cells predicts susceptibility status. More than 900 genes were differentially expressed between susceptible and resistant strains and were highly enriched in the apoptotic program, but it remained unclear which genes, if any, contribute directly to t-AML susceptibility. To address this issue, we integrated gene expression data with genetic information, including single nucleotide polymorphisms (SNPs) and DNA copy number variants (CNVs), to identify genetic networks underlying t-AML susceptibility. The 30 t-AML susceptibility networks we found are robust: they were validated in independent, previously published expression data, and different analytical methods converge on them. Further, the networks are enriched in genes involved in cell cycle and DNA repair (pathways not discovered in traditional differential expression analysis), suggesting that these processes contribute to t-AML susceptibility. Within these networks, the putative regulators (e.g., Parp2, Casp9, Polr1b) are the most likely to have a non-redundant role in the pathogenesis of t-AML. 
While identifying these networks, we found that current CNVR and SNP-based haplotype maps in mice represented distinct sources of genetic variation contributing to expression variation, implying that mapping studies utilizing either source alone will have reduced sensitivity. Conclusion The identification and prioritization of genes and networks not previously implicated in t-AML generates novel hypotheses on the biology and treatment of this disease that will be the focus of future research.
12
Zhang ZD, Gerstein MB. Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model. BMC Bioinformatics 2010; 11:539. [PMID: 21034510 PMCID: PMC2992546 DOI: 10.1186/1471-2105-11-539] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/17/2010] [Accepted: 10/31/2010] [Indexed: 11/17/2022] Open
Abstract
Background Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and newly developed read-depth approach through ultrahigh throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating parameters--e.g., the number of CNVs, the position of each CNV, and the data noise level--that define the underlying data generating process as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. Conclusions In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
Affiliation(s)
- Zhengdong D Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
13
Yau C, Papaspiliopoulos O, Roberts GO, Holmes C. Bayesian Nonparametric Hidden Markov Models with application to the analysis of copy-number-variation in mammalian genomes. J R Stat Soc Series B Stat Methodol 2010; 73:37-57. [PMID: 21687778 DOI: 10.1111/j.1467-9868.2010.00756.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Indexed: 11/29/2022]
Abstract
We consider the development of Bayesian Nonparametric methods for product partition models such as Hidden Markov Models and change point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) for the observations arising in each state and a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology which combines recent retrospective sampling methods with the use of slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties, and robustness to the length of the time series being investigated. Moreover, the method is easy to implement requiring little or no user-interaction. We apply our methodology to the analysis of genomic copy number variation.
Affiliation(s)
- C Yau
- Department of Statistics and the Oxford-Man Institute for Quantitative Finance, University of Oxford
14
Agam A, Yalcin B, Bhomra A, Cubin M, Webber C, Holmes C, Flint J, Mott R. Elusive copy number variation in the mouse genome. PLoS One 2010; 5:e12839. [PMID: 20877625 PMCID: PMC2943477 DOI: 10.1371/journal.pone.0012839] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 05/12/2010] [Accepted: 08/16/2010] [Indexed: 11/18/2022] Open
Abstract
Background Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question. Principal Findings Using a 2.1 million probe array we identified 1,477 deletions and 499 gains in 7 inbred mouse strains. Molecular characterization indicated that approximately one third of the CNVs detected by the array were false positives and we estimate the false negative rate to be more than 50%. We show that low concordance between studies is largely due to the molecular nature of CNVs, many of which consist of a series of smaller deletions and gains interspersed by regions where the DNA copy number is normal. Conclusions Our results indicate that CNVs detected by arrays may be the coincidental co-localization of smaller CNVs, whose presence is more likely to perturb an aCGH hybridization profile than the effect of an isolated, small, copy number alteration. Our findings help explain the hitherto unexplored discrepancies between array-based studies of copy number variation in the mouse genome.
Affiliation(s)
- Avigail Agam
- Wellcome Trust Centre For Human Genetics, Oxford, United Kingdom.
15
Liu Z, Li A, Schulz V, Chen M, Tuck D. MixHMM: inferring copy number variation and allelic imbalance using SNP arrays and tumor samples mixed with stromal cells. PLoS One 2010; 5:e10909. [PMID: 20532221 PMCID: PMC2879364 DOI: 10.1371/journal.pone.0010909] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 09/11/2009] [Accepted: 04/28/2010] [Indexed: 01/19/2023] Open
Abstract
Background: Genotyping platforms such as single nucleotide polymorphism (SNP) arrays are powerful tools to study genomic aberrations in cancer samples. Allele specific information from SNP arrays provides valuable information for interpreting copy number variation (CNV) and allelic imbalance including loss-of-heterozygosity (LOH) beyond that obtained from the total DNA signal available from array comparative genomic hybridization (aCGH) platforms. Several algorithms based on hidden Markov models (HMMs) have been designed to detect copy number changes and copy-neutral LOH making use of the allele information on SNP arrays. However, heterogeneity in clinical samples, due to stromal contamination and somatic alterations, complicates analysis and interpretation of these data. Methods: We have developed MixHMM, a novel hidden Markov model using hidden states based on chromosomal structural aberrations. MixHMM allows CNV detection for copy numbers up to 7 and allows more complete and accurate description of other forms of allelic imbalance, such as increased copy number LOH or imbalanced amplifications. MixHMM also incorporates a novel sample mixing model that allows detection of tumor CNV events in heterogeneous tumor samples, where cancer cells are mixed with a proportion of stromal cells. Conclusions: We validate MixHMM and demonstrate its advantages with simulated samples, clinical tumor samples and a dilution series of mixed samples. We have shown that the CNVs of cancer cells in a tumor sample contaminated with up to 80% of stromal cells can be detected accurately using Illumina BeadChip and MixHMM. Availability: The MixHMM is available as a Python package provided with some other useful tools at http://genecube.med.yale.edu:8080/MixHMM.
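To illustrate why stromal contamination complicates interpretation, the sketch below computes the idealized log R ratio and B-allele frequency expected at a heterozygous SNP when tumor cells are mixed with normal diploid cells. This is a generic linear mixing calculation, not the MixHMM model itself; the function name `expected_signals` and its parameters are assumptions made for the illustration, and real array signals are noisier and compressed relative to these values.

```python
import math


def expected_signals(tumor_cn, tumor_b, stromal_fraction):
    """Idealized (log R ratio, B-allele frequency) for a tumor/stroma mixture.

    tumor_cn         -- total copy number in the tumor cells (hypothetical input)
    tumor_b          -- number of B-allele copies in the tumor cells
    stromal_fraction -- proportion of normal cells, assumed diploid and AB
    """
    if not 0.0 <= stromal_fraction <= 1.0:
        raise ValueError("stromal_fraction must lie in [0, 1]")
    if not 0 <= tumor_b <= tumor_cn:
        raise ValueError("tumor_b must lie between 0 and tumor_cn")
    p = stromal_fraction
    # Average number of total and B-allele copies per cell in the mixture.
    total = 2.0 * p + tumor_cn * (1.0 - p)
    b_copies = 1.0 * p + tumor_b * (1.0 - p)
    if total == 0.0:
        # Pure tumor sample with a homozygous deletion: no signal at all.
        return float("-inf"), float("nan")
    return math.log2(total / 2.0), b_copies / total


# A hemizygous deletion (one A copy left) at 0%, 50% and 80% stroma.
for fraction in (0.0, 0.5, 0.8):
    lrr, baf = expected_signals(tumor_cn=1, tumor_b=0, stromal_fraction=fraction)
    print(f"stroma={fraction:.1f}  LRR={lrr:+.3f}  BAF={baf:.3f}")
```

As the stromal fraction rises, both signals drift toward the normal diploid values (log R ratio 0, B-allele frequency 0.5), which is why a detector that ignores mixing loses sensitivity in contaminated samples.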
Affiliation(s)
- Zongzhi Liu
- Department of Pathology, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Ao Li
- Department of Pathology, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Vincent Schulz
- Department of Pediatrics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Min Chen
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, United States of America
- David Tuck
- Department of Pathology, Yale University School of Medicine, New Haven, Connecticut, United States of America
16
Simpson JT, McIntyre RE, Adams DJ, Durbin R. Copy number variant detection in inbred strains from short read sequence data. Bioinformatics 2009; 26:565-7. [PMID: 20022973 PMCID: PMC2820678 DOI: 10.1093/bioinformatics/btp693] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
Abstract
Summary: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets. Availability: The software described in this manuscript, named cnD for copy number detector, is free and released under the GPL. The program is implemented in the D programming language using the Tango library. Source code and pre-compiled binaries are available at http://www.sanger.ac.uk/resources/software/cnd.html Contact: rd@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
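The idea of combining read density with the rate of apparent heterozygous SNPs in an HMM can be sketched with a small Viterbi decoder. The code below is not cnD (which is written in D); it is a toy model under stated assumptions: three states for relative copy number (0 = deletion, 1 = normal, 2 = duplication), independent Poisson emissions for per-window read depth and apparent-heterozygote counts, and illustrative rate values and a `stay_prob` parameter that do not come from the paper.

```python
import math


def poisson_logpmf(k, mu):
    """Log probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(depths, het_counts, mean_depth, stay_prob=0.99):
    """Most likely relative copy-number state (0, 1 or 2) for each window."""
    if not depths or len(depths) != len(het_counts):
        raise ValueError("depths and het_counts must be non-empty and equal in length")
    states = (0, 1, 2)
    # Assumed emission means: depth scales with copy number; apparent
    # heterozygous calls are rare unless diverged duplicate copies co-align.
    depth_mean = {0: 0.05 * mean_depth, 1: mean_depth, 2: 2.0 * mean_depth}
    het_mean = {0: 0.01, 1: 0.01, 2: 1.0}
    stay = math.log(stay_prob)
    switch = math.log((1.0 - stay_prob) / (len(states) - 1))

    def emit(state, depth, hets):
        return (poisson_logpmf(depth, depth_mean[state])
                + poisson_logpmf(hets, het_mean[state]))

    def move(prev, cur):
        return stay if prev == cur else switch

    score = {s: -math.log(len(states)) + emit(s, depths[0], het_counts[0])
             for s in states}
    backpointers = []
    for depth, hets in zip(depths[1:], het_counts[1:]):
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + move(r, s))
            new_score[s] = score[best_prev] + move(best_prev, s) + emit(s, depth, hets)
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In this toy setting a duplication is supported by two independent signals, roughly doubled depth and a burst of apparent heterozygous calls, so either signal alone has to be stronger to overcome the cost of leaving the normal state.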
17
Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res 2009; 19:1586-92. [PMID: 19657104 PMCID: PMC2752127 DOI: 10.1101/gr.092981.109] [Citation(s) in RCA: 401] [Impact Index Per Article: 26.7] [Received: 02/19/2009] [Accepted: 07/15/2009] [Indexed: 11/25/2022]
Abstract
Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM) as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms that typically operate by performing likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. Overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30x) on five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods.
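The interval-based testing described in this abstract can be illustrated with a simplified sketch. It is not the authors' EWT implementation: it assumes read depth has already been binned into equal windows and is roughly normal, it estimates the mean and standard deviation from the same data it tests, and it uses a Bonferroni-style per-window tail probability of `(fpr / (n / run_length)) ** (1 / run_length)`, which is an assumption made here for illustration. The function name `ewt_like_events` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ewt_like_events(depth, run_length, fpr=0.05):
    """Return (start, end, label) for runs of jointly extreme windows.

    A run is reported when at least `run_length` consecutive windows all
    exceed the same one-sided z-score cutoff; `end` is exclusive.
    """
    n = len(depth)
    if run_length < 1 or n < max(run_length, 2):
        return []
    mu, sigma = mean(depth), stdev(depth)
    if sigma == 0:
        return []
    z = [(d - mu) / sigma for d in depth]
    # Per-window tail probability chosen so that a whole run of run_length
    # windows is about as rare as fpr divided by the number of such intervals.
    per_window = (fpr / (n / run_length)) ** (1.0 / run_length)
    cutoff = NormalDist().inv_cdf(1.0 - per_window)

    events = []
    tests = (("gain", [v > cutoff for v in z]),
             ("loss", [v < -cutoff for v in z]))
    for label, hits in tests:
        start = None
        # The trailing False closes a run that reaches the last window.
        for i, flag in enumerate(hits + [False]):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                if i - start >= run_length:
                    events.append((start, i, label))
                start = None
    return sorted(events)
```

Because the cutoff is applied to every window in an interval rather than to single points, a longer run can use a more lenient per-window threshold while keeping the overall false-positive rate in check, which is the trade-off the abstract contrasts with point-wise segmentation.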
Affiliation(s)
- Seungtai Yoon
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Zhenyu Xuan
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Vladimir Makarov
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Kenny Ye
- Albert Einstein College of Medicine, Bronx, New York 10461, USA
- Jonathan Sebat
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
18
Wu LY, Chipman HA, Bull SB, Briollais L, Wang K. A Bayesian segmentation approach to ascertain copy number variations at the population level. Bioinformatics 2009; 25:1669-79. [PMID: 19389735 DOI: 10.1093/bioinformatics/btp270] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Motivation: Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential to understand the evolutionary process and population genetics, and to apply CNVs in population-based genome-wide association studies for complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share. Results: In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find the new approach to perform competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs. Availability: R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
Affiliation(s)
- Long Yang Wu
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
19
Cahan P, Li Y, Izumi M, Graubert TA. The impact of copy number variation on local gene expression in mouse hematopoietic stem and progenitor cells. Nat Genet 2009; 41:430-7. [PMID: 19270704 PMCID: PMC2728431 DOI: 10.1038/ng.350] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.0] [Received: 09/05/2008] [Accepted: 01/13/2009] [Indexed: 11/09/2022]
Abstract
The extent to which differences in germline DNA copy number contribute to natural phenotypic variation is unknown. We analyzed the copy number content of the mouse genome to sub-10-kb resolution. We identified over 1,300 copy number variant regions (CNVRs), most of which are <10 kb in length, are found in more than one strain, and, in total, span 3.2% (85 Mb) of the genome. To assess the potential functional impact of copy number variation, we mapped expression profiles of purified hematopoietic stem and progenitor cells, adipose tissue and hypothalamus to CNVRs in cis. Of the more than 600 significant associations between CNVRs and expression profiles, most map to CNVRs outside of the transcribed regions of genes. In hematopoietic stem and progenitor cells, up to 28% of strain-dependent expression variation is associated with copy number variation, supporting the role of germline CNVs as key contributors to natural phenotypic variation in the laboratory mouse.
Affiliation(s)
- Patrick Cahan
- Department of Internal Medicine, Division of Oncology, Stem Cell Biology Section, Washington University, St Louis, MO, USA
20
Graubert TA, Payton MA, Shao J, Walgren RA, Monahan RS, Frater JL, Walshauser MA, Martin MG, Kasai Y, Walter MJ. Integrated genomic analysis implicates haploinsufficiency of multiple chromosome 5q31.2 genes in de novo myelodysplastic syndromes pathogenesis. PLoS One 2009; 4:e4583. [PMID: 19240791 PMCID: PMC2642994 DOI: 10.1371/journal.pone.0004583] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 11/26/2008] [Accepted: 01/09/2009] [Indexed: 11/19/2022] Open
Abstract
Deletions spanning chromosome 5q31.2 are among the most common recurring cytogenetic abnormalities detectable in myelodysplastic syndromes (MDS). Prior genomic studies have suggested that haploinsufficiency of multiple 5q31.2 genes may contribute to MDS pathogenesis. However, this hypothesis has never been formally tested. Therefore, we designed this study to systematically and comprehensively evaluate all 28 chromosome 5q31.2 genes and directly test whether haploinsufficiency of a single 5q31.2 gene may result from a heterozygous nucleotide mutation or microdeletion. We selected paired tumor (bone marrow) and germline (skin) DNA samples from 46 de novo MDS patients (37 without a cytogenetic 5q31.2 deletion) and performed total exonic gene resequencing (479 amplicons) and array comparative genomic hybridization (CGH). We found no somatic nucleotide changes in the 46 MDS samples, and no cytogenetically silent 5q31.2 deletions in 20/20 samples analyzed by array CGH. Twelve novel single nucleotide polymorphisms were discovered. The mRNA levels of 7 genes in the commonly deleted interval were reduced by 50% in CD34+ cells from del(5q) MDS samples, and no gene showed complete loss of expression. Taken together, these data show that small deletions and/or point mutations in individual 5q31.2 genes are not common events in MDS, and implicate haploinsufficiency of multiple genes as the relevant genetic consequence of this common deletion.
Affiliation(s)
- Timothy A. Graubert
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Michelle A. Payton
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Jin Shao
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Richard A. Walgren
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Ryan S. Monahan
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- John L. Frater
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Mark A. Walshauser
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Mike G. Martin
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Yumi Kasai
- Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics & Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
- Matthew J. Walter
- Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri, United States of America
21
Li W, Lee A, Gregersen PK. Copy-number-variation and copy-number-alteration region detection by cumulative plots. BMC Bioinformatics 2009; 10 Suppl 1:S67. [PMID: 19208171 PMCID: PMC2648736 DOI: 10.1186/1471-2105-10-s1-s67] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022] Open
Abstract
Background: Regions with copy number variations (in germline cells) or copy number alteration (in somatic cells) are of great interest for human disease gene mapping and cancer studies. They represent a new type of mutation and are larger-scaled than the single nucleotide polymorphisms. Using genotyping microarray for copy number variation detection has become standard, and there is a need for improving analysis methods. Results: We apply the cumulative plot to the detection of regions with copy number variation/alteration, on samples taken from a chronic lymphocytic leukemia patient. Two sets of whole-genome genotyping of 317 k single nucleotide polymorphisms, one from the normal cell and another from the cancer cell, are analyzed. We demonstrate the utility of cumulative plot in detecting a 9 Mb (9 × 10^6 bases) hemizygous deletion and 1 Mb homozygous deletion on chromosome 13. We also show the possibility to detect smaller copy number variation/alteration regions below the 100 kb range. Conclusion: As a graphic tool, the cumulative plot is an intuitive and scale-free (window-less) way for detecting copy number variation/alteration regions, especially when such regions are small.
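A cumulative plot of this kind can be sketched in a few lines. The code below is an illustration, not the authors' procedure: it assumes the input is a list of per-SNP log intensity ratios ordered by genomic position, subtracts a baseline (the median by default, an assumption made here), and accumulates the deviations. The helper names `cumulative_curve` and `segment_slope` are hypothetical.

```python
from itertools import accumulate
from statistics import median


def cumulative_curve(log_ratios, baseline=None):
    """Running sum of (value - baseline); curve[i] covers the first i SNPs."""
    if not log_ratios:
        return [0.0]
    if baseline is None:
        baseline = median(log_ratios)
    return [0.0] + list(accumulate(x - baseline for x in log_ratios))


def segment_slope(curve, start, end):
    """Average deviation per SNP between curve positions start and end."""
    if not 0 <= start < end < len(curve):
        raise ValueError("need 0 <= start < end < len(curve)")
    return (curve[end] - curve[start]) / (end - start)


# Ten normal SNPs, a five-SNP loss, then ten normal SNPs.
values = [0.0] * 10 + [-0.5] * 5 + [0.0] * 10
curve = cumulative_curve(values)
print(segment_slope(curve, 0, 10))   # flat stretch before the loss
print(segment_slope(curve, 10, 15))  # steady downward slope across the loss
```

No window size is chosen anywhere: a copy number change shows up as a change in slope of the curve, however short the affected region, which is the sense in which the abstract calls the method window-less.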
Affiliation(s)
- Wentian Li
- The Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY 11030, USA.