1
Wang X, Li Y, Fang Z, Li Y. Elevated expression of NFE2L3 promotes the development of gastric cancer through epithelial-mesenchymal transformation. Bioengineered 2021; 12:12204-12214. [PMID: 34783304 PMCID: PMC8810066 DOI: 10.1080/21655979.2021.2005915]
Abstract
Gastric cancer (GC) is a malignant tumor with high mortality, but research on its molecular mechanisms remains limited. This study is the first to explore the biological role of the nuclear factor NFE2L3 (nuclear factor, erythroid 2 like 3) in GC. We used Western blot and RT–qPCR to detect gene expression at the protein and mRNA levels, respectively. Short hairpin RNA (shRNA) transfection was used to inhibit NFE2L3 expression. CCK-8 and colony formation assays were used to detect cell proliferation. Cell migration and invasion were detected by Transwell assays, and the cell cycle and apoptosis by flow cytometry. The results showed that NFE2L3 was highly expressed in gastric cancer tissues and promoted gastric cancer cell proliferation and metastasis. Inhibiting NFE2L3 expression blocked the cell cycle and increased the proportion of apoptotic cells, whereas NFE2L3 expression promoted the epithelial-mesenchymal transformation (EMT) process. In summary, NFE2L3 is highly expressed in gastric cancer and promotes gastric cancer cell proliferation, metastasis and the EMT process.
Affiliation(s)
- Xiaodong Wang
- Department of General Surgery, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, People's Republic of China
- Yaxian Li
- Department of General Surgery, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, People's Republic of China
- Ziqing Fang
- Department of General Surgery, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, People's Republic of China
- Yongxiang Li
- Department of General Surgery, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, People's Republic of China
2
Alshawaqfeh M, Al Kawam A, Serpedin E, Datta A. Robust Recurrent CNV Detection in the Presence of Inter-Subject Variability. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1056-1067. [PMID: 30387737 DOI: 10.1109/tcbb.2018.2878560]
Abstract
The study of recurrent copy number variations (CNVs) plays an important role in understanding the onset and evolution of complex diseases such as cancer. Array-based comparative genomic hybridization (aCGH) is a widely used microarray-based technology for identifying CNVs. However, due to high noise levels and inter-sample variability, detecting recurrent CNVs from aCGH data remains challenging. This paper proposes a novel method for identifying recurrent CNVs. In the proposed method, the noisy aCGH data are modeled as the superposition of three matrices: a full-rank matrix of weighted piece-wise generating signals accounting for the clean aCGH data, a Gaussian noise matrix to model the inherent experimental errors and other sources of error, and a sparse matrix to capture the sparse inter-sample (sample-specific) variations. We demonstrated the ability of our method to accurately separate recurrent CNVs from sample-specific variations and noise in both simulated (artificial) and real data. The proposed method produced more accurate results than current state-of-the-art methods for recurrent CNV detection and exhibited robustness to noise and sample-specific variations.
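The three-part decomposition described above (clean recurrent signal, Gaussian noise, sparse sample-specific variation) can be illustrated on toy data. This sketch is not the authors' method: it assumes simulated data with one shared gain, Gaussian noise and one sparse spike per sample (all parameters hypothetical), and uses a deliberately simple robust baseline, the per-probe median across samples, to estimate the recurrent part before thresholding residuals to recover the sparse part.

```python
import random
import statistics

def simulate_acgh(n_samples, n_probes, gain=(20, 40), noise_sd=0.1, spike=3.0, seed=0):
    """Hypothetical toy data: shared piecewise-constant gain + Gaussian noise + one sparse spike per sample."""
    rng = random.Random(seed)
    clean = [1.0 if gain[0] <= j < gain[1] else 0.0 for j in range(n_probes)]
    data, spikes = [], []
    for _ in range(n_samples):
        row = [clean[j] + rng.gauss(0.0, noise_sd) for j in range(n_probes)]
        k = rng.randrange(n_probes)  # position of this sample's private aberration
        row[k] += spike
        data.append(row)
        spikes.append(k)
    return clean, data, spikes

def separate(data, threshold):
    """Split samples x probes data into a recurrent profile and a sparse residual matrix.

    Recurrent part: per-probe median across samples (robust to a few outlying samples).
    Sparse part: residuals whose magnitude exceeds `threshold`; the rest is treated as noise.
    """
    n_probes = len(data[0])
    recurrent = [statistics.median(row[j] for row in data) for j in range(n_probes)]
    sparse = [[r - m if abs(r - m) > threshold else 0.0 for r, m in zip(row, recurrent)]
              for row in data]
    return recurrent, sparse
```

The median works here only because each probe carries a private aberration in few samples; matrix formulations like the one in the abstract are meant for settings where such a simple rule breaks down.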
3
Wiedenhoeft J, Cagan A, Kozhemyakina R, Gulevich R, Schliep A. Bayesian localization of CNV candidates in WGS data within minutes. Algorithms Mol Biol 2019; 14:20. [PMID: 31572486 PMCID: PMC6757390 DOI: 10.1186/s13015-019-0154-7] [Received: 09/11/2018] [Accepted: 08/08/2019]
Abstract
BACKGROUND Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice. RESULTS In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves on the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler. CONCLUSIONS Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, as well as a 191-fold decrease in minor page faults. We also observe that metrics varied greatly in the old implementation, but not the new one. We conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop.
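The sufficient-statistic queries mentioned above can be illustrated with a simpler structure than the one the paper proposes. In this sketch, the paper's logarithmic-time, wavelet-based structure is replaced by prefix sums of x and x², which answer count, sum and sum-of-squares queries for any half-open interval in constant time at the cost of two extra arrays. The class name `PrefixStats` is hypothetical.

```python
from itertools import accumulate

class PrefixStats:
    """Constant-time count / sum / sum-of-squares queries over half-open intervals [lo, hi)."""

    def __init__(self, xs):
        self.s1 = [0.0] + list(accumulate(xs))                 # s1[i] = x[0] + ... + x[i-1]
        self.s2 = [0.0] + list(accumulate(x * x for x in xs))  # s2[i] = x[0]^2 + ... + x[i-1]^2

    def query(self, lo, hi):
        """Sufficient statistics (n, sum, sum of squares) of x[lo:hi]."""
        return hi - lo, self.s1[hi] - self.s1[lo], self.s2[hi] - self.s2[lo]

    def mean_var(self, lo, hi):
        """Maximum-likelihood Gaussian mean and variance of x[lo:hi]; requires hi > lo."""
        n, s, ss = self.query(lo, hi)
        mean = s / n
        return mean, ss / n - mean * mean
```

Prefix sums of squares can lose floating-point precision on long inputs with large values, which is one reason more careful in-place transforms are attractive at genome scale.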
4
Toutounji H, Durstewitz D. Detecting Multiple Change Points Using Adaptive Regression Splines With Application to Neural Recordings. Front Neuroinform 2018; 12:67. [PMID: 30349472 PMCID: PMC6187984 DOI: 10.3389/fninf.2018.00067] [Received: 02/13/2018] [Accepted: 09/11/2018]
Abstract
Time series, as is frequently the case in neuroscience, are rarely stationary and often exhibit abrupt changes due to attractor transitions or bifurcations in the dynamical systems producing them. A plethora of methods for detecting such change points in time series statistics have been developed over the years, in addition to test criteria to evaluate their significance. Issues to consider when developing change point analysis methods include computational demands, difficulties arising from either a limited amount of data or a large number of covariates, and arriving at statistical tests with sufficient power to detect as many changes as are contained in potentially high-dimensional time series. Here, a general method called Paired Adaptive Regressors for Cumulative Sum is developed for detecting multiple change points in the mean of multivariate time series. The method's advantages over alternative approaches are demonstrated through a series of simulation experiments. This is followed by a real data application to neural recordings from rat medial prefrontal cortex during learning. Finally, the method's flexibility to incorporate useful features from state-of-the-art change point detection techniques is discussed, along with potential drawbacks and suggestions to remedy them.
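The cumulative-sum idea behind detecting changes in the mean can be sketched with classic CUSUM binary segmentation on a univariate series. This is a swapped-in textbook technique, not the paired adaptive regressors method described above (which handles multivariate series); the function names and the fixed `threshold` are illustrative assumptions.

```python
import math

def cusum_split(x, min_size=1):
    """Return (k, stat) maximizing the CUSUM statistic for a mean shift between x[:k] and x[k:].

    stat(k) = sqrt(k * (n - k) / n) * |mean(x[:k]) - mean(x[k:])|; requires min_size >= 1.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best_k, best_stat = None, 0.0
    for k in range(min_size, n - min_size + 1):
        mean_left = prefix[k] / k
        mean_right = (total - prefix[k]) / (n - k)
        stat = math.sqrt(k * (n - k) / n) * abs(mean_left - mean_right)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

def binary_segmentation(x, threshold, min_size=1, offset=0):
    """Recursively split x wherever the CUSUM statistic exceeds `threshold`; returns change point indices."""
    if len(x) < 2 * min_size:
        return []
    k, stat = cusum_split(x, min_size)
    if k is None or stat <= threshold:
        return []
    return (binary_segmentation(x[:k], threshold, min_size, offset)
            + [offset + k]
            + binary_segmentation(x[k:], threshold, min_size, offset + k))
```

On noisy data the threshold is normally tied to the noise level; the significance testing discussed in the abstract is not implemented here.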
Affiliation(s)
- Hazem Toutounji
- Department of Theoretical Neuroscience, Medical Faculty Mannheim, Bernstein Center for Computational Neuroscience, Central Institute of Mental Health, Heidelberg University, Mannheim, Germany
- Daniel Durstewitz
- Department of Theoretical Neuroscience, Medical Faculty Mannheim, Bernstein Center for Computational Neuroscience, Central Institute of Mental Health, Heidelberg University, Mannheim, Germany
- Faculty of Physics and Astronomy, Heidelberg University, Heidelberg, Germany
5
Malekpour SA, Pezeshk H, Sadeghi M. MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples. Sci Rep 2018; 8:4009. [PMID: 29507384 PMCID: PMC5838159 DOI: 10.1038/s41598-018-22323-8] [Received: 05/12/2017] [Accepted: 02/16/2018]
Abstract
Currently, a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for CNV detection based on multiple samples, most current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) for detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with an aberrant insertion size and a Poisson distribution for emitting the read counts at each genomic position. MSeq-CNV is applied to simulated data and to real data of six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. The ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied to detect CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
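The per-component emission described above (a Binomial for mate pairs with aberrant insertion size times a Poisson for read counts) can be written as a mixture log-likelihood. In this minimal sketch the component weights, Poisson rates and Binomial probabilities are hypothetical inputs; how MSeq-CNV parameterizes and fits them across samples is not reproduced.

```python
import math

def log_poisson(k, lam):
    """log P(K = k) for K ~ Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_binomial(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_loglik(read_count, n_pairs, n_aberrant, components):
    """Log-likelihood of one genomic position under a mixture whose components each
    multiply a Poisson term (read count) by a Binomial term (aberrant mate pairs).

    components: list of (weight, poisson_rate, binomial_p); all values hypothetical.
    """
    terms = [math.log(w) + log_poisson(read_count, lam) + log_binomial(n_aberrant, n_pairs, p)
             for w, lam, p in components]
    return log_sum_exp(terms)
```

Working in log space with `log_sum_exp` keeps the computation stable when read counts are large and individual probabilities underflow.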
Affiliation(s)
- Seyed Amir Malekpour
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.
- School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
- Department of Mathematics and Statistics, Concordia University, Montreal, Canada.
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
6
Abstract
CNV detection requires a high-quality segmentation of genomic data. In many WGS experiments, sample and control are sequenced together in a multiplexed fashion using DNA barcoding for economic reasons. Using the differential read depth of these two conditions cancels out systematic additive errors. Due to this detrending, the resulting data are appropriate for inference using a hidden Markov model (HMM), arguably one of the principal models for labeled segmentation. However, while the usual frequentist approaches such as Baum-Welch are problematic for several reasons, they are often preferred to Bayesian HMM inference, which normally requires prohibitively long running times and exceeds a typical user's computational resources on genome-scale data. HaMMLET solves this problem using a dynamic wavelet compression scheme, which makes Bayesian segmentation of WGS data feasible on standard consumer hardware.
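To make Bayesian HMM segmentation concrete, here is a minimal forward-filtering backward-sampling (FFBS) step, which draws one hidden state path from its posterior given fixed parameters; this is the inner step of Forward-Backward Gibbs sampling. The sketch assumes Gaussian emissions with a shared `sigma` and known means, transitions and initial distribution (all hypothetical), and omits parameter resampling and HaMMLET's wavelet compression.

```python
import math
import random

def gauss_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2 * sigma * sigma)

def ffbs(obs, means, sigma, trans, init, rng):
    """Draw one state path from P(states | obs, parameters) for a Gaussian-emission HMM.

    Forward pass: normalized filtering distributions alpha_t(k) = P(s_t = k | obs[:t+1]).
    Backward pass: sample s_T from alpha_T, then s_t with weights alpha_t(j) * trans[j][s_{t+1}].
    """
    K = len(means)
    alphas, prev = [], None
    for t, x in enumerate(obs):
        pred = init if t == 0 else [sum(prev[j] * trans[j][k] for j in range(K)) for k in range(K)]
        logs = [gauss_logpdf(x, means[k], sigma) + math.log(pred[k]) if pred[k] > 0 else -math.inf
                for k in range(K)]
        m = max(logs)
        a = [math.exp(v - m) for v in logs]  # subtract the max to avoid underflow
        z = sum(a)
        prev = [v / z for v in a]
        alphas.append(prev)
    states = [0] * len(obs)
    states[-1] = rng.choices(range(K), weights=alphas[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        w = [alphas[t][j] * trans[j][states[t + 1]] for j in range(K)]
        states[t] = rng.choices(range(K), weights=w)[0]
    return states
```

A full Gibbs sampler would alternate this draw with sampling the means, variance and transition probabilities given the sampled path; the per-observation cost of that loop is what compression schemes aim to reduce.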
Affiliation(s)
- John Wiedenhoeft
- Chalmers University of Technology, Gothenburg, Sweden.
- Rutgers University, New Brunswick, NJ, USA.
7
Fan Z, Mackey L. Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences. Ann Appl Stat 2017. [DOI: 10.1214/17-aoas1075]
8
Malekpour SA, Pezeshk H, Sadeghi M. PSE-HMM: genome-wide CNV detection from NGS data using an HMM with Position-Specific Emission probabilities. BMC Bioinformatics 2016; 18:30. [PMID: 27809781 PMCID: PMC5445519 DOI: 10.1186/s12859-016-1296-y] [Received: 05/22/2016] [Accepted: 10/20/2016]
Abstract
Background Copy Number Variation (CNV) is envisaged to be a major source of large structural variations in the human genome. In recent years, many studies have applied Next Generation Sequencing (NGS) data to CNV detection. However, more accurate computational tools are still needed. Results In this study, mate pair NGS data are used for CNV detection in a Hidden Markov Model (HMM). The proposed HMM has position-specific emission probabilities, i.e., a Gaussian mixture distribution. Each component in the Gaussian mixture distribution captures a different type of aberration observed in the mate pairs after they are mapped to the reference genome. These aberrations may include any increase or decrease in the insertion size or a change in the direction of mapped mate pairs. This HMM with Position-Specific Emission probabilities (PSE-HMM) is used for the genome-wide detection of deletions and tandem duplications. The performance of PSE-HMM is evaluated on a simulated dataset and on real data from a Yoruban HapMap individual, NA18507. Conclusions PSE-HMM is effective in taking observation dependencies into account and achieves high accuracy in detecting genome-wide CNVs. MATLAB programs are available at http://bs.ipm.ir/softwares/PSE-HMM/.
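A state-dependent Gaussian mixture emission of the kind described above can be decoded with the Viterbi algorithm. This sketch assumes one observation per window (for example, a mean insertion size), two hypothetical states (normal and deletion) and made-up mixture parameters; unlike PSE-HMM, its emissions depend only on the state, not on genomic position.

```python
import math

def gmm_logpdf(x, components):
    """log sum_c w_c * N(x; mu_c, sd_c^2) for components = [(w, mu, sd), ...]."""
    terms = [math.log(w) - 0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
             for w, mu, sd in components]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def viterbi(obs, state_mixtures, log_trans, log_init):
    """Most probable state path for an HMM whose state k emits from the mixture state_mixtures[k]."""
    K = len(state_mixtures)
    delta = [log_init[k] + gmm_logpdf(obs[0], state_mixtures[k]) for k in range(K)]
    backptrs = []
    for x in obs[1:]:
        ptr, new_delta = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: delta[j] + log_trans[j][k])
            ptr.append(best_j)
            new_delta.append(delta[best_j] + log_trans[best_j][k] + gmm_logpdf(x, state_mixtures[k]))
        backptrs.append(ptr)
        delta = new_delta
    path = [max(range(K), key=lambda k: delta[k])]
    for ptr in reversed(backptrs):  # follow back-pointers from the last position
        path.append(ptr[path[-1]])
    return path[::-1]
```

Each mixture component plays the role of one aberration type; a deletion state, for instance, can mix several enlarged insertion-size modes while the normal state keeps a single mode.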
Affiliation(s)
- Seyed Amir Malekpour
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, 14155-6455, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, 14155-6455, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
9
Chi C, Ajwad R, Kuang Q, Hu P. A Novel Graph-based Algorithm to Infer Recurrent Copy Number Variations in Cancer. Cancer Inform 2016; 15:43-50. [PMID: 27773988 PMCID: PMC5063805 DOI: 10.4137/cin.s39368] [Received: 06/02/2016] [Revised: 09/08/2016] [Accepted: 09/09/2016]
Abstract
Many cancers have been linked to copy number variations (CNVs) in the genomic DNA. Although there are existing methods to analyze CNVs from individual samples, cancer-causing genes are more frequently discovered in regions where CNVs are common among tumor samples, also known as recurrent CNVs. Integrating multiple samples and locating recurrent CNV regions remain a challenge, both computationally and conceptually. We propose a new graph-based algorithm for identifying recurrent CNVs using the maximal clique detection technique. The algorithm has an optimal solution, which means all maximal cliques can be identified, and guarantees that the identified CNV regions are the most frequent and that the minimal regions have been delineated among tumor samples. The algorithm has successfully been applied to analyze a large cohort of breast cancer samples and identified some breast cancer-associated genes and pathways.
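The maximal-clique idea can be sketched on an interval overlap graph: each CNV segment (from any sample) is a node, and two nodes are adjacent when their segments overlap. The code below uses the standard Bron-Kerbosch algorithm with pivoting; the input format `(sample_id, start, end)` with half-open coordinates is an assumption, and the paper's exact graph construction and scoring may differ.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def bron_kerbosch(r, p, x, adj, out):
    """Bron-Kerbosch with pivoting: appends every maximal clique (as a set of node ids) to out."""
    if not p and not x:
        out.append(r)
        return
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def recurrent_regions(segments):
    """segments: list of (sample_id, start, end).

    Returns sorted ((region_start, region_end), sample_ids) pairs, one per maximal clique.
    """
    n = len(segments)
    spans = [(s, e) for _, s, e in segments]
    adj = {i: {j for j in range(n) if j != i and overlaps(spans[i], spans[j])} for i in range(n)}
    cliques = []
    bron_kerbosch(set(), set(range(n)), set(), adj, cliques)
    result = []
    for clique in cliques:
        start = max(spans[i][0] for i in clique)   # common region shared by all members
        end = min(spans[i][1] for i in clique)
        samples = sorted({segments[i][0] for i in clique})
        result.append(((start, end), samples))
    return sorted(result)
```

Because pairwise-overlapping intervals on a line always share a common sub-interval, each clique yields a non-empty region, and the number of distinct samples in it can serve as a recurrence count.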
Affiliation(s)
- Chen Chi
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada; Centre for Healthcare Innovation, Winnipeg Regional Health Authority/University of Manitoba, Winnipeg, Canada
- Rasif Ajwad
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada; Department of Computer Science, University of Manitoba, Winnipeg, Canada
- Qin Kuang
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Canada; Centre for Healthcare Innovation, Winnipeg Regional Health Authority/University of Manitoba, Winnipeg, Canada; Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Canada
10
Nieto-Barajas L, Ji Y, Baladandayuthapani V. A semiparametric Bayesian model for comparing DNA copy numbers. Braz J Probab Stat 2016; 30:345-365. [PMID: 37799327 PMCID: PMC10552905 DOI: 10.1214/15-bjps283]
Abstract
We propose a two-step method for the analysis of copy number data. We first define the partitions of genome aberrations and, conditional on the partitions, introduce a semiparametric Bayesian model for the analysis of multiple samples from patients with different subtypes of a disease. While the biological interest is to identify regions of differential copy numbers across disease subtypes, our model also includes sample-specific random effects that account for copy number alterations between different samples in the same disease subtype. We model the subtype and sample-specific effects using a random effects mixture model. The subtype main effects are characterized by a mixture distribution whose components are assigned Dirichlet process priors. The performance of the proposed model is examined using simulated data as well as a breast cancer genomic data set.
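Dirichlet process priors such as those mentioned above are often sampled with a truncated stick-breaking construction. This sketch draws DP weights and atoms under an assumed truncation level and a hypothetical base distribution; it shows the prior only, not the paper's random effects mixture model or its posterior inference.

```python
import random

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process DP(alpha, G0)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)            # last piece takes the rest, so weights sum to 1
    return weights

def sample_dp_atoms(alpha, truncation, base_sampler, rng):
    """Pairs (weight, atom) with atoms drawn i.i.d. from the base distribution G0."""
    weights = stick_breaking(alpha, truncation, rng)
    return [(w, base_sampler(rng)) for w in weights]
```

For example, `sample_dp_atoms(2.0, 20, lambda r: r.gauss(0.0, 1.0), random.Random(0))` draws 20 weighted atoms from a standard normal base distribution; smaller `alpha` concentrates the mass on fewer atoms.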
Affiliation(s)
- Luis Nieto-Barajas
- Department of Statistics, ITAM, Rio Hondo 1, Progreso Tizapan, 01080 Mexico, D.F. Mexico
- Yuan Ji
- Biomedical Informatics, NorthShore University HealthSystem and University of Chicago, 1001 University Place, Evanston, Illinois 60201, USA
11
Malekpour SA, Pezeshk H, Sadeghi M. MGP-HMM: Detecting genome-wide CNVs using an HMM for modeling mate pair insertion sizes and read counts. Math Biosci 2016; 279:53-62. [PMID: 27424951 DOI: 10.1016/j.mbs.2016.07.006] [Received: 04/26/2016] [Revised: 06/12/2016] [Accepted: 07/10/2016]
Abstract
MOTIVATION The association of Copy Number Variation (CNV) with schizophrenia, autism, developmental disabilities and fatal diseases such as cancer has been verified. Recent developments in Next Generation Sequencing (NGS) have facilitated CNV studies. However, many current CNV detection tools cannot discriminate tandem from non-tandem duplications. RESULTS In this study, we propose MGP-HMM, a tool which, besides detecting genome-wide deletions, discriminates tandem duplications from non-tandem duplications. MGP-HMM takes mate pair abnormalities into account and predicts the digitized number of tandem or non-tandem copies. Abnormalities in mate pair directions and insertion sizes, after mapping to the reference genome, are modeled using a Hidden Markov Model (HMM). For this purpose, a Gaussian mixture density with time-dependent parameters is applied for emitting mate pair insertion sizes from HMM states. Depending on the observed abnormalities in mate pair insertion size or orientation, each component in the mixture density has different parameters. MGP-HMM also applies a Poisson distribution for modeling read depth data. This parametric modeling of mate pair reads enables precise estimation of CNV lengths, which is an advantage over methods that rely only on read depth for CNV detection. The hidden state of the proposed HMM is the digitized copy number of a genomic segment, and states correspond to the multipliers of the Gaussian mixture components. The accuracy of our model is validated on real and simulated next generation sequencing data and compared with other tools.
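The read-depth part of such a model can be illustrated with a posterior over integer (digitized) copy numbers for a single window. This is a hypothetical simplification, not MGP-HMM: it assumes a Poisson read count whose rate is proportional to copy number (with an arbitrary small floor for copy number 0), a uniform prior, and no mate pair information or HMM coupling between windows.

```python
import math

def copy_number_posterior(read_count, rate_per_copy, max_copies=4, zero_copy_floor=0.01):
    """Posterior P(c | read_count) for c = 0..max_copies under a uniform prior.

    Hypothetical model: read_count ~ Poisson(max(c, zero_copy_floor) * rate_per_copy).
    The floor keeps copy number 0 from having a zero rate (a stand-in for mismapped reads).
    """
    log_post = []
    for c in range(max_copies + 1):
        lam = max(c, zero_copy_floor) * rate_per_copy
        log_post.append(read_count * math.log(lam) - lam - math.lgamma(read_count + 1))
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]  # uniform prior cancels in the normalization
    total = sum(weights)
    return [w / total for w in weights]
```

With 20 reads expected per copy, a window with 60 reads puts most of its posterior mass on copy number 3; an HMM would additionally smooth these per-window posteriors along the genome.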
Affiliation(s)
- Seyed Amir Malekpour
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran.
12
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. PLoS Comput Biol 2016; 12:e1004871. [PMID: 27177143 PMCID: PMC4866742 DOI: 10.1371/journal.pcbi.1004871] [Received: 11/26/2015] [Accepted: 03/14/2016]
Abstract
By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings.
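The compression principle can be sketched with a plain orthonormal Haar wavelet transform: detail coefficients below a threshold are zeroed, and runs of equal values in the reconstruction form blocks that can be treated as single observations. This assumes a fixed threshold and an input length that is a power of two; HaMMLET instead adapts the compression dynamically during sampling, so this only illustrates the idea.

```python
import math

def haar_forward(x):
    """Full orthonormal Haar DWT; len(x) must be a power of two. coeffs[0] holds the finest details."""
    coeffs, approx = [], list(x)
    while len(approx) > 1:
        half = len(approx) // 2
        coeffs.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
    return approx, coeffs

def haar_inverse(approx, coeffs):
    a = list(approx)
    for d in reversed(coeffs):  # coarsest level first
        nxt = []
        for ai, di in zip(a, d):
            nxt.append((ai + di) / math.sqrt(2))
            nxt.append((ai - di) / math.sqrt(2))
        a = nxt
    return a

def compress(x, threshold):
    """Hard-threshold all detail coefficients and reconstruct a piecewise-constant signal."""
    approx, coeffs = haar_forward(x)
    kept = [[c if abs(c) > threshold else 0.0 for c in d] for d in coeffs]
    return haar_inverse(approx, kept)

def blocks(y, tol=1e-9):
    """Half-open index ranges over which y is constant (up to tol)."""
    out, start = [], 0
    for i in range(1, len(y)):
        if abs(y[i] - y[i - 1]) > tol:
            out.append((start, i))
            start = i
    out.append((start, len(y)))
    return out
```

Each block only needs its sufficient statistics during sampling, which is where the running-time savings come from when long stretches share a copy number.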
13
Jang H, Hur Y, Lee H. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data. Sci Rep 2016; 6:25582. [PMID: 27156852 PMCID: PMC4860638 DOI: 10.1038/srep25582] [Received: 01/22/2016] [Accepted: 04/19/2016]
Abstract
DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods for SNP arrays cannot be directly applied to NGS data because of the different characteristics of NGS, such as higher resolution without predefined probes and reads incorrectly mapped to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations containing candidate cancer-related genes as well as previously known cancer-driver genes.
Affiliation(s)
- Ho Jang
- Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science, Gwangju, 500-712, South Korea
- Youngmi Hur
- Yonsei University, Department of Mathematics, Seoul, 120-749, South Korea
- Hyunju Lee
- Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science, Gwangju, 500-712, South Korea
14
15
Gao X. Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations. BMC Bioinformatics 2015; 16:407. [PMID: 26652207 PMCID: PMC4676147 DOI: 10.1186/s12859-015-0835-2] [Received: 09/22/2015] [Accepted: 11/23/2015]
Abstract
BACKGROUND Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With the increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected for multiple arrays. Recurrent and individual-specific CNVs naturally co-exist in such data, together with possible contamination introduced during the data generation process. Therefore, there is a great need for an efficient and robust statistical model for simultaneous recovery of both recurrent and individual-specific CNVs. RESULT We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we formulate multiple aCGH arrays as a realization of a hidden low-rank matrix with some random noise and let an additional weight matrix account for the individual-specific effects. Thus, we do not restrict the random noise to be normally distributed, or even homogeneous. We show its performance through three real datasets and twelve synthetic datasets from different types of recurrent CNV regions associated with either normal random errors or heavily contaminated errors. CONCLUSION Our numerical experiments demonstrate that WPLA can successfully recover the recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs best in simultaneously detecting both recurrent and individual-specific CNVs under normal random errors. More importantly, WPLA is the only method that can effectively recover the recurrent CNV regions when the data are heavily contaminated.
Affiliation(s)
- Xiaoli Gao
- Department of Mathematics and Statistics, University of North Carolina at Greensboro, 1400 Spring Garden St, Greensboro, NC, USA.
16
Zhou X, Liu J, Wan X, Yu W. Piecewise-constant and low-rank approximation for identification of recurrent copy number variations. Bioinformatics 2014; 30:1943-9. [PMID: 24642062 DOI: 10.1093/bioinformatics/btu131]
Abstract
MOTIVATION The post-genome era sees an urgent need for novel approaches to extracting useful information from the huge amount of genetic data. The identification of recurrent copy number variations (CNVs) from array-based comparative genomic hybridization (aCGH) data can help in understanding complex diseases, such as cancer. Most previous computational methods focused on single-sample analysis or on statistical testing based on the results of single-sample analysis. Finding recurrent CNVs from multi-sample data remains a challenging topic worth further study. RESULTS We present a general and robust method to identify recurrent CNVs from multi-sample aCGH profiles. We express the raw dataset as a matrix and demonstrate that recurrent CNVs will form a low-rank matrix. Hence, we formulate the problem as a matrix recovering problem, where we aim to find a piecewise-constant and low-rank approximation (PLA) to the input matrix. We propose a convex formulation for matrix recovery and an efficient algorithm to globally solve the problem. We demonstrate the advantages of PLA compared with alternative methods using synthesized datasets and two breast cancer datasets. The experimental results show that PLA can successfully reconstruct the recurrent CNV patterns from raw data and achieve better performance compared with alternative methods under a wide range of scenarios. AVAILABILITY AND IMPLEMENTATION The MATLAB code is available at http://bioinformatics.ust.hk/pla.zip.
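The claim that recurrent CNVs form a low-rank matrix can be illustrated with a rank-1 approximation computed by power iteration. This sketch shows only the low-rank intuition: PLA's convex formulation additionally enforces piecewise constancy and is solved with a different algorithm. The toy matrix (one shared CNV profile scaled per sample) is hypothetical.

```python
import math

def rank1_approx(M, n_iter=100):
    """Rank-1 approximation sigma * u v^T of a nonzero matrix M (list of rows) via power iteration.

    At convergence this is the best rank-1 approximation (top singular triple).
    """
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols  # start vector; assumed not orthogonal to the top right singular vector
    for _ in range(n_iter):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = M v
        norm_u = math.sqrt(sum(a * a for a in u))
        u = [a / norm_u for a in u]
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = M^T u
        sigma = math.sqrt(sum(b * b for b in v))
        v = [b / sigma for b in v]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
```

With noisy multi-sample data, the low-rank reconstruction keeps the shared profile and discards most sample-specific deviations; real data with several recurrent CNVs would need a higher rank.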
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Jiming Liu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Xiang Wan
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Weichuan Yu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
17
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119]
18
van Dyk E, Reinders MJT, Wessels LFA. A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control. Nucleic Acids Res 2013; 41:e100. [PMID: 23476020 PMCID: PMC3643574 DOI: 10.1093/nar/gkt155]
Abstract
Tumor formation is partially driven by DNA copy number changes, which are typically measured using array comparative genomic hybridization, SNP arrays and DNA sequencing platforms. Many techniques are available for detecting recurring aberrations across multiple tumor samples, including CMAR, STAC, GISTIC and KC-SMART. GISTIC is widely used and detects both broad and focal (potentially overlapping) recurring events. However, GISTIC performs false discovery rate control on probes instead of events. Here we propose Analytical Multi-scale Identification of Recurrent Events (ADMIRE), a multi-scale Gaussian smoothing approach, for the detection of both broad and focal (potentially overlapping) recurring copy number alterations. Importantly, false discovery rate control is performed analytically (no need for permutations) on events rather than probes. The method does not require segmentation or calling on the input dataset and therefore reduces the potential loss of information due to discretization. An important characteristic of the approach is that the error rate is controlled across all scales and that the algorithm outputs a single profile of significant events selected from the appropriate scales. We perform extensive simulations and showcase its utility on a glioblastoma SNP array dataset. Importantly, ADMIRE detects focal events that are missed by GISTIC, including two events involving known glioma tumor-suppressor genes: CDKN2C and NF1.
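The multi-scale smoothing step can be sketched by averaging copy number profiles across samples and convolving the result with Gaussian kernels at several scales. The sketch assumes profiles on a common probe grid, requires every scale `sigma` to be positive, and renormalizes the kernel at the ends of the profile; ADMIRE's analytical false discovery rate control and its selection of significant events across scales are not implemented.

```python
import math

def gaussian_smooth(x, sigma):
    """Convolve x with a Gaussian kernel (truncated at 3 sigma), renormalizing near the edges."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n, out = len(x), []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                num += w * x[j]
                den += w
        out.append(num / den)
    return out

def multiscale_profiles(samples, scales):
    """Mean copy number profile across samples, smoothed at each scale (dict: sigma -> profile)."""
    n = len(samples[0])
    mean_profile = [sum(s[i] for s in samples) / len(samples) for i in range(n)]
    return {sigma: gaussian_smooth(mean_profile, sigma) for sigma in scales}
```

Small scales keep focal events sharp while large scales accumulate evidence for broad events, which is why results from several scales have to be combined.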
Affiliation(s)
- Ewald van Dyk
- Bioinformatics and Statistics group, Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
19
Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
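The abstract above states the optimization problem only in words. Written out, one plausible form is the following, where $Y$ is the probes-by-samples aCGH matrix ($n$ probes, $m$ samples) and $X$ is its approximation; the squared Frobenius data-fit term and the tuning weights $\lambda$ and $\mu$ are assumptions for illustration, not taken from the paper:

$$
\hat{X} = \arg\min_{X}\; \frac{1}{2}\lVert Y - X \rVert_F^2 \;+\; \lambda \sum_{j=1}^{m} \sum_{i=1}^{n-1} \lvert X_{i+1,j} - X_{i,j} \rvert \;+\; \mu \lVert X \rVert_*,
\qquad \lVert X \rVert_* = \sum_{k} \sigma_k(X)
$$

The middle term is the total variation of each sample (column), which favors piecewise-constant copy number profiles; the nuclear norm, the sum of the singular values $\sigma_k(X)$, favors a low-rank matrix, which is how correlation among samples enters. Each term is convex, so the whole problem is convex.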
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
20
Abstract
This chapter summarizes current knowledge of gene copy number changes found in lung tumors and their application to diagnosis, prognostication, and prediction of response to chemotherapy. Examples of identifying specific "driver" oncogenes within amplified DNA segments are described. A model for integrating array-CGH into the routine clinical-laboratory workup of lung cancers is proposed.
Affiliation(s)
- Kenneth J Craddock
- Department of Pathology, Toronto General Hospital University Health Network, Toronto, ON, Canada.
21
Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice because very few efforts have focused on comparing them. To facilitate studies of recurrent CNA identification in cancer genomes, a comprehensive comparison of the performance and limitations of existing methods is needed. In this paper, six representative methods proposed in the past six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data, respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We use multiple criteria, including type I error rate, detection power, the Receiver Operating Characteristic (ROC) curve and the area under the curve (AUC), and computational complexity, to evaluate the methods under multiple simulation scenarios. We also characterize their performance on two real datasets from lung adenocarcinoma and glioblastoma. This comparison reveals general characteristics of existing methods for identifying recurrent CNAs and provides new insights into their strengths and weaknesses, which we expect will help accelerate the development of novel and improved methods.
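The evaluation criteria listed above are straightforward to compute once each method has produced a p-value or score per candidate region and the simulated ground truth is known. A minimal sketch, assuming a fixed significance level and ignoring how each method defines its regions; the function names are hypothetical:

```python
def type1_error_and_power(pvalues, is_true_event, alpha=0.05):
    """Fraction of null regions called significant (type I error rate) and
    fraction of true events called significant (power) at level alpha."""
    null = [p for p, truth in zip(pvalues, is_true_event) if not truth]
    alt = [p for p, truth in zip(pvalues, is_true_event) if truth]
    type1 = sum(p <= alpha for p in null) / len(null)
    power = sum(p <= alpha for p in alt) / len(alt)
    return type1, power


def auc(scores_true, scores_null):
    """Area under the ROC curve, computed as the probability that a random true
    event scores higher than a random null region (ties count one half)."""
    wins = 0.0
    for a in scores_true:
        for b in scores_null:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_true) * len(scores_null))
```

This pairwise form of the AUC equals the Mann-Whitney U statistic divided by the product of the two group sizes. It costs O(n·m) comparisons, which is adequate for small simulation studies.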
22
Nilsen G, Liestøl K, Van Loo P, Moen Vollan HK, Eide MB, Rueda OM, Chin SF, Russell R, Baumbusch LO, Caldas C, Børresen-Dale AL, Lingjaerde OC. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 2012; 13:591. [PMID: 23442169 PMCID: PMC3582591 DOI: 10.1186/1471-2164-13-591] [Citation(s) in RCA: 191] [Impact Index Per Article: 15.9] [Received: 05/16/2012] [Accepted: 10/15/2012] [Indexed: 12/15/2022]
Abstract
BACKGROUND Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, the emergence of very high density copy number scans creates a demand for highly computationally efficient segmentation algorithms. RESULTS A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single-sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented. CONCLUSIONS The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
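The penalized least squares principle described above can be illustrated with a generic dynamic program (optimal partitioning) that minimizes the residual sum of squares plus a fixed penalty per breakpoint. This is a single-track sketch, not the algorithm implemented in the copynumber package, which the abstract describes as a novel and more efficient method; `segment` and `penalty` are hypothetical names, and the run time here is quadratic in the number of probes.

```python
import math


def segment(y, penalty):
    """Piecewise-constant fit of y minimizing RSS + penalty * (number of breakpoints).
    Returns a list of (start, end_exclusive, segment_mean)."""
    n = len(y)
    # Prefix sums give the residual sum of squares of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def rss(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] is the optimal penalized cost of y[:j]. Starting at -penalty makes the
    # first segment free, so the penalty counts breakpoints rather than segments.
    best = [-penalty] + [math.inf] * n
    last_start = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + rss(i, j) + penalty
            if cand < best[j]:
                best[j], last_start[j] = cand, i

    # Walk back through the stored segment starts.
    segments, j = [], n
    while j > 0:
        i = last_start[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    return segments[::-1]
```

A larger penalty trades fit for fewer breakpoints: with a very large penalty the whole track is returned as a single segment at its overall mean.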
Affiliation(s)
- Gro Nilsen
- Biomedical Informatics, Dept of Informatics, University of Oslo, Oslo, Norway
23
Yuan X, Yu G, Hou X, Shih IM, Clarke R, Zhang J, Hoffman EP, Wang RR, Zhang Z, Wang Y. Genome-wide identification of significant aberrations in cancer genome. BMC Genomics 2012; 13:342. [PMID: 22839576 PMCID: PMC3428679 DOI: 10.1186/1471-2164-13-342] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 02/01/2012] [Accepted: 07/27/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme. RESULTS We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the Receiver Operating Characteristics curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor related genes and may warrant further studies. CONCLUSIONS Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. 
Open-source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at http://www.cbil.ece.vt.edu/software.htm.
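The second feature listed above, permuting whole CNA units instead of single probes, can be illustrated with a small permutation test. This sketch makes strong simplifying assumptions: each sample is already summarized as an ordered list of (length, value) units, the test statistic is simply the maximum across-sample sum at any probe, and SAIC's unit scoring and iterative SCA-exclusive permutation scheme are not reproduced. All names are hypothetical.

```python
import random


def expand(units):
    """Turn an ordered list of (length, value) CNA units into probe-level values."""
    probes = []
    for length, value in units:
        probes.extend([value] * length)
    return probes


def max_recurrence(cohort):
    """Largest across-sample sum of values at any single probe."""
    vectors = [expand(units) for units in cohort]
    return max(sum(column) for column in zip(*vectors))


def unit_permutation_pvalue(cohort, n_perm=1000, seed=0):
    """Empirical p-value of the observed max recurrence under a null that shuffles
    the order of whole units within each sample (values inside a unit stay together)."""
    rng = random.Random(seed)
    observed = max_recurrence(cohort)
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for units in cohort:
            order = list(units)
            rng.shuffle(order)
            shuffled.append(order)
        if max_recurrence(shuffled) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Because each unit moves as a block, the null distribution keeps the correlation between neighbouring probes inside a unit, which shuffling individual probes would destroy.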
Affiliation(s)
- Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, P R China
24
Yuan X, Zhang J, Yang L, Zhang S, Chen B, Geng Y, Wang Y. TAGCNA: a method to identify significant consensus events of copy number alterations in cancer. PLoS One 2012; 7:e41082. [PMID: 22815924 PMCID: PMC3399811 DOI: 10.1371/journal.pone.0041082] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/03/2012] [Accepted: 06/17/2012] [Indexed: 01/20/2023]
Abstract
Somatic copy number alteration (CNA) is a common phenomenon in cancer genomes. Distinguishing significant consensus events (SCEs) from random background CNAs in a set of subjects has proven to be a valuable tool for studying cancer. To identify SCEs with an acceptable type I error rate, better computational approaches based on reasonable statistics and null distributions are needed. In this article, we propose a new approach named TAGCNA for identifying SCEs in somatic CNAs that may encompass cancer driver genes. TAGCNA employs a peel-off permutation scheme to generate a reasonable null distribution based on a prior step of selecting tag CNA markers from the genome being considered. We demonstrate the statistical power of TAGCNA on simulated ground truth data, and validate its applicability using two publicly available cancer datasets: lung and prostate adenocarcinoma. TAGCNA identifies SCEs known to involve proto-oncogenes (e.g. EGFR, CDK4) and tumor suppressor genes (e.g. CDKN2A, CDKN2B), and provides many additional SCEs with potential biological relevance in these data. TAGCNA can be used to analyze the significance of CNAs in various cancers. It is implemented in R and is freely available at http://tagcna.sourceforge.net/.
Affiliation(s)
- Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Liying Yang
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Shengli Zhang
- Department of Mathematics, Xidian University, Xi'an, People’s Republic of China
- Baodi Chen
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Yaojun Geng
- School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China
- Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, United States of America
25
Shen JJ, Zhang NR. Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas517] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 11/19/2022]
26
Baladandayuthapani V, Ji Y, Talluri R, Nieto-Barajas LE, Morris JS. Bayesian Random Segmentation Models to Identify Shared Copy Number Aberrations for Array CGH Data. J Am Stat Assoc 2012; 105:1358-1375. [PMID: 21512611 DOI: 10.1198/jasa.2010.ap09250] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
Abstract
Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consists of log fluorescence ratios as a function of the genomic DNA location and provides a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. We propose a hierarchical Bayesian random segmentation approach for modeling aCGH data that utilizes information across arrays from a common population to yield segments of shared copy number changes. These changes characterize the underlying population and allow us to compare different population aCGH profiles to assess which regions of the genome have differential alterations. Our method, referred to as BDSAcgh (Bayesian Detection of Shared Aberrations in aCGH), is based on a unified Bayesian hierarchical model that allows us to obtain probabilities of alteration states as well as probabilities of differential alteration that correspond to local false discovery rates. We evaluate the operating characteristics of our method via simulations and an application using a lung cancer aCGH data set.
27
Jiang H, Zhu ZZ, Yu Y, Lin S, Hou L. Improved Statistical Analysis for Array CGH-Based DNA Copy Number Aberrations. Cancer Inform 2011; 10:249-58. [PMID: 22084565 PMCID: PMC3212864 DOI: 10.4137/cin.s8019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Array-based comparative genomic hybridization (aCGH) allows DNA copy number to be measured at whole-genome scale. In cancer studies, one may be interested in identifying DNA copy number aberrations (CNAs) associated with clinicopathological characteristics such as metastasis. We propose defining test regions based on copy number pattern profiles across multiple samples, using either smoothed log2-ratios or discrete copy number gain/loss calls. Association tests performed on these refined test regions instead of on individual probes have improved power because fewer tests are performed. We also compared three measures of copy number level in statistical hypothesis testing: normalized log2-ratio, smoothed log2-ratio, and copy number gain/loss calls. The relative strengths and weaknesses of the proposed method were demonstrated using both simulation studies and real data analysis of a liver cancer study.
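Defining test regions from cross-sample pattern profiles, as described above, can be sketched for the discrete-call case: consecutive probes whose gain/loss calls agree in every sample are merged into one test region. This assumes calls coded as -1, 0 and 1 and does not cover the smoothed log2-ratio variant; `define_test_regions` is a hypothetical name.

```python
def define_test_regions(calls):
    """calls[s][p] is the gain (1), neutral (0) or loss (-1) call of sample s at probe p.
    Returns (start, end_exclusive) probe ranges that share one cross-sample pattern."""
    n_probes = len(calls[0])
    # The pattern of probe p is its column of calls across all samples.
    patterns = [tuple(sample[p] for sample in calls) for p in range(n_probes)]
    regions, start = [], 0
    for p in range(1, n_probes + 1):
        if p == n_probes or patterns[p] != patterns[start]:
            regions.append((start, p))
            start = p
    return regions
```

For example, two samples called `[0, 0, 1, 1, 1, 0]` and `[0, 0, 1, 1, 0, 0]` give the regions `(0, 2)`, `(2, 4)`, `(4, 5)` and `(5, 6)`, so a multiple-testing correction is applied over four tests instead of six.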
Affiliation(s)
- Hongmei Jiang
- Department of Statistics, Northwestern University, 2006 Sheridan Road, Evanston, IL 60208, USA
28
Mahmud MP, Schliep A. Fast MCMC sampling for hidden Markov Models to determine copy number variations. BMC Bioinformatics 2011; 12:428. [PMID: 22047014 PMCID: PMC3371636 DOI: 10.1186/1471-2105-12-428] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/17/2011] [Accepted: 11/02/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Hidden Markov Models (HMMs) are often used to analyze Comparative Genomic Hybridization (CGH) data, identifying chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons, the parameters of an HMM are often estimated by maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty into the segmentation, which can be avoided with Bayesian approaches that integrate out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, likelihood-based approaches are still preferred in practice for their lower running times, and datasets from high-density arrays and next-generation sequencing amplify these problems. RESULTS We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees that leverage spatial relations between data points in typical data sets, to speed up MCMC sampling. CONCLUSIONS We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and 90, respectively, while achieving results competitive with state-of-the-art Bayesian approaches. AVAILABILITY An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
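For reference, the likelihood-based baseline that the abstract contrasts with MCMC, Viterbi decoding of a fixed-parameter HMM, is sketched below. The three states (loss, neutral, gain), their Gaussian emission means, the shared standard deviation and the transition probabilities are illustrative assumptions rather than estimated values, and the approximate MCMC sampler proposed in the paper is not shown.

```python
import math


def viterbi(obs, means, sd, stay=0.99):
    """Most probable state path of an HMM with Gaussian emissions (shared sd),
    a uniform initial distribution, and probability `stay` of keeping the same state."""
    k = len(means)
    switch = (1.0 - stay) / (k - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)] for i in range(k)]

    def log_emit(state, x):
        z = (x - means[state]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

    score = [math.log(1.0 / k) + log_emit(s, obs[0]) for s in range(k)]
    back = []  # back[t][j]: best predecessor of state j at observation t + 1
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_emit(j, x))
        back.append(pointers)

    # Trace the best final state back to the start.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

For example, with `means=[-0.5, 0.0, 0.5]` and `sd=0.2`, a profile of ten 0.0 values, ten 0.5 values and ten 0.0 values decodes to ten neutral (1), ten gain (2) and ten neutral (1) states. Only this single most probable path is reported, which is why the approach says nothing about segmentation uncertainty.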
Affiliation(s)
- Md Pavel Mahmud
- Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA.
29
Cheung KJJ, Delaney A, Ben-Neriah S, Schein J, Lee T, Shah SP, Cheung D, Johnson NA, Mungall AJ, Telenius A, Lai B, Boyle M, Connors JM, Gascoyne RD, Marra MA, Horsman DE. High resolution analysis of follicular lymphoma genomes reveals somatic recurrent sites of copy-neutral loss of heterozygosity and copy number alterations that target single genes. Genes Chromosomes Cancer 2010; 49:669-81. [PMID: 20544841 DOI: 10.1002/gcc.20780] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 01/01/2023]
Abstract
A multiplatform approach, including conventional cytogenetic techniques, BAC array comparative genomic hybridization, and Affymetrix 500K SNP arrays, was applied to the tumor genomes of 25 follicular lymphoma (FL) biopsy samples with paired normal DNA samples to characterize balanced translocations, copy number imbalances, and copy-neutral loss of heterozygosity (cnLOH). In addition to the t(14;18), eight unique balanced translocations were found. Commonly reported FL-associated copy number changes were revealed, including losses of 1p32-36, 6q, and 10q, and gains of 1q, 6p, 7, 12, 18, and X. The regions most frequently affected by copy-neutral loss of heterozygosity were 1p36.33 (28%), 6p21.3 (20%), 12q21.2-q24.33 (16%), and 16p13.3 (24%). Using SNP analysis, we also identified 45 aberrant regions that each affected a single gene, including CDKN2A, CDKN2B, FHIT, KIT, PEX14, and PTPRD, which were associated with canonical pathways involved in tumor development. This study illustrates the power of using complementary high-resolution platforms on paired tumor/normal specimens and computational analysis to provide potential insights into the significance of single-gene somatic aberrations in FL tumorigenesis.
Affiliation(s)
- K-John J Cheung
- Center for Lymphoid Cancer, British Columbia Cancer Agency, Vancouver, BC, Canada.
30
Zhang NR, Siegmund DO, Ji H, Li JZ. Detecting simultaneous changepoints in multiple sequences. Biometrika 2010; 97:631-645. [PMID: 22822250 DOI: 10.1093/biomet/asq025] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 02/06/2023]
Abstract
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
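The scan statistic described above can be sketched directly: for an interval of probes, each sample contributes the square of its standardized interval sum, and these chi-squared terms are added across samples. The sketch assumes each sequence has already been centered and scaled to unit noise variance, scans all intervals up to `max_len` probes by brute force, and omits the paper's analytic significance approximation; `multisample_scan` is a hypothetical name.

```python
import math


def multisample_scan(samples, max_len):
    """Return (statistic, (start, end_exclusive)) for the interval of at most max_len
    probes maximizing the sum over samples of squared standardized interval sums."""
    n = len(samples[0])
    prefix = []
    for y in samples:
        acc = [0.0]
        for v in y:
            acc.append(acc[-1] + v)
        prefix.append(acc)

    best = (-math.inf, None)
    for s in range(n):
        for t in range(s + 1, min(n, s + max_len) + 1):
            width = t - s
            # Assuming standardized Gaussian noise, each sample's Z = sum / sqrt(width)
            # is N(0, 1) under the null, so Z**2 is chi-squared with 1 degree of freedom.
            stat = sum(((c[t] - c[s]) / math.sqrt(width)) ** 2 for c in prefix)
            if stat > best[0]:
                best = (stat, (s, t))
    return best
```

Summing squared statistics lets a shift that appears in only some samples accumulate evidence, while samples without the shift contribute terms that average about 1 under the null.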
Affiliation(s)
- Nancy R Zhang
- Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California 94305-4065, U.S.A.
31
van de Wiel MA, Picard F, van Wieringen WN, Ylstra B. Preprocessing and downstream analysis of microarray DNA copy number profiles. Brief Bioinform 2010; 12:10-21. [PMID: 20172948 DOI: 10.1093/bib/bbq004] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
Analysis of DNA copy number profiles requires methods tailored to the specific nature of these data. The number of available data analysis methods has grown enormously in the last 5 years. We discuss the typical characteristics of DNA copy number data, as measured by microarray technology and review the extensive literature on preprocessing methods such as segmentation and calling. Subsequently, the focus narrows to applications of DNA copy number in cancer, in particular, several downstream analyses of multi-sample data sets such as testing, clustering and classification. Finally, we look ahead: what should we prepare for and which methodology-related topics may deserve attention in the near future?
Affiliation(s)
- Mark A van de Wiel
- Department of Epidemiology & Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
32
Leunen K, Gevaert O, Daemen A, Vanspauwen V, Michils G, De Moor B, Moerman P, Vergote I, Legius E. Recurrent copy number alterations in BRCA1-mutated ovarian tumors alter biological pathways. Hum Mutat 2010; 30:1693-702. [PMID: 19802895 DOI: 10.1002/humu.21135] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
Array CGH was used to identify recurrent copy number alterations (RCNA) characteristic of either BRCA1-related or sporadic ovarian cancer. After preprocessing, both groups of patients were modeled using a recurrent Hidden Markov Model to detect RCNA. RCNA with a probability higher than 80% were called. After removing RCNA present in both groups, the genes in the remaining RCNA were investigated for enrichment of pathways from external databases. More RCNA were observed in the BRCA1 group, and they show more losses than gains compared with the sporadic group. By type of RCNA, no significant difference in length was seen for gains, but there was a statistically significant difference for losses. In the sporadic group, a large proportion of the altered regions contain genes known to function in cell adhesion and complement activation, whereas the BRCA1 samples are characterized by alterations in the HOX genes, metalloproteinases, tumor suppressor genes, and the estrogen-signaling pathways. We conclude that BRCA1 ovarian tumors present a different type, number, and length of RCNA; a large part of the genome is lost, resulting in marked genomic instability. Moreover, important biological pathways are altered differently compared with the sporadic group.
Affiliation(s)
- Karin Leunen
- Division of Gynecological Oncology, Department of Obstetrics and Gynecology, University Hospitals Leuven, Katholieke Universiteit Leuven, Belgium.
33
Zhang NR. DNA Copy Number Profiling in Normal and Tumor Genomes. Frontiers in Computational and Systems Biology 2010. [DOI: 10.1007/978-1-84996-196-7_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
34
Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, Borecki IB, Province MA. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2009; 26:464-9. [PMID: 20031968 DOI: 10.1093/bioinformatics/btp708] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 01/08/2023]
Abstract
MOTIVATION DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in the same chromosomal region across multiple cancer samples and has greater implications for tumorigenesis. Commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may impose a heavy computational burden and lose overall statistical power through segmentation and discretization of each sample's data. We propose a population-based approach for RCNA detection that needs no single-sample analysis and is statistically powerful, computationally efficient, and particularly suitable for high-resolution and large-population studies. RESULTS Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. By directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces the computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of two-step approaches based on single-sample CNA calling. We applied CMDS to two real datasets of lung cancer and brain cancer from the Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for RCNA analysis of large-scale data from cancer genomes.
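The idea of scoring chromosomal sites by their correlation with nearby sites across samples can be illustrated as follows. This is a simplified sketch, not CMDS as published: it computes Pearson correlations between each site and its neighbours within `half_width` positions (a band around the diagonal of the site-by-site correlation matrix) and averages them into one score per site; the subsequent segmentation and significance testing are omitted, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def diagonal_band_scores(data, half_width):
    """data[s][p] is the log-ratio of sample s at site p. The score of site p is the
    mean correlation, computed across samples, between site p and each site within
    half_width positions of it."""
    n_sites = len(data[0])
    columns = [[row[p] for row in data] for p in range(n_sites)]
    scores = []
    for p in range(n_sites):
        neighbours = [q for q in range(p - half_width, p + half_width + 1)
                      if q != p and 0 <= q < n_sites]
        scores.append(sum(pearson(columns[p], columns[q]) for q in neighbours) / len(neighbours))
    return scores
```

When a region is altered in the same subset of samples, neighbouring sites in that region co-vary across samples and their band correlations are high, whereas independent noise gives correlations near zero. No per-sample segmentation is needed to compute these scores, which is the property the abstract emphasizes.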
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA.
35
Liu LX, Lee NP, Chan VW, Xue W, Zender L, Zhang C, Mao M, Dai H, Wang XL, Xu MZ, Lee TK, Ng IO, Chen Y, Kung HF, Lowe SW, Poon RTP, Wang JH, Luk JM. Targeting cadherin-17 inactivates Wnt signaling and inhibits tumor growth in liver carcinoma. Hepatology 2009; 50:1453-63. [PMID: 19676131 PMCID: PMC3328302 DOI: 10.1002/hep.23143] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 12/31/2022]
Abstract
Hepatocellular carcinoma (HCC) is a lethal malignancy for which there are no effective therapies. To develop rational therapeutic approaches for treating this disease, we are performing proof-of-principle studies targeting molecules crucial for the development of HCC. Here, we show that the cadherin-17 (CDH17) adhesion molecule is up-regulated in human liver cancers and can transform premalignant liver progenitor cells to produce liver carcinomas in mice. RNA interference-mediated knockdown of CDH17 inhibited proliferation of both primary and highly metastatic HCC cell lines in vitro and in vivo. The antitumor mechanisms underlying CDH17 inhibition involve inactivation of Wnt signaling, because growth inhibition and cell death were accompanied by relocalization of beta-catenin to the cytoplasm and a concomitant reduction in cyclin D1 and an increase in retinoblastoma. CONCLUSION Our results identify CDH17 as a novel oncogene in HCC and suggest that CDH17 is a biomarker and attractive therapeutic target for this aggressive malignancy.
Affiliation(s)
- Ling Xiao Liu
- Department of Surgery and Center for Cancer Research, The University of Hong Kong, Queen Mary Hospital, Pokfulam, Hong Kong
36
Rueda OM, Diaz-Uriarte R. Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously. BMC Bioinformatics 2009; 10:308. [PMID: 19775444 PMCID: PMC2760535 DOI: 10.1186/1471-2105-10-308] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/01/2009] [Accepted: 09/23/2009] [Indexed: 11/24/2022]
Abstract
BACKGROUND Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but this is neglected by most methods. RESULTS We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect regions over both the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they use probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and that researchers can modify, as needed, thresholds of immediate interpretability and develop custom statistics to answer specific research questions. CONCLUSION These methods represent a qualitative advance in the location of recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes.
Affiliation(s)
- Oscar M Rueda
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Breast Cancer Functional Genomics, Cancer Research UK, Cambridge, UK
- Ramon Diaz-Uriarte
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
37
Rueda OM, Diaz-Uriarte R. RJaCGH: Bayesian analysis of aCGH arrays for detecting copy number changes and recurrent regions. Bioinformatics 2009; 25:1959-60. [PMID: 19420051 PMCID: PMC2712338 DOI: 10.1093/bioinformatics/btp307] [Received: 03/04/2009] [Revised: 04/20/2009] [Accepted: 04/30/2009]
Abstract
SUMMARY Several methods have been proposed to detect copy number changes and recurrent regions of copy number variation from aCGH, but few methods explicitly return probabilities of alteration, which directly answer the question 'is this probe/region altered?' RJaCGH fits a Non-Homogeneous Hidden Markov model to the aCGH data using Markov Chain Monte Carlo with Reversible Jump, and returns the probability that each probe is gained or lost. Using these probabilities, recurrent regions (over sets of individuals) of copy number alteration can be found. AVAILABILITY RJaCGH is available as an R package from CRAN repositories (e.g. http://cran.r-project.org/web/packages).
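The abstract describes a per-probe probability of gain or loss. The Python sketch below shows what such a probability looks like, using a deliberately simplified model. It uses a three-state (loss, normal, gain) homogeneous hidden Markov model with fixed, hypothetical Gaussian emission means, one shared standard deviation, and a fixed transition matrix. Its posterior state probabilities are computed with the scaled forward-backward algorithm. RJaCGH itself fits a non-homogeneous HMM and samples over models, including the number of states, with reversible-jump MCMC. None of that is reproduced here, and no RJaCGH function is used.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_states(obs: Sequence[float], means: Sequence[float], sd: float,
                     trans: List[List[float]], init: Sequence[float]) -> List[List[float]]:
    """Scaled forward-backward for a Gaussian HMM with fixed parameters.

    Returns post[t][s] = P(hidden state s at probe t | all observations).
    """
    n, S = len(obs), len(means)
    emis = [[normal_pdf(x, means[s], sd) for s in range(S)] for x in obs]

    # Forward pass, normalizing at each probe to avoid numerical underflow.
    alpha: List[List[float]] = []
    scale: List[float] = []
    for t in range(n):
        if t == 0:
            a = [init[s] * emis[0][s] for s in range(S)]
        else:
            a = [emis[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(S))
                 for s in range(S)]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * S for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emis[t + 1][r] * beta[t + 1][r] for r in range(S))
             for s in range(S)]
        beta[t] = [v / scale[t + 1] for v in b]

    # Posterior marginals: normalize alpha * beta at each probe.
    post = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(S)]
        z = sum(g)
        post.append([v / z for v in g])
    return post


if __name__ == "__main__":
    # States: 0 = loss, 1 = normal, 2 = gain (all parameter values are hypothetical).
    means = [-0.5, 0.0, 0.5]
    trans = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
    init = [0.1, 0.8, 0.1]
    log_ratios = [0.02, -0.05, 0.60, 0.55, 0.48, 0.01]
    post = posterior_states(log_ratios, means, 0.2, trans, init)
    for x, p in zip(log_ratios, post):
        print(f"{x:+.2f}  P(loss)={p[0]:.3f}  P(gain)={p[2]:.3f}")
```

Each row of `post` can be read the way the abstract suggests: as a direct answer to "is this probe gained or lost?". Probabilities of this kind can then be passed to a recurrence calculation across arrays.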
Affiliation(s)
- Oscar M Rueda
- Structural Biology and Biocomputing Programme, Spanish National Cancer Center (CNIO), Madrid 28029, Spain.
38
Shah SP, Cheung KJ, Johnson NA, Alain G, Gascoyne RD, Horsman DE, Ng RT, Murphy KP. Model-based clustering of array CGH data. Bioinformatics 2009; 25:i30-8. [PMID: 19478003 PMCID: PMC2687959 DOI: 10.1093/bioinformatics/btp205]
Abstract
Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population. Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile, which contains the locations of unusually frequent alterations. The profile is represented as a hidden Markov model. Samples are assigned to clusters based on their similarity to the cluster's profile. We simultaneously infer the cluster assignments and the cluster profiles using an expectation maximization-like algorithm. We show, using a realistic simulation study, that our method is significantly more accurate than standard clustering techniques. We then apply our method to two clinical datasets. In particular, we examine previously reported aCGH data from a cohort of 106 follicular lymphoma patients, and discover clusters that are known to correspond to clinically relevant subgroups. In addition, we examine a cohort of 92 diffuse large B-cell lymphoma patients, and discover previously unreported clusters of biological interest which have inspired follow-up clinical research on an independent cohort. Availability: Software and synthetic datasets are available at http://www.cs.ubc.ca/~sshah/acgh as part of the CNA-HMMer package. Contact: sshah@bccrc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sohrab P Shah
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada.
39
Choi H, Nesvizhskii AI, Ghosh D, Qin ZS. Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data. Bioinformatics 2009; 25:1715-21. [PMID: 19447789 DOI: 10.1093/bioinformatics/btp312]
Abstract
MOTIVATION Chromatin immunoprecipitation (ChIP) experiments followed by array hybridization, or ChIP-chip, is a powerful approach for identifying transcription factor binding sites (TFBS) and has been widely used. Recently, massively parallel sequencing coupled with ChIP experiments (ChIP-seq) has been increasingly used as an alternative to ChIP-chip, offering cost-effective genome-wide coverage and resolution up to a single base pair. For many well-studied TFs, both ChIP-seq and ChIP-chip experiments have been applied and their data are publicly available. Previous analyses have revealed substantial technology-specific binding signals despite strong correlation between the two sets of results. Therefore, it is of interest to see whether the two data sources can be combined to enhance the detection of TFBS. RESULTS In this work, a hierarchical hidden Markov model (HHMM) is proposed for combining data from ChIP-seq and ChIP-chip. In HHMM, inference results from individual HMMs in ChIP-seq and ChIP-chip experiments are summarized by a higher level HMM. Simulation studies show the advantage of HHMM when data from both technologies co-exist. Analysis of two well-studied TFs, NRSF and CCCTC-binding factor (CTCF), also suggests that HHMM yields improved TFBS identification in comparison to analyses using individual data sources or a simple merger of the two. AVAILABILITY Source code for the software ChIPmeta is freely available for download at http://www.umich.edu/~hwchoi/HHMMsoftware.zip, implemented in C and supported on Linux.
Affiliation(s)
- Hyungwon Choi
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
40
Wu LY, Chipman HA, Bull SB, Briollais L, Wang K. A Bayesian segmentation approach to ascertain copy number variations at the population level. Bioinformatics 2009; 25:1669-79. [PMID: 19389735 DOI: 10.1093/bioinformatics/btp270]
Abstract
MOTIVATION Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential to understand the evolutionary process and population genetics, and to apply CNVs in population-based genome-wide association studies for complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share. RESULTS In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find the new approach to perform competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs. AVAILABILITY R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
Affiliation(s)
- Long Yang Wu
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
41
Shay T, Lambiv WL, Reiner-Benaim A, Hegi ME, Domany E. Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes. Cancer Inform 2009; 7:91-104. [PMID: 19352461 PMCID: PMC2664703 DOI: 10.4137/cin.s2144]
Abstract
Many types of tumors exhibit characteristic chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample-specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered a more promising candidate for biological relevance to cancer. We sought an intuitive method to define such aberrations and prioritize them. We define V, the "volume" associated with an aberration, as the product of three factors: (a) the fraction of patients with the aberration, (b) the aberration's length and (c) its amplitude. Our algorithm compares the values of V derived from the real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant, and combined them with chromosomal arm status (gain/loss) to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We applied the method to three different public array CGH datasets of medulloblastoma and neuroblastoma, and demonstrated its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of medulloblastoma, which is analogous to neuroblastoma type 1.
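The Python sketch below illustrates the volume statistic V = (fraction of patients) × (length) × (amplitude) and its permutation test. The abstract does not specify how aberrations are delimited or how permutations are drawn, so the sketch makes hypothetical choices:

- A candidate aberration is a fixed probe window `[start, end)`.
- A sample carries a gain there if its mean log-ratio over the window exceeds `thr`.
- Amplitude is the mean of those window means, and length is the number of probes.
- The null distribution comes from independently circular-shifting each sample's profile. This keeps each sample's own segments intact but breaks their alignment across samples.

Only gains are scored here; losses could be handled symmetrically.

```python
import random
from typing import List, Tuple


def volume(profiles: List[List[float]], start: int, end: int, thr: float) -> float:
    """V = (fraction of samples with the gain) * (length) * (amplitude).

    Hypothetical definitions: a sample has the gain if its mean value over
    probes [start, end) exceeds thr; amplitude is the mean of those window means.
    """
    length = end - start
    affected = []
    for prof in profiles:
        m = sum(prof[start:end]) / length
        if m > thr:
            affected.append(m)
    if not affected:
        return 0.0
    fraction = len(affected) / len(profiles)
    amplitude = sum(affected) / len(affected)
    return fraction * length * amplitude


def permutation_pvalue(profiles: List[List[float]], start: int, end: int, thr: float,
                       n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed V and a permutation p-value from circularly shifted profiles."""
    rng = random.Random(seed)
    v_obs = volume(profiles, start, end, thr)
    exceed = 0
    for _ in range(n_perm):
        # Shift each sample independently: within-sample structure is kept,
        # cross-sample alignment of the aberration is destroyed.
        shuffled = []
        for prof in profiles:
            k = rng.randrange(len(prof))
            shuffled.append(prof[k:] + prof[:k])
        if volume(shuffled, start, end, thr) >= v_obs:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return v_obs, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 5 samples x 20 probes; 4 samples share a gain of 1.0 at probes 5-8.
    profiles = [[0.0] * 20 for _ in range(5)]
    for prof in profiles[:4]:
        for j in range(5, 9):
            prof[j] = 1.0
    v, p = permutation_pvalue(profiles, 5, 9, thr=0.5, n_perm=200)
    print(f"V = {v:.2f}, p = {p:.4f}")
```

Multiplying the three factors means that a long, strong aberration seen in a few samples and a short, weak one seen in many can reach similar V. The permutation test then judges V against what chance alignment would produce.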
Affiliation(s)
- Tal Shay
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Wanyu L. Lambiv
- Laboratory of Brain Tumor Biology and Genetics, Neurosurgery, University Hospital Lausanne (CHUV), Lausanne, Switzerland
- Anat Reiner-Benaim
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Department of Statistics, University of Haifa, Haifa, Israel
- Monika E. Hegi
- Laboratory of Brain Tumor Biology and Genetics, Neurosurgery, University Hospital Lausanne (CHUV), Lausanne, Switzerland
- National Center for Competence Research Molecular Oncology, ISREC, Epalinges, Switzerland
- Eytan Domany
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
42
Han ST, Kang HC, Choi HS, Jang MS. A Study on Development of Scoring Campaign System. Korean Journal of Applied Statistics 2009. [DOI: 10.5351/kjas.2009.22.1.001]
43
Kim BS, Kim SC. A Penalized Spline Based Method for Detecting the DNA Copy Number Alteration in an Array-CGH Experiment. Korean Journal of Applied Statistics 2009. [DOI: 10.5351/kjas.2009.22.1.115]
44
Barutcuoglu Z, Airoldi EM, Dumeaux V, Schapire RE, Troyanskaya OG. Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields. Bioinformatics 2008; 25:1307-13. [PMID: 19052061 PMCID: PMC2677736 DOI: 10.1093/bioinformatics/btn585]
Abstract
MOTIVATION The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations in selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome. RESULTS Here, we present the heterogeneous hidden conditional random field, a new integrated array-CGH analysis method for jointly classifying tumors, inferring copy numbers and identifying clinically relevant positions in recurrent alteration regions. By capturing the sequentiality as well as the locality of changes, our integrated model provides better noise reduction, and achieves more relevant gene retrieval and more accurate classification than existing methods. We provide an efficient L1-regularized discriminative training algorithm, which notably selects a small set of candidate genes most likely to be clinically relevant and driving the recurrent amplicons of importance. Our method thus provides unbiased starting points in deciding which genomic regions and which genes in particular to pursue for further examination. Our experiments on synthetic data and real genomic cancer prediction data show that our method is superior, both in prediction accuracy and relevant feature discovery, to existing methods. We also demonstrate that it can be used to generate novel biological hypotheses for breast cancer.
Affiliation(s)
- Zafer Barutcuoglu
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
45
Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, Meyerson M, Singer S, Sander C. Functional copy-number alterations in cancer. PLoS One 2008; 3:e3179. [PMID: 18784837 PMCID: PMC2527508 DOI: 10.1371/journal.pone.0003179] [Received: 08/07/2008] [Accepted: 08/19/2008]
Abstract
Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer.
Affiliation(s)
- Barry S Taylor
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America.
46
Genome-wide profiling of follicular lymphoma by array comparative genomic hybridization reveals prognostically significant DNA copy number imbalances. Blood 2008; 113:137-48. [PMID: 18703704 DOI: 10.1182/blood-2008-02-140616]
Abstract
The secondary genetic events associated with follicular lymphoma (FL) progression are not well defined. We applied genome-wide BAC array comparative genomic hybridization to 106 diagnostic biopsies of FL to characterize regional genomic imbalances. Using an analytical approach that defined regions of copy number change as intersections between visual annotations and a Hidden Markov model-based algorithm, we identified 71 regional alterations that were recurrent in at least 10% of cases. These ranged in size from approximately 200 kb to 44 Mb, affecting chromosomes 1, 5, 6, 7, 8, 10, 12, 17, 18, 19, and 22. We also demonstrated by cluster analysis that 46.2% of the 106 cases could be sub-grouped based on the presence of +1q, +6p/6q-, +7, or +18. Survival analysis showed that 21 of the 71 regions correlated significantly with inferior overall survival (OS). Of these 21 regions, 16 were independent predictors of OS using a multivariate Cox model that included the international prognostic index (IPI) score. Two of these 16 regions (1p36.22-p36.33 and 6q21-q24.3) were also predictors of transformation risk and independent of IPI. These prognostic features may be useful to identify high-risk patients as candidates for risk-adapted therapies.
47
Affiliation(s)
- Edoardo M Airoldi
- Lewis-Sigler Institute for Integrative Genomics and Computer Science Department, Princeton University, Princeton, New Jersey, United States of America.