1. Esteves L, Caramelo F, Ribeiro IP, Carreira IM, de Melo JB. Probability distribution of copy number alterations along the genome: an algorithm to distinguish different tumour profiles. Sci Rep 2020; 10:14868. [PMID: 32913269 PMCID: PMC7483770 DOI: 10.1038/s41598-020-71859-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/29/2019] [Accepted: 08/13/2020] [Indexed: 11/11/2022]
Abstract
Copy number alterations (CNAs) comprise deletions or amplifications of fragments of genomic material that are particularly common in cancer and play a major contribution in its development and progression. High resolution microarray-based genome-wide technologies have been widely used to detect CNAs, generating complex datasets that require further steps to allow for the determination of meaningful results. In this work, we propose a methodology to determine common regions of CNAs from these datasets, that in turn are used to infer the probability distribution of disease profiles in the population. This methodology was validated using simulated data and assessed using real data from Head and Neck Squamous Cell Carcinoma and Lung Adenocarcinoma, from the TCGA platform. Probability distribution profiles were produced allowing for the distinction between different phenotypic groups established within that cohort. This method may be used to distinguish between groups in the diseased population, within well-established degrees of confidence. The application of such methods may be of greater value in the clinical context both as a diagnostic or prognostic tool and, even as a useful way for helping to establish the most adequate treatment and care plans.
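To make the idea of "common regions of CNAs" concrete, the following is a minimal sketch of a per-position alteration frequency across a cohort, followed by a merge of consecutive high-frequency bins into regions. It is not the authors' algorithm: the segment layout `(start_bin, end_bin, state)`, the function names, and the fixed frequency threshold are all assumptions made for illustration.

```python
def alteration_frequency(samples, n_bins):
    """Fraction of samples carrying a gain, and a loss, in each genomic bin.

    `samples` is a list of per-sample segment lists. Each segment is a
    hypothetical (start_bin, end_bin, state) tuple, end_bin exclusive,
    with state equal to "gain" or "loss".
    """
    if not samples:
        raise ValueError("at least one sample is required")
    gain = [0] * n_bins
    loss = [0] * n_bins
    for segments in samples:
        gained, lost = set(), set()
        for start, end, state in segments:
            bins = range(max(start, 0), min(end, n_bins))
            (gained if state == "gain" else lost).update(bins)
        # Count each sample at most once per bin, even if segments overlap.
        for b in gained:
            gain[b] += 1
        for b in lost:
            loss[b] += 1
    n = len(samples)
    return [g / n for g in gain], [l / n for l in loss]


def common_regions(freq, threshold):
    """Merge consecutive bins with frequency >= threshold into (start, end) pairs."""
    regions, start = [], None
    for i, f in enumerate(freq):
        if f >= threshold and start is None:
            start = i
        elif f < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(freq)))
    return regions


# Three toy samples over eight bins (illustrative values only).
samples = [[(2, 5, "gain")], [(3, 6, "gain")], [(0, 2, "loss")]]
gain_freq, loss_freq = alteration_frequency(samples, 8)
gain_regions = common_regions(gain_freq, 0.5)
```

Here `gain_regions` holds the bins gained in at least half of the toy cohort. A real analysis would replace the fixed threshold with a statistical criterion.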
Affiliation(s)
- Luísa Esteves
  - Cytogenetics and Genomics Laboratory, Faculty of Medicine, University of Coimbra, Polo Ciências da Saúde, 3000-354, Coimbra, Portugal
- Francisco Caramelo
  - Laboratory of Biostatistics and Medical Informatics, IBILI-Faculty of Medicine, University of Coimbra, 3000-354, Coimbra, Portugal
- Ilda Patrícia Ribeiro
  - Cytogenetics and Genomics Laboratory, Faculty of Medicine, University of Coimbra, Polo Ciências da Saúde, 3000-354, Coimbra, Portugal
  - iCBR-CIMAGO-Center of Investigation on Environment, Genetics and Oncobiology-Faculty of Medicine, University of Coimbra, Coimbra, Portugal
- Isabel M Carreira
  - Cytogenetics and Genomics Laboratory, Faculty of Medicine, University of Coimbra, Polo Ciências da Saúde, 3000-354, Coimbra, Portugal
  - iCBR-CIMAGO-Center of Investigation on Environment, Genetics and Oncobiology-Faculty of Medicine, University of Coimbra, Coimbra, Portugal
- Joana Barbosa de Melo
  - Cytogenetics and Genomics Laboratory, Faculty of Medicine, University of Coimbra, Polo Ciências da Saúde, 3000-354, Coimbra, Portugal
  - iCBR-CIMAGO-Center of Investigation on Environment, Genetics and Oncobiology-Faculty of Medicine, University of Coimbra, Coimbra, Portugal

2. Torabi A, Ordonez J, Su BB, Palmer L, Mao C, Lara KE, Rubin LP, Xu C. Novel Somatic Copy Number Alteration Identified for Cervical Cancer in the Mexican American Population. Med Sci (Basel) 2016; 4:medsci4030012. [PMID: 29083376 PMCID: PMC5635801 DOI: 10.3390/medsci4030012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2016] [Revised: 07/15/2016] [Accepted: 07/25/2016] [Indexed: 01/12/2023]
Abstract
Cervical cancer affects millions of Americans, but the rate for cervical cancer in the Mexican American is approximately twice that for non-Mexican Americans. The etiologies of cervical cancer are still not fully understood. A number of somatic mutations, including several copy number alterations (CNAs), have been identified in the pathogenesis of cervical carcinomas in non-Mexican Americans. Thus, the purpose of this study was to investigate CNAs in association with cervical cancer in the Mexican American population. We conducted a pilot study of genome-wide CNA analysis using 2.5 million markers in four diagnostic groups: reference (n = 125), low grade dysplasia (cervical intraepithelial neoplasia (CIN)-I, n = 4), high grade dysplasia (CIN-II and -III, n = 5) and invasive carcinoma (squamous cell carcinoma (SCC), n = 5) followed by data analyses using Partek. We observed a statistically-significant difference of CNA burden between case and reference groups of different sizes (>100 kb, 10-100 kb and 1-10 kb) of CNAs that included deletions and amplifications, e.g., a statistically-significant difference of >100 kb deletions was observed between the reference (6.6%) and pre-cancer and cancer (91.3%) groups. Recurrent aberrations of 98 CNA regions were also identified in cases only. However, none of the CNAs have an impact on cancer progression. A total of 32 CNA regions identified contained tumor suppressor genes and oncogenes. Moreover, the pathway analysis revealed endometrial cancer and estrogen signaling pathways associated with this cancer (p < 0.05) using Kyoto Encyclopedia of Genes and Genomes (KEGG). This is the first report of CNAs identified for cervical cancer in the U.S. Latino population using high density markers. We are aware of the small sample size in the study. Thus, additional studies with a larger sample are needed to confirm the current findings.
Affiliation(s)
- Alireza Torabi
  - Department of Pathology, TTUHSC, El Paso 79905, TX, USA
- Javier Ordonez
  - Department of Biomedical Science, TTUHSC, El Paso 79905, TX, USA
- Brenda Bin Su
  - Department of Internal Medicine, College of Medicine and Health Sciences, UAE University, Al-Ain 15551, UAE
- Laura Palmer
  - Department of Pediatrics, Texas Tech University Health Sciences Center (TTUHSC), El Paso 79905, TX, USA
- Chunxiang Mao
  - Department of Pediatrics, Texas Tech University Health Sciences Center (TTUHSC), El Paso 79905, TX, USA
- Katherine E Lara
  - Department of Pediatrics, Texas Tech University Health Sciences Center (TTUHSC), El Paso 79905, TX, USA
- Lewis P Rubin
  - Department of Pediatrics, Texas Tech University Health Sciences Center (TTUHSC), El Paso 79905, TX, USA
- Chun Xu
  - Department of Pediatrics, Texas Tech University Health Sciences Center (TTUHSC), El Paso 79905, TX, USA

3. Pon JR, Marra MA. Driver and Passenger Mutations in Cancer. Annual Review of Pathology: Mechanisms of Disease 2015; 10:25-50. [DOI: 10.1146/annurev-pathol-012414-040312] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Indexed: 12/28/2022]
Affiliation(s)
- Julia R. Pon
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
- Marco A. Marra
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada V5Z 1L3
  - Department of Medical Genetics, University of British Columbia, Vancouver, Canada V6T 1Z4

4. Rueda OM, Diaz-Uriarte R, Caldas C. Finding common regions of alteration in copy number data. Methods Mol Biol 2013; 973:339-53. [PMID: 23412800 DOI: 10.1007/978-1-62703-281-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023]
Abstract
In this chapter, we review some recent methods designed for detecting recurrent copy number regions, that is, genomic regions that show evidence of being altered in a set of samples. We analyze Affymetrix SNP6 data from 87 Her2-type breast tumors from a recent study using three different methods, showing different definitions and features of common regions: studying heterogeneity in copy number profiles, refining candidates for driver oncogenes, and consolidating broad amplifications.
Affiliation(s)
- Oscar M Rueda
  - Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK

5. Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice since very few efforts have been focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genome, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the latest six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria including type I error rate, detection power, Receiver Operating Characteristics (ROC) curve and the area under curve (AUC), and computational complexity, to evaluate performance of the methods under multiple simulation scenarios. We also characterize their abilities on applications to two real datasets obtained from cancers with lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. It is believed helpful to accelerate the development of novel and improved methods.
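One of the evaluation criteria named above, the area under the ROC curve, can be computed directly from its rank interpretation: the probability that a truly altered site receives a higher score than an unaltered one, with ties counted as one half. The sketch below assumes per-site scores and binary ground-truth labels from a simulation. The function name and the toy values are illustrative and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: four sites, two of them truly altered (label 1).
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of sites. For genome-scale simulations a sort-based rank computation would be the usual choice.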

6. Park C, Ahn J, Yoon Y, Park S. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data. PLoS One 2011; 6:e26975. [PMID: 22073121 PMCID: PMC3205051 DOI: 10.1371/journal.pone.0026975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/21/2011] [Accepted: 10/07/2011] [Indexed: 01/08/2023]
Abstract
BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample.
METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR).
CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
Affiliation(s)
- Chihyun Park
  - Department of Computer Science, Yonsei University, Seoul, South Korea
- Jaegyoon Ahn
  - Department of Computer Science, Yonsei University, Seoul, South Korea
- Youngmi Yoon
  - Division of Information Engineering, Gachon University of Medicine and Science, Incheon, South Korea
- Sanghyun Park
  - Department of Computer Science, Yonsei University, Seoul, South Korea

7. Morganella S, Pagnotta SM, Ceccarelli M. Finding recurrent copy number alterations preserving within-sample homogeneity. Bioinformatics 2011; 27:2949-56. [PMID: 21873327 DOI: 10.1093/bioinformatics/btr488] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 01/06/2023]
Abstract
MOTIVATION: Copy number alterations (CNAs) represent an important component of genetic variation and play a significant role in many human diseases. Development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identification of recurrent CNAs represents the first fundamental step to provide a list of genomic regions which form the basis for further biological investigations. The main problem in recurrent CNAs discovery is related to the need to distinguish between functional changes and random events without pathological relevance. Within-sample homogeneity represents a common feature of copy number profile in cancer, so it can be used as additional source of information to increase the accuracy of the results. Although several algorithms aimed at the identification of recurrent CNAs have been proposed, no attempt of a comprehensive comparison of different approaches has yet been published.
RESULTS: We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs where a statistical hypothesis framework is extended to take into account within-sample homogeneity. Statistical significance and within-sample homogeneity are combined into an iterative procedure to extract the regions that likely are involved in functional changes. Results show that GAIA represents a valid alternative to other proposed approaches. In addition, we perform an accurate comparison by using two real aCGH datasets and a carefully planned simulation study.
AVAILABILITY: GAIA has been implemented as R/Bioconductor package. It can be downloaded from the following page http://bioinformatics.biogem.it/download/gaia.
CONTACT: ceccarelli@unisannio.it; morganella@unisannio.it.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandro Morganella
  - Department of Science, University of Sannio, 82100, Benevento, Italy

8. Picard F, Lebarbier E, Hoebeke M, Rigaill G, Thiam B, Robin S. Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 2011; 12:413-28. [PMID: 21209153 DOI: 10.1093/biostatistics/kxq076] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 01/03/2023]
Abstract
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles gives the opportunity to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed in the univariate case to the joint analysis of multiple CGH profiles. Our contribution is multiple: we propose an integrated model to perform joint segmentation, normalization, and calling for multiple array CGH profiles. This model shows great flexibility, especially in the modeling of the wave effect that gives a likelihood framework to approaches proposed by others. We propose a new dynamic programming algorithm for break point positioning, as well as a model selection criterion based on a modified bayesian information criterion proposed in the univariate case. The performance of our method is assessed using simulated and real data sets. Our method is implemented in the R package cghseg.
Affiliation(s)
- Franck Picard
  - Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558 - Univ. Lyon 1, F-69622, Villeurbanne, France

9. van de Wiel MA, Picard F, van Wieringen WN, Ylstra B. Preprocessing and downstream analysis of microarray DNA copy number profiles. Brief Bioinform 2010; 12:10-21. [PMID: 20172948 DOI: 10.1093/bib/bbq004] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
Analysis of DNA copy number profiles requires methods tailored to the specific nature of these data. The number of available data analysis methods has grown enormously in the last 5 years. We discuss the typical characteristics of DNA copy number data, as measured by microarray technology and review the extensive literature on preprocessing methods such as segmentation and calling. Subsequently, the focus narrows to applications of DNA copy number in cancer, in particular, several downstream analyses of multi-sample data sets such as testing, clustering and classification. Finally, we look ahead: what should we prepare for and which methodology-related topics may deserve attention in the near future?
Affiliation(s)
- Mark A van de Wiel
  - Department of Epidemiology & Biostatistics, VU University Medical Center, Amsterdam, The Netherlands

10. Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, Borecki IB, Province MA. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2009; 26:464-9. [PMID: 20031968 DOI: 10.1093/bioinformatics/btp708] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 01/08/2023]
Abstract
MOTIVATION: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.
RESULTS: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
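The intuition behind a between-site correlation analysis can be illustrated with a small sketch: sites inside a recurrently altered region tend to co-vary across samples, so their correlation with nearby sites is high, and only a band around the diagonal of the site-by-site correlation matrix needs to be examined. The code below is an illustration under stated assumptions, not the published CMDS procedure. The function names, the `width` parameter, and the toy ratio matrix are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def diagonal_band_score(ratios, width):
    """Mean correlation of each site with the sites at most `width` positions away.

    `ratios[s][i]` is the raw intensity ratio of sample s at site i. Only the
    band of the correlation matrix near the diagonal is ever computed.
    """
    n_sites = len(ratios[0])
    cols = [[row[i] for row in ratios] for i in range(n_sites)]
    scores = []
    for i in range(n_sites):
        lo, hi = max(0, i - width), min(n_sites, i + width + 1)
        neighbours = [j for j in range(lo, hi) if j != i]
        if not neighbours:
            scores.append(0.0)
            continue
        total = sum(pearson(cols[i], cols[j]) for j in neighbours)
        scores.append(total / len(neighbours))
    return scores


# Four toy samples, five sites; sites 1 and 2 are altered together in samples 0 and 1.
ratios = [
    [0.0, 1.0, 1.0, 0.1, 0.0],
    [0.1, 1.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.0],
]
scores = diagonal_band_score(ratios, width=1)
```

In this toy input the two co-altered sites receive the highest scores. Deciding which scores are significant would require a null model, which this sketch leaves out.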
Affiliation(s)
- Qunyuan Zhang
  - Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA