1
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees. BioMed Research International 2017; 2017:5482750. [PMID: 29279850 PMCID: PMC5723949 DOI: 10.1155/2017/5482750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/02/2017] [Accepted: 10/18/2017] [Indexed: 12/14/2022]
Abstract
Tumorigenesis is a process of mutation accumulation that is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome, with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees from FISH data. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly across the stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features at distinguishing tumors. The constructed phylogenetic trees characterize the tumor development process well and outperform those of other similar algorithms.
2
Brito I, Hupé P, Neuvial P, Barillot E. Stability-based comparison of class discovery methods for DNA copy number profiles. PLoS One 2013; 8:e81458. [PMID: 24339933 PMCID: PMC3855312 DOI: 10.1371/journal.pone.0081458] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2010] [Accepted: 10/22/2013] [Indexed: 11/19/2022]
Abstract
MOTIVATION Array-CGH can be used to determine DNA copy number, imbalances in which are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples and clustering algorithms may be used for this purpose. The choice between procedures is often difficult. An evaluation procedure is therefore required to select the best class discovery method (combination of one input data representation, one dissimilarity measure and one clustering algorithm) for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has ever been reported. RESULTS We applied several class discovery methods and evaluated the stability of their solutions with a modified version of Bertoni's [Formula: see text]-based test [1]. Our version relaxes the independence assumption required by Bertoni's original [Formula: see text]-based test. We conclude that Minimal Regions of alteration (a concept introduced by [2]) for input data representation, sim [3] or agree [4] for dissimilarity measure, and the use of average group distance in the clustering algorithm produce the most robust classes of array-CGH profiles. AVAILABILITY The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH" (VAMP) [5]. The data sets used are publicly available from ACTuDB [6].
Affiliation(s)
- Isabel Brito
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- Philippe Hupé
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- CNRS UMR144, Paris, France
- Pierre Neuvial
- Laboratoire Statistique & Génome, Université d'Évry Val d'Essonne, UMR CNRS 8071-USC INRA, Évry, France
- Emmanuel Barillot
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
3
Kumar N, Cai H, von Mering C, Baudis M. Specific genomic regions are differentially affected by copy number alterations across distinct cancer types, in aggregated cytogenetic data. PLoS One 2012; 7:e43689. [PMID: 22937079 PMCID: PMC3427184 DOI: 10.1371/journal.pone.0043689] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/30/2012] [Accepted: 07/23/2012] [Indexed: 12/02/2022]
Abstract
Background Regional genomic copy number alterations (CNA) are observed in the vast majority of cancers. Besides specifically targeting well-known, canonical oncogenes, CNAs may also play more subtle roles in terms of modulating genetic potential and broad gene expression patterns of developing tumors. Any significant differences in the overall CNA patterns between different cancer types may thus point towards specific biological mechanisms acting in those cancers. In addition, differences among CNA profiles may prove valuable for cancer classifications beyond existing annotation systems. Principal Findings We have analyzed molecular-cytogenetic data from 25579 tumor samples, which were classified into 160 cancer types according to the International Classification of Diseases (ICD) coding system. When correcting for differences in the overall CNA frequencies between cancer types, related cancers were often found to cluster together according to similarities in their CNA profiles. Based on a randomization approach, distance measures from the cluster dendrograms were used to identify those specific genomic regions that contributed significantly to this signal. This approach identified 43 non-neutral genomic regions whose propensity for the occurrence of copy number alterations varied with the type of cancer at hand. Only a subset of these identified loci overlapped with previously implied, highly recurrent (hot-spot) cytogenetic imbalance regions. Conclusions Thus, for many genomic regions, a simple null-hypothesis of independence between cancer type and relative copy number alteration frequency can be rejected. Since a subset of these regions display relatively low overall CNA frequencies, they may point towards second-tier genomic targets that are adaptively relevant but not necessarily essential for cancer development.
Affiliation(s)
- Nitin Kumar
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
4
Lai Y. On the adaptive partition approach to the detection of multiple change-points. PLoS One 2011; 6:e19754. [PMID: 21629694 PMCID: PMC3101215 DOI: 10.1371/journal.pone.0019754] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/10/2010] [Accepted: 04/15/2011] [Indexed: 12/02/2022]
Abstract
With an adaptive partition procedure, we can partition a "time course" into consecutive non-overlapping intervals such that the population means/proportions of the observations in two adjacent intervals are significantly different at a given significance level. However, the widely used recursive combination or partition procedures do not guarantee a global optimization. We propose a modified dynamic programming algorithm to achieve a global optimization. Our method can provide consistent estimation results. In a comprehensive simulation study, our method shows improved performance compared with the recursive combination/partition procedures. In practice, the significance level can be determined based on a cross-validation procedure. As an application, we consider the well-known Pima Indian Diabetes data. We explore the relationship between diabetes risk and several important variables, including plasma glucose concentration, body mass index and age.
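To illustrate why dynamic programming can reach a global optimum where recursive splitting cannot, here is a minimal sketch of a generic optimal-segmentation recursion. It is not the modified algorithm from the paper: it assumes a fixed number of intervals and a least-squares cost, and it omits the significance test between adjacent intervals. The function name `optimal_segmentation` and its parameters are hypothetical.

```python
from typing import List, Tuple


def optimal_segmentation(values: List[float], n_segments: int) -> Tuple[float, List[int]]:
    """Split `values` into `n_segments` contiguous intervals that minimize the
    total within-interval squared deviation from each interval's mean.

    Returns (total_cost, boundaries); boundaries are the start indices of
    every interval after the first. Assumed cost function, for illustration.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums let us evaluate the cost of any interval in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def cost(i: int, j: int) -> float:
        """Squared error of values[i:j] around its own mean (requires j > i)."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting values[:j] into k intervals.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Follow the back-pointers to recover the change-points.
    boundaries: List[int] = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        boundaries.append(j)
    boundaries.reverse()
    return best[n_segments][n], boundaries


total, change_points = optimal_segmentation([0, 0, 0, 5, 5, 5, 1, 1], 3)
print(total, change_points)
```

Because every split position is considered for every interval count, the result does not depend on the order in which splits are tried, unlike greedy recursive partitioning. The cost is O(k·n²) time for k intervals and n observations.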
Affiliation(s)
- Yinglei Lai
- Department of Statistics and Biostatistics Center, The George Washington University, Washington, DC, United States of America.
5
Kumar N, Rehrauer H, Cai H, Baudis M. CDCOCA: a statistical method to define complexity dependence of co-occuring chromosomal aberrations. BMC Med Genomics 2011; 4:21. [PMID: 21371302 PMCID: PMC3061884 DOI: 10.1186/1755-8794-4-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/26/2010] [Accepted: 03/03/2011] [Indexed: 11/29/2022]
Abstract
Background Copy number alterations (CNA) play a key role in cancer development and progression. Since more than one CNA can be detected in most tumors, frequently co-occurring genetic CNA may point to cooperating cancer-related genes. Existing methods for co-occurrence evaluation so far have not considered the overall heterogeneity of CNA per tumor, resulting in a preferential detection of frequent changes with limited specificity for each association due to the high genetic instability of many samples. Method We hypothesize that in cancer some linkage-independent CNA may display a non-random co-occurrence, and that these CNA could be of pathogenetic relevance for the respective cancer. We also hypothesize that the statistical relevance of co-occurring CNA may depend on the sample-specific CNA complexity. We verify our hypotheses with a simulation-based algorithm, CDCOCA (complexity dependence of co-occurring chromosomal aberrations). Results Application of CDCOCA to example data sets identified co-occurring CNA from a low-complexity background that otherwise went unnoticed. Identification of cancer-associated genes in these co-occurring changes can provide insights into cooperative genes involved in oncogenesis. Conclusions We have developed a method to detect associations of regional copy number abnormalities in cancer data. Along with finding statistically relevant CNA co-occurrences, our algorithm points towards a generally low specificity for co-occurrence of regional imbalances in CNA-rich samples, which may have a negative impact on pathway modeling approaches relying on frequent CNA events.
Affiliation(s)
- Nitin Kumar
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, Switzerland
6
Zhang NR, Siegmund DO, Ji H, Li JZ. Detecting simultaneous changepoints in multiple sequences. Biometrika 2010; 97:631-645. [PMID: 22822250 DOI: 10.1093/biomet/asq025] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 02/06/2023]
Abstract
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
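The scan statistic described above, a sum over samples of chi-squared statistics, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single fixed window width, assumes each sequence has mean 0 and variance 1 away from any signal, and returns only the best window rather than a significance level. The name `scan_sum_chisq` is hypothetical.

```python
from typing import List, Tuple


def scan_sum_chisq(sequences: List[List[float]], window: int) -> Tuple[int, float]:
    """Return (start, statistic) for the window that maximizes the sum over
    sequences of squared standardized window sums.

    Assumes every sequence is standardized (mean 0, variance 1) under the null.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    if not 1 <= window <= length:
        raise ValueError("window must be between 1 and the sequence length")

    best_start, best_stat = 0, float("-inf")
    for start in range(length - window + 1):
        stat = 0.0
        for seq in sequences:
            s = sum(seq[start:start + window])
            # (s / sqrt(window))^2 is chi-squared with 1 degree of freedom
            # under the assumed null, so the per-sample terms simply add up.
            stat += s * s / window
        if stat > best_stat:
            best_start, best_stat = start, stat
    return best_start, best_stat


samples = [
    [0, 0, 2, 2, 0, 0],    # gain at positions 2-3
    [0, 0, 0, 0, 0, 0],    # no signal in this sample
    [0, 0, -2, -2, 0, 0],  # loss at the same positions
]
print(scan_sum_chisq(samples, window=2))
```

Squaring before summing means gains and losses at the same location reinforce each other instead of cancelling, and a signal carried by only some of the samples still raises the pooled statistic.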
Affiliation(s)
- Nancy R Zhang
- Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California 94305-4065, U.S.A.
7
van de Wiel MA, Picard F, van Wieringen WN, Ylstra B. Preprocessing and downstream analysis of microarray DNA copy number profiles. Brief Bioinform 2010; 12:10-21. [PMID: 20172948 DOI: 10.1093/bib/bbq004] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
Analysis of DNA copy number profiles requires methods tailored to the specific nature of these data. The number of available data analysis methods has grown enormously in the last 5 years. We discuss the typical characteristics of DNA copy number data, as measured by microarray technology and review the extensive literature on preprocessing methods such as segmentation and calling. Subsequently, the focus narrows to applications of DNA copy number in cancer, in particular, several downstream analyses of multi-sample data sets such as testing, clustering and classification. Finally, we look ahead: what should we prepare for and which methodology-related topics may deserve attention in the near future?
Affiliation(s)
- Mark A van de Wiel
- Department of Epidemiology & Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
8
Rueda OM, Diaz-Uriarte R. Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously. BMC Bioinformatics 2009; 10:308. [PMID: 19775444 PMCID: PMC2760535 DOI: 10.1186/1471-2105-10-308] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/01/2009] [Accepted: 09/23/2009] [Indexed: 11/24/2022]
Abstract
BACKGROUND Alterations in the number of copies of genomic DNA that are common or recurrent among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for common or recurrent CNA regions that affect only some subsets of the samples (without knowledge of the regions and subsets affected), but this is neglected by most methods. RESULTS We have developed two methods to define recurrent CNA regions from aCGH data. Our methods are unique and qualitatively different from existing approaches: they detect regions over both the complete set of arrays and alterations that are common only to some subsets of the samples (i.e., alterations that might characterize previously unknown groups); they use probabilities of alteration as input and return probabilities of being a common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and that researchers can modify, as needed, thresholds of immediate interpretability and develop custom statistics to answer specific research questions. CONCLUSION These methods represent a qualitative advance in the location of recurrent CNA regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods developed can become important tools in the search for genomic regions harboring disease-critical genes.
Affiliation(s)
- Oscar M Rueda
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Breast Cancer Functional Genomics, Cancer Research UK, Cambridge, UK
- Ramon Diaz-Uriarte
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Melchor Fernández Almagro 3, 28029 Madrid, Spain
9
Liu J, Bandyopadhyay N, Ranka S, Baudis M, Kahveci T. Inferring progression models for CGH data. Bioinformatics 2009; 25:2208-15. [PMID: 19528087 DOI: 10.1093/bioinformatics/btp365] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION One of the mutational processes that has been monitored genome-wide is the occurrence of regional DNA copy number alterations (CNAs), which may lead to deletion or over-expression of tumor suppressors or oncogenes, respectively. Understanding the relationship between CNAs and different cancer types is a fundamental problem in cancer studies. RESULTS This article develops an efficient method that can accurately model the progression of the cancer markers and reconstruct the evolutionary relationship between multiple types of cancers using comparative genomic hybridization (CGH) data. Such modeling can lead to a better understanding of the commonalities and differences between multiple cancer types and potential therapies. We have developed an automatic method to infer a graph model for the markers of multiple cancers from a large population of CGH data. Our method identifies highly related markers across different cancer types. It then builds a directed acyclic graph that shows the evolutionary history of these markers based on how common each marker is in different cancer types. We demonstrated the use of this model in determining the importance of markers in cancer evolution. We have also developed a new method to measure the evolutionary distance between different cancers based on their markers. This method employs the graph model we developed for the individual markers to measure the distance between pairs of cancers. We used this measure to create an evolutionary tree for multiple cancers. Our experiments on the Progenetix database show that our markers are largely consistent with the reported hot-spot imbalances and most frequent imbalances. The results show that our distance measure can accurately reconstruct the evolutionary relationship between multiple cancer types.
Affiliation(s)
- Jun Liu
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
10
Baudis M. Genomic imbalances in 5918 malignant epithelial tumors: an explorative meta-analysis of chromosomal CGH data. BMC Cancer 2007; 7:226. [PMID: 18088415 PMCID: PMC2225423 DOI: 10.1186/1471-2407-7-226] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.7] [Received: 04/26/2007] [Accepted: 12/18/2007] [Indexed: 12/12/2022]
Abstract
BACKGROUND Chromosomal abnormalities have been associated with most human malignancies, with gains and losses on some genomic regions associated with particular entities. METHODS Of the 15429 cases collected for the Progenetix molecular-cytogenetic database, 5918 malignant epithelial neoplasias analyzed by chromosomal Comparative Genomic Hybridization (CGH) were selected for further evaluation. For the 22 clinico-pathological entities with more than 50 cases, summary profiles for genomic imbalances were generated from case specific data and analyzed. RESULTS With large variation in overall genomic instability, recurring genomic gains and losses were prominent. Most entities showed frequent gains involving 8q2, while gains on 20q, 1q, 3q, 5p, 7q and 17q were frequent in different entities. Loss "hot spots" included 3p, 4q, 13q, 17p and 18q among others. Related average imbalance patterns were found for clinically distinct entities, e.g. hepatocellular carcinomas (ca.) and ductal breast ca., as well as for histologically related entities (squamous cell ca. of different sites). CONCLUSION Although considerable case-by-case variation of genomic profiles can be found by CGH in epithelial malignancies, a limited set of variously combined chromosomal imbalances may be typical for carcinogenesis. Focus on the respective regions should aid in target gene detection and pathway deduction.
Affiliation(s)
- Michael Baudis
- Institute of Molecular Biology, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland.