1. Roman T, Xie L, Schwartz R. Automated deconvolution of structured mixtures from heterogeneous tumor genomic data. PLoS Comput Biol 2017;13:e1005815. PMID: 29059177; PMCID: PMC5695636; DOI: 10.1371/journal.pcbi.1005815.
Abstract
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity on a single patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods are limited to coarse approximations of only a few cell subpopulations, however. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data, and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in true breast cancer CNV data from the Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix.
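Among the enhancements listed above is fuzzy clustering for identifying substructure in sparse, noisy data. As a rough, generic illustration of the soft-membership idea only (a plain fuzzy c-means sketch with caller-supplied starting centers, not the WSCUnmix implementation; all names here are hypothetical):

```python
import numpy as np

def fuzzy_cmeans(X, init_centers, m=2.0, iters=100):
    """Generic fuzzy c-means: each sample gets a soft membership in every
    cluster instead of a hard label.  X is (n, d); init_centers is (k, d);
    m > 1 is the fuzzifier (larger m -> softer memberships)."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(iters):
        # distances from every sample to every center, floored to avoid 0-division
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # membership update: u_ij proportional to d_ij^(-2/(m-1)), rows sum to 1
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # center update: membership-weighted means
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers
```

On well-separated data the memberships approach 0/1 and the procedure behaves like k-means; on overlapping or noisy data the intermediate memberships flag ambiguous samples, which is the property that makes soft clustering useful for detecting mixture substructure.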
Affiliation(s)
- Theodore Roman, Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lu Xie, Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Russell Schwartz, Computational Biology Department, School of Computer Science, and Biological Sciences Department, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
2. Roman T, Nayyeri A, Fasy BT, Schwartz R. A simplicial complex-based approach to unmixing tumor progression data. BMC Bioinformatics 2015;16:254. PMID: 26264682; PMCID: PMC4534068; DOI: 10.1186/s12859-015-0694-x.
Abstract
BACKGROUND Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this process of evolution due to its relevance to identifying drivers of mutation and predicting future prognosis and drug response. Efforts are challenged by high tumor heterogeneity, though, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set. RESULTS We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. We extend our prior methods, which use an interpretation of the mixture problem as that of reconstructing simple geometric objects called simplices, to instead search for structured unions of simplices called simplicial complexes that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves on our ability to accurately resolve mixtures on simulated data sets and demonstrate its practical applicability on a large RNASeq tumor data set. CONCLUSIONS Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
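The geometric picture used in this line of work is that each bulk sample is approximately a convex combination of a few fundamental cell-population profiles, i.e., a point inside a simplex whose vertices are those profiles. As a minimal illustration of one sub-step, estimating a sample's mixture fractions once candidate vertices are known, here is a simple projected-gradient solver (an illustrative sketch, not the simplicial-complex algorithm of the paper; the function names are hypothetical):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w >= 0, sum(w) = 1}, via the standard sort-based method."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def mixture_fractions(sample, vertices, lr=0.01, iters=2000):
    """Find w >= 0 with sum(w) = 1 minimizing ||vertices.T @ w - sample||^2,
    i.e., express the bulk sample as a convex combination of the k vertex
    profiles given as rows of the (k, d) array `vertices`."""
    k = vertices.shape[0]
    w = np.full(k, 1.0 / k)                # start at the simplex center
    for _ in range(iters):
        grad = vertices @ (vertices.T @ w - sample)
        w = project_simplex(w - lr * grad)  # gradient step, then project back
    return w
```

The harder problem addressed by the paper is the other direction: inferring the vertices themselves, and the union-of-simplices (simplicial complex) structure relating them across branches of an evolutionary tree.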
Affiliation(s)
- Theodore Roman, Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
- Amir Nayyeri, Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
- Brittany Terese Fasy, Department of Computer Science, Tulane University, 6834 St. Charles St., New Orleans, USA
- Russell Schwartz, Computational Biology Department and Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, USA
3. Pennington G, Smith CA, Shackney S, Schwartz R. Reconstructing tumor phylogenies from heterogeneous single-cell data. J Bioinform Comput Biol 2007;5:407-27. PMID: 17589968; DOI: 10.1142/s021972000700259x.
Abstract
Studies of gene expression in cancerous tumors have revealed that tumors presenting indistinguishable symptoms in the clinic can be substantially different entities at the molecular level. The ability to distinguish between these genetically distinct cancers will make possible more accurate prognoses and more finely targeted therapeutics, provided we can characterize commonly occurring cancer sub-types and the specific molecular abnormalities that produce them. We develop a new method for identifying these common tumor progression pathways by applying phylogeny inference algorithms to single-cell assays, taking advantage of information on tumor heterogeneity lost to prior microarray-based approaches. We combine this approach with expectation maximization to infer unknown parameters used in the phylogeny construction. We further develop new algorithms to merge inferred trees across different assays. We validate the expectation maximization method on simulated data and demonstrate the combined approach on a set of fluorescent in situ hybridization (FISH) data measuring cell-by-cell gene and chromosome copy numbers in a large sample of breast cancers. The results further validate the proposed computational methods by showing consistency with several previous findings on these cancers and provide novel insights into the mechanisms of tumor progression in these patients.
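For intuition about what applying tree-building algorithms to single-cell assays can look like in its simplest form, one can connect observed cell states by a minimum spanning tree under L1 copy-number distance, rooted at the diploid state. This is a generic heuristic for illustration only, not the paper's EM-based method, and the names are hypothetical:

```python
import numpy as np

def mst_phylogeny(profiles):
    """Prim's algorithm over pairwise L1 distances between integer
    copy-number profiles.  Returns directed edges (parent, child) of a
    spanning tree grown from profile 0 (conventionally the diploid state),
    so each edge weight counts total copy-number changes along that branch."""
    profiles = np.asarray(profiles)
    n = len(profiles)
    D = np.abs(profiles[:, None, :] - profiles[None, :, :]).sum(axis=2)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # cheapest edge leaving the current tree
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: D[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

Real methods must additionally handle measurement noise, unobserved ancestral states, and merging of trees inferred from different assays, which is where the expectation-maximization machinery described in the abstract comes in.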
Affiliation(s)
- Gregory Pennington, Computer Science Department, Carnegie Mellon University, 4400 Fifth Ave., Pittsburgh, PA 15213, USA
4. Zeng X, Shaikh FY, Harrison MK, Adon AM, Trimboli AJ, Carroll KA, Sharma N, Timmers C, Chodosh LA, Leone G, Saavedra HI. The Ras oncogene signals centrosome amplification in mammary epithelial cells through cyclin D1/Cdk4 and Nek2. Oncogene 2010;29:5103-12. PMID: 20581865; DOI: 10.1038/onc.2010.253.
Abstract
Centrosome amplification (CA) contributes to carcinogenesis by generating aneuploidy. Elevated frequencies of CA in most benign breast lesions and primary tumors suggest a causative role for CA in breast cancers. Clearly, identifying which and how altered signal transduction pathways contribute to CA is crucial to breast cancer control. Although a causative and cooperative role for c-Myc and Ras in mammary tumorigenesis is well documented, their ability to generate CA during mammary tumor initiation remains unexplored. To answer that question, K-Ras(G12D) and c-Myc were induced in mouse mammary glands. Although CA was observed in mammary tumors initiated by c-Myc or K-Ras(G12D), it was detected only in premalignant mammary lesions expressing K-Ras(G12D). CA, both in vivo and in vitro, was associated with increased expression of the centrosome-regulatory proteins, cyclin D1 and Nek2. Abolishing the expression of cyclin D1, Cdk4 or Nek2 in MCF10A human mammary epithelial cells expressing H-Ras(G12V) abrogated Ras-induced CA, whereas silencing cyclin E1 or B2 had no effect. Thus, we conclude that CA precedes mammary tumorigenesis, and interfering with centrosome-regulatory targets suppresses CA.
Affiliation(s)
- X Zeng, Department of Radiation Oncology, Emory University School of Medicine, and Emory Winship Cancer Institute, Atlanta, GA 30322, USA
5. Tolliver D, Tsourakakis C, Subramanian A, Shackney S, Schwartz R. Robust unmixing of tumor states in array comparative genomic hybridization data. Bioinformatics 2010;26:i106-14. PMID: 20529894; PMCID: PMC2881397; DOI: 10.1093/bioinformatics/btq213.
Abstract
MOTIVATION Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations of different stages along their development along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy and outlier data. RESULTS In the present work, we propose a new method to perform tumor mixture separation efficiently and robustly in the presence of experimental error. The method builds on the prior geometric approach but uses a novel objective function allowing robust fits that greatly reduce the sensitivity to noise and outliers. We further develop an efficient gradient optimization method to optimize this 'soft geometric unmixing' objective for measurements of tumor DNA copy numbers assessed by array comparative genomic hybridization (aCGH). We show, on a combination of semi-synthetic and real data, that the method yields fast and accurate separation of tumor states.
CONCLUSIONS We have shown a novel objective function and optimization method for the robust separation of tumor sub-types from aCGH data and have shown that the method provides fast, accurate reconstruction of tumor states from mixed samples. Better solutions to this problem can be expected to improve our ability to accurately identify genetic abnormalities in primary tumor samples and to infer patterns of tumor evolution. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
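The robustness idea, replacing a squared-error fit, whose quadratic tails let a single outlier dominate, with a loss that grows only linearly for large residuals, can be seen in miniature on a one-dimensional location estimate. The sketch below uses the standard Huber loss as a stand-in; it is not the paper's 'soft geometric unmixing' objective, only an illustration of why such objectives reduce outlier sensitivity:

```python
import numpy as np

def huber_location(x, delta=1.0, lr=0.1, iters=500):
    """Estimate a location parameter mu for data x under the Huber loss
    (quadratic for residuals within delta, linear beyond) by gradient
    descent.  A gross outlier contributes at most a bounded gradient,
    so it cannot drag mu arbitrarily far, unlike the plain mean."""
    mu = 0.0
    for _ in range(iters):
        r = x - mu
        # clipped residuals: the per-point (negative) Huber gradient
        g = np.where(np.abs(r) <= delta, r, delta * np.sign(r))
        mu += lr * g.mean()
    return mu
```

With five inliers near 1.0 and one outlier at 100, the plain mean is pulled to about 17.5, while the Huber estimate settles near 1.2.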
Affiliation(s)
- David Tolliver, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
6. Schwartz R, Shackney SE. Applying unmixing to gene expression data for tumor phylogeny inference. BMC Bioinformatics 2010;11:42. PMID: 20089185; PMCID: PMC2823708; DOI: 10.1186/1471-2105-11-42.
Abstract
BACKGROUND While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state but at the cost of losing information about intra-tumor heterogeneity. RESULTS The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development.
CONCLUSIONS Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.
Affiliation(s)
- Russell Schwartz, Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Stanley E Shackney, Departments of Human Oncology and Human Genetics, Drexel University School of Medicine, Pittsburgh, PA, USA
7. Day RS. Challenges of biological realism and validation in simulation-based medical education. Artif Intell Med 2006;38:47-66. PMID: 16621481; DOI: 10.1016/j.artmed.2006.01.001.
Abstract
OVERVIEW Simulation, both physical and computer-based, has a rich history in support of medical education. Essentially all these efforts have been aimed at instilling concrete measurable skills, akin to vocational training. They present learners with choices, facilitating a degree of learning by doing. The sets of learner choices are usually limited, with choices clearly classified into "right" and "wrong". But much of medicine is not much like a multiple-choice test. The realm of choices is broad and not always easily converted to a short list. The "correct" answer is not always known by the experienced physician beforehand, sometimes not even after the die is cast and the future unfolds. Computer simulation of human disease and its treatment can in principle be tremendously useful in the education of both basic and clinical scientists. This paper describes some challenges in the construction of simulation-based "liberal arts" biomedical education. OBJECTIVES The educator attempting to develop a learning environment based on simulation of biology faces some special challenges. The challenges addressed in this paper are: face validity and deep validity; finding the right degree of realism; authoring biomedical models efficiently; managing randomness. To illustrate the issues, we trace the history of the Oncology Thinking Cap throughout several versions and expansions of educational objectives, and describe the detection and remediation of shortcomings related to these issues. DESIGN Dealing effectively with issues of validity and realism can be accomplished if the acquisition of information driving and justifying the model development choices is documented, preferably automatically, during the process. Efficiency in authoring is greatly enhanced by judicious modularity to encourage re-use, and by the use of templated statements rather than raw code or exotic graphical components to represent the instructions driving the model. Randomness can be used to familiarize learners with the true relative proportions of types of cases, or to enrich the encountered cases with rarer but more instructive cases. When a learner repeats an encounter with a scenario while changing a single option, proper management of randomness is essential to avoid artifacts of random number generators. Otherwise an outcome change caused by a shift in random number streams may masquerade as an outcome change due to the changed option. CONCLUSION Effective use of computer simulation of human disease and its treatment for biomedical education faces daunting obstacles, but these problems can be solved.
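The randomness-management point in the DESIGN section corresponds to the classic "common random numbers" technique: give each stochastic component of the simulation its own independently seeded stream, so that changing one learner option cannot shift the random draws consumed by unrelated components. A minimal sketch with a hypothetical toy model (not the Oncology Thinking Cap code):

```python
import random

def simulate(seed, dose):
    """Toy patient simulation with two separately seeded streams.  The
    disease-course stream never depends on the treatment option, so two
    runs with the same seed but different doses see identical biology,
    and any outcome difference is attributable to the dose alone."""
    biology = random.Random(seed)            # stream 1: disease course
    response = random.Random(seed + 10_000)  # stream 2: treatment-response noise
    growth = [biology.gauss(1.0, 0.2) for _ in range(10)]
    # effect: each growth increment is capped by the (noisy) treatment effect
    effect = sum(min(g, dose * (1.0 + response.gauss(0.0, 0.05))) for g in growth)
    return growth, effect
```

Had every quantity been drawn from a single shared stream, changing `dose` could shift the later draws, and the resulting outcome change could masquerade as an effect of the changed option, exactly the artifact the abstract warns about.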
Affiliation(s)
- Roger S Day, Department of Biostatistics and Center for Biomedical Informatics, University of Pittsburgh, PA 15213, USA
8. Shackney S, Emlet DR, Pollice A, Smith C, Brown K, Kociban D. Guidelines for improving the reproducibility of quantitative multiparameter immunofluorescence measurements by laser scanning cytometry on fixed cell suspensions from human solid tumors. Cytometry B Clin Cytom 2006;70:10-9. PMID: 16342079; DOI: 10.1002/cyto.b.20084.
Abstract
BACKGROUND Laser scanning cytometry (LSC) is a versatile technology that makes it possible to perform multiple measurements on individual cells and correlate them cell by cell with other cellular features. It would be highly desirable to be able to perform reproducible, quantitative, correlated cell-based immunofluorescence studies on individual cells from human solid tumors. However, such studies can be challenging because of the presence of large numbers of cell aggregates and other confounding factors. Techniques have been developed to deal with cell aggregates in data sets collected by LSC. Experience has also been gained in addressing other key technical and methodological issues that can affect the reproducibility of such cell-based immunofluorescence measurements. METHODS AND RESULTS We describe practical aspects of cell sample collection, cell fixation and staining, protocols for performing multiparameter immunofluorescence measurements by LSC, use of controls and reference samples, and approaches to data analysis that we have found useful in improving the accuracy and reproducibility of LSC data obtained in human tumor samples. We provide examples of the potential advantages of LSC in examining quantitative aspects of cell-based analysis. Improvements in the quality of cell-based multiparameter immunofluorescence measurements make it possible to extract useful information from relatively small numbers of cells. This, in turn, permits the performance of multiple multicolor panels on each tumor sample. With links among the different panels that are provided by overlapping measurements, it is possible to develop increasingly more extensive profiles of intracellular expression of multiple proteins in clinical samples of human solid tumors. Examples of such linked panels of measurements are provided.
CONCLUSIONS Advances in methodology can improve cell-based multiparameter immunofluorescence measurements on cell suspensions from human solid tumors by LSC for use in prognostic and predictive clinical applications.
Affiliation(s)
- Stanley Shackney, Laboratory of Cancer Cell Biology and Genetics, Department of Human Oncology, Allegheny Singer Research Institute, Allegheny General Hospital, Pittsburgh, Pennsylvania 15212, USA