1
Fisher TB, Saini G, Rekha TS, Krishnamurthy J, Bhattarai S, Callagy G, Webber M, Janssen EAM, Kong J, Aneja R. Digital image analysis and machine learning-assisted prediction of neoadjuvant chemotherapy response in triple-negative breast cancer. Breast Cancer Res 2024; 26:12. [PMID: 38238771 PMCID: PMC10797728 DOI: 10.1186/s13058-023-01752-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/11/2023] [Indexed: 01/22/2024]
Abstract
BACKGROUND Pathological complete response (pCR) is associated with a favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30-40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) achieve pCR, while the remaining 60-70% have residual disease (RD). The role of the tumor microenvironment in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response.
METHODS H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) in the model development cohort and 79 patients (41 with pCR and 38 with RD) in the validation cohort were split using a stratified eightfold cross-validation strategy for the first step and a leave-one-out cross-validation strategy for the second step. A tile-level histology label prediction pipeline and four machine learning classifiers were used to analyze 468,043 WSI tiles. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to a radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction.
RESULTS The tile-level prediction pipeline achieved 86.72% accuracy for histology classification, while the patient-level pipeline achieved 83.53% accuracy for NAC response (pCR vs. RD) prediction in the model development cohort. In an independent validation cohort, the model achieved a tile-level histology accuracy of 83.59% and an NAC prediction accuracy of 81.01%. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor-infiltrating lymphocytes for pCR, and microvessel density and polyploid giant cancer cells for RD.
CONCLUSION Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.
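The two-step evaluation described above uses stratified eightfold cross-validation for the first step and leave-one-out cross-validation for the second. The Python sketch below shows one way such splits could be generated. It assumes that splitting happens at the patient level with pCR/RD as the stratification label; the function names are hypothetical, and the abstract does not say how the authors implemented their splits.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class evenly.

    Samples of each class are shuffled and dealt round-robin across the folds.
    The round-robin position carries over between classes so that overall
    fold sizes stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    start = 0
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(start + j) % k].append(idx)
        start = (start + len(members)) % k
    return folds


def leave_one_out(n):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


if __name__ == "__main__":
    # Hypothetical patient labels matching the development-cohort class sizes.
    labels = ["pCR"] * 51 + ["RD"] * 34
    for fold in stratified_kfold(labels, k=8):
        n_pcr = sum(labels[i] == "pCR" for i in fold)
        print(len(fold), n_pcr, len(fold) - n_pcr)
```

With 51 pCR and 34 RD patients, this split gives five folds of 11 patients and three folds of 10. Each fold contains 6 or 7 pCR and 4 or 5 RD patients, so each test fold roughly preserves the cohort's class ratio.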
Affiliation(s)
- Timothy B Fisher
  - Department of Biology, Georgia State University, Atlanta, GA, 30302, USA
- Geetanjali Saini
  - School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- T S Rekha
  - JSSAHER (JSS Academy of Higher Education and Research) Medical College, Mysuru, Karnataka, India
- Jayashree Krishnamurthy
  - JSSAHER (JSS Academy of Higher Education and Research) Medical College, Mysuru, Karnataka, India
- Shristi Bhattarai
  - School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Grace Callagy
  - Discipline of Pathology, University of Galway, Galway, Ireland
- Mark Webber
  - Discipline of Pathology, University of Galway, Galway, Ireland
- Emiel A M Janssen
  - Department of Pathology, Stavanger University Hospital, Stavanger, Norway
  - Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, Stavanger, Norway
- Jun Kong
  - Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, 30303, USA
- Ritu Aneja
  - Department of Biology, Georgia State University, Atlanta, GA, 30302, USA
  - School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA

2
Fisher TB, Saini G, Ts R, Krishnamurthy J, Bhattarai S, Callagy G, Webber M, Janssen EAM, Kong J, Aneja R. Digital image analysis and machine learning-assisted prediction of neoadjuvant chemotherapy response in triple-negative breast cancer. RESEARCH SQUARE 2023:rs.3.rs-3243195. [PMID: 37645881 PMCID: PMC10462230 DOI: 10.21203/rs.3.rs-3243195/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Background Pathological complete response (pCR) is associated with a favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30-40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) achieve pCR, while the remaining 60-70% have residual disease (RD). The role of the tumor microenvironment (TME) in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response.
Methods H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) were split using a stratified 8-fold cross-validation strategy for the first step and a leave-one-out cross-validation strategy for the second step. A tile-level histology label prediction pipeline and four machine learning classifiers were used to analyze 468,043 WSI tiles. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to a radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction.
Results The tile-level prediction pipeline achieved 86.72% accuracy for histology classification, while the patient-level pipeline achieved 83.53% accuracy for NAC response (pCR vs. RD) prediction. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor-infiltrating lymphocytes for pCR, and microvessel density and polyploid giant cancer cells for RD.
Conclusion Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.
Affiliation(s)
- Rekha Ts
  - JSSAHER (JSS Academy of Higher Education and Research) Medical College

3
Lapierre-Landry M, Liu Z, Ling S, Bayat M, Wilson DL, Jenkins MW. Nuclei Detection for 3D Microscopy With a Fully Convolutional Regression Network. IEEE ACCESS: PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2021; 9:60396-60408. [PMID: 35024261 PMCID: PMC8751907 DOI: 10.1109/access.2021.3073894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
Advances in three-dimensional microscopy and tissue clearing are enabling whole-organ imaging with single-cell resolution. Fast and reliable image processing tools are needed to analyze the resulting image volumes, including automated cell detection, cell counting, and cell analytics. Deep learning approaches have shown promising results in two- and three-dimensional nuclei detection tasks; however, detecting overlapping or non-spherical nuclei of different sizes and shapes in the presence of a blurring point spread function remains challenging and often leads to incorrect merging and splitting of nuclei. Here we present a new regression-based fully convolutional network that, when combined with V-net, a popular three-dimensional semantic segmentation architecture, locates a thousand nuclei centroids with high accuracy in under a minute. High nuclei detection F1-scores of 95.3% and 92.5% were obtained in two different whole quail embryonic hearts, a tissue type that is difficult to segment because of its high cell density and its heterogeneous, elliptical nuclei. Similarly high scores were obtained in the mouse brain stem, demonstrating that this approach is highly transferable to nuclei of different shapes and intensities. Finally, spatial statistics were computed on the resulting centroids. The spatial distribution of nuclei obtained by our approach most closely resembles that of manually identified nuclei, indicating that this approach could serve in future spatial analyses of cell organization.
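The F1-scores reported above depend on how predicted centroids are paired with manually identified ones. The abstract does not specify the matching rule, so the following Python sketch assumes a greedy one-to-one matching within a hypothetical distance threshold `max_dist`. Under this rule a merged nucleus costs one false negative and a split nucleus costs one false positive. The function name `detection_f1` is illustrative only.

```python
import math


def detection_f1(predicted, ground_truth, max_dist):
    """F1-score for centroid detection (2D or 3D points).

    A predicted centroid is a true positive if it can be paired with a
    still-unmatched ground-truth centroid within max_dist. Candidate pairs are
    accepted greedily from the closest outward, so each point is used at most
    once.
    """
    candidates = []
    for i, p in enumerate(predicted):
        for j, g in enumerate(ground_truth):
            d = math.dist(p, g)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()

    used_pred, used_gt = set(), set()
    for _, i, j in candidates:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)

    tp = len(used_pred)
    fp = len(predicted) - tp   # predictions left unmatched (e.g., split nuclei)
    fn = len(ground_truth) - tp  # annotations left unmatched (e.g., merged nuclei)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Greedy matching by ascending distance is a common simplification. An optimal one-to-one assignment, such as the Hungarian algorithm, can produce slightly different counts when nuclei are densely packed.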
Affiliation(s)
- Maryse Lapierre-Landry
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
- Zexuan Liu
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
- Shan Ling
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
- Mahdi Bayat
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA
- David L Wilson
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
  - Department of Radiology, Case Western Reserve University, Cleveland, OH 44106, USA
- Michael W Jenkins
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
  - Department of Pediatrics, Case Western Reserve University, Cleveland, OH 44106, USA

4
Smolinska A, Engel J, Szymanska E, Buydens L, Blanchet L. General Framing of Low-, Mid-, and High-Level Data Fusion With Examples in the Life Sciences. DATA HANDLING IN SCIENCE AND TECHNOLOGY 2019. [DOI: 10.1016/b978-0-444-63984-4.00003-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]

5
Neagu AN. Proteome Imaging: From Classic to Modern Mass Spectrometry-Based Molecular Histology. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2019; 1140:55-98. [PMID: 31347042 DOI: 10.1007/978-3-030-15950-4_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
To overcome the limitations of classic imaging in histology in the current era of multiomics, the multicolor "molecular microscope", with its emerging "molecular pictures", offers quantitative and spatial information about thousands of molecular profiles without labeling potential targets. Healthy and diseased human tissues, as well as those of diverse invertebrate and vertebrate animal models, including genetically engineered species and cultured cells, can be readily analyzed by histology-directed MALDI imaging mass spectrometry. The aim of this review is to discuss the range of proteomic information emerging from MALDI mass spectrometry imaging (MSI), compared with classic histology, histochemistry, and immunohistochemistry, with applications in biology and medicine concerning the detection and distribution of structural proteins and biologically active molecules, such as antimicrobial peptides and proteins, allergens, neurotransmitters and hormones, enzymes, growth factors, toxins, and others. Molecular imaging is well suited for the discovery and validation of candidate protein biomarkers in neuroproteomics, oncoproteomics, aging and age-related diseases, parasitoproteomics, forensics, and ecotoxicology. Additionally, in situ proteome imaging may help to elucidate the physiological and pathological mechanisms involved in developmental biology, reproductive research, amyloidogenesis, tumorigenesis, wound healing, neural network regeneration, matrix mineralization, apoptosis and oxidative stress, pain tolerance, cell cycle and transformation under oncogenic stress, tumor heterogeneity, behavior and aggressiveness, drug bioaccumulation and biotransformation, organisms' reactions to penetrating environmental xenobiotics, immune signaling, assessment of the integrity and functionality of tissue barriers, behavioral biology, and the molecular origins of diseases. MALDI MSI is certainly a valuable tool for personalized medicine and "Eco-Evo-Devo" integrative biology in the current context of global environmental challenges.
Affiliation(s)
- Anca-Narcisa Neagu
  - Laboratory of Animal Histology, Faculty of Biology, "Alexandru Ioan Cuza" University of Iasi, Iasi, Romania

6
Lavatera critica, a green leafy vegetable, controls high fat diet induced hepatic lipid accumulation and oxidative stress through the regulation of lipogenesis and lipolysis genes. Biomed Pharmacother 2017; 96:1349-1357. [DOI: 10.1016/j.biopha.2017.11.072] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/25/2017] [Revised: 11/09/2017] [Accepted: 11/10/2017] [Indexed: 12/11/2022]

7
Roy S, Yun D, Madahian B, Berry MW, Deng LY, Goldowitz D, Homayouni R. Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts. Front Bioeng Biotechnol 2017; 5:48. [PMID: 28894735 PMCID: PMC5581332 DOI: 10.3389/fbioe.2017.00048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/18/2017] [Accepted: 07/31/2017] [Indexed: 01/09/2023]
Abstract
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using NTF across 16 different approximation ranks. The dominant entries of each of the 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as a gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications for genes and TFs. Taken together, our results indicate that NTF is useful for the simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature-based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
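For reference, a common CP (PARAFAC)-style formulation of non-negative tensor factorization for the term × gene × TF tensor $\mathcal{X}$ at approximation rank $R$ is shown below. Here $\mathbf{A}$ (terms × R), $\mathbf{G}$ (genes × R), and $\mathbf{T}$ (TFs × R) are factor matrices with columns $\mathbf{a}_r$, $\mathbf{g}_r$, $\mathbf{t}_r$. The abstract does not state the authors' exact loss or update rules, so this least-squares objective is an assumption. In this formulation, each rank-one term $\mathbf{a}_r \circ \mathbf{g}_r \circ \mathbf{t}_r$ would play the role of one sub-tensor, and its largest entries would define one ATM.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{g}_r \circ \mathbf{t}_r,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, g_{jr}\, t_{kr}

\min_{\mathbf{A} \ge 0,\; \mathbf{G} \ge 0,\; \mathbf{T} \ge 0}\;
\Bigl\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{g}_r \circ \mathbf{t}_r \Bigr\|_F^2
```

The non-negativity constraints keep every component additive. As a result, a module can be read as a weighted set of terms, genes, and TFs, with no cancellation between components.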
Affiliation(s)
- Sujoy Roy
  - Bioinformatics Program, University of Memphis, Memphis, TN, United States
  - Center for Translational Informatics, University of Memphis, Memphis, TN, United States
- Daqing Yun
  - Computer and Information Sciences Program, Harrisburg University of Science and Technology, Harrisburg, PA, United States
- Behrouz Madahian
  - Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Michael W Berry
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, United States
- Lih-Yuan Deng
  - Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Daniel Goldowitz
  - Center for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Ramin Homayouni
  - Bioinformatics Program, University of Memphis, Memphis, TN, United States
  - Center for Translational Informatics, University of Memphis, Memphis, TN, United States
  - Department of Biological Sciences, University of Memphis, Memphis, TN, United States

8
Papalexakis EE, Faloutsos C, Sidiropoulos ND. Tensors for Data Mining and Data Fusion. ACM T INTEL SYST TEC 2017. [DOI: 10.1145/2915921] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Indexed: 10/20/2022]
Abstract
Tensors and tensor decompositions are very powerful and versatile tools that can model a wide variety of heterogeneous, multiaspect data. As a result, tensor decompositions, which extract useful latent information out of multiaspect data tensors, have witnessed increasing popularity and adoption by the data mining community. In this survey, we present some of the most widely used tensor decompositions, providing the key insights behind them, and summarizing them from a practitioner’s point of view. We then provide an overview of a very broad spectrum of applications where tensors have been instrumental in achieving state-of-the-art performance, ranging from social network analysis to brain data analysis, and from web mining to healthcare. Subsequently, we present recent algorithmic advances in scaling tensor decompositions up to today’s big data, outlining the existing systems and summarizing the key ideas behind them. Finally, we conclude with a list of challenges and open problems that outline exciting future research directions.

9
Papalexakis EE, Faloutsos C, Mitchell TM, Talukdar PP, Sidiropoulos ND, Murphy B. Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications. Stat Anal Data Min 2016; 9:269-290. [PMID: 27672406 DOI: 10.1002/sam.11315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like 'edible' or 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65-fold sparser). Additionally, we improve upon ALS, the workhorse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables and to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, 'friends', wall-postings); there, Turbo-SMT spots spammer-like anomalies.
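For the BrainQ setting described above, CMTF is commonly written as a least-squares objective. In this objective, the tensor $\mathcal{X}$ (nouns × voxels × subjects) and the matrix $\mathbf{Y}$ (nouns × properties) share the noun-mode factor matrix $\mathbf{A}$. The sketch below gives that standard formulation. It is not necessarily the exact objective Turbo-SMT wraps, and the factor names $\mathbf{B}$, $\mathbf{C}$, $\mathbf{V}$ are assumptions.

```latex
\min_{\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\mathbf{V}}\;
\bigl\| \mathcal{X} - [\![\mathbf{A},\mathbf{B},\mathbf{C}]\!] \bigr\|_F^2
+ \bigl\| \mathbf{Y} - \mathbf{A}\mathbf{V}^{\top} \bigr\|_F^2,
\qquad
[\![\mathbf{A},\mathbf{B},\mathbf{C}]\!] = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r
```

An ALS solver updates one factor at a time while holding the others fixed. Because $\mathbf{A}$ appears in both terms, its update draws on the tensor and the matrix together, and this coupling lets the noun factors explain both brain activity and noun properties.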

10
Han J, Fontenay GV, Wang Y, Mao JH, Chang H. PHENOTYPIC CHARACTERIZATION OF BREAST INVASIVE CARCINOMA VIA TRANSFERABLE TISSUE MORPHOMETRIC PATTERNS LEARNED FROM GLIOBLASTOMA MULTIFORME. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING 2016; 2016:1025-1028. [PMID: 27390615 DOI: 10.1109/isbi.2016.7493440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Quantitative analysis of whole slide images (WSIs) in a large cohort may provide predictive models of clinical outcome. However, the performance of existing techniques is hindered by large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state) that are always present in a large cohort. Although unsupervised feature learning provides a promising way to learn pertinent features without human intervention, its capability can be greatly limited by the lack of well-curated examples. In this paper, we explored the transferability of knowledge acquired from a well-curated Glioblastoma Multiforme (GBM) dataset by applying it to the representation and characterization of tissue histology in the Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma (BRCA) cohort. Our experimental results reveal two major phenotypic subtypes with statistically significant differences in survival. Further differential expression analysis of these two subtypes indicates enrichment of genes regulated by NF-kB in response to TNF and of genes up-regulated in response to IFNG.
Affiliation(s)
- Ju Han
  - Department of Electrical and Biomedical Engineering, University of Nevada, Reno, Nevada, USA
- Gerald V Fontenay
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Yunfu Wang
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Department of Neurology, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei, China
- Jian-Hua Mao
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Hang Chang
  - Department of Electrical and Biomedical Engineering, University of Nevada, Reno, Nevada, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA

11
Acar E, Lawaetz AJ, Rasmussen MA, Bro R. Structure-revealing data fusion model with applications in metabolomics. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2015; 2013:6023-6. [PMID: 24111112 DOI: 10.1109/embc.2013.6610925] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
In many disciplines, data from multiple sources are acquired and jointly analyzed for enhanced knowledge discovery. For instance, in metabolomics, different analytical techniques are used to measure biological fluids in order to identify the chemicals related to certain diseases. It is widely known that some of these analytical methods, e.g., LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) spectroscopy, provide complementary data sets, and their joint analysis may enable us to capture a larger proportion of the complete metabolome of a specific biological system. Fusing data from multiple sources has proved useful in many fields, including bioinformatics, signal processing, and social network analysis. However, identifying common (shared) and individual (unshared) structures across multiple data sets remains a major challenge in data fusion studies. To address this challenge, we propose a novel unsupervised data fusion model. Our contributions are twofold: (i) we formulate a data fusion model based on the joint factorization of matrices and higher-order tensors, which can automatically reveal common and individual components; and (ii) we demonstrate that the proposed approach provides promising results in the joint analysis of metabolomics data sets consisting of fluorescence and NMR measurements of plasma samples, in terms of separating colorectal cancer patients from controls.

12
Chang H, Zhou Y, Borowsky A, Barner K, Spellman P, Parvin B. Stacked Predictive Sparse Decomposition for Classification of Histology Sections. Int J Comput Vis 2014; 113:3-18. [PMID: 27721567 DOI: 10.1007/s11263-014-0790-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 02/06/2023]
Abstract
Image-based classification of histology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for histology composition (e.g., the percentage of each distinct component in histology sections) and enables the study of nuclear properties within each component. Furthermore, the study of these indices, constructed from each whole slide image in a large cohort, has the potential to provide predictive models of clinical outcome. For example, correlations can be established between the constructed indices and the patients' survival information at the cohort level, which is a fundamental step towards personalized medicine. However, the performance of existing techniques is hindered by large technical variations (e.g., variations in color and texture in tissue images due to non-standard experimental protocols) and biological heterogeneities (e.g., cell type, cell state) that are always present in a large cohort. We propose a system that automatically learns a series of dictionary elements for representing the underlying spatial distribution using stacked predictive sparse decomposition. The learned representation is then fed into the spatial pyramid matching framework with a linear support vector machine classifier. The system has been evaluated for the classification of distinct histological components in two cohorts of different tumor types. Throughput has been increased by using a graphics processing unit (GPU), and evaluation indicates superior performance compared with previous research.
Affiliation(s)
- Hang Chang
  - Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Yin Zhou
  - Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Paul Spellman
  - Center for Spatial Systems Biomedicine, Oregon Health Sciences University, Portland, OR, USA
- Bahram Parvin
  - Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

13
Acar E, Papalexakis EE, Gürdeniz G, Rasmussen MA, Lawaetz AJ, Nilsson M, Bro R. Structure-revealing data fusion. BMC Bioinformatics 2014; 15:239. [PMID: 25015427 PMCID: PMC4117975 DOI: 10.1186/1471-2105-15-239] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 12/31/2013] [Accepted: 06/26/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures that are otherwise difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing, and bioinformatics. However, data fusion is challenging because data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) composed of both shared and unshared components. To address these challenges, in this paper we introduce a novel unsupervised data fusion model based on the joint factorization of matrices and higher-order tensors.
RESULTS The traditional formulation of coupled matrix and tensor factorizations, which models only shared factors, fails to capture the underlying structures in the presence of both shared and unshared factors; in contrast, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and accurately extract their relative concentrations, (ii) provide promising results in identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.
CONCLUSIONS We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
Affiliation(s)
- Evrim Acar
  - Department of Food Science, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark

14
Papalexakis EE, Faloutsos C, Mitchell TM, Talukdar PP, Sidiropoulos ND, Murphy B. Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200×. PROCEEDINGS OF THE ... SIAM INTERNATIONAL CONFERENCE ON DATA MINING. SIAM INTERNATIONAL CONFERENCE ON DATA MINING 2014; 2014:118-126. [PMID: 26473087 PMCID: PMC4603425 DOI: 10.1137/1.9781611973440.14] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like 'edible' or 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we accelerate any CMTF solver so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by up to 200×, with up to a 65-fold increase in sparsity and accuracy comparable to the baseline. We apply TURBO-SMT to BRAINQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. TURBO-SMT is able to find meaningful latent variables and to predict brain activity with competitive accuracy.

15
Chang H, Zhou Y, Spellman P, Parvin B. Stacked Predictive Sparse Coding for Classification of Distinct Regions of Tumor Histopathology. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION 2013:169-176. [PMID: 24770492 DOI: 10.1109/iccv.2013.28] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
Image-based classification of tissue histology, in terms of distinct histopathology (e.g., tumor or necrosis regions), provides a series of indices for tumor composition. Furthermore, aggregation of these indices from each whole slide image (WSI) in a large cohort can provide predictive models of clinical outcome. However, the performance of existing techniques is hindered by large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state) that are always present in a large cohort. We suggest that, compared with the human-engineered features widely adopted in existing systems, unsupervised feature learning is more tolerant to batch effects (e.g., technical variations associated with sample preparation), and pertinent features can be learned without user intervention. This leads to a novel approach for classification of tissue histology based on unsupervised feature learning and spatial pyramid matching (SPM), which utilizes sparse tissue morphometric signatures at various locations and scales. This approach has been evaluated on two distinct datasets consisting of different tumor types collected from The Cancer Genome Atlas (TCGA), and the experimental results indicate that the proposed approach is (i) extensible to different tumor types; (ii) robust in the presence of wide technical variations and biological heterogeneities; and (iii) scalable with varying training sample sizes.
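The spatial-pyramid step can be illustrated with a short NumPy sketch: given one sparse code per location of a tile, pool the codes over a 1×1, 2×2 and 4×4 grid and concatenate the results into a single tile descriptor. This is not the authors' implementation; the stacked predictive sparse coding that produces the codes is not reproduced, and the use of max pooling over absolute code values, the default pyramid levels, and the name `spatial_pyramid_max_pool` are assumptions for illustration.

```python
import numpy as np


def spatial_pyramid_max_pool(codes, levels=(0, 1, 2)):
    """Pool sparse codes over a spatial pyramid and concatenate the results.

    codes: array of shape (H, W, K), one K-dimensional sparse code per spatial
    location of a tissue tile. At pyramid level l the tile is split into a
    2**l x 2**l grid and each cell contributes the max of |code| per atom
    (max-of-absolute-value pooling is an assumption of this sketch).
    Returns a vector of length K * sum(4**l for l in levels).
    """
    codes = np.abs(np.asarray(codes, dtype=float))
    h, w, _ = codes.shape
    pooled = []
    for level in levels:
        cells = 2 ** level
        if h < cells or w < cells:
            raise ValueError(f"tile {h}x{w} is too small for pyramid level {level}")
        for band in np.array_split(codes, cells, axis=0):      # row bands, top to bottom
            for cell in np.array_split(band, cells, axis=1):   # cells, left to right
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)
```

For a 4×4 tile with K = 2 atoms, the descriptor has 2 × (1 + 4 + 16) = 42 entries; coarse levels summarize the whole tile while finer levels keep some spatial layout, which is the "various locations and scales" idea in the abstract.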
Affiliation(s)
- Hang Chang: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, U.S.A.
- Yin Zhou: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, U.S.A.
- Paul Spellman: Center for Spatial Systems Biomedicine, Oregon Health Sciences University, Portland, Oregon, U.S.A.
- Bahram Parvin: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, U.S.A.

16
Andersen MLM, Rasmussen MA, Pörksen S, Svensson J, Vikre-Jørgensen J, Thomsen J, Hertel NT, Johannesen J, Pociot F, Petersen JS, Hansen L, Mortensen HB, Nielsen LB. Complex multi-block analysis identifies new immunologic and genetic disease progression patterns associated with the residual β-cell function 1 year after diagnosis of type 1 diabetes. PLoS One 2013; 8:e64632. [PMID: 23755131 PMCID: PMC3674006 DOI: 10.1371/journal.pone.0064632] [Received: 11/14/2012] [Accepted: 04/16/2013] [Indexed: 02/07/2023]
Abstract
The purpose of the present study is to explore the progression of type 1 diabetes (T1D) in Danish children 12 months after diagnosis using Latent Factor Modelling. We include three data blocks of dynamic paraclinical biomarkers, baseline clinical characteristics and genetic profiles of diabetes-related SNPs in the analyses. This method identified a model explaining 21.6% of the total variation in the data set. The model consists of two components: (1) A pattern of declining residual β-cell function positively associated with young age, presence of diabetic ketoacidosis and long duration of disease symptoms (P = 0.0004), and with risk alleles of WFS1, CDKN2A/2B and RNLS (P = 0.006). (2) A second pattern of high ZnT8 autoantibody levels and low postprandial glucagon levels associated with risk alleles of IFIH1, TCF2, TAF5L, IL2RA and PTPN2 and protective alleles of the ERBB3 gene (P = 0.0005). These results demonstrate that Latent Factor Modelling can identify associated patterns in prospective clinical data; future functional studies will be needed to clarify the relevance of these patterns.
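The "explained variation" figure for a multi-block model can be illustrated with a simple stand-in: block-scale each data block so that no block dominates because of its number of variables, concatenate the blocks, and read the variance captured by the leading components from an SVD. This is a multi-block PCA sketch, not the Latent Factor Model used in the study; the block-scaling rule (autoscale, then divide by the square root of the block's column count) and the function names `block_scale` and `multiblock_pca` are assumptions for illustration.

```python
import numpy as np


def block_scale(block):
    """Autoscale columns, then divide by sqrt(#columns) so every block has equal total variance."""
    block = np.asarray(block, dtype=float)
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return centered / std / np.sqrt(block.shape[1])


def multiblock_pca(blocks, n_components=2):
    """Concatenate block-scaled data; return scores, loadings and explained-variance fractions."""
    Z = np.hstack([block_scale(b) for b in blocks])
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of total variation per component
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T           # rows follow the concatenated variables
    return scores, loadings, explained[:n_components]
```

With, say, a biomarker block, a clinical block and a SNP block measured on the same children, `explained.sum()` for two components plays the same role as the abstract's "21.6% of the total variation", although the numbers from this PCA stand-in would not match the published model.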
Affiliation(s)
- Marie Louise Max Andersen: Department of Pediatrics, Herlev Hospital, Faculty of Health Science, University of Copenhagen, Copenhagen, Denmark.

17
Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B. Invariant delineation of nuclear architecture in glioblastoma multiforme for clinical and molecular association. IEEE Transactions on Medical Imaging 2013; 32:670-82. [PMID: 23221815 PMCID: PMC3728287 DOI: 10.1109/tmi.2012.2231420] [Indexed: 05/17/2023]
Abstract
Automated analysis of whole mount tissue sections can provide insights into tumor subtypes and the underlying molecular basis of neoplasm. However, since tumor sections are collected from different laboratories, inherent technical and biological variations impede analysis for very large datasets such as The Cancer Genome Atlas (TCGA). Our objective is to characterize tumor histopathology, through the delineation of the nuclear regions, from hematoxylin and eosin (H&E) stained tissue sections. Such a representation can then be mined for intrinsic subtypes across a large dataset for prediction and molecular association. Furthermore, nuclear segmentation is formulated within a multi-reference graph framework with geodesic constraints, which enables computation of multidimensional representations, on a cell-by-cell basis, for functional enrichment and bioinformatics analysis. Here, we present a novel method, multi-reference graph cut (MRGC), for nuclear segmentation that overcomes technical variations associated with sample preparation by incorporating prior knowledge from manually annotated reference images and local image features. The proposed approach has been validated on manually annotated samples and then applied to a dataset of 377 Glioblastoma Multiforme (GBM) whole slide images from 146 patients. For the GBM cohort, multidimensional representation of the nuclear features and their organization have identified 1) statistically significant subtypes based on several morphometric indexes, 2) whether each subtype can be predictive or not, and 3) that the molecular correlates of predictive subtypes are consistent with the literature. Data and intermediaries for a number of tumor types (GBM, low grade glial, and kidney renal clear carcinoma) are available at: http://tcga.lbl.gov for correlation with TCGA molecular data. 
The website also provides an interface for panning and zooming of whole mount tissue sections with/without overlaid segmentation results for quality control.
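The graph-cut formulation can be illustrated on a toy image with a pure-Python s-t minimum cut (Edmonds-Karp max-flow). In this sketch the "reference" prior is reduced to mean nucleus and background intensities pooled from annotated reference images, the unary costs are squared distances to those means, and neighbouring pixels are linked by a constant Potts smoothness weight. These choices, the `smoothness` value, and the names `graph_cut_segment` and `_source_side` are assumptions; the geodesic constraints and local image features of MRGC are not reproduced.

```python
from collections import deque

import numpy as np


def _source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the nodes still reachable from s (the min-cut source side)."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow and update residual capacities
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0.0) + bottleneck
            v = u


def graph_cut_segment(image, references, smoothness=0.05):
    """Binary nucleus/background segmentation by s-t min cut on a 4-connected pixel grid.

    references: list of (image, mask) pairs; foreground and background mean intensities
    are pooled across all of them (a simplified stand-in for a multi-reference prior).
    """
    fg = np.concatenate([np.asarray(r, float)[np.asarray(m, bool)] for r, m in references])
    bg = np.concatenate([np.asarray(r, float)[~np.asarray(m, bool)] for r, m in references])
    mu_fg, mu_bg = fg.mean(), bg.mean()
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    n = h * w
    s, t = n, n + 1
    cap = {v: {} for v in range(n + 2)}

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)

    for i in range(h):
        for j in range(w):
            p, x = i * w + j, image[i, j]
            add_edge(s, p, (x - mu_bg) ** 2)  # cut (paid) if p is labelled background
            add_edge(p, t, (x - mu_fg) ** 2)  # cut (paid) if p is labelled foreground
            for ii, jj in ((i, j + 1), (i + 1, j)):  # Potts smoothness to right/down neighbours
                if ii < h and jj < w:
                    q = ii * w + jj
                    add_edge(p, q, smoothness)
                    add_edge(q, p, smoothness)
    mask = np.zeros((h, w), dtype=bool)
    for p in _source_side(cap, s, t):
        if p < n:
            mask[divmod(p, w)] = True  # source side = nucleus
    return mask
```

Pixels left on the source side of the minimum cut are labelled as nucleus. Pooling statistics from several annotated references is one simple way to make the unary costs less sensitive to the staining of any single slide, which is the motivation the abstract gives for using multiple references.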
Affiliation(s)
- Hang Chang: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720 U.S.A.
- Ju Han: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720 U.S.A.
- Alexander Borowsky: Center for Comparative Medicine, University of California, Davis, California, 95616 U.S.A.
- Leandro Loss: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720 U.S.A.
- Joe W. Gray: Center for Spatial Systems Biomedicine, Oregon Health Sciences University, Portland, Oregon, 97239 U.S.A.
- Paul T. Spellman: Center for Spatial Systems Biomedicine, Oregon Health Sciences University, Portland, Oregon, 97239 U.S.A.
- Bahram Parvin: Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720 U.S.A.