26
Panayides AS, Amini A, Filipovic ND, Sharma A, Tsaftaris SA, Young A, Foran D, Do N, Golemati S, Kurc T, Huang K, Nikita KS, Veasey BP, Zervakis M, Saltz JH, Pattichis CS. AI in Medical Imaging Informatics: Current Challenges and Future Directions. IEEE J Biomed Health Inform 2020; 24:1837-1857. [PMID: 32609615 PMCID: PMC8580417 DOI: 10.1109/jbhi.2020.2991043]
Abstract
This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. In conclusion, integrative analytics approaches driven by the associated research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum for both radiology and digital pathology applications. The latter is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.
27
Le H, Gupta R, Hou L, Abousamra S, Fassler D, Torre-Healy L, Moffitt RA, Kurc T, Samaras D, Batiste R, Zhao T, Rao A, Van Dyke AL, Sharma A, Bremer E, Almeida JS, Saltz J. Utilizing Automated Breast Cancer Detection to Identify Spatial Distributions of Tumor-Infiltrating Lymphocytes in Invasive Breast Cancer. Am J Pathol 2020; 190:1491-1504. [PMID: 32277893 PMCID: PMC7369575 DOI: 10.1016/j.ajpath.2020.03.012] [Received: 12/25/2019] [Revised: 02/28/2020] [Accepted: 03/19/2020]
Abstract
Quantitative assessment of spatial relations between tumor and tumor-infiltrating lymphocytes (TIL) is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network analysis pipelines to generate combined maps of cancer regions and TILs in routine diagnostic breast cancer whole slide tissue images. The combined maps provide insight into the structural patterns and spatial distribution of lymphocytic infiltrates and facilitate improved quantification of TILs. Both tumor and TIL analyses were evaluated by using three convolutional neural networks (34-layer ResNet, 16-layer VGG, and Inception v4); the results compared favorably with those obtained by using the best published methods. We have produced open-source tools and a public data set consisting of tumor/TIL maps for 1090 invasive breast cancer images from The Cancer Genome Atlas. The maps can be downloaded for further downstream analyses.
28
Moore M, Friesner ID, Rizk EM, Trager M, Celebi JT, Rich J, Chikeka I, Kurc T, Wang J, Rohr B, Robinson E, Geskin LJ, Horst B, Gardner K, Niedt G, Messina J, Ferringer T, Saltz JH, Vanguri R, Saenger YM. Effect of automated TIL quantification in early-stage melanoma on accuracy of standard T staging using AJCC guidelines. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.10076]
Abstract
10076 Background: Patients diagnosed with early-stage melanoma are at risk of recurrence and death. Adjuvant therapy decreases risk but incurs toxicity and expense. While tumor-infiltrating lymphocytes (TILs) improve prognosis, studies have shown conflicting results due, at least in part, to inter-observer variability. Thus, TILs are not included in standard American Joint Committee on Cancer (AJCC) staging. Here, we quantitatively analyze TILs in hematoxylin and eosin (H&E) melanoma images using two machine learning algorithms. Methods: H&E images were evaluated by two methods for patients with resectable stage I-III melanoma from Columbia (N = 81) and validated using samples from Geisinger and Moffitt (N = 128). For both methods, H&E images were manually annotated using the open-source software QuPath to specify tumor regions. For Method A, images were divided into patches and, for each patch, a lymphocyte-detection probability was generated. Patches above a set threshold were considered “TIL positive”. The ratio of TIL-positive patches to total patches was assessed for every image. For Method B, a classifier was manually trained in QuPath and then applied to each image to determine the ratio of the area of all immune cells to that of all tumor cells, as previously published. Cutoff values to define high- and low-risk groups were established based on a test set and then validated in an independent cohort. Results: Both methods distinguished patients with visceral recurrence from those without for the Columbia training set (Method A p = .0015, Method B p = .043). Using Method A, the Kaplan-Meier curve at the selected cutoff also correlated significantly with disease-specific survival (DSS) for Columbia (p = .022) and was validated in the Geisinger/Moffitt (p = .046) cohort. Cox analysis using Method A showed that TIL status predicted DSS in the validation set (p = .047) and added significantly to depth and ulceration (HR = 3.43, CI: 1.047-11.257, p = .042).
Conclusions: Both open-source machine learning algorithms find significantly higher TIL levels in patients who do not develop metastasis. Notably, Method A may add to standard predictors, such as depth and ulceration. These results demonstrate the promise of computational algorithms to enhance visual grading, and suggest that digital TIL evaluation may add to current AJCC staging.
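Method A reduces each image to one number, the fraction of patches called TIL positive, which is then compared against a cohort cutoff. A minimal Python sketch of that reduction, assuming the per-patch lymphocyte probabilities (within the annotated tumor region) are already available; the 0.5 threshold and 0.3 cutoff are hypothetical values, since the abstract does not report the ones used:

```python
from typing import Sequence


def til_positive_ratio(patch_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patches whose lymphocyte probability exceeds `threshold` (hypothetical default)."""
    if len(patch_probs) == 0:
        raise ValueError("image has no patches")
    positive = sum(1 for p in patch_probs if p > threshold)
    return positive / len(patch_probs)


def risk_group(ratio: float, cutoff: float) -> str:
    """Higher TIL levels were linked to better outcomes, so ratios below `cutoff` are high risk."""
    return "low risk" if ratio >= cutoff else "high risk"


# Hypothetical per-patch probabilities for one image.
ratio = til_positive_ratio([0.9, 0.2, 0.7, 0.4], threshold=0.5)
print(ratio, risk_group(ratio, cutoff=0.3))  # 0.5 low risk
```

In the study the cutoff was chosen on one cohort and then held fixed for validation, which is why `cutoff` is a parameter rather than being derived inside the function.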
29
Kurc T, Bakas S, Ren X, Bagari A, Momeni A, Huang Y, Zhang L, Kumar A, Thibault M, Qi Q, Wang Q, Kori A, Gevaert O, Zhang Y, Shen D, Khened M, Ding X, Krishnamurthi G, Kalpathy-Cramer J, Davis J, Zhao T, Gupta R, Saltz J, Farahani K. Segmentation and Classification in Digital Pathology for Glioma Research: Challenges and Deep Learning Approaches. Front Neurosci 2020; 14:27. [PMID: 32153349 PMCID: PMC7046596 DOI: 10.3389/fnins.2020.00027] [Received: 08/29/2019] [Accepted: 01/10/2020]
Abstract
Biomedical imaging is an important source of information in cancer research. Characterizations of cancer morphology at onset, progression, and in response to treatment provide complementary information to that gleaned from genomics and clinical data. Accurate extraction and classification of both visual and latent image features is an increasingly complex challenge due to the increased complexity and resolution of biomedical image data. In this paper, we present four deep learning-based image analysis methods from the Computational Precision Medicine (CPM) satellite event of the 21st International Medical Image Computing and Computer Assisted Intervention (MICCAI 2018) conference. One method is a segmentation method designed to segment nuclei in whole slide tissue images (WSIs) of adult diffuse glioma cases. It achieved a Dice similarity coefficient of 0.868 with the CPM challenge datasets. Three methods are classification methods developed to categorize adult diffuse glioma cases into oligodendroglioma and astrocytoma classes using radiographic and histologic image data. These methods achieved accuracy values of 0.75, 0.80, and 0.90, measured as the ratio of the number of correct classifications to the number of total cases, with the challenge datasets. The evaluations of the four methods indicate that (1) carefully constructed deep learning algorithms are able to produce high accuracy in the analysis of biomedical image data and (2) the combination of radiographic with histologic image information improves classification performance.
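The two scores quoted above can be made concrete. A minimal Python sketch of the textbook Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), over masks stored as sets of pixel coordinates, and of accuracy as correct classifications over total cases. This is the standard definition only; the CPM challenge's exact scoring variant may differ, and the class labels below are hypothetical:

```python
from typing import Hashable, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def dice(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Number of correct classifications divided by the total number of cases."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equally long, non-empty label sequences")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


pred = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 1), (1, 0), (1, 1)}
print(round(dice(pred, truth), 3))  # 0.667
print(accuracy(["oligo", "astro", "astro", "oligo"], ["oligo", "astro", "oligo", "oligo"]))  # 0.75
```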
30
Barreiros W, Moreira J, Kurc T, Kong J, Melo AC, Saltz JH, Teodoro G. Optimizing parameter sensitivity analysis of large-scale microscopy image analysis workflows with multilevel computation reuse. Concurrency and Computation: Practice & Experience 2020; 32:e5403. [PMID: 32669980 PMCID: PMC7363336 DOI: 10.1002/cpe.5403] [Received: 12/03/2018] [Accepted: 05/18/2019]
Abstract
Parameter sensitivity analysis (SA) is an effective tool to gain knowledge about complex analysis applications and assess the variability in their analysis results. However, it is an expensive process as it requires the execution of the target application multiple times with a large number of different input parameter values. In this work, we propose optimizations to reduce the overall computation cost of SA in the context of analysis applications that segment high-resolution slide tissue images, i.e., images with resolutions of 100k × 100k pixels. Two cost-cutting techniques are combined to efficiently execute SA: use of distributed hybrid systems for parallel execution and computation reuse at multiple levels of an analysis pipeline to reduce the amount of computation. These techniques were evaluated using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. Our parallel execution method attained an efficiency of over 90% on 256 nodes. The hybrid execution on the CPU and Intel Phi improved the performance by 2×. Multilevel computation reuse led to performance gains of over 2.9×.
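The idea behind computation reuse in an SA is that many parameter sets share their early values, so the early pipeline stages produce identical intermediate results. A minimal Python sketch of stage-level reuse, assuming a hypothetical linear pipeline in which stage i depends only on its input and parameters 0..i, and every run processes the same image; it caches each stage output keyed by the parameter prefix. It illustrates the principle and is not the paper's multilevel implementation:

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Callable[[Any, float], Any]


class ReusingPipeline:
    """Linear pipeline that caches every stage output keyed by the parameter prefix
    that produced it, so runs sharing early parameter values skip those stages.

    Assumes every run processes the same input (as in an SA over one fixed image).
    """

    def __init__(self, stages: Sequence[Stage]):
        self.stages: List[Stage] = list(stages)
        self.cache: Dict[Tuple[float, ...], Any] = {}
        self.stage_executions = 0

    def run(self, data: Any, params: Sequence[float]) -> Any:
        if len(params) != len(self.stages):
            raise ValueError("one parameter per stage is required")
        out = data
        for i, stage in enumerate(self.stages):
            key = tuple(params[: i + 1])  # stage i depends on params 0..i
            if key not in self.cache:
                self.cache[key] = stage(out, params[i])
                self.stage_executions += 1
            out = self.cache[key]
        return out


# Hypothetical three-stage pipeline with toy arithmetic standing in for image operations.
pipeline = ReusingPipeline([lambda x, p: x + p, lambda x, p: x * p, lambda x, p: x - p])
results = [pipeline.run(10, ps) for ps in [(1, 2, 3), (1, 2, 4), (1, 5, 3)]]
print(results, pipeline.stage_executions)  # [19, 18, 52] 6
```

Three runs of three stages would normally cost nine stage executions; sharing prefixes brings it to six. The savings grow when the SA sweeps late-stage parameters while early-stage parameters stay fixed.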
31
Prior F, Almeida J, Kathiravelu P, Kurc T, Smith K, Fitzgerald TJ, Saltz J. Open access image repositories: high-quality data to enable machine learning research. Clin Radiol 2020; 75:7-12. [PMID: 31040006 PMCID: PMC6815686 DOI: 10.1016/j.crad.2019.04.002] [Received: 01/31/2019] [Accepted: 04/01/2019]
Abstract
Originally motivated by the need for research reproducibility and data reuse, large-scale, open access information repositories have become key resources for training and testing of advanced machine learning applications in biomedical and clinical research. To be of value, such repositories must provide large, high-quality data sets, where quality is defined as minimising variance due to data collection protocols and data misrepresentations. Curation is the key to quality. We have constructed a large public access image repository, The Cancer Imaging Archive, dedicated to the promotion of open science to advance the global effort to diagnose and treat cancer. Drawing on this experience and our experience in applying machine learning techniques to the analysis of radiology and pathology image data, we will review the requirements placed on such information repositories by state-of-the-art machine learning applications and how these requirements can be met.
32
Kobayashi S, Le H, Chrastecka L, Gupta R, Hou L, Abousamra S, Fassler D, Shroyer KR, Samaras D, Kurc T, Moffitt RA, Saltz JH. Abstract A27: Deep learning for analysis of tumor-lymphocyte interactions in pancreatic ductal adenocarcinoma. Cancer Res 2019. [DOI: 10.1158/1538-7445.panca19-a27]
Abstract
The interaction of tumor, stroma, and immune cells in pancreatic ductal adenocarcinoma (PDAC) is complex and difficult to quantify in patient samples. Recently, deep learning algorithms have shown successes in identifying tumor and lymphocyte regions on whole-slide images derived from routinely collected histopathologic specimens. The Cancer Genome Atlas (TCGA) in particular has generated whole-slide images as well as paired molecular data, thus allowing for combined spatial and molecular analyses of tumor-lymphocyte interactions. We have previously highlighted this resource by computationally mapping tumor-infiltrating lymphocytes (TILs) on digital images across 13 tumor types. To achieve this, convolutional neural networks were trained on lymphocyte images annotated by expert pathologists and then used to detect spatial TIL patterns. This led to identification of four qualitative TIL pattern categories, which varied depending on tumor type as well as molecular immune subtype, demonstrating the potential of these spatial structures to provide further insights into tumor microenvironments and their relationship to overall survival. We have now extended this deep learning pipeline to include identification of tumor regions in PDAC, allowing study of TIL patterns in the context of their relative spatial localization to tumors. Using the deep learning algorithm to define the tumor region, we applied erosion and dilation operations to further capture the peritumoral region, the outer and inner regions of the tumor, as well as desmoplasia far from the tumor cells. Using these masks, we classified lymphocytes by their spatial localization as internal, tumoral, peritumoral, or outer. We then used nearest-neighbor and density-based approaches to quantify TIL infiltration patterns with respect to tumor.
These features vary significantly across the previously identified TIL patterns and may serve as additional parameters to define the microenvironment conditions in patient samples. Here we demonstrate that features extracted using our pipeline recapitulate canonical histologic properties. Using immune cell abundance estimates from gene expression generated by CIBERSORT, we find that samples with tumor TIL densities above median have more M1 macrophages, while those below median have more M2 macrophages. We also observe that slides with a higher peritumoral TIL density relative to tumoral TIL density have higher Treg fractions. Ongoing work on improving the resolution and cell specificity of our pipeline will allow us to ask more specific questions and permit higher granularity in linking clinical outcomes to spatial immune phenotypes.
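Erosion shrinks a binary mask and dilation grows it, so comparing a tumor mask with its eroded and dilated versions splits the slide into nested zones. A minimal pure-Python sketch, assuming a hypothetical mapping of the four labels onto zones (eroded core = internal, remaining tumor rim = tumoral, dilated band outside the tumor = peritumoral, everything else = outer), a 3x3 structuring element, and a band width `k` in pixels; the abstract does not specify the structuring element, the band widths, or the exact zone definitions:

```python
from typing import Iterator, List, Sequence, Tuple

Grid = List[List[bool]]


def _neighbors(r: int, c: int, h: int, w: int) -> Iterator[Tuple[int, int]]:
    """In-bounds pixels of the 3x3 neighborhood centred on (r, c), including itself."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                yield rr, cc


def dilate(mask: Grid, iterations: int = 1) -> Grid:
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [[any(mask[rr][cc] for rr, cc in _neighbors(r, c, h, w)) for c in range(w)]
                for r in range(h)]
    return mask


def erode(mask: Grid, iterations: int = 1) -> Grid:
    # Out-of-image pixels are ignored, so the image border itself does not erode the mask.
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [[all(mask[rr][cc] for rr, cc in _neighbors(r, c, h, w)) for c in range(w)]
                for r in range(h)]
    return mask


def classify_lymphocytes(tumor: Grid, cells: Sequence[Tuple[int, int]], k: int = 1) -> List[str]:
    """Label each lymphocyte (row, col) by its zone relative to the tumor mask (hypothetical zones)."""
    inner = erode(tumor, k)   # tumor pixels whose whole k-pixel neighborhood is tumor
    grown = dilate(tumor, k)  # tumor plus a k-pixel band around it
    labels = []
    for r, c in cells:
        if inner[r][c]:
            labels.append("internal")
        elif tumor[r][c]:
            labels.append("tumoral")
        elif grown[r][c]:
            labels.append("peritumoral")
        else:
            labels.append("outer")
    return labels


tumor = [[2 <= r <= 4 and 2 <= c <= 4 for c in range(7)] for r in range(7)]
print(classify_lymphocytes(tumor, [(3, 3), (2, 2), (1, 1), (0, 0)]))
# ['internal', 'tumoral', 'peritumoral', 'outer']
```

On real slides the masks would come from the tumor-detection network at patch resolution, and an array library would replace these list-based loops.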
Citation Format: Soma Kobayashi, Han Le, Lucie Chrastecka, Rajarsi Gupta, Le Hou, Shahira Abousamra, Danielle Fassler, Kenneth R. Shroyer, Dimitris Samaras, Tahsin Kurc, Richard A. Moffitt, Joel H. Saltz. Deep learning for analysis of tumor-lymphocyte interactions in pancreatic ductal adenocarcinoma [abstract]. In: Proceedings of the AACR Special Conference on Pancreatic Cancer: Advances in Science and Clinical Care; 2019 Sept 6-9; Boston, MA. Philadelphia (PA): AACR; Cancer Res 2019;79(24 Suppl):Abstract nr A27.
33
Gupta R, Kurc T, Sharma A, Almeida JS, Saltz J. The Emergence of Pathomics. Current Pathobiology Reports 2019. [DOI: 10.1007/s40139-019-00200-x]
34
Vu QD, Graham S, Kurc T, To MNN, Shaban M, Qaiser T, Koohbanani NA, Khurram SA, Kalpathy-Cramer J, Zhao T, Gupta R, Kwak JT, Rajpoot N, Saltz J, Farahani K. Methods for Segmentation and Classification of Digital Microscopy Tissue Images. Front Bioeng Biotechnol 2019; 7:53. [PMID: 31001524 PMCID: PMC6454006 DOI: 10.3389/fbioe.2019.00053] [Received: 10/30/2018] [Accepted: 03/01/2019]
Abstract
High-resolution microscopy images of tissue specimens provide detailed information about the morphology of normal and diseased tissue. Image analysis of tissue morphology can help cancer researchers develop a better understanding of cancer biology. Segmentation of nuclei and classification of tissue images are two common tasks in tissue image analysis. Development of accurate and efficient algorithms for these tasks is a challenging problem because of the complexity of tissue morphology and tumor heterogeneity. In this paper, we present two computer algorithms: one designed for segmentation of nuclei and the other for classification of whole slide tissue images. The segmentation algorithm implements a multiscale deep residual aggregation network to accurately segment nuclear material and then separate clumped nuclei into individual nuclei. The classification algorithm initially carries out patch-level classification via a deep learning method, then patch-level statistical and morphological features are used as input to a random forest regression model for whole slide image classification. The segmentation and classification algorithms were evaluated in the MICCAI 2017 Digital Pathology challenge. The segmentation algorithm achieved an accuracy score of 0.78. The classification algorithm achieved an accuracy score of 0.81. These scores were the highest in the challenge.
35
Gomes J, Barreiros W, Kurc T, Melo ACMA, Kong J, Saltz JH, Teodoro G. Sensitivity analysis in digital pathology: Handling large number of parameters with compute expensive workflows. Comput Biol Med 2019; 108:371-381. [PMID: 31054503 DOI: 10.1016/j.compbiomed.2019.03.006] [Received: 12/02/2018] [Revised: 02/28/2019] [Accepted: 03/07/2019]
Abstract
Digital pathology imaging enables valuable quantitative characterizations of tissue state at the sub-cellular level. While there is a growing set of methods for analysis of whole slide tissue images, many of them are sensitive to changes in input parameters. Evaluating how analysis results are affected by variations in input parameters is important for the development of robust methods. Executing algorithm sensitivity analyses by systematically varying input parameters is an expensive task because a single evaluation run with a moderate number of tissue images may take hours or days. Our work investigates the use of Surrogate Models (SMs) along with parallel execution to speed up parameter sensitivity analysis (SA). This approach significantly reduces the SA cost, because the SM execution is inexpensive. The evaluation of several SM strategies with two image segmentation workflows demonstrates that an SA study with SMs attains results close to an SA with real application runs (mean absolute error lower than 0.022), while the SM accelerates the SA execution by 51×. We also show that, although the number of parameters in the example workflows is high, most of the uncertainty can be associated with a few parameters. In order to identify the impact of variations in segmentation results on downstream analyses, we carried out a survival analysis with 387 Lung Squamous Cell Carcinoma cases. This analysis was repeated using 3 values for the most significant parameters identified by the SA for the two segmentation algorithms; about 600 million cell nuclei were segmented per run. The results show that the significance of the survival correlations of patient groups, assessed by a logrank test, is strongly affected by the segmentation parameter changes. This indicates that sensitivity analysis is an important tool for evaluating the stability of conclusions from image analyses.
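The surrogate idea is to pay for a small number of real workflow runs, fit a cheap model to them, and then run the many evaluations an SA needs against that model instead. A minimal pure-Python sketch, assuming a linear least-squares surrogate and a simple one-at-a-time sweep as stand-ins; the paper evaluated several SM strategies and SA methods that are not reproduced here, and `expensive_workflow` is a hypothetical placeholder for a costly segmentation run:

```python
import random
from typing import Callable, List, Sequence, Tuple


def fit_linear_surrogate(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit y ≈ b0 + Σ b_i x_i via the normal equations (Gaussian elimination)."""
    rows = [[1.0, *x] for x in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            A[k] = [a - f * p for a, p in zip(A[k], A[col])]
            b[k] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef


def rank_parameters(surrogate: Callable[[Sequence[float]], float],
                    bounds: Sequence[Tuple[float, float]]) -> List[int]:
    """One-at-a-time sweeps on the cheap surrogate: move each parameter from its lower to
    its upper bound (others at mid-range) and rank by the size of the output change."""
    mid = [(lo + hi) / 2 for lo, hi in bounds]
    effects = []
    for i, (lo, hi) in enumerate(bounds):
        low, high = list(mid), list(mid)
        low[i], high[i] = lo, hi
        effects.append(abs(surrogate(high) - surrogate(low)))
    return sorted(range(len(bounds)), key=lambda i: effects[i], reverse=True)


def expensive_workflow(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a costly segmentation run scored against a reference.
    return 3.0 * x[0] + 0.2 * x[1] + 0.0 * x[2] + 1.0


rng = random.Random(0)
bounds = [(0.0, 1.0)] * 3
X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(20)]  # only 20 real runs
coef = fit_linear_surrogate(X, [expensive_workflow(x) for x in X])


def surrogate(x: Sequence[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


print(rank_parameters(surrogate, bounds))  # [0, 1, 2]
```

The ranking recovers the toy function's structure (parameter 0 dominates, parameter 2 is inert), echoing the abstract's observation that most uncertainty often traces back to a few parameters. A linear surrogate is only adequate when the response is close to linear over the parameter ranges.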
36
Pantanowitz L, Sharma A, Carter AB, Kurc T, Sussman A, Saltz J. Twenty Years of Digital Pathology: An Overview of the Road Travelled, What is on the Horizon, and the Emergence of Vendor-Neutral Archives. J Pathol Inform 2018; 9:40. [PMID: 30607307 PMCID: PMC6289005 DOI: 10.4103/jpi.jpi_69_18] [Received: 09/11/2018] [Accepted: 10/28/2018]
Abstract
Almost 20 years have passed since the commercial introduction of whole-slide imaging (WSI) scanners. During this time, the creation of various WSI devices with the ability to digitize an entire glass slide has transformed the field of pathology. Parallel advances in computational technology and storage have permitted rapid processing of large-scale WSI datasets. This article provides an overview of important past and present efforts related to WSI. An account of how the virtual microscope evolved from the need to visualize and manage satellite data for earth science applications is provided. The article also discusses important milestones beginning from the first WSI scanner designed by Bacus to the Food and Drug Administration approval of the first digital pathology system for primary diagnosis in surgical pathology. As pathology laboratories commit to going fully digital, the need has emerged to include WSIs in an enterprise-level vendor-neutral archive (VNA). The different types of VNAs available are reviewed as well as how best to implement them and how pathology can benefit from participating in this effort. Differences between traditional image algorithms that extract pixel-, object-, and semantic-level features versus deep learning methods are highlighted. The need for large-scale data management, analysis, and visualization in computational pathology is also addressed.
37
Gomes J, de Melo ACMA, Kong J, Kurc T, Saltz JH, Teodoro G. Cooperative and out-of-core execution of the irregular wavefront propagation pattern on hybrid machines with Intel® Xeon Phi™. Concurrency and Computation: Practice & Experience 2018; 30:e4425. [PMID: 30344454 PMCID: PMC6195363 DOI: 10.1002/cpe.4425]
Abstract
The Irregular Wavefront Propagation Pattern (IWPP) is a core computing structure in several image analysis operations. Efficient implementation of IWPP on the Intel Xeon Phi is difficult because of the irregular data access and computation characteristics. The traditional IWPP algorithm relies on atomic instructions, which are not available in the SIMD set of the Intel Phi. To overcome this limitation, we have proposed a new IWPP algorithm that can take advantage of non-atomic SIMD instructions supported on the Intel Xeon Phi. We have also developed and evaluated methods to use CPU and Intel Phi cooperatively for parallel execution of the IWPP algorithms. Our new cooperative IWPP version is also able to handle large out-of-core images that would not fit into the memory of the accelerator. The new IWPP algorithm is used to implement the Morphological Reconstruction and Fill Holes operations, which are operations commonly found in image analysis applications. The vectorization implemented with the new IWPP has attained improvements of up to about 5× over the original IWPP and significant gains as compared to state-of-the-art CPU and GPU versions. The new version running on an Intel Phi is 6.21× and 3.14× faster than running on a 16-core CPU and on a GPU, respectively. Finally, the cooperative execution using two Intel Phi devices and a multi-core CPU has reached performance gains of 2.14× as compared to the execution using a single Intel Xeon Phi.
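For readers unfamiliar with the pattern, a sequential sketch helps: IWPP keeps a queue of "active" pixels and only re-examines neighbors of pixels whose value just changed, so the work follows an irregular wavefront rather than repeated full-image scans. A minimal pure-Python sketch of grayscale morphological reconstruction by dilation (4-connectivity) in that style, plus Fill Holes built on top of it. It is single-threaded, seeds the queue with every pixel for simplicity, and has none of the SIMD, cooperative, or out-of-core machinery the paper describes:

```python
from collections import deque
from typing import List

Image = List[List[int]]
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def reconstruct(marker: Image, mask: Image) -> Image:
    """Grayscale reconstruction by dilation of `marker` under `mask`, as a wavefront
    propagation: only pixels whose value changed are re-queued."""
    h, w = len(mask), len(mask[0])
    out = [[min(marker[r][c], mask[r][c]) for c in range(w)] for r in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w))  # seed with every pixel
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                candidate = min(out[r][c], mask[rr][cc])
                if out[rr][cc] < candidate:  # propagation condition
                    out[rr][cc] = candidate
                    queue.append((rr, cc))
    return out


def fill_holes(binary: Image) -> Image:
    """Fill holes in a 0/1 image: background not reachable from the border becomes 1."""
    h, w = len(binary), len(binary[0])
    inverse = [[1 - v for v in row] for row in binary]
    border = [[inverse[r][c] if r in (0, h - 1) or c in (0, w - 1) else 0 for c in range(w)]
              for r in range(h)]
    reached = reconstruct(border, inverse)  # background connected to the image border
    return [[1 - v for v in row] for row in reached]


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(fill_holes(ring)[2])  # [0, 1, 1, 1, 0]
```

The data-dependent queue traffic in the inner loop is what makes the pattern hard to vectorize: the update of `out[rr][cc]` is the step that would conflict between concurrent lanes without atomics.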
38
Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, Samaras D, Shroyer KR, Zhao T, Batiste R, Van Arnam J, Shmulevich I, Rao AUK, Lazar AJ, Sharma A, Thorsson V. Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images. Cell Rep 2018; 23:181-193.e7. [PMID: 29617659 PMCID: PMC5943714 DOI: 10.1016/j.celrep.2018.03.086] [Received: 01/11/2018] [Revised: 02/27/2018] [Accepted: 03/20/2018]
Abstract
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment.
39
Saltz J, Sharma A, Iyer G, Bremer E, Wang F, Jasniewski A, DiPrima T, Almeida JS, Gao Y, Zhao T, Saltz M, Kurc T. A Containerized Software System for Generation, Management, and Exploration of Features from Whole Slide Tissue Images. Cancer Res 2017; 77:e79-e82. [PMID: 29092946 DOI: 10.1158/0008-5472.can-17-0316] [Received: 02/03/2017] [Revised: 06/17/2017] [Accepted: 09/01/2017]
Abstract
Well-curated sets of pathology image features will be critical to clinical studies that aim to evaluate and predict treatment responses. Researchers require information synthesized across multiple biological scales, from the patient to the molecular scale, to more effectively study cancer. This article describes a suite of services and web applications that allow users to select regions of interest in whole slide tissue images, run a segmentation pipeline on the selected regions to extract nuclei and compute shape, size, intensity, and texture features, store and index images and analysis results, and visualize and explore images and computed features. All the services are deployed as containers and the user-facing interfaces as web-based applications. The set of containers and web applications presented in this article is used in cancer research studies of morphologic characteristics of tumor tissues. The software is free and open source. Cancer Res; 77(21); e79-82. ©2017 AACR.
40
Baig F, Vo H, Kurc T, Saltz J, Wang F. SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing. Proceedings of the ... ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems: ACM GIS. ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2017; 2017:28. [PMID: 30035278 PMCID: PMC6054321]
Abstract
Much effort has been devoted to supporting high-performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent works have focused on extending spatial MapReduce frameworks to leverage high-performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage comes with the requirement of having enough memory and comprehensive configuration. Failing to meet these requirements causes a fallback to disk I/O, which defeats the purpose of such systems, or, in the worst case, exhausts memory and causes the job to fail. The problem is aggravated further for spatial processing since the underlying in-memory systems are oblivious of spatial data features and characteristics. In this paper we present SparkGIS, an in-memory oriented spatial data querying system for high-throughput and low-latency spatial query handling that adapts Apache Spark's distributed processing capabilities. It supports basic spatial queries including containment, spatial join and k-nearest neighbor, and allows extending these to complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multilevel global and local, pre-generated and on-demand in-memory indexes allow SparkGIS to prune input data and apply compute-intensive operations on a subset of relevant spatial objects only. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation has shown that the performance of SparkGIS is on par with contemporary Spark-based platforms for smaller queries and that it outperforms them for larger data and memory-intensive workflows through dynamic query rewriting and efficient spatial data management.
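Index-based pruning is what lets a spatial system touch only the objects that can possibly satisfy a query. A minimal single-machine Python sketch of the principle, assuming a hypothetical uniform-grid index and a box containment query; it is not SparkGIS's API, and SparkGIS's partitioning and multilevel indexes are considerably more elaborate:

```python
from collections import defaultdict
from math import floor
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]


class GridIndex:
    """Uniform-grid index: points are bucketed by cell, and a box query scans only the
    cells the box overlaps, pruning every other bucket without touching its points."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.buckets: DefaultDict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return floor(x / self.cell_size), floor(y / self.cell_size)

    def insert(self, p: Point) -> None:
        self.buckets[self._cell(*p)].append(p)

    def contained_in(self, xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
        """Containment query: all indexed points inside the closed box."""
        (cx0, cy0), (cx1, cy1) = self._cell(xmin, ymin), self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict
                for x, y in self.buckets.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:  # exact refinement step
                        hits.append((x, y))
        return hits


index = GridIndex(cell_size=10.0)
for p in [(1.0, 1.0), (2.0, 2.0), (15.0, 15.0), (50.0, 50.0)]:
    index.insert(p)
print(index.contained_in(0.0, 0.0, 20.0, 20.0))  # [(1.0, 1.0), (2.0, 2.0), (15.0, 15.0)]
```

The point at (50, 50) lives in a cell the query box never overlaps, so it is skipped without a coordinate comparison. In a distributed setting the same filter-then-refine split decides which partitions need to be loaded into memory at all.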
41
Barreiros W, Teodoro G, Kurc T, Kong J, Melo ACMA, Saltz J. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems. Proceedings. IEEE International Conference on Cluster Computing 2017; 2017:25-35. [PMID: 29081725 DOI: 10.1109/cluster.2017.28]
Abstract
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.
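Parallel efficiency figures such as "over 90% on 256 nodes" compare the achieved speedup with ideal linear scaling. A minimal Python sketch of the usual relative-efficiency formula, (n_base × t_base) / (n_par × t_par); the timings below are hypothetical, and the abstract does not state which baseline node count the reported efficiency is measured against:

```python
def parallel_efficiency(t_base: float, n_base: int, t_par: float, n_par: int) -> float:
    """Relative parallel efficiency (n_base * t_base) / (n_par * t_par).

    1.0 means perfectly linear scaling from n_base to n_par workers.
    """
    return (n_base * t_base) / (n_par * t_par)


# Hypothetical timings: 25,600 s on 1 node versus 110 s on 256 nodes.
print(round(parallel_efficiency(25_600.0, 1, 110.0, 256), 3))  # 0.909
```

Here the speedup is 25,600 / 110 ≈ 233× out of an ideal 256×, which is where the roughly 91% figure comes from.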
42
Saltz J, Almeida J, Gao Y, Sharma A, Bremer E, DiPrima T, Saltz M, Kalpathy-Cramer J, Kurc T. Towards Generation, Management, and Exploration of Combined Radiomics and Pathomics Datasets for Cancer Research. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2017; 2017:85-94. [PMID: 28815113 PMCID: PMC5543366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Cancer is a complex multifactorial disease state, and the ability to anticipate and steer treatment results will require information synthesis across multiple scales, from the host to the molecular level. Radiomics and Pathomics, in which image features are extracted from routine diagnostic Radiology and Pathology studies, are also evolving as valuable diagnostic and prognostic indicators in cancer. This information explosion provides new opportunities for integrated, multi-scale investigation of cancer, but also creates the need for systematic and integrated approaches to manage, query and mine combined Radiomics and Pathomics data. In this paper, we describe a suite of tools and web-based applications towards building a comprehensive framework to support the generation, management and interrogation of large volumes of Radiomics and Pathomics feature sets and the investigation of correlations between image features, molecular data, and clinical outcome.
43
Zhou N, Yu X, Zhao T, Wen S, Wang F, Zhu W, Kurc T, Tannenbaum A, Saltz J, Gao Y. Evaluation of nucleus segmentation in digital pathology images through large scale image synthesis. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2017; 10140. [PMID: 30344361 DOI: 10.1117/12.2254220] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
Digital histopathology images of more than one gigapixel are drawing increasing attention in clinical and biomedical research and in computer vision. Among the many observable features spanning multiple scales in pathology images, nuclear morphology is one of the central criteria for diagnosis and grading. As a result, it is also the most studied target in image computing. A large number of research papers have been devoted to the problem of extracting nuclei from digital pathology images, which is the foundation of any further correlation study. However, the validation and evaluation of nucleus extraction have yet to be formulated rigorously and systematically. Some studies report human-verified segmentations with thousands of nuclei, whereas a single whole slide image may contain up to a million. The main obstacle lies in the difficulty of obtaining such a large number of validated nuclei, which is essentially an impossible task for pathologists. We propose a systematic validation and evaluation approach based on large-scale image synthesis. This could facilitate more quantitatively validated studies in the current and future histopathology image analysis field.
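Illustrative sketch (assumptions throughout): the value of synthesis is that ground truth is known by construction. The Python below places hypothetical nucleus centers at random with a minimum spacing and scores a set of detections against them using greedy one-to-one matching within a distance tolerance, reporting precision, recall, and F1. It does not render images, and the matching rule and tolerance are assumptions rather than the paper's evaluation protocol.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def synthesize_centers(n: int, size: float, min_dist: float, seed: int = 0) -> List[Point]:
    """Place n centers uniformly in a size x size field, rejecting overlaps."""
    rng = random.Random(seed)
    centers: List[Point] = []
    attempts = 0
    while len(centers) < n:
        attempts += 1
        if attempts > 100 * n:
            raise RuntimeError("could not place all centers; lower n or min_dist")
        p = (rng.uniform(0, size), rng.uniform(0, size))
        if all(math.dist(p, q) >= min_dist for q in centers):
            centers.append(p)
    return centers


def evaluate(truth: List[Point], detected: List[Point], tol: float):
    """Greedily match closest truth/detection pairs within tol; return (precision, recall, f1)."""
    pairs = sorted(
        (math.dist(t, d), i, j)
        for i, t in enumerate(truth)
        for j, d in enumerate(detected)
        if math.dist(t, d) <= tol
    )
    used_t, used_d = set(), set()
    for _, i, j in pairs:  # closest pairs first, each point matched at most once
        if i not in used_t and j not in used_d:
            used_t.add(i)
            used_d.add(j)
    tp = len(used_t)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

The all-pairs matching is quadratic; at whole-slide scale (up to about a million nuclei, per the abstract) a spatial index or tiling would be needed.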
44
Bremer E, Kurc T, Gao Y, Saltz J, Almeida JS. Safe "cloudification" of large images through picker APIs. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:342-351. [PMID: 28269829 PMCID: PMC5333212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
The "Box model" allows users with no particular training in informatics, or access to specialized infrastructure, to operate generic cloud computing resources through a temporary URI dereferencing mechanism known as the "drop-file-picker API" ("picker API" for short). This application programming interface (API) was popularized in the web app development community by DropBox, and is now a consumer-facing feature of all major cloud computing platforms such as Box.com, Google Drive and Amazon S3. This report describes a prototype web service application that uses picker APIs to expose a new, "cloudified" API tailored for image analysis, without compromising the private governance of the data exposed. To better understand this cross-platform cloud computing landscape, we first measured the time for both transfer and traversal of large image files generated by whole slide imaging (WSI) in Digital Pathology. The verification that there is extensive interconnectivity between cloud resources led to the development of a prototype software application that exposes an image-traversing REST API to image files stored in any of the consumer-facing "boxes". In summary, an image file can be uploaded/synchronized into any cloud resource with a file picker API, and the prototype service described here will expose an HTTP REST API that remains within the safety of the user's own governance. The open source prototype is publicly available at sbu-bmi.github.io/imagebox. Availability: The accompanying prototype application is made publicly available, fully functional, with open source, at http://sbu-bmi.github.io/imagebox. An illustrative webcast of this Web App in use is included with the project codebase at https://github.com/SBU-BMI/imagebox.
45
Teodoro G, Kurc T, Andrade G, Kong J, Ferreira R, Saltz J. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis. THE INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 2017; 31:32-51. [PMID: 28239253 PMCID: PMC5319667 DOI: 10.1177/1094342015594519] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core, MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of the computing devices and with the data access patterns, computational complexities, and parallelization forms of the operations. The results show significant variability in the performance of operations with respect to the device used. Operations with regular data access perform comparably, or sometimes better, on a MIC than on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the MIC's lower bandwidth for random data accesses. We propose new performance-aware scheduling strategies that consider variability in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations.
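Illustrative sketch (a generic policy, not necessarily the paper's strategy): one way to make scheduling aware of per-operation speedups is to keep pending operations sorted by estimated accelerator speedup, let a free accelerator take the operation with the largest speedup, and let a free CPU take the one with the smallest. The Python below simulates that policy offline with a heap of worker availability times; the operation names and runtime estimates are hypothetical, and the "gpu" device kind could equally stand for a MIC.

```python
import heapq
from typing import Dict, List, Tuple


def speedup_aware_schedule(ops: Dict[str, Tuple[float, float]], n_cpu: int, n_gpu: int):
    """Simulate a speedup-sorted queue shared by CPU and accelerator workers.

    ops maps an operation name to estimated (cpu_time, gpu_time). A free GPU
    takes the pending op with the largest cpu_time / gpu_time ratio; a free CPU
    takes the op with the smallest. Returns (makespan, {op: device kind}).
    """
    pending: List[str] = sorted(ops, key=lambda k: ops[k][0] / ops[k][1])  # ascending speedup
    # Heap entries: (time the worker becomes free, device kind, worker id).
    workers = [(0.0, "gpu", i) for i in range(n_gpu)] + [(0.0, "cpu", i) for i in range(n_cpu)]
    heapq.heapify(workers)
    assignment: Dict[str, str] = {}
    makespan = 0.0
    while pending:
        free_at, kind, wid = heapq.heappop(workers)
        name = pending.pop() if kind == "gpu" else pending.pop(0)
        duration = ops[name][1] if kind == "gpu" else ops[name][0]
        end = free_at + duration
        assignment[name] = kind
        makespan = max(makespan, end)
        heapq.heappush(workers, (end, kind, wid))
    return makespan, assignment
```

For three operations with estimated (CPU, GPU) times of (10, 1), (4, 2), and (2, 2), one CPU and one GPU finish in 3 time units under this policy, with the two operations that benefit most from acceleration landing on the GPU.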
46
Baig F, Mehrotra M, Vo H, Wang F, Saltz J, Kurc T. SparkGIS: Efficient Comparison and Evaluation of Algorithm Results in Tissue Image Analysis Studies. BIOMEDICAL DATA MANAGEMENT AND GRAPH ONLINE QUERYING : VLDB 2015 WORKSHOPS, BIG-O(Q) AND DMAH, WAIKOLOA, HI, USA, AUGUST 31-SEPTEMBER 4, 2015, REVISED SELECTED PAPERS. INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES (41ST : 2015 : WAI... 2016; 9579:134-146. [PMID: 30198025 PMCID: PMC6126541 DOI: 10.1007/978-3-319-41576-5_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/13/2023]
Abstract
Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison of multiple results, and facilitate algorithm sensitivity studies. The sizes of images and analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present SparkGIS, a distributed, in-memory spatial data processing framework to query, retrieve, and compare large volumes of analytical image result data for algorithm evaluation. Our approach combines the in-memory distributed processing capabilities of Apache Spark and the efficient spatial query processing of Hadoop-GIS. The experimental evaluation of SparkGIS for heatmap computations used to compare nucleus segmentation results from multiple images and analysis runs shows that SparkGIS is efficient and scalable, enabling algorithm evaluation and algorithm sensitivity studies on large datasets.
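Illustrative sketch (assumed representation): the abstract describes comparing segmentation results through spatial queries; the Python below instead assumes both results have been rasterized to equal-size binary masks and computes a per-tile Jaccard index, giving a small heatmap of agreement between two runs. The tile size and the choice of Jaccard as the similarity measure are assumptions, and nothing here is distributed.

```python
from typing import List

Mask = List[List[int]]  # 1 = nucleus pixel, 0 = background


def jaccard_heatmap(a: Mask, b: Mask, tile: int) -> List[List[float]]:
    """Per-tile Jaccard index |A and B| / |A or B| between two equal-shape binary masks.

    Tiles where neither mask has foreground get 1.0, since the two results
    agree trivially there.
    """
    rows, cols = len(a), len(a[0])
    heat: List[List[float]] = []
    for r0 in range(0, rows, tile):
        row: List[float] = []
        for c0 in range(0, cols, tile):
            inter = union = 0
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    inter += a[r][c] & b[r][c]
                    union += a[r][c] | b[r][c]
            row.append(inter / union if union else 1.0)
        heat.append(row)
    return heat
```

Low-valued tiles point to regions where two algorithms or parameter settings disagree, which is where a reviewer would look first.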
47
Gao Y, Ratner V, Zhu L, Diprima T, Kurc T, Tannenbaum A, Saltz J. Hierarchical nucleus segmentation in digital pathology images. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2016; 9791. [PMID: 27375315 DOI: 10.1117/12.2217029] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
Extracting nuclei is one of the most actively studied topics in digital pathology research. Most studies search for the nuclei (or seeds for the nuclei) directly at the finest resolution available. While such approaches exploit the richest information, it is sometimes difficult for them to address the heterogeneity of nuclei in different tissues. In this work, we propose a hierarchical approach that starts from a lower resolution level and adaptively adjusts the parameters while progressing to finer and finer resolutions. The algorithm is tested on brain and lung cancer images from The Cancer Genome Atlas data set.
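Illustrative sketch (not the paper's algorithm): a coarse-to-fine scheme can be shown with a mean-pooled image pyramid. The Python below thresholds the coarsest level at its global mean, then at each finer level keeps only pixels whose parent was foreground and re-estimates the threshold as the midpoint between the candidate and non-candidate mean intensities. The pooling, the bright-foreground convention, and this adaptation rule are all assumptions chosen for brevity.

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image) -> Image:
    """2x2 mean pooling (assumes even dimensions)."""
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def coarse_to_fine(img: Image, levels: int) -> List[List[bool]]:
    """Segment bright objects coarse-to-fine (dimensions divisible by 2**(levels-1))."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    coarse = pyramid[-1]
    flat = [v for row in coarse for v in row]
    t = sum(flat) / len(flat)  # initial threshold: global mean at the coarsest level
    mask = [[v > t for v in row] for row in coarse]
    for cur in reversed(pyramid[:-1]):  # progressively finer levels
        h, w = len(cur), len(cur[0])
        # Only pixels whose parent was foreground one level up are candidates.
        cand = [[mask[r // 2][c // 2] for c in range(w)] for r in range(h)]
        fg = [cur[r][c] for r in range(h) for c in range(w) if cand[r][c]]
        bg = [cur[r][c] for r in range(h) for c in range(w) if not cand[r][c]]
        if fg and bg:
            # Adapt the threshold: midpoint of candidate and non-candidate means.
            t = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2
        mask = [[cand[r][c] and cur[r][c] > t for c in range(w)] for r in range(h)]
    return mask
```

A small, isolated bright spot can be averaged away at the coarse level and is then never revisited; this pruning saves work at fine resolutions but means objects missed at a coarse level cannot be recovered later.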
48
Gao Y, Liu W, Arjun S, Zhu L, Ratner V, Kurc T, Saltz J, Tannenbaum A. Multi-scale learning based segmentation of glands in digital colonrectal pathology images. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2016; 9791:97910M. [PMID: 27818565 PMCID: PMC5091801 DOI: 10.1117/12.2216790] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 04/12/2023]
Abstract
Digital histopathological images provide detailed spatial information about tissue at micrometer resolution. Among the available contents of pathology images, meso-scale information, such as gland morphology, texture, and distribution, provides useful diagnostic features. In this work, focusing on colon-rectal cancer tissue samples, we propose a multi-scale learning based segmentation scheme for the glands in colon-rectal digital pathology slides. The algorithm learns the gland and non-gland textures from a set of training images at various scales through a sparse dictionary representation. After the learning step, the dictionaries are used collectively to perform classification and segmentation of new images.
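Illustrative sketch (simplified, assumptions labeled): the classification step can be pictured as sparse-representation classification, where a patch is assigned to the class whose dictionaries reconstruct it with the smallest residual, summed over scales. To stay self-contained, the Python below uses a sparsity level of one (the best single unit-norm atom) instead of a general sparse solver, and it takes the per-scale feature vectors and dictionaries as given rather than learning them; the dictionary layout and the label names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def one_sparse_residual(x: Vector, atoms: List[List[float]]) -> float:
    """Residual norm of the best 1-atom approximation of x (atoms must be unit-norm)."""
    best = float("inf")
    for d in atoms:
        coef = sum(a * b for a, b in zip(x, d))  # optimal coefficient for a unit atom
        r = math.sqrt(sum((a - coef * b) ** 2 for a, b in zip(x, d)))
        best = min(best, r)
    return best


def classify(features: Dict[int, Vector],
             dictionaries: Dict[int, Dict[str, List[List[float]]]]) -> str:
    """Pick the label whose dictionaries reconstruct the patch best, summed over scales.

    features[scale] is the patch's feature vector at that scale;
    dictionaries[scale][label] is a list of unit-norm atoms for that class.
    """
    labels = next(iter(dictionaries.values())).keys()
    scores = {
        label: sum(one_sparse_residual(features[s], dictionaries[s][label]) for s in dictionaries)
        for label in labels
    }
    return min(scores, key=scores.get)
```

Summing residuals across scales is one simple way to use the dictionaries "collectively"; weighting scales or using a higher sparsity level are natural variations.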
49
Kurc T, Qi X, Wang D, Wang F, Teodoro G, Cooper L, Nalisnik M, Yang L, Saltz J, Foran DJ. Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies. BMC Bioinformatics 2015; 16:399. [PMID: 26627175 PMCID: PMC4667532 DOI: 10.1186/s12859-015-0831-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/25/2015] [Accepted: 11/16/2015] [Indexed: 11/16/2022]
Abstract
Background: We describe a suite of tools and methods that form a core set of capabilities for researchers and clinical investigators to evaluate multiple analytical pipelines and quantify sensitivity and variability of the results while conducting large-scale studies in investigative pathology and oncology. The overarching objective of the current investigation is to address the challenges of large data sizes and high computational demands. Results: The proposed tools and methods take advantage of state-of-the-art parallel machines and efficient content-based image searching strategies. The content-based image retrieval (CBIR) algorithms can quickly detect and retrieve image patches similar to a query patch using a hierarchical analysis approach. The analysis component, based on high performance computing, can carry out consensus clustering on 500,000 data points using a large shared memory system. Conclusions: Our work demonstrates that efficient CBIR algorithms and high performance computing can be leveraged for efficient analysis of large microscopy images to meet the challenges of clinically salient applications in pathology. These technologies enable researchers and clinical investigators to make more effective use of the rich informational content contained within digitized microscopy specimens.
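Illustrative sketch (not the paper's HPC implementation): a common building block of consensus clustering is the co-association matrix, whose entry (i, j) is the fraction of input clusterings that put points i and j in the same cluster. The Python below builds it and derives a consensus partition by linking pairs above a hypothetical threshold and taking connected components; other consensus functions exist. Stored densely as 32-bit floats, this matrix for 500,000 points would take about 1 TB, which illustrates why memory capacity matters at that scale.

```python
from typing import Dict, List


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """Fraction of clusterings in which points i and j share a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    k = len(labelings)
    return [[c / k for c in row] for row in counts]


def consensus_clusters(labelings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Link points whose co-association exceeds threshold; return component ids."""
    m = co_association(labelings)
    n = len(m)
    parent = list(range(n))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For example, the three labelings [0, 0, 1, 1, 2], [1, 1, 0, 0, 0], and [0, 0, 1, 2, 2] yield the consensus [0, 0, 1, 1, 1]: points 2, 3, and 4 are joined because 2 and 3, and 3 and 4, co-occur in two of the three runs.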
50
Almeida JS, Hajagos J, Crnosija I, Kurc T, Saltz M, Saltz J. OpenHealth Platform for Interactive Contextualization of Population Health Open Data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:297-305. [PMID: 26958160 PMCID: PMC4765591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The financial incentives for data science applications leading to improved health outcomes, such as DSRIP (bit.ly/dsrip), are well aligned with the broad adoption of Open Data by State and Federal agencies. This creates entirely novel opportunities for analytical applications that make exclusive use of the pervasive Web Computing platform. The framework described here explores this new avenue to contextualize Health data in a manner that relies exclusively on the native JavaScript interpreter and data processing resources of the ubiquitous Web Browser. The OpenHealth platform is publicly available, hosted with version control and open source, at https://github.com/mathbiol/openHealth. The different data/analytics workflow architectures explored are accompanied by live applications ranging from DSRIP, such as the Hospital Inpatient Prevention Quality Indicators at http://bit.ly/pqiSuffolk, to The Cancer Genome Atlas (TCGA), as illustrated by http://bit.ly/tcgascopeGBM.