1. Jabłoński B, Puig Sitjes A, Makowski D, Jakubowski M, Gao Y, Fischer S, Winter A. Implementation and performance evaluation of the real-time algorithms for Wendelstein 7-X divertor protection system for OP2.1. FUSION ENGINEERING AND DESIGN 2023. [DOI: 10.1016/j.fusengdes.2023.113524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
2. Barreiros W, Moreira J, Kurc T, Kong J, Melo AC, Saltz JH, Teodoro G. Optimizing parameter sensitivity analysis of large-scale microscopy image analysis workflows with multilevel computation reuse. CONCURRENCY AND COMPUTATION : PRACTICE & EXPERIENCE 2020; 32:e5403. [PMID: 32669980 PMCID: PMC7363336 DOI: 10.1002/cpe.5403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2018] [Accepted: 05/18/2019] [Indexed: 06/11/2023]
Abstract
Parameter sensitivity analysis (SA) is an effective tool to gain knowledge about complex analysis applications and assess the variability in their analysis results. However, it is an expensive process as it requires the execution of the target application multiple times with a large number of different input parameter values. In this work, we propose optimizations to reduce the overall computation cost of SA in the context of analysis applications that segment high-resolution slide tissue images, i.e., images with resolutions of 100k × 100k pixels. Two cost-cutting techniques are combined to efficiently execute SA: use of distributed hybrid systems for parallel execution and computation reuse at multiple levels of an analysis pipeline to reduce the amount of computation. These techniques were evaluated using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. Our parallel execution method attained an efficiency of over 90% on 256 nodes. The hybrid execution on the CPU and Intel Phi improved the performance by 2×. Multilevel computation reuse led to performance gains of over 2.9×.
Affiliation(s)
- Willian Barreiros
  - Department of Computer Science, University of Brasília, Brasília, Brazil
- Jeremias Moreira
  - Department of Computer Science, University of Brasília, Brasília, Brazil
- Tahsin Kurc
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Jun Kong
  - Department of Biomedical Informatics, Emory University, Atlanta, Georgia
  - Department of Computer Science, Emory University, Atlanta, Georgia
  - Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia
- Alba C.M.A. Melo
  - Department of Computer Science, University of Brasília, Brasília, Brazil
- Joel H. Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
- George Teodoro
  - Department of Computer Science, University of Brasília, Brasília, Brazil
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
3. Gomes J, de Melo ACMA, Kong J, Kurc T, Saltz JH, Teodoro G. Cooperative and out-of-core execution of the irregular wavefront propagation pattern on hybrid machines with Intel® Xeon Phi™. CONCURRENCY AND COMPUTATION : PRACTICE & EXPERIENCE 2018; 30:e4425. [PMID: 30344454 PMCID: PMC6195363 DOI: 10.1002/cpe.4425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The Irregular Wavefront Propagation Pattern (IWPP) is a core computing structure in several image analysis operations. Efficient implementation of IWPP on the Intel Xeon Phi is difficult because of the irregular data access and computation characteristics. The traditional IWPP algorithm relies on atomic instructions, which are not available in the SIMD set of the Intel Phi. To overcome this limitation, we have proposed a new IWPP algorithm that can take advantage of non-atomic SIMD instructions supported on the Intel Xeon Phi. We have also developed and evaluated methods to use CPU and Intel Phi cooperatively for parallel execution of the IWPP algorithms. Our new cooperative IWPP version is also able to handle large out-of-core images that would not fit into the memory of the accelerator. The new IWPP algorithm is used to implement the Morphological Reconstruction and Fill Holes operations, which are commonly found in image analysis applications. The vectorization implemented with the new IWPP has attained improvements of up to about 5× on top of the original IWPP and significant gains as compared to state-of-the-art CPU and GPU versions. The new version running on an Intel Phi is 6.21× and 3.14× faster than running on a 16-core CPU and on a GPU, respectively. Finally, the cooperative execution using two Intel Phi devices and a multi-core CPU has reached performance gains of 2.14× as compared to the execution using a single Intel Xeon Phi.
Affiliation(s)
- Jeremias Gomes
  - Department of Computer Science, University of Brasília, Brasília-DF, Brazil
- Jun Kong
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Tahsin Kurc
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Joel H. Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- George Teodoro
  - Department of Computer Science, University of Brasília, Brasília-DF, Brazil
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
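As a rough illustration of the wavefront propagation idea behind the Gomes et al. abstract above, the following is a minimal sequential sketch of morphological reconstruction (by dilation, 4-connectivity) driven by a work queue. It is not the authors' SIMD or cooperative algorithm; the function name `morph_reconstruct` and the list-of-lists image layout are assumptions made for this example. The update inside the inner loop is the step that a parallel version would have to protect against races, which the abstract says the traditional algorithm handles with atomic instructions.

```python
from collections import deque


def morph_reconstruct(marker, mask):
    """Sequential sketch: grayscale reconstruction by dilation (4-neighbors).

    `marker` and `mask` are equally sized lists of lists of numbers.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(mask), len(mask[0])
    # The reconstructed image can never exceed the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Simplest possible initial wavefront: every pixel.
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagate the current value, clamped by the mask.
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    # In a parallel version this write is the racy update.
                    out[nr][nc] = candidate
                    queue.append((nr, nc))  # neighbor joins the wavefront
    return out


if __name__ == "__main__":
    mask = [[5, 5, 0, 5],
            [5, 5, 0, 5]]
    marker = [[5, 0, 0, 0],
              [0, 0, 0, 0]]
    print(morph_reconstruct(marker, mask))
```

Values only ever increase and are bounded by the mask, so the loop terminates; the irregularity comes from the fact that which pixels re-enter the queue depends on the image contents.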
4. Teodoro G, Kurc T, Andrade G, Kong J, Ferreira R, Saltz J. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis. THE INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 2017; 31:32-51. [PMID: 28239253 PMCID: PMC5319667 DOI: 10.1177/1094342015594519] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, computation complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. The performance of operations with regular data access on a MIC is comparable to, and sometimes better than, that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations.
Affiliation(s)
- George Teodoro
  - Department of Computer Science, University of Brasília, Brasília, DF, Brazil
- Tahsin Kurc
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Guilherme Andrade
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
- Jun Kong
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Renato Ferreira
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
5. Kurc T, Qi X, Wang D, Wang F, Teodoro G, Cooper L, Nalisnik M, Yang L, Saltz J, Foran DJ. Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies. BMC Bioinformatics 2015; 16:399. [PMID: 26627175 PMCID: PMC4667532 DOI: 10.1186/s12859-015-0831-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/25/2015] [Accepted: 11/16/2015] [Indexed: 11/16/2022]
Abstract
BACKGROUND: We describe a suite of tools and methods that form a core set of capabilities for researchers and clinical investigators to evaluate multiple analytical pipelines and quantify sensitivity and variability of the results while conducting large-scale studies in investigative pathology and oncology. The overarching objective of the current investigation is to address the challenges of large data sizes and high computational demands. RESULTS: The proposed tools and methods take advantage of state-of-the-art parallel machines and efficient content-based image searching strategies. The content-based image retrieval (CBIR) algorithms can quickly detect and retrieve image patches similar to a query patch using a hierarchical analysis approach. The analysis component based on high performance computing can carry out consensus clustering on 500,000 data points using a large shared memory system. CONCLUSIONS: Our work demonstrates that efficient CBIR algorithms and high performance computing can be leveraged for efficient analysis of large microscopy images to meet the challenges of clinically salient applications in pathology. These technologies enable researchers and clinical investigators to make more effective use of the rich informational content contained within digitized microscopy specimens.
Affiliation(s)
- Tahsin Kurc
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, USA.
- Xin Qi
  - Department of Pathology & Laboratory Medicine, Rutgers -- Robert Wood Johnson Medical School, New Brunswick, USA.
  - Rutgers Cancer Institute of New Jersey, New Brunswick, USA.
- Daihou Wang
  - Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, USA.
- Fusheng Wang
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, USA.
  - Department of Computer Science, Stony Brook University, Stony Brook, USA.
- George Teodoro
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, USA.
  - Department of Computer Science, University of Brasilia, Brasília, Brazil.
- Lee Cooper
  - Department of Biomedical Informatics, Emory University, Atlanta, USA.
- Michael Nalisnik
  - Department of Biomedical Informatics, Emory University, Atlanta, USA.
- Lin Yang
  - Department of Biomedical Engineering, University of Florida, Gainesville, USA.
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, USA.
- David J Foran
  - Department of Pathology & Laboratory Medicine, Rutgers -- Robert Wood Johnson Medical School, New Brunswick, USA.
  - Rutgers Cancer Institute of New Jersey, New Brunswick, USA.
6. Gomes JM, Teodoro G, de Melo A, Kong J, Kurc T, Saltz JH. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™. PROCEEDINGS. SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING 2015; 2015:25-32. [PMID: 27298591 DOI: 10.1109/sbac-pad.2015.13] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations.
Affiliation(s)
- Jun Kong
  - Emory University, Atlanta, GA, USA
7. Teodoro G, Pan T, Kurc T, Kong J, Cooper L, Klasky S, Saltz J. Region Templates: Data Representation and Management for High-Throughput Image Analysis. PARALLEL COMPUTING 2014; 40:589-610. [PMID: 26139953 PMCID: PMC4484879 DOI: 10.1016/j.parco.2014.09.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
We introduce a region template abstraction and framework for the efficient storage, management and processing of common data types in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling for maximizing the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high speed disk based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13×. Finally, a processing rate of 11,730 4K×4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1,200 CPU cores). This computation rate enables studies with very large datasets.
Affiliation(s)
- George Teodoro
  - Department of Computer Science, University of Brasília, Brasília, DF, Brazil
- Tony Pan
  - Biomedical Informatics Department, Emory University, Atlanta, GA, USA
- Tahsin Kurc
  - Biomedical Informatics Department, Stony Brook University, Stony Brook, NY, USA
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jun Kong
  - Biomedical Informatics Department, Emory University, Atlanta, GA, USA
- Lee Cooper
  - Biomedical Informatics Department, Emory University, Atlanta, GA, USA
- Scott Klasky
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Joel Saltz
  - Biomedical Informatics Department, Stony Brook University, Stony Brook, NY, USA
8. Andrade G, Ferreira R, Teodoro G, Rocha L, Saltz JH, Kurc T. Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems. PROCEEDINGS. SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING 2014; 2014:89-96. [PMID: 26640423 PMCID: PMC4670037 DOI: 10.1109/sbac-pad.2014.15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
9. Teodoro G, Kurc T, Kong J, Cooper L, Saltz J. Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS : A PUBLICATION OF THE IEEE COMPUTER SOCIETY 2014; 2014:1063-1072. [PMID: 25419088 PMCID: PMC4240026 DOI: 10.1109/ipdps.2014.111] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and characterization of objects based on these features (object classification). In this work, we identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable to, or sometimes better than, that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This is a result of the low performance of MICs when it comes to random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware task strategy for scheduling application operations improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems: the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs).
Affiliation(s)
- George Teodoro
  - Department of Computer Science, University of Brasília, Brasília, DF, Brazil
- Tahsin Kurc
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jun Kong
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Lee Cooper
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Joel Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
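To make the contrast between first-come-first-served and performance-aware scheduling in the Teodoro et al. abstract above more concrete, here is a toy simulation with one CPU and one accelerator. It is only a sketch under stated assumptions: the `simulate` function, the policy names, and the task list are hypothetical, and the rule used here (the accelerator takes the pending task with the highest estimated speedup, the CPU the one with the lowest) is a simplified reading of "performance aware", not the authors' scheduler. The resulting numbers say nothing about the 1.29× figure reported in the abstract.

```python
def simulate(tasks, policy):
    """Return the makespan for one CPU plus one accelerator.

    `tasks` is a list of (cpu_time, accel_speedup) pairs; all values
    are hypothetical estimates supplied by the caller.
    `policy` is "fcfs" or "performance_aware".
    """
    pending = list(tasks)
    free_at = {"cpu": 0.0, "accel": 0.0}
    while pending:
        # The device that becomes idle first asks for work (ties: CPU).
        device = min(free_at, key=free_at.get)
        if policy == "fcfs":
            idx = 0  # take whatever is at the head of the queue
        elif device == "accel":
            # Accelerator prefers the task it speeds up the most.
            idx = max(range(len(pending)), key=lambda i: pending[i][1])
        else:
            # CPU prefers the task the accelerator helps the least.
            idx = min(range(len(pending)), key=lambda i: pending[i][1])
        cpu_time, speedup = pending.pop(idx)
        duration = cpu_time / speedup if device == "accel" else cpu_time
        free_at[device] += duration
    return max(free_at.values())


if __name__ == "__main__":
    # Two tasks that accelerate well (8x) and two that do not (1x).
    demo = [(8, 8), (8, 1), (8, 8), (8, 1)]
    print("fcfs:", simulate(demo, "fcfs"))
    print("performance-aware:", simulate(demo, "performance_aware"))
```

With this made-up workload the first-come-first-served order wastes the accelerator on tasks it cannot speed up, which is the effect the abstract attributes to variability in per-operation speedups.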
10. Kong J, Wang F, Teodoro G, Cooper L, Moreno CS, Kurc T, Pan T, Saltz J, Brat D. High-Performance Computational Analysis of Glioblastoma Pathology Images with Database Support Identifies Molecular and Survival Correlates. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2013:229-236. [PMID: 25098236 DOI: 10.1109/bibm.2013.6732495] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
In this paper, we present a novel framework for microscopic image analysis of nuclei, data management, and high performance computation to support translational research involving nuclear morphometry features, molecular data, and clinical outcomes. Our image analysis pipeline consists of nuclei segmentation and feature computation facilitated by high performance computing with coordinated execution in multi-core CPUs and Graphical Processor Units (GPUs). All data derived from image analysis are managed in a spatial relational database supporting highly efficient scientific queries. We applied our image analysis workflow to 159 glioblastomas (GBM) from The Cancer Genome Atlas dataset. With integrative studies, we found statistics of four specific nuclear features were significantly associated with patient survival. Additionally, we correlated nuclear features with molecular data and found interesting results that support pathologic domain knowledge. We found that Proneural subtype GBMs had the smallest mean of nuclear Eccentricity and the largest means of nuclear Extent and MinorAxisLength. We also found gene expressions of stem cell marker MYC and cell proliferation marker MKI67 were correlated with nuclear features. To complement and inform pathologists of relevant diagnostic features, we queried the most representative nuclear instances from each patient population based on genetic and transcriptional classes. Our results demonstrate that specific nuclear features carry prognostic significance and associations with transcriptional and genetic classes, highlighting the potential of high throughput pathology image analysis as a complementary approach to human-based review and translational research.
Affiliation(s)
- Jun Kong
  - Department of Biomedical Informatics, Emory University
- Fusheng Wang
  - Department of Biomedical Informatics, Emory University
- George Teodoro
  - Department of Biomedical Informatics, Emory University
  - College of Computing, Georgia Institute of Technology
- Lee Cooper
  - Department of Biomedical Informatics, Emory University
- Carlos S Moreno
  - Department of Pathology and Laboratory Medicine, Emory University
- Tahsin Kurc
  - Department of Biomedical Informatics, Emory University
- Tony Pan
  - Department of Biomedical Informatics, Emory University
- Joel Saltz
  - Department of Biomedical Informatics, Emory University
- Daniel Brat
  - Department of Pathology and Laboratory Medicine, Emory University
11. Teodoro G, Pan T, Kurc TM, Kong J, Cooper LAD, Podhorszki N, Klasky S, Saltz JH. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms. PROCEEDINGS. IPDPS (CONFERENCE) 2013; 2013:103-114. [PMID: 25419546 PMCID: PMC4240318 DOI: 10.1109/ipdps.2013.11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.
Affiliation(s)
- George Teodoro
  - Center for Comprehensive Informatics, Emory University, Atlanta, GA
- Tony Pan
  - Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN