51. A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark. Sensors 2021; 21:365. PMID: 33430375; PMCID: PMC7827788; DOI: 10.3390/s21020365.
Abstract
Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
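As a concrete illustration of the per-tile focal step, here is a minimal NumPy sketch of Horn's slope kernel (the method ArcGIS's Slope tool also uses, which is why the accuracies are comparable) applied to one tile padded with a one-cell halo from its neighbors, the border data a dynamic calculation window must supply at tile edges. The tile, cell size, and halo handling below are illustrative assumptions, not the paper's Spark code.

```python
import numpy as np

def horn_slope(tile_with_halo: np.ndarray, cellsize: float) -> np.ndarray:
    """Slope (degrees) for the interior of a tile padded by a 1-cell halo.

    Uses Horn's third-order finite difference over the 3x3 neighborhood
    a b c / d e f / g h i around each interior cell.
    """
    z = tile_with_halo
    a, b, c = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    d,    f = z[1:-1, :-2],              z[1:-1, 2:]
    g, h, i = z[2:, :-2],  z[2:, 1:-1],  z[2:, 2:]
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cellsize)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

# Toy 6x6 tile already padded with halo rows/columns from adjacent tiles.
dem_tile = np.arange(36, dtype=float).reshape(6, 6)
print(horn_slope(dem_tile, cellsize=30.0))  # 4x4 interior slopes
```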
52. Williams-Young DB, de Jong WA, van Dam HJJ, Yang C. On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters. Front Chem 2020; 8:581058. PMID: 33363105; PMCID: PMC7758429; DOI: 10.3389/fchem.2020.581058.
Abstract
The predominance of Kohn–Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high-performance computing (HPC). With recent trends in HPC leading toward increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn–Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of the use of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem.
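As a rough illustration of why batched kernels pay off, the sketch below contrasts a Python-level loop over many small matrix products with a single batched call; on a GPU the batched form maps to one kernel launch (as in a batched level-3 BLAS routine) rather than thousands. This is a generic NumPy analogy, not NWChemEx code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 4096, 8                       # many small quadrature-sized blocks
A = rng.standard_normal((batch, n, n))
B = rng.standard_normal((batch, n, n))

# Loop form: one tiny GEMM per block, dominated by dispatch overhead.
C_loop = np.stack([A[k] @ B[k] for k in range(batch)])

# Batched form: a single call over the whole batch, as a batched GEMM runs.
C_batched = A @ B                        # matmul broadcasts over the batch axis

assert np.allclose(C_loop, C_batched)
```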
53. Wang T, Ma Y, Zhao L, Jiang J. Portably parallel construction of a configuration-interaction wave function from a matrix-product state using the Charm++ framework. J Comput Chem 2020; 41:2707-2721. PMID: 32986283; DOI: 10.1002/jcc.26424.
Abstract
The construction of configuration-interaction (CI) expansions from a matrix product state (MPS) involves numerous matrix operations and the skillful sampling of important configurations in a large Hilbert space. In this work, we present an efficient procedure for constructing CI expansions from an MPS employing the parallel object-oriented Charm++ programming framework, upon which automatic load-balancing and object-migrating facilities can be employed. This procedure was applied to the MPS-to-CI utility (Moritz et al., J. Chem. Phys. 2007, 126, 224109), the sampling-reconstructed complete active-space algorithm (SR-CAS, Boguslawski et al., J. Chem. Phys. 2011, 134, 224101), and the entanglement-driven genetic algorithm (EDGA, Luo et al., J. Chem. Theory Comput. 2017, 13, 4699). It enhances productivity and allows the sampling programs to evolve into their population-expansion versions, for example, EDGA with population expansion (PE-EDGA). Examples of 1,2-dioxetanone and firefly dioxetanone anion (FDO−) molecules demonstrated the following: (a) parallel efficiencies can be persistently improved simply by increasing the proportion of asynchronous executions and (b) a sampled CAS-type CI wave function of a bi-radical-state FDO− molecule with the full valence (30e,26o) active space can be constructed within a few hours using thousands of cores.
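The core matrix operation behind such a construction is compact: for an MPS with site tensors A^(n_i), the CI coefficient of a configuration (n_1, ..., n_L) is the scalar left after multiplying the matrices selected by each site's occupation. A minimal NumPy sketch, with random tensors standing in for an optimized MPS:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
L, d, D = 6, 2, 4   # sites, local (occupation) dimension, bond dimension

# Site tensor mps[i][n] is a matrix; boundary tensors have bond dimension 1.
dims = [1] + [D] * (L - 1) + [1]
mps = [rng.standard_normal((d, dims[i], dims[i + 1])) for i in range(L)]

def ci_coefficient(config):
    """CI coefficient <n_1 ... n_L | MPS>: a chain of matrix products."""
    mats = [mps[i][n] for i, n in enumerate(config)]
    return reduce(np.matmul, mats)[0, 0]

print(ci_coefficient((1, 0, 1, 1, 0, 0)))
```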
54. Moreno Escobar JJ, Morales Matamoros O, Tejeida Padilla R, Chanona Hernández L, Posadas Durán JPF, Pérez Martínez AK, Lina Reyes I, Quintana Espinosa H. Biomedical Signal Acquisition Using Sensors under the Paradigm of Parallel Computing. Sensors 2020; 20:6991. PMID: 33297388; PMCID: PMC7730710; DOI: 10.3390/s20236991.
Abstract
There are several pathologies that attack the central nervous system, and diverse therapies for each specific disease. These therapies seek, as far as possible, to minimize or offset the consequences of these pathologies and disorders for the patient. Comprehensive neurological care is therefore delivered through neurorehabilitation therapies, which aim to improve patients' quality of life and facilitate their performance in society. One way to assess how neurorehabilitation therapies help patients is to measure changes in their brain activity by means of electroencephalography (EEG). EEG data-processing applications used in neuroscience research are known to be highly computing- and data-intensive. Our proposal is an integrated system of electroencephalographic, electrocardiographic, bioacoustic, and digital image acquisition and analysis that provides neuroscience experts with tools to estimate the efficiency of a great variety of therapies. The three main axes of this proposal are: parallel or distributed capture; filtering and adaptation of biomedical signals; and synchronization in real epochs of sampling. The proposal thus describes a general system whose main objective is to serve as a wireless benchmark in the field. In this way, the system can acquire biomedical signals and provide analysis tools for measuring brain interactions when the patient is stimulated by an external system during therapy, for example. The system also supports extreme environmental conditions when necessary, which broadens the spectrum of its applications. In addition, sensors can be added or removed depending on the needs of the research, generating a wide range of configurations limited by the number of CPU cores: the more biosensors, the more CPU cores are required. To validate the proposed integrated system, it was used in a Dolphin-Assisted Therapy with patients with Infantile Cerebral Palsy and Obsessive–Compulsive Disorder, as well as with a neurotypical participant. Event synchronization of sample periods helped isolate the same therapy stimulus and allowed it to be analyzed with tools such as the power spectrum and fractal geometry.
55. Tang M, Yu Y, Mahmood AR, Malluhi QM, Ouzzani M, Aref WG. LocationSpark: In-memory Distributed Spatial Query Processing and Optimization. Front Big Data 2020; 3:30. PMID: 33693403; PMCID: PMC7931877; DOI: 10.3389/fdata.2020.00030.
Abstract
Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling query skew, which commonly happens in practice, and for minimizing communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler utilizes new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.
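A toy version of the bitmap-filter idea: cover the space with a coarse grid, give each worker a bitmask of the cells its partition touches, and forward a query only to workers whose masks intersect the query's cells. The grid size and routing policy below are illustrative choices, not LocationSpark's internals.

```python
GRID = 16  # 16x16 coarse grid over the unit square

def cells(xmin, ymin, xmax, ymax):
    """Bitmask of grid cells overlapped by a bounding box."""
    mask = 0
    for gy in range(int(ymin * GRID), min(int(ymax * GRID) + 1, GRID)):
        for gx in range(int(xmin * GRID), min(int(xmax * GRID) + 1, GRID)):
            mask |= 1 << (gy * GRID + gx)
    return mask

# Each worker registers the footprint of its local spatial partition.
workers = {"w0": cells(0.0, 0.0, 0.5, 0.5), "w1": cells(0.5, 0.0, 1.0, 1.0)}

def route(query_box):
    qmask = cells(*query_box)
    return [w for w, m in workers.items() if m & qmask]

print(route((0.4, 0.1, 0.6, 0.2)))  # straddles both partitions -> ['w0', 'w1']
```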
56. Meyerov I, Kozinov E, Liniov A, Volokitin V, Yusipov I, Ivanchenko M, Denisov S. Transforming Lindblad Equations into Systems of Real-Valued Linear Equations: Performance Optimization and Parallelization of an Algorithm. Entropy 2020; 22:1133. PMID: 33286901; PMCID: PMC7597275; DOI: 10.3390/e22101133.
Abstract
With their constantly increasing peak performance and memory capacity, modern supercomputers offer new perspectives on numerical studies of open many-body quantum systems. These systems are often modeled by using Markovian quantum master equations describing the evolution of the system density operators. In this paper, we address master equations of the Lindblad form, which are popular theoretical tools in quantum optics, cavity quantum electrodynamics, and optomechanics. By using the generalized Gell-Mann matrices as a basis, any Lindblad equation can be transformed into a system of ordinary differential equations with real coefficients. Recently, we presented an implementation of the transformation with computational complexity scaling as O(N^5 log N) for dense Lindbladians and O(N^3 log N) for sparse ones. However, infeasible memory costs remain a serious obstacle on the way to large models. Here, we present a parallel cluster-based implementation of the algorithm and demonstrate that it allows us to integrate a sparse Lindbladian model of dimension N = 2000 and a dense random Lindbladian model of dimension N = 200 by using 25 nodes with 64 GB RAM per node.
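For reference, the transformation the authors describe starts from the Lindblad master equation together with an expansion of the density operator in the Hermitian, traceless generalized Gell-Mann basis; the notation below is ours, a sketch of the standard algebra rather than a reproduction of the paper's derivation:

```latex
\frac{d\rho}{dt} = -\,i\,[H,\rho]
  + \sum_{k} \gamma_k \left( L_k \rho L_k^{\dagger}
  - \tfrac{1}{2}\left\{ L_k^{\dagger} L_k,\, \rho \right\} \right),
\qquad
\rho(t) = \frac{1}{N}\,\mathbb{1} + \sum_{i=1}^{N^{2}-1} v_i(t)\, G_i .
```

Because the right-hand side is linear in ρ and the basis matrices G_i are Hermitian, projecting the equation onto each G_i (via v_i = Tr(G_i ρ)) yields a linear system with purely real coefficients, dv/dt = M v + b, which is the form the parallel solver integrates.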
57. Chen K, Xie K, Wen C, Tang XG. Weak Signal Enhance Based on the Neural Network Assisted Empirical Mode Decomposition. Sensors 2020; 20:3373. PMID: 32549237; PMCID: PMC7348951; DOI: 10.3390/s20123373.
Abstract
In order to enhance weak signals against a strong noise background, a weak signal enhancement method based on EMDNN (neural-network-assisted empirical mode decomposition) is proposed. This method combines CEEMD (complementary ensemble empirical mode decomposition), GANs (generative adversarial networks), and LSTM (long short-term memory) networks. It improves the efficiency of selecting effective intrinsic mode components in empirical mode decomposition, thereby improving the signal-to-noise ratio (SNR), and it can also reconstruct and enhance weak signals. The experimental results show that the SNR of this method is improved from 4.1 to 6.2 and that the weak signal is clearly recovered.
58. Tahmasebi N, Boulanger P, Yun J, Fallone G, Noga M, Punithakumar K. Real-Time Lung Tumor Tracking Using a CUDA Enabled Nonrigid Registration Algorithm for MRI. IEEE J Transl Eng Health Med 2020; 8:4300308. PMID: 32411543; PMCID: PMC7217296; DOI: 10.1109/jtehm.2020.2989124.
Abstract
Objective: This study intends to develop an accurate, real-time tumor tracking algorithm for automated radiation therapy in cancer treatment using graphics processing unit (GPU) computing. Although a previous moving-mesh-based tumor tracking approach has been shown to be successful in delineating tumor regions from a sequence of magnetic resonance images, the algorithm is computationally intensive, and its computation time on standard central processing unit (CPU) processors is too long for clinical use, especially in an automated radiation therapy system. Method: A re-implementation of the algorithm on a low-cost parallel GPU-based computing platform is utilized to accelerate this computation to a speed amenable to clinical use. Several components of the registration algorithm, such as the computation of the similarity metric, are inherently parallel, which fits well with the GPU's parallel processing capabilities. Numerically solving a partial differential equation to generate the mesh deformation is one of the computationally intensive components; it has been accelerated by utilizing the much faster shared memory on the GPU. Results: Implemented on an NVIDIA Tesla K40c GPU, the proposed approach yielded a computational acceleration of more than 5 times over its CPU implementation. The proposed approach yielded an average Dice score of 0.87 evaluated over 600 images acquired from six patients. Conclusion: This study demonstrated that the GPU computing approach can be used to accelerate tumor tracking for automated radiation therapy for mobile lung tumors. Clinical Impact: Accurately tracking mobile tumor boundaries in real time is important for automating radiation therapy, and the proposed study offers an excellent option for fast tumor region tracking in cancer treatment.
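The accuracy figure quoted above is the Dice similarity coefficient, which for two binary masks is twice the overlap divided by the total mask size. A minimal sketch, with random masks standing in for tracked and ground-truth tumor contours:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

rng = np.random.default_rng(2)
truth = rng.random((128, 128)) > 0.7
tracked = truth.copy()
tracked[:3] ^= True          # perturb a few rows to mimic tracking error
print(f"Dice = {dice(truth, tracked):.3f}")
```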
59. Straggler-Aware Distributed Learning: Communication-Computation Latency Trade-Off. Entropy 2020; 22:544. PMID: 33286316; PMCID: PMC7517046; DOI: 10.3390/e22050544.
Abstract
When gradient descent (GD) is scaled to many parallel workers for large-scale machine learning applications, its per-iteration computation time is limited by straggling workers. Straggling workers can be tolerated by assigning redundant computations and/or coding across data and computations, but in most existing schemes, each non-straggling worker transmits one message per iteration to the parameter server (PS) after completing all its computations. Imposing such a limitation results in two drawbacks: over-computation due to inaccurate prediction of the straggling behavior, and under-utilization due to discarding partial computations carried out by stragglers. To overcome these drawbacks, we consider multi-message communication (MMC), allowing multiple computations to be conveyed from each worker per iteration, and propose novel straggler avoidance techniques for both coded computation and coded communication with MMC. We analyze how the proposed designs can be employed efficiently to seek a balance between the computation and communication latency. Furthermore, we identify the advantages and disadvantages of these designs in different settings through extensive simulations as well as a real implementation on Amazon EC2 servers, and demonstrate that the proposed schemes with MMC can improve upon existing straggler avoidance schemes.
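A toy simulation of the multi-message idea, with made-up worker speeds and chunk assignments: each worker streams a result per data chunk as soon as it finishes it, so a straggler's early chunks still count, and the server can update as soon as every distinct chunk has arrived from some worker.

```python
import random
random.seed(3)

N_CHUNKS, R = 16, 2                       # 16 data chunks, each stored on R workers
workers = [[] for _ in range(8)]
for c in range(N_CHUNKS):                 # round-robin replicated assignment
    for r in range(R):
        workers[(c + r) % len(workers)].append(c)

speeds = [random.uniform(0.5, 3.0) for _ in workers]   # per-chunk compute time

def iteration_time(multi_message: bool) -> float:
    arrivals = {}
    for w, chunks in enumerate(workers):
        for k, c in enumerate(chunks):
            finish = speeds[w] * (k + 1)  # chunk done after k+1 chunk-times
            sent = finish if multi_message else speeds[w] * len(chunks)
            arrivals.setdefault(c, []).append(sent)
    # The server can update once every chunk has arrived from some worker.
    return max(min(times) for times in arrivals.values())

print("one message per worker:", round(iteration_time(False), 2))
print("multi-message (MMC):   ", round(iteration_time(True), 2))
```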
60. Jordan J, Helias M, Diesmann M, Kunkel S. Efficient Communication in Distributed Simulations of Spiking Neuronal Networks With Gap Junctions. Front Neuroinform 2020; 14:12. PMID: 32431602; PMCID: PMC7214808; DOI: 10.3389/fninf.2020.00012.
Abstract
Investigating the dynamics and function of large-scale spiking neuronal networks with realistic numbers of synapses is made possible today by state-of-the-art simulation code that scales to the largest contemporary supercomputers. However, simulations that involve electrical interactions, also called gap junctions, besides chemical synapses scale only poorly due to a communication scheme that collects global data on each compute node. In comparison to chemical synapses, gap junctions are far less abundant. To improve scalability we exploit this sparsity by integrating an existing framework for continuous interactions with a recently proposed directed communication scheme for spikes. Using a reference implementation in the NEST simulator we demonstrate excellent scalability of the integrated framework, accelerating large-scale simulations with gap junctions by more than an order of magnitude. This allows, for the first time, the efficient exploration of the interactions of chemical and electrical coupling in large-scale neuronal network models with natural synapse density distributed across thousands of compute nodes.
61. A Machine Learning-based Algorithm for Water Network Contamination Source Localization. Sensors 2020; 20:2613. PMID: 32375289; PMCID: PMC7248744; DOI: 10.3390/s20092613.
Abstract
In this paper, a novel machine-learning-based algorithm for water supply pollution source identification is presented, built specifically for high-performance parallel systems. The algorithm combines artificial neural networks, for classification of the pollution source, with random forests, for regression analysis to determine significant variables of a contamination event such as its start time, end time, and contaminant chemical concentration. The algorithm is based on performing Monte Carlo water quality and hydraulic simulations in parallel, recording data with sensors placed within a water supply network, and selecting the most probable pollution source through a tournament-style selection between suspect nodes in the network using the aforementioned machine-learning methods. The algorithmic framework is tested on small (92-node) and medium-sized (865-node) water supply sensor network benchmarks with a set contamination event start time, end time, and chemical concentration. The true source node was the finalist of the algorithm's tournament-style selection in 30/30 runs for the small network and in 29/30 runs for the medium-sized network. For all 30 runs on the small sensor network, the true contamination event start time, end time, and chemical concentration were set to 14:20, 20:20, and 813.7 mg/L, respectively; the root mean square errors for the three variables across all 30 runs were 48 min, 4.38 min, and 18.06 mg/L. For the 29 successful medium-sized network runs, the set start time, end time, and chemical concentration were 06:50, 07:40, and 837 mg/L, and the root mean square errors were 6.06 min, 12.36 min, and 299.84 mg/L. The algorithmic framework successfully narrows down the potential sources of contamination, leading to identification of the pollution source, the start and end times of the event, and the contaminant chemical concentration.
62. He B, Yang Z, Fan L, Gao B, Li H, Ye C, You B, Jiang T. MonkeyCBP: A Toolbox for Connectivity-Based Parcellation of Monkey Brain. Front Neuroinform 2020; 14:14. PMID: 32410977; PMCID: PMC7198896; DOI: 10.3389/fninf.2020.00014.
Abstract
Non-human primate models are widely used in studying the brain mechanisms underlying brain development, cognitive functions, and psychiatric disorders. Neuroimaging techniques, such as magnetic resonance imaging, play an important role in examinations of brain structure and function. As an indispensable tool for brain imaging data analysis, brain atlases have been extensively investigated, and a variety of versions constructed. These atlases diverge in the criteria on which they are based, ranging from cytoarchitectonic features, neurotransmitter receptor distributions, myelination fingerprints, and transcriptomic patterns to structural and functional connectomic profiles. Among them, the brainnetome atlas is tightly related to brain connectome information and is built by parcellating the brain on the basis of anatomical connectivity profiles derived from structural neuroimaging data. The pipeline for building the brainnetome atlas has been published as a toolbox named ATPP (A Pipeline for Automatic Tractography-Based Brain Parcellation). In this paper, we present a variation of ATPP, dedicated to monkey brain parcellation, to address the significant differences in the process between the two species. The new toolbox, MonkeyCBP, has major alterations in three aspects: brain extraction, image registration, and validity indices. By parcellating two different brain regions of the rhesus monkey (the posterior cingulate cortex and the frontal pole), we demonstrate the efficacy of these alterations. The toolbox has been made public (https://github.com/bheAI/MonkeyCBP_CLI, https://github.com/bheAI/MonkeyCBP_GUI). It is expected that the toolbox can benefit the non-human primate neuroimaging community with high-throughput computation and low labor involvement.
63. Ni Y, Ji Y, Müller P. Consensus Monte Carlo for Random Subsets using Shared Anchors. J Comput Graph Stat 2020; 29:703-714. PMID: 33456293; PMCID: PMC7810350; DOI: 10.1080/10618600.2020.1737085.
Abstract
We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.
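The anchor-based construction in the paper is specialized to random subsets such as partitions, but the general flavor of consensus Monte Carlo is easy to sketch in the classic Gaussian setting: run independent samplers on shards of the data under a prior whose variance is inflated per shard, then combine draws with precision weights. This is the standard consensus-averaging recipe, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(2.0, 1.0, size=12_000)
shards = np.array_split(data, 4)           # one shard per machine

def shard_draws(y, n_draws=5_000, prior_var_full=100.0, n_shards=4):
    """Posterior draws for a normal mean (unit noise variance) on one shard.

    The full N(0, V) prior is split across shards by inflating its variance
    to V * n_shards, so the product of shard posteriors matches the full one.
    """
    prec = len(y) + 1.0 / (prior_var_full * n_shards)
    mean = y.sum() / prec
    return rng.normal(mean, prec ** -0.5, size=n_draws), prec

draws, precs = zip(*(shard_draws(s) for s in shards))
combined = sum(p * d for p, d in zip(precs, draws)) / sum(precs)
print(f"consensus posterior mean ~ {combined.mean():.3f}")   # close to 2.0
```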
64. van der Heide O, Sbrizzi A, Luijten PR, van den Berg CA. High-resolution in vivo MR-STAT using a matrix-free and parallelized reconstruction algorithm. NMR Biomed 2020; 33:e4251. PMID: 31985134; PMCID: PMC7079175; DOI: 10.1002/nbm.4251.
Abstract
MR-STAT is a recently proposed framework that allows the reconstruction of multiple quantitative parameter maps from a single short scan by performing spatial localisation and parameter estimation on the time-domain data simultaneously, without relying on the fast Fourier transform (FFT). To do this at high resolution, specialized algorithms are required to solve the underlying large-scale nonlinear optimisation problem. We propose a matrix-free and parallelized inexact Gauss-Newton based reconstruction algorithm for this purpose. The proposed algorithm is implemented on a high-performance computing cluster and is demonstrated to be able to generate high-resolution (1 mm × 1 mm in-plane resolution) quantitative parameter maps in simulation, phantom, and in vivo brain experiments. Reconstructed T1 and T2 values for the gel phantoms are in agreement with results from gold standard measurements and, for the in vivo experiments, the quantitative values show good agreement with literature values. In all experiments, short pulse sequences with robust Cartesian sampling are used, for which MR fingerprinting reconstructions are shown to fail.
65. Yelick K, Buluç A, Awan M, Azad A, Brock B, Egan R, Ekanayake S, Ellis M, Georganas E, Guidi G, Hofmeyr S, Selvitopi O, Teodoropol C, Oliker L. The parallelism motifs of genomic data analysis. Philos Trans A Math Phys Eng Sci 2020; 378:20190394. PMID: 31955674; PMCID: PMC7015300; DOI: 10.1098/rsta.2019.0394.
Abstract
Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
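Of the two motifs the authors argue are missing from established lists, hashing shows up most plainly in k-mer counting; a minimal serial version is below (parallel versions shard this hash table across nodes):

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Hash-table count of all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ACGTACGTGACG", k=3)
print(counts.most_common(3))   # [('ACG', 3), ('CGT', 2), ...]
```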
66. Wang Z, Zhang J, Gao W, Liu Z, Wan X, Zhang F. A Consensus Framework of Distributed Multiple-Tilt Reconstruction in Electron Tomography. J Comput Biol 2020; 27:212-222. PMID: 31794252; DOI: 10.1089/cmb.2019.0287.
Abstract
The "missing wedge" of a single tilt in electron tomography introduces severe artifacts into the reconstructed results. To reduce the "missing wedge" effect, a widely used method is "multiple-tilt reconstruction," which collects projections using multiple axes. However, as the number of tilt series increases, the computing and memory costs also rise. The degree of parallelism is limited by the sample thickness, and a large memory requirement cannot be met by most multicore computers. In our study, we present a new fully distributed multiple-tilt simultaneous iterative reconstruction technique (DM-SIRT). To improve the parallelism of the reconstruction process and reduce the memory requirements of each process, we formulate the multiple-tilt reconstruction as a consensus optimization problem and design a DM-SIRT algorithm. Experiments show that in addition to slightly better resolution, DM-SIRT can obtain a 13.9 × accelerated ratio compared with the full multiple-tilt reconstruction version. It also has a 97% decrease in memory overhead and is 16 times more scalable than the full reconstruction version.
67. He Y, Zheng S, Zhu F, Huang X. Real-Time 3D Reconstruction of Thin Surface Based on Laser Line Scanner. Sensors 2020; 20:534. PMID: 31963669; PMCID: PMC7014519; DOI: 10.3390/s20020534.
Abstract
The truncated signed distance field (TSDF) has been applied as a fast, accurate, and flexible geometric fusion method in 3D reconstruction of industrial products based on a hand-held laser line scanner. However, this method has problems with the surface reconstruction of thin products: the surface mesh collapses into the interior of the model, resulting in topological errors such as overlaps, intersections, or gaps. Meanwhile, the existing TSDF method ensures real-time performance through significant graphics processing unit (GPU) memory usage, which limits the scale of the reconstructed scene. In this work, we propose three improvements to existing TSDF methods: (i) a real-time thin-surface attribution judgment method that solves the problem of interference between the opposite sides of a thin surface; we distinguish measurements originating from different parts of a thin surface by the angle between the surface normal and the observation line of sight; (ii) a post-processing method to automatically detect and repair topological errors in areas where the thin-surface attribution may have been misjudged; (iii) a framework that integrates central processing unit (CPU) and GPU resources to implement our 3D reconstruction approach, which ensures real-time performance and reduces GPU memory usage. The experimental results show that this method provides more accurate 3D reconstruction of a thin surface, comparable to state-of-the-art laser line scanners with 0.02 mm accuracy. In terms of performance, the algorithm sustains a frame rate of more than 60 frames per second (FPS) with a GPU memory footprint under 500 MB. Overall, the proposed method achieves real-time, high-precision 3D reconstruction of a thin surface.
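Improvement (i) hinges on a single geometric test: a measurement is attributed to one side of a thin surface according to the angle between the estimated surface normal and the line of sight. A toy version of that test, where the tolerance angle is an assumed parameter:

```python
import numpy as np

def same_side(normal: np.ndarray, view_dir: np.ndarray, max_angle_deg=75.0) -> bool:
    """Accept a measurement for this side of a thin surface only if the
    surface normal faces back toward the sensor within a tolerance angle."""
    n = normal / np.linalg.norm(normal)
    v = view_dir / np.linalg.norm(view_dir)
    cos_a = float(np.dot(n, -v))          # angle between n and the reverse view ray
    return cos_a >= np.cos(np.radians(max_angle_deg))

view = np.array([0.0, 0.0, 1.0])              # sensor looks along +z
print(same_side(np.array([0, 0, -1.0]), view))  # front face -> True
print(same_side(np.array([0, 0,  1.0]), view))  # back face  -> False
```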
68. Goudie RJB, Turner RM, De Angelis D, Thomas A. MultiBUGS: A Parallel Implementation of the BUGS Modelling Framework for Faster Bayesian Inference. J Stat Softw 2020; 95(7). PMID: 33071678; DOI: 10.18637/jss.v095.i07.
Abstract
MultiBUGS is a new version of the general-purpose Bayesian modelling software BUGS that implements a generic algorithm for parallelising Markov chain Monte Carlo (MCMC) algorithms to speed up posterior inference of Bayesian models. The algorithm parallelises evaluation of the product-form likelihoods formed when a parameter has many children in the directed acyclic graph (DAG) representation, and parallelises sampling of conditionally independent sets of parameters. A heuristic algorithm is used to decide which approach to use for each parameter and to apportion computation across computational cores. This enables MultiBUGS to automatically parallelise the broad range of statistical models that can be fitted using BUGS-language software, making the dramatic speed-ups of modern multi-core computing accessible to applied statisticians, without requiring any experience of parallel programming. We demonstrate the use of MultiBUGS on simulated data designed to mimic a hierarchical e-health linked-data study of methadone prescriptions including 425,112 observations and 20,426 random effects. Posterior inference for the e-health model takes several hours in existing software, but MultiBUGS can perform inference in only 28 minutes using 48 computational cores.
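The first of the two parallelisation routes, splitting a product-form likelihood across cores, is easy to sketch outside BUGS: each worker sums the log-likelihood of its block of observations and the master adds the partial sums. A generic Python sketch, not MultiBUGS internals:

```python
import numpy as np
from multiprocessing import Pool

def block_loglik(args):
    """Gaussian log-likelihood contribution of one block of observations."""
    y_block, mu = args
    return -0.5 * np.sum((y_block - mu) ** 2 + np.log(2 * np.pi))

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    y = rng.normal(0.3, 1.0, size=400_000)
    blocks = np.array_split(y, 8)                       # one block per core
    with Pool(8) as pool:
        total = sum(pool.map(block_loglik, [(b, 0.3) for b in blocks]))
    print(f"log L(mu = 0.3) = {total:.1f}")
```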
69. Valdiviezo-N JC, Hernandez-Lopez FJ, Toxqui-Quitl C. Parallel implementations to accelerate the autofocus process in microscopy applications. J Med Imaging (Bellingham) 2020; 7:014001. PMID: 31956664; PMCID: PMC6968793; DOI: 10.1117/1.jmi.7.1.014001.
Abstract
Several autofocus algorithms based on the analysis of image sharpness have been proposed for microscopy applications. Since autofocus functions (AFs) are computed from several images captured at different lens positions, these algorithms are considered computationally intensive. With the aim of presenting the capabilities of dedicated hardware to speed up the autofocus process, we discuss the implementation of four AFs using, respectively, a multicore central processing unit (CPU) architecture and a graphics processing unit (GPU) card. In different experiments performed on 300 image stacks previously identified as containing tuberculosis bacilli, the proposed implementations accelerated the computation of some AFs by up to 23 times with respect to the serial version. These results show that multicore CPUs and GPUs can be used effectively for autofocus in real-time microscopy applications.
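A typical sharpness-based autofocus loop is short: score every slice of a stack with a focus measure, here the Tenengrad measure (energy of the intensity gradient), and pick the argmax; the parallel versions fan the per-slice scoring out over cores. The stack below is a synthetic stand-in, and Tenengrad is only one of the four AFs the paper considers:

```python
import numpy as np
from multiprocessing import Pool

def tenengrad(image: np.ndarray) -> float:
    """Tenengrad focus measure: total energy of the intensity gradient."""
    gy, gx = np.gradient(image.astype(float))
    return float((gx * gx + gy * gy).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    stack = rng.random((40, 512, 512)) * 0.1   # 40 lens positions, mostly blurred
    stack[17] = rng.random((512, 512))         # give one slice the sharpest texture
    with Pool() as pool:                       # score all slices in parallel
        scores = pool.map(tenengrad, list(stack))
    print("best focus at slice", int(np.argmax(scores)))  # -> 17
```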
70. Chattopadhyay A, Lu TP. Gene-gene interaction: the curse of dimensionality. Ann Transl Med 2019; 7:813. PMID: 32042829; DOI: 10.21037/atm.2019.12.87.
Abstract
Identified genetic variants from genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for a part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases. This can potentially help with identifying gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, otherwise known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such exponentially growing numbers of SNPs diminishes the usefulness of traditional, parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that classifies multi-dimensional genotypes into one-dimensional binary classes, led to the emergence of a fast-growing collection of methods based on the MDR approach. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied profusely in recent years to tackle this dimensionality issue in whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches, or variable selection in ML methods, still poses the risk of missing relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can take advantage of distributed computing resources in the cloud to bring back smaller subsets of data for further local analysis. Parallel computing is a powerful resource in the fight against this "curse": PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large data sets.
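A sketch of the PySpark pattern alluded to above: distribute the SNP-pair space across executors, score each pair locally, here with a deliberately crude correlation between the genotype product and the phenotype, and return only the top hits to the driver. Data shapes and the scoring statistic are illustrative assumptions:

```python
import numpy as np
from itertools import combinations
from pyspark import SparkContext

sc = SparkContext("local[*]", "epistasis-scan")   # use a cluster master in practice

rng = np.random.default_rng(8)
n_snps, n_subjects = 200, 500
G = rng.integers(0, 3, size=(n_snps, n_subjects))  # genotypes coded 0/1/2
y = rng.integers(0, 2, size=n_subjects)            # case/control phenotype
G_b, y_b = sc.broadcast(G), sc.broadcast(y)        # ship the data to executors once

def score(pair):
    """Crude interaction score: |corr(genotype product, phenotype)|."""
    i, j = pair
    r = np.corrcoef(G_b.value[i] * G_b.value[j], y_b.value)[0, 1]
    return (0.0 if np.isnan(r) else abs(r)), (i, j)

pairs = sc.parallelize(list(combinations(range(n_snps), 2)), 64)
print(pairs.map(score).top(5))                     # highest-scoring SNP pairs
sc.stop()
```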
71. Igarashi J, Yamaura H, Yamazaki T. Large-Scale Simulation of a Layered Cortical Sheet of Spiking Network Model Using a Tile Partitioning Method. Front Neuroinform 2019; 13:71. PMID: 31849631; PMCID: PMC6895031; DOI: 10.3389/fninf.2019.00071.
Abstract
One of the grand challenges for computational neuroscience and high-performance computing is the computer simulation of a human-scale whole-brain model with spiking neurons and synaptic plasticity using supercomputers. To achieve such a simulation, the target network model must be partitioned across a number of computational nodes, and the sub-network models are executed in parallel while communicating spike information across different nodes. However, it remains unclear how the target network model should be partitioned for efficient computing on the next generation of supercomputers. In particular, reducing the communication of spike information across compute nodes is essential, because network performance is slow relative to that of processors and memory. From the viewpoint of biological features, the cerebral cortex and cerebellum contain 99% of neurons and synapses and form layered sheet structures, so an efficient partitioning method should exploit these layered sheets. In this study, we show that a tile partitioning method leads to efficient communication. To demonstrate this, we developed a simulation software package called MONET (Millefeuille-like Organization NEural neTwork simulator) that partitions a network model as described above. The MONET simulator was implemented on the Japanese flagship supercomputer K, which is composed of 82,944 computational nodes. We examined the calculation, communication, and memory-consumption performance of the tile partitioning method for a cortical model with realistic anatomical and physiological parameters. The results showed that the tile partitioning method drastically reduced the amount of communicated data by replacing network communication with DRAM access and by sharing communication data among neighboring neurons. We confirmed the scalability and efficiency of the tile partitioning method on up to 63,504 compute nodes of the K computer for the cortical model. In the companion paper by Yamaura et al., the performance for a cerebellar model is examined. These results suggest that the tile partitioning method will be advantageous for human-scale whole-brain simulation on exascale computers.
72. Ullah E, Yosafshahi M, Hassoun S. Towards scaling elementary flux mode computation. Brief Bioinform 2019; 21:1875-1885. PMID: 31745550; DOI: 10.1093/bib/bbz094.
Abstract
While elementary flux mode (EFM) analysis is now recognized as a cornerstone computational technique for cellular pathway analysis and engineering, EFM application to genome-scale models remains computationally prohibitive. This article provides a review of aspects of EFM computation that elucidates bottlenecks in scaling EFM computation. First, algorithms for computing EFMs are reviewed. Next, the impact of redundant constraints, sensitivity to constraint ordering and network compression are evaluated. Then, the advantages and limitations of recent parallelization and GPU-based efforts are highlighted. The article then reviews alternative pathway analysis approaches that aim to reduce the EFM solution space. Despite advances in EFM computation, our review concludes that continued scaling of EFM computation is necessary to apply EFM to genome-scale models. Further, our review concludes that pathway analysis methods that target specific pathway properties can provide powerful alternatives to EFM analysis.
73. Ayres DL, Cummings MP, Baele G, Darling AE, Lewis PO, Swofford DL, Huelsenbeck JP, Lemey P, Rambaut A, Suchard MA. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Syst Biol 2019; 68:1052-1061. PMID: 31034053; PMCID: PMC6802572; DOI: 10.1093/sysbio/syz020.
Abstract
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
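The quantity BEAGLE spends most of its time on is the partial likelihood of Felsenstein's pruning algorithm: at each internal node, an elementwise product over children of transition-matrix-vector products over states. A minimal NumPy rendering for one node and many sites; the many-sites axis is what maps naturally onto GPUs and batched kernels. Toy matrices only, not BEAGLE's API:

```python
import numpy as np

rng = np.random.default_rng(9)
n_sites, n_states = 1000, 4
# Conditional likelihoods of the two children, one row per alignment site.
L_left = rng.random((n_sites, n_states))
L_right = rng.random((n_sites, n_states))
# Transition probability matrices along the two child branches (rows sum to 1).
P_left = np.full((n_states, n_states), 0.05) + 0.8 * np.eye(n_states)
P_right = np.full((n_states, n_states), 0.10) + 0.6 * np.eye(n_states)

# Pruning step: L_parent[s, i] = (sum_j P_l[i,j] L_l[s,j]) * (sum_j P_r[i,j] L_r[s,j])
L_parent = (L_left @ P_left.T) * (L_right @ P_right.T)
print(L_parent.shape)   # (1000, 4): one partial-likelihood vector per site
```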
74. Xu Z, Wang Y, Sun N, Li Z, Hu S, Liu Q. Parallel Computing for Quantitative Blood Flow Imaging in Photoacoustic Microscopy. Sensors 2019; 19:4000. PMID: 31527505; PMCID: PMC6767147; DOI: 10.3390/s19184000.
Abstract
Photoacoustic microscopy (PAM) is an emerging biomedical imaging technology capable of quantitative measurement of microvascular blood flow by correlation analysis. However, the computational cost is high, limiting its applications. Here, we report a parallel computation design based on the graphics processing unit (GPU) for high-speed quantification of blood flow in PAM. Two strategies were used to improve computational efficiency. First, the correlation method in the algorithm was optimized to avoid redundant computation, and a parallel computing structure was designed. Second, the parallel design was realized on the GPU and optimized by maximizing the utilization of the GPU's computing resources. Detailed timings and speedups for each calculation step are given, and CPU-based MATLAB and C/C++ versions are presented for comparison. A full performance test shows that a stable speedup of ~80-fold can be achieved with the same calculation accuracy, reducing the computation time from minutes to just several seconds for imaging sizes ranging from 1 × 1 mm² to 2 × 2 mm². Our design accelerates PAM-based blood flow measurement and paves the way for real-time PAM imaging and processing by significantly improving computational efficiency.
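The underlying measurement is a correlation between repeated A-line signals: the faster the flow, the faster the photoacoustic signal decorrelates between successive acquisitions. A toy version of the per-pixel correlation step that the GPU design parallelizes across the image:

```python
import numpy as np

def decorrelation(a_line_1: np.ndarray, a_line_2: np.ndarray) -> float:
    """1 - normalized cross-correlation of two A-line signals at zero lag."""
    a = (a_line_1 - a_line_1.mean()) / a_line_1.std()
    b = (a_line_2 - a_line_2.mean()) / a_line_2.std()
    return 1.0 - float(np.mean(a * b))

rng = np.random.default_rng(10)
signal = rng.standard_normal(256)
slow = signal + 0.1 * rng.standard_normal(256)   # little change between shots
fast = signal + 1.0 * rng.standard_normal(256)   # strong decorrelation
print(decorrelation(signal, slow), "<", decorrelation(signal, fast))
```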
75. Hirschfeld G, Thiele C. Cloud-based simulation studies in R - A tutorial on using doRedis with Amazon spot fleets. Stat Med 2019; 38:3947-3959. PMID: 31049978; DOI: 10.1002/sim.8188.
Abstract
Simulation studies are helpful in testing novel statistical methods. From a computational perspective, they constitute embarrassingly parallel tasks. We describe parallelization techniques in the programming language R that can be used on Amazon's cloud-based infrastructure. After a short conceptual overview of the parallelization techniques in R, we provide a hands-on tutorial on how the doRedis package, in conjunction with the Redis server, can be used on Amazon Web Services, specifically running spot fleets. The tutorial proceeds in seven steps, i.e., (1) starting up an EC2 instance, (2) installing a Redis server, (3) using doRedis with a local worker, (4) using doRedis with a remote worker, (5) setting up instances that automatically fetch tasks from a specific master, (6) using spot fleets, and (7) shutting down the instances. As a basic example, we show how these techniques can be used to assess the effects of heteroscedasticity on the equal-variance t-test. Furthermore, we address several advanced issues, such as multiple conditions, cost management, and chunking.