76. Ford CT, Janies D. Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria. F1000Res 2020; 9:62. PMID: 35903243; PMCID: PMC9274019; DOI: 10.12688/f1000research.21539.1
Abstract
Resistance in malaria is a growing concern affecting many areas of Sub-Saharan Africa and Southeast Asia. Since the emergence of artemisinin resistance in the late 2000s in Cambodia, research into the underlying mechanisms has been underway. The 2019 Malaria Challenge posited the task of developing computational models that address important problems in advancing the fight against malaria. The first goal was to accurately predict artemisinin drug resistance levels of Plasmodium falciparum isolates, as quantified by the IC50. The second goal was to predict the parasite clearance rate of malaria parasite isolates based on in vitro transcriptional profiles. In this work, we develop machine learning models using novel methods for transforming isolate data and handling the tens of thousands of variables that result from these data transformation exercises. This is demonstrated by using massively parallel processing of the data vectorization for use in scalable machine learning. In addition, we show the utility of ensemble machine learning modeling for highly effective predictions of both goals of this challenge. This is demonstrated by the use of multiple machine learning algorithms combined with various scaling and normalization preprocessing steps. Then, using a voting ensemble, multiple models are combined to generate a final model prediction.
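As a rough illustration of the voting-ensemble pattern this abstract describes (multiple learners, each with its own scaling or normalization step, combined into one prediction), the sketch below uses scikit-learn with synthetic stand-in data; the feature matrix, targets, and base models are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch of a voting ensemble with per-model preprocessing.
# X and y are synthetic stand-ins, not the challenge data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # stand-in for vectorized isolate features
y = rng.normal(size=200)         # stand-in for log-IC50 values

# Each base model carries its own scaling/normalization preprocessing step.
base_models = [
    ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
    ("svr", make_pipeline(MinMaxScaler(), SVR(C=1.0))),
    ("rf", RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)),
]

# The voting ensemble averages the base-model predictions into one output.
ensemble = VotingRegressor(estimators=base_models)
scores = cross_val_score(ensemble, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```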
77. He Y, Zheng S, Zhu F, Huang X. Real-Time 3D Reconstruction of Thin Surface Based on Laser Line Scanner. Sensors (Basel) 2020; 20:s20020534. PMID: 31963669; PMCID: PMC7014519; DOI: 10.3390/s20020534
Abstract
The truncated signed distance field (TSDF) has been applied as a fast, accurate, and flexible geometric fusion method in 3D reconstruction of industrial products based on a hand-held laser line scanner. However, this method has some problems for the surface reconstruction of thin products. The surface mesh will collapse to the interior of the model, resulting in some topological errors, such as overlap, intersections, or gaps. Meanwhile, the existing TSDF method ensures real-time performance through significant graphics processing unit (GPU) memory usage, which limits the scale of the reconstruction scene. In this work, we propose three improvements to the existing TSDF methods, including: (i) a thin surface attribution judgment method in real-time processing that solves the problem of interference between the opposite sides of the thin surface; we distinguish measurements originating from different parts of a thin surface by the angle between the surface normal and the observation line of sight; (ii) a post-processing method to automatically detect and repair the topological errors in some areas where misjudgment of thin-surface attribution may occur; (iii) a framework that integrates the central processing unit (CPU) and GPU resources to implement our 3D reconstruction approach, which ensures real-time performance and reduces GPU memory usage. Results show that the proposed method provides more accurate 3D reconstruction of thin surfaces, with accuracy comparable to state-of-the-art laser line scanners at 0.02 mm. In terms of performance, the algorithm can guarantee a frame rate of more than 60 frames per second (FPS) with a GPU memory footprint under 500 MB. Overall, the proposed method achieves real-time, high-precision 3D reconstruction of a thin surface.
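A minimal sketch of the angle-based attribution test mentioned above (assigning a measurement to the side of a thin surface whose normal faces the sensor) is given below; the 90-degree threshold and the vector names are illustrative assumptions, not the paper's exact criterion.

```python
# Minimal sketch: attribute a measurement to a thin-surface side by the angle
# between the surface normal and the line of sight. Threshold is an assumption.
import numpy as np

def facing_side(surface_normal, point, sensor_origin, max_angle_deg=90.0):
    """Return True if the observed point lies on the side whose normal points
    toward the sensor (angle between normal and line of sight below threshold)."""
    view_dir = sensor_origin - point                  # line of sight, point -> sensor
    view_dir = view_dir / np.linalg.norm(view_dir)
    n = surface_normal / np.linalg.norm(surface_normal)
    cos_angle = np.clip(np.dot(n, view_dir), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)) < max_angle_deg

print(facing_side(np.array([0.0, 0.0, 1.0]),
                  np.array([0.0, 0.0, 0.0]),
                  np.array([0.0, 0.0, 0.5])))   # True: the normal faces the sensor
```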
78
|
Goudie RJB, Turner RM, De Angelis D, Thomas A. MultiBUGS: A Parallel Implementation of the BUGS Modelling Framework for Faster Bayesian Inference. J Stat Softw 2020; 95. [PMID: 33071678 DOI: 10.18637/jss.v095.i07] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
MultiBUGS is a new version of the general-purpose Bayesian modelling software BUGS that implements a generic algorithm for parallelising Markov chain Monte Carlo (MCMC) algorithms to speed up posterior inference of Bayesian models. The algorithm parallelises evaluation of the product-form likelihoods formed when a parameter has many children in the directed acyclic graph (DAG) representation; and parallelises sampling of conditionally-independent sets of parameters. A heuristic algorithm is used to decide which approach to use for each parameter and to apportion computation across computational cores. This enables MultiBUGS to automatically parallelise the broad range of statistical models that can be fitted using BUGS-language software, making the dramatic speed-ups of modern multi-core computing accessible to applied statisticians, without requiring any experience of parallel programming. We demonstrate the use of MultiBUGS on simulated data designed to mimic a hierarchical e-health linked-data study of methadone prescriptions including 425,112 observations and 20,426 random effects. Posterior inference for the e-health model takes several hours in existing software, but MultiBUGS can perform inference in only 28 minutes using 48 computational cores.
79. Valdiviezo-N JC, Hernandez-Lopez FJ, Toxqui-Quitl C. Parallel implementations to accelerate the autofocus process in microscopy applications. J Med Imaging (Bellingham) 2020; 7:014001. PMID: 31956664; PMCID: PMC6968793; DOI: 10.1117/1.jmi.7.1.014001
Abstract
Several autofocus algorithms based on the analysis of image sharpness have been proposed for microscopy applications. Since autofocus functions (AFs) are computed from several images captured at different lens positions, these algorithms are considered computationally intensive. With the aim of presenting the capabilities of dedicated hardware to speed up the autofocus process, we discuss the implementation of four AFs using, respectively, a multicore central processing unit (CPU) architecture and a graphics processing unit (GPU) card. Through different experiments performed on 300 image stacks previously identified with tuberculosis bacilli, the proposed implementations accelerated the computation time of some AFs by up to 23 times with respect to the serial version. These results show that multicore CPUs and GPUs can be used effectively for autofocus in real-time microscopy applications.
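The sketch below illustrates the general pattern of evaluating a sharpness-based autofocus function over a focal stack on a multicore CPU; the normalized-variance metric and the synthetic stack are assumptions for illustration, not the four AFs or image data used in the paper.

```python
# Hedged sketch: score a focal stack with a sharpness-based autofocus function
# (normalized variance) in parallel on a multicore CPU.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def normalized_variance(image):
    mu = image.mean()
    return ((image - mu) ** 2).mean() / (mu + 1e-12)

def best_focus_index(stack, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(normalized_variance, stack))
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stack of 21 "lens positions"; the middle frame has the most contrast.
    stack = [rng.normal(loc=100.0, scale=1.0 + 0.2 * (10 - abs(i - 10)), size=(512, 512))
             for i in range(21)]
    idx, _ = best_focus_index(stack)
    print("best-focused frame:", idx)
```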
80. Chattopadhyay A, Lu TP. Gene-gene interaction: the curse of dimensionality. Ann Transl Med 2019; 7:813. PMID: 32042829; DOI: 10.21037/atm.2019.12.87
Abstract
Identified genetic variants from genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases. This can potentially help with identifying gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, otherwise known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such exponentially growing SNPs diminishes the usefulness of traditional, parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that classifies multi-dimensional genotypes into a one-dimensional binary representation, led to the emergence of a fast-growing collection of methods based on the MDR approach. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied profusely in recent years to tackle the dimensionality issue associated with whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches or variable selection in ML methods still poses the risk of missing relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource that stands to fight this "curse". PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large sets of data.
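To make the PySpark idea concrete, the sketch below distributes exhaustive pairwise SNP interaction scoring over a Spark cluster; the synthetic genotype matrix, the chi-square scoring of joint genotype classes, and all names are illustrative assumptions rather than any specific tool reviewed in the article.

```python
# Hedged sketch: score all SNP pairs for association with a binary phenotype
# using PySpark. Data and the chi-square score are illustrative stand-ins.
from itertools import combinations
import numpy as np
from pyspark.sql import SparkSession
from scipy.stats import chi2_contingency

spark = SparkSession.builder.appName("epistasis-sketch").getOrCreate()
sc = spark.sparkContext

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(500, 100))   # 500 samples x 100 SNPs (0/1/2)
phenotype = rng.integers(0, 2, size=500)          # binary disease status
bc_geno = sc.broadcast(genotypes)
bc_pheno = sc.broadcast(phenotype)

def interaction_score(pair):
    i, j = pair
    g = bc_geno.value
    combo = g[:, i] * 3 + g[:, j]                 # 9 joint genotype classes
    table = np.zeros((9, 2))
    for c, y in zip(combo, bc_pheno.value):
        table[c, y] += 1
    table = table[table.sum(axis=1) > 0]          # drop empty genotype classes
    chi2, p, _, _ = chi2_contingency(table)
    return (i, j, p)

pairs = sc.parallelize(list(combinations(range(genotypes.shape[1]), 2)), 32)
top = pairs.map(interaction_score).takeOrdered(10, key=lambda t: t[2])
print(top)
spark.stop()
```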
81. Igarashi J, Yamaura H, Yamazaki T. Large-Scale Simulation of a Layered Cortical Sheet of Spiking Network Model Using a Tile Partitioning Method. Front Neuroinform 2019; 13:71. PMID: 31849631; PMCID: PMC6895031; DOI: 10.3389/fninf.2019.00071
Abstract
One of the grand challenges for computational neuroscience and high-performance computing is computer simulation of a human-scale whole brain model with spiking neurons and synaptic plasticity using supercomputers. To achieve such a simulation, the target network model must be partitioned onto a number of computational nodes, and the sub-network models are executed in parallel while communicating spike information across different nodes. However, it remains unclear how the target network model should be partitioned for efficient computing on the next generation of supercomputers. Specifically, reducing the communication of spike information across compute nodes is essential, because network performance is slower relative to that of processors and memory. From the viewpoint of biological features, the cerebral cortex and cerebellum contain 99% of neurons and synapses and form layered sheet structures. Therefore, an efficient method to split the network should exploit these layered sheet structures. In this study, we show that a tile partitioning method leads to efficient communication. To demonstrate this, simulation software called MONET (Millefeuille-like Organization NEural neTwork simulator) that partitions a network model as described above was developed. The MONET simulator was implemented on the Japanese flagship supercomputer K, which is composed of 82,944 computational nodes. We examined the calculation, communication, and memory-consumption performance of the tile partitioning method for a cortical model with realistic anatomical and physiological parameters. The results showed that the tile partitioning method drastically reduced the amount of communication data by replacing network communication with DRAM access and by sharing the communication data among neighboring neurons. We confirmed the scalability and efficiency of the tile partitioning method on up to 63,504 compute nodes of the K computer for the cortical model. In the companion paper by Yamaura et al., the performance for a cerebellar model was examined. These results suggest that the tile partitioning method will be advantageous for a human-scale whole-brain simulation on exascale computers.
82. Ullah E, Yosafshahi M, Hassoun S. Towards scaling elementary flux mode computation. Brief Bioinform 2019; 21:1875-1885. PMID: 31745550; DOI: 10.1093/bib/bbz094
Abstract
While elementary flux mode (EFM) analysis is now recognized as a cornerstone computational technique for cellular pathway analysis and engineering, EFM application to genome-scale models remains computationally prohibitive. This article provides a review of aspects of EFM computation that elucidates bottlenecks in scaling EFM computation. First, algorithms for computing EFMs are reviewed. Next, the impact of redundant constraints, sensitivity to constraint ordering and network compression are evaluated. Then, the advantages and limitations of recent parallelization and GPU-based efforts are highlighted. The article then reviews alternative pathway analysis approaches that aim to reduce the EFM solution space. Despite advances in EFM computation, our review concludes that continued scaling of EFM computation is necessary to apply EFM to genome-scale models. Further, our review concludes that pathway analysis methods that target specific pathway properties can provide powerful alternatives to EFM analysis.
83. Ayres DL, Cummings MP, Baele G, Darling AE, Lewis PO, Swofford DL, Huelsenbeck JP, Lemey P, Rambaut A, Suchard MA. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Syst Biol 2019; 68:1052-1061. PMID: 31034053; PMCID: PMC6802572; DOI: 10.1093/sysbio/syz020
Abstract
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
84. Xu Z, Wang Y, Sun N, Li Z, Hu S, Liu Q. Parallel Computing for Quantitative Blood Flow Imaging in Photoacoustic Microscopy. Sensors (Basel) 2019; 19:s19184000. PMID: 31527505; PMCID: PMC6767147; DOI: 10.3390/s19184000
Abstract
Photoacoustic microscopy (PAM) is an emerging biomedical imaging technology capable of quantitative measurement of microvascular blood flow by correlation analysis. However, the computational cost is high, limiting its applications. Here, we report a parallel computation design based on the graphics processing unit (GPU) for high-speed quantification of blood flow in PAM. Two strategies were utilized to improve the computational efficiency. First, the correlation method in the algorithm was optimized to avoid redundant computation and a parallel computing structure was designed. Second, the parallel design was realized on the GPU and optimized by maximizing the utilization of computing resources on the GPU. The detailed timings and speedup for each calculation step are given, and the MATLAB and C/C++ code versions based on the CPU are presented as a comparison. A full performance test shows that a stable speedup of ~80-fold could be achieved with the same calculation accuracy, and that the computation time could be reduced from minutes to just several seconds for imaging sizes ranging from 1 × 1 mm² to 2 × 2 mm². Our design accelerates PAM-based blood flow measurement and paves the way for real-time PAM imaging and processing by significantly improving the computational efficiency.
85. Hirschfeld G, Thiele C. Cloud-based simulation studies in R - A tutorial on using doRedis with Amazon spot fleets. Stat Med 2019; 38:3947-3959. PMID: 31049978; DOI: 10.1002/sim.8188
Abstract
Simulation studies are helpful in testing novel statistical methods. From a computational perspective, they constitute embarrassingly parallel tasks. We describe parallelization techniques in the programming language R that can be used on Amazon's cloud-based infrastructure. After a short conceptual overview of the parallelization techniques in R, we provide a hands-on tutorial on how the doRedis package in conjunction with the Redis server can be used on Amazon Web Services, specifically running spot fleets. The tutorial proceeds in seven steps, ie, (1) starting up an EC2 instance, (2) installing a Redis server, (3) using doRedis with a local worker, (4) using doRedis with a remote worker, (5) setting up instances that automatically fetch tasks from a specific master, (6) using spot-fleets, and (7) shutting down the instances. As a basic example, we show how these techniques can be used to assess the effects of heteroscedasticity on the equal-variance t-test. Furthermore, we address several advanced issues, such as multiple conditions, cost-management, and chunking.
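The tutorial itself is built around R's doRedis and AWS spot fleets; as a hedged, purely local Python analogue of the same embarrassingly parallel simulation pattern, the sketch below estimates the type-I error of the equal-variance t-test under heteroscedasticity, mirroring only the simulation logic of the basic example.

```python
# Python analogue of the embarrassingly parallel simulation example
# (equal-variance t-test under unequal group variances). The original
# workflow uses R's doRedis on AWS; this sketch runs locally.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from scipy import stats

def one_replication(seed, n1=20, n2=20, sd1=1.0, sd2=3.0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd1, n1)          # both groups share a true mean of 0
    y = rng.normal(0.0, sd2, n2)
    _, p = stats.ttest_ind(x, y, equal_var=True)
    return p < 0.05                        # any "significant" result is a false positive

if __name__ == "__main__":
    n_rep = 10_000
    with ProcessPoolExecutor() as pool:
        rejections = list(pool.map(one_replication, range(n_rep), chunksize=500))
    print("empirical type-I error:", np.mean(rejections))
```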
86. Miri Rostami SR, Mozaffarzadeh M, Ghaffari-Miab M, Hariri A, Jokerst J. GPU-accelerated Double-stage Delay-multiply-and-sum Algorithm for Fast Photoacoustic Tomography Using LED Excitation and Linear Arrays. Ultrason Imaging 2019; 41:301-316. PMID: 31322057; DOI: 10.1177/0161734619862488
Abstract
Double-stage delay-multiply-and-sum (DS-DMAS) is an algorithm proposed for photoacoustic image reconstruction. The DS-DMAS algorithm offers a higher contrast than conventional delay-and-sum and delay-multiply-and-sum, but at the expense of higher computational complexity. Here, we utilized a compute unified device architecture (CUDA) graphics processing unit (GPU) parallel computation approach to address the high complexity of DS-DMAS for photoacoustic image reconstruction generated from a commercial light-emitting diode (LED)-based photoacoustic scanner. In comparison with a single-threaded central processing unit (CPU), the GPU approach increased speeds by nearly 140-fold for a 1024 × 1024 pixel image; there was no decrease in accuracy. The proposed implementation makes it possible to reconstruct photoacoustic images at frame rates of 250, 125, and 83.3 frames per second for images of 64 × 64, 128 × 128, and 256 × 256 pixels, respectively. Thus, DS-DMAS can be efficiently used in clinical devices when coupled with CUDA GPU parallel computation.
87. Hishinuma K, Iiduka H. Incremental and Parallel Machine Learning Algorithms With Automated Learning Rate Adjustments. Front Robot AI 2019; 6:77. PMID: 33501092; PMCID: PMC7805887; DOI: 10.3389/frobt.2019.00077
Abstract
The existing machine learning algorithms for minimizing the convex function over a closed convex set suffer from slow convergence because their learning rates must be determined before running them. This paper proposes two machine learning algorithms incorporating the line search method, which automatically and algorithmically finds appropriate learning rates at run-time. One algorithm is based on the incremental subgradient algorithm, which sequentially and cyclically uses each of the parts of the objective function; the other is based on the parallel subgradient algorithm, which uses parts independently in parallel. These algorithms can be applied to constrained nonsmooth convex optimization problems appearing in tasks of learning support vector machines without adjusting the learning rates precisely. The proposed line search method can determine learning rates to satisfy weaker conditions than the ones used in the existing machine learning algorithms. This implies that the two algorithms are generalizations of the existing incremental and parallel subgradient algorithms for solving constrained nonsmooth convex optimization problems. We show that they generate sequences that converge to a solution of the constrained nonsmooth convex optimization problem under certain conditions. The main contribution of this paper is the provision of three kinds of experiment showing that the two algorithms can solve concrete experimental problems faster than the existing algorithms. First, we show that the proposed algorithms have performance advantages over the existing ones in solving a test problem. Second, we compare the proposed algorithms with a different algorithm Pegasos, which is designed to learn with a support vector machine efficiently, in terms of prediction accuracy, value of the objective function, and computational time. Finally, we use one of our algorithms to train a multilayer neural network and discuss its applicability to deep learning.
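For intuition about combining an incremental subgradient update with an automated step-size search, the sketch below trains a linear SVM (hinge loss plus L2 penalty) with a plain backtracking rule that halves the step until the full objective does not increase, followed by projection onto a norm ball; this heuristic acceptance rule and the toy data are assumptions, not the exact line-search conditions analyzed in the paper.

```python
# Simplified sketch of an incremental projected-subgradient update with a
# backtracking step-size search for a linear SVM (hinge loss + L2 penalty).
import numpy as np

def objective(w, X, y, lam):
    margins = 1.0 - y * (X @ w)
    return lam / 2.0 * (w @ w) + np.maximum(margins, 0.0).mean()

def subgradient(w, xi, yi, lam):
    g = lam * w
    if 1.0 - yi * (xi @ w) > 0.0:     # hinge loss is active for this sample
        g = g - yi * xi
    return g

def train(X, y, lam=0.01, epochs=20, step0=1.0, radius=10.0):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = subgradient(w, xi, yi, lam)
            step = step0
            # Backtrack until the full objective does not increase (heuristic rule).
            while step > 1e-6 and objective(w - step * g, X, y, lam) > objective(w, X, y, lam):
                step *= 0.5
            w = w - step * g
            norm = np.linalg.norm(w)
            if norm > radius:          # projection onto the constraint set (norm ball)
                w *= radius / norm
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=300))
w = train(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```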
88. Xia H, Huang W, Li N, Zhou J, Zhang D. PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data. Sensors (Basel) 2019; 19:s19153438. PMID: 31387335; PMCID: PMC6696378; DOI: 10.3390/s19153438
Abstract
Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing imagery of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.
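The sketch below conveys only the generic subsample-then-merge pattern behind this family of methods: cluster several random subsamples in parallel, then cluster the resulting centroids to obtain the final centers. It is a hedged stand-in, not PARSUC's SubDP partitioning or CFA filtering, and uses synthetic data.

```python
# Hedged sketch of subsample-then-merge clustering with parallel workers.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.cluster import KMeans

def cluster_subsample(args):
    data, seed, k, frac = args
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=int(frac * len(data)), replace=False)
    km = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(data[idx])
    return km.cluster_centers_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "imagery" features: three well-separated clusters in 4 dimensions.
    blobs = np.vstack([rng.normal(loc=c, scale=0.5, size=(20000, 4))
                       for c in (0.0, 5.0, 10.0)])
    k = 3
    jobs = [(blobs, seed, k, 0.05) for seed in range(8)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        centers = np.vstack(list(pool.map(cluster_subsample, jobs)))
    # Merge step: re-cluster the 8 * k candidate centroids down to k final centers.
    final = KMeans(n_clusters=k, n_init=10, random_state=0).fit(centers)
    print(np.sort(final.cluster_centers_[:, 0]))
```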
89. Ni Y, Müller P, Diesendruck M, Williamson S, Zhu Y, Ji Y. Scalable Bayesian Nonparametric Clustering and Classification. J Comput Graph Stat 2019; 29:53-65. PMID: 32982129; DOI: 10.1080/10618600.2019.1624366
Abstract
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach makes inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating data sets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to other widely used competing classifiers. Supplementary materials for this article are available online.
90. Liu P, Ye S, Wang C, Zhu Z. Spark-Based Parallel Genetic Algorithm for Simulating a Solution of Optimal Deployment of an Underwater Sensor Network. Sensors (Basel) 2019; 19:s19122717. PMID: 31212959; PMCID: PMC6630356; DOI: 10.3390/s19122717
Abstract
Underwater sensor networks have wide application prospects, but large-scale sensing node deployment is severely hindered by problems like energy constraints, long delays, local disconnections, and heavy energy consumption. These problems can be solved effectively by optimizing sensing node deployment with a genetic algorithm. However, the genetic algorithm (GA) needs many iterations to solve for the best locations of underwater sensor deployment, which results in long running times and limits practical application when dealing with large-scale data. The classical parallel framework Hadoop can improve GA running efficiency to some extent, while the state-of-the-art parallel framework Spark can unlock much more of the GA's parallel potential by realizing parallel crossover, mutation, and other operations on each computing node. Taking full account of the working environment of the underwater sensor network and the characteristics of the sensors, this paper proposes a Spark-based parallel GA to calculate the extremum of the Shubert multi-peak function, through which the optimal deployment of the underwater sensor network can be obtained. Experimental results show that, when faced with a large-scale underwater sensor network, compared with a single node and the Hadoop framework, the Spark-based implementation not only significantly reduces the running time but also effectively avoids the problem of premature convergence because of its powerful randomness.
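The two-dimensional Shubert function named in the abstract is a standard multi-peak benchmark; the sketch below defines it and evaluates a toy GA population's fitness in parallel. The paper's implementation is Spark-based; multiprocessing stands in here for the same embarrassingly parallel fitness step, and the GA population itself is a random placeholder.

```python
# Hedged sketch: the 2-D Shubert multi-peak function plus a parallel fitness
# evaluation step for a toy genetic-algorithm population.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def shubert(point):
    x1, x2 = point
    s1 = sum(i * np.cos((i + 1) * x1 + i) for i in range(1, 6))
    s2 = sum(i * np.cos((i + 1) * x2 + i) for i in range(1, 6))
    return s1 * s2          # global minimum is about -186.7309

def evaluate_population(population, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(shubert, population, chunksize=64)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = [tuple(p) for p in rng.uniform(-10.0, 10.0, size=(1000, 2))]
    fitness = evaluate_population(population)
    best = int(np.argmin(fitness))           # minimization: lower Shubert value is better
    print("best candidate:", population[best], "value:", fitness[best])
```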
91. Pu Y, Zhao X, Chi G, Zhao S, Wang J, Jin Z, Yin J. Design and implementation of a parallel geographically weighted k-nearest neighbor classifier. Comput Geosci 2019; 127:111-122. PMID: 32581418; PMCID: PMC7314359; DOI: 10.1016/j.cageo.2019.02.009
Abstract
The development of high-performance classifiers represents an important step in improving the timeliness of remote sensing classification in the era of high spatial resolution. Geographically weighted k-nearest neighbors (gwk-NN), a classifier that incorporates spatial information into the traditional k-NN classifier, has been shown to be better at mitigating salt-and-pepper noise and misclassification. However, the integration of spatial dependence relationships into spectral information is computationally intensive. To improve computing performance, this paper discusses two commonly used parallel strategies, data and task parallelism, for parallelizing the gwk-NN classifier in the model training and classification stages, and implements the parallel algorithm by calling MPI and GDAL in the C++ development environment on a standalone eight-core computer. We further investigate the potential performance of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification. The experimental results demonstrate that the parallel gwk-NN classifier can improve the efficiency of classifying high-resolution remotely sensed images with multiple land cover types. Specifically, data parallelism is more effective than task parallelism in both the model training and classification stages because of the minor role of parallel overhead in total execution time. In addition, dual parallelism can take advantage of data and task parallel strategies in the image classification stage, as evidenced by the two largest speedups attained under dual parallelism I (5.28×) and II (5.73×). Comparatively, dual parallelism II, in which priority is given to data decomposition, achieves the best performance by overlapping computation and data transmission, which is compatible with the current trend toward multicore architectures.
92.
Abstract
We created a suite of packages to enable analysis of extremely large genomic data sets (potentially millions of individuals and millions of molecular markers) within the R environment. The suite offers a matrix-like interface for .bed files (PLINK's binary format for genotype data), a novel class of linked arrays that allows linking data stored in multiple files to form a single array accessible from the R computing environment, parallel computing methods that can carry out computations on very large data sets without loading the entire data set into memory, and a basic set of methods for statistical genetic analyses. The suite is accessible through CRAN and GitHub. In this note, we describe the classes and methods implemented in each of the packages that make up the suite and illustrate the use of the packages using data from the UK Biobank.
93. CuDDI: A CUDA-Based Application for Extracting Drug-Drug Interaction Related Substance Terms from PubMed Literature. Molecules 2019; 24:molecules24061081. PMID: 30893816; PMCID: PMC6470591; DOI: 10.3390/molecules24061081
Abstract
Drug-drug interaction (DDI) is becoming a serious issue in clinical pharmacy as the use of multiple medications is more common. The PubMed database is one of the biggest literature resources for DDI studies. It contains over 150,000 journal articles related to DDI and is still expanding at a rapid pace. The extraction of DDI-related information, including compounds and proteins from PubMed, is an essential step for DDI research. In this paper, we introduce a tool, CuDDI (compute unified device architecture-based DDI searching), for identification of DDI-related terms (including compounds and proteins) from PubMed. There are three modules in this application, including the automatic retrieval of substances from PubMed, the identification of DDI-related terms, and the display of relationship of DDI-related terms. For DDI term identification, a speedup of 30–105 times was observed for the compute unified device architecture (CUDA)-based version compared with the implementation with a CPU-based Python version. CuDDI can be used to discover DDI-related terms and relationships of these terms, which has the potential to help clinicians and pharmacists better understand the mechanism of DDIs. CuDDI is available at: https://github.com/chengusf/CuDDI.
94. Implementing a Chaotic Cryptosystem by Performing Parallel Computing on Embedded Systems with Multiprocessors. Entropy 2019; 21:e21030268. PMID: 33266983; PMCID: PMC7514748; DOI: 10.3390/e21030268
Abstract
Profiling and parallel computing techniques in a cluster of six embedded systems with multiprocessors are introduced herein to implement a chaotic cryptosystem for digital color images. The proposed encryption method is based on stream encryption using a pseudo-random number generator with high-precision arithmetic and data processing in parallel with collective communication. The profiling and parallel computing techniques allow discovery of the optimal number of processors necessary to improve the efficiency of the cryptosystem. That is, the increased processing speed reduces the time needed to generate chaotic sequences and execute the encryption algorithm. In addition, the high numerical precision reduces the digital degradation in a chaotic system and increases the security levels of the cryptosystem. The security analysis confirms that the proposed cryptosystem is secure and robust against different attacks that have been widely reported in the literature. Accordingly, we highlight that the proposed encryption method is potentially feasible for implementation in practical applications, such as modern telecommunication devices employing multiprocessors, e.g., smart phones, tablets, and any embedded system with multi-core hardware.
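As a toy illustration of chaos-based stream encryption, the sketch below XORs image bytes with a logistic-map keystream. It is a single-threaded, double-precision stand-in for intuition only: the paper's cryptosystem uses high-precision arithmetic, a different generator, and parallel processing on a multiprocessor cluster, and this sketch is not secure.

```python
# Toy sketch: logistic-map keystream XORed with image bytes (illustrative only).
import numpy as np

def logistic_keystream(n_bytes, x0=0.4123, r=3.99, burn_in=1000):
    x = x0
    for _ in range(burn_in):                 # discard transient iterations
        x = r * x * (1.0 - x)
    out = np.empty(n_bytes, dtype=np.uint8)
    for i in range(n_bytes):
        x = r * x * (1.0 - x)                # chaotic orbit stays in (0, 1)
        out[i] = int(x * 256) % 256
    return out

def xor_cipher(data, key_x0):
    ks = logistic_keystream(data.size, x0=key_x0)
    return data ^ ks                          # the same call encrypts and decrypts

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in RGB image
cipher = xor_cipher(image.ravel(), key_x0=0.731).reshape(image.shape)
plain = xor_cipher(cipher.ravel(), key_x0=0.731).reshape(image.shape)
print("decryption exact:", np.array_equal(plain, image))
```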
95. Castro GT, Zárate LE, Nobre CN, Freitas HC. A Fast Parallel K-Modes Algorithm for Clustering Nucleotide Sequences to Predict Translation Initiation Sites. J Comput Biol 2019; 26:442-456. PMID: 30785342; DOI: 10.1089/cmb.2018.0245
Abstract
Predicting the location of translation initiation sites (TIS) is an important problem in molecular biology. In this field, the computational cost of balancing non-TIS sequences is substantial and demands high-performance computing. In this article, we present an optimized version of the K-modes algorithm to cluster TIS sequences and a comparison with standard K-means clustering. The adapted algorithm uses simple instructions and fewer computational resources to deliver a significant speedup without compromising the sequence clustering results. We also implemented two optimized parallel versions of the algorithm, one for graphics processing units (GPUs) and the other for general-purpose multicore processors. In our experiments, the GPU K-modes implementation was up to 203 times faster than the respective sequential version for processing Arabidopsis thaliana sequences.
96. Gao J, Sun Y, Zhang B, Chen Z, Gao L, Zhang W. Multi-GPU Based Parallel Design of the Ant Colony Optimization Algorithm for Endmember Extraction from Hyperspectral Images. Sensors (Basel) 2019; 19:E598. PMID: 30708972; PMCID: PMC6387146; DOI: 10.3390/s19030598
Abstract
Spectral unmixing is a vital procedure in hyperspectral remote sensing image exploitation. The linear mixture model has been widely utilized to unmix hyperspectral images by extracting a set of pure spectral signatures, called endmembers in hyperspectral jargon, and estimating their respective fractional abundances in each pixel of the scene. Many algorithms have been proposed to extract endmembers automatically, which is a critical step in the spectral unmixing chain. In recent years, the ant colony optimization (ACO) algorithm has been developed for endmember extraction from hyperspectral data, which is regarded as a combinatorial optimization problem. Although ACO for endmember extraction (ACOEE) can acquire accurate endmember results, its high computational complexity has limited its application in hyperspectral data analysis. GPU parallel computing techniques can be utilized to improve the computational performance of ACOEE, but the architecture of GPUs means that ACOEE must be redesigned to take full advantage of GPU computing resources. In this paper, a multiple sub-ant-colony-based parallel design of ACOEE is proposed, in which an innovative local-pheromone mechanism for sub-ant-colonies is utilized to enable ACOEE to be executed efficiently on a multi-GPU system. The proposed method avoids much of the synchronization among different GPUs that would otherwise limit the improvement in computational performance. Experiments on two real hyperspectral datasets demonstrated that the computational performance of ACOEE significantly benefited from the proposed method.
97. Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment. Molecules 2019; 24:molecules24010179. PMID: 30621295; PMCID: PMC6337464; DOI: 10.3390/molecules24010179
Abstract
Calculation of structural features of proteins, nucleic acids, and nucleic acid-protein complexes on the basis of their geometries and studying various interactions within these macromolecules, for which high-resolution structures are stored in Protein Data Bank (PDB), require parsing and extraction of suitable data stored in text files. To perform these operations on large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed environment of Azure Data Lake and scale the calculations on the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acids structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by using compression of PDB files without significant loss of data processing efficiency. Moreover, our experiments show that the performed calculations can be significantly accelerated when using large sequential files for storing macromolecular data and by parallelizing the calculations and data extractions that precede them. Finally, the paper shows how all the calculations can be performed in a declarative way in U-SQL scripts for Data Lake Analytics.
98. Harms RL, Roebroeck A. Robust and Fast Markov Chain Monte Carlo Sampling of Diffusion MRI Microstructure Models. Front Neuroinform 2018; 12:97. PMID: 30618702; PMCID: PMC6305549; DOI: 10.3389/fninf.2018.00097
Abstract
In diffusion MRI analysis, advances in biophysical multi-compartment modeling have gained popularity over conventional Diffusion Tensor Imaging (DTI), because they can obtain greater specificity in relating the dMRI signal to the underlying cellular microstructure. Biophysical multi-compartment models require parameter estimation, typically performed using either Maximum Likelihood Estimation (MLE) or Markov Chain Monte Carlo (MCMC) sampling. Whereas the MLE provides only a point estimate of the fitted model parameters, MCMC recovers the entire posterior distribution of the model parameters given the data, providing additional information such as parameter uncertainty and correlations. MCMC sampling is currently not routinely applied in dMRI microstructure modeling, as it requires adjustment and tuning specific to each model, particularly in the choice of proposal distributions, burn-in length, thinning, and the number of samples to store. In addition, sampling often takes at least an order of magnitude more time than non-linear optimization. Here we investigate the performance of MCMC algorithm variations over multiple popular diffusion microstructure models, to examine whether a single, well-performing variation could be applied efficiently and robustly to many models. Using an efficient GPU-based implementation, we showed that run times can be removed as a prohibitive constraint for the sampling of diffusion multi-compartment models. Using this implementation, we investigated the effectiveness of different adaptive MCMC algorithms, burn-in, initialization, and thinning. Finally, we applied the theory of the effective sample size to the diffusion multi-compartment models as a way of determining a relatively general target for the number of samples needed to characterize parameter distributions for different models and data sets. We conclude that adaptive Metropolis methods increase MCMC performance and select the Adaptive Metropolis-Within-Gibbs (AMWG) algorithm as the primary method. We furthermore advise initializing the sampling with an MLE point estimate, in which case 100 to 200 samples are sufficient as a burn-in. Finally, we advise against thinning in most use-cases and, as a relatively general target for the number of samples, we recommend a multivariate effective sample size of 2,200.
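To make the sampling-target idea concrete, the sketch below estimates a simple univariate effective sample size for an MCMC chain by summing autocorrelations until they first drop below zero; the paper relies on a multivariate ESS, so this per-parameter estimator and the AR(1) test chain are simplified stand-ins for intuition only.

```python
# Simple univariate effective-sample-size estimator for an MCMC chain.
import numpy as np

def effective_sample_size(chain):
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    centered = chain - chain.mean()
    # Autocorrelation function via np.correlate, normalized by lag 0.
    acf = np.correlate(centered, centered, mode="full")[n - 1:]
    acf = acf / acf[0]
    rho_sum = 0.0
    for lag in range(1, n):
        if acf[lag] <= 0.0:           # truncate at the first non-positive estimate
            break
        rho_sum += acf[lag]
    return n / (1.0 + 2.0 * rho_sum)

# Example: an AR(1) chain with autocorrelation 0.9 has ESS ~ n * (1-0.9)/(1+0.9).
rng = np.random.default_rng(0)
n, phi = 5000, 0.9
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = phi * chain[t - 1] + rng.normal()
print("estimated ESS:", round(effective_sample_size(chain)))
print("theoretical ESS:", round(n * (1 - phi) / (1 + phi)))
```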
99. Knight JC, Nowotny T. GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model. Front Neurosci 2018; 12:941. PMID: 30618570; PMCID: PMC6299048; DOI: 10.3389/fnins.2018.00941
Abstract
While neuromorphic systems may be the ultimate platform for deploying spiking neural networks (SNNs), their distributed nature and optimization for specific types of models makes them unwieldy tools for developing them. Instead, SNN models tend to be developed and simulated on computers or clusters of computers with standard von Neumann CPU architectures. Over the last decade, as well as becoming a common fixture in many workstations, NVIDIA GPU accelerators have entered the High Performance Computing field and are now used in 50% of the top 10 supercomputing sites worldwide. In this paper we use our GeNN code generator to re-implement two neo-cortex-inspired, circuit-scale, point neuron network models on GPU hardware. We verify the correctness of our GPU simulations against prior results obtained with NEST running on traditional HPC hardware and compare the performance with respect to speed and energy consumption against published data from CPU-based HPC and neuromorphic hardware. A full-scale model of a cortical column can be simulated at speeds approaching 0.5× real-time using a single NVIDIA Tesla V100 accelerator, faster than is currently possible using a CPU-based cluster or the SpiNNaker neuromorphic system. In addition, we find that, across a range of GPU systems, the energy to solution as well as the energy per synaptic event of the microcircuit simulation is as much as 14× lower than on either SpiNNaker or CPU-based simulations. Besides performance in terms of speed and energy consumption of the simulation, efficient initialization of models is also a crucial concern, particularly in a research context where repeated runs and parameter-space exploration are required. Therefore, we also introduce in this paper some of the novel parallel initialization methods implemented in the latest version of GeNN and demonstrate how they can enable further speed and energy advantages.
100. Face Recognition Using the SR-CNN Model. Sensors (Basel) 2018; 18:s18124237. PMID: 30513898; PMCID: PMC6308568; DOI: 10.3390/s18124237
Abstract
Face recognition in complex environments is vulnerable to illumination change, object rotation, occlusion, and so on, which leads to imprecise target positions; to address this, a face recognition algorithm with multi-feature fusion is proposed. This study presents a new robust face-matching method named SR-CNN, combining the rotation-invariant texture feature (RITF) vector, the scale-invariant feature transform (SIFT) vector, and the convolutional neural network (CNN). Furthermore, a graphics processing unit (GPU) is used to parallelize the model for optimal computational performance. The Labeled Faces in the Wild (LFW) database and a self-collected face database were selected for the experiments. For the LFW face database, the true positive rate improved by 10.97–13.24% and the acceleration ratio (the ratio between central processing unit (CPU) operation time and GPU time) was 5–6 times. For the self-collected database, the true positive rate increased by 12.65–15.31%, and the acceleration ratio improved to a factor of 6–7.