26. Knight JC, Nowotny T. GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model. Front Neurosci 2018; 12:941. PMID: 30618570; PMCID: PMC6299048; DOI: 10.3389/fnins.2018.00941
Abstract
While neuromorphic systems may be the ultimate platform for deploying spiking neural networks (SNNs), their distributed nature and optimization for specific types of models make them unwieldy tools for developing them. Instead, SNN models tend to be developed and simulated on computers or clusters of computers with standard von Neumann CPU architectures. Over the last decade, as well as becoming a common fixture in many workstations, NVIDIA GPU accelerators have entered the High Performance Computing field and are now used in 50% of the top 10 supercomputing sites worldwide. In this paper we use our GeNN code generator to re-implement two neocortex-inspired, circuit-scale, point neuron network models on GPU hardware. We verify the correctness of our GPU simulations against prior results obtained with NEST running on traditional HPC hardware, and compare performance in terms of speed and energy consumption against published data from CPU-based HPC and neuromorphic hardware. A full-scale model of a cortical column can be simulated at speeds approaching 0.5× real-time using a single NVIDIA Tesla V100 accelerator, faster than is currently possible using a CPU-based cluster or the SpiNNaker neuromorphic system. In addition, we find that, across a range of GPU systems, the energy to solution as well as the energy per synaptic event of the microcircuit simulation is as much as 14× lower than on either SpiNNaker or CPU-based simulations. Besides simulation speed and energy consumption, efficient model initialization is also a crucial concern, particularly in a research context where repeated runs and parameter-space exploration are required. We therefore also introduce some of the novel parallel initialization methods implemented in the latest version of GeNN and demonstrate how they can enable further speed and energy advantages.
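The GPU advantage described above comes from updating very many identical point neurons in lockstep. A minimal NumPy sketch of a vectorized leaky integrate-and-fire update illustrates the data-parallel pattern (one thread per neuron in a real GPU kernel); this is not GeNN's generated code, and all parameter values are illustrative:

```python
import numpy as np

def lif_step(v, i_syn, dt=0.1, tau=20.0, v_rest=-65.0,
             v_thresh=-50.0, v_reset=-65.0):
    """One Euler step for a whole population of leaky integrate-and-fire
    neurons. Every neuron is updated in a single vectorized operation --
    the same data-parallel structure a GPU kernel exploits."""
    v = v + dt * ((v_rest - v) + i_syn) / tau   # leaky integration
    spiked = v >= v_thresh                      # boolean spike mask
    v = np.where(spiked, v_reset, v)            # reset spiking neurons
    return v, spiked

# Drive 1,000 neurons with a constant supra-threshold current.
v = np.full(1000, -65.0)
for _ in range(200):
    v, spiked = lif_step(v, i_syn=40.0)
```

The spike mask would, in a full simulator, be used to deliver synaptic events; here it only marks threshold crossings.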
27. Ippen T, Eppler JM, Plesser HE, Diesmann M. Constructing Neuronal Network Models in Massively Parallel Environments. Front Neuroinform 2017; 11:30. PMID: 28559808; PMCID: PMC5432669; DOI: 10.3389/fninf.2017.00030
Abstract
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
28. Sachetto Oliveira R, Martins Rocha B, Burgarelli D, Meira W, Constantinides C, Weber Dos Santos R. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology. Int J Numer Method Biomed Eng 2018; 34:e2913. PMID: 28636811; DOI: 10.1002/cnm.2913
Abstract
Computer models have become an increasingly important tool for studying and understanding the complex phenomena of cardiac electrophysiology. At the same time, the increasing complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, two different techniques have traditionally been exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore processors and graphics processing units (GPUs) with a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU; multicore, GPU, and space adaptivity; and multicore, GPU, space adaptivity, and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs and 3D simulations on a ventricular mouse mesh, i.e., a complex geometry, under sinus-rhythm and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48×. By combining all the techniques, we obtained speedups ranging between 165× and 498×: the tested methods reduced the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in feasible time with no significant loss of accuracy.
29.
Abstract
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent, block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.
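The separability this abstract describes is concrete in the classic Lee-Seung multiplicative updates for nonnegative matrix factorization, one of the MM algorithms the paper benchmarks: each entry of the factors is updated independently, which is exactly what maps well onto GPU cores. A hedged NumPy sketch (iteration count and tolerance are illustrative, not the paper's settings):

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Nonnegative matrix factorization V ~= W @ H via Lee-Seung
    multiplicative updates -- an MM algorithm whose elementwise updates
    parallelize naturally (each entry of W and H is independent)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # MM update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # MM update for W
    return W, H

# Recover an exactly rank-2 nonnegative matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because every update is an elementwise multiply of large matrices, the same code transfers almost verbatim to a GPU array library.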
30. Horlacher O, Lisacek F, Müller M. Mining Large Scale Tandem Mass Spectrometry Data for Protein Modifications Using Spectral Libraries. J Proteome Res 2015; 15:721-31. PMID: 26653734; DOI: 10.1021/acs.jproteome.5b00877
Abstract
Experimental improvements in post-translational modification (PTM) detection by tandem mass spectrometry (MS/MS) have allowed the identification of vast numbers of PTMs. Open modification searches (OMSs) of MS/MS data, which do not require prior knowledge of the modifications present in the sample, have further increased the diversity of detected PTMs. Despite much effort, there is still a lack of functional annotation of PTMs. One possibility to narrow the annotation gap is to mine MS/MS data deposited in public repositories and to correlate PTM presence with the biological meta-information attached to the data. Since the data volume can be substantial and contain tens of millions of MS/MS spectra, the data mining tools must be able to cope with big data. Here, we present two tools, Liberator and MzMod, which are built using the MzJava class library and the Apache Spark large-scale computing framework. Liberator builds large MS/MS spectrum libraries, and MzMod searches them in an OMS mode. We applied these tools to a recently published set of 25 million spectra from 30 human tissues and present tissue-specific PTMs. We also compared the results to those obtained with the OMS tool MODa and the search engine X!Tandem.
31. Face Recognition Using the SR-CNN Model. Sensors 2018; 18:4237. PMID: 30513898; PMCID: PMC6308568; DOI: 10.3390/s18124237
Abstract
To address the vulnerability of face recognition in complex environments to illumination change, object rotation, occlusion, and similar factors, which leads to imprecise target positions, a face recognition algorithm with multi-feature fusion is proposed. This study presents a new robust face-matching method, SR-CNN, combining the rotation-invariant texture feature (RITF) vector, the scale-invariant feature transform (SIFT) vector, and a convolutional neural network (CNN). Furthermore, a graphics processing unit (GPU) is used to parallelize the model for optimal computational performance. The Labeled Faces in the Wild (LFW) database and a self-collected face database were selected for experiments. On the LFW database, the true positive rate improved by 10.97-13.24% and the acceleration ratio (the ratio between central processing unit (CPU) and GPU operation time) was 5-6×. On the self-collected database, the true positive rate increased by 12.65-15.31%, and the acceleration ratio improved by a factor of 6-7.
32. van den Bedem H, Wolf G, Xu Q, Deacon AM. Distributed structure determination at the JCSG. Acta Crystallogr D Biol Crystallogr 2011; 67:368-75. PMID: 21460455; PMCID: PMC3069752; DOI: 10.1107/s0907444910039934
Abstract
The Joint Center for Structural Genomics (JCSG), one of four large-scale structure-determination centers funded by the US Protein Structure Initiative (PSI) through the National Institute for General Medical Sciences, has been operating an automated distributed structure-solution pipeline, Xsolve, for well over half a decade. During PSI-2, Xsolve solved, traced and partially refined 90% of the JCSG's nearly 770 MAD/SAD structures at an average resolution of about 2 Å without human intervention. Xsolve executes many well established publicly available crystallography software programs in parallel on a commodity Linux cluster, resulting in multiple traces for any given target. Additional software programs have been developed and integrated into Xsolve to further minimize human effort in structure refinement. Consensus-Modeler exploits complementarities in traces from Xsolve to compute a single optimal model for manual refinement. Xpleo is a powerful robotics-inspired algorithm to build missing fragments and qFit automatically identifies and fits alternate conformations.
33. Pothapragada S, Zhang P, Sheriff J, Livelli M, Slepian MJ, Deng Y, Bluestein D. A phenomenological particle-based platelet model for simulating filopodia formation during early activation. Int J Numer Method Biomed Eng 2015; 31:e02702. PMID: 25532469; PMCID: PMC4509790; DOI: 10.1002/cnm.2702
Abstract
We developed a phenomenological three-dimensional platelet model to characterize the filopodia formation observed during early-stage platelet activation. Departing from continuum-mechanics-based approaches, this coarse-grained molecular dynamics (CGMD) particle-based model can deform to emulate the complex shape change and filopodia formation that platelets undergo during activation. The platelet peripheral zone is modeled with a two-layer homogeneous elastic structure represented by spring-connected particles. The structural zone is represented by a cytoskeletal assembly comprising a filamentous core and filament bundles supporting the platelet's discoid shape, also modeled by spring-connected particles. The interior organelle zone is modeled by homogeneous cytoplasm particles that facilitate platelet deformation. Nonbonded interactions among the discrete particles of the membrane, the cytoskeletal assembly, and the cytoplasm are described using the Lennard-Jones potential with empirical constants. By exploring the parameter space of this CGMD model, we have successfully simulated the dynamics of varied filopodia formations. Comparative analyses of the length and thickness of filopodia show that our numerical simulations are in agreement with experimental measurements of flow-induced activated platelets.
34. Bill J, Schuch K, Brüderle D, Schemmel J, Maass W, Meier K. Compensating Inhomogeneities of Neuromorphic VLSI Devices Via Short-Term Synaptic Plasticity. Front Comput Neurosci 2010; 4:129. PMID: 21031027; PMCID: PMC2965017; DOI: 10.3389/fncom.2010.00129
Abstract
Recent developments in neuromorphic hardware engineering make mixed-signal VLSI neural network models promising candidates for neuroscientific research tools and massively parallel computing devices, especially for tasks which exhaust the computing power of software simulations. Still, like all analog hardware systems, neuromorphic models suffer from constrained configurability and production-related fluctuations of device characteristics. Since future systems with ever-smaller structures will also inevitably exhibit such inhomogeneities at the unit level, self-regulation properties become a crucial requirement for their successful operation. By applying a cortically inspired self-adjusting network architecture, we show that the activity of generic spiking neural networks emulated on a neuromorphic hardware system can be kept within a biologically realistic firing regime and gains remarkable robustness against transistor-level variations. As a first approach of this kind in engineering practice, the short-term synaptic depression and facilitation mechanisms implemented within an analog VLSI model of I&F neurons are functionally utilized for network-level stabilization. We present experimental data acquired both from the hardware model and from comparative software simulations which demonstrate the applicability of the employed paradigm to neuromorphic VLSI devices.
35. Implementing a Chaotic Cryptosystem by Performing Parallel Computing on Embedded Systems with Multiprocessors. Entropy 2019; 21:268. PMID: 33266983; PMCID: PMC7514748; DOI: 10.3390/e21030268
Abstract
Profiling and parallel computing techniques in a cluster of six embedded systems with multiprocessors are introduced herein to implement a chaotic cryptosystem for digital color images. The proposed encryption method is based on stream encryption using a pseudo-random number generator with high-precision arithmetic and data processing in parallel with collective communication. The profiling and parallel computing techniques allow discovery of the optimal number of processors necessary to improve the efficiency of the cryptosystem; that is, parallelization shortens both the generation of chaotic sequences and the execution of the encryption algorithm. In addition, the high numerical precision reduces the digital degradation of the chaotic system and increases the security level of the cryptosystem. The security analysis confirms that the proposed cryptosystem is secure and robust against different attacks that have been widely reported in the literature. Accordingly, the proposed encryption method is potentially feasible for practical applications in modern telecommunication devices employing multiprocessors, e.g., smartphones, tablets, and any embedded system with multi-core hardware.
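The chaotic stream-cipher structure the abstract describes can be sketched with a logistic-map keystream XORed into the plaintext. This toy is not the authors' cryptosystem (their design uses high-precision arithmetic and a vetted generator); the map parameter and seed below are illustrative only:

```python
import numpy as np

def logistic_keystream(x0, n, r=3.99):
    """Toy keystream from the logistic map x <- r*x*(1-x).
    A real cryptosystem needs high-precision arithmetic and a
    cryptographically vetted design; this only shows the structure."""
    x = x0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)           # iterate the chaotic map
        out[i] = int(x * 256) & 0xFF    # quantize chaotic state to a byte
    return out

def xor_cipher(data: bytes, x0: float) -> bytes:
    """Stream encryption: XOR plaintext bytes with the chaotic keystream.
    The same call with the same key (x0) decrypts."""
    ks = logistic_keystream(x0, len(data))
    return bytes(np.frombuffer(data, dtype=np.uint8) ^ ks)

msg = b"parallel chaotic stream encryption"
ct = xor_cipher(msg, x0=0.3141592653589793)
pt = xor_cipher(ct, x0=0.3141592653589793)
```

Because the keystream for each block depends only on its starting state, independent keystream segments can be generated on separate processors, which is the parallelization opportunity the paper exploits.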
36. Harms RL, Roebroeck A. Robust and Fast Markov Chain Monte Carlo Sampling of Diffusion MRI Microstructure Models. Front Neuroinform 2018; 12:97. PMID: 30618702; PMCID: PMC6305549; DOI: 10.3389/fninf.2018.00097
Abstract
In diffusion MRI analysis, advances in biophysical multi-compartment modeling have gained popularity over conventional Diffusion Tensor Imaging (DTI) because they achieve greater specificity in relating the dMRI signal to the underlying cellular microstructure. Biophysical multi-compartment models require parameter estimation, typically performed using either Maximum Likelihood Estimation (MLE) or Markov Chain Monte Carlo (MCMC) sampling. Whereas the MLE provides only a point estimate of the fitted model parameters, MCMC recovers the entire posterior distribution of the model parameters given the data, providing additional information such as parameter uncertainty and correlations. MCMC sampling is currently not routinely applied in dMRI microstructure modeling, as it requires adjustment and tuning specific to each model, particularly in the choice of proposal distributions, burn-in length, thinning, and the number of samples to store. In addition, sampling often takes at least an order of magnitude more time than non-linear optimization. Here we investigate the performance of MCMC algorithm variations over multiple popular diffusion microstructure models, to examine whether a single, well-performing variation could be applied efficiently and robustly to many models. Using an efficient GPU-based implementation, we show that run times can be removed as a prohibitive constraint for the sampling of diffusion multi-compartment models. Using this implementation, we investigated the effectiveness of different adaptive MCMC algorithms, burn-in, initialization, and thinning. Finally, we applied the theory of the Effective Sample Size to the diffusion multi-compartment models as a way of determining a relatively general target for the number of samples needed to characterize parameter distributions for different models and data sets.
We conclude that adaptive Metropolis methods increase MCMC performance and select the Adaptive Metropolis-Within-Gibbs (AMWG) algorithm as the primary method. We furthermore advise initializing the sampling with an MLE point estimate, in which case 100 to 200 samples are sufficient as a burn-in. Finally, we advise against thinning in most use cases and, as a relatively general target for the number of samples, recommend a multivariate Effective Sample Size of 2,200.
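The recommended AMWG scheme updates one parameter at a time with a Gaussian random walk, adapting each coordinate's proposal scale in batches toward a roughly 44% acceptance rate. A hedged sketch on a toy Gaussian target (not the paper's dMRI models; batch size, adaptation rule constants, and target rate follow the common Roberts-Rosenthal recipe and are assumptions here):

```python
import numpy as np

def amwg(logpost, x0, n_samples, batch=50, target=0.44, seed=0):
    """Adaptive Metropolis-Within-Gibbs: one Gaussian random-walk update
    per coordinate, with each coordinate's proposal log-scale adapted in
    batches toward a ~44% acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    log_sigma = np.zeros(d)          # per-coordinate proposal log-scales
    accepts = np.zeros(d)
    chain = np.empty((n_samples, d))
    lp = logpost(x)
    for t in range(n_samples):
        for j in range(d):           # Gibbs-style sweep over coordinates
            prop = x.copy()
            prop[j] += np.exp(log_sigma[j]) * rng.standard_normal()
            lp_prop = logpost(prop)
            if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept
                x, lp = prop, lp_prop
                accepts[j] += 1
        chain[t] = x
        if (t + 1) % batch == 0:     # batch adaptation of proposal scales
            delta = min(0.1, 1.0 / np.sqrt(t + 1.0))
            log_sigma += np.where(accepts / batch > target, delta, -delta)
            accepts[:] = 0
    return chain

# Toy target: independent 2-D standard normal, deliberately bad start.
chain = amwg(lambda x: -0.5 * np.sum(x**2), x0=[3.0, -3.0], n_samples=4000)
```

With an MLE starting point, as the abstract advises, the burn-in needed would be far shorter than for the deliberately poor start used here.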
37. Li C, Petukh M, Li L, Alexov E. Continuous development of schemes for parallel computing of the electrostatics in biological systems: implementation in DelPhi. J Comput Chem 2013; 34:1949-60. PMID: 23733490; PMCID: PMC3707979; DOI: 10.1002/jcc.23340
Abstract
Due to the enormous importance of electrostatics in molecular biology, calculating the electrostatic potential and corresponding energies has become a standard computational approach for the study of biomolecules and nano-objects immersed in water and salt phase or other media. However, the electrostatics of large macromolecules and macromolecular complexes, including nano-objects, may not be obtainable via explicit methods, and even standard continuum electrostatics methods may not be applicable due to high computational time and memory requirements. Here, we report further development of the parallelization scheme reported in our previous work (Li, et al., J. Comput. Chem. 2012, 33, 1960) to include parallelization of the molecular surface and energy calculation components of the algorithm. The parallelization scheme utilizes different approaches such as space-domain parallelization, algorithmic parallelization, multithreading, and task scheduling, depending on the quantity being calculated. This allows for efficient use of the computing resources of the corresponding computer cluster. The parallelization scheme is implemented in the popular software DelPhi and results in a severalfold speedup. As a demonstration of the efficiency and capability of this methodology, the electrostatic potential and electric field distributions are calculated for the bovine mitochondrial supercomplex, illustrating its complex topology, which cannot be obtained by modeling the supercomplex components alone.
38. Sigman M, Etchemendy P, Slezak DF, Cecchi GA. Response time distributions in rapid chess: a large-scale decision making experiment. Front Neurosci 2010; 4:60. PMID: 21031032; PMCID: PMC2965049; DOI: 10.3389/fnins.2010.00060
Abstract
Rapid chess provides an unparalleled laboratory for understanding decision making in a natural environment. In a chess game, players make around 40 consecutive move choices within a finite time budget. The goodness of each choice can be determined quantitatively, since current chess algorithms estimate the value of a position precisely. Web-based chess produces vast amounts of data, millions of decisions per day, incommensurable with traditional psychological experiments. We generated a database of response times (RTs) and position values in rapid chess games. We measured robust emergent statistical observables: (1) RT distributions are long-tailed and show qualitatively distinct forms at different stages of the game; (2) the RTs of successive moves are highly correlated, both within and between players. These findings have theoretical implications, since they contradict two basic assumptions of sequential decision-making algorithms: RTs are not stationary and cannot be generated by a state function. Our results also have practical implications. First, we characterized the capacity of blunders and score fluctuations to predict a player's strength, which remains an open problem in chess software. Second, we show that the winning likelihood can be reliably estimated from a weighted combination of remaining time and position evaluation.
39. Goudie RJB, Turner RM, De Angelis D, Thomas A. MultiBUGS: A Parallel Implementation of the BUGS Modelling Framework for Faster Bayesian Inference. J Stat Softw 2020; 95. PMID: 33071678; DOI: 10.18637/jss.v095.i07
Abstract
MultiBUGS is a new version of the general-purpose Bayesian modelling software BUGS that implements a generic algorithm for parallelising Markov chain Monte Carlo (MCMC) algorithms to speed up posterior inference of Bayesian models. The algorithm parallelises evaluation of the product-form likelihoods formed when a parameter has many children in the directed acyclic graph (DAG) representation; and parallelises sampling of conditionally-independent sets of parameters. A heuristic algorithm is used to decide which approach to use for each parameter and to apportion computation across computational cores. This enables MultiBUGS to automatically parallelise the broad range of statistical models that can be fitted using BUGS-language software, making the dramatic speed-ups of modern multi-core computing accessible to applied statisticians, without requiring any experience of parallel programming. We demonstrate the use of MultiBUGS on simulated data designed to mimic a hierarchical e-health linked-data study of methadone prescriptions including 425,112 observations and 20,426 random effects. Posterior inference for the e-health model takes several hours in existing software, but MultiBUGS can perform inference in only 28 minutes using 48 computational cores.
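The core decomposition MultiBUGS exploits, splitting a product-form likelihood's many children across workers and summing their partial log-likelihoods, can be sketched as below. This is an illustration of the idea only: MultiBUGS distributes work across cores via its own scheduler, whereas this sketch uses Python threads purely for portability (the Gaussian model and chunk count are assumptions):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunk_loglik(args):
    """Gaussian log-likelihood contribution of one chunk of 'children'."""
    y, mu, sigma = args
    return -0.5 * np.sum(((y - mu) / sigma) ** 2
                         + np.log(2 * np.pi * sigma**2))

def parallel_loglik(y, mu, sigma, workers=4):
    """Evaluate a product-form likelihood by splitting the i.i.d. children
    of parameter mu across workers and summing their partial
    log-likelihoods -- the decomposition described in the abstract."""
    chunks = np.array_split(y, workers)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(chunk_loglik, [(c, mu, sigma) for c in chunks])
    return sum(parts)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=100_000)
ll = parallel_loglik(y, mu=2.0, sigma=1.0)
```

Because log-likelihoods of conditionally independent observations add, the chunked result equals the serial sum up to floating-point reordering.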
40. Chen Y, Li J, Zhang Y, Zhang M, Sun Z, Jing G, Huang S, Su X. Parallel-Meta Suite: Interactive and rapid microbiome data analysis on multiple platforms. iMeta 2022; 1:e1. PMID: 38867729; PMCID: PMC10989749; DOI: 10.1002/imt2.1
Abstract
Massive amounts of microbiome sequencing data have been generated, elucidating associations between microbes and environmental phenotypes such as host health or ecosystem status. Good bioinformatic tools are the basis for deciphering the biological information hidden in microbiome data, yet most approaches remain difficult for nonprofessional users to access. At the same time, computing throughput has become a significant bottleneck for many analytical pipelines processing large-scale datasets. In this study, we introduce Parallel-Meta Suite (PMS), an interactive software package for fast and comprehensive microbiome data analysis, visualization, and interpretation. It covers a wide array of functions for data preprocessing, statistics, and visualization by state-of-the-art algorithms in a user-friendly graphical interface that is accessible to diverse users. To meet rapidly increasing computational demands, the entire procedure of PMS has been optimized by a parallel computing scheme, enabling the rapid processing of thousands of samples. PMS is compatible with multiple platforms, and an installer has been integrated for fully automatic installation.
41. Miri Rostami SR, Mozaffarzadeh M, Ghaffari-Miab M, Hariri A, Jokerst J. GPU-accelerated Double-stage Delay-multiply-and-sum Algorithm for Fast Photoacoustic Tomography Using LED Excitation and Linear Arrays. Ultrason Imaging 2019; 41:301-16. PMID: 31322057; DOI: 10.1177/0161734619862488
Abstract
Double-stage delay-multiply-and-sum (DS-DMAS) is an algorithm proposed for photoacoustic image reconstruction. The DS-DMAS algorithm offers higher contrast than conventional delay-and-sum and delay-multiply-and-sum, but at the expense of higher computational complexity. Here, we used a compute unified device architecture (CUDA) graphics processing unit (GPU) parallel computation approach to address the high complexity of DS-DMAS for reconstructing photoacoustic images from a commercial light-emitting diode (LED)-based photoacoustic scanner. In comparison with a single-threaded central processing unit (CPU), the GPU approach increased speed by nearly 140-fold for a 1024 × 1024 pixel image, with no decrease in accuracy. The proposed implementation makes it possible to reconstruct photoacoustic images at frame rates of 250, 125, and 83.3 frames per second for 64 × 64, 128 × 128, and 256 × 256 images, respectively. Thus, DS-DMAS can be efficiently used in clinical devices when coupled with CUDA GPU parallel computation.
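The contrast gain and the cost of DMAS over plain delay-and-sum are both visible in a minimal sketch. The single-stage pairwise combination below follows the standard DMAS formulation (signed square-root products of channel pairs); the paper's double-stage variant applies the scheme twice and is omitted, and the input is assumed to be already delay-corrected channel data:

```python
import numpy as np

def das(aligned):
    """Delay-and-sum: plain sum across already-delayed channels."""
    return aligned.sum(axis=0)

def dmas(aligned):
    """Delay-multiply-and-sum: combine every channel pair i<j with a
    signed square-root product, which suppresses uncorrelated noise.
    The O(n^2) pair loop is the complexity a GPU kernel absorbs."""
    n = aligned.shape[0]
    out = np.zeros(aligned.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            prod = aligned[i] * aligned[j]
            out += np.sign(prod) * np.sqrt(np.abs(prod))
    return out
```

For n channels, DAS costs O(n) per pixel while DMAS costs O(n²), which is why the paper maps the pairwise combination to one GPU thread per pixel.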
42. Doulgerakis M, Eggebrecht AT, Wojtkiewicz S, Culver JP, Dehghani H. Toward real-time diffuse optical tomography: accelerating light propagation modeling employing parallel computing on GPU and CPU. J Biomed Opt 2017; 22:1-11. PMID: 29197176; PMCID: PMC5709934; DOI: 10.1117/1.jbo.22.12.125001
Abstract
Parameter recovery in diffuse optical tomography is computationally expensive, especially for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, and the lack of a real-time solution impedes practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion-approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable to modeling both continuous-wave and frequency-domain systems, with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ∼600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ∼0.25 s per excitation source.
Hahne J, Helias M, Kunkel S, Igarashi J, Bolten M, Frommer A, Diesmann M. A unified framework for spiking and gap-junction interactions in distributed neuronal network simulations. Front Neuroinform 2015; 9:22. PMID: 26441628; PMCID: PMC4563270; DOI: 10.3389/fninf.2015.00022
Abstract
Contemporary simulators for networks of point and few-compartment model neurons come with a plethora of ready-to-use neuron and synapse models and support complex network topologies. Recent technological advancements have broadened the spectrum of application further to the efficient simulation of brain-scale networks on supercomputers. In distributed network simulations the amount of spike data that accrues per millisecond and process is typically low, such that a common optimization strategy is to communicate spikes at relatively long intervals, where the upper limit is given by the shortest synaptic transmission delay in the network. This approach is well-suited for simulations that employ only chemical synapses but it has so far impeded the incorporation of gap-junction models, which require instantaneous neuronal interactions. Here, we present a numerical algorithm based on a waveform-relaxation technique which allows for network simulations with gap junctions in a way that is compatible with the delayed communication strategy. Using a reference implementation in the NEST simulator, we demonstrate that the algorithm and the required data structures can be smoothly integrated with existing code such that they complement the infrastructure for spiking connections. To show that the unified framework for gap-junction and spiking interactions achieves high performance and delivers high accuracy in the presence of gap junctions, we present benchmarks for workstations, clusters, and supercomputers. Finally, we discuss limitations of the novel technology.
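The core of the waveform-relaxation idea can be shown with two leaky cells coupled by a gap junction: each sweep integrates every cell over the whole communication interval using the other cell's waveform from the previous sweep, so the instantaneous interaction is recovered iteratively without mid-interval communication. A pure-Python sketch (the leaky-integrator dynamics and Jacobi-style sweeps are illustrative, not NEST's actual scheme):

```python
def waveform_relaxation(v0, g_leak, g_gap, dt, steps, sweeps=20):
    """Jacobi waveform relaxation: every sweep re-integrates each cell
    over the full interval using the *previous* sweep's waveforms of
    the other cells for the gap-junction current, so cells never need
    to exchange data mid-interval."""
    n = len(v0)
    wave = [[v0[i]] * (steps + 1) for i in range(n)]  # initial guess
    for _ in range(sweeps):
        new = []
        for i in range(n):
            v = v0[i]
            traj = [v]
            for t in range(steps):
                coupling = sum(g_gap * (wave[j][t] - v)
                               for j in range(n) if j != i)
                v += dt * (-g_leak * v + coupling)  # forward Euler
                traj.append(v)
            new.append(traj)
        wave = new  # adopt the new waveforms for the next sweep
    return wave
```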
Knight JC, Komissarov A, Nowotny T. PyGeNN: A Python Library for GPU-Enhanced Neural Networks. Front Neuroinform 2021; 15:659005. PMID: 33967731; PMCID: PMC8100330; DOI: 10.3389/fninf.2021.659005
Abstract
More than half of the Top 10 supercomputing sites worldwide use GPU accelerators and they are becoming ubiquitous in workstations and edge computing devices. GeNN is a C++ library for generating efficient spiking neural network simulation code for GPUs. However, until now, the full flexibility of GeNN could only be harnessed by writing model descriptions and simulation code in C++. Here we present PyGeNN, a Python package which exposes all of GeNN's functionality to Python with minimal overhead. This provides an alternative, arguably more user-friendly, way of using GeNN and allows modelers to use GeNN within the growing Python-based machine learning and computational neuroscience ecosystems. In addition, we demonstrate that, in both Python and C++ GeNN simulations, the overheads of recording spiking data can strongly affect runtimes and show how a new spike recording system can reduce these overheads by up to 10×. Using the new recording system, we demonstrate that by using PyGeNN on a modern GPU, we can simulate a full-scale model of a cortical column faster even than real-time neuromorphic systems. Finally, we show that long simulations of a smaller model with complex stimuli and a custom three-factor learning rule defined in PyGeNN can be simulated almost two orders of magnitude faster than real-time.
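The recording-overhead point can be illustrated with a toy model of batched spike recording: rather than copying spikes off the device every timestep, they are packed into an on-device buffer and downloaded every few hundred steps, so the number of host transfers (the expensive part) shrinks proportionally. This is only a schematic of the idea, not PyGeNN's API:

```python
def record_batched(steps, spike_fn, flush_every):
    """Accumulate per-step spike records in a 'device' buffer and copy
    them to the 'host' only every flush_every steps; the transfer
    count is what a real GPU implementation pays latency for."""
    device_buf, host, transfers = [], [], 0
    for t in range(steps):
        device_buf.append((t, spike_fn(t)))
        if (t + 1) % flush_every == 0:
            host.extend(device_buf)
            device_buf.clear()
            transfers += 1
    if device_buf:  # flush the ragged tail
        host.extend(device_buf)
        transfers += 1
    return host, transfers
```

With per-step transfers (flush_every=1) the transfer count equals the step count; batching divides it by the batch size at the cost of delayed access to the data.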
Li H, Fan L, Zhuo J, Wang J, Zhang Y, Yang Z, Jiang T. ATPP: A Pipeline for Automatic Tractography-Based Brain Parcellation. Front Neuroinform 2017; 11:35. PMID: 28611620; PMCID: PMC5447055; DOI: 10.3389/fninf.2017.00035
Abstract
There is a longstanding effort to parcellate the brain into areas based on micro-structural, macro-structural, or connectional features, forming various brain atlases. Among these approaches, connectivity-based parcellation has gained particular emphasis, especially with the considerable progress of multimodal magnetic resonance imaging (MRI) over the past two decades. The recently published Brainnetome Atlas follows this connectivity-based parcellation framework. However, in the construction of the atlas, the deluge of high-resolution multimodal MRI data and the time-consuming computation pose challenges, and there is still a shortage of publicly available tools dedicated to parcellation. In this paper, we present an integrated open-source pipeline (https://www.nitrc.org/projects/atpp), named the Automatic Tractography-based Parcellation Pipeline (ATPP), that realizes this parcellation framework with automatic processing and massively parallel computing. ATPP provides a powerful and flexible command-line version that takes multiple regions of interest as input, as well as a user-friendly graphical-user-interface version for parcellating a single region of interest. We demonstrate the two versions by parcellating two brain regions, the left precentral gyrus and the middle frontal gyrus, on two independent datasets. In addition, ATPP has been successfully utilized and fully validated on a variety of brain regions and the human Brainnetome Atlas, showing its capacity to greatly facilitate brain parcellation.
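The core computation behind connectivity-based parcellation is clustering seed voxels by their connectivity fingerprints: each voxel's row is its tractography-derived connection profile to target regions, and voxels with similar profiles form one parcel. A minimal sketch with a deterministic k-means (illustrative only; ATPP's actual clustering choices may differ):

```python
def parcellate(fingerprints, k, iters=50):
    """Cluster connectivity fingerprints into k parcels with plain
    k-means; centers are initialized from evenly spaced rows so the
    result is deterministic."""
    n = len(fingerprints)
    centers = [list(fingerprints[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # assign each voxel to the nearest center (squared distance)
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(v, centers[c])))
                  for v in fingerprints]
        # move each center to the mean of its members
        for c in range(k):
            members = [v for v, l in zip(fingerprints, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels
```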
Abstract
We created a suite of packages to enable analysis of extremely large genomic data sets (potentially millions of individuals and millions of molecular markers) within the R environment. The suite offers: a matrix-like interface for .bed files (PLINK's binary format for genotype data); a novel class of linked arrays that allows data stored in multiple files to be linked into a single array accessible from the R computing environment; parallel computing methods that can carry out computations on very large data sets without loading the entire data set into memory; and a basic set of methods for statistical genetic analyses. The suite is accessible through CRAN and GitHub. In this note, we describe the classes and methods implemented in each of the packages that make up the suite and illustrate their use with data from the UK Biobank.
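The out-of-memory trick these packages rely on can be sketched language-agnostically: stream the genotype matrix in row blocks so only one block is ever resident, accumulating whatever statistic is needed. A Python stand-in for the idea (the R suite itself memory-maps .bed files; the chunk-fetching callback here is illustrative):

```python
def chunked_col_means(get_chunk, n_rows, chunk_rows, n_cols):
    """Out-of-core column means: stream the matrix in blocks of rows
    (individuals) so only one chunk is held in memory at a time --
    the same idea behind file-backed linked arrays."""
    sums = [0.0] * n_cols
    for start in range(0, n_rows, chunk_rows):
        chunk = get_chunk(start, min(start + chunk_rows, n_rows))
        for row in chunk:
            for j, v in enumerate(row):
                sums[j] += v
    return [s / n_rows for s in sums]
```

Per-marker allele frequencies, variances, or cross-products for genetic analyses follow the same streaming pattern.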
Gounley J, Draeger EW, Randles A. Numerical simulation of a compound capsule in a constricted microchannel. Procedia Comput Sci 2017; 108:175-184. PMID: 28831291; DOI: 10.1016/j.procs.2017.05.209
Abstract
Simulations of the passage of eukaryotic cells through a constricted channel aid in studying the properties of cancer cells and their transport in the bloodstream. Compound capsules, which explicitly model the outer cell membrane and nuclear lamina, have the potential to improve computational model fidelity. However, general simulations of compound capsules transiting a constricted microchannel have not been conducted, and the influence of the compound capsule model on computational performance is not well understood. In this study, we extend a parallel hemodynamics application to simulate the fluid-structure interaction between compound capsules and fluid. With this framework, we compare the deformation of simple and compound capsules in constricted microchannels and explore how deformation depends on the capillary number and on the volume fraction of the inner membrane. The computational framework's parallel performance in this setting is evaluated, and lessons for future development are discussed.
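For reference, the capillary number the deformation sweep varies is the ratio of viscous to elastic membrane stresses; one common definition for capsule flows is sketched below (the paper's exact characteristic scales are not stated in the abstract, so this form is an assumption):

```python
def capillary_number(mu, u, e_s):
    """Ca = mu * u / E_s: fluid viscosity times characteristic velocity
    over the membrane's elastic (shear) modulus; higher Ca means the
    capsule deforms more readily in the constriction."""
    return mu * u / e_s
```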
Samsi S, Krishnamurthy AK, Gurcan MN. An Efficient Computational Framework for the Analysis of Whole Slide Images: Application to Follicular Lymphoma Immunohistochemistry. J Comput Sci 2012; 3:269-279. PMID: 22962572; PMCID: PMC3432990; DOI: 10.1016/j.jocs.2012.01.009
Abstract
Follicular lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL are based on the review of histopathological tissue sections under a microscope and are influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist's disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, using these algorithms to analyze an entire whole slide image requires significant changes to the processing methodology, since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing them. We discuss the use of parallel computing tools on commodity clusters and compare the performance of the serial and parallel implementations of our approach.
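The standard way to make a ~100k × 100k slide tractable is to decompose it into tiles that can be analyzed independently, which is also the natural unit of parallel work on a cluster. A minimal sketch of the tiling and the parallel map (tile-level analysis is left as a callback; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def tiles(width, height, tile):
    """Yield (x, y, w, h) tiles covering the slide, clipping the
    ragged right/bottom edge tiles."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(tile, width - x), min(tile, height - y))

def analyze_slide(width, height, tile, analyze_tile):
    """Run analyze_tile over every tile in parallel and gather the
    per-tile results in tile order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: analyze_tile(*t),
                             tiles(width, height, tile)))
```

On a cluster, the thread pool would be replaced by distributing tile coordinates across worker nodes; the decomposition itself is unchanged.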
Guerrero-Araya E, Muñoz M, Rodríguez C, Paredes-Sabja D. FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies. Bioinform Biol Insights 2021; 15:11779322211059238. PMID: 34866905; PMCID: PMC8637782; DOI: 10.1177/11779322211059238
Abstract
Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available) detected for each query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated on 91 different species with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28,000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST when contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.
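At its core, MLST reduces each genome to a tuple of allele numbers, one per scheme locus, and looks that allelic profile up in the scheme's table of known sequence types; because genomes are independent, both the lookup and the BLASTn allele calling that precedes it parallelize trivially, which is the divide-and-conquer structure FastMLST exploits. A toy sketch with a hypothetical 3-locus scheme (not a real PubMLST scheme):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scheme table: allelic profile -> sequence type.
SCHEME = {
    (1, 1, 1): "ST-1",
    (1, 2, 1): "ST-5",
}

def type_genome(profile):
    """Assign an ST to one genome's allelic profile; unknown
    combinations are flagged as novel."""
    return SCHEME.get(tuple(profile), "novel ST")

def type_all(profiles):
    """Type every genome in parallel -- each one is independent work."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(type_genome, profiles))
```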
Liu M, Zhao F, Jiang X, Zhang H, Zhou H. Parallel Binary Image Cryptosystem Via Spiking Neural Networks Variants. Int J Neural Syst 2021; 32:2150014. PMID: 33637028; DOI: 10.1142/s0129065721500143
Abstract
Due to the inefficiency of encrypting multiple binary images serially, this paper proposes a parallel binary image encryption framework based on typical variants of spiking neural networks, spiking neural P (SNP) systems. More specifically, the two basic units of the proposed image cryptosystem, the permutation unit and the diffusion unit, are designed using SNP systems with multiple channels and polarizations (SNP-MCP systems) and SNP systems with astrocyte-like control (SNP-ALC systems), respectively. Unlike the serial computation of traditional image permutation/diffusion units, the SNP-MCP-based permutation unit and the SNP-ALC-based diffusion unit achieve parallel computation through the parallel use of rules inside neurons. Theoretical analysis confirms the high efficiency of the proposed binary image cryptosystem, and security analysis experiments demonstrate its security.
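What the two units compute can be illustrated with a generic permutation/diffusion round on a flattened binary image (a conventional sketch of the roles the SNP-MCP and SNP-ALC units play, not an SNP-system construction): permutation scatters pixel positions under a keyed permutation, and diffusion XORs each bit with a keystream bit plus the previous ciphertext bit so a single pixel flip propagates forward.

```python
def permute(bits, key_perm):
    """Permutation stage: pixel i moves to position key_perm[i]."""
    out = [0] * len(bits)
    for i, p in enumerate(key_perm):
        out[p] = bits[i]
    return out

def diffuse(bits, keystream):
    """Diffusion stage: c_t = b_t XOR k_t XOR c_{t-1} (chained XOR)."""
    out, prev = [], 0
    for b, k in zip(bits, keystream):
        c = b ^ k ^ prev
        out.append(c)
        prev = c
    return out

def encrypt(bits, key_perm, keystream):
    return diffuse(permute(bits, key_perm), keystream)

def decrypt(cipher, key_perm, keystream):
    # invert diffusion: b_t = c_t XOR k_t XOR c_{t-1}
    prev, plain_p = 0, []
    for c, k in zip(cipher, keystream):
        plain_p.append(c ^ k ^ prev)
        prev = c
    # invert the permutation
    inv = [0] * len(key_perm)
    for i, p in enumerate(key_perm):
        inv[i] = plain_p[p]
    return inv
```

In the serial sketch the diffusion chain forces pixel-by-pixel processing; the paper's contribution is realizing both stages in parallel via rule firing inside SNP-system neurons.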