126
|
Parallel computation with molecular-motor-propelled agents in nanofabricated networks. Proc Natl Acad Sci U S A 2016; 113:2591-6. [PMID: 26903637 DOI: 10.1073/pnas.1510825113] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022] Open
Abstract
The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
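For orientation, the subset sum instance {2, 5, 9} mentioned in the abstract can be enumerated by brute force on a conventional computer. The sketch below is only an illustration of the problem itself, not of the network-based device; the function names `subset_sums` and `has_subset_with_sum` are hypothetical.

```python
from itertools import combinations


def subset_sums(values):
    """Map every reachable sum to one subset of `values` that produces it."""
    reachable = {}
    for size in range(len(values) + 1):
        for subset in combinations(values, size):
            # Keep the first subset found for each sum.
            reachable.setdefault(sum(subset), subset)
    return reachable


def has_subset_with_sum(values, target):
    """Decide the subset sum problem by exhaustive enumeration (2^n subsets)."""
    return target in subset_sums(values)


if __name__ == "__main__":
    sums = subset_sums([2, 5, 9])
    for total in sorted(sums):
        print(total, sums[total])
```

The enumeration visits all 2^n subsets, which is the exponential growth that motivates exploring the candidates in parallel.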
|
127
|
Horlacher O, Lisacek F, Müller M. Mining Large Scale Tandem Mass Spectrometry Data for Protein Modifications Using Spectral Libraries. J Proteome Res 2015; 15:721-31. [PMID: 26653734 DOI: 10.1021/acs.jproteome.5b00877] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Experimental improvements in post-translational modification (PTM) detection by tandem mass spectrometry (MS/MS) have allowed the identification of vast numbers of PTMs. Open modification searches (OMSs) of MS/MS data, which do not require prior knowledge of the modifications present in the sample, further increased the diversity of detected PTMs. Despite much effort, there is still a lack of functional annotation of PTMs. One possibility to narrow the annotation gap is to mine MS/MS data deposited in public repositories and to correlate the PTM presence with biological meta-information attached to the data. Since the data volume can be quite substantial and contain tens of millions of MS/MS spectra, the data mining tools must be able to cope with big data. Here, we present two tools, Liberator and MzMod, which are built using the MzJava class library and the Apache Spark large scale computing framework. Liberator builds large MS/MS spectrum libraries, and MzMod searches them in an OMS mode. We applied these tools to a recently published set of 25 million spectra from 30 human tissues and present tissue specific PTMs. We also compared the results to the ones obtained with the OMS tool MODa and the search engine X!Tandem.
|
128
|
Ocaña K, de Oliveira D. Parallel computing in genomic research: advances and applications. Adv Appl Bioinform Chem 2015; 8:23-35. [PMID: 26604801 PMCID: PMC4655901 DOI: 10.2147/aabc.s64482] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022] Open
Abstract
Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.
|
129
|
Dormanns K, Brown RG, David T. Neurovascular coupling: a parallel implementation. Front Comput Neurosci 2015; 9:109. [PMID: 26441619 PMCID: PMC4569750 DOI: 10.3389/fncom.2015.00109] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/04/2015] [Accepted: 08/24/2015] [Indexed: 11/18/2022] Open
Abstract
A numerical model of neurovascular coupling (NVC) is presented based on neuronal activity coupled to vasodilation/contraction models via the astrocytic mediated perivascular K+ and the smooth muscle cell (SMC) Ca2+ pathway termed a neurovascular unit (NVU). Luminal agonists acting on P2Y receptors on the endothelial cell (EC) surface provide a flux of inositol trisphosphate (IP3) into the endothelial cytosol. This concentration of IP3 is transported via gap junctions between EC and SMC providing a source of sarcoplasmic derived Ca2+ in the SMC. The model is able to relate a neuronal input signal to the corresponding vessel reaction (contraction or dilation). A tissue slice consisting of blocks, each of which contains an NVU, is connected to a space filling H-tree, simulating a perfusing arterial tree (vasculature). The model couples the NVUs to the vascular tree via a stretch mediated Ca2+ channel on both the EC and SMC. The SMC is induced to oscillate by increasing an agonist flux in the EC and hence increased IP3 induced Ca2+ from the SMC stores with the resulting calcium-induced calcium release (CICR) oscillation inhibiting NVC thereby relating blood flow to vessel contraction and dilation following neuronal activation. The coupling between the vasculature and the set of NVUs is relatively weak for the case with agonist induced where only the Ca2+ in cells inside the activated area becomes oscillatory; however, the radii of vessels both inside and outside the activated area oscillate (albeit small for those outside). In addition, the oscillation profile differs between coupled and decoupled states with the time required to refill the cytosol with decreasing Ca2+ and increasing frequency with coupling. The solution algorithm is shown to have excellent weak and strong scaling. Results have been generated for tissue slices containing up to 4096 blocks.
|
130
|
Hahne J, Helias M, Kunkel S, Igarashi J, Bolten M, Frommer A, Diesmann M. A unified framework for spiking and gap-junction interactions in distributed neuronal network simulations. Front Neuroinform 2015; 9:22. [PMID: 26441628 PMCID: PMC4563270 DOI: 10.3389/fninf.2015.00022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 04/23/2015] [Accepted: 08/20/2015] [Indexed: 11/30/2022] Open
Abstract
Contemporary simulators for networks of point and few-compartment model neurons come with a plethora of ready-to-use neuron and synapse models and support complex network topologies. Recent technological advancements have broadened the spectrum of application further to the efficient simulation of brain-scale networks on supercomputers. In distributed network simulations the amount of spike data that accrues per millisecond and process is typically low, such that a common optimization strategy is to communicate spikes at relatively long intervals, where the upper limit is given by the shortest synaptic transmission delay in the network. This approach is well-suited for simulations that employ only chemical synapses but it has so far impeded the incorporation of gap-junction models, which require instantaneous neuronal interactions. Here, we present a numerical algorithm based on a waveform-relaxation technique which allows for network simulations with gap junctions in a way that is compatible with the delayed communication strategy. Using a reference implementation in the NEST simulator, we demonstrate that the algorithm and the required data structures can be smoothly integrated with existing code such that they complement the infrastructure for spiking connections. To show that the unified framework for gap-junction and spiking interactions achieves high performance and delivers high accuracy in the presence of gap junctions, we present benchmarks for workstations, clusters, and supercomputers. Finally, we discuss limitations of the novel technology.
|
131
|
Kutzner C, Páll S, Fechner M, Esztermann A, de Groot BL, Grubmüller H. Best bang for your buck: GPU nodes for GROMACS biomolecular simulations. J Comput Chem 2015; 36:1990-2008. [PMID: 26238484 PMCID: PMC5042102 DOI: 10.1002/jcc.24030] [Citation(s) in RCA: 147] [Impact Index Per Article: 16.3] [Received: 05/28/2015] [Accepted: 07/05/2015] [Indexed: 11/11/2022]
Abstract
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
|
132
|
Vitay J, Dinkelbach HÜ, Hamker FH. ANNarchy: a code generation approach to neural simulations on parallel hardware. Front Neuroinform 2015; 9:19. [PMID: 26283957 PMCID: PMC4521356 DOI: 10.3389/fninf.2015.00019] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 03/31/2015] [Accepted: 07/13/2015] [Indexed: 11/22/2022] Open
Abstract
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows users to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions.
|
133
|
Li Y, Jia W, Luan B, Mao ZH, Zhang H, Sun M. A FPGA Implementation of JPEG Baseline Encoder for Wearable Devices. PROCEEDINGS OF THE IEEE ... ANNUAL NORTHEAST BIOENGINEERING CONFERENCE. IEEE NORTHEAST BIOENGINEERING CONFERENCE 2015; 2015:10.1109/NEBEC.2015.7117173. [PMID: 26190911 PMCID: PMC4505724 DOI: 10.1109/nebec.2015.7117173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
In this paper, an efficient field-programmable gate array (FPGA) implementation of the JPEG baseline image compression encoder is presented for wearable devices in health and wellness applications. In order to gain flexibility in developing FPGA-specific software and to balance real-time performance against resource utilization, a High Level Synthesis (HLS) tool is utilized in our system design. An optimized dataflow configuration with a padding scheme simplifies the timing control for data transfer. Our experiments with a system-on-chip multi-sensor system have verified our FPGA implementation with respect to real-time performance, computational efficiency, and FPGA resource utilization.
|
134
|
Pothapragada S, Zhang P, Sheriff J, Livelli M, Slepian MJ, Deng Y, Bluestein D. A phenomenological particle-based platelet model for simulating filopodia formation during early activation. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2015; 31:e02702. [PMID: 25532469 PMCID: PMC4509790 DOI: 10.1002/cnm.2702] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/10/2014] [Revised: 10/29/2014] [Accepted: 12/11/2014] [Indexed: 05/13/2023]
Abstract
We developed a phenomenological three-dimensional platelet model to characterize the filopodia formation observed during early stage platelet activation. Departing from continuum mechanics based approaches, this coarse-grained molecular dynamics (CGMD) particle-based model can deform to emulate the complex shape change and filopodia formation that platelets undergo during activation. The platelet peripheral zone is modeled with a two-layer homogeneous elastic structure represented by spring-connected particles. The structural zone is represented by a cytoskeletal assembly comprising a filamentous core and filament bundles supporting the platelet's discoid shape, also modeled by spring-connected particles. The interior organelle zone is modeled by homogeneous cytoplasm particles that facilitate the platelet deformation. Nonbonded interactions among the discrete particles of the membrane, the cytoskeletal assembly, and the cytoplasm are described using the Lennard-Jones potential with empirical constants. By exploring the parameter space of this CGMD model, we have successfully simulated the dynamics of varied filopodia formations. Comparative analyses of length and thickness of filopodia show that our numerical simulations are in agreement with experimental measurements of flow-induced activated platelets. Copyright © 2015 John Wiley & Sons, Ltd.
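The Lennard-Jones potential named in this abstract has the standard 12-6 form V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates that textbook form only; the function name `lennard_jones` and the default values ε = σ = 1 are placeholders for illustration, not the empirical constants used in the paper.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair potential.

    r       -- distance between two particles (must be positive)
    epsilon -- depth of the potential well (placeholder value)
    sigma   -- distance at which the potential crosses zero (placeholder value)
    """
    if r <= 0:
        raise ValueError("distance r must be positive")
    sr6 = (sigma / r) ** 6
    # Repulsive (sr6 squared) minus attractive (sr6) term.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    r_min = 2 ** (1 / 6)  # location of the minimum when sigma = 1
    print(lennard_jones(1.0), lennard_jones(r_min))
```

The potential is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which gives a quick sanity check for any implementation.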
|
135
|
Flouri T, Izquierdo-Carrasco F, Darriba D, Aberer AJ, Nguyen LT, Minh BQ, Von Haeseler A, Stamatakis A. The phylogenetic likelihood library. Syst Biol 2014; 64:356-62. [PMID: 25358969 PMCID: PMC4380035 DOI: 10.1093/sysbio/syu084] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Indexed: 11/14/2022] Open
Abstract
We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and a thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. By example of two likelihood-based phylogenetic codes we show that the PLL improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required). The PLL is available at http://www.libpll.org under the GNU General Public License (GPL).
|
136
|
Kunkel S, Schmidt M, Eppler JM, Plesser HE, Masumoto G, Igarashi J, Ishii S, Fukai T, Morrison A, Diesmann M, Helias M. Spiking network simulation code for petascale computers. Front Neuroinform 2014; 8:78. [PMID: 25346682 PMCID: PMC4193238 DOI: 10.3389/fninf.2014.00078] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 06/18/2014] [Accepted: 08/27/2014] [Indexed: 11/13/2022] Open
Abstract
Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today.
|
137
|
Shen Z, Wang L, Zhao Y, Zhao Q, Zhao M. GPU-based skin texture synthesis for digital human model. Biomed Mater Eng 2014; 24:2219-27. [PMID: 25226921 DOI: 10.3233/bme-141034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Skin synthesis is important for the actual appearance of digital human models. However, it is difficult to design a general algorithm to efficiently produce high quality results. This paper proposes a parallel texture synthesis method for large scale skin of digital human models. The method comprises two major procedures: a parallel matching procedure and a multi-pass optimizing procedure. Compared with other methods, this algorithm is easy to use, requires only a small skin image as input, and generates high quality skin texture of arbitrary size. Experiments confirm the effectiveness of this skin texture synthesis method.
|
138
|
Zenke F, Gerstner W. Limits to high-speed simulations of spiking neural networks using general-purpose computers. Front Neuroinform 2014; 8:76. [PMID: 25309418 PMCID: PMC4160969 DOI: 10.3389/fninf.2014.00076] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/09/2014] [Accepted: 08/25/2014] [Indexed: 11/13/2022] Open
Abstract
To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
|
139
|
Meng J, Wang B, Wei Y, Feng S, Balaji P. SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores. BMC Bioinformatics 2014; 15 Suppl 9:S2. [PMID: 25253533 PMCID: PMC4168705 DOI: 10.1186/1471-2105-15-s9-s2] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: There is a widening gap between the throughput of massive parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on these massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included for generating contigs with high quality. Experimental results show that SWAP-Assembler scales up to 2048 cores on Yanhuang dataset using only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler.
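The N50 size used above as a contig quality measure is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that conventional definition follows; the function name `n50` and the sample lengths are hypothetical and are not taken from the paper's datasets.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Made-up lengths: total 290, so half is 145; 80 + 70 = 150 reaches it.
    print(n50([80, 70, 50, 40, 30, 20]))
```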
|
140
|
Augustin CM, Holzapfel GA, Steinbach O. Classical and all-floating FETI methods for the simulation of arterial tissues. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING 2014; 99:290-312. [PMID: 26751957 PMCID: PMC4702352 DOI: 10.1002/nme.4674] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 05/10/2023]
Abstract
High-resolution and anatomically realistic computer models of biological soft tissues play a significant role in the understanding of the function of cardiovascular components in health and disease. However, the computational effort to handle fine grids to resolve the geometries as well as sophisticated tissue models is very challenging. One possibility to derive a strongly scalable parallel solution algorithm is to consider finite element tearing and interconnecting (FETI) methods. In this study we propose and investigate the application of FETI methods to simulate the elastic behavior of biological soft tissues. As one particular example we choose the artery which is - as most other biological tissues - characterized by anisotropic and nonlinear material properties. We compare two specific approaches of FETI methods, classical and all-floating, and investigate the numerical behavior of different preconditioning techniques. In comparison to classical FETI, the all-floating approach has not only advantages concerning the implementation but in many cases also concerning the convergence of the global iterative solution method. This behavior is illustrated with numerical examples. We present results of linear elastic simulations to show convergence rates, as expected from the theory, and results from the more sophisticated nonlinear case where we apply a well-known anisotropic model to the realistic geometry of an artery. Although the FETI methods have a great applicability on artery simulations we will also discuss some limitations concerning the dependence on material parameters.
|
141
|
Meng X, Saunders MA, Mahoney MW. LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS. SIAM JOURNAL ON SCIENTIFIC COMPUTING : A PUBLICATION OF THE SOCIETY FOR INDUSTRIAL AND APPLIED MATHEMATICS 2014; 36:C95-C118. [PMID: 25419094 PMCID: PMC4238893 DOI: 10.1137/120866580] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 06/04/2023]
Abstract
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \gg n$ or $m \ll n$, and where $A$ may be rank-deficient. Tikhonov regularization may also be included. Since $A$ is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when $A$ is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size $\lceil \gamma \min(m, n) \rceil \times \min(m, n)$, where $\gamma$ is moderately larger than 1, e.g., $\gamma = 2$. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK's DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster.
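For the strongly overdetermined case ($m \gg n$), the preconditioning phase described in this abstract can be written out roughly as follows. This is a sketch based on the abstract's description and on the usual presentation of the method; the symbols $G$, $s$, $\tilde{A}$, $N$, and $y$ are notation introduced here for illustration and do not appear in the abstract.

```latex
\begin{align*}
  s &= \lceil \gamma n \rceil, \quad \gamma > 1
      && \text{(oversampled sketch size)} \\
  G &\in \mathbb{R}^{s \times m}, \quad G_{ij} \sim \mathcal{N}(0, 1)
      && \text{(random normal projection)} \\
  \tilde{A} &= G A = \tilde{U} \tilde{\Sigma} \tilde{V}^{T}
      && \text{(compact SVD of the } s \times n \text{ sketch)} \\
  N &= \tilde{V} \tilde{\Sigma}^{-1}
      && \text{(right preconditioner)} \\
  \hat{y} &= \arg\min_{y} \| A N y - b \|_2
      && \text{(min-length solution via LSQR or Chebyshev)} \\
  \hat{x} &= N \hat{y}
      && \text{(recover the solution of the original problem)}
\end{align*}
```

Because $A N$ is well-conditioned with high probability, the iterative solver in the fifth step converges in a number of iterations that can be bounded in advance, which is the predictability the abstract refers to.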
|
142
|
Li C, Petukh M, Li L, Alexov E. Continuous development of schemes for parallel computing of the electrostatics in biological systems: implementation in DelPhi. J Comput Chem 2013; 34:1949-60. [PMID: 23733490 PMCID: PMC3707979 DOI: 10.1002/jcc.23340] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 02/19/2013] [Revised: 04/03/2013] [Accepted: 05/02/2013] [Indexed: 11/07/2022]
Abstract
Due to the enormous importance of electrostatics in molecular biology, calculating the electrostatic potential and corresponding energies has become a standard computational approach for the study of biomolecules and nano-objects immersed in water and salt phase or other media. However, the electrostatics of large macromolecules and macromolecular complexes, including nano-objects, may not be obtainable via explicit methods and even the standard continuum electrostatics methods may not be applicable due to high computational time and memory requirements. Here, we report further development of the parallelization scheme reported in our previous work (Li, et al., J. Comput. Chem. 2012, 33, 1960) to include parallelization of the molecular surface and energy calculation components of the algorithm. The parallelization scheme utilizes different approaches such as space domain parallelization, algorithmic parallelization, multithreading, and task scheduling, depending on the quantity being calculated. This allows for efficient use of the computing resources of the corresponding computer cluster. The parallelization scheme is implemented in the popular software DelPhi and results in a several-fold speedup. As a demonstration of the efficiency and capability of this methodology, the electrostatic potential and electric field distributions are calculated for the bovine mitochondrial supercomplex, illustrating their complex topology, which cannot be obtained by modeling the supercomplex components alone.
143
Yuan J, Xu G, Yu Y, Zhou Y, Carson PL, Wang X, Liu X. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization. JOURNAL OF BIOMEDICAL OPTICS 2013; 18:86001. [PMID: 23907277 PMCID: PMC3733419 DOI: 10.1117/1.jbo.18.8.086001] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 03/07/2013] [Revised: 05/31/2013] [Accepted: 06/21/2013] [Indexed: 05/18/2023]
Abstract
Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging, all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state-of-the-art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo was achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.
144
Grinberg L, Fedosov DA, Karniadakis GE. Parallel multiscale simulations of a brain aneurysm. JOURNAL OF COMPUTATIONAL PHYSICS 2013; 244:131-147. [PMID: 23734066 PMCID: PMC3668797 DOI: 10.1016/j.jcp.2012.08.023] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 05/05/2023]
Abstract
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
145
Ben-Shalom R, Liberman G, Korngreen A. Accelerating compartmental modeling on a graphical processing unit. Front Neuroinform 2013; 7:4. [PMID: 23508232 PMCID: PMC3600538 DOI: 10.3389/fninf.2013.00004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/09/2013] [Accepted: 02/28/2013] [Indexed: 11/17/2022] Open
Abstract
Compartmental modeling is a widely used tool in neurophysiology, but the detail and scope of such models are frequently limited by lack of computational resources. Here we implement compartmental modeling on low-cost Graphical Processing Units (GPUs), which significantly increases simulation speed compared to NEURON. Testing two methods for solving the current diffusion equation system revealed which method is more useful for specific neuron morphologies. Regions of applicability were investigated using a range of simulations from a single membrane potential trace simulated in a simple fork morphology to multiple traces on multiple realistic cells. A runtime peak 150-fold faster than the CPU was achieved. This application can be used for statistical analysis and data fitting optimizations of compartmental models and may be used for simultaneously simulating large populations of neurons. Since GPUs are forging ahead and proving to be more cost-effective than CPUs, this may significantly decrease the cost of computation power and open new computational possibilities for laboratories with limited budgets.
146
Tyka MD, Jung K, Baker D. Efficient sampling of protein conformational space using fast loop building and batch minimization on highly parallel computers. J Comput Chem 2012; 33:2483-91. [PMID: 22847521 PMCID: PMC3760475 DOI: 10.1002/jcc.23069] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 03/03/2012] [Revised: 05/30/2012] [Accepted: 06/24/2012] [Indexed: 12/22/2022]
Abstract
All-atom sampling is a critical and compute-intensive end stage to protein structural modeling. Because of the vast size and extreme ruggedness of conformational space, even close to the native structure, the high-resolution sampling problem is almost as difficult as predicting the rough fold of a protein. Here, we present a combination of new algorithms that considerably speed up the exploration of very rugged conformational landscapes and are capable of finding heretofore hidden low-energy states. The algorithm is based on a hierarchical workflow and can be parallelized on supercomputers with up to 128,000 compute cores with near perfect efficiency. Such scaling behavior is notable, as with Moore's law continuing only in the number of cores per chip, parallelizability is a critical property of new algorithms. Using the enhanced sampling power, we have uncovered previously invisible deficiencies in the Rosetta force field and created an extensive decoy training set for optimizing and testing force fields.
147
Helias M, Kunkel S, Masumoto G, Igarashi J, Eppler JM, Ishii S, Fukai T, Morrison A, Diesmann M. Supercomputers ready for use as discovery machines for neuroscience. Front Neuroinform 2012; 6:26. [PMID: 23129998 PMCID: PMC3486988 DOI: 10.3389/fninf.2012.00026] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 07/12/2012] [Accepted: 10/08/2012] [Indexed: 11/16/2022] Open
Abstract
NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10^8 neurons and 10^12 synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi interactive working style and render simulations on this scale a practical tool for computational neuroscience.
148
Li C, Li L, Zhang J, Alexov E. Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi. J Comput Chem 2012; 33:1960-6. [PMID: 22674480 PMCID: PMC3412928 DOI: 10.1002/jcc.23033] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 02/17/2012] [Revised: 04/20/2012] [Accepted: 05/11/2012] [Indexed: 11/07/2022]
Abstract
The Gauss-Seidel (GS) method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient than other iterative methods, such as the Jacobi method. However, the standard implementation of the GS method restricts its utilization in parallel computing because it requires using updated neighboring values (i.e., from the current iteration) as soon as they are available. Here, we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of processes or computing units. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure for obtaining the electrostatic potential distribution in the parallelized DelPhi several-fold faster than that in the serial code. Further, we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures.
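The dependence on freshly updated neighbors that the abstract describes can be illustrated with a small sketch. The code below is not the DelPhi scheme from this paper; it assumes a 1D Poisson problem (-u'' = f with fixed ends) and uses the textbook red-black ordering as a stand-in technique that removes the dependence within each half-sweep. The function names and the test problem are hypothetical, chosen only for illustration.

```python
import numpy as np

def gauss_seidel_sweep(u, f, h):
    """One in-place Gauss-Seidel sweep for -u'' = f on a 1D grid with fixed ends."""
    for i in range(1, len(u) - 1):
        # u[i - 1] was already updated in this sweep: this loop-carried
        # dependence is what blocks naive parallelization.
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def red_black_sweep(u, f, h):
    """Same update in red-black order (assumed stand-in, not the paper's method)."""
    for start in (1, 2):
        # Points of one colour only read points of the other colour,
        # so each half-sweep can be computed in parallel (here: vectorized).
        idx = np.arange(start, len(u) - 1, 2)
        u[idx] = 0.5 * (u[idx - 1] + u[idx + 1] + h * h * f[idx])
    return u

def solve(sweep, n=33, iters=5000):
    """Iterate a sweep on -u'' = 1, u(0) = u(1) = 0 (hypothetical test problem)."""
    h = 1.0 / (n - 1)
    u = np.zeros(n)
    f = np.ones(n)
    for _ in range(iters):
        sweep(u, f, h)
    return u
```

Both orderings converge to the same discrete solution, which for this test problem coincides with x(1 - x)/2 at the grid points; only the order in which neighbors are visited differs.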
149
Samsi S, Krishnamurthy AK, Gurcan MN. An Efficient Computational Framework for the Analysis of Whole Slide Images: Application to Follicular Lymphoma Immunohistochemistry. JOURNAL OF COMPUTATIONAL SCIENCE 2012; 3:269-279. [PMID: 22962572 PMCID: PMC3432990 DOI: 10.1016/j.jocs.2012.01.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 06/01/2023]
Abstract
Follicular Lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL are based on the review of histopathological tissue sections under a microscope and are influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist's disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, the use of these algorithms for analyzing the entire whole slide image requires significant changes to the processing methodology since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing these challenges. We discuss the use of parallel computing tools on commodity clusters and compare performance of the serial and parallel implementations of our approach.
150
Murphy M, Alley M, Demmel J, Keutzer K, Vasanawala S, Lustig M. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime. IEEE TRANSACTIONS ON MEDICAL IMAGING 2012; 31:1250-62. [PMID: 22345529 PMCID: PMC3522122 DOI: 10.1109/tmi.2012.2188039] [Citation(s) in RCA: 145] [Impact Index Per Article: 12.1] [Indexed: 05/24/2023]
Abstract
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
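As a rough illustration of the soft-thresholding step named in the abstract, the sketch below shows the generic shrinkage operator and a joint variant that shrinks each coefficient by its norm across channels. It is not the l₁-SPIRiT implementation: the function names, the array layout (channels along the first axis), and the threshold parameter lam are assumptions made for this example, and the wavelet transform and SPIRiT consistency steps are omitted.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink magnitudes by lam, zero anything smaller."""
    mag = np.abs(x)
    # Guard the division so exact zeros do not produce NaN.
    scale = np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-30)
    return x * scale

def joint_soft_threshold(coeffs, lam):
    """Joint shrinkage across channels; coeffs has assumed shape (channels, n)."""
    # One l2 norm per coefficient position, taken over the channel axis,
    # so all channels at that position are kept or zeroed together.
    norm = np.sqrt(np.sum(np.abs(coeffs) ** 2, axis=0, keepdims=True))
    scale = np.maximum(norm - lam, 0.0) / np.maximum(norm, 1e-30)
    return coeffs * scale
```

Both functions also accept complex arrays, since the shrinkage acts on the magnitude and preserves the phase. Each coefficient position is processed independently of the others, which is one reason this kind of step maps well onto GPUs and multi-core CPUs.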