26
Feldotto B, Eppler JM, Jimenez-Romero C, Bignamini C, Gutierrez CE, Albanese U, Retamino E, Vorobev V, Zolfaghari V, Upton A, Sun Z, Yamaura H, Heidarinejad M, Klijn W, Morrison A, Cruz F, McMurtrie C, Knoll AC, Igarashi J, Yamazaki T, Doya K, Morin FO. Deploying and Optimizing Embodied Simulations of Large-Scale Spiking Neural Networks on HPC Infrastructure. Front Neuroinform 2022; 16:884180. PMID: 35662903; PMCID: PMC9160925; DOI: 10.3389/fninf.2022.884180.
Abstract
Simulating the brain-body-environment trinity in closed loop is an attractive proposal to investigate how perception, motor activity and interactions with the environment shape brain activity, and vice versa. The relevance of this embodied approach, however, hinges entirely on the modeled complexity of the various simulated phenomena. In this article, we introduce a software framework that is capable of simulating large-scale, biologically realistic networks of spiking neurons embodied in a biomechanically accurate musculoskeletal system that interacts with a physically realistic virtual environment. We deploy this framework on the high-performance computing resources of the EBRAINS research infrastructure and investigate its scaling performance by distributing computation across an increasing number of interconnected compute nodes. Our architecture is based on requested compute nodes as well as persistent virtual machines; this provides a high-performance simulation environment that is accessible to multi-domain users without expert knowledge, enabling users to instantiate and control simulations at custom scale via a web-based graphical user interface. Our simulation environment, entirely open source, is based on the Neurorobotics Platform developed in the context of the Human Brain Project, and on the NEST simulator. We characterize the capabilities of our parallelized architecture for large-scale embodied brain simulations through two benchmark experiments, investigating the effects of scaling compute resources on performance defined in terms of experiment runtime, brain instantiation time and simulation time. The first benchmark is based on a large-scale balanced network, while the second is a multi-region embodied brain simulation consisting of more than a million neurons and a billion synapses. Both benchmarks clearly show how scaling compute resources improves the aforementioned performance metrics in a near-linear fashion. The second benchmark in particular is indicative of both the potential and the limitations of a highly distributed simulation in terms of a trade-off between computation speed and resource cost. Our simulation architecture is being prepared to be made accessible to everyone as an EBRAINS service, thereby offering the community a tool with a unique workflow that should provide momentum to the investigation of closed-loop embodiment within computational neuroscience.
27
Lim HGM, Hsiao SH, Fann YC, Lee YCG. Robust Mutation Profiling of SARS-CoV-2 Variants from Multiple Raw Illumina Sequencing Data with Cloud Workflow. Genes (Basel) 2022. PMID: 35456492; DOI: 10.3390/genes1304068.
Abstract
Several variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are emerging worldwide. Variant surveillance based on genome sequencing has become crucial for determining whether mutations in these variants render the virus more infectious, more potent, or resistant to existing vaccines and therapeutics. Meanwhile, repeatedly analyzing large volumes of raw sequencing data with currently available code-based bioinformatics tools is extremely challenging during an unprecedented pandemic, given the limited number of experts and computational resources. Therefore, to hasten variant surveillance efforts, we developed an installation-free cloud workflow for robust mutation profiling of SARS-CoV-2 variants from multiple Illumina sequencing datasets. Herein, 55 raw sequencing datasets representing four early SARS-CoV-2 variants of concern (Alpha, Beta, Gamma, and Delta) from an open-access database were used to test our workflow's performance. Our workflow automatically identified mutated sites in the variants, together with reliable annotation of the protein-coding genes, in a cost-effective and timely manner, harnessing parallel cloud computing in a single execution under resource-limited settings. In addition, the workflow generates a consensus genome sequence that can be shared in public data repositories to support global variant surveillance efforts.
28
Paik H, Cho Y, Cho SB, Kwon OK. MPI-GWAS: a supercomputing-aided permutation approach for genomewide association studies. Genomics Inform 2022; 20:e14. PMID: 35399013; PMCID: PMC9001997; DOI: 10.5808/gi.22001.
Abstract
Permutation testing is a robust and popular approach to significance testing in genomic research; it has the advantage of reducing inflated type 1 error rates, but its computational cost is notorious in genome-wide association studies (GWAS). Here, we developed a supercomputing-aided approach to accelerate permutation testing for GWAS, based on the message-passing interface (MPI) on a parallel computing architecture. Our application, called MPI-GWAS, conducts MPI-based permutation testing on our supercomputing system, Nurion (8,305 compute nodes; 563,740 central processing units [CPUs]). For 10^7 permutations of one locus, MPI-GWAS completed in ~600 s using 2,720 CPU cores. For 10^7 permutations of ~30,000-50,000 loci in over 7,000 subjects, the total elapsed time was ~4 days on Nurion. Thus, MPI-GWAS makes permutation-based GWAS feasible within a reasonable time by harnessing the power of parallel computing resources.
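The divide-the-permutations idea behind MPI-GWAS can be sketched in a few lines. The following is an illustrative stand-in, not the authors' code: it replaces MPI ranks with a local process pool and uses a simple mean-difference statistic (both choices are assumptions of this sketch). Each worker counts how many permuted statistics reach the observed one, and the counts are reduced into a single p-value.

```python
# Hedged sketch of parallel permutation testing for one locus.
# NOT the MPI-GWAS implementation: MPI ranks are replaced by a local
# process pool, and the association statistic is a toy mean difference.
import random
from statistics import mean
from concurrent.futures import ProcessPoolExecutor

def assoc_stat(genotype, phenotype):
    """Absolute difference in mean phenotype between carriers and non-carriers."""
    carriers = [p for g, p in zip(genotype, phenotype) if g > 0]
    others = [p for g, p in zip(genotype, phenotype) if g == 0]
    return abs(mean(carriers) - mean(others))

def perm_chunk(args):
    """One worker's share of permutations: count permuted stats >= observed."""
    genotype, phenotype, observed, n_perm, seed = args
    rng = random.Random(seed)
    phen = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(phen)
        if assoc_stat(genotype, phen) >= observed:
            hits += 1
    return hits

def permutation_pvalue(genotype, phenotype, n_perm=10_000, n_workers=4):
    """Permutation p-value for one locus, split evenly across worker processes."""
    observed = assoc_stat(genotype, phenotype)
    share = n_perm // n_workers
    jobs = [(genotype, phenotype, observed, share, seed)
            for seed in range(n_workers)]
    with ProcessPoolExecutor(n_workers) as ex:
        hits = sum(ex.map(perm_chunk, jobs))
    return (hits + 1) / (n_workers * share + 1)  # add-one correction
```

In a real MPI setting, `perm_chunk` would run on each rank and the final sum would be an `MPI_Reduce`; the structure of the computation is otherwise the same.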
29
Pronold J, Jordan J, Wylie BJN, Kitayama I, Diesmann M, Kunkel S. Routing Brain Traffic Through the Von Neumann Bottleneck: Parallel Sorting and Refactoring. Front Neuroinform 2022; 15:785068. PMID: 35300490; PMCID: PMC8921864; DOI: 10.3389/fninf.2021.785068.
Abstract
Generic simulation code for spiking neuronal networks spends the major part of its time in the phase where spikes have arrived at a compute node and need to be delivered to their target neurons. These spikes were emitted over the last interval between communication steps by source neurons distributed across many compute nodes and are inherently irregular and unsorted with respect to their targets. To find those targets, the spikes need to be dispatched to a three-dimensional data structure, with decisions on target thread and synapse type made along the way. With growing network size, a compute node receives spikes from an increasing number of different source neurons until, in the limit, each synapse on the compute node has a unique source. Here, we show analytically how this sparsity emerges over the practically relevant range of network sizes, from a hundred thousand to a billion neurons. By profiling a production code, we investigate opportunities for algorithmic changes to avoid indirections and branching. Every thread hosts an equal share of the neurons on a compute node. In the original algorithm, all threads search through all spikes to pick out the relevant ones; with increasing network size, the fraction of hits remains invariant but the absolute number of rejections grows. Our new algorithm instead divides the spikes equally among the threads and immediately sorts them in parallel according to target thread and synapse type. After this, every thread completes delivery solely for the section of spikes addressed to its own neurons. Independent of the number of threads, every spike is touched only twice. The new algorithm halves the number of instructions in spike delivery, which leads to a reduction of simulation time of up to 40%. Thus, spike delivery is a fully parallelizable process with a single synchronization point and is thereby well suited for many-core systems. Our analysis indicates that further progress requires a reduction of the latency that the instructions experience in accessing memory. The study provides the foundation for the exploration of latency-hiding methods such as software pipelining and software-induced prefetching.
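The two-phase scheme described in this abstract can be modeled in a toy, single-process form (this is an illustration of the idea, not NEST's implementation; representing a spike as a `(target_thread, synapse_type, target_neuron)` tuple is a simplification of the three-dimensional data structure the paper describes). Phase one splits the incoming spike buffer evenly and sorts each share; phase two lets each thread extract only its own contiguous section from every sorted share, so each spike is touched exactly twice.

```python
# Toy model of sort-then-deliver spike dispatch (illustrative only).
# A spike is (target_thread, synapse_type, target_neuron); sorting by this
# tuple groups spikes for the same target thread into contiguous runs.
import bisect

def sort_phase(spikes, n_threads):
    """Phase 1: each 'thread' sorts an equal share of the spike buffer."""
    share = (len(spikes) + n_threads - 1) // n_threads
    return [sorted(spikes[i * share:(i + 1) * share]) for i in range(n_threads)]

def deliver_phase(parts, n_threads):
    """Phase 2: each thread pulls only its own contiguous section per part."""
    delivered = {t: [] for t in range(n_threads)}
    for part in parts:
        for t in range(n_threads):
            lo = bisect.bisect_left(part, (t,))      # first spike for thread t
            hi = bisect.bisect_left(part, (t + 1,))  # one past the last
            delivered[t].extend(part[lo:hi])
    return delivered
```

In the original algorithm every thread would scan every spike and reject most of them; here the binary searches jump straight to each thread's section, which is the source of the instruction-count reduction the paper reports.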
30
Chen Y, Li J, Zhang Y, Zhang M, Sun Z, Jing G, Huang S, Su X. Parallel-Meta Suite: Interactive and rapid microbiome data analysis on multiple platforms. iMeta 2022; 1:e1. PMID: 38867729; PMCID: PMC10989749; DOI: 10.1002/imt2.1.
Abstract
Massive amounts of microbiome sequencing data have been generated, elucidating associations between microbes and environmental phenotypes such as host health or ecosystem status. Outstanding bioinformatic tools are the basis for deciphering the biological information hidden in microbiome data. However, most approaches are difficult for nonprofessional users to access, and computing throughput has become a significant bottleneck for many analytical pipelines processing large-scale datasets. In this study, we introduce Parallel-Meta Suite (PMS), an interactive software package for fast and comprehensive microbiome data analysis, visualization, and interpretation. It covers a wide array of functions for data preprocessing, statistics, and visualization with state-of-the-art algorithms in a user-friendly graphical interface, accessible to diverse users. To meet rapidly increasing computational demands, the entire PMS procedure has been optimized with a parallel computing scheme, enabling the rapid processing of thousands of samples. PMS is compatible with multiple platforms, and an installer is provided for fully automatic installation.
31
Oh S, Lee JH, Seo S, Choo H, Lee D, Cho JI, Park JH. Electrolyte-Gated Vertical Synapse Array based on Van Der Waals Heterostructure for Parallel Computing. Adv Sci (Weinh) 2022; 9:e2103808. PMID: 34957687; PMCID: PMC8867203; DOI: 10.1002/advs.202103808.
Abstract
Recently, three-terminal synaptic devices, which separate read and write terminals, have attracted significant attention because they enable nondestructive read-out and parallel access for updating synaptic weights. However, owing to their structural features, it is difficult to achieve device densities as high as those of two-terminal synaptic devices. In this study, a vertical synaptic device is developed featuring remotely controllable weight updates via the e-field-dependent movement of mobile ions in an ion-gel layer. This synaptic device demonstrates all essential synaptic characteristics in electrical measurements, such as excitatory/inhibitory postsynaptic current (E/IPSC), paired-pulse facilitation (PPF), and long-term potentiation/depression (LTP/D), and exhibits competitive LTP/D characteristics with a dynamic range (Gmax/Gmin) of 31.3 and an asymmetry (AS) of 8.56. The stability of the LTP/D characteristics is also verified through repeated measurements over 50 cycles; the relative standard deviations (RSDs) of Gmax/Gmin and AS are 1.65% and 0.25%, respectively. These synaptic properties enable a recognition rate of ≈99% in training and inference tasks on acoustic and emotional information patterns. This study is expected to provide an important foundation for the realization of future parallel computing networks for energy-efficient and high-speed data processing.
32
Heittmann A, Psychou G, Trensch G, Cox CE, Wilcke WW, Diesmann M, Noll TG. Simulating the Cortical Microcircuit Significantly Faster Than Real Time on the IBM INC-3000 Neural Supercomputer. Front Neurosci 2022; 15:728460. PMID: 35126034; PMCID: PMC8811464; DOI: 10.3389/fnins.2021.728460.
Abstract
This article employs the new IBM INC-3000 prototype FPGA-based neural supercomputer to implement a widely used model of the cortical microcircuit. With approximately 80,000 neurons and 300 million synapses, this model has become a benchmark network for comparing simulation architectures with regard to performance. To the best of our knowledge, the achieved speed-up factor is 2.4 times larger than the highest speed-up factor reported in the literature and four times larger than biological real time, demonstrating the potential of FPGA systems for neural modeling. The work was performed at Jülich Research Centre in Germany, and the INC-3000 was built at the IBM Almaden Research Center in San Jose, CA, United States. For the simulation of the microcircuit, only the programmable-logic part of the FPGA nodes is used. All arithmetic is implemented in single-precision floating point. The original microcircuit network, with linear LIF neurons and current-based exponential-decay-, alpha-function-, and beta-function-shaped synapses, was simulated using exact exponential integration as the ODE solver method. To demonstrate the flexibility of the approach, networks with non-linear neuron models (AdEx, Izhikevich) and conductance-based synapses were additionally simulated, applying Runge-Kutta and Parker-Sochacki solver methods. In all cases, the simulation-time speed-up factor decreased by no more than a few percent. The speed-up factor turns out to be limited essentially by the latency of the INC-3000 communication system.
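The "exact exponential integration" used for the linear LIF case can be illustrated for a single neuron with an exponential-decay synaptic current. This is a generic textbook sketch under assumed parameter names, not the INC-3000 FPGA code: between spikes the subthreshold system dV/dt = -V/tau_m + I/C, dI/dt = -I/tau_s is linear, so a step of size h has a closed-form propagator.

```python
# Hedged sketch of exact exponential integration for a linear LIF neuron
# with an exponential-decay synaptic current (requires tau_s != tau_m;
# the alpha-function case needs the tau_s -> tau_m limit instead).
import math

def exact_step(V, I, h, tau_m, tau_s, C):
    """Advance (V, I) by h using the closed-form solution of the linear ODEs.

    dV/dt = -V/tau_m + I/C,  dI/dt = -I/tau_s.
    """
    # Weight of the particular solution driven by the decaying current.
    A = I * tau_m * tau_s / (C * (tau_s - tau_m))
    V_new = (V - A) * math.exp(-h / tau_m) + A * math.exp(-h / tau_s)
    I_new = I * math.exp(-h / tau_s)
    return V_new, I_new
```

Because the update is exact for any h, the step size can match the communication interval without accuracy loss, which is what makes this solver attractive for fixed-timestep hardware.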
33
Lai X, Taskén HA, Mo T, Funke SW, Frigessi A, Rognes ME, Köhn-Luque A. A scalable solver for a stochastic, hybrid cellular automaton model of personalized breast cancer therapy. Int J Numer Method Biomed Eng 2022; 38:e3542. PMID: 34716985; DOI: 10.1002/cnm.3542.
Abstract
Mathematical modeling and simulation is a promising approach to personalized cancer medicine. Yet, the complexity, heterogeneity and multi-scale nature of cancer pose significant computational challenges. Coupling discrete cell-based models with continuous models using hybrid cellular automata (CA) is a powerful approach for mimicking biological complexity and describing the dynamical exchange of information across different scales. However, when clinically relevant cancer portions are taken into account, such models become computationally very expensive. While efficient parallelization techniques for continuous models exist, their coupling with discrete models, particularly CA, necessitates more elaborate solutions. Building upon FEniCS, a popular and powerful scientific computing platform for solving partial differential equations, we developed parallel algorithms to link stochastic CA with differential equations (https://bitbucket.org/HTasken/cansim). The algorithms minimize the communication between processes that share CA neighborhood values while also allowing for reproducibility during stochastic updates. We demonstrated the potential of our solution on a complex hybrid cellular automaton model of breast cancer treated with combination chemotherapy. On a single-core processor, we obtained nearly linear scaling with an increasing problem size, whereas weak parallel scaling showed moderate growth in solving time relative to increase in problem size. Finally, we applied the algorithm to a problem that is 500 times larger than previous work, allowing us to run personalized therapy simulations based on heterogeneous cell density and tumor perfusion conditions estimated from magnetic resonance imaging data on an unprecedented scale.
34
Yan W, Ansari S, Lamson A, Glaser MA, Blackwell R, Betterton MD, Shelley M. Toward the cellular-scale simulation of motor-driven cytoskeletal assemblies. eLife 2022; 11:e74160. PMID: 35617115; PMCID: PMC9135453; DOI: 10.7554/eLife.74160.
Abstract
The cytoskeleton - a collection of polymeric filaments, molecular motors, and crosslinkers - is a foundational example of active matter, and in the cell assembles into organelles that guide basic biological functions. Simulation of cytoskeletal assemblies is an important tool for modeling cellular processes and understanding their surprising material properties. Here, we present aLENS (a Living Ensemble Simulator), a novel computational framework designed to surmount the limits of conventional simulation methods. We model molecular motors with crosslinking kinetics that adhere to a thermodynamic energy landscape, and integrate the system dynamics while efficiently and stably enforcing hard-body repulsion between filaments. Molecular potentials are entirely avoided in imposing steric constraints. Utilizing parallel computing, we simulate tens to hundreds of thousands of cytoskeletal filaments and crosslinking motors, recapitulating emergent phenomena such as bundle formation and buckling. This simulation framework can help elucidate how motor type, thermal fluctuations, internal stresses, and confinement determine the evolution of cytoskeletal active matter.
35
Zhang S, Qiang Y. Fast parallel implementation for total variation constrained algebraic reconstruction technique. J Xray Sci Technol 2022; 30:737-750. PMID: 35527622; DOI: 10.3233/xst-221163.
Abstract
In computed tomography (CT), the total variation (TV) constrained algebraic reconstruction technique (ART) can achieve better reconstruction quality when the projection data are sparse and noisy. However, the ART-TV algorithm remains time-consuming since it requires large numbers of iterations, especially for the reconstruction of high-resolution images. In this work, we propose a fast algorithm to calculate the system matrix for the line-intersection model and apply this algorithm to perform the forward-projection and back-projection operations of the ART. We then utilize the parallel computing techniques of multithreading and graphics processing units (GPUs) to accelerate the ART iteration and the TV minimization, respectively. Numerical experiments show that our proposed parallel implementation is both efficient and accurate. For the reconstruction of a 2048 × 2048 image from 180 projection views of 2048 detector bins, one iteration of the ART-TV algorithm takes about 2.2 seconds using our approach on a ten-core platform. Experimental results demonstrate that our new approach achieves a speedup of 23 times over a conventional single-threaded CPU implementation that uses the Siddon algorithm.
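The ART iteration being accelerated here is a row-action (Kaczmarz-type) update over the system Ax = b, where each row of A holds one ray's intersection lengths with the image pixels. The serial core can be sketched as follows; this is only the textbook update, not the paper's fast system-matrix construction, its TV step, or its CPU/GPU parallelization.

```python
# Minimal serial sketch of one ART (Kaczmarz) sweep for Ax = b.
# Each row corresponds to one ray; x is the flattened image estimate.
def art_sweep(A, b, x, relax=1.0):
    """One pass over all rays: project x toward each row's hyperplane."""
    for row, bi in zip(A, b):
        norm_sq = sum(a * a for a in row)
        if norm_sq == 0.0:
            continue  # ray misses the image entirely
        resid = (bi - sum(a * xi for a, xi in zip(row, x))) / norm_sq
        x = [xi + relax * resid * a for a, xi in zip(row, x)]
    return x
```

Because each update only touches the pixels a single ray intersects, the forward- and back-projection inner products are the hot spots, which is why the paper parallelizes them across threads and on the GPU.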
36
Guerrero-Araya E, Muñoz M, Rodríguez C, Paredes-Sabja D. FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies. Bioinform Biol Insights 2021; 15:11779322211059238. PMID: 34866905; PMCID: PMC8637782; DOI: 10.1177/11779322211059238.
Abstract
Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available) detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, covering a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28,000 genomes in less than 10 minutes, reducing processing times by at least 3-fold, with 100% concordance to PubMLST if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.
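The divide-and-conquer strategy works because each assembly can be typed independently, so a process pool scales with available cores. The sketch below is purely illustrative: FastMLST really aligns alleles with BLASTn against PubMLST schemes, whereas here a toy scheme (`SCHEME`, `ST_TABLE`) and exact substring matching stand in for that step, and all names are invented for the example.

```python
# Hedged sketch of per-genome parallel MLST typing (toy scheme, exact
# substring matching in place of BLASTn; not the FastMLST implementation).
from multiprocessing import Pool

SCHEME = {  # locus -> {allele_sequence: allele_number}  (toy data)
    "adk": {"ATGC": 1, "ATGA": 2},
    "gyrB": {"GGCC": 1, "GGCA": 2},
}
ST_TABLE = {(1, 1): 1, (2, 2): 2}  # allele profile -> sequence type (toy)

def type_genome(genome):
    """Return (name, allelic_profile, ST) for one assembly; 0 = allele missing."""
    name, sequence = genome
    profile = tuple(
        next((num for allele, num in alleles.items() if allele in sequence), 0)
        for locus, alleles in sorted(SCHEME.items())
    )
    return name, profile, ST_TABLE.get(profile)

def type_all(genomes, workers=4):
    """Type every genome in parallel; each worker handles whole assemblies."""
    with Pool(workers) as pool:
        return pool.map(type_genome, genomes)
```

Since typing one genome never depends on another, the speedup is close to the number of cores until disk I/O dominates, which matches the "up to 28,000 genomes in under 10 minutes" figure in spirit.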
37
Iserte S, Carratalà P, Arnau R, Martínez-Cuenca R, Barreda P, Basiero L, Climent J, Chiva S. Modeling of wastewater treatment processes with HydroSludge. Water Environ Res 2021; 93:3049-3063. PMID: 34755418; DOI: 10.1002/wer.1656.
Abstract
The pressure on Water Resource Recovery Facility (WRRF) operators to treat wastewater efficiently is greater than ever because of the water crisis driven by climate change and increasingly restrictive regulations. Technicians and researchers need to evaluate WRRF performance to ensure maximum efficiency. For this purpose, numerical techniques such as CFD have been widely applied in the wastewater sector to model biological reactors and secondary settling tanks with high spatial and temporal accuracy. However, limitations such as complexity and a steep learning curve prevent CFD usage from spreading among wastewater modeling experts. This paper presents HydroSludge, a framework that provides a series of tools to simplify the implementation of processes and workflows in a WRRF. This work leverages HydroSludge to preprocess existing data, assist the meshing process, and perform CFD simulations. Its intuitive interface makes it an effective tool for increasing the efficiency of wastewater treatment. PRACTITIONER POINTS: This paper introduces a software platform specifically oriented to WRRFs, named HydroSludge, which provides easy access to the most widespread and leading CFD simulation software, OpenFOAM. HydroSludge is intended to be used by WRRF operators, offering wizard-like, automatic, and intuitive usage. Meshing assistance, submersible mixers, biological models, and distributed parallel computing are its most remarkable features. In the provided case studies, HydroSludge has proven to be a crucial tool for operators, managers, and researchers in WRRFs.
38
Cury LFM, Maso Talou GD, Younes-Ibrahim M, Blanco PJ. Parallel generation of extensive vascular networks with application to an archetypal human kidney model. R Soc Open Sci 2021; 8:210973. PMID: 34966553; PMCID: PMC8633801; DOI: 10.1098/rsos.210973.
Abstract
Given the relevance of the inextricable coupling between microcirculation and physiology, and the relation to organ function and disease progression, the construction of synthetic vascular networks for mathematical modelling and computer simulation is becoming an increasingly broad field of research. Building vascular networks that mimic in vivo morphometry is feasible through algorithms such as constrained constructive optimization (CCO) and variations. Nevertheless, these methods are limited by the maximum number of vessels to be generated due to the whole network update required at each vessel addition. In this work, we propose a CCO-based approach endowed with a domain decomposition strategy to concurrently create vascular networks. The performance of this approach is evaluated by analysing the agreement with the sequentially generated networks and studying the scalability when building vascular networks up to 200 000 vascular segments. Finally, we apply our method to vascularize a highly complex geometry corresponding to the cortex of a prototypical human kidney. The technique presented in this work enables the automatic generation of extensive vascular networks, removing the limitation from previous works. Thus, we can extend vascular networks (e.g. obtained from medical images) to pre-arteriolar level, yielding patient-specific whole-organ vascular models with an unprecedented level of detail.
39
Yang X, Wang W, Ma JL, Qiu YL, Lu K, Cao DS, Wu CK. BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution. Brief Bioinform 2021; 23:6440126. PMID: 34849567; PMCID: PMC8690188; DOI: 10.1093/bib/bbab491.
Abstract
Motivation: Understanding chemical–gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale, whereas computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular in the field of relation prediction; however, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions. Results: We developed BioNet, a deep biological network model with a graph encoder-decoder architecture. The graph encoder utilizes graph convolution, in two consecutive learning steps, to capture latent information embedded in the complex interactions among chemicals, genes, diseases and biological pathways. The embedded information learnt by the encoder is then employed to make multi-type interaction predictions between chemicals and genes with a tensor-decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm utilizing multiple Graphics Processing Units (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance, with a best area under the Receiver Operating Characteristic (ROC) curve of 0.952, significantly surpassing state-of-the-art methods. For further validation, the top CGIs predicted by BioNet for cancer and COVID-19 were verified against external curated data and published literature.
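The RESCAL-style decoder mentioned above scores an entity pair (i, j) under relation k as a bilinear form a_i^T R_k a_j, with one dense matrix R_k per interaction type. The sketch below illustrates only that scoring step; the random embeddings stand in for BioNet's graph-convolution encoder, and the function name is invented for this example.

```python
# Hedged sketch of RESCAL-style bilinear relation scoring (illustration
# of the decoder idea only, not BioNet's implementation or training code).
import numpy as np

def rescal_scores(A, R):
    """Return S with S[k, i, j] = A[i] @ R[k] @ A[j] for every relation k.

    A: (n_entities, dim) embedding matrix; R: (n_relations, dim, dim).
    """
    return np.stack([A @ Rk @ A.T for Rk in R])

rng = np.random.default_rng(42)
A = rng.normal(size=(5, 3))    # 5 entities, 3-dim embeddings (toy encoder output)
R = rng.normal(size=(2, 3, 3))  # 2 interaction types
S = rescal_scores(A, R)         # S[k] is the full score matrix for relation k
```

In a trained model, high entries of S[k] are read as predicted interactions of type k; factorizing each relation as A R_k A^T is what lets one shared embedding space serve many interaction types.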
40
Shuai M, He D, Chen X. Optimizing weighted gene co-expression network analysis with a multi-threaded calculation of the topological overlap matrix. Stat Appl Genet Mol Biol 2021; 20:145-153. PMID: 34757703; DOI: 10.1515/sagmb-2021-0025.
Abstract
Biomolecular networks are often assumed to be scale-free hierarchical networks. Weighted gene co-expression network analysis (WGCNA) treats gene co-expression networks as undirected, scale-free, hierarchical weighted networks. The WGCNA R package stores a network as an adjacency matrix, calculates the topological overlap matrix (TOM) from it, and then identifies modules (sub-networks), each assumed to be associated with a certain biological function. The most time-consuming step of WGCNA is the single-threaded calculation of the TOM from the adjacency matrix. In this paper, the single-threaded TOM algorithm has been turned into a multi-threaded one (with the parameters kept at the WGCNA defaults). In the multi-threaded algorithm, Rcpp is used to let R call a C++ function, and the C++ code uses OpenMP to start multiple threads that compute the TOM from the adjacency matrix. On shared-memory multiprocessor systems, the calculation time decreases as the number of CPU cores increases. The algorithm presented here can promote the application of WGCNA to large data sets and help other research fields identify sub-networks in undirected scale-free hierarchical weighted networks. The source code and usage instructions are available at https://github.com/do-somethings-haha/multi-threaded_calculate_unsigned_TOM_from_unsigned_or_signed_Adjacency_Matrix_of_WGCNA.
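The quantity being parallelized is the standard unsigned TOM: for nodes i and j, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums the weights of shared neighbours and k_i is node connectivity. The sketch below is a vectorized NumPy illustration of that formula (assuming WGCNA's default unsigned settings); the BLAS-backed matrix product merely stands in for the paper's Rcpp/OpenMP inner loops.

```python
# Hedged NumPy sketch of the unsigned topological overlap matrix (TOM).
# Illustrates the formula only; not the Rcpp/OpenMP implementation.
import numpy as np

def tom(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1.

    `adj` must be symmetric with zero diagonal and entries in [0, 1].
    """
    k = adj.sum(axis=1)            # connectivity of each node
    shared = adj @ adj             # l_ij: shared-neighbour weight (zero diagonal
                                   # of adj excludes the i and j terms itself)
    min_k = np.minimum.outer(k, k)
    t = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(t, 1.0)
    return t
```

The `adj @ adj` product is the O(n^3) hot spot that dominates runtime on large gene sets, which is exactly where the paper's multi-threading pays off.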
41
Ko S, Li GX, Choi H, Won JH. Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx. Brief Bioinform 2021; 22:bbab256. PMID: 34254998; PMCID: PMC8575036; DOI: 10.1093/bib/bbab256.
Abstract
Statistical analysis of ultrahigh-dimensional omics-scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype-phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc but not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations.
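The core update inside a group-lasso solver of the kind ParProx optimizes is the blockwise shrinkage (proximal) operator, which zeroes whole variable groups at once. A minimal sketch for the non-overlapping case (the function name and dense-NumPy formulation are illustrative, not ParProx's API):

```python
import numpy as np

def group_prox(beta, groups, lam, step):
    """Proximal operator of the non-overlapping group-lasso penalty.

    Shrinks each coefficient block toward zero:
        prox(b_g) = max(0, 1 - step*lam / ||b_g||_2) * b_g
    Blocks whose norm falls below step*lam are zeroed exactly,
    which is what selects or discards whole variable groups.
    """
    out = np.array(beta, dtype=float)
    for g in groups:                      # g: index array for one group
        norm = np.linalg.norm(out[g])
        scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        out[g] = scale * out[g]
    return out
```

Overlapping groups, which ParProx also supports, are typically handled by duplicating shared variables into a latent representation (as the abstract's "latent variable group representation" suggests) so the same blockwise operator applies.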
|
42
|
Lu H, Wei Z, Wang C, Guo J, Zhou Y, Wang Z, Liu H. Redesigning Vina@QNLM for Ultra-Large-Scale Molecular Docking and Screening on a Sunway Supercomputer. Front Chem 2021; 9:750325. [PMID: 34778205 PMCID: PMC8581564 DOI: 10.3389/fchem.2021.750325] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 09/14/2021] [Indexed: 11/28/2022] Open
Abstract
Ultra-large-scale molecular docking can improve the accuracy of lead compounds in drug discovery. In this study, we developed a molecular docking program, Vina@QNLM, which can use more than 480,000 parallel processes to search for potential lead compounds among hundreds of millions of compounds. We proposed a task-scheduling mechanism for large-scale parallelism based on Vinardo and the Sunway supercomputer architecture. We then redesigned the core docking algorithm to take full advantage of the heterogeneous multicore processor architecture for intensive computing. We successfully scaled it to 10,465,065 cores (161,001 management processing elements and 10,304,064 computing processing elements), with a strong-scaling efficiency of 55.92%. To the best of our knowledge, this is the first time 10 million cores have been used for molecular docking on Sunway. The introduction of the heterogeneous multicore processor architecture achieved the best speedup, 11× that obtained with Sunway's management processing elements alone. The performance of Vina@QNLM was comprehensively evaluated on the CASF-2013 and CASF-2016 protein-ligand benchmarks, and its screening power was the highest of the 27 programs tested on the CASF-2013 benchmark. In existing applications, we used Vina@QNLM to dock more than 10 million molecules to nine rigid proteins related to SARS-CoV-2 within 8.5 h on 10 million cores. We also developed a platform for the general public to use the software.
|
43
|
Thomine O, Alizon S, Boennec C, Barthelemy M, Sofonea M. Emerging dynamics from high-resolution spatial numerical epidemics. eLife 2021; 10:71417. [PMID: 34652271 PMCID: PMC8568339 DOI: 10.7554/elife.71417] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022] Open
Abstract
Simulating nationwide realistic individual movements with a detailed geographical structure can help optimise public health policies. However, existing tools have limited resolution or can only account for a limited number of agents. We introduce Epidemap, a new framework that can capture the daily movement of more than 60 million people in a country at a building-level resolution in a realistic and computationally efficient way. By applying it to the case of an infectious disease spreading in France, we uncover hitherto neglected effects, such as the emergence of two distinct peaks in the daily number of cases or the importance of local density in the timing of arrival of the epidemic. Finally, we show that the importance of super-spreading events strongly varies over time.
|
44
|
Parallel Algorithm on GPU for Wireless Sensor Data Acquisition Using a Team of Unmanned Aerial Vehicles. SENSORS 2021; 21:s21206851. [PMID: 34696064 PMCID: PMC8541541 DOI: 10.3390/s21206851] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 09/30/2021] [Accepted: 10/12/2021] [Indexed: 11/16/2022]
Abstract
This paper proposes a framework for wireless sensor data acquisition using a team of Unmanned Aerial Vehicles (UAVs). Scattered over a terrain, the sensors detect information about their surroundings and can transmit it wirelessly over a short range. With no terrestrial or satellite communication network to relay the information to, UAVs are used to visit the sensors and collect the data. The proposed framework uses an iterative k-means algorithm to group the sensors into clusters and to identify Download Points (DPs) where the UAVs hover to download the data. A Single-Source Shortest-Path (SSSP) algorithm is used to compute optimal paths between every pair of DPs, with a constraint to reduce the number of turns. A genetic algorithm supplemented with a 2-opt local search heuristic is used to solve the multi-travelling salesperson problem and to find an optimized tour for each UAV. Finally, a collision avoidance strategy is implemented to guarantee collision-free trajectories. To reduce the overall runtime of the framework, the SSSP algorithm is implemented in parallel on a graphics processing unit. The proposed framework is tested in simulation using three UAVs and realistic 3D maps with up to 100 sensors, and runs in just 20.7 s, a 33.3× speed-up compared to a sequential execution on CPU. The results show that the proposed method is efficient at calculating optimized trajectories for the UAVs for data acquisition from wireless sensors, and demonstrate the significant advantage of the parallel implementation on GPU.
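The clustering stage of such a pipeline is standard. A minimal sketch of Lloyd's k-means over sensor coordinates, whose centroids serve as candidate download points (illustrative only; the paper's iterative variant and DP placement details are not reproduced here):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means: cluster sensor positions; the returned
    centroids act as candidate download points (DPs)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each sensor to its nearest centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster went empty
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

In practice the hover height and communication radius would then determine whether each centroid is a feasible DP, which is where an iterative refinement such as the paper's comes in.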
|
45
|
Goicovich I, Olivares P, Román C, Vázquez A, Poupon C, Mangin JF, Guevara P, Hernández C. Fiber Clustering Acceleration With a Modified Kmeans++ Algorithm Using Data Parallelism. Front Neuroinform 2021; 15:727859. [PMID: 34539370 PMCID: PMC8445177 DOI: 10.3389/fninf.2021.727859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2021] [Accepted: 08/10/2021] [Indexed: 11/13/2022] Open
Abstract
Fiber clustering methods are typically used in brain research to study the organization of white matter bundles from large diffusion MRI tractography datasets. These methods enable exploratory bundle inspection using visualization and other methods that require identifying brain white matter structures in individuals or a population. Some applications, such as real-time visualization and inter-subject clustering, need fast, high-quality intra-subject clustering algorithms. This work proposes a parallel algorithm using a General Purpose Graphics Processing Unit (GPGPU) for fiber clustering based on the FFClust algorithm. The proposed GPGPU implementation exploits data parallelism using both the multicore and GPU fine-grained parallelism present in commodity architectures, including current laptops and desktop computers. Our approach implements all FFClust steps in parallel, improving execution times in all of them. In addition, our parallel approach includes a parallel Kmeans++ implementation and defines a new variant of Kmeans++ to reduce the impact of choosing outliers as initial centroids. The results show that our approach provides clustering quality very similar to FFClust and requires an execution time of 3.5 s to process about a million fibers, achieving a speedup of 11.5× compared to FFClust.
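For reference, the standard Kmeans++ seeding that the authors modify draws each new centroid with probability proportional to its squared distance from the nearest centroid already chosen (the D² distribution). A minimal sketch of the standard scheme (the function name is mine; the paper's outlier-robust variant is not reproduced):

```python
import numpy as np

def kmeanspp_init(points, k, rng=None):
    """Standard k-means++ seeding: after a uniformly random first
    centroid, each subsequent centroid is drawn with probability
    proportional to the squared distance to the nearest centroid
    chosen so far."""
    rng = rng or np.random.default_rng(0)
    centroids = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen centroid
        d2 = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()
        centroids.append(points[rng.choice(len(points), p=probs)])
    return np.array(centroids)
```

Because the D² distribution favors far-away points, it is prone to picking outliers as seeds, which is exactly the behavior the paper's variant is designed to mitigate.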
|
46
|
Lebedev I, Lovskaya D, Mochalova M, Mitrofanov I, Menshutina N. Cellular Automata Modeling of Three-Dimensional Chitosan-Based Aerogels Fiberous Structures with Bezier Curves. Polymers (Basel) 2021; 13:polym13152511. [PMID: 34372113 PMCID: PMC8348900 DOI: 10.3390/polym13152511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Revised: 07/27/2021] [Accepted: 07/28/2021] [Indexed: 11/16/2022] Open
Abstract
In this work, a cellular automata approach was investigated for modeling three-dimensional fibrous nanoporous aerogel structures. A model for the generation of fibrous structures using the Bezier curves is proposed. Experimental chitosan-based aerogel particles were obtained for which analytical studies of the structural characteristics were carried out. The data obtained were used to generate digital copies of chitosan-based aerogel structures and to assess the accuracy of the developed model. The obtained digital copies of chitosan-based aerogel structures will be used to create digital copies of aerogel structures with embedded active pharmaceutical ingredients (APIs) and further predict the release of APIs from these structures.
|
47
|
Nilforooshan MA, Garrick D, Harris B. Alternative Ways of Computing the Numerator Relationship Matrix. Front Genet 2021; 12:655638. [PMID: 34394180 PMCID: PMC8356081 DOI: 10.3389/fgene.2021.655638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Accepted: 06/17/2021] [Indexed: 11/25/2022] Open
Abstract
Pedigree relationships between every pair of individuals form the elements of the additive genetic relationship matrix (A). Calculating A−1 does not require forming and inverting A, and it is faster and easier than calculating A. Although A−1 is used in best linear unbiased prediction of genetic merit, A is used in population studies and post-evaluation procedures, such as breeding programs and controlling the rate of inbreeding. Three pedigrees with 20,000 animals (20K) and different litter sizes (1, 2, 4), and a pedigree with 180,000 animals (180K) and litter size 2, were simulated. Aiming to reduce the computation time for calculating A, new methods [an array-tabular method, using (T−1)−1 instead of T in Thompson's method, iterative updating of D in Thompson's method, and iteration by generation] were developed and compared with existing methods. The methods were coded in the R programming language to demonstrate the algorithms, with the aim of minimizing computational time. For the 20K pedigrees, computational time decreased with increasing litter size for most of the methods. Methods deriving A from A−1 were relatively slow. The other methods used either pedigree information alone or both the pedigree and inbreeding coefficients. Calculating inbreeding coefficients was extremely fast (<0.2 s for 180K). Parallel computing (15 cores) was adopted for methods based on solving A−1 for columns of A, as those methods allow implicit parallelism. Optimizing the code for one of the earliest methods enabled A to be built in 13 s (faster than the 31 s needed to calculate A−1) for 20K, and in 17 min 3 s for 180K. Memory is a bottleneck for large pedigrees, but attempts to reduce memory usage increased the computational time. To reduce disk space usage, memory usage, and computational time, relationship coefficients of old animals in the pedigree can be archived, and relationship coefficients for parents of the next generation can be saved in an external file for successive updates to the pedigree and the A matrix.
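The classical baseline such methods improve on is the recursive tabular method, which fills A row by row from the pedigree. A minimal sketch (illustrative; not the authors' optimized R code):

```python
import numpy as np

def numerator_relationship_matrix(pedigree):
    """Recursive (tabular) method for the numerator relationship matrix A.

    `pedigree` is a list of (sire, dam) index pairs, one per animal,
    ordered so parents precede their offspring; None marks an unknown
    parent.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        # off-diagonals: a_ij = 0.5 * (a_{j,sire} + a_{j,dam})
        for j in range(i):
            A[i, j] = A[j, i] = 0.5 * ((A[j, s] if s is not None else 0.0) +
                                       (A[j, d] if d is not None else 0.0))
        # diagonal: a_ii = 1 + F_i, with inbreeding F_i = 0.5 * a_{sire,dam}
        A[i, i] = 1.0 + (0.5 * A[s, d]
                         if s is not None and d is not None else 0.0)
    return A
```

The row-by-row dependence of this recursion is what makes the single-threaded version slow for large pedigrees and motivates the column-wise and generation-wise reformulations the paper compares.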
|
48
|
Dumont AP, Fang Q, Patil CA. A computationally efficient Monte-Carlo model for biomedical Raman spectroscopy. JOURNAL OF BIOPHOTONICS 2021; 14:e202000377. [PMID: 33733621 PMCID: PMC10069992 DOI: 10.1002/jbio.202000377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 05/29/2023]
Abstract
Monte Carlo (MC) modeling is a valuable tool to gain fundamental understanding of light-tissue interactions, provide guidance and assessment for optical instrument designs, and help analyze experimental data. Efficiently extending MC toward modeling of bulk-tissue Raman spectroscopy (RS) has been a major challenge due to the wide spectral range, relatively sharp spectral features, and presence of background autofluorescence. Here, we report a computationally efficient MC approach for RS by adapting the massively parallel Monte Carlo eXtreme (MCX) simulator. Simulation efficiency is achieved through "isoweight," a novel approach that combines the statistical generation of Raman-scattered and fluorescence emission with a lookup-table-based technique well suited for parallelization. The MC model uses a graphics processor to produce dense Raman and fluorescence spectra over a range of 800–2000 cm−1 with an approximately 100× increase in speed over prior RS Monte Carlo methods. The simulated RS signals are compared against experimentally collected spectra from gelatin phantoms, showing a strong correlation.
|
49
|
Lin Z, Chen R, Gao B, Qin S, Wu B, Liu J, Cai XC. A highly parallel simulation of patient-specific hepatic flows. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2021; 37:e3451. [PMID: 33609008 DOI: 10.1002/cnm.3451] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2020] [Revised: 01/29/2021] [Accepted: 02/13/2021] [Indexed: 06/12/2023]
Abstract
Computational hemodynamics is being developed as an alternative approach for assisting clinical diagnosis and treatment planning for liver diseases. The technology is non-invasive, but the computational time can be high when the full geometry of the blood vessels is taken into account. Existing approaches use either a one-dimensional model of the artery or a simplified three-dimensional tubular geometry to reduce the computational time, but accuracy is sometimes compromised, for example, when simulating blood flows in arteries with plaque. In this work, we study a highly parallel method for the transient incompressible Navier-Stokes equations for simulating blood flows in the full three-dimensional patient-specific hepatic artery, portal vein and hepatic vein. As applications, we also simulate the flow in a patient with hepatectomy and calculate the portal pressure gradient (PPG). One advantage of simulating blood flows in all hepatic vessels is that it provides a direct estimate of the PPG, which is the gold-standard value for assessing portal hypertension. Moreover, the robustness and scalability of the algorithm are also investigated. An 83% parallel efficiency is achieved for solving a problem with 7 million elements on a supercomputer with more than 1000 processor cores.
|
50
|
He W, Yang D, Peng H, Liang S, Lin Y. An Efficient Ensemble Binarized Deep Neural Network on Chip with Perception-Control Integrated. SENSORS 2021; 21:s21103407. [PMID: 34068351 PMCID: PMC8153352 DOI: 10.3390/s21103407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 05/03/2021] [Accepted: 05/10/2021] [Indexed: 11/18/2022]
Abstract
Lightweight UAVs equipped with deep learning models have become a trend; they can be deployed for automatic navigation in a wide range of civilian and military missions. However, real-time applications usually need to process a large amount of image data, which leads to very high computational complexity and storage consumption and restricts deployment on resource-constrained embedded edge devices. To reduce the computing requirements and storage occupancy of the neural network model, we proposed the ensemble binarized DroNet (EBDN) model, which reconstructs DroNet with binarization and ensemble learning: binarization effectively compresses the model size of DroNet, while ensembling overcomes the poor performance of the low-precision network. Compared to the original DroNet, EBDN reduces the memory footprint by more than 7× with similar model accuracy. Meanwhile, we also proposed a novel, high-efficiency hardware architecture to realize EBDN on chip (the EBDNoC system), which maps the algorithm model onto the hardware architecture. Compared to other solutions, the proposed architecture achieves about 10.21 GOP/s/kLUTs resource efficiency and 208.1 GOP/s/W energy efficiency, while providing a good trade-off between model performance and resource utilization.
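The two ideas EBDN combines, 1-bit weights/activations and ensembling, can be illustrated in a few lines. A toy sketch (function names and the single-layer setup are mine; the real DroNet is a convolutional network):

```python
import numpy as np

def binarize(x):
    """Binarize to {-1, +1} (zero maps to +1)."""
    return np.where(x >= 0, 1.0, -1.0)

def ebdn_forward(x, weight_sets):
    """Forward pass of a toy ensemble of binarized linear members.

    Each member binarizes its weights and the input and computes
    sign(W_b @ x_b) -- realizable in hardware as XNOR plus popcount --
    and the ensemble averages the members' outputs, recovering accuracy
    lost to 1-bit precision.
    """
    xb = binarize(x)
    votes = [binarize(binarize(W) @ xb) for W in weight_sets]
    return np.mean(votes, axis=0)
```

The hardware appeal is that each member's multiply-accumulate collapses to bitwise operations, so adding ensemble members costs far less than restoring full-precision arithmetic would.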
|