26
Lai X, Taskén HA, Mo T, Funke SW, Frigessi A, Rognes ME, Köhn-Luque A. A scalable solver for a stochastic, hybrid cellular automaton model of personalized breast cancer therapy. International Journal for Numerical Methods in Biomedical Engineering 2022; 38:e3542. [PMID: 34716985] [DOI: 10.1002/cnm.3542]
Abstract
Mathematical modeling and simulation is a promising approach to personalized cancer medicine. Yet, the complexity, heterogeneity and multi-scale nature of cancer pose significant computational challenges. Coupling discrete cell-based models with continuous models using hybrid cellular automata (CA) is a powerful approach for mimicking biological complexity and describing the dynamical exchange of information across different scales. However, when clinically relevant cancer portions are taken into account, such models become computationally very expensive. While efficient parallelization techniques for continuous models exist, their coupling with discrete models, particularly CA, necessitates more elaborate solutions. Building upon FEniCS, a popular and powerful scientific computing platform for solving partial differential equations, we developed parallel algorithms to link stochastic CA with differential equations (https://bitbucket.org/HTasken/cansim). The algorithms minimize the communication between processes that share CA neighborhood values while also allowing for reproducibility during stochastic updates. We demonstrated the potential of our solution on a complex hybrid cellular automaton model of breast cancer treated with combination chemotherapy. On a single-core processor, we obtained nearly linear scaling with an increasing problem size, whereas weak parallel scaling showed moderate growth in solving time relative to increase in problem size. Finally, we applied the algorithm to a problem that is 500 times larger than previous work, allowing us to run personalized therapy simulations based on heterogeneous cell density and tumor perfusion conditions estimated from magnetic resonance imaging data on an unprecedented scale.
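The reproducibility the authors highlight (stochastic updates that give the same result however the CA grid is split across processes) is commonly obtained by seeding a generator per (time step, cell) pair. Below is a minimal serial sketch of that idea; the update rule and the death probability are invented for illustration and are not taken from the paper:

```python
import numpy as np

def ca_step(grid, t, p_death=0.1):
    """One stochastic CA update. Each cell draws its randomness from a
    generator seeded by (time step, cell index), so the outcome does not
    depend on the order in which cells are visited, and hence not on how
    cells are distributed across parallel processes."""
    out = grid.copy()
    flat = out.ravel()
    for idx in range(flat.size):
        rng = np.random.default_rng([t, idx])  # deterministic per (step, cell)
        if flat[idx] == 1 and rng.random() < p_death:
            flat[idx] = 0  # toy rule: an occupied cell dies with p_death
    return out
```

Because each cell's random stream depends only on (t, idx), two workers updating disjoint halves of the grid reproduce exactly the state a serial sweep would produce.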
27
Yan W, Ansari S, Lamson A, Glaser MA, Blackwell R, Betterton MD, Shelley M. Toward the cellular-scale simulation of motor-driven cytoskeletal assemblies. eLife 2022; 11:74160. [PMID: 35617115] [PMCID: PMC9135453] [DOI: 10.7554/elife.74160]
Abstract
The cytoskeleton - a collection of polymeric filaments, molecular motors, and crosslinkers - is a foundational example of active matter, and in the cell assembles into organelles that guide basic biological functions. Simulation of cytoskeletal assemblies is an important tool for modeling cellular processes and understanding their surprising material properties. Here, we present aLENS (a Living Ensemble Simulator), a novel computational framework designed to surmount the limits of conventional simulation methods. We model molecular motors with crosslinking kinetics that adhere to a thermodynamic energy landscape, and integrate the system dynamics while efficiently and stably enforcing hard-body repulsion between filaments. Molecular potentials are entirely avoided in imposing steric constraints. Utilizing parallel computing, we simulate tens to hundreds of thousands of cytoskeletal filaments and crosslinking motors, recapitulating emergent phenomena such as bundle formation and buckling. This simulation framework can help elucidate how motor type, thermal fluctuations, internal stresses, and confinement determine the evolution of cytoskeletal active matter.
28
Zhang S, Qiang Y. Fast parallel implementation for total variation constrained algebraic reconstruction technique. Journal of X-Ray Science and Technology 2022; 30:737-750. [PMID: 35527622] [DOI: 10.3233/xst-221163]
Abstract
In computed tomography (CT), the total variation (TV) constrained algebraic reconstruction technique (ART) can obtain better reconstruction quality when the projection data are sparse and noisy. However, the ART-TV algorithm remains time-consuming since it requires large numbers of iterations, especially for the reconstruction of high-resolution images. In this work, we propose a fast algorithm to calculate the system matrix for the line-intersection model, and apply this algorithm to perform the forward-projection and back-projection operations of the ART. Then, we utilize the parallel computing techniques of multithreading and graphics processing units (GPUs) to accelerate the ART iteration and the TV minimization, respectively. Numerical experiments show that our proposed parallel implementation is efficient and accurate. For the reconstruction of a 2048 × 2048 image from 180 projection views of 2048 detector bins, it takes about 2.2 seconds to perform one iteration of the ART-TV algorithm using our approach on a ten-core platform. Experimental results demonstrate that our new approach achieves a speedup of 23 times over a conventional single-threaded CPU implementation that uses the Siddon algorithm.
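The two kernels being accelerated, the ART sweep and the TV minimization, can be sketched in a few lines of serial NumPy. This is an illustrative toy with periodic boundaries and arbitrary parameter values, not the paper's line-intersection, multithreaded GPU implementation:

```python
import numpy as np

def art_sweep(A, b, x, relax=1.0):
    # One ART (Kaczmarz) sweep: project x onto each measurement hyperplane.
    for i in range(A.shape[0]):
        a = A[i]
        nrm2 = a @ a
        if nrm2 > 0:
            x = x + relax * (b[i] - a @ x) / nrm2 * a
    return x

def tv_descent(img, n_steps=10, step=0.05, eps=1e-8):
    # Gradient descent on smoothed isotropic TV with periodic boundaries.
    u = img.astype(float)
    for _ in range(n_steps):
        gx = np.roll(u, -1, axis=1) - u          # forward differences
        gy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(gx**2 + gy**2 + eps)
        nx, ny = gx / mag, gy / mag
        div = (nx - np.roll(nx, 1, axis=1)) + (ny - np.roll(ny, 1, axis=0))
        u = u + step * div                       # -grad TV = div(grad u / |grad u|)
    return u
```

ART-TV alternates the two: a few Kaczmarz sweeps to fit the projections, then a few TV descent steps to suppress the noise those sweeps amplify.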
29
Guerrero-Araya E, Muñoz M, Rodríguez C, Paredes-Sabja D. FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies. Bioinform Biol Insights 2021; 15:11779322211059238. [PMID: 34866905] [PMCID: PMC8637782] [DOI: 10.1177/11779322211059238]
Abstract
Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available) detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28,000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST
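The divide-and-conquer scheme, in which each assembly is typed independently and in parallel, can be sketched as follows. The scheme, allele sequences, and exact substring matching are toy stand-ins (FastMLST matches alleles with BLASTn against real PubMLST schemes and parallelises across processes rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy MLST scheme: locus -> {allele_number: sequence}. Real schemes come
# from PubMLST; exact substring matching here is an illustrative stand-in
# for a BLASTn search.
SCHEME = {
    "adk":  {1: "ATGCGT", 2: "ATGAAA"},
    "gyrB": {1: "GGCCTA", 2: "GGCTTT"},
}
PROFILES = {(1, 1): "ST-1", (2, 2): "ST-2"}  # allele combination -> sequence type

def type_genome(assembly):
    # Find, for each locus, which allele occurs in the assembly.
    profile = []
    for locus, alleles in SCHEME.items():
        hit = next((n for n, seq in alleles.items() if seq in assembly), None)
        profile.append(hit)
    return PROFILES.get(tuple(profile), "unknown")

def type_all(assemblies, workers=4):
    # Divide and conquer: genomes are typed independently, so the
    # per-genome work parallelises trivially across a worker pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(type_genome, assemblies))
```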
30
Iserte S, Carratalà P, Arnau R, Martínez-Cuenca R, Barreda P, Basiero L, Climent J, Chiva S. Modeling of wastewater treatment processes with HydroSludge. Water Environment Research 2021; 93:3049-3063. [PMID: 34755418] [DOI: 10.1002/wer.1656]
Abstract
The pressure on Water Resource Recovery Facility (WRRF) operators to treat wastewater efficiently is greater than ever because of the water crisis, driven by the effects of climate change and by increasingly restrictive regulations. Technicians and researchers need to evaluate WRRF performance to ensure maximum efficiency. For this purpose, numerical techniques such as CFD have been widely applied in the wastewater sector to model biological reactors and secondary settling tanks with high spatial and temporal accuracy. However, limitations such as complexity and a steep learning curve prevent wider CFD usage among wastewater modeling experts. This paper presents HydroSludge, a framework that provides a series of tools that simplify the implementation of the processes and workflows in a WRRF. This work leverages HydroSludge to preprocess existing data, aid the meshing process, and perform CFD simulations. Its intuitive interface proves to be an effective tool for increasing the efficiency of wastewater treatment. PRACTITIONER POINTS: This paper introduces a software platform specifically oriented to WRRFs, named HydroSludge, which provides easy access to the most widespread and leading CFD simulation software, OpenFOAM. HydroSludge is intended to be used by WRRF operators, bringing a more wizard-like, automatic, and intuitive usage. Meshing assistance, submersible mixers, biological models, and distributed parallel computing are the most remarkable features included in HydroSludge. With the provided study cases, HydroSludge has proven to be a crucial tool for operators, managers, and researchers in WRRFs.
31
Cury LFM, Maso Talou GD, Younes-Ibrahim M, Blanco PJ. Parallel generation of extensive vascular networks with application to an archetypal human kidney model. Royal Society Open Science 2021; 8:210973. [PMID: 34966553] [PMCID: PMC8633801] [DOI: 10.1098/rsos.210973]
Abstract
Given the relevance of the inextricable coupling between microcirculation and physiology, and the relation to organ function and disease progression, the construction of synthetic vascular networks for mathematical modelling and computer simulation is becoming an increasingly broad field of research. Building vascular networks that mimic in vivo morphometry is feasible through algorithms such as constrained constructive optimization (CCO) and variations. Nevertheless, these methods are limited by the maximum number of vessels to be generated due to the whole network update required at each vessel addition. In this work, we propose a CCO-based approach endowed with a domain decomposition strategy to concurrently create vascular networks. The performance of this approach is evaluated by analysing the agreement with the sequentially generated networks and studying the scalability when building vascular networks up to 200 000 vascular segments. Finally, we apply our method to vascularize a highly complex geometry corresponding to the cortex of a prototypical human kidney. The technique presented in this work enables the automatic generation of extensive vascular networks, removing the limitation from previous works. Thus, we can extend vascular networks (e.g. obtained from medical images) to pre-arteriolar level, yielding patient-specific whole-organ vascular models with an unprecedented level of detail.
32
Yang X, Wang W, Ma JL, Qiu YL, Lu K, Cao DS, Wu CK. BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution. Brief Bioinform 2021; 23:6440126. [PMID: 34849567] [PMCID: PMC8690188] [DOI: 10.1093/bib/bbab491]
Abstract
Motivation: Understanding chemical–gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. On the contrary, computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions. Results: We developed BioNet, a deep biological network model with a graph encoder–decoder architecture. The learning process consists of two consecutive steps. First, the graph encoder utilizes graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. Then, the embedded information learnt by the encoder is employed to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79,325 entities as nodes and 34,005,501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm utilizing multiple Graphics Processing Units (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance, with a best area under the Receiver Operating Characteristic (ROC) curve of 0.952, which significantly surpasses state-of-the-art methods. For further validation, top predicted CGIs of cancer and COVID-19 by BioNet were verified against external curated data and published literature.
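The RESCAL-style decoder scores a candidate chemical–gene pair for each relation type with a bilinear form over the learnt embeddings. A minimal sketch of that scoring step (embedding sizes and function names are illustrative, not BioNet's actual code):

```python
import numpy as np

def rescal_scores(E, R):
    """RESCAL decoder: given entity embeddings E (n x d) and per-relation
    mixing matrices R (r x d x d), return the score tensor S with
    S[k, i, j] = E[i] @ R[k] @ E[j], the model's confidence that relation
    k holds between entities i and j."""
    return np.einsum('id,kde,je->kij', E, R, E)

def predict_proba(E, R):
    # Squash raw scores to (0, 1) for multi-type interaction prediction.
    return 1.0 / (1.0 + np.exp(-rescal_scores(E, R)))
```

In training, E and R are fitted so that observed edges score high and sampled non-edges score low; prediction then ranks unseen chemical–gene pairs by these scores.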
33
Shuai M, He D, Chen X. Optimizing weighted gene co-expression network analysis with a multi-threaded calculation of the topological overlap matrix. Stat Appl Genet Mol Biol 2021; 20:145-153. [PMID: 34757703] [DOI: 10.1515/sagmb-2021-0025]
Abstract
Biomolecular networks are often assumed to be scale-free hierarchical networks. The weighted gene co-expression network analysis (WGCNA) treats gene co-expression networks as undirected scale-free hierarchical weighted networks. The WGCNA R software package uses an adjacency matrix to store a network, next calculates the topological overlap matrix (TOM), and then identifies the modules (sub-networks), where each module is assumed to be associated with a certain biological function. The most time-consuming step of WGCNA is calculating the TOM from the adjacency matrix in a single thread. In this paper, the single-threaded algorithm for the TOM has been changed into a multi-threaded algorithm (the parameters are the default values of WGCNA). In the multi-threaded algorithm, Rcpp is used to let R call a C++ function, and the C++ code uses OpenMP to start multiple threads that calculate the TOM from the adjacency matrix. On shared-memory multiprocessor systems, the calculation time decreases as the number of CPU cores increases. The algorithm of this paper can promote the application of WGCNA to large data sets, and help other research fields to identify sub-networks in undirected scale-free hierarchical weighted networks. The source codes and usage are available at https://github.com/do-somethings-haha/multi-threaded_calculate_unsigned_TOM_from_unsigned_or_signed_Adjacency_Matrix_of_WGCNA.
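The quantity being parallelized is, for WGCNA's unsigned default, a closed-form function of the adjacency matrix: TOM_ij = (L_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where L = A² counts shared-neighbour weight and k is connectivity. A vectorized single-threaded NumPy sketch of that formula (the paper's speedup comes from computing the same quantity with OpenMP threads in C++):

```python
import numpy as np

def tom_unsigned(adj):
    # Topological overlap matrix for an unsigned adjacency matrix
    # (WGCNA default): TOM_ij = (L_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    # where L = A @ A accumulates shared-neighbour weight and k is the
    # node connectivity (row sum).
    a = np.array(adj, dtype=float)    # copy: the diagonal is zeroed below
    np.fill_diagonal(a, 0.0)          # self-connections are excluded
    k = a.sum(axis=0)
    shared = a @ a
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)        # by convention TOM_ii = 1
    return tom
```

The matrix product `a @ a` dominates the cost, which is why the computation tiles and threads so well.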
34
Ko S, Li GX, Choi H, Won JH. Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx. Brief Bioinform 2021; 22:bbab256. [PMID: 34254998] [PMCID: PMC8575036] [DOI: 10.1093/bib/bbab256]
Abstract
Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype-phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementation.
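At the heart of group lasso optimization is the proximal operator of the group penalty, a blockwise soft-threshold that zeroes whole variable groups at once. A toy sketch for the non-overlapping case (ParProx itself handles overlapping groups via a latent-variable representation and runs on GPUs; the names here are illustrative):

```python
import numpy as np

def prox_group_lasso(v, groups, lam):
    """Blockwise soft-thresholding: for each index group g,
    prox(v_g) = max(0, 1 - lam / ||v_g||_2) * v_g.
    Groups whose norm falls below lam are zeroed out together, which is
    how variables grouped by biological priors (pathways, genomic
    regions) are selected or dropped as a unit."""
    out = np.array(v, dtype=float)
    for g in groups:
        nrm = np.linalg.norm(out[g])
        out[g] = 0.0 if nrm <= lam else (1.0 - lam / nrm) * out[g]
    return out
```

A proximal gradient loop alternates a gradient step on the loss with this operator; since each group is independent, the loop over groups parallelises naturally.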
35
Lu H, Wei Z, Wang C, Guo J, Zhou Y, Wang Z, Liu H. Redesigning Vina@QNLM for Ultra-Large-Scale Molecular Docking and Screening on a Sunway Supercomputer. Front Chem 2021; 9:750325. [PMID: 34778205] [PMCID: PMC8581564] [DOI: 10.3389/fchem.2021.750325]
Abstract
Ultra-large-scale molecular docking can improve the accuracy of lead compounds in drug discovery. In this study, we developed molecular docking software, Vina@QNLM, which can use more than 480,000 parallel processes to search for potential lead compounds from hundreds of millions of compounds. We proposed a task scheduling mechanism for large-scale parallelism based on Vinardo and the Sunway supercomputer architecture. Then, we adapted the core docking algorithm to take full advantage of the heterogeneous multicore processor architecture in intensive computing. We successfully expanded it to 10,465,065 cores (161,001 management process elements and 10,304,064 computing process elements), with a strong scalability of 55.92%. To the best of our knowledge, this is the first time that 10 million cores have been used for molecular docking on Sunway. The introduction of the heterogeneous multicore processor architecture achieved the best speedup, 11× that of the management process element of Sunway alone. The performance of Vina@QNLM was comprehensively evaluated using the CASF-2013 and CASF-2016 protein-ligand benchmarks, and its screening power was the highest of the 27 pieces of software tested on the CASF-2013 benchmark. In some existing applications, we used Vina@QNLM to dock more than 10 million molecules to nine rigid proteins related to SARS-CoV-2 within 8.5 h on 10 million cores. We also developed a platform for the general public to use the software.
36
Thomine O, Alizon S, Boennec C, Barthelemy M, Sofonea M. Emerging dynamics from high-resolution spatial numerical epidemics. eLife 2021; 10:71417. [PMID: 34652271] [PMCID: PMC8568339] [DOI: 10.7554/elife.71417]
Abstract
Simulating nationwide realistic individual movements with a detailed geographical structure can help optimise public health policies. However, existing tools have limited resolution or can only account for a limited number of agents. We introduce Epidemap, a new framework that can capture the daily movement of more than 60 million people in a country at a building-level resolution in a realistic and computationally efficient way. By applying it to the case of an infectious disease spreading in France, we uncover hitherto neglected effects, such as the emergence of two distinct peaks in the daily number of cases or the importance of local density in the timing of arrival of the epidemic. Finally, we show that the importance of super-spreading events strongly varies over time.
37
Parallel Algorithm on GPU for Wireless Sensor Data Acquisition Using a Team of Unmanned Aerial Vehicles. Sensors 2021; 21:s21206851. [PMID: 34696064] [PMCID: PMC8541541] [DOI: 10.3390/s21206851]
Abstract
This paper proposes a framework for wireless sensor data acquisition using a team of Unmanned Aerial Vehicles (UAVs). Scattered over a terrain, the sensors detect information about their surroundings and can transmit this information wirelessly over a short range. With no access to a terrestrial or satellite communication network to relay the information to, UAVs are used to visit the sensors and collect the data. The proposed framework uses an iterative k-means algorithm to group the sensors into clusters and to identify Download Points (DPs) where the UAVs hover to download the data. A Single-Source Shortest-Path algorithm (SSSP) is used to compute optimal paths between every pair of DPs, with a constraint to reduce the number of turns. A genetic algorithm supplemented with a 2-opt local search heuristic is used to solve the multi-travelling-salesperson problem and to find an optimized tour for each UAV. Finally, a collision avoidance strategy is implemented to guarantee collision-free trajectories. To reduce the overall runtime of the framework, the SSSP algorithm is implemented in parallel on a graphics processing unit. The proposed framework is tested in simulation using three UAVs and realistic 3D maps with up to 100 sensors and runs in just 20.7 s, a 33.3× speed-up compared to a sequential execution on CPU. The results show that the proposed method is efficient at calculating optimized trajectories for the UAVs for data acquisition from wireless sensors. The results also show the significant advantage of the parallel implementation on GPU.
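The clustering stage can be sketched as plain Lloyd's k-means with k grown until every sensor lies within communication range of its cluster's download point. The stopping criterion and parameter values below are assumptions for illustration; the paper's iterative k-means may differ in detail:

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    # Plain Lloyd iteration; centroids double as candidate download points.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((points[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

def download_points(sensors, comm_range):
    # Iterative k-means: grow k until every sensor can reach its DP.
    for k in range(1, len(sensors) + 1):
        centroids, labels = kmeans(sensors, k)
        dist = np.linalg.norm(sensors - centroids[labels], axis=1)
        if dist.max() <= comm_range:
            return centroids, labels
    return sensors.astype(float), np.arange(len(sensors))  # one DP per sensor
```

The resulting DPs then feed the SSSP and multi-travelling-salesperson stages that route the UAVs.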
38
Goicovich I, Olivares P, Román C, Vázquez A, Poupon C, Mangin JF, Guevara P, Hernández C. Fiber Clustering Acceleration With a Modified Kmeans++ Algorithm Using Data Parallelism. Front Neuroinform 2021; 15:727859. [PMID: 34539370] [PMCID: PMC8445177] [DOI: 10.3389/fninf.2021.727859]
Abstract
Fiber clustering methods are typically used in brain research to study the organization of white matter bundles from large diffusion MRI tractography datasets. These methods enable exploratory bundle inspection using visualization and other methods that require identifying brain white matter structures in individuals or a population. Some applications, such as real-time visualization and inter-subject clustering, need fast and high-quality intra-subject clustering algorithms. This work proposes a parallel algorithm using a General Purpose Graphics Processing Unit (GPGPU) for fiber clustering based on the FFClust algorithm. The proposed GPGPU implementation exploits data parallelism using both multicore and GPU fine-grained parallelism present in commodity architectures, including current laptops and desktop computers. Our approach implements all FFClust steps in parallel, improving execution times in all of them. In addition, our parallel approach includes a parallel Kmeans++ algorithm implementation and defines a new variant of Kmeans++ to reduce the impact of choosing outliers as initial centroids. The results show that our approach provides clustering quality results very similar to FFClust, and it requires an execution time of 3.5 s for processing about a million fibers, achieving a speedup of 11.5 times compared to FFClust.
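Standard Kmeans++ seeding, whose sensitivity to outliers the paper's variant is designed to reduce, draws each new centroid with probability proportional to squared distance from the nearest centroid chosen so far (D² sampling). A compact sketch of the standard version only; the modified variant is not reproduced here:

```python
import numpy as np

def kmeanspp_init(points, k, rng=None):
    """Standard k-means++ seeding via D^2 sampling. A far-away outlier has
    a large squared distance and is therefore a likely pick -- exactly the
    weakness the modified variant in the paper aims to dampen."""
    if rng is None:
        rng = np.random.default_rng(0)
    centroids = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest chosen centroid.
        d2 = np.min(((points[:, None] - np.asarray(centroids)) ** 2).sum(-1), axis=1)
        centroids.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.asarray(centroids)
```

Both the D² distance computation and the subsequent Lloyd iterations are embarrassingly data-parallel, which is what the GPGPU implementation exploits.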
39
Lebedev I, Lovskaya D, Mochalova M, Mitrofanov I, Menshutina N. Cellular Automata Modeling of Three-Dimensional Chitosan-Based Aerogels Fiberous Structures with Bezier Curves. Polymers (Basel) 2021; 13:polym13152511. [PMID: 34372113] [PMCID: PMC8348900] [DOI: 10.3390/polym13152511]
Abstract
In this work, a cellular automata approach was investigated for modeling three-dimensional fibrous nanoporous aerogel structures. A model for the generation of fibrous structures using the Bezier curves is proposed. Experimental chitosan-based aerogel particles were obtained for which analytical studies of the structural characteristics were carried out. The data obtained were used to generate digital copies of chitosan-based aerogel structures and to assess the accuracy of the developed model. The obtained digital copies of chitosan-based aerogel structures will be used to create digital copies of aerogel structures with embedded active pharmaceutical ingredients (APIs) and further predict the release of APIs from these structures.
40
Nilforooshan MA, Garrick D, Harris B. Alternative Ways of Computing the Numerator Relationship Matrix. Front Genet 2021; 12:655638. [PMID: 34394180] [PMCID: PMC8356081] [DOI: 10.3389/fgene.2021.655638]
Abstract
Pedigree relationships between every pair of individuals form the elements of the additive genetic relationship matrix (A). Calculation of A−1 does not require forming and inverting A, and it is faster and easier than the calculation of A. Although A−1 is used in best linear unbiased prediction of genetic merit, A is used in population studies and post-evaluation procedures, such as breeding programs and controlling the rate of inbreeding. Three pedigrees with 20,000 animals (20K) and different litter sizes (1, 2, 4), and a pedigree with 180,000 animals (180K) and litter size 2, were simulated. Aiming to reduce the computation time for calculating A, new methods [Array-Tabular method, (T−1)−1 instead of T in Thompson's method, iterative updating of D in Thompson's method, and iteration by generation] were developed and compared with some existing methods. The methods were coded in the R programming language to demonstrate the algorithms, aiming to minimize the computational time. For the 20K pedigrees, computational time decreased with increasing litter size for most of the methods. Methods deriving A from A−1 were relatively slow. The other methods used either only pedigree information or both the pedigree and inbreeding coefficients. Calculating inbreeding coefficients was extremely fast (<0.2 s for 180K). Parallel computing (15 cores) was adopted for methods based on solving A−1 for columns of A, as those methods allowed implicit parallelism. Optimizing the code for one of the earliest methods enabled A to be built in 13 s (faster than the 31 s for calculating A−1) for 20K, and in 17 min 3 s for 180K. Memory is a bottleneck for large pedigrees, but attempts to reduce the memory usage increased the computational time. To reduce disk space usage, memory usage, and computational time, relationship coefficients of old animals in the pedigree can be archived, and relationship coefficients for parents of the next generation can be saved in an external file for successive updates to the pedigree and the A matrix.
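Several of the compared methods build on the classic tabular recursion for A: A_ii = 1 + 0.5·A_sd, and A_ij = 0.5·(A_js + A_jd) for an animal i with parents s and d. A serial sketch of that recursion (pedigree ordered so parents precede offspring, unknown parents as None):

```python
import numpy as np

def tabular_A(pedigree):
    """Classic tabular method for the numerator relationship matrix.
    `pedigree` is a list of (sire, dam) index pairs, None for unknown,
    ordered so that parents appear before their offspring."""
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 plus half the relationship between the parents.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s is not None and d is not None else 0.0)
        # Off-diagonal: average of the relationships to the two parents.
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j, s]
            if d is not None:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A
```

Row i depends only on earlier rows, which is the structure the paper's column-wise and generation-wise parallel variants exploit.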
41
Dumont AP, Fang Q, Patil CA. A computationally efficient Monte-Carlo model for biomedical Raman spectroscopy. Journal of Biophotonics 2021; 14:e202000377. [PMID: 33733621] [PMCID: PMC10069992] [DOI: 10.1002/jbio.202000377]
Abstract
Monte Carlo (MC) modeling is a valuable tool to gain fundamental understanding of light-tissue interactions, provide guidance and assessment to optical instrument designs, and help analyze experimental data. It has been a major challenge to efficiently extend MC towards modeling of bulk-tissue Raman spectroscopy (RS) due to the wide spectral range, relatively sharp spectral features, and presence of background autofluorescence. Here, we report a computationally efficient MC approach for RS by adapting the massively-parallel Monte Carlo eXtreme (MCX) simulator. Simulation efficiency is achieved through "isoweight," a novel approach that combines the statistical generation of Raman scattering and fluorescence emission with a lookup-table-based technique well-suited for parallelization. The MC model uses a graphics processor to produce dense Raman and fluorescence spectra over a range of 800-2000 cm-1 with an approximately 100× increase in speed over prior RS Monte Carlo methods. The simulated RS signals are compared against experimentally collected spectra from gelatin phantoms, showing a strong correlation.
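The lookup-table half of the "isoweight" approach can be sketched as a precomputed inverse CDF: each photon's Raman or fluorescence emission bin then costs a single table read, which is branch-free and well suited to GPU threads. The spectra and table size below are illustrative, not taken from the paper:

```python
import numpy as np

def build_emission_table(spectrum, n_entries=1024):
    # Precompute the inverse CDF of an emission spectrum. At run time a
    # photon's emission bin is found with one table read indexed by a
    # uniform random number, avoiding per-photon rejection loops.
    cdf = np.cumsum(np.asarray(spectrum, dtype=float))
    cdf /= cdf[-1]
    quantiles = (np.arange(n_entries) + 0.5) / n_entries
    return np.searchsorted(cdf, quantiles)

def sample_bins(table, u):
    # u: uniform random numbers in [0, 1); returns emission-bin indices.
    return table[(u * len(table)).astype(int)]
```

Bins with zero intensity never appear in the table, and each bin's share of table entries is proportional to its intensity.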
42
Lin Z, Chen R, Gao B, Qin S, Wu B, Liu J, Cai XC. A highly parallel simulation of patient-specific hepatic flows. International Journal for Numerical Methods in Biomedical Engineering 2021; 37:e3451. [PMID: 33609008] [DOI: 10.1002/cnm.3451]
Abstract
Computational hemodynamics is being developed as an alternative approach for assisting clinical diagnosis and treatment planning for liver diseases. The technology is non-invasive, but the computational time can be high when the full geometry of the blood vessels is taken into account. Existing approaches use either a one-dimensional model of the artery or a simplified three-dimensional tubular geometry to reduce the computational time, but the accuracy is sometimes compromised, for example, when simulating blood flows in arteries with plaque. In this work, we study a highly parallel method for the transient incompressible Navier-Stokes equations for simulating blood flows in the full three-dimensional patient-specific hepatic artery, portal vein and hepatic vein. As applications, we also simulate the flow in a patient after hepatectomy and calculate the portal pressure gradient (PPG). One of the advantages of simulating blood flows in all hepatic vessels is that it provides a direct estimate of the PPG, the gold-standard value for assessing portal hypertension. Moreover, the robustness and scalability of the algorithm are investigated. An 83% parallel efficiency is achieved for solving a problem with 7 million elements on a supercomputer with more than 1000 processor cores.
|
43
|
He W, Yang D, Peng H, Liang S, Lin Y. An Efficient Ensemble Binarized Deep Neural Network on Chip with Perception-Control Integrated. SENSORS 2021; 21:s21103407. [PMID: 34068351 PMCID: PMC8153352 DOI: 10.3390/s21103407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 05/03/2021] [Accepted: 05/10/2021] [Indexed: 11/18/2022]
Abstract
Lightweight UAVs equipped with deep learning models have become a trend and can be deployed for automatic navigation in a wide range of civilian and military missions. However, real-time applications usually need to process large amounts of image data, which leads to very high computational complexity and storage consumption and restricts deployment on resource-constrained embedded edge devices. To reduce the computing requirements and storage occupancy of the neural network model, we propose the ensemble binarized DroNet (EBDN) model, which implements a reconstructed DroNet with binarization and ensemble learning, so that the model size of DroNet is effectively compressed and ensemble learning overcomes the poor performance typical of low-precision networks. Compared to the original DroNet, EBDN reduces the memory footprint by more than 7× with similar model accuracy. We also propose a novel, high-efficiency hardware architecture to realize EBDN on a chip (the EBDNoC system), which maps the algorithm model directly onto the hardware architecture. Compared to other solutions, the proposed architecture achieves about 10.21 GOP/s/kLUT resource efficiency and 208.1 GOP/s/W energy efficiency, while providing a good trade-off between model performance and resource utilization.
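Independent of the EBDN architecture itself, the core binarized-network trick, replacing floating-point multiply-accumulate with sign-match counting (XNOR plus popcount in hardware) over {-1, +1} values, can be sketched generically. The function names are illustrative and this is not the paper's hardware mapping:

```python
import numpy as np

def binarize(x):
    """Quantize real weights/activations to {-1, +1} (sign, with 0 -> +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors without multiplications:
    count positions where the signs match (XNOR in hardware), then
    dot = 2 * matches - n, where n is the vector length."""
    n = a_bits.size
    matches = int(np.sum(a_bits == b_bits))
    return 2 * matches - n

rng = np.random.default_rng(1)
a, b = rng.normal(size=64), rng.normal(size=64)
ab, bb = binarize(a), binarize(b)
# Same result as the ordinary integer dot product of the +/-1 vectors
assert binary_dot(ab, bb) == int(ab.astype(int) @ bb.astype(int))
```

On an FPGA, the match counting collapses to bitwise XNOR followed by a popcount, which is why binarized layers are so cheap in LUTs and energy; ensembling several such weak binary networks is one way to recover the accuracy lost to 1-bit quantization.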
|
44
|
Huang W, Zhou J, Zhang D. On-the-Fly Fusion of Remotely-Sensed Big Data Using an Elastic Computing Paradigm with a Containerized Spark Engine on Kubernetes. SENSORS 2021; 21:s21092971. [PMID: 33922709 PMCID: PMC8122984 DOI: 10.3390/s21092971] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/18/2021] [Revised: 04/16/2021] [Accepted: 04/21/2021] [Indexed: 11/16/2022]
Abstract
Remotely-sensed satellite image fusion is indispensable for the generation of long-term, gap-free Earth observation data. While cloud computing (CC) provides the big picture for remote-sensing big data (RSBD), the fundamental question of how to fuse RSBD efficiently on CC platforms has not yet been settled. To this end, we propose a lightweight cloud-native framework for the elastic processing of RSBD. With the scaling mechanisms provided by both the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) layers of CC, the Spark-on-Kubernetes operator model running in the framework can enhance the efficiency of Spark-based algorithms without suffering bottlenecks such as task latency caused by an unbalanced workload, and eases the burden of tuning performance parameters for parallel algorithms. Internally, we propose a task scheduling mechanism (TSM) that dynamically changes the affinities of Spark executor pods to the computing hosts. Learning from the ratio between the numbers of completed and failed tasks on a computing host, the TSM dispatches Spark executor pods to newer and less-overwhelmed computing hosts. To illustrate the advantage, we implement a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) to enable the efficient fusion of big RS images with a Spark aggregation function. We construct an OpenStack cloud computing environment to test the usability of the framework. According to the experiments, the TSM can improve the performance of the PESTARFM by about 11.7% using only PaaS scaling; when both IaaS and PaaS scaling are used, the maximum performance gain with the TSM exceeds 13.6%. The fusion of such big Sentinel and PlanetScope images requires less than 4 min in the experimental environment.
|
45
|
Knight JC, Komissarov A, Nowotny T. PyGeNN: A Python Library for GPU-Enhanced Neural Networks. Front Neuroinform 2021; 15:659005. [PMID: 33967731 PMCID: PMC8100330 DOI: 10.3389/fninf.2021.659005] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Accepted: 03/15/2021] [Indexed: 11/23/2022] Open
Abstract
More than half of the Top 10 supercomputing sites worldwide use GPU accelerators and they are becoming ubiquitous in workstations and edge computing devices. GeNN is a C++ library for generating efficient spiking neural network simulation code for GPUs. However, until now, the full flexibility of GeNN could only be harnessed by writing model descriptions and simulation code in C++. Here we present PyGeNN, a Python package which exposes all of GeNN's functionality to Python with minimal overhead. This provides an alternative, arguably more user-friendly, way of using GeNN and allows modelers to use GeNN within the growing Python-based machine learning and computational neuroscience ecosystems. In addition, we demonstrate that, in both Python and C++ GeNN simulations, the overheads of recording spiking data can strongly affect runtimes and show how a new spike recording system can reduce these overheads by up to 10×. Using the new recording system, we demonstrate that by using PyGeNN on a modern GPU, we can simulate a full-scale model of a cortical column faster even than real-time neuromorphic systems. Finally, we show that long simulations of a smaller model with complex stimuli and a custom three-factor learning rule defined in PyGeNN can be simulated almost two orders of magnitude faster than real-time.
|
46
|
Gangopadhyay A, Winberg S, Naidoo KJ. Anisotropic numerical potentials for coarse-grained modeling from high-speed multidimensional lookup table and interpolation algorithms. J Comput Chem 2021; 42:666-675. [PMID: 33547644 DOI: 10.1002/jcc.26487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2020] [Revised: 01/15/2021] [Accepted: 01/18/2021] [Indexed: 11/12/2022]
Abstract
A high-speed numerical potential that delivers computational performance comparable to complex coarse-grained analytic potentials makes available models with greater physical and chemical accuracy, opening the possibility of increased accuracy in classical molecular dynamics simulations of anisotropic systems. In this work, we report the development of a high-speed lookup table (LUT) of four-dimensional gridded data that uses cubic B-spline interpolation to derive off-grid values, and their associated partial derivatives, located between the known grid data points. A coarse-grained numerical potential using a LUT of values produced from the uniaxial Gay-Berne (GB) potential reproduces the GB potential and its partial derivatives within 3% and 5% margins of error, respectively. The speed of the numerical potential model and its partial derivatives is made competitive with the analytic potential by exploiting the on-board functionality of graphics processing units. The capability of the numerical potential is demonstrated by comparing minimizations of a box of 500 naphthalene molecules performed with a fully atomistic force field (NAMD/CHARMM), a biaxial GB potential, and a numerical potential from a LUT built from CHARMM pair-potential data. The numerical potential model approximates the atomistic local-minimum configuration significantly more accurately than the biaxial GB analytic potential function. This demonstrates that a numerical potential founded on a direct lookup of the atomistic potential landscape significantly improves coarse-grained (CG) modeling of complex molecules, possibly paving the way for accurate CG modeling of anisotropic systems.
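A minimal one-dimensional analogue of the LUT-plus-spline scheme shows how off-grid values and derivatives are recovered from tabulated data. Here a Lennard-Jones potential stands in for the paper's four-dimensional Gay-Berne surface, and SciPy's `CubicSpline` stands in for the custom B-spline interpolation kernel; none of this is the paper's implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Analytic Lennard-Jones pair potential and its radial derivative, standing
# in for an expensive potential that would normally only be tabulated.
def lj(r, eps=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 4 * eps * (sr6 ** 2 - sr6)

def lj_deriv(r, eps=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 24 * eps * (sr6 - 2 * sr6 ** 2) / r

# Tabulate on a grid once, then answer off-grid queries from the spline
r_grid = np.linspace(0.9, 3.0, 200)
lut = CubicSpline(r_grid, lj(r_grid))

r_query = np.linspace(1.0, 2.9, 1000)             # off-grid query points
err = np.max(np.abs(lut(r_query) - lj(r_query)))  # interpolation error
force_err = np.max(np.abs(lut(r_query, 1) - lj_deriv(r_query)))  # derivative error
```

The spline gives both the interpolated value and its derivative (the `nu=1` second argument), which is what a molecular dynamics integrator needs for forces; in the paper's setting the same idea is extended to four dimensions and evaluated on the GPU.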
|
47
|
Zou Y, Zhu Y, Li Y, Wu FX, Wang J. Parallel computing for genome sequence processing. Brief Bioinform 2021; 22:6210355. [PMID: 33822883 DOI: 10.1093/bib/bbab070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Revised: 01/26/2021] [Accepted: 02/10/2021] [Indexed: 01/08/2023] Open
Abstract
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data volumes and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures; each is classified into two or three types and further analyzed with respect to its features. Parallel computing for genome sequence processing is then discussed for four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each application, the background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform and computing efficiency. The programming model of each hardware platform and application provides a reference for researchers choosing high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.
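The simplest of the parallel models such surveys cover, shared-memory data parallelism over independent sequence chunks, can be sketched as follows. The GC-content kernel is a toy example, not any specific tool from the review:

```python
from multiprocessing import Pool

def gc_count(chunk):
    """Per-task kernel: count G and C bases in one chunk of sequence."""
    return chunk.count("G") + chunk.count("C")

def gc_fraction_parallel(seq, n_workers=4, chunk_size=1000):
    # Split the sequence into independent chunks (data parallelism),
    # scatter them to worker processes, then reduce the partial counts.
    chunks = [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]
    with Pool(n_workers) as pool:
        counts = pool.map(gc_count, chunks)
    return sum(counts) / len(seq)

if __name__ == "__main__":
    print(gc_fraction_parallel("ACGT" * 10_000))  # 0.5
```

Real genome tools follow the same scatter-and-reduce shape but must also handle chunk boundaries (reads or patterns spanning two chunks), which is where much of their engineering effort goes.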
|
48
|
Liu M, Zhao F, Jiang X, Zhang H, Zhou H. Parallel Binary Image Cryptosystem Via Spiking Neural Networks Variants. Int J Neural Syst 2021; 32:2150014. [PMID: 33637028 DOI: 10.1142/s0129065721500143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
To address the inefficiency of encrypting multiple binary images, this paper proposes a parallel binary image encryption framework based on spiking neural P (SNP) systems, typical variants of spiking neural networks. More specifically, the two basic units of the proposed image cryptosystem, the permutation unit and the diffusion unit, are designed using SNP systems with multiple channels and polarizations (SNP-MCP systems) and SNP systems with astrocyte-like control (SNP-ALC systems), respectively. Unlike the serial computing of traditional image permutation/diffusion units, the SNP-MCP-based permutation unit and the SNP-ALC-based diffusion unit achieve parallel computing through the parallel use of rules inside the neurons. Theoretical analysis confirms the high efficiency of the proposed binary image cryptosystem, and security analysis experiments demonstrate its security.
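The permutation-then-diffusion structure common to such image cryptosystems can be sketched serially. This illustrates only the two-stage structure, not the parallel SNP-system computation the paper proposes, and the key handling is deliberately simplified:

```python
import numpy as np

def encrypt(bits, key):
    """Permutation stage: shuffle bit positions with a keyed RNG.
    Diffusion stage: XOR the shuffled bits with a keyed keystream."""
    rng = np.random.default_rng(key)
    perm = rng.permutation(bits.size)
    stream = rng.integers(0, 2, bits.size, dtype=np.uint8)
    return bits[perm] ^ stream

def decrypt(cipher, key):
    # Re-derive the same permutation and keystream from the key,
    # undo the diffusion (XOR), then undo the permutation.
    rng = np.random.default_rng(key)
    perm = rng.permutation(cipher.size)
    stream = rng.integers(0, 2, cipher.size, dtype=np.uint8)
    plain = np.empty_like(cipher)
    plain[perm] = cipher ^ stream
    return plain

img = np.random.default_rng(7).integers(0, 2, 64, dtype=np.uint8)  # binary image
assert np.array_equal(decrypt(encrypt(img, key=42), key=42), img)
```

In the serial version each pixel is processed in turn; the paper's contribution is to express both stages as SNP-system rules that fire in parallel across neurons, removing that per-pixel sequential dependency.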
|
49
|
Xi NM, Li JJ. Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst 2021; 12:176-194.e6. [PMID: 33338399 PMCID: PMC7897250 DOI: 10.1016/j.cels.2020.11.008] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2020] [Revised: 10/06/2020] [Accepted: 11/19/2020] [Indexed: 12/29/2022]
Abstract
In single-cell RNA sequencing (scRNA-seq), doublets form when two cells are encapsulated into one reaction volume. The existence of doublets, which appear to be, but are not, real cells, is a key confounder in scRNA-seq data analysis. Computational methods have been developed to detect doublets in scRNA-seq data; however, the scRNA-seq field lacks a comprehensive benchmarking of these methods, making it difficult for researchers to choose an appropriate method for specific analyses. We conducted a systematic benchmark study of nine cutting-edge computational doublet-detection methods. Our study included 16 real datasets, which contained experimentally annotated doublets, and 112 realistic synthetic datasets. We compared doublet-detection methods regarding detection accuracy under various experimental settings, impacts on downstream analyses, and computational efficiencies. Our results show that existing methods exhibited diverse performance and distinct advantages in different aspects. Overall, the DoubletFinder method has the best detection accuracy, and the cxds method has the highest computational efficiency. A record of this paper's transparent peer review process is included in the Supplemental Information.
|
50
|
A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark. SENSORS 2021; 21:s21020365. [PMID: 33430375 PMCID: PMC7827788 DOI: 10.3390/s21020365] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/11/2020] [Revised: 01/04/2021] [Accepted: 01/05/2021] [Indexed: 12/29/2022]
Abstract
Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
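The core of step (2), a slope value computed over each cell's 3×3 neighborhood, can be sketched with Horn's finite-difference kernel (a standard choice, and the one ArcGIS uses; whether the paper's DCW strategy applies exactly this kernel is an assumption). In the tiled scheme, each tile would carry a one-cell halo of rows/columns from its neighbors so interior results match the unsplit raster:

```python
import numpy as np

def slope_degrees(dem, cellsize=1.0):
    """Horn's slope on the interior cells of a DEM tile (edge cells
    would be supplied by the halo exchanged between neighboring tiles)."""
    z = dem.astype(float)
    # 3x3 neighbors of each interior cell, named a..i in row-major order
    a, b, c = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    d, f = z[1:-1, :-2], z[1:-1, 2:]
    g, h, i = z[2:, :-2], z[2:, 1:-1], z[2:, 2:]
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cellsize)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

# A plane z = x rises 1 unit per cell in x: slope is 45 degrees everywhere
dem = np.tile(np.arange(6, dtype=float), (6, 1))
print(np.allclose(slope_degrees(dem), 45.0))  # True
```

Because each output cell depends only on its 3×3 window, tiles can be processed fully independently once the halos are in place, which is what makes the algorithm a natural fit for Spark's partition-parallel execution.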
|