1 | Zhang P, Jiang Z, He Y, Li A. A distributed software system for integrating data-intensive imaging methods in a hard X-ray nanoprobe beamline at the SSRF. Journal of Synchrotron Radiation 2024; 31:1234-1240. [PMID: 39172093 PMCID: PMC11371055 DOI: 10.1107/s1600577524006994]
Abstract
The development of hard X-ray nanoprobe techniques has given rise to a number of experimental methods, such as nano-XAS, nano-XRD, nano-XRF, ptychography and tomography, each with its own data processing algorithms. As data acquisition rates increase, the large volume of generated data now poses a major challenge to these algorithms. In this work, an intuitive, user-friendly software system is introduced to integrate and manage these algorithms; by exploiting the system's loosely coupled, component-based design, the data processing speed of the imaging algorithm is enhanced through optimization of parallelism efficiency. This study offers practical solutions to the complexity challenges of synchrotron data processing.
2 | Ahsan R, Chae HU, Jalal SAA, Wu Z, Tao J, Das S, Liu H, Wu JB, Cronin SB, Wang H, Sideris C, Kapadia R. Ultralow Power In-Sensor Neuronal Computing with Oscillatory Retinal Neurons for Frequency-Multiplexed, Parallel Machine Vision. ACS Nano 2024; 18:23785-23796. [PMID: 39140995 DOI: 10.1021/acsnano.4c09055]
Abstract
In-sensor and near-sensor computing architectures enable multiply accumulate operations to be carried out directly at the point of sensing. In-sensor architectures offer dramatic power and speed improvements over traditional von Neumann architectures by eliminating multiple analog-to-digital conversions, data storage, and data movement operations. Current in-sensor processing approaches rely on tunable sensors or additional weighting elements to perform linear functions such as multiply accumulate operations as the sensor acquires data. This work implements in-sensor computing with an oscillatory retinal neuron device that converts incident optical signals into voltage oscillations. A computing scheme is introduced based on the frequency shift of coupled oscillators that enables parallel, frequency multiplexed, nonlinear operations on the inputs. An experimentally implemented 3 × 3 focal plane array of coupled neurons shows that functions approximating edge detection, thresholding, and segmentation occur in parallel. An example of inference on handwritten digits from the MNIST database is also experimentally demonstrated with a 3 × 3 array of coupled neurons feeding into a single hidden layer neural network, approximating a liquid-state machine. Finally, the equivalent energy consumption to carry out image processing operations, including peripherals such as the Fourier transform circuits, is projected to be <20 fJ/OP, possibly reaching as low as 15 aJ/OP.
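The frequency-shift computation above rests on a general property of coupled oscillators: once coupling exceeds the detuning, both oscillators pull to a shared locked frequency. A minimal phase-oscillator sketch of that locking effect (the device physics is far richer; the frequencies, coupling strength, and Euler integration here are illustrative assumptions, not the authors' model):

```python
import numpy as np

def locked_frequencies(w1, w2, K, steps=4000, dt=0.01):
    """Euler-integrate two sinusoidally coupled phase oscillators and
    return the observed frequency of each over the second half of the run."""
    th = np.zeros(2)
    th_mid = th.copy()
    half = steps // 2
    for s in range(steps):
        d1 = w1 + K * np.sin(th[1] - th[0])
        d2 = w2 + K * np.sin(th[0] - th[1])
        th = th + dt * np.array([d1, d2])
        if s == half - 1:
            th_mid = th.copy()
    return (th - th_mid) / ((steps - half) * dt)

# Detuned oscillators (1.0 vs 1.4 rad/s) lock to a common frequency once
# the coupling K exceeds half the detuning.
print(locked_frequencies(1.0, 1.4, K=1.0))   # both near 1.2 rad/s
```

In the locked state both observed frequencies equal the mean of the natural frequencies, which is the kind of input-dependent frequency shift the computing scheme reads out.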
3 | Bandara YMNDY, Dutt S, Karawdeniya BI, Saharia J, Kluth P, Tricoli A. A Robust Parallel Computing Data Extraction Framework for Nanopore Experiments. Small Methods 2024:e2400045. [PMID: 38967324 DOI: 10.1002/smtd.202400045]
Abstract
The success of a nanopore experiment relies not only on the quality of the experimental design but also on the performance of the analysis program used to decipher the ionic perturbations necessary for understanding the underlying molecular intricacies. An event extraction framework is developed that leverages parallel computing, efficient memory management, and vectorization, yielding significant performance enhancement. The newly developed abf-ultra-simple function extracts key parameters from the file header, critical for the open-seek-read-close data loading architecture running on multiple cores. This underpins the swift analysis of large files, where an ≈18× improvement is found for a 100 min long file (≈4.5 GB) compared with the more traditional single (cell) array data loading method. The application is benchmarked against five other analysis platforms, showing significant performance enhancement (>2× to 1120×). Integrated provisions for batch analysis enable multiple files to be analyzed concurrently (vital for high-bandwidth experiments). Furthermore, the application is equipped with multi-level data fitting based on abrupt changes in the event waveform. The application condenses the extracted events into a single binary file, improving data portability (e.g., a 16 GB file with 28,182 events reduces to 47.9 MB, a 343× size reduction) and enabling a multitude of post-analysis extractions to be done efficiently.
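The open-seek-read-close loading architecture described above can be sketched generically: each worker process opens the raw file itself, seeks to its byte range, reads, and closes, so no file handle is shared. A hedged Python sketch (the float64 trace file, chunking, and simple threshold detector are illustrative assumptions, not the abf-ultra-simple implementation):

```python
import numpy as np
from multiprocessing import Pool

def extract_events(args):
    path, offset, count, threshold = args
    # Open-seek-read-close: each worker uses its own file handle.
    with open(path, "rb") as f:
        f.seek(offset * 8)                       # 8 bytes per float64 sample
        chunk = np.fromfile(f, dtype=np.float64, count=count)
    # Toy event detector: samples dipping below the baseline threshold.
    return np.flatnonzero(chunk < threshold) + offset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trace = rng.normal(1.0, 0.05, 1_000_000)     # synthetic open-pore current
    trace[[1234, 500_000]] = 0.2                 # two synthetic blockades
    trace.tofile("trace.bin")
    step = trace.size // 4
    jobs = [("trace.bin", i * step, step, 0.5) for i in range(4)]
    with Pool(4) as pool:
        events = np.sort(np.concatenate(pool.map(extract_events, jobs)))
    print(events.tolist())                       # -> [1234, 500000]
```

Because each chunk is loaded independently, memory stays bounded and the chunks can be processed on as many cores as are available.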
4 | Quelhas KN, Henn MA, Farias R, Tew WL, Woods SI. GPU-accelerated parallel image reconstruction strategies for magnetic particle imaging. Phys Med Biol 2024; 69:135005. [PMID: 38843809 DOI: 10.1088/1361-6560/ad5510]
Abstract
Objective. Image reconstruction is a fundamental step in magnetic particle imaging (MPI). One of the main challenges is that reconstructions are computationally intensive and time-consuming, so choosing an algorithm involves a compromise between accuracy and execution time that depends on the application. This work proposes a method that provides both fast and accurate image reconstructions. Approach. Image reconstruction algorithms were implemented to execute in parallel on graphics processing units (GPUs) using the CUDA framework. The calculation of the model-based MPI calibration matrix was also implemented on the GPU to allow both fast and flexible reconstructions. Main results. The parallel algorithms accelerated the reconstructions by up to about 6,100 times compared with the serial Kaczmarz algorithm executed on the CPU, allowing for real-time applications. Reconstructions using the OpenMPIData dataset validated the proposed algorithms and demonstrated that they provide both fast and accurate reconstructions. The calculation of the calibration matrix was accelerated by up to about 37 times. Significance. The parallel algorithms proposed in this work can provide single-frame MPI reconstructions in real time, with frame rates greater than 100 frames per second. The parallel calculation of the calibration matrix can be combined with the parallel reconstruction to deliver images in less time than the serial Kaczmarz reconstruction, potentially eliminating the need to store the calibration matrix in main memory and providing the flexibility to redefine scanning and reconstruction parameters during execution.
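For context, the serial Kaczmarz baseline mentioned above sweeps the system matrix row by row, projecting the current estimate onto each measurement's hyperplane; this per-row loop is what the parallel GPU version accelerates. A hedged NumPy sketch on a synthetic system (not the authors' CUDA code; the matrix size and sweep count are illustrative):

```python
import numpy as np

def kaczmarz(S, u, sweeps=200):
    """Serial Kaczmarz: c <- c + (u_i - <s_i, c>) / ||s_i||^2 * s_i,
    cycling over the rows i of the system matrix S."""
    c = np.zeros(S.shape[1])
    row_norms = np.einsum("ij,ij->i", S, S)
    for _ in range(sweeps):
        for i in range(S.shape[0]):
            c += (u[i] - S[i] @ c) / row_norms[i] * S[i]
    return c

rng = np.random.default_rng(1)
S = rng.normal(size=(50, 10))         # toy "system matrix"
c_true = rng.normal(size=10)          # toy particle-concentration image
u = S @ c_true                        # noiseless measurement vector
print(np.allclose(kaczmarz(S, u), c_true, atol=1e-6))   # -> True
```

For a consistent system the sweeps converge geometrically; the GPU strategies in the paper parallelize the inner products that dominate each projection.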
5 | Barry T, Roeder K, Katsevich E. Exponential family measurement error models for single-cell CRISPR screens. Biostatistics 2024:kxae010. [PMID: 38649751 DOI: 10.1093/biostatistics/kxae010]
Abstract
CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens-"thresholded regression"-exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.
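The attenuation bias that motivates GLM-EIV has a classical linear analogue that is easy to see concretely: regressing on a noisily measured predictor shrinks the estimated slope by the reliability ratio var(x)/(var(x)+var(noise)). A hedged simulation of that plain linear errors-in-variables effect (not the exponential-family GLM-EIV model; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 200_000, 2.0
x = rng.normal(0.0, 1.0, n)        # true predictor (e.g. perturbation status score)
w = x + rng.normal(0.0, 1.0, n)    # noisy measurement of x
y = beta * x + rng.normal(0.0, 0.5, n)

# OLS slope of y on the noisy w is attenuated toward zero.
slope = np.cov(w, y, bias=True)[0, 1] / np.var(w)
reliability = 1.0 / (1.0 + 1.0)    # var(x) / (var(x) + var(noise))
print(slope, beta * reliability)   # observed slope close to the predicted 1.0, not 2.0
```

Errors-in-variables methods such as GLM-EIV model the measurement noise explicitly to undo exactly this shrinkage.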
6 | Yu I, Mori T, Matsuoka D, Surblys D, Sugita Y. SPANA: Spatial decomposition analysis for cellular-scale molecular dynamics simulations. J Comput Chem 2024; 45:498-505. [PMID: 37966727 DOI: 10.1002/jcc.27260]
Abstract
The rapid increase in computational power of the latest supercomputers has enabled atomistic molecular dynamics (MD) simulations of biomolecules in biological membranes, cytoplasm, and other cellular environments. These environments often contain a million or more atoms to be simulated simultaneously, so their trajectory analyses involve heavy computations that can become a bottleneck in computational studies. Spatial decomposition analysis (SPANA) is a set of analysis tools in the Generalized-Ensemble Simulation System (GENESIS) software package that carries out MD trajectory analyses of large-scale biological simulations using multiple CPU cores in parallel. SPANA applies spatial decomposition of a large biological system to distribute structural and dynamical analyses across individual CPU cores, which significantly reduces computational time and memory footprint. SPANA opens new possibilities for detailed atomistic analyses of biomacromolecules, as well as solvent water molecules, ions, and metabolites, in MD trajectories of very large biological systems containing millions of atoms in cellular environments.
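The core idea, binning atoms into spatial cells so that each cell can be analyzed on a separate CPU core, can be sketched in a few lines. This is a hedged illustration of spatial decomposition in general (cubic box, uniform grid, NumPy), not GENESIS/SPANA code:

```python
import numpy as np

def spatial_decompose(coords, box, ncell):
    """Bin atoms of a cubic box (edge length `box`) into ncell^3 cells and
    return, per cell, the indices of its atoms; each index list can then be
    analyzed on its own CPU core independently of the others."""
    cell = box / ncell
    ijk = np.clip((coords // cell).astype(int), 0, ncell - 1)
    flat = (ijk[:, 0] * ncell + ijk[:, 1]) * ncell + ijk[:, 2]
    return [np.flatnonzero(flat == c) for c in range(ncell ** 3)]

rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 100.0, size=(10_000, 3))   # toy coordinates (Å)
cells = spatial_decompose(atoms, box=100.0, ncell=4)
assert sum(len(c) for c in cells) == len(atoms)      # the cells partition the atoms
```

Because each core only loads the coordinates of its own cell (plus a halo for cutoff-based analyses), both time and per-core memory shrink as cores are added.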
7 | Olbrich M, Bartels L, Wohlers I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Frontiers in Bioinformatics 2024; 4:1384497. [PMID: 38567256 PMCID: PMC10985184 DOI: 10.3389/fbinf.2024.1384497]
8 | López-Ales E, Menchón-Lara RM, Simmross-Wattenberg F, Rodríguez-Cayetano M, Martín-Fernández M, Alberola-López C. Multi-Device Parallel MRI Reconstruction: Efficient Partitioning for Undersampled 5D Cardiac CINE. Sensors (Basel) 2024; 24:1313. [PMID: 38400470 PMCID: PMC10891760 DOI: 10.3390/s24041313]
Abstract
Cardiac CINE, a form of dynamic cardiac MRI, is indispensable in the diagnosis and treatment of heart conditions, offering detailed visualization essential for the early detection of cardiac diseases. As the demand for higher-resolution images increases, so does the volume of data requiring processing, presenting significant computational challenges that can impede the efficiency of diagnostic imaging. Our research presents an approach that takes advantage of the computational power of multiple Graphics Processing Units (GPUs) to address these challenges. GPUs are devices capable of performing large volumes of computations in a short period, and have significantly improved the cardiac MRI reconstruction process, allowing images to be produced faster. The innovation of our work resides in utilizing a multi-device system capable of processing the substantial data volumes demanded by high-resolution, five-dimensional cardiac MRI. This system surpasses the memory capacity limitations of single GPUs by partitioning large datasets into smaller, manageable segments for parallel processing, thereby preserving image integrity and accelerating reconstruction times. Utilizing OpenCL technology, our system offers adaptability and cross-platform functionality, ensuring wider applicability. The proposed multi-device approach offers an advancement in medical imaging, accelerating the reconstruction process and facilitating faster and more effective cardiac health assessment.
9 | Kopal I, Labaj I, Vršková J, Harničárová M, Valíček J, Tozan H. Intelligent Modelling of the Real Dynamic Viscosity of Rubber Blends Using Parallel Computing. Polymers (Basel) 2023; 15:3636. [PMID: 37688262 PMCID: PMC10490080 DOI: 10.3390/polym15173636]
Abstract
Modelling the flow properties of rubber blends makes it possible to predict their rheological behaviour during the processing and production of rubber-based products. As the nonlinear nature of such complex processes complicates the creation of exact analytical models, it is appropriate to use artificial intelligence tools in this modelling. The present study was implemented to develop a highly efficient artificial neural network model, optimised using a novel training algorithm with fast parallel computing to predict the results of rheological tests of rubber blends performed under different conditions. A series of 120 real dynamic viscosity-time curves, acquired by a rubber process analyser for styrene-butadiene rubber blends with varying carbon black contents vulcanised at different temperatures, were analysed using a Generalised Regression Neural Network. The model was optimised by limiting the fitting error of the training dataset to a pre-specified value of less than 1%. All repeated calculations were made via parallel computing with multiple computer cores, which significantly reduces the total computation time. An excellent agreement between the predicted and measured generalisation data was found, with an error of less than 4.7%, confirming the high generalisation performance of the newly developed model.
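A Generalised Regression Neural Network is, at its core, Gaussian-kernel regression: every training point becomes a pattern unit, and a query is answered by a distance-weighted average of the training targets. A hedged one-dimensional sketch (the smoothing parameter and the sine toy curve are illustrative, not the rubber-viscosity model):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.02):
    """GRNN / Nadaraya-Watson regression: predictions are Gaussian-kernel
    weighted averages of the training targets."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

x = np.linspace(0.0, 1.0, 101)[:, None]   # toy abscissa (stand-in for time)
y = np.sin(2 * np.pi * x[:, 0])           # toy smooth response curve
y_hat = grnn_predict(x, y, x)
interior = slice(20, 81)                  # away from boundary bias
print(float(np.max(np.abs(y_hat[interior] - y[interior]))))  # small interior error
```

Each prediction is independent of the others, which is why repeated evaluations over a parameter grid, as in the paper's optimisation loop, parallelise so cleanly across cores.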
10 | Jung J, Kobayashi C, Sugita Y. Acceleration of generalized replica exchange with solute tempering simulations of large biological systems on massively parallel supercomputer. J Comput Chem 2023. [PMID: 37141320 DOI: 10.1002/jcc.27124]
Abstract
Generalized replica exchange with solute tempering (gREST) is an enhanced sampling algorithm for proteins and other systems with rugged energy landscapes. Unlike the replica-exchange molecular dynamics (REMD) method, solvent temperatures are the same in all replicas, while solute temperatures differ and are exchanged frequently between replicas to explore various solute structures. Here, we apply the gREST scheme to large biological systems containing over one million atoms using a large number of processors on a supercomputer. First, communication time on a multi-dimensional torus network is reduced by optimally matching each replica to MPI processes; this applies not only to gREST but also to other multi-copy algorithms. Second, the energy evaluations required by the multistate Bennett acceptance ratio (MBAR) method for free energy estimation are performed on the fly during the gREST simulations. Using these two advanced schemes, we observed 57.72 ns/day performance in 128-replica gREST calculations on a 1.5-million-atom system using 16,384 nodes of Fugaku. These schemes, implemented in the latest version of the GENESIS software, could open new possibilities for answering unresolved questions about large biomolecular complexes with slow conformational dynamics.
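Underlying both REMD and gREST is the Metropolis exchange step between neighboring replicas: a swap is accepted with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). A hedged sketch of that criterion alone (in full gREST only solute energy terms are scaled; the energies and inverse temperatures here are illustrative):

```python
import numpy as np

def exchange_accepted(E_i, E_j, beta_i, beta_j, rng):
    """Metropolis criterion for swapping configurations between two
    replicas at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (E_i - E_j)
    return bool(delta >= 0.0 or rng.random() < np.exp(delta))

rng = np.random.default_rng(0)
# Colder replica (larger beta) holding the higher energy: always swapped.
print(exchange_accepted(E_i=-90.0, E_j=-100.0, beta_i=1.0, beta_j=0.8, rng=rng))  # -> True
# The reverse arrangement is accepted only with probability exp(delta).
```

Because each attempt involves only two scalars per replica pair, the expensive part in practice is evaluating the energies, which is why the paper moves the MBAR energy evaluations on the fly into the simulation itself.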
11 | Zhang Y, Sun H, Lian X, Tang J, Zhu F. ANPELA: Significantly Enhanced Quantification Tool for Cytometry-Based Single-Cell Proteomics. Advanced Science (Weinheim) 2023; 10:e2207061. [PMID: 36950745 DOI: 10.1002/advs.202207061]
Abstract
ANPELA is widely used for quantifying traditional bulk proteomic data. Recently, there has been a clear shift from bulk proteomics to single-cell proteomics (SCP), for which powerful cytometry techniques demonstrate a remarkable capacity to capture the cellular heterogeneity that is completely overlooked by traditional bulk profiling. However, in-depth, high-quality quantification of SCP data remains challenging, severely affected by the large number of quantification workflows and the extreme dependence of performance on the studied dataset. In other words, the proper selection of well-performing workflow(s) for any studied dataset is elusive, and a significantly enhanced and accelerated tool is urgently needed to address this issue; no such tool had yet been developed. Herein, ANPELA is therefore updated to version 2.0 (https://idrblab.org/anpela/), which is unique in providing the most comprehensive set of quantification alternatives (>1000 workflows) among all existing tools, enabling systematic performance evaluation from multiple perspectives based on machine learning, and identifying the optimal workflow(s) using overall performance ranking together with parallel computation. Extensive validation on different benchmark datasets and representative application scenarios suggests the great application potential of ANPELA in current SCP research for gaining more accurate and reliable biological insights.
12 | AL-Jumaili AHA, Muniyandi RC, Hasan MK, Paw JKS, Singh MJ. Big Data Analytics Using Cloud Computing Based Frameworks for Power Management Systems: Status, Constraints, and Future Recommendations. Sensors (Basel) 2023; 23:2952. [PMID: 36991663 PMCID: PMC10051254 DOI: 10.3390/s23062952]
Abstract
Traditional parallel computing for power management systems faces prime challenges in execution time, computational complexity, and efficiency, including processing time and delays in power system condition monitoring, particularly of consumer power consumption, weather data, and power generation, when detection and prediction rely on centralized parallel processing and diagnosis. Owing to these constraints, data management has become a critical research consideration and bottleneck. To cope with them, cloud computing-based methodologies have been introduced for managing data efficiently in power management systems. This paper reviews cloud computing architectures that can meet multi-level real-time requirements to improve monitoring and performance, designed for different power system monitoring scenarios. Cloud computing solutions are then discussed in the context of big data, and emerging parallel programming models such as Hadoop, Spark, and Storm are briefly described to analyze advances, constraints, and innovations. The key performance metrics of cloud computing applications, such as core data sampling, modeling, and analyzing the competitiveness of big data, are modeled by applying related hypotheses. Finally, the paper introduces a new design concept with cloud computing and offers recommendations on cloud computing infrastructure and on methods for managing real-time big data in power management systems that address the data mining challenges.
13 | Wang Y, Fu T, Wu C, Fan J, Song H, Xiao D, Lin Y, Liu F, Yang J. Adaptive tetrahedral interpolation for reconstruction of uneven freehand 3D ultrasound. Phys Med Biol 2023; 68. [PMID: 36731138 DOI: 10.1088/1361-6560/acb88c]
Abstract
Objective. Freehand 3D ultrasound volume reconstruction has received considerable attention in medical research because it can freely perform spatial imaging at low cost. However, the uneven spatial distribution of the original ultrasound images reduces the reconstruction quality of traditional methods. Approach. An adaptive tetrahedral interpolation algorithm is proposed to reconstruct 3D ultrasound volume data. The algorithm adaptively divides the unevenly distributed images into numerous tetrahedra and interpolates the voxel value within each tetrahedron. Main results. Extensive experiments on simulated and clinical data confirm that the proposed method achieves more accurate reconstruction than six benchmark methods. Specifically, the averaged interpolation error at the gray level can be reduced by 0.22-0.82, and the peak signal-to-noise ratio and mean structural similarity can be improved by 0.32-1.83 dB and 0.01-0.05, respectively. Significance. With a parallel implementation of the algorithm, a 3D ultrasound volume of size 279 × 279 × 276 can be reconstructed from 100 2D ultrasound slices of size 200 × 200 in 1.04 s. Such a quick and accurate approach has practical value in medical research.
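Within each tetrahedron, the interpolation reduces to barycentric weighting of the four vertex intensities. A hedged sketch of that per-tetrahedron step (the adaptive division of unevenly spaced slices into tetrahedra is the paper's contribution and is not reproduced here):

```python
import numpy as np

def interp_in_tetrahedron(verts, vals, p):
    """Barycentric interpolation: solve for weights w with sum(w) = 1 such
    that p = sum_i w_i * verts_i, then return sum_i w_i * vals_i."""
    T = np.column_stack([verts[1] - verts[0],
                         verts[2] - verts[0],
                         verts[3] - verts[0]])
    w123 = np.linalg.solve(T, p - verts[0])
    w = np.concatenate([[1.0 - w123.sum()], w123])
    return float(w @ vals)

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
vals = np.array([0.0, 1.0, 2.0, 3.0])            # pixel intensities at vertices
centroid = verts.mean(axis=0)
print(interp_in_tetrahedron(verts, vals, centroid))   # -> 1.5
```

Each voxel is interpolated independently of every other voxel, which is what makes the parallel implementation mentioned in the abstract straightforward.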
14 | Sebastian S, Roy S, Kalita J. A generic parallel framework for inferring large-scale gene regulatory networks from expression profiles: application to Alzheimer's disease network. Brief Bioinform 2023; 24:6868522. [PMID: 36534961 DOI: 10.1093/bib/bbac482]
Abstract
The inference of large-scale gene regulatory networks is essential for understanding comprehensive interactions among genes. Most existing methods are limited to reconstructing networks with a few hundred nodes, so parallel computing paradigms must be leveraged to construct larger networks. We propose a generic parallel framework that enables any existing method, without re-engineering, to infer large networks in parallel while guaranteeing quality output. The framework is tested on 15 inference methods (though not limited to these), employing in silico benchmarks and real-world large expression matrices, followed by qualitative and speedup assessment. The framework does not compromise the quality of the base serial inference method. We rank the candidate methods and use the top-performing method to infer an Alzheimer's disease (AD)-affected network from large expression profiles of a triple transgenic mouse model comprising 45,101 genes. The resultant network is further explored to obtain hub genes that emerge as functionally related to the disease. We partition the network into 41 modules and conduct pathway enrichment analysis, revealing that many participating genes are collectively responsible for several brain disorders, including AD. Finally, we extract the interactions of a few known AD genes and observe that they are periphery genes connected to the network's hub genes. Availability: the R implementation of the framework is downloadable from https://github.com/Netralab/GenericParallelFramework.
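The framework's essential move, splitting the gene set into blocks so that the base inference method runs on each block pair independently, can be sketched with a stand-in scorer. Here absolute Pearson correlation plays the role of the base serial method and thread workers stand in for cluster nodes; both are illustrative assumptions:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def block_scores(z, rows, cols, n):
    # Stand-in "base serial method": |Pearson correlation| between blocks.
    return rows, cols, np.abs(z[rows] @ z[cols].T / n)

def infer_parallel(expr, nparts=4):
    """Split genes into blocks; score every block pair in its own worker
    (a thread here; a cluster node in the real framework) and reassemble."""
    g, n = expr.shape
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    parts = np.array_split(np.arange(g), nparts)
    adj = np.empty((g, g))
    with ThreadPoolExecutor() as ex:
        futures = [ex.submit(block_scores, z, r, c, n) for r in parts for c in parts]
        for f in futures:
            rows, cols, block = f.result()
            adj[np.ix_(rows, cols)] = block
    return adj

rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 60))          # 40 genes x 60 samples (toy)
adj = infer_parallel(expr)
print(np.allclose(adj, np.abs(np.corrcoef(expr))))   # -> True
```

Because every block pair is scored independently, the reassembled matrix is identical to what the serial method would produce, which mirrors the framework's "no quality compromise" guarantee.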
15 | Kauth K, Stadtmann T, Sobhani V, Gemmeke T. neuroAIx-Framework: design of future neuroscience simulation systems exhibiting execution of the cortical microcircuit model 20× faster than biological real-time. Front Comput Neurosci 2023; 17:1144143. [PMID: 37152299 PMCID: PMC10156974 DOI: 10.3389/fncom.2023.1144143]
Abstract
Introduction. Research in the field of computational neuroscience relies on highly capable simulation platforms. With real-time capabilities surpassed for established models like the cortical microcircuit, it is time to conceive next-generation systems: neuroscience simulators providing significant acceleration, even for larger networks with natural density, biologically plausible multi-compartment models and the modeling of long-term and structural plasticity. Methods. Stressing the need for agility to adapt to new concepts or findings in the domain of neuroscience, we have developed the neuroAIx-Framework, consisting of an empirical modeling tool, a virtual prototype, and a cluster of FPGA boards. This framework is designed to support and accelerate the continuous development of such platforms, driven by new insights in neuroscience. Results. Based on design space explorations using this framework, we devised and realized an FPGA cluster consisting of 35 NetFPGA SUME boards. Discussion. This system functions as an evaluation platform for our framework. At the same time, it resulted in a fully deterministic neuroscience simulation system surpassing the state of the art in both performance and energy efficiency. It is capable of simulating the microcircuit with 20× acceleration compared to biological real-time and achieves an energy efficiency of 48 nJ per synaptic event.
16 | Tran NY, Hieu HT, Bao PT. A proposed scenario to improve the Ncut algorithm in segmentation. Front Big Data 2023; 6:1134946. [PMID: 36936997 PMCID: PMC10020342 DOI: 10.3389/fdata.2023.1134946]
Abstract
Many methods exist for segmenting an image into k clusters, but the number of clusters k is usually defined before running the process, based on observation or application-specific knowledge. In this paper, we propose a new scenario for determining the number of clusters k automatically using histogram information. This scenario is applied to the Ncut algorithm, and the running time is accelerated by using CUDA for parallel computing on the GPU. The Ncut pipeline is improved in four steps: determining the number of clusters for segmentation, computing the similarity matrix W, computing the similarity matrix's eigenvalues, and grouping with the Fuzzy C-Means (FCM) clustering algorithm. Experimental results show that our scenario is 20 times faster than the Ncut algorithm while maintaining the same accuracy.
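The histogram-based choice of k can be illustrated simply: count the dominant modes of the intensity histogram and let each mode seed one cluster. A hedged sketch (the density level and the synthetic bimodal image are illustrative; the paper's exact rule may differ):

```python
import numpy as np

def estimate_k(gray, bins=256, level=0.005):
    """Estimate the number of clusters as the number of contiguous runs of
    histogram bins whose normalized density exceeds `level`."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256), density=True)
    above = hist > level
    runs = int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))
    return max(runs, 1)

rng = np.random.default_rng(0)
dark = rng.normal(60, 3, 100_000)      # pixels of a synthetic dark region
bright = rng.normal(180, 3, 100_000)   # pixels of a synthetic bright region
img = np.concatenate([dark, bright])
print(estimate_k(img))                  # -> 2
```

The estimated k then feeds the spectral step: the eigenvectors of the similarity matrix W are grouped into k clusters by FCM.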
17 | Abdusalomov AB, Safarov F, Rakhimov M, Turaev B, Whangbo TK. Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm. Sensors (Basel) 2022; 22:8122. [PMID: 36365819 PMCID: PMC9654697 DOI: 10.3390/s22218122]
Abstract
Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker's features in it, and recognize the speaker thereafter. In general, the speech recognition process involves three main steps: acoustic processing, feature extraction, and classification/recognition. The purpose of feature extraction is to represent a speech signal with a predetermined number of signal components, because the full acoustic signal is too cumbersome to handle and some of its information is irrelevant to the identification task. This study proposes a machine learning-based approach that extracts feature parameters from speech signals to improve the performance of speech recognition applications in real-time smart city environments. Moreover, the mapping of main-memory blocks to the cache is exploited to reduce computing time; the cache block size is a parameter that strongly affects cache performance. Real-time implementation of such processes demands high computation speed, which calls for modern technologies and fast algorithms that accelerate the extraction of feature parameters from speech signals; speedup problems in the digital processing of speech signals have yet to be completely resolved. The experimental results demonstrate that the proposed method successfully extracts the signal features and achieves seamless classification performance compared with other conventional speech recognition algorithms.
18 | Wang S, Li Z, Lan L, Zhao J, Zheng WJ, Li L. GPU Accelerated Estimation of a Shared Random Effect Joint Model for Dynamic Prediction. Comput Stat Data Anal 2022; 174:107528. [PMID: 39257897 PMCID: PMC11384271 DOI: 10.1016/j.csda.2022.107528]
Abstract
In longitudinal cohort studies, it is often of interest to predict the risk of a terminal clinical event using longitudinal predictor data among subjects still at risk at the time of prediction. The at-risk population changes over time; so do the association between predictors and the outcome and the accumulating longitudinal predictor history. The dynamic nature of this prediction problem has received increasing interest in the literature, but computation often poses a challenge. The widely used joint model of longitudinal and survival data often entails intensive computation and excessive model fitting time, due to numerical optimization and the analytically intractable high-dimensional integral in the likelihood function. This problem is exacerbated when the model is fit to a large dataset or involves multiple longitudinal predictors with nonlinear trajectories. The challenge can be addressed from an algorithmic perspective, by a novel two-stage estimation procedure, and from a computing perspective, by Graphics Processing Unit (GPU) programming, implemented here through PyTorch, an emerging deep learning framework. The numerical studies demonstrate that the proposed algorithm and software can substantially speed up estimation of the joint model, particularly with large datasets, and that accounting for nonlinearity in longitudinal predictor trajectories can improve prediction accuracy compared with joint modeling that ignores nonlinearity.
Collapse
|
19
|
Cheng C, Feng X, Li X, Wu M. Robust analysis of cancer heterogeneity for high-dimensional data. Stat Med 2022; 41:5448-5462. [PMID: 36117143 DOI: 10.1002/sim.9578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Revised: 06/04/2022] [Accepted: 09/05/2022] [Indexed: 11/06/2022]
Abstract
Cancer heterogeneity plays an important role in the understanding of tumor etiology, progression, and response to treatment. To accommodate heterogeneity, cancer subgroup analysis has been extensively conducted. However, most of the existing studies share the limitation that they cannot accommodate heavy-tailed or contaminated outcomes together with high-dimensional covariates, both of which are not uncommon in biomedical research. In this study, we propose a robust subgroup identification approach based on M-estimators together with concave and pairwise fusion penalties, which advances beyond existing studies by effectively accommodating high-dimensional data containing some outliers. The penalties are applied to both latent heterogeneity factors and covariates, so that the estimation achieves subgroup identification and variable selection simultaneously, with the number of subgroups a priori unknown. We develop a novel algorithm based on a parallel computing strategy, with the significant advantage of being able to process large-scale data. The convergence property of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed BIC criterion are carefully established. Simulation and analysis of TCGA breast cancer data demonstrate that the proposed approach is promising for efficiently identifying underlying subgroups in high-dimensional data.
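The robustness argument above rests on replacing least squares with an M-estimator, so that a few contaminated outcomes do not drag the fit. A minimal sketch of that building block, isolated from the fusion penalties and high-dimensional machinery: the Huber M-estimate of location via iteratively reweighted least squares, compared with the sample mean on contaminated data. All tuning values and data are illustrative, not from the paper.

```python
import numpy as np

def huber_location(y, c=1.345, tol=1e-8, max_iter=100):
    """IRLS for the Huber M-estimate of location: observations with
    residuals beyond c are downweighted instead of squared."""
    mu = np.median(y)                      # robust starting value
    for _ in range(max_iter):
        r = y - mu
        w = np.where(np.abs(r) < c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(0)
# 95 clean observations centered at 0 plus 5% gross outliers near 50
y = np.concatenate([rng.normal(0, 1, 95), rng.normal(50, 1, 5)])
print(round(np.mean(y), 2), round(huber_location(y), 2))
```

The mean is pulled toward the outliers while the M-estimate stays near 0; the subgroup approach applies the same downweighting idea within each latent group while concave pairwise fusion penalties decide which subjects share a group.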
Collapse
|
20
|
Makarov VL, Bakhtizin AR, Sushko ED, Sushko GB. Creation of a Supercomputer Simulation of a Society with Different Types of Active Agents and Its Approbation. HERALD OF THE RUSSIAN ACADEMY OF SCIENCES 2022; 92:268-275. [PMID: 36035028 PMCID: PMC9395919 DOI: 10.1134/s1019331622030182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Revised: 06/30/2021] [Accepted: 09/15/2021] [Indexed: 06/15/2023]
Abstract
This article continues a series of works devoted to the creation of large agent-based models, built as an artificial society, and the development of software for their implementation: the MÖBIUS design system for scalable agent-based models. The basic core of the system is a demographic model that simulates the natural movement of the population. A new stage in the development of the work discussed in this article was the creation, on the basis of this core, of an agent-based model of Russia, which includes families as agents of a new type, hierarchically connected with human agents. In addition, objects of a new type were introduced into the model: projects that create, in the artificial environment, analogues of complex control actions aimed at stimulating fertility. By simulating the reaction of individual families to the introduced regional support measures, the model makes it possible to track their impact on key demographic indicators. The agent-based model of Russia was tested on data for a long retrospective period using the example of the launch of maternal capital programs and showed good agreement with official statistics.
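The demographic core described above can be reduced to a very small sketch: agents carry an age, and each simulated year applies age-dependent mortality and fertility before newborn agents enter at age zero. The rates, population size, and step logic below are toy values for illustration only and are not calibrated to Russian statistics or to the MÖBIUS system.

```python
import random

# One simulated year of a minimal demographic agent-based core.
def step(ages, birth_rate=0.08, rng=random):
    survivors = []
    births = 0
    for age in ages:
        death_p = 0.002 + 0.0001 * age ** 1.5 / 10  # toy mortality curve
        if rng.random() > death_p:
            survivors.append(age + 1)
            # toy fertility window: agents aged 20-40 may produce a newborn
            if 20 <= age <= 40 and rng.random() < birth_rate:
                births += 1
    return survivors + [0] * births

random.seed(1)
pop = [random.randint(0, 80) for _ in range(10_000)]
for year in range(10):
    pop = step(pop)
print(len(pop))
```

A policy "project" in the sense of the abstract would then be modeled as a temporary change to `birth_rate` for agents meeting eligibility criteria, with its demographic impact read off from the population trajectory.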
Collapse
|
21
|
Wu W, Yang Y, Kang J, He K. Improving large-scale estimation and inference for profiling health care providers. Stat Med 2022; 41:2840-2853. [PMID: 35318706 PMCID: PMC9314652 DOI: 10.1002/sim.9387] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Revised: 02/04/2022] [Accepted: 02/21/2022] [Indexed: 01/25/2023]
Abstract
Provider profiling has been recognized as a useful tool in monitoring health care quality, facilitating inter-provider care coordination, and improving medical cost-effectiveness. Existing methods often use generalized linear models with fixed provider effects, especially when profiling dialysis facilities. As the number of providers under evaluation escalates, the computational burden becomes formidable even for specially designed workstations. To address this challenge, we introduce a serial blockwise inversion Newton algorithm exploiting the block structure of the information matrix. A shared-memory divide-and-conquer algorithm is proposed to further boost computational efficiency. In addition to the computational challenge, the current literature lacks an appropriate inferential approach to detecting providers with outlying performance especially when small providers with extreme outcomes are present. In this context, traditional score and Wald tests relying on large-sample distributions of the test statistics lead to inaccurate approximations of the small-sample properties. In light of the inferential issue, we develop an exact test of provider effects using exact finite-sample distributions, with the Poisson-binomial distribution as a special case when the outcome is binary. Simulation analyses demonstrate improved estimation and inference over existing methods. The proposed methods are applied to profiling dialysis facilities based on emergency department encounters using a dialysis patient database from the Centers for Medicare & Medicaid Services.
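The exact-test idea for a binary outcome can be made concrete: under the null that a provider performs as expected, its event count is a sum of independent Bernoulli trials with patient-level null probabilities, i.e. Poisson-binomial, whose pmf can be computed exactly by convolution. The sketch below uses an upper-tail p-value to flag excess events; the probabilities are invented for illustration, not drawn from the CMS data, and the paper's full procedure is not reproduced here.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_j) trials via
    repeated convolution; O(n^2) in the number of patients."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, np.array([1.0 - p, p]))
    return pmf  # pmf[k] = P(count == k)

def upper_tail_pvalue(observed, probs):
    pmf = poisson_binomial_pmf(probs)
    return float(pmf[observed:].sum())  # P(count >= observed) under the null

# Hypothetical null event probabilities for a small provider's 5 patients
probs = [0.05, 0.10, 0.20, 0.15, 0.30]
print(round(upper_tail_pvalue(3, probs), 4))
```

Because the distribution is exact rather than a large-sample approximation, the test remains valid for the small providers with extreme outcomes that break Wald and score tests.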
Collapse
|
22
|
Trensch G, Morrison A. A System-on-Chip Based Hybrid Neuromorphic Compute Node Architecture for Reproducible Hyper-Real-Time Simulations of Spiking Neural Networks. Front Neuroinform 2022; 16:884033. [PMID: 35846779 PMCID: PMC9277345 DOI: 10.3389/fninf.2022.884033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Accepted: 05/23/2022] [Indexed: 11/23/2022] Open
Abstract
Despite the great strides neuroscience has made in recent decades, the underlying principles of brain function remain largely unknown. Advancing the field strongly depends on the ability to study large-scale neural networks and perform complex simulations. In this context, simulations in hyper-real-time are of high interest, as they would enable both comprehensive parameter scans and the study of slow processes, such as learning and long-term memory. Not even the fastest supercomputer available today is able to meet the challenge of accurate and reproducible simulation with hyper-real acceleration. The development of novel neuromorphic computer architectures holds out promise, but the high costs and long development cycles for application-specific hardware solutions make it difficult to keep pace with the rapid developments in neuroscience. However, advances in System-on-Chip (SoC) device technology and tools are now providing interesting new design possibilities for application-specific implementations. Here, we present a novel hybrid software-hardware architecture approach for a neuromorphic compute node intended to work in a multi-node cluster configuration. The node design builds on the Xilinx Zynq-7000 SoC device architecture that combines a powerful field-programmable gate array (FPGA) and a dual-core ARM Cortex-A9 processor extension on a single chip. Our proposed architecture makes use of both and takes advantage of their tight coupling. We show that available SoC device technology can be used to build smaller neuromorphic computing clusters that enable hyper-real-time simulation of networks consisting of tens of thousands of neurons, and are thus capable of meeting the high demands for modeling and simulation in neuroscience.
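The kind of update such a compute node iterates at every time step can be sketched with a leaky integrate-and-fire (LIF) model: integrate the membrane potential, detect threshold crossings, reset the neurons that fired. The parameters below are generic textbook values and the vectorized NumPy form is only an illustration of the numerics, not of the fixed-point FPGA/ARM implementation on the Zynq device.

```python
import numpy as np

def simulate_lif(n_neurons=1000, steps=1000, dt=0.1,   # dt in ms
                 tau=10.0, v_th=1.0, v_reset=0.0, i_ext=0.12):
    """Euler-integrated LIF population driven by a noisy constant current;
    returns the total spike count over the simulated interval."""
    rng = np.random.default_rng(42)
    v = np.zeros(n_neurons)
    spikes = 0
    for _ in range(steps):
        noise = rng.normal(0.0, 0.05, n_neurons)
        v += dt / tau * (-v + (i_ext + noise) * tau)  # membrane integration
        fired = v >= v_th                             # threshold detection
        spikes += int(fired.sum())
        v[fired] = v_reset                            # reset after a spike
    return spikes

spikes = simulate_lif()
print(spikes)
```

Hyper-real-time operation means executing each `dt` of biological time in less than `dt` of wall-clock time; the hardware contribution of the paper is making that hold reproducibly for tens of thousands of such neurons.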
Collapse
|
23
|
Ahmad K, Rizzi A, Capelli R, Mandelli D, Lyu W, Carloni P. Enhanced-Sampling Simulations for the Estimation of Ligand Binding Kinetics: Current Status and Perspective. Front Mol Biosci 2022; 9:899805. [PMID: 35755817 PMCID: PMC9216551 DOI: 10.3389/fmolb.2022.899805] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022] Open
Abstract
The dissociation rate (k_off) associated with ligand unbinding events from proteins is a parameter of fundamental importance in drug design. Here we review recent major advancements in molecular simulation methodologies for the prediction of k_off. Next, we discuss the impact of the potential energy function models on the accuracy of calculated k_off values. Finally, we provide a perspective from high-performance computing and machine learning which might help improve such predictions.
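For a simple single-exponential unbinding process, k_off is the inverse of the mean residence time, which is the quantity enhanced-sampling methods ultimately estimate. A tiny synthetic illustration of that relation (the residence times below are simulated, not from any molecular system):

```python
import numpy as np

# For first-order unbinding, residence times are exponential with rate
# k_off, so the maximum-likelihood estimate is 1 / (mean residence time).
rng = np.random.default_rng(0)
k_true = 0.5                                   # illustrative rate, 1/ns
residence_times = rng.exponential(1.0 / k_true, size=2000)
k_est = 1.0 / residence_times.mean()
print(round(k_est, 3))  # close to 0.5
```

The practical difficulty the review addresses is that drug-like ligands have residence times far beyond plain molecular dynamics timescales, so the "observed" unbinding events must come from enhanced sampling and be reweighted back to physical time.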
Collapse
|
24
|
Pham M, Li H, Yuan Y, Mou C, Ramachandran K, Xu Z, Tu Y. Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs. ICS ... : PROCEEDINGS OF THE ... ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING. INTERNATIONAL CONFERENCE ON SUPERCOMPUTING 2022; 2022:24. [PMID: 35943281 PMCID: PMC9357265 DOI: 10.1145/3524059.3532387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Due to the high level of parallelism, there are unique challenges in developing system software on massively parallel hardware such as GPUs. One such challenge is designing a dynamic memory allocator whose task is to allocate memory chunks to requesting threads at runtime. State-of-the-art GPU memory allocators maintain a global data structure holding metadata to facilitate allocation/deallocation. However, the centralized data structure can easily become a bottleneck in a massively parallel system. In this paper, we present a novel approach for designing dynamic memory allocation without a centralized data structure. The core idea is to let threads follow a random search procedure to locate free pages. We then extend this to more advanced designs and algorithms that achieve an order of magnitude improvement over the basic idea. We present mathematical proofs to demonstrate that (1) the basic random search design achieves asymptotically lower latency than the traditional queue-based design and (2) the advanced designs achieve significant improvement over the basic idea. Extensive experiments are consistent with our mathematical models and demonstrate that our solutions can achieve up to two orders of magnitude improvement in latency over the best-known existing solutions.
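The core random-search idea can be sketched in a few lines: instead of contending on a centralized free list, each requesting thread probes uniformly random pages until it hits a free one. At occupancy x the expected number of probes is 1/(1 - x), independent of how many threads are allocating. The sequential simulation below only illustrates the probe-count behavior; on a GPU the marked claim step would be an atomic compare-and-swap, and none of this reproduces the paper's advanced designs.

```python
import random

def random_search_alloc(free, rng):
    """Probe random pages until a free one is found; return (page, probes)."""
    probes = 0
    n = len(free)
    while True:
        probes += 1
        i = rng.randrange(n)
        if free[i]:
            free[i] = False   # on a GPU this claim would be an atomic CAS
            return i, probes

rng = random.Random(0)
n_pages = 1 << 16
free = [True] * n_pages
total_probes = 0
n_alloc = n_pages // 2            # fill the heap to 50% occupancy
for _ in range(n_alloc):
    _, p = random_search_alloc(free, rng)
    total_probes += p
print(total_probes / n_alloc)     # averages about 1.4 probes per allocation
```

The average matches the integral of 1/(1 - x) from 0 to 0.5, i.e. 2·ln 2 ≈ 1.39 probes, and each probe touches independent memory, which is why the design avoids the serialization of a shared queue.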
Collapse
|
25
|
Liu Z, Lin Y, Hoover J, Beene D, Charley PH, Singer N. Individual level spatial-temporal modelling of exposure potential of livestock in the Cove Wash watershed, Arizona. ANNALS OF GIS 2022; 29:87-107. [PMID: 37090684 PMCID: PMC10117392 DOI: 10.1080/19475683.2022.2075935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Accepted: 05/02/2022] [Indexed: 05/03/2023]
Abstract
Personal exposure studies suffer from uncertainty issues, largely stemming from individual behavior uncertainties. Built on spatial-temporal exposure analysis and methods, this study proposed a novel approach to spatial-temporal modeling that incorporated behavior classifications, taking uncertainties into account, to estimate individual livestock exposure potential. The new approach was applied in a community-based research project with a Tribal community in the southwest United States. The community project examined the geospatial and temporal grazing patterns of domesticated livestock in a watershed containing 52 abandoned uranium mines (AUMs). Thus, the study aimed to 1) classify Global Positioning System (GPS) data from livestock into three behavior subgroups: grazing, traveling, or resting; 2) calculate the daily cumulative exposure potential for livestock; 3) assess the performance of the computational method with and without behavior classifications. Using Lotek Litetrack GPS collars, we collected data at 20-minute intervals for two flocks of sheep and goats during the spring and summer of 2019. Analysis and modeling of GPS data demonstrated no significant difference in individual cumulative exposure potential within each flock when animal behaviors with probability/uncertainties were considered. However, when daily cumulative exposure potential was calculated without consideration of animal behavior or probability/uncertainties, significant differences among animals within a herd were observed, which does not match the animal grazing behaviors reported by livestock owners. These results suggest that the proposed method of including behavior subgroups with probability/uncertainties more closely resembled the observed grazing behaviors reported by livestock owners. Results from the research may be used for future intervention and policy-making on remediation efforts in communities where grazing livestock may encounter environmental contaminants. This research also demonstrates a novel robust geographic information system (GIS)-based framework to estimate cumulative exposure potential to environmental contaminants and provides critical information to address community questions on livestock exposure to AUMs.
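A common first step for the behavior classification described above is thresholding the movement rate between consecutive GPS fixes. The sketch below classifies 20-minute fixes (the study's collection interval) into resting/grazing/traveling by speed; the speed thresholds and coordinates are hypothetical illustrations, not the study's fitted values, and the full method additionally carries classification probabilities rather than hard labels.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def classify(fixes, interval_s=1200):      # 20-minute fix interval
    """Label each step between consecutive fixes by movement rate."""
    labels = []
    for (la1, lo1), (la2, lo2) in zip(fixes, fixes[1:]):
        speed = haversine_m(la1, lo1, la2, lo2) / interval_s  # m/s
        if speed < 0.02:                   # hypothetical thresholds
            labels.append("resting")
        elif speed < 0.15:
            labels.append("grazing")
        else:
            labels.append("traveling")
    return labels

fixes = [(36.5000, -109.2000), (36.5001, -109.2001),
         (36.5006, -109.2006), (36.5100, -109.2100)]
print(classify(fixes))  # ['resting', 'grazing', 'traveling']
```

Daily cumulative exposure potential is then accumulated only over the fixes labeled (or weighted as) grazing near a contamination source, which is what removes the spurious within-flock differences reported in the abstract.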
Collapse
|