226
Gorji H, Lunati I, Rudolf F, Vidondo B, Hardt WD, Jenny P, Engel D, Schneider J, Jamnicki M, Leuthold R, Risch L, Risch M, Bühler M, Sommer A, Caduff A. Results from Canton Grisons of Switzerland suggest repetitive testing reduces SARS-CoV-2 incidence (February-March 2021). Sci Rep 2022; 12:19538. [PMID: 36376420 PMCID: PMC9663184 DOI: 10.1038/s41598-022-23986-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2021] [Accepted: 11/09/2022] [Indexed: 11/16/2022] Open
Abstract
In February 2021, in response to the emergence of more transmissible SARS-CoV-2 virus variants, the Canton Grisons launched a unique RNA mass testing program targeting the labour force in local businesses. Employees were offered weekly tests free of charge and on a voluntary basis. If tested positive, they were required to self-isolate for ten days and their contacts were subjected to daily testing at work. Thereby, the quarantine of contact persons could be waived. Here, we evaluate the effects of the testing program on the tested cohorts. We examined 121,364 test results from 27,514 participants during February-March 2021. By distinguishing different cohorts of employees, we observe a noticeable decrease in the test positivity rate and a statistically significant reduction in the associated incidence rate over the considered period. The reduction in the latter ranges between 18 and 50%. The variability is partly explained by different exposures to exogenous infection sources (e.g., contacts with visiting tourists or cross-border commuters). Our analysis provides the first empirical evidence that applying repetitive mass testing to a real population over an extended period of time can prevent the spread of COVID-19. However, to overcome logistic, uptake, and adherence challenges it is important that the program is carefully designed and that disease incursion from the population outside of the program is considered and controlled.
227
Region-specific denoising identifies spatial co-expression patterns and intra-tissue heterogeneity in spatially resolved transcriptomics data. Nat Commun 2022; 13:6912. [PMID: 36376296 PMCID: PMC9663444 DOI: 10.1038/s41467-022-34567-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2021] [Accepted: 10/31/2022] [Indexed: 11/16/2022] Open
Abstract
Spatially resolved transcriptomics is a relatively new technique that maps transcriptional information within a tissue. Analysis of these datasets is challenging because gene expression values are highly sparse due to dropout events, and there is a lack of tools to facilitate in silico detection and annotation of regions based on their molecular content. Therefore, we develop a computational tool for detecting molecular regions and region-based Missing value Imputation for Spatially Transcriptomics (MIST). We validate MIST-identified regions across multiple datasets produced by 10x Visium Spatial Transcriptomics, using manually annotated histological images as references. We benchmark MIST against a spatial k-nearest neighboring baseline and other imputation methods designed for single-cell RNA sequencing. We use holdout experiments to demonstrate that MIST accurately recovers spatial transcriptomics missing values. MIST facilitates identifying intra-tissue heterogeneity and recovering spatial gene-gene co-expression signals. Using MIST before downstream analysis thus provides unbiased region detections to facilitate annotations with the associated functional analyses and produces accurately denoised spatial gene expression profiles.
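The abstract mentions a spatial k-nearest-neighbour baseline without giving details. A minimal sketch of what such a baseline could look like follows; it is not MIST and not the authors' benchmark code. The function name `knn_impute`, the use of `None` to mark a missing value, and the plain Euclidean distance on spot coordinates are all assumptions made for illustration.

```python
import math


def knn_impute(coords, values, k=4):
    """Fill missing (None) values with the mean of the k nearest observed spots.

    coords: list of (x, y) spot positions on the tissue slide.
    values: expression of one gene per spot; None marks a missing value.
    """
    observed = [(c, v) for c, v in zip(coords, values) if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    imputed = []
    for c, v in zip(coords, values):
        if v is not None:
            imputed.append(v)  # keep measured values unchanged
            continue
        # Rank observed spots by Euclidean distance to the missing spot.
        nearest = sorted(observed, key=lambda cv: math.dist(c, cv[0]))[:k]
        imputed.append(sum(val for _, val in nearest) / len(nearest))
    return imputed


# Toy slide: the spot at (0, 1) has no measurement.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
values = [2.0, None, 4.0, 6.0, 100.0]
print(knn_impute(coords, values, k=2))
```

A region-aware method would restrict the neighbours to spots in the same molecular region; this sketch ignores regions entirely, which is what makes it a baseline.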
228
Abdallah M, Iovene V, Zanitti G, Wassermann D. Meta-analysis of the functional neuroimaging literature with probabilistic logic programming. Sci Rep 2022; 12:19431. [PMID: 36371447 PMCID: PMC9653422 DOI: 10.1038/s41598-022-21801-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 10/04/2022] [Indexed: 11/13/2022] Open
Abstract
Inferring reliable brain-behavior associations requires synthesizing evidence from thousands of functional neuroimaging studies through meta-analysis. However, existing meta-analysis tools are limited to investigating simple neuroscience concepts and expressing a restricted range of questions. Here, we expand the scope of neuroimaging meta-analysis by designing NeuroLang: a domain-specific language to express and test hypotheses using probabilistic first-order logic programming. By leveraging formalisms found at the crossroads of artificial intelligence and knowledge representation, NeuroLang provides the expressivity to address a larger repertoire of hypotheses in a meta-analysis, while seamlessly modeling the uncertainty inherent to neuroimaging data. We demonstrate the language's capabilities in conducting comprehensive neuroimaging meta-analysis through use-case examples that address questions of structure-function associations. Specifically, we infer the specific functional roles of three canonical brain networks, support the role of the visual word-form area in visuospatial attention, and investigate the heterogeneous organization of the frontoparietal control network.
229
Battenberg K, Kelly ST, Ras RA, Hetherington NA, Hayashi M, Minoda A. A flexible cross-platform single-cell data processing pipeline. Nat Commun 2022; 13:6847. [PMID: 36369450 PMCID: PMC9652453 DOI: 10.1038/s41467-022-34681-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/15/2021] [Accepted: 11/02/2022] [Indexed: 11/13/2022] Open
Abstract
Single-cell RNA-sequencing analysis to quantify the RNA molecules in individual cells has become popular, as it can obtain a large amount of information from each experiment. We introduce UniverSC ( https://github.com/minoda-lab/universc ), a universal single-cell RNA-seq data processing tool that supports any unique molecular identifier-based platform. Our command-line tool, docker image, and containerised graphical application enable consistent and comprehensive integration, comparison, and evaluation across data generated from a wide range of platforms. We also provide a cross-platform application to run UniverSC via a graphical user interface, available for macOS, Windows, and Linux Ubuntu, removing one of the bottlenecks of single-cell RNA-seq analysis, namely data processing, for researchers who are not bioinformatically proficient.
230
Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking. Nat Commun 2022; 13:6656. [PMID: 36333358 PMCID: PMC9636193 DOI: 10.1038/s41467-022-34537-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 06/29/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022] Open
Abstract
Liquid chromatography-mass spectrometry (LC-MS) based untargeted metabolomics allows the measurement of both known and unknown metabolites in the metabolome. However, unknown metabolite annotation is a major challenge in untargeted metabolomics. Here, we develop an approach, namely, knowledge-guided multi-layer network (KGMN), to enable global metabolite annotation from knowns to unknowns in untargeted metabolomics. The KGMN approach integrates three-layer networks, including knowledge-based metabolic reaction network, knowledge-guided MS/MS similarity network, and global peak correlation network. To demonstrate the principle, we apply KGMN in an in vitro enzymatic reaction system and different biological samples, with ~100-300 putative unknowns annotated in each data set. Among them, >80% of the unknown metabolites are corroborated with in silico MS/MS tools. Finally, we validate 5 metabolites that are absent in common MS/MS libraries through repository mining and synthesis of chemical standards. Together, the KGMN approach enables efficient unknown annotations, and substantially advances the discovery of recurrent unknown metabolites for common biological samples from model organisms, towards deciphering dark matter in untargeted metabolomics.
231
Chromosome-level genome assembly of Nibea coibor using PacBio HiFi reads and Hi-C technologies. Sci Data 2022; 9:670. [PMID: 36329044 PMCID: PMC9633807 DOI: 10.1038/s41597-022-01804-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open
Abstract
Nibea coibor belongs to Sciaenidae and is distributed in the South China Sea, East China Sea, India and the Philippines. In this study, we sequenced the DNA of a male Nibea coibor using PacBio long-read sequencing and generated chromatin interaction data. The genome size of Nibea coibor was estimated to be 611.85~633.88 Mb based on k-mer counts generated with Jellyfish. PacBio sequencing produced 29.26 Gb of HiFi reads, and Hifiasm was used to assemble a 627.60 Mb genome with a contig N50 of 10.66 Mb. We further found the canonical telomeric repeats “TTAGGG” to be present at the telomeres of all 24 chromosomes. The completeness of the assembly was estimated to be 98.9% and 97.8% using BUSCO and Merqury, respectively. Using the combination of ab initio prediction, protein homology and RNAseq annotation, we identified a total of 21,433 protein-coding genes. Phylogenetic analyses showed that Nibea coibor and Nibea albiflora are closely related. The results provide an important basis for research on the genetic breeding and genome evolution of Nibea coibor.
Measurement(s): Whole-Genome Shotgun Sequencing; transcription profiling assay
Technology Type(s): single molecular realtime sequencing; RNA sequencing
Sample Characteristic - Organism: Nibea coibor
Sample Characteristic - Location: China
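The abstract reports a genome size estimate derived from k-mer counts but does not say how the counts were turned into a size. A common rule of thumb divides the total number of k-mer occurrences by the depth of the main histogram peak; the sketch below illustrates that rule on invented toy numbers. The function name, the `min_depth` error cutoff, and the histogram values are assumptions for illustration, not the procedure or data used in the study.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth (how often a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing-error k-mers.
    """
    solid = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences, excluding the low-depth error tail.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Invented toy histogram with an error tail (depths 1-2) and a peak at depth 20.
toy = {1: 900, 2: 300, 18: 40, 19: 80, 20: 100, 21: 80, 22: 40, 40: 10}
print(estimate_genome_size(toy))
```

Heterozygosity and repeats shift or split the peak in real data, which is one reason estimates are often reported as a range rather than a single value.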
232
Jensen AB, Christensen TEK, Weninger C, Birkedal H. Very large-scale diffraction investigations enabled by a matrix-multiplication facilitated radial and azimuthal integration algorithm: MatFRAIA. Journal of Synchrotron Radiation 2022; 29:1420-1428. [PMID: 36345750 PMCID: PMC9641557 DOI: 10.1107/s1600577522008232] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/03/2021] [Accepted: 08/17/2022] [Indexed: 05/22/2023]
Abstract
As synchrotron facilities continue to generate increasingly brilliant X-rays and detector speeds increase, swift data reduction from the collected area detector images to more workable 1D diffractograms becomes increasingly important. This work reports an integration algorithm that can integrate diffractograms in real time on modern laptops and can reach 10 kHz integration speeds on modern workstations using an efficient pixel-splitting and parallelization scheme. This algorithm is limited not by the computation of the integration itself but is rather bottlenecked by the speed of the data transfer to the processor, the data decompression and/or the saving of results. The algorithm and its implementation are described, and the performance is investigated on 2D scanning X-ray diffraction/fluorescence data collected at the interface between an implant and forming bone.
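The idea of expressing radial integration as a matrix multiplication can be sketched in a few lines: a sparse weight matrix, built once per detector geometry, maps flattened pixels to radial bins, and every image is then reduced by one matrix-vector product. The code below is an illustrative sketch only, not the MatFRAIA implementation. The function names and the linear splitting of each pixel between two neighbouring bins are assumptions, since the abstract does not describe the actual pixel-splitting scheme.

```python
import math


def build_bin_matrix(height, width, center, n_bins, r_max):
    """Return an n_bins x (height*width) weight matrix as sparse rows (dicts).

    Each pixel's weight is split linearly between the two radial bins whose
    centres bracket the pixel radius (a simple stand-in for pixel splitting).
    """
    rows = [dict() for _ in range(n_bins)]
    bin_width = r_max / n_bins
    cy, cx = center
    for y in range(height):
        for x in range(width):
            r = math.hypot(y - cy, x - cx)
            pos = r / bin_width - 0.5  # position in units of bin centres
            lo = math.floor(pos)
            frac = pos - lo
            for b, w in ((lo, 1.0 - frac), (lo + 1, frac)):
                if 0 <= b < n_bins and w > 0.0:
                    rows[b][y * width + x] = w
    return rows


def integrate(rows, image_flat):
    """Apply the sparse matrix: weighted mean intensity per radial bin."""
    profile = []
    for row in rows:
        weight = sum(row.values())
        signal = sum(w * image_flat[i] for i, w in row.items())
        profile.append(signal / weight if weight > 0 else 0.0)
    return profile


# The matrix depends only on geometry, so it is built once and reused.
rows = build_bin_matrix(9, 9, center=(4, 4), n_bins=4, r_max=4.0)
flat_image = [7.0] * (9 * 9)  # uniform toy image
print(integrate(rows, flat_image))
```

Because the geometry-dependent work is done once up front, the per-image cost is a single sparse product, which is consistent with the abstract's remark that data transfer, decompression and saving, rather than the integration itself, become the bottleneck.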
233
Chu Q, Sun W, Zhang Y. A Data-Driven Method for the Estimation of Truck-State Parameters and Braking Force Distribution. Sensors (Basel, Switzerland) 2022; 22:8358. [PMID: 36366054 PMCID: PMC9655628 DOI: 10.3390/s22218358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022] [Indexed: 06/16/2023]
Abstract
In the study of braking force distribution of trucks, the accurate estimation of the state parameters of the vehicle is very critical. However, during the braking process, the state parameters of the vehicle present a highly nonlinear relationship that is difficult to estimate accurately and that seriously affects the accuracy of the braking force distribution strategy. To solve this problem, this paper proposes a machine-learning-based state-parameter estimation method to provide a solid data base for the braking force distribution strategy of the vehicle. Firstly, the actual collected complete vehicle information is processed for data; secondly, random forest is applied for the feature screening of data to reduce the data dimensionality; subsequently, the generalized regression neural network (GRNN) model is trained offline, and the vehicle state parameters are estimated online; the estimated parameters are used to implement the four-wheel braking force distribution strategy; finally, the effectiveness of the method is verified by joint simulation using MATLAB/Simulink and TruckSim.
234
Ahuja Y, Wen J, Hong C, Xia Z, Huang S, Cai T. A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record. Sci Rep 2022; 12:17737. [PMID: 36273240 PMCID: PMC9588081 DOI: 10.1038/s41598-022-22585-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/27/2021] [Accepted: 10/17/2022] [Indexed: 01/18/2023] Open
Abstract
While there exist numerous methods to identify binary phenotypes (e.g., COPD) using electronic health record (EHR) data, few exist to ascertain the timings of phenotype events (e.g., COPD onset or exacerbations). Estimating event times could enable more powerful use of EHR data for longitudinal risk modeling, including survival analysis. Here we introduce Semi-supervised Adaptive Markov Gaussian Embedding Process (SAMGEP), a semi-supervised machine learning algorithm to estimate phenotype event times using EHR data with limited observed labels, which require resource-intensive chart review to obtain. SAMGEP models latent phenotype states as a binary Markov process, and it employs an adaptive weighting strategy to map timestamped EHR features to an embedding function that it models as a state-dependent Gaussian process. SAMGEP's feature weighting achieves meaningful feature selection, and its predictions significantly improve AUCs and F1 scores over existing approaches in diverse simulations and real-world settings. It is particularly adept at predicting cumulative risk and event counting process functions, and is robust to diverse generative model parameters. Moreover, it achieves high accuracy with few (50-100) labels, efficiently leveraging unlabeled EHR data to maximize information gain from costly-to-obtain event time labels. SAMGEP can be used to estimate accurate phenotype state functions for risk modeling research.
235
Pattammattel A, Tappero R, Gavrilov D, Zhang H, Aronstein P, Forman HJ, O'Day PA, Yan H, Chu YS. Multimodal X-ray nano-spectromicroscopy analysis of chemically heterogeneous systems. Metallomics 2022; 14:6754152. [PMID: 36208212 PMCID: PMC9584160 DOI: 10.1093/mtomcs/mfac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 09/27/2022] [Indexed: 11/14/2022]
Abstract
Understanding the nanoscale chemical speciation of heterogeneous systems in their native environment is critical for several disciplines such as life and environmental sciences, biogeochemistry, and materials science. Synchrotron-based X-ray spectromicroscopy tools are widely used to understand the chemistry and morphology of complex material systems owing to their high penetration depth and sensitivity. The multidimensional (4D+) structure of spectromicroscopy data poses visualization and data-reduction challenges. This paper reports the strategies for the visualization and analysis of spectromicroscopy data. We created a new graphical user interface and data analysis platform named XMIDAS (X-ray multimodal image data analysis software) to visualize spectromicroscopy data from both image and spectrum representations. The interactive data analysis toolkit combined conventional analysis methods with well-established machine learning classification algorithms (e.g. nonnegative matrix factorization) for data reduction. The data visualization and analysis methodologies were then defined and optimized using a model particle aggregate with known chemical composition. Nanoprobe-based X-ray fluorescence (nano-XRF) and X-ray absorption near edge structure (nano-XANES) spectromicroscopy techniques were used to probe elemental and chemical state information of the aggregate sample. We illustrated the complete chemical speciation methodology of the model particle by using XMIDAS. Next, we demonstrated the application of this approach in detecting and characterizing nanoparticles associated with alveolar macrophages. Our multimodal approach combining nano-XRF, nano-XANES, and differential phase-contrast imaging efficiently visualizes the chemistry of localized nanostructure with the morphology. 
We believe that the optimized data-reduction strategies and tool development will facilitate the analysis of complex biological and environmental samples using X-ray spectromicroscopy techniques.
236
Ammer T, Schützenmeister A, Prokosch HU, Zierk J, Rank CM, Rauh M. RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation. Clin Chem 2022; 68:1410-1424. [PMID: 36264679 DOI: 10.1093/clinchem/hvac142] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/14/2022] [Accepted: 07/12/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Indirect methods leverage real-world data for the estimation of reference intervals. These constitute an active field of research, and several methods have been developed recently. So far, no standardized tool for evaluation and comparison of indirect methods exists. METHODS We provide RIbench, a benchmarking suite for quantitative evaluation of any existing or novel indirect method. The benchmark contains simulated test sets for 10 biomarkers mimicking routine measurements of a mixed distribution of non-pathological (reference) values and pathological values. The non-pathological distributions represent 4 common distribution types: normal, skewed, heavily skewed, and skewed-and-shifted. To identify strengths and weaknesses of indirect methods, test sets have varying sample sizes and pathological distributions differ in location, extent of overlap, and fraction. For performance evaluation, we use an overall benchmark score and sub-scores derived from absolute z-score deviations between estimated and true reference limits. We illustrate the application of RIbench by evaluating and comparing the Hoffmann method and 4 modern indirect methods -TML (Truncated-Maximum-Likelihood), kosmic, TMC (Truncated-Minimum-Chi-Square), and refineR- against one another and against a nonparametric direct method (n = 120). RESULTS For the modern indirect methods, pathological fraction and sample size had a strong influence on the results: With a pathological fraction up to 20% and a minimum sample size of 5000, most methods achieved results comparable or superior to the direct method. CONCLUSIONS We present RIbench, an open-source R-package, for the systematic evaluation of existing and novel indirect methods. RIbench can serve as a tool for enhancement of indirect methods, improving the estimation of reference intervals.
237
Reconstruction of time-shifted hemodynamic response. Sci Rep 2022; 12:17441. [PMID: 36261655 PMCID: PMC9581965 DOI: 10.1038/s41598-022-17601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2021] [Accepted: 07/27/2022] [Indexed: 01/12/2023] Open
Abstract
Regression of voxel time course onto expected response is a standard procedure in functional magnetic resonance imaging that relies on exact onset time and shape of superimposed hemodynamic response functions. Elegant capture of time deviation by time derivative regressors appears complicated by shape distortion and limited to ±1 s, and is usually not exploited for reconstructing the true time-shifted response function together with its magnitude. This analysis of the time-derivative approach provides closed-form functional relations between time shift and regression coefficients that allow for hemodynamic shifts of ±5 s and can explain shape distortion and reconstruction behavior. Reliable absolute latencies were no smaller than 0.6 s in a best-case experiment. Confusions of latency are a previously undiscussed shortcoming where current limitation strategy may eliminate correct latencies and protect incorrect ones.
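For orientation, the usual first-order argument behind time-derivative regressors can be written out as follows. This is the textbook Taylor approximation, valid only for small shifts, and not the closed-form relations derived in the paper. The symbols $h$ (canonical response), $A$ (magnitude), $\delta$ (time shift) and $\beta_1, \beta_2$ (regression coefficients) are notation assumed here.

```latex
\begin{align}
  A\,h(t-\delta) &\approx A\,h(t) - A\,\delta\,h'(t)
    && \text{(first-order Taylor expansion in } \delta\text{)} \\
  y(t) &= \beta_1\,h(t) + \beta_2\,h'(t) + \varepsilon(t)
    && \text{(regression with derivative regressor)} \\
  \beta_1 \approx A,\quad \beta_2 &\approx -A\,\delta
    \;\;\Longrightarrow\;\; \delta \approx -\frac{\beta_2}{\beta_1}
    && \text{(matching terms)}
\end{align}
```

The neglected higher-order terms grow with $|\delta|$, which is one way to see why the simple ratio is described as reliable only for shifts of about ±1 s.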
238
Hwangbo L, Kang YJ, Kwon H, Lee JI, Cho HJ, Ko JK, Sung SM, Lee TH. Stacking ensemble learning model to predict 6-month mortality in ischemic stroke patients. Sci Rep 2022; 12:17389. [PMID: 36253488 PMCID: PMC9576722 DOI: 10.1038/s41598-022-22323-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Accepted: 10/12/2022] [Indexed: 01/10/2023] Open
Abstract
Patients with acute ischemic stroke can benefit from reperfusion therapy. Nevertheless, there are gray areas where initiation of reperfusion therapy is neither supported nor contraindicated by the current practice guidelines. In these situations, a prediction model for mortality can be beneficial in decision-making. This study aimed to develop a mortality prediction model for acute ischemic stroke patients not receiving reperfusion therapies using a stacking ensemble learning model. The model used an artificial neural network as an ensemble classifier. Seven base classifiers were K-nearest neighbors, support vector machine, extreme gradient boosting, random forest, naive Bayes, artificial neural network, and logistic regression algorithms. From the clinical data in the International Stroke Trial database, we selected a concise set of variables assessable at the presentation. The primary study outcome was all-cause mortality at 6 months. Our stacking ensemble model predicted 6-month mortality with acceptable performance in ischemic stroke patients not receiving reperfusion therapy. The area under the curve of receiver-operating characteristics, accuracy, sensitivity, and specificity of the stacking ensemble classifier on a put-aside validation set were 0.783 (95% confidence interval 0.758-0.808), 71.6% (69.3-74.2), 72.3% (69.2-76.4%), and 70.9% (68.9-74.3%), respectively.
239
Gonzalez-Landaeta R, Ramirez B, Mejia J. Estimation of systolic blood pressure by Random Forest using heart sounds and a ballistocardiogram. Sci Rep 2022; 12:17196. [PMID: 36229644 PMCID: PMC9562413 DOI: 10.1038/s41598-022-22205-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 10/11/2022] [Indexed: 01/06/2023] Open
Abstract
Cuffless blood pressure measurement enables unobtrusive and continuous monitoring that can be integrated with wearable devices to extend healthcare to non-hospital settings. Most of the current research has focused on the estimation of blood pressure based on pulse transit time or pulse arrival time using ECG or peripheral cardiac pulse signals as proximal time references. This study proposed the use of a phonocardiogram (PCG) and ballistocardiogram (BCG), two signals detected noninvasively, to estimate systolic blood pressure (SBP). For this, the PCG and the BCG were simultaneously measured in 21 volunteers in the rest, activity, and post-activity conditions. Different features were considered based on the relationships between these signals. The intervals between S1 and S2 of the PCG and the I, J, and K waves of the BCG were statistically analyzed. The IJ and JK slopes were also estimated as additional features to train the machine-learning algorithm. The intervals S1-J, S1-K, S1-I, J-S2, and I-S2 were negatively correlated with changes in SBP (p-val < 0.01). The features were used as explanatory variables for a regressor based on the Random Forest. It was possible to estimate the systolic blood pressure with a mean error of 3.3 mmHg with a standard deviation of ± 5 mmHg. Therefore, we foresee that this proposal has potential applications for wearable devices that use low-cost embedded systems.
240
Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE. Nat Commun 2022; 13:5948. [PMID: 36216816 PMCID: PMC9550814 DOI: 10.1038/s41467-022-33570-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/12/2021] [Accepted: 09/20/2022] [Indexed: 11/08/2022] Open
Abstract
The steady-state localisation of proteins provides vital insight into their function. These localisations are context specific with proteins translocating between different subcellular niches upon perturbation of the subcellular environment. Differential localisation, that is a change in the steady-state subcellular location of a protein, provides a step towards mechanistic insight of subcellular protein dynamics. High-accuracy high-throughput mass spectrometry-based methods now exist to map the steady-state localisation and re-localisation of proteins. Here, we describe a principled Bayesian approach, BANDLE, that uses these data to compute the probability that a protein differentially localises upon cellular perturbation. Extensive simulation studies demonstrate that BANDLE reduces the number of both type I and type II errors compared to existing approaches. Application of BANDLE to several datasets recovers well-studied translocations. In an application to cytomegalovirus infection, we obtain insights into the rewiring of the host proteome. Integration of other high-throughput datasets allows us to provide the functional context of these data.
241
Nasir J, Steinbrück N, Xu K, Engelen B, Schmedt auf der Günne J. Digitization of imaging plates from Guinier powder X-ray diffraction cameras. J Appl Crystallogr 2022; 55:1097-1103. [PMID: 36249503 PMCID: PMC9533741 DOI: 10.1107/s160057672200677x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 07/01/2022] [Indexed: 11/15/2022] Open
Abstract
A Guinier camera equipped with an imaging plate is used to investigate and eliminate the sources of instrumental errors affecting the quality of the obtained scanned Guinier data. A program with a graphical user interface is presented which converts the data of the scanned images into different standard file formats for powder X-ray patterns containing intensities, their standard deviations and the diffraction angles. The program also allows for manual and automatic correction of the 2θ scale against a known reference material. It is shown using LaB6 that the exported X-ray diffraction patterns provide a 2θ scale reproducible enough to allow for averaging diffractograms obtained from different exposures of the imaging plate for the same sample. As shown on a mixture of NaCl and sodalite, the quality of the produced data is sufficient for Rietveld refinement. The software including source code is made available under a free software license.
242
Eastmond C, Subedi A, De S, Intes X. Deep learning in fNIRS: a review. Neurophotonics 2022; 9:041411. [PMID: 35874933 PMCID: PMC9301871 DOI: 10.1117/1.nph.9.4.041411] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 01/26/2022] [Accepted: 06/22/2022] [Indexed: 05/28/2023]
Abstract
Significance: Optical neuroimaging has become a well-established clinical and research tool to monitor cortical activations in the human brain. It is notable that outcomes of functional near-infrared spectroscopy (fNIRS) studies depend heavily on the data processing pipeline and classification model employed. Recently, deep learning (DL) methodologies have demonstrated fast and accurate performances in data processing and classification tasks across many biomedical fields. Aim: We aim to review the emerging DL applications in fNIRS studies. Approach: We first introduce some of the commonly used DL techniques. Then, the review summarizes current DL work in some of the most active areas of this field, including brain-computer interface, neuro-impairment diagnosis, and neuroscience discovery. Results: Of the 63 papers considered in this review, 32 compare DL techniques with traditional machine learning techniques, and in 26 of these DL outperforms the latter in terms of classification accuracy. In addition, eight studies also utilize DL to reduce the amount of preprocessing typically done with fNIRS data or increase the amount of data via data augmentation. Conclusions: The application of DL techniques to fNIRS studies has been shown to mitigate many of the hurdles present in fNIRS studies, such as lengthy data preprocessing or small sample sizes, while achieving comparable or improved classification accuracy.
243
Abstract
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed: first, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, a requirement of the FAIR Principles; second, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted to developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards which are modular and interoperable from the outset.
|
244
|
Singh R, Nagpal S, Pinna NK, Mande SS. Tracking mutational semantics of SARS-CoV-2 genomes. Sci Rep 2022; 12:15704. [PMID: 36127400 PMCID: PMC9487856 DOI: 10.1038/s41598-022-20000-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/02/2022] [Accepted: 09/07/2022] [Indexed: 11/16/2022] Open
Abstract
Natural language processing (NLP) algorithms process linguistic data in order to discover the associated word semantics and develop models that can describe or even predict the latent meanings of the data. The applications of NLP multiply when dealing with dynamic or temporally evolving datasets (e.g., historical literature). Biological datasets of genome sequences are interesting since they are sequential as well as dynamic. Here we describe how SARS-CoV-2 genomes and mutations thereof can be processed using fundamental algorithms in NLP to reveal the characteristics and evolution of the virus. We demonstrate the applicability of NLP not only in probing the temporal mutational signatures through dynamic topic modelling, but also in tracing mutation associations through the semantic drift in genomic mutation records. Our approach also yields promising results in unfolding the mutational relevance to patient health status, thereby identifying putative signatures linked to known/highly speculated mutations of concern.
|
245
|
Ling W, Lu J, Zhao N, Lulla A, Plantinga AM, Fu W, Zhang A, Liu H, Song H, Li Z, Chen J, Randolph TW, Koay WLA, White JR, Launer LJ, Fodor AA, Meyer KA, Wu MC. Batch effects removal for microbiome data via conditional quantile regression. Nat Commun 2022; 13:5418. [PMID: 36109499 PMCID: PMC9477887 DOI: 10.1038/s41467-022-33071-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/03/2021] [Accepted: 08/29/2022] [Indexed: 11/10/2022] Open
Abstract
Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to address the zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, failing to allow other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in and benefit usual subsequent analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.
|
246
|
Kulachenkov N, Barsukova M, Alekseevskiy P, Sapianik AA, Sergeev M, Yankin A, Krasilin AA, Bachinin S, Shipilovskikh S, Poturaev P, Medvedeva N, Denislamova E, Zelenovskiy PS, Shilovskikh VV, Kenzhebayeva Y, Efimova A, Novikov AS, Lunev A, Fedin VP, Milichko VA. Dimensionality Mediated Highly Repeatable and Fast Transformation of Coordination Polymer Single Crystals for All-Optical Data Processing. NANO LETTERS 2022; 22:6972-6981. [PMID: 36018814 DOI: 10.1021/acs.nanolett.2c01770] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/15/2023]
Abstract
Coordination polymers (CPs) based on dynamic structural elements are of great fundamental and commercial interest for addressing modern problems in controlled molecular separation, catalysis, and even data processing. However, the endurance and fast structural dynamics of such materials at ambient conditions remain a fundamental challenge. Here, we report on the design of a series of Cu-based CPs [Cu(bImB)Cl2] and [Cu(bImB)2Cl2] with the flexible ligand bImB (1,4-bis(imidazol-1-yl)butane) packed into one- and two-dimensional (1D, 2D) structures demonstrating dimensionality-mediated flexibility and reversible structural transformations. Using laser pulses as a fast source of activation energy, we initiate CP heating followed by anisotropic thermal expansion and 0.2-0.8% volume changes, with record transformation rates of 2220 and 1640 s⁻¹ for 1D and 2D CPs, respectively. The endurance over 10³ cycles of structural transformations, achieved for the CPs at ambient conditions, allows the demonstration of optical-fiber-integrated all-optical data processing.
|
247
|
Hajnal É, Kovács L, Vakulya G. Dairy Cattle Rumen Bolus Developments with Special Regard to the Applicable Artificial Intelligence (AI) Methods. SENSORS (BASEL, SWITZERLAND) 2022; 22:6812. [PMID: 36146158 PMCID: PMC9505622 DOI: 10.3390/s22186812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 09/02/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
It is a well-known worldwide trend to increase the number of animals on dairy farms and to reduce human labor costs. At the same time, there is a growing need to ensure economical animal husbandry and animal welfare. One way to resolve the two conflicting demands is to continuously monitor the animals. In this article, rumen bolus sensor techniques are reviewed, as they can provide lifelong monitoring due to their implementation. The applied sensing modalities are also reviewed, along with data transmission and data-processing techniques. In surveying the literature, we have given priority to artificial intelligence methods, the application of which can represent a significant development in this field. Recommendations are also given regarding the applicable hardware and data analysis technologies. Data processing is executed on at least four levels, from measurement to integrated analysis. We conclude that significant results can be achieved in this field only if the modern tools of computer science and intelligent data analysis are used at all levels.
|
248
|
Barton B, Thomson J, Lozano Diz E, Portela R. Chemometrics for Raman Spectroscopy Harmonization. APPLIED SPECTROSCOPY 2022; 76:1021-1041. [PMID: 35622984 DOI: 10.1177/00037028221094070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Raman spectroscopy is used in a wide variety of fields, and in a plethora of different configurations. Raman spectra of simple analytes can often be analyzed using univariate approaches and interpreted in a straightforward manner. For more complex spectral data such as time series or line profiles (1D), Raman maps (2D), or even volumes (3D), multivariate data analysis (MVDA) becomes a requirement. Even though there are some existing standards for creation, implementation, and validation of methods and models employed in industry and academia, further research and development in the field must contribute to their improvement. This review will cover, in broad terms, existing techniques as well as new developments for MVDA for Raman spectroscopic data, and in particular the use associated with instrumentation and data calibration. Chemometric models are often generated via fusion of analytical data from different sources, which enhances model discrimination and prediction abilities as compared to models derived from a single data source. For Raman spectroscopy, raw or unprocessed data is rarely used. Instead, spectra are usually corrected and manipulated, often by case-specific rather than universal methods. Calibration models can be used to characterize qualitatively and/or quantitatively samples measured with the same instrumentation that was used to create the model. However, regular validation is required to ensure that aging or incorrect maintenance of the instrument does not alter the model's predictions, particularly when applied in regulated fields such as pharmaceuticals. Furthermore, a model transfer may be required for different reasons, such as replacement or significant repair of the instrumentation. Modeling can also be used to consistently harmonize Raman spectroscopic data across several instrumental designs, accounting for variations in the resulting spectrum induced by different components.
Data for Raman harmonization models should be processed in a protocolled manner, and the original data should remain accessible to allow for model reconstruction or transfer when new data is added. Important processing steps include the calibration of the spectral axes and the correction of instrument-dependent effects, such as spectral resolution. In addition, data fusion and model transfer are essential for allowing new instrumentation to build on existing models to harmonize their own data. Ideally, an open access database would be created and maintained for the purpose of allowing continued harmonization of new Raman instruments using an outlined and accepted protocol.
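As an illustration of the spectral-axis calibration step mentioned above, the following is a minimal sketch and not a procedure taken from the review. It assumes that a few reference peaks with known Raman shifts have been located at known detector pixels, that a low-order polynomial is an adequate pixel-to-shift model, and that linear interpolation is acceptable for putting spectra on a shared axis. All function names and the synthetic numbers in the usage note are hypothetical.

```python
import numpy as np


def fit_axis_calibration(pixel_positions, reference_shifts, degree=2):
    """Fit a polynomial mapping detector pixel index -> Raman shift (cm^-1).

    pixel_positions: pixels at which reference peaks were observed.
    reference_shifts: the known shifts of those peaks (assumed to be given).
    """
    return np.polynomial.Polynomial.fit(pixel_positions, reference_shifts, degree)


def apply_calibration(calibration, n_pixels):
    """Return the calibrated shift axis for a detector with n_pixels pixels."""
    return calibration(np.arange(n_pixels))


def resample_to_common_axis(shift_axis, intensities, common_axis):
    """Linearly interpolate a spectrum onto a shared axis.

    shift_axis must be increasing, as np.interp requires.
    """
    return np.interp(common_axis, shift_axis, intensities)
```

With synthetic peak positions, `fit_axis_calibration(pixels, shifts)` followed by `apply_calibration(cal, 1024)` yields one shift value per pixel; spectra from two instruments can then be compared after `resample_to_common_axis` maps both onto the same grid.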
|
249
|
Chen X, Quan X. Analysis of Stray Light and Enhancement of SNR in DMD-Based Spectrometers. SENSORS (BASEL, SWITZERLAND) 2022; 22:6237. [PMID: 36016003 PMCID: PMC9413973 DOI: 10.3390/s22166237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 08/13/2022] [Accepted: 08/16/2022] [Indexed: 06/15/2023]
Abstract
Due to advantages such as the high efficiency of light utilization, small volume, and vibration resistance, digital micro-mirror device (DMD)-based spectrometers are widely used in ocean investigations, mountain surveys, and other field science research. In order to eliminate the stray light caused by DMDs, the stray light in DMD-based spectrometers was first measured and analyzed. Then, the stray light was classified into wavelength-related and wavelength-unrelated components. Moreover, the noise caused by the stray light was analyzed from the perspective of the encoding equation, and a de-noising decoding equation was derived. The results showed that the accuracy range of absorbance was enhanced from [0, 1.9] to [0, 3.1] in single-stripe mode and from [0, 3.8] to [0, 6.3] in Hadamard transform (HT) multiple-stripe mode. We conclude that the de-noising strategy is feasible and effective for enhancing the SNR in DMD-based spectrometers.
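The role of the encoding equation can be illustrated with a toy model of Hadamard multiplexing. This sketch is not the decoding equation derived in the paper: it assumes the stray light is a constant additive offset on every measurement, independent of the mirror pattern, and that this offset can be estimated from a separate dark reading. The function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix of the given order (a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h


def s_matrix(n):
    """Return the n x n 0/1 S-matrix built from a Hadamard matrix of order n + 1."""
    h = sylvester_hadamard(n + 1)
    # Drop the all-ones first row and column, then map +1 -> 0 and -1 -> 1.
    return (1 - h[1:, 1:]) // 2


def encode(spectrum, s, stray_offset=0.0):
    """Simulate multiplexed readings: each row of s selects the open stripes."""
    return s @ spectrum + stray_offset


def decode(measurements, s, dark_reading=0.0):
    """Recover the spectrum after subtracting the estimated stray-light offset."""
    return np.linalg.solve(s, measurements - dark_reading)
```

In this toy model, decoding without the dark-reading subtraction biases every recovered channel by the same amount, because each row of the S-matrix opens the same number of stripes; subtracting the offset before inversion removes that bias.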
|
250
|
Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat (STR) expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. We use superSTR to process whole-genome and whole-exome sequencing data and to perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and in mouse models of ataxia and dystrophy.
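To make the idea of alignment-free repeat screening concrete, the sketch below flags reads that are dominated by a short tandem repeat using only the read sequence itself. It is not superSTR's algorithm, which the abstract does not describe; it is a simple periodicity scan under assumed parameters (motif lengths 1 to 6, at least two motif copies, and a minimum covered fraction of the read), with hypothetical function names.

```python
def longest_tandem_run(read, period):
    """Return (start, span) of the longest stretch with read[i] == read[i - period]."""
    best_start, best_span = 0, 0
    run_start, run_len = 0, 0
    for i in range(period, len(read)):
        if read[i] == read[i - period]:
            if run_len == 0:
                run_start = i - period
            run_len += 1
            span = run_len + period  # matched positions plus the first motif copy
            if span > best_span:
                best_start, best_span = run_start, span
        else:
            run_len = 0
    return best_start, best_span


def screen_read(read, max_period=6, min_fraction=0.5):
    """Return (motif, start, span) if a tandem repeat dominates the read, else None.

    No reference genome or alignment is used. Ties between periods are
    resolved in favour of the shortest motif.
    """
    best = None
    for period in range(1, max_period + 1):
        start, span = longest_tandem_run(read, period)
        if span >= 2 * period and (best is None or span > best[2]):
            best = (read[start:start + period], start, span)
    if best is not None and best[2] >= min_fraction * len(read):
        return best
    return None
```

For example, a read consisting mostly of repeated `CAG` units with short non-repetitive flanks is reported with motif `CAG`, while a read without a dominant repeat returns `None`. Counting such reads per motif across a sample is one way a repeat-enrichment signal could be accumulated without alignment.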
|