Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

545
(from Reference Citation Analysis)

Article PDFs (229)

Cited by > 0 (316)

Searched Name

data processing

Results Analysis

Number	Citation Analysis
76	Nentwich M, Zschornak M, Weigel T, Köhler T, Novikov D, Meyer DC, Richter C. Treatment of multiple-beam X-ray diffraction in energy-dependent measurements. Journal of Synchrotron Radiation 2024;31:28-34. [PMID: 38095667 PMCID: PMC10833431 DOI: 10.1107/s1600577523009670] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Accepted: 11/06/2023] [Indexed: 01/09/2024]
Abstract: During X-ray diffraction experiments on single crystals, the diffracted beam intensities may be affected by multiple-beam X-ray diffraction (MBD). This effect is particularly frequent at higher X-ray energies and for larger unit cells. The appearance of this so-called Renninger effect often impairs the interpretation of diffracted intensities. This applies in particular to energy spectra analysed in resonant experiments, since during scans of the incident photon energy these conditions are necessarily met for specific X-ray energies. This effect can be addressed by carefully avoiding multiple-beam reflection conditions at a given X-ray energy and a given position in reciprocal space. However, areas which are (nearly) free of MBD are not always available. This article presents a universal concept of data acquisition and post-processing for resonant X-ray diffraction experiments. Our concept facilitates the reliable determination of kinematic (MBD-free) resonant diffraction intensities even at relatively high energies which, in turn, enables the study of higher absorption edges. This way, the applicability of resonant diffraction, e.g. to reveal the local atomic and electronic structure or chemical environment, is extended for a vast majority of crystalline materials. The potential of this approach compared with conventional data reduction is demonstrated by the measurements of the Ta L₃ edge of well studied lithium tantalate LiTaO₃.
Key Words: Renninger effect; data processing; multiple-beam X-ray diffraction; resonant elastic X-ray scattering
Grants: 324641898 Deutsche Forschungsgemeinschaft; 409743569 Deutsche Forschungsgemeinschaft; 871072 H2020 Excellent Science; I-20181183 Deutsches Elektronen-Synchrotron
77	Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023;14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023]
Abstract: Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Key Words: data processing; statistical methods; software; protein design; next-generation sequencing
MeSH Headings: Protein Engineering; Mutation; Computer Simulation; High-Throughput Nucleotide Sequencing
Grants: 22K17994 MEXT | Japan Society for the Promotion of Science (JSPS); WPI-PRIMe MEXT | Japan Society for the Promotion of Science (JSPS); REGENETHER 639888 EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council); 863214 EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020); ANR-10-LABX-65 Agence Nationale de la Recherche (French National Research Agency); ANR-18-IAHU-01 Agence Nationale de la Recherche (French National Research Agency)
78	Hejret V, Varadarajan NM, Klimentova E, Gresova K, Giassa IC, Vanacova S, Alexiou P. Analysis of chimeric reads characterises the diverse targetome of AGO2-mediated regulation. Sci Rep 2023;13:22895. [PMID: 38129478 PMCID: PMC10739727 DOI: 10.1038/s41598-023-49757-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 12/12/2023] [Indexed: 12/23/2023]
Abstract: Argonaute proteins are instrumental in regulating RNA stability and translation. AGO2, the major mammalian Argonaute protein, is known to primarily associate with microRNAs, a family of small RNA 'guide' sequences, and identifies its targets primarily via a 'seed' mediated partial complementarity process. Despite numerous studies, a definitive experimental dataset of AGO2 'guide'-'target' interactions remains elusive. Our study employs two experimental methods, AGO2 CLASH and AGO2 eCLIP, to generate thousands of AGO2 target sites verified by chimeric reads. These chimeric reads contain both the AGO2 loaded small RNA 'guide' and the target sequence, providing a robust resource for modeling AGO2 binding preferences. Our novel analysis pipeline reveals thousands of AGO2 target sites driven by microRNAs and a significant number of AGO2 'guides' derived from fragments of other small RNAs such as tRNAs, YRNAs, snoRNAs, rRNAs, and more. We utilize convolutional neural networks to train machine learning models that accurately predict the binding potential for each 'guide' class and experimentally validate several interactions. In conclusion, our comprehensive analysis of the AGO2 targetome broadens our understanding of its 'guide' repertoire and potential function in development and disease. Moreover, we offer practical bioinformatic tools for future experiments and the prediction of AGO2 targets. All data and code from this study are freely available at https://github.com/ML-Bioinfo-CEITEC/HybriDetector/.
Key Words: miRNAs; data processing; machine learning
MeSH Headings: Animals; MicroRNAs/genetics; MicroRNAs/metabolism; Argonaute Proteins/genetics; Argonaute Proteins/metabolism; RNA, Ribosomal; RNA, Transfer; Mammals/metabolism
Grants: 19-10976Y Grantová Agentura České Republiky, Czechia; LQ1601 Central European Institute of Technology; 20-19617S Grantová Agentura České Republiky; CZ.02.01.01/00/22_008/0004575 OP-JAK; HORIZON-WIDERA-2022 BioGeMT 101086768 HORIZON EUROPE Framework Programme
79	Rischke S, Poor SM, Gurke R, Hahnefeld L, Köhm M, Ultsch A, Geisslinger G, Behrens F, Lötsch J. Machine learning identifies right index finger tenderness as key signal of DAS28-CRP based psoriatic arthritis activity. Sci Rep 2023;13:22710. [PMID: 38123604 PMCID: PMC10733369 DOI: 10.1038/s41598-023-49574-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 12/09/2023] [Indexed: 12/23/2023]
Abstract: Psoriatic arthritis (PsA) is a chronic inflammatory systemic disease whose activity is often assessed using the Disease Activity Score 28 (DAS28-CRP). The present study was designed to investigate the significance of individual components within the score for PsA activity. A cohort of 80 PsA patients (44 women and 36 men, aged 56.3 ± 12 years) with a range of disease activity from remission to moderate was analyzed using unsupervised and supervised methods applied to the DAS28-CRP components. Machine learning-based permutation importance identified tenderness in the metacarpophalangeal joint of the right index finger as the most informative item of the DAS28-CRP for PsA activity staging. This symptom alone allowed a machine-learned (random forests) classifier to identify PsA remission with 67% balanced accuracy in new cases. Projection of the DAS28-CRP data onto an emergent self-organizing map of artificial neurons identified outliers, which following augmentation of group sizes by emergent self-organizing maps based generative artificial intelligence (AI) could be defined as subgroups particularly characterized by either tenderness or swelling of specific joints. AI-assisted re-evaluation of the DAS28-CRP for PsA has narrowed the score items to a most relevant symptom, and generative AI has been useful for identifying and characterizing small subgroups of patients whose symptom patterns differ from the majority. These findings represent an important step toward precision medicine that can address outliers.
Key Words: rheumatic diseases; autoimmune diseases; data processing; machine learning
MeSH Headings: Male; Humans; Female; Arthritis, Psoriatic/diagnosis; Arthritis, Psoriatic/drug therapy; Artificial Intelligence; Algorithms; Metacarpophalangeal Joint; Machine Learning
Grants: SFB 1039/Z01 Deutsche Forschungsgemeinschaft; DFG LO 612/16-1 Deutsche Forschungsgemeinschaft; Fraunhofer Cluster of Excellence for Immune Mediated diseases CIMD Fraunhofer-Gesellschaft; 101007757 Innovative Medicines Initiative 2 Joint Undertaking (JU); Johann Wolfgang Goethe-Universität, Frankfurt am Main (1022)
80	Xu Z, Tang S, Liu C, Zhang Q, Gu H, Li X, Di Z, Li Z. Temporal segmentation of EEG based on functional connectivity network structure. Sci Rep 2023;13:22566. [PMID: 38114604 PMCID: PMC10730570 DOI: 10.1038/s41598-023-49891-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 12/13/2023] [Indexed: 12/21/2023]
Abstract: In the study of brain functional connectivity networks, it is assumed that a network is built from a data window in which activity is stationary. However, brain activity is non-stationary over sufficiently large time periods. Addressing the analysis of electroencephalography (EEG) data, we propose a data segmentation method based on functional connectivity network structure. The goal of segmentation is to ensure that within a window of analysis, there is similar network structure. We designed an intuitive and flexible graph distance measure to quantify the difference in network structure between two analysis windows. This measure is modular: a variety of node importance indices can be plugged into it. We use a reference window versus sliding window comparison approach to detect changes, as indicated by outliers in the distribution of graph distance values. Performance of our segmentation method was tested in simulated EEG data and real EEG data from a drone piloting experiment (using correlation or phase-locking value as the functional connectivity strength metric). We compared our method under various node importance measures and against matrix-based dissimilarity metrics that use singular value decomposition on the connectivity matrix. The results show the graph distance approach worked better than matrix-based approaches; graph distance based on partial node centrality was most sensitive to network structural changes, especially when connectivity matrix values change little. The proposed method provides EEG data segmentation tailored for detecting changes in terms of functional connectivity networks. Our study provides a new perspective on EEG segmentation, one that is based on functional connectivity network structure differences.
Key Words: data processing; network topology; signal processing; cognitive neuroscience; complex networks
MeSH Headings: Brain/diagnostic imaging; Electroencephalography/methods
Grants: 2021ZD0200407 The STI 2030-Major Projects grant of the Ministry of Science and Technology of China; 2020YFC0832402 The National Key Research and Development Program of China; 2021KCXTD014 The Innovation Team Project of Guangdong Provincial Department of Education; The Beijing Normal University research start-up fund
81	Corbe M, Boncompain G, Perez F, Del Nery E, Genovesio A. Transfer learning for versatile and training free high content screening analyses. Sci Rep 2023;13:22599. [PMID: 38114550 PMCID: PMC10730630 DOI: 10.1038/s41598-023-49554-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 12/09/2023] [Indexed: 12/21/2023]
Abstract: High content screening (HCS) is a technology that automates cell biology experiments at large scale. A High Content Screen produces a large number of microscopy images of cells under many conditions and requires that a dedicated image and data analysis workflow be designed for each assay to select hits. This heavy data analytic step remains challenging and has been recognized as one of the burdens hindering the adoption of HCS. In this work we propose a solution to hit selection by using transfer learning without additional training. A pretrained residual network is employed to encode each image of a screen into a discriminant representation. The deep features obtained are then corrected to account for well plate bias and misalignment. We then propose two training-free pipelines dedicated to the two main categories of HCS for compound selection: with or without positive control. When a positive control is available, it is used alongside the negative control to compute a linear discriminant axis, thus building a classifier without training. Once all samples are projected onto this axis, the conditions that best reproduce the positive control can be selected. When no positive control is available, the Mahalanobis distance is computed from each sample to the negative control distribution. The latter provides a metric to identify the conditions that alter the negative control's cell phenotype. This metric is subsequently used to categorize hits through a clustering step. Given the lack of available ground truth in HCS, we provide a qualitative comparison of the results obtained using this approach with results obtained with handcrafted image analysis features for compound and siRNA screens with or without control. Our results suggest that the fully automated and generic pipeline we propose offers a good alternative to handcrafted dedicated image analysis approaches. Furthermore, we demonstrate that this solution selects conditions of interest that had not been identified using the primary dedicated analysis. Altogether, this approach provides a fully automated, reproducible, versatile and comprehensive alternative analysis solution for HCS encompassing compound-based or downregulation screens, with or without positive controls, without the need for training or cell detection, or the development of a dedicated image analysis workflow.
Key Words: cellular imaging; computational platforms and environments; phenotypic screening; high-throughput screening; data processing
MeSH Headings: Microscopy; Image Processing, Computer-Assisted/methods; RNA, Small Interfering; Machine Learning
Grants: ANR-10-LABX-54 MEMO LIFE Agence Nationale de la Recherche; ANR-11-IDEX-0001-02 PSL* Research University Agence Nationale de la Recherche
82	Liang Z, Liang C. Design and implementation of load intensity monitoring platform supported by big data technology in stage training for women's sitting volleyball. Sci Rep 2023;13:22382. [PMID: 38104202 PMCID: PMC10725414 DOI: 10.1038/s41598-023-50057-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 12/14/2023] [Indexed: 12/19/2023]
Abstract: This study aims to discuss the load intensity monitoring in the training process of sitting volleyball, to help coaches understand the training status of athletes, and to provide a scientific basis for the follow-up training plan. Through big data technology, the physiological changes of athletes can be more accurately grasped. This includes classification and summary of exercise load intensity and experimental study of the relationship between heart rate and rating of perceived exertion (RPE). Through monitoring the training process of a provincial women's sitting volleyball team, it is found that there is a significant positive correlation between athletes' RPE and average heart rate. This result shows that by monitoring the change in heart rate and RPE of athletes, athletes' training state and physical condition can be more accurately understood. The results reveal that through the use of big data technology and monitoring experiments, it is found that heart rate and RPE are effective monitoring indicators, which can scientifically reflect the load intensity during sitting volleyball training. The conclusions provide coaches with a more scientific basis for making training plans and useful references for sports involving people with disabilities.
Key Words: computational science; computer science; information technology; software; statistics; health care; data processing
MeSH Headings: Humans; Female; Volleyball/physiology; Big Data; Athletes; Heart Rate; Physical Exertion/physiology
83	Zhu Y, Bi D, Saunders M, Ji Y. Prediction of chronic kidney disease progression using recurrent neural network and electronic health records. Sci Rep 2023;13:22091. [PMID: 38086905 PMCID: PMC10716428 DOI: 10.1038/s41598-023-49271-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/06/2023] [Indexed: 12/18/2023]
Abstract: Chronic kidney disease (CKD) is a progressive loss in kidney function. Early detection of patients who will progress to late-stage CKD is of paramount importance for patient care. To address this, we develop a pipeline to process longitudinal electronic health records (EHRs) and construct recurrent neural network (RNN) models to predict CKD progression from stages II/III to stages IV/V. The RNN model generates predictions based on time-series records of patients, including repeated lab tests and other clinical variables. Our investigation reveals that using a single variable, the recorded estimated glomerular filtration rate (eGFR) over time, the RNN model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.957 for predicting future CKD progression. When additional clinical variables, such as demographics, vital information, lab test results, and health behaviors, are incorporated, the average AUROC increases to 0.967. In both scenarios, the standard deviation of the AUROC across cross-validation trials is less than 0.01, indicating a stable and high prediction accuracy. Our analysis results demonstrate the proposed RNN model outperforms existing standard approaches, including static and dynamic Cox proportional hazards models, random forest, and LightGBM. The utilization of the RNN model and the time-series data of previous eGFR measurements underscores its potential as a straightforward and effective tool for assessing the clinical risk of CKD patients concerning their disease progression.
Key Words: computational models; machine learning; data processing; chronic kidney disease; predictive markers
MeSH Headings: Humans; Electronic Health Records; Renal Insufficiency, Chronic/diagnosis; Glomerular Filtration Rate; Neural Networks, Computer; Time Factors; Disease Progression
Grants: P30 DK092949 NIDDK NIH HHS; National Institutes of Health
84	Dunn T, Narayanasamy S. vcfdist: accurately benchmarking phased small variant calls in human genomes. Nat Commun 2023;14:8149. [PMID: 38071244 PMCID: PMC10710436 DOI: 10.1038/s41467-023-43876-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023]
Abstract: Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.
Key Words: software; genetic databases; disease genetics; data processing; genome informatics
MeSH Headings: Humans; Genome, Human/genetics; Benchmarking; High-Throughput Nucleotide Sequencing; Algorithms; Whole Genome Sequencing; Polymorphism, Single Nucleotide; Software
Grants: 2030454 National Science Foundation (NSF); National Science Foundation Graduate Research Fellowship Grant No. 1841052; Kahn Foundation
85	Sajedi S, Ebrahimi G, Roudi R, Mehta I, Heshmat A, Samimi H, Kazempour S, Zainulabadeen A, Docking TR, Arora SP, Cigarroa F, Seshadri S, Karsan A, Zare H. Integrating DNA methylation and gene expression data in a single gene network using the iNETgrate package. Sci Rep 2023;13:21721. [PMID: 38066050 PMCID: PMC10709411 DOI: 10.1038/s41598-023-48237-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract: Analyzing different omics data types independently is often too restrictive to allow for detection of subtle, but consistent, variations that are coherently supported based upon different assays. Integrating multi-omics data in one model can increase statistical power. However, designing such a model is challenging because different omics are measured at different levels. We developed the iNETgrate package (https://bioconductor.org/packages/iNETgrate/) that efficiently integrates transcriptome and DNA methylation data in a single gene network. Applying iNETgrate on five independent datasets improved prognostication compared to common clinical gold standards and a patient similarity network approach.
Key Words: computational models; data integration; data processing; machine learning; network topology; computational biology and bioinformatics; software
MeSH Headings: Humans; Software; DNA Methylation; Gene Regulatory Networks; Gene Expression
Grants: P30 AG072975 NIA NIH HHS; U01 AG046152 NIA NIH HHS; R01 AG017917 NIA NIH HHS; R01 AG057896 NIA NIH HHS; RF1 AG063507 NIA NIH HHS; R01 AG068293 NIA NIH HHS; R01 AG015819 NIA NIH HHS; U01 AG061356 NIA NIH HHS; RF1 AG065301 NIA NIH HHS; R01 AG036042 NIA NIH HHS; P30 AG010161 NIA NIH HHS; U19 NS115388 NINDS NIH HHS; RF1 NS112391 NINDS NIH HHS; P30 AG066546 NIA NIH HHS; RF1 AG057473 NIA NIH HHS; RF1 AG059082 NIA NIH HHS; National Institute on Aging - National Institutes of Health, United States; National Science Foundation, United States
86	Jeckel H, Nosho K, Neuhaus K, Hastewell AD, Skinner DJ, Saha D, Netter N, Paczia N, Dunkel J, Drescher K. Simultaneous spatiotemporal transcriptomics and microscopy of Bacillus subtilis swarm development reveal cooperation across generations. Nat Microbiol 2023;8:2378-2391. [PMID: 37973866 PMCID: PMC10686836 DOI: 10.1038/s41564-023-01518-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/09/2023] [Indexed: 11/19/2023]
Abstract: Development of microbial communities is a complex multiscale phenomenon with wide-ranging biomedical and ecological implications. How biological and physical processes determine emergent spatial structures in microbial communities remains poorly understood due to a lack of simultaneous measurements of gene expression and cellular behaviour in space and time. Here we combined live-cell microscopy with a robotic arm for spatiotemporal sampling, which enabled us to simultaneously acquire phenotypic imaging data and spatiotemporal transcriptomes during Bacillus subtilis swarm development. Quantitative characterization of the spatiotemporal gene expression patterns revealed correlations with cellular and collective properties, and phenotypic subpopulations. By integrating these data with spatiotemporal metabolome measurements, we discovered a spatiotemporal cross-feeding mechanism fuelling swarm development: during their migration, earlier generations deposit metabolites which are consumed by later generations that swarm across the same location. These results highlight the importance of spatiotemporal effects during the emergence of phenotypic subpopulations and their interactions in bacterial communities.
Key Words: microbial communities; data processing; biofilms; image processing; cellular motility
MeSH Headings: Microscopy; Bacillus subtilis/metabolism; Transcriptome; Gene Expression Profiling
Grants: 716734 EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council); DR 982/5-1 Deutsche Forschungsgemeinschaft (German Research Foundation); DR 982/6-1 SPP2389 Deutsche Forschungsgemeinschaft (German Research Foundation); TARGET-Biofilms 16GW0245 Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research); 51NF40_180541 AntiResist Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Swiss National Science Foundation); DMS-1764421 National Science Foundation (NSF); 597491 Simons Foundation; G-2021-16758 Alfred P. Sloan Foundation; Max-Planck-Gesellschaft (Max Planck Society); Studienstiftung des Deutschen Volkes (German National Academic Foundation); Joachim Herz Stiftung (Joachim Herz Foundation); Alexander von Humboldt-Stiftung (Alexander von Humboldt Foundation)
87	Menden K, Francescatto M, Nyima T, Blauwendraat C, Dhingra A, Castillo-Lizardo M, Fernandes N, Kaurani L, Kronenberg-Versteeg D, Atasu B, Sadikoglou E, Borroni B, Rodriguez-Nieto S, Simon-Sanchez J, Fischer A, Craig DW, Neumann M, Bonn S, Rizzu P, Heutink P. A multi-omics dataset for the analysis of frontotemporal dementia genetic subtypes. Sci Data 2023;10:849. [PMID: 38040703 PMCID: PMC10692098 DOI: 10.1038/s41597-023-02598-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 09/26/2023] [Indexed: 12/03/2023]
Abstract: Understanding the molecular mechanisms underlying frontotemporal dementia (FTD) is essential for the development of successful therapies. Systematic studies on human post-mortem brain tissue of patients with genetic subtypes of FTD are currently lacking. The Risk and Modifying Factors of Frontotemporal Dementia (RiMod-FTD) consortium therefore has generated a multi-omics dataset for genetic subtypes of FTD to identify common and distinct molecular mechanisms disturbed in disease. Here, we present multi-omics datasets generated from the frontal lobe of post-mortem human brain tissue from patients with mutations in MAPT, GRN and C9orf72 and healthy controls. This data resource consists of four datasets generated with different technologies to capture the transcriptome by RNA-seq, small RNA-seq, CAGE-seq, and methylation profiling. We show concrete examples on how to use the resulting data and confirm current knowledge about FTD and identify new processes for further investigation. This extensive multi-omics dataset holds great value to reveal new research avenues for this devastating disease.
Key Words: neurodegeneration; data integration; functional genomics; data processing
MeSH Headings: Humans; Frontal Lobe; Frontotemporal Dementia/genetics; Multiomics; Mutation
Grants: NOMIS Stiftung (NOMIS Foundation); EU Joint Programme – Neurodegenerative Disease Research
88	Ang MY, Takeuchi F, Kato N. Deciphering the genetic landscape of obesity: a data-driven approach to identifying plausible causal genes and therapeutic targets. J Hum Genet 2023;68:823-833. [PMID: 37620670 PMCID: PMC10678330 DOI: 10.1038/s10038-023-01189-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 08/08/2023] [Accepted: 08/15/2023] [Indexed: 08/26/2023] Abstract OBJECTIVES Genome-wide association studies (GWAS) have successfully revealed numerous susceptibility loci for obesity. However, identifying the causal genes, pathways, and tissues/cell types responsible for these associations remains a challenge, and standardized analysis workflows are lacking. Additionally, due to limited treatment options for obesity, there is a need for the development of new pharmacological therapies. This study aimed to address these issues by performing step-wise utilization of knowledgebase for gene prioritization and assessing the potential relevance of key obesity genes as therapeutic targets. METHODS AND RESULTS First, we generated a list of 28,787 obesity-associated SNPs from the publicly available GWAS dataset (approximately 800,000 individuals in the GIANT meta-analysis). Then, we prioritized 1372 genes with significant in silico evidence against genomic and transcriptomic data, including transcriptionally regulated genes in the brain from transcriptome-wide association studies. In further narrowing down the gene list, we selected key genes, which we found to be useful for the discovery of potential drug seeds as demonstrated in lipid GWAS separately. We thus identified 74 key genes for obesity, which are highly interconnected and enriched in several biological processes that contribute to obesity, including energy expenditure and homeostasis. 
Of 74 key genes, 37 had not been reported for the pathophysiology of obesity. Finally, by drug-gene interaction analysis, we detected 23 (of 74) key genes that are potential targets for 78 approved and marketed drugs. CONCLUSIONS Our results provide valuable insights into new treatment options for obesity through a data-driven approach that integrates multiple up-to-date knowledgebases.
Key Words: gene regulatory networks; data mining; data processing
MESH Headings: Humans; Genome-Wide Association Study/methods; Genetic Predisposition to Disease; Obesity/genetics; Gene Expression Profiling; Transcriptome; Polymorphism, Single Nucleotide
89	Zhou Z, Zhong Y, Zhang Z, Ren X. Spatial transcriptomics deconvolution at single-cell resolution using Redeconve. Nat Commun 2023;14:7930. [PMID: 38040768 PMCID: PMC10692090 DOI: 10.1038/s41467-023-43600-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 11/14/2023] [Indexed: 12/03/2023]
Abstract: Computational deconvolution with single-cell RNA sequencing data as reference is pivotal to interpreting spatial transcriptomics data, but the current methods are limited to cell-type resolution. Here we present Redeconve, an algorithm to deconvolute spatial transcriptomics data at single-cell resolution, enabling interpretation of spatial transcriptomics data with thousands of nuanced cell states. We benchmark Redeconve with the state-of-the-art algorithms on diverse spatial transcriptomics platforms and datasets and demonstrate the superiority of Redeconve in terms of accuracy, resolution, robustness, and speed. Application to a human pancreatic cancer dataset reveals cancer-clone-specific T cell infiltration, and application to lymph node samples identifies differential cytotoxic T cells between IgA+ and IgG+ spots, providing novel insights into tumor immunology and the regulatory mechanisms underlying antibody class switch.
Key Words: data processing; rna sequencing; computational models; software; bioinformatics
MESH Headings: Humans; Transcriptome/genetics; Gene Expression Profiling; Algorithms; Benchmarking; Immunoglobulin Isotypes; Single-Cell Analysis
Grants: 92159305 National Natural Science Foundation of China (National Science Foundation of China); 31991171 National Natural Science Foundation of China (National Science Foundation of China)
90	Sun Y, Wiese M, Hmadi R, Karayol R, Seyfferth J, Martinez Greene JA, Erdogdu NU, Deboutte W, Arrigoni L, Holz H, Renschler G, Hirsch N, Foertsch A, Basilicata MF, Stehle T, Shvedunova M, Bella C, Pessoa Rodrigues C, Schwalb B, Cramer P, Manke T, Akhtar A. MSL2 ensures biallelic gene expression in mammals. Nature 2023;624:173-181. [PMID: 38030723 PMCID: PMC10700137 DOI: 10.1038/s41586-023-06781-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract: In diploid organisms, biallelic gene expression enables the production of adequate levels of mRNA [1,2]. This is essential for haploinsufficient genes, which require biallelic expression for optimal function to prevent the onset of developmental disorders [1,3]. Whether and how a biallelic or monoallelic state is determined in a cell-type-specific manner at individual loci remains unclear. MSL2 is known for dosage compensation of the male X chromosome in flies. Here we identify a role of MSL2 in regulating allelic expression in mammals. Allele-specific bulk and single-cell analyses in mouse neural progenitor cells revealed that, in addition to the targets showing biallelic downregulation, a class of genes transitions from biallelic to monoallelic expression after MSL2 loss. Many of these genes are haploinsufficient. In the absence of MSL2, one allele remains active, retaining active histone modifications and transcription factor binding, whereas the other allele is silenced, exhibiting loss of promoter-enhancer contacts and the acquisition of DNA methylation. Msl2-knockout mice show perinatal lethality and heterogeneous phenotypes during embryonic development, supporting a role for MSL2 in regulating gene dosage.
The role of MSL2 in preserving biallelic expression of specific dosage-sensitive genes sets the stage for further investigation of other factors that are involved in allelic dosage compensation in mammalian cells, with considerable implications for human disease.
Key Words: epigenetics; differentiation; data processing; epigenomics
MESH Headings: Animals; Female; Male; Mice; Alleles; DNA Methylation; Dosage Compensation, Genetic; Embryonic Development; Enhancer Elements, Genetic; Gene Expression Regulation; Haploinsufficiency; Histones/metabolism; Mice, Knockout; Promoter Regions, Genetic; Transcription Factors/metabolism; Ubiquitin-Protein Ligases/deficiency; Ubiquitin-Protein Ligases/genetics; Ubiquitin-Protein Ligases/metabolism
91	Janes RW, Wallace BA. DichroPipeline: A suite of online and downloadable tools and resources for protein circular dichroism spectroscopic data analyses, interpretations, and their interoperability with other bioinformatics tools and resources. Protein Sci 2023;32:e4817. [PMID: 37881887 PMCID: PMC10680340 DOI: 10.1002/pro.4817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 09/30/2023] [Indexed: 10/27/2023]
Abstract: Circular Dichroism (CD) spectroscopy is a widely-used method for characterizing individual protein structures in solutions, membranes, films and macromolecular complexes, as well as for probing macromolecular interactions, conformational changes associated with binding substrates, and in different functionally-related environments. This paper describes a series of related computational and display tools that have been developed over many years to aid in those characterizations and functional interpretations. The new DichroPipeline described herein links a series of format-compatible data processing, analysis, and display tools to enable users to facilely produce the spectra, which can then be made available in the Protein Circular Dichroism Data Bank (https://pcddb.cryst.bbk.ac.uk/) resource, in which the CD spectral and associated metadata for each entry are linked to other structural and functional data bases including the Protein Data Bank (PDB), and the UniProt sequence data base, amongst others. These tools and resources thus provide the basis for a wide range of traceable structural characterizations of soluble, membrane and intrinsically-disordered proteins.
Key Words: GitHub repository; Protein Circular Dichroism Data Bank (PCDDB); circular dichroism (CD) spectroscopy; data processing; intrinsically-disordered proteins (IDPs); membrane proteins; online tools and resources; protein structure; secondary structure analyses; soluble proteins; spectral displays and comparison methods; validation procedures
MESH Headings: Circular Dichroism; Computational Biology; Intrinsically Disordered Proteins; Databases, Protein
Grants: BB/P024092 (B.A. Wallace) Biotechnology and Biological Sciences Research Council; BB/P024106 (R.W. Janes) Biotechnology and Biological Sciences Research Council; Biotechnology and Biological Sciences Research Council; International Union of Pure and Applied Chemistry; Royal Society
92	Mason L, Hicks B, Almeida JS. EpiVECS: exploring spatiotemporal epidemiological data using cluster embedding and interactive visualization. Sci Rep 2023;13:21193. [PMID: 38040776 PMCID: PMC10692107 DOI: 10.1038/s41598-023-48484-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 11/27/2023] [Indexed: 12/03/2023]
Abstract: The analysis of data over space and time is a core part of descriptive epidemiology, but the complexity of spatiotemporal data makes this challenging. There is a need for methods that simplify the exploration of such data for tasks such as surveillance and hypothesis generation. In this paper, we use combined clustering and dimensionality reduction methods (hereafter referred to as 'cluster embedding' methods) to spatially visualize patterns in epidemiological time-series data. We compare several cluster embedding techniques to see which performs best along a variety of internal cluster validation metrics. We find that methods based on k-means clustering generally perform better than self-organizing maps on real world epidemiological data, with some minor exceptions. We also introduce EpiVECS, a tool which allows the user to perform cluster embedding and explore the results using interactive visualization. EpiVECS is available as a privacy preserving, in-browser open source web application at https://episphere.github.io/epivecs
Key Words: software; machine learning; data processing
Grants: National Institutes of Health (NIH)
93	Wevers D, Ramautar R, Clark C, Hankemeier T, Ali A. Opportunities and challenges for sample preparation and enrichment in mass spectrometry for single-cell metabolomics. Electrophoresis 2023;44:2000-2024. [PMID: 37667867 DOI: 10.1002/elps.202300105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 08/08/2023] [Accepted: 08/19/2023] [Indexed: 09/06/2023]
Abstract: Single-cell heterogeneity in metabolism, drug resistance and disease type poses the need for analytical techniques for single-cell analysis. As the metabolome provides the closest view of the status quo in the cell, studying the metabolome at single-cell resolution may unravel said heterogeneity. A challenge in single-cell metabolome analysis is that metabolites cannot be amplified, so one needs to deal with picolitre volumes and a wide range of analyte concentrations. Due to high sensitivity and resolution, MS is preferred in single-cell metabolomics. Large numbers of cells need to be analysed for proper statistics; this requires high-throughput analysis, and hence automation of the analytical workflow. Significant advances in (micro)sampling methods, CE and ion mobility spectrometry have been made, some of which have been applied in high-throughput analyses. Microfluidics has enabled an automation of cell picking and metabolite extraction; image recognition has enabled automated cell identification. Many techniques have been used for data analysis, varying from conventional techniques to novel combinations of advanced chemometric approaches. Steps have been set in making data more findable, accessible, interoperable and reusable, but significant opportunities for improvement remain. Herein, advances in single-cell analysis workflows and data analysis are discussed, and recommendations are made based on the experimental goal.
Key Words: data processing; experimental design; mass spectrometry; single-cell heterogeneity; single-cell metabolomics
MESH Headings: Metabolomics/methods; Mass Spectrometry/methods; Metabolome; Specimen Handling; Single-Cell Analysis
94	Bayer FP, Gander M, Kuster B, The M. CurveCurator: a recalibrated F-statistic to assess, classify, and explore significance of dose-response curves. Nat Commun 2023;14:7902. [PMID: 38036588 PMCID: PMC10689459 DOI: 10.1038/s41467-023-43696-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023]
Abstract: Dose-response curves are key metrics in pharmacology and biology to assess phenotypic or molecular actions of bioactive compounds in a quantitative fashion. Yet, it is often unclear whether or not a measured response significantly differs from a curve without regulation, particularly in high-throughput applications or unstable assays. Treating potency and effect size estimates from random and true curves with the same level of confidence can lead to incorrect hypotheses and issues in training machine learning models. Here, we present CurveCurator, an open-source software that provides reliable dose-response characteristics by computing p-values and false discovery rates based on a recalibrated F-statistic and a target-decoy procedure that considers dataset-specific effect size distributions. The application of CurveCurator to three large-scale datasets enables a systematic drug mode of action analysis and demonstrates its scalable utility across several application areas, facilitated by a performant, interactive dashboard for fast data exploration.
Key Words: data processing; high-throughput screening; target identification; cell signalling; statistical methods
MESH Headings: Software
Grants: 833710 EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council); German Federal Ministry of Education and Research (BMBF) Grant no. 031L0305A
95	Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023;14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023]
Abstract: Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org
Key Words: computational models; machine learning; bioinformatics; dna sequencing; data processing
MESH Headings: Humans; Genomic Structural Variation; Genomics/methods; Genome, Human; Phenotype
Grants: R01 HG013031 NHGRI NIH HHS; P50 HD105354 NICHD NIH HHS; R01 GM132713 NIGMS NIH HHS; Wellcome Trust; R01 HG013359 NHGRI NIH HHS; U.S. Department of Health & Human Services | National Institutes of Health (NIH)
96	Dondi A, Lischetti U, Jacob F, Singer F, Borgsmüller N, Coelho R, Heinzelmann-Schwarz V, Beisel C, Beerenwinkel N. Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer. Nat Commun 2023;14:7780. [PMID: 38012143 PMCID: PMC10682465 DOI: 10.1038/s41467-023-43387-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023]
Abstract: Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
Key Words: rna sequencing; ovarian cancer; transcriptomics; data processing
MESH Headings: Humans; Female; Sequence Analysis, RNA/methods; Genomics/methods; Protein Isoforms/genetics; High-Throughput Nucleotide Sequencing/methods; Ovarian Neoplasms/genetics; Transcriptome/genetics; RNA-Binding Proteins
Grants: Wellcome Trust; EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions); SNSF SPARK grant #190413; grant #2017-510 of the Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain; “Personalized Health and Related Technologies (PHRT)” Pioneer Project #715
97	Xu H, Wang S, Fang M, Luo S, Chen C, Wan S, Wang R, Tang M, Xue T, Li B, Lin J, Qu K. SPACEL: deep learning-based characterization of spatial transcriptome architectures. Nat Commun 2023;14:7603. [PMID: 37990022 PMCID: PMC10663563 DOI: 10.1038/s41467-023-43220-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Accepted: 11/03/2023] [Indexed: 11/23/2023]
Abstract: Spatial transcriptomics (ST) technologies detect mRNA expression in single cells/spots while preserving their two-dimensional (2D) spatial coordinates, allowing researchers to study the spatial distribution of the transcriptome in tissues; however, joint analysis of multiple ST slices and aligning them to construct a three-dimensional (3D) stack of the tissue still remain a challenge. Here, we introduce spatial architecture characterization by deep learning (SPACEL) for ST data analysis. SPACEL comprises three modules: Spoint embeds a multiple-layer perceptron with a probabilistic model to deconvolute cell type composition for each spot in a single ST slice; Splane employs a graph convolutional network approach and an adversarial learning algorithm to identify spatial domains that are transcriptomically and spatially coherent across multiple ST slices; and Scube automatically transforms the spatial coordinate systems of consecutive slices and stacks them together to construct a 3D architecture of the tissue. Comparisons against 19 state-of-the-art methods using both simulated and real ST datasets from various tissues and ST technologies demonstrate that SPACEL outperforms the others for cell type deconvolution, for spatial domain identification, and for 3D alignment, thus showcasing SPACEL as a valuable integrated toolkit for ST data processing and analysis.
Key Words: computational biology and bioinformatics; bioinformatics; computational models; sequencing; data processing
MESH Headings: Transcriptome/genetics; Deep Learning; Gene Expression Profiling; Algorithms; Models, Statistical
Grants: National Key R&D Program of China (2020YFA0112200 and 2022YFA1303200); National Natural Science Foundation of China (T2125012, 91940306, 32170668, and 81871479); CAS Project for Young Scientists in Basic Research (YSBR-005); Anhui Province Science and Technology Key Program (202003a07020021); Fundamental Research Funds for the Central Universities (YD2070002019, WK9110000141, WK2070000158, and WK9100000001)
98	Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023;13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023]
Abstract: Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant-calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process.
Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
Key Words: genome informatics; software; data processing
MESH Headings: Polymorphism, Single Nucleotide; Algorithms; Gene Frequency; High-Throughput Nucleotide Sequencing/methods; Biomedical Research
99	Li R, Gibson JM. Predicting Groundwater PFOA Exposure Risks with Bayesian Networks: Empirical Impact of Data Preprocessing on Model Performance. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023;57:18329-18338. [PMID: 37594027 DOI: 10.1021/acs.est.3c00348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/19/2023]
Abstract: The plethora of data on PFASs in environmental samples collected in response to growing concern about these chemicals could enable the training of machine-learning models for predicting exposure risks. However, differences in sampling and analysis methods across data sets must be reconciled through data preprocessing, and little information is available about how such manipulations affect the resulting models. This study evaluates how data preprocessing influences machine-learned Bayesian network models of PFOA in groundwater. We link 19 years of PFOA measurements from Minnesota, USA, to publicly available information about potential PFOA sources and factors that may influence their environmental fate. Nine different preprocessing methods were tested, and the resulting data sets were used to train models to predict the probability of PFOA ≥ 35 ppt, the 2017 Minnesota health advisory level. Different preprocessing approaches produced varying model structures with significantly different accuracies. Nonetheless, models showed similar relationships between predictor variables and PFOA exposure risks, and all models were relatively accurate, distinguishing wells at high risk from those at low risk for 82.0% to 89.0% of test data samples. There was a trade-off between data quality and model performance since a stricter data screening strategy decreased the sample size for model training.
Key Words: Bayesian network; PFAS; PFOA; data processing; groundwater contamination; machine learning
MESH Headings: Fluorocarbons/analysis; Bayes Theorem; Water Pollutants, Chemical/analysis; Groundwater/chemistry; Water Wells; Alkanesulfonic Acids
100	Delamare-Deboutteville J, Meemetta W, Pimsannil K, Sangpo P, Gan HM, Mohan CV, Dong HT, Senapin S. A multiplexed RT-PCR assay for nanopore whole genome sequencing of Tilapia lake virus (TiLV). Sci Rep 2023;13:20276. [PMID: 37985860 PMCID: PMC10661697 DOI: 10.1038/s41598-023-47425-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract: Tilapia lake virus (TiLV) is a highly contagious viral pathogen that affects tilapia, a globally significant and affordable source of fish protein. To prevent the introduction and spread of TiLV and its impact, there is an urgent need for increased surveillance, improved biosecurity measures, and continuous development of effective diagnostic and rapid sequencing methods. In this study, we have developed a multiplexed RT-PCR assay that can amplify all ten complete genomic segments of TiLV from various sources of isolation. The amplicons generated using this approach were immediately subjected to real-time sequencing on the Nanopore system. By using this approach, we have recovered and assembled 10 TiLV genomes from total RNA extracted from naturally TiLV-infected tilapia fish, concentrated tilapia rearing water, and cell culture. Our phylogenetic analysis, consisting of more than 36 TiLV genomes from both newly sequenced and publicly available TiLV genomes, provides new insights into the high genetic diversity of TiLV. This work is an essential steppingstone towards integrating rapid and real-time Nanopore-based amplicon sequencing into routine genomic surveillance of TiLV, as well as future vaccine development.
Key Words: viral infection; genome assembly algorithms; data processing; phylogeny; dna sequencing; next-generation sequencing; bioinformatics
MESH Headings: Animals; Tilapia/genetics; Reverse Transcriptase Polymerase Chain Reaction; Phylogeny; Nanopores; Fish Diseases; Viruses; RNA Viruses
Grants: CGIAR Trust Fund; Norad