1
Müntefering F, Adhisantoso YG, Chandak S, Ostermann J, Hernaez M, Voges J. Genie: the first open-source ISO/IEC encoder for genomic data. Commun Biol 2024; 7:553. [PMID: 38724695 PMCID: PMC11082222 DOI: 10.1038/s42003-024-06249-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 04/26/2024] [Indexed: 05/12/2024] Open
Abstract
For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard are not openly available so far, limiting fair scientific assessment of the standard and, therefore, hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder independent from its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.
Affiliation(s)
- Fabian Müntefering
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany.
- Yeremia Gunawan Adhisantoso
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
- Shubham Chandak
  - Department of Electrical Engineering, Stanford University, 350 Jane Stanford Way, Stanford, CA, 94305, USA
- Jörn Ostermann
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
- Mikel Hernaez
  - Center for Applied Medical Research (CIMA), University of Navarra, Av. de Pío XII, 55, Pamplona, 31008, Navarra, Spain.
- Jan Voges
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany.

2
Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun 2024; 15:3922. [PMID: 38724498 PMCID: PMC11082229 DOI: 10.1038/s41467-024-47899-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024] Open
Abstract
Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce an ensemble inference to integrate results from individual top-performing workflows to expand differential proteome coverage and resolve inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
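For readers unfamiliar with the scores named in this abstract, the following is a minimal sketch of how F1, the Matthews correlation coefficient, and G-mean can be computed from confusion-matrix counts. It assumes G-mean denotes the geometric mean of sensitivity and specificity; the function name `classification_scores` and the example counts are hypothetical and do not come from the paper.

```python
import math

def classification_scores(tp, fp, tn, fn):
    """Return F1, Matthews correlation coefficient, and G-mean from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Assumption: G-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"f1": f1, "mcc": mcc, "g_mean": g_mean}

# Hypothetical counts for one workflow on a spike-in dataset:
# 40 true differential proteins found, 10 false calls, 940 true negatives, 10 missed.
scores = classification_scores(tp=40, fp=10, tn=940, fn=10)
print(scores)
```

Because spike-in datasets have known ground truth, counts like these can be tallied per workflow, which is what makes such scores usable for ranking workflows.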
Affiliation(s)
- Hui Peng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- He Wang
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Jinyan Li
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore.
  - Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore.
  - Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK.

3
Acera Mateos P, Sethi AJ, Ravindran A, Srivastava A, Woodward K, Mahmud S, Kanchi M, Guarnacci M, Xu J, Yuen ZWS, Zhou Y, Sneddon A, Hamilton W, Gao J, Starrs LM, Hayashi R, Wickramasinghe V, Zarnack K, Preiss T, Burgio G, Dehorter N, Shirokikh NE, Eyras E. Prediction of m6A and m5C at single-molecule resolution reveals a transcriptome-wide co-occurrence of RNA modifications. Nat Commun 2024; 15:3899. [PMID: 38724548 PMCID: PMC11082244 DOI: 10.1038/s41467-024-47953-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 04/15/2024] [Indexed: 05/12/2024] Open
Abstract
The epitranscriptome embodies many new and largely unexplored functions of RNA. A significant roadblock hindering progress in epitranscriptomics is the identification of more than one modification in individual transcript molecules. We address this with CHEUI (CH3 (methylation) Estimation Using Ionic current). CHEUI predicts N6-methyladenosine (m6A) and 5-methylcytosine (m5C) in individual molecules from the same sample, the stoichiometry at transcript reference sites, and differential methylation between any two conditions. CHEUI processes observed and expected nanopore direct RNA sequencing signals to achieve high single-molecule, transcript-site, and stoichiometry accuracies in multiple tests using synthetic RNA standards and cell line data. CHEUI's capability to identify two modification types in the same sample reveals a co-occurrence of m6A and m5C in individual mRNAs in cell line and tissue transcriptomes. CHEUI provides new avenues to discover and study the function of the epitranscriptome.
Affiliation(s)
- P Acera Mateos
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- A J Sethi
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- A Ravindran
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- A Srivastava
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- K Woodward
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- S Mahmud
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- M Kanchi
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- M Guarnacci
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- J Xu
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
- Z W S Yuen
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- Y Zhou
  - Buchmann Institute for Molecular Life Sciences (BMLS) & Faculty of Biological Sciences, Goethe University Frankfurt, 60438, Frankfurt am Main, Germany
- A Sneddon
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- W Hamilton
  - Peter MacCallum Cancer Centre, Melbourne, VIC, 3052, Australia
- J Gao
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- L M Starrs
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- R Hayashi
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- K Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS) & Faculty of Biological Sciences, Goethe University Frankfurt, 60438, Frankfurt am Main, Germany
- T Preiss
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - Victor Chang Cardiac Research Institute, Sydney, NSW, 2010, Australia
- G Burgio
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
- N Dehorter
  - The Eccles Institute of Neuroscience, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia
  - The Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
- N E Shirokikh
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia.
- E Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, 2601, Australia.
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia.
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2601, Australia.
  - Catalan Institution for Research and Advanced Studies (ICREA), 08010, Barcelona, Spain.

4
Meisburger SP, Ando N. Scaling and merging macromolecular diffuse scattering with mdx2. Acta Crystallogr D Struct Biol 2024; 80:299-313. [PMID: 38606664 PMCID: PMC11066883 DOI: 10.1107/s2059798324002705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 03/25/2024] [Indexed: 04/13/2024] Open
Abstract
Diffuse scattering is a promising method to gain additional insight into protein dynamics from macromolecular crystallography experiments. Bragg intensities yield the average electron density, while the diffuse scattering can be processed to obtain a three-dimensional reciprocal-space map that is further analyzed to determine correlated motion. To make diffuse scattering techniques more accessible, software for data processing called mdx2 has been created that is both convenient to use and simple to extend and modify. mdx2 is written in Python, and it interfaces with DIALS to implement self-contained data-reduction workflows. Data are stored in NeXus format for software interchange and convenient visualization. mdx2 can be run on the command line or imported as a package, for instance to encapsulate a complete workflow in a Jupyter notebook for reproducible computing and education. Here, mdx2 version 1.0 is described, a new release incorporating state-of-the-art techniques for data reduction. The implementation of a complete multi-crystal scaling and merging workflow is described, and the methods are tested using a high-redundancy data set from cubic insulin. It is shown that redundancy can be leveraged during scaling to correct systematic errors and obtain accurate and reproducible measurements of weak diffuse signals.
Affiliation(s)
- Steve P. Meisburger
  - Cornell High Energy Synchrotron Source, Cornell University, Ithaca, NY 14850, USA
- Nozomi Ando
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14850, USA

5
Li X, Zhu H, Gu B, Yao C, Gu Y, Xu W, Zhang J, He J, Liu X, Li D. Advancing Intelligent Organ-on-a-Chip Systems with Comprehensive In Situ Bioanalysis. Adv Mater 2024; 36:e2305268. [PMID: 37688520 DOI: 10.1002/adma.202305268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 08/03/2023] [Indexed: 09/11/2023]
Abstract
In vitro models are essential to a broad range of biomedical research, such as pathological studies, drug development, and personalized medicine. As a potentially transformative paradigm for 3D in vitro models, organ-on-a-chip (OOC) technology has been extensively developed to recapitulate sophisticated architectures and dynamic microenvironments of human organs by applying the principles of life sciences and leveraging micro- and nanoscale engineering capabilities. A pivotal function of OOC devices is to support multifaceted and timely characterization of cultured cells and their microenvironments. However, in-depth analysis of OOC models typically requires biomedical assay procedures that are labor-intensive and interruptive. Herein, the latest advances toward intelligent OOC (iOOC) systems, where sensors integrated with OOC devices continuously report cellular and microenvironmental information for comprehensive in situ bioanalysis, are examined. It is proposed that the multimodal data in iOOC systems can support closed-loop control of the in vitro models and offer holistic biomedical insights for diverse applications. Essential techniques for establishing iOOC systems are surveyed, encompassing in situ sensing, data processing, and dynamic modulation. Finally, the future development of iOOC systems featuring cross-disciplinary strategies is discussed.
Affiliation(s)
- Xiao Li
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Hui Zhu
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Bingsong Gu
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Cong Yao
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Yuyang Gu
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Wangkai Xu
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Jia Zhang
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Jiankang He
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China
- Xinyu Liu
  - Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, M5S 3G8, Canada
- Dichen Li
  - State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
  - NMPA Key Laboratory for Research and Evaluation of Additive Manufacturing Medical Devices, Xi'an Jiaotong University, Xi'an, 710049, China

6
Cheng Z, Xie Z, Wei M, Peng Y, Du C, Tian Y, Song X. Review of Sensor-Based Subgrade Distress Identifications. Sensors (Basel) 2024; 24:2825. [PMID: 38732931 PMCID: PMC11086097 DOI: 10.3390/s24092825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/10/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024]
Abstract
The attributes of diversity and concealment pose formidable challenges in the accurate detection and efficacious management of distresses within subgrade structures. The onset of subgrade distresses may precipitate structural degradation, thereby amplifying the frequency of traffic incidents and instigating economic ramifications. Accurate and timely detection of subgrade distresses is essential for maintaining and repairing road sections with existing distresses. This helps to prolong the service life of road infrastructure and reduce financial burden. In recent years, the advent of numerous novel technologies and methodologies has propelled significant advancements in subgrade distress detection. Therefore, this review delineates a concentrated examination of subgrade distress detection, methodically consolidating and presenting various techniques while dissecting their respective merits and constraints. By furnishing comprehensive guidance on subgrade distress detection, this review facilitates the expedient identification and targeted treatment of subgrade distresses, thereby fortifying safety and enhancing durability. The pivotal role of this review in bolstering the construction and operational facets of transportation infrastructure is underscored.
Affiliation(s)
- Zhiheng Cheng
  - School of Qilu Transportation, Shandong University, Jinan 250061, China; (Z.C.); (M.W.); (X.S.)
- Zhengjian Xie
  - CCCC-FHDI Engineering Co., Ltd., Guangzhou 510220, China
- Mingzhao Wei
  - School of Qilu Transportation, Shandong University, Jinan 250061, China; (Z.C.); (M.W.); (X.S.)
- Yuqing Peng
  - School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou 730070, China
- Cong Du
  - School of Qilu Transportation, Shandong University, Jinan 250061, China; (Z.C.); (M.W.); (X.S.)
- Yuan Tian
  - School of Qilu Transportation, Shandong University, Jinan 250061, China; (Z.C.); (M.W.); (X.S.)
- Xiuguang Song
  - School of Qilu Transportation, Shandong University, Jinan 250061, China; (Z.C.); (M.W.); (X.S.)

7
Alfonsi T, Bernasconi A, Chiara M, Ceri S. Data-driven recombination detection in viral genomes. Nat Commun 2024; 15:3313. [PMID: 38632281 PMCID: PMC11024102 DOI: 10.1038/s41467-024-47464-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 03/25/2024] [Indexed: 04/19/2024] Open
Abstract
Recombination is a key molecular mechanism for the evolution and adaptation of viruses. The first recombinant SARS-CoV-2 genomes were recognized in 2021; as of today, more than ninety SARS-CoV-2 lineages are designated as recombinant. In the wake of the COVID-19 pandemic, several methods for detecting recombination in SARS-CoV-2 have been proposed; however, none could faithfully confirm manual analyses by experts in the field. We hereby present RecombinHunt, an original data-driven method for the identification of recombinant genomes, capable of recognizing recombinant SARS-CoV-2 genomes (or lineages) with one or two breakpoints with high accuracy and within reduced turn-around times. RecombinHunt shows high specificity and sensitivity, compares favorably with other state-of-the-art methods, and faithfully confirms manual analyses by experts. RecombinHunt identifies recombinant viral genomes from the recent monkeypox epidemic in high concordance with manually curated analyses by experts, suggesting that our approach is robust and can be applied to any epidemic/pandemic virus.
Affiliation(s)
- Tommaso Alfonsi
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
- Anna Bernasconi
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy.
- Matteo Chiara
  - Department of Biosciences, Università degli Studi di Milano, Via Celoria 26, 20133, Milan, Italy
- Stefano Ceri
  - Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy

8
Duminil A, Ieng SS, Gruyer D. A Comprehensive Exploration of Fidelity Quantification in Computer-Generated Images. Sensors (Basel) 2024; 24:2463. [PMID: 38676079 PMCID: PMC11054344 DOI: 10.3390/s24082463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 04/04/2024] [Accepted: 04/09/2024] [Indexed: 04/28/2024]
Abstract
Generating realistic road scenes is crucial for advanced driving systems, particularly for training deep learning methods and validation. Numerous efforts aim to create larger and more realistic synthetic datasets using graphics engines or synthetic-to-real domain adaptation algorithms. In the realm of computer-generated images (CGIs), assessing fidelity is challenging and involves both objective and subjective aspects. Our study adopts a comprehensive conceptual framework to quantify the fidelity of RGB images, unlike existing methods that are predominantly application-specific. This is probably due to the data complexity and huge range of possible situations and conditions encountered. In this paper, a set of distinct metrics assessing the level of fidelity of virtual RGB images is proposed. For quantifying image fidelity, we analyze both local and global perspectives of texture and the high-frequency information in images. Our focus is on the statistical characteristics of realistic and synthetic road datasets, using over 28,000 images from at least 10 datasets. Through a thorough examination, we aim to reveal insights into texture patterns and high-frequency components contributing to the objective perception of data realism in road scenes. This study, exploring image fidelity in both virtual and real conditions, takes the perspective of an embedded camera rather than the human eye. The results of this work, including a pioneering set of objective scores applied to real, virtual, and improved virtual data, offer crucial insights and are an asset for the scientific community in quantifying fidelity levels.
Affiliation(s)
- Alexandra Duminil
  - Department of Components and Systems (COSYS)/Perceptions, Interactions, Behaviour and Simulations of Road and Street Users Laboratory (PICS-L)/Gustave Eiffel University, F-77454 Marne-la-Vallée, France; (S.-S.I.); (D.G.)

9
Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024; 7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024] Open
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
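The predictor/response split described in this abstract can be illustrated on a toy matrix. The sketch below is not the MASS algorithm: it swaps the mixed integer linear program for an exhaustive search over column subsets, fitting each response column by ordinary least squares. All names (`solve_linear`, `residual_ss`, `best_predictor_set`) and the data are hypothetical, and the approach is only practical for a handful of conditions.

```python
from itertools import combinations

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def residual_ss(matrix, predictors, response):
    """Squared error when one response column is fit from the predictor columns."""
    xtx = [[sum(row[p] * row[q] for row in matrix) for q in predictors]
           for p in predictors]
    xty = [sum(row[p] * row[response] for row in matrix) for p in predictors]
    coef = solve_linear(xtx, xty)  # normal equations of least squares
    return sum((row[response] - sum(c * row[p] for c, p in zip(coef, predictors))) ** 2
               for row in matrix)

def best_predictor_set(matrix, k):
    """Try every k-column predictor set; return (predictors, total squared error)."""
    n_cols = len(matrix[0])
    best = None
    for predictors in combinations(range(n_cols), k):
        responses = [c for c in range(n_cols) if c not in predictors]
        try:
            error = sum(residual_ss(matrix, predictors, r) for r in responses)
        except ValueError:
            continue  # skip collinear predictor sets
        if best is None or error < best[1]:
            best = (predictors, error)
    return best

# Hypothetical phenotype matrix: rows are strains, columns are conditions.
# Column 2 equals column 0 + column 1; column 3 equals 2 * column 0.
phenotypes = [
    [1.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 2.0, 2.0],
    [2.0, 1.0, 3.0, 4.0],
]
predictors, error = best_predictor_set(phenotypes, k=2)
print(predictors, error)
```

In this toy matrix two conditions suffice to reproduce the other two, so the total error is essentially zero. The search space grows combinatorially with the number of conditions, which is presumably why the authors formulate the problem as an optimization instead.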
Affiliation(s)
- Konrad Herbst
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Taiyao Wang
  - Division of Systems Engineering, Boston University, Boston, MA, USA
- Elena J Forchielli
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Meghan Thommes
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA.
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, MA, USA.
  - Biological Design Center, Boston University, Boston, MA, USA.
  - Department of Biology, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.

10
Oka T, Matsuzawa Y, Tsuneyoshi M, Nakamura Y, Aoshima K, Tsugawa H. Multiomics analysis to explore blood metabolite biomarkers in an Alzheimer's Disease Neuroimaging Initiative cohort. Sci Rep 2024; 14:6797. [PMID: 38565541 PMCID: PMC10987653 DOI: 10.1038/s41598-024-56837-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 03/12/2024] [Indexed: 04/04/2024] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease that commonly causes dementia. Identifying biomarkers for the early detection of AD is an emerging need, as brain dysfunction begins two decades before the onset of clinical symptoms. To this end, we reanalyzed untargeted metabolomic mass spectrometry data from 905 patients enrolled in the AD Neuroimaging Initiative (ADNI) cohort using MS-DIAL, with 1,304,633 spectra of 39,108 unique biomolecules. Metabolic profiles of 93 hydrophilic metabolites were determined. Additionally, we integrated targeted lipidomic data (4873 samples from 1524 patients) to explore candidate biomarkers for predicting progressive mild cognitive impairment (pMCI) in patients diagnosed with AD within two years using the baseline metabolome. Patients with lower ergothioneine levels had a 12% higher rate of AD progression with the significance of P = 0.012 (Wald test). Furthermore, an increase in ganglioside (GM3) and decrease in plasmalogen lipids, many of which are associated with apolipoprotein E polymorphism, were confirmed in AD patients, and the higher levels of lysophosphatidylcholine (18:1) and GM3 d18:1/20:0 showed 19% and 17% higher rates of AD progression, respectively (Wald test: P = 3.9 × 10⁻⁸ and 4.3 × 10⁻⁷). Palmitoleamide, oleamide, diacylglycerols, and ether lipids were also identified as significantly altered metabolites at baseline in patients with pMCI. The integrated analysis of metabolites and genomics data showed that combining information on metabolites and genotypes enhances the predictive performance of AD progression, suggesting that metabolomics is essential to complement genomic data. In conclusion, the reanalysis of multiomics data provides new insights to detect early development of AD pathology and to partially understand metabolic changes in age-related onset of AD.
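The Wald test P values quoted in this abstract follow a standard recipe: divide an estimated coefficient by its standard error and compare the ratio with a standard normal distribution. The sketch below shows only that generic calculation. The abstract does not state the underlying model, so treating the "12% higher rate" as a ratio of 1.12 on a log scale is an assumption, and the standard error is a made-up value chosen for illustration.

```python
import math

def wald_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = coefficient / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), with Phi the standard normal CDF.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical inputs: log of an assumed rate ratio of 1.12 and an invented standard error.
log_ratio = math.log(1.12)
p = wald_p_value(log_ratio, standard_error=0.045)
print(f"z = {log_ratio / 0.045:.2f}, p = {p:.4f}")
```

A smaller standard error for the same coefficient gives a larger z and a smaller P value, which is how effects of similar size can carry very different significance levels.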
Affiliation(s)
- Takaki Oka
  - Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Yuki Matsuzawa
  - Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Momoka Tsuneyoshi
  - Human Biology Integration Foundation, Eisai Co., Ltd., Ibaraki, Japan
- Ken Aoshima
  - Microbes & Host Defense Domain, Eisai Co., Ltd., Ibaraki, Japan
  - School of Integrative and Global Majors, University of Tsukuba, Ibaraki, Japan
- Hiroshi Tsugawa
  - Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Tokyo, Japan.
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan.
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
  - Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan.
11
Emmert-Streib F. Can ChatGPT understand genetics? Eur J Hum Genet 2024; 32:371-372. [PMID: 37407734 PMCID: PMC10999414 DOI: 10.1038/s41431-023-01419-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 07/07/2023] Open
Affiliation(s)
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland.
12
Caniza H, Cáceres JJ, Torres M, Paccanaro A. LanDis: the disease landscape explorer. Eur J Hum Genet 2024; 32:461-465. [PMID: 38200084 PMCID: PMC10999415 DOI: 10.1038/s41431-023-01511-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 11/01/2023] [Accepted: 11/23/2023] [Indexed: 01/12/2024] Open
Abstract
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool ( https://paccanarolab.org/landis ) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
Affiliation(s)
- Horacio Caniza
  - Universidad Paraguayo Alemana de Ciencias Aplicadas, Facultad de Ciencias de la Ingeniería, San Lorenzo, Paraguay
  - Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Juan J Cáceres
  - Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
- Mateo Torres
  - Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Alberto Paccanaro
  - Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK.
  - Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil.
13
Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024; 14:7656. [PMID: 38561333 PMCID: PMC10984979 DOI: 10.1038/s41598-024-56324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024] Open
Abstract
This study focused on the heterogeneity in progress notes written by physicians or nurses. A total of 806 days of progress notes written by physicians or nurses from 83 randomly selected patients hospitalized in the Gastroenterology Department at Kagawa University Hospital from January to December 2021 were analyzed. We extracted symptoms as the International Classification of Diseases (ICD) Chapter 18 (R00-R99, hereinafter R codes) from each progress note using MedNER-J natural language processing software and counted the days one or more symptoms were extracted to calculate the extraction rate. The R-code extraction rate was significantly higher from progress notes by nurses than by physicians (physicians 68.5% vs. nurses 75.2%; p = 0.00112), regardless of specialty. By contrast, the R-code subcategory R10-R19 for digestive system symptoms (44.2 vs. 37.5%, respectively; p = 0.00299) and many chapters of ICD codes for disease names, as represented by Chapter 11 K00-K93 (68.4 vs. 30.9%, respectively; p < 0.001), were frequently extracted from the progress notes by physicians, reflecting their specialty. We believe that understanding the information heterogeneity of medical documents, which can be the basis of medical artificial intelligence, is crucial, and this study is a pioneering step in that direction.
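The extraction rate described in this abstract (days on which one or more R-code symptoms were extracted, divided by all note-days) can be illustrated with a minimal Python sketch. The function name, the input layout (one list of ICD codes per note-day), and the simple "R plus two digits" test are assumptions made for illustration; they are not taken from the study or from MedNER-J.

```python
from typing import List


def is_r_code(code: str) -> bool:
    """True for ICD Chapter 18 codes (R00-R99); a simplified, assumed check."""
    return len(code) >= 3 and code[0] == "R" and code[1:3].isdigit()


def extraction_rate(daily_codes: List[List[str]]) -> float:
    """Fraction of note-days on which at least one R code was extracted.

    daily_codes holds one list of extracted ICD codes per note-day
    (hypothetical layout, chosen only for this sketch).
    """
    if not daily_codes:
        raise ValueError("need at least one note-day")
    days_with_symptom = sum(
        1 for codes in daily_codes if any(is_r_code(c) for c in codes)
    )
    return days_with_symptom / len(daily_codes)
```

For example, four note-days with codes `[["R10"], [], ["K29", "R50"], ["K21"]]` contain two days with an R code, giving a rate of 0.5.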
Affiliation(s)
- Yukinori Mashima
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan.
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan.
- Masatoshi Tanigawa
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
- Hideto Yokoi
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan
14
Chowdhury D, Mistry A, Maity D, Bhatia R, Priyadarshi S, Wadan S, Chakraborty S, Haldar S. Pan-cancer analyses suggest kindlin-associated global mechanochemical alterations. Commun Biol 2024; 7:372. [PMID: 38548811 PMCID: PMC10978987 DOI: 10.1038/s42003-024-06044-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 03/11/2024] [Indexed: 04/01/2024] Open
Abstract
Kindlins serve as mechanosensitive adapters, transducing extracellular mechanical cues to intracellular biochemical signals and thus, their perturbations potentially lead to cancer progressions. Despite the kindlin involvement in tumor development, understanding their genetic and mechanochemical characteristics across different cancers remains elusive. Here, we thoroughly examined genetic alterations in kindlins across more than 10,000 patients with 33 cancer types. Our findings reveal cancer-specific alterations, particularly prevalent in advanced tumor stage and during metastatic onset. We observed a significant co-alteration between kindlins and mechanochemical proteome in various tumors through the activation of cancer-related pathways and adverse survival outcomes. Leveraging normal mode analysis, we predicted structural consequences of cancer-specific kindlin mutations, highlighting potential impacts on stability and downstream signaling pathways. Our study unraveled alterations in epithelial-mesenchymal transition markers associated with kindlin activity. This comprehensive analysis provides a resource for guiding future mechanistic investigations and therapeutic strategies targeting the roles of kindlins in cancer treatment.
Affiliation(s)
- Debojyoti Chowdhury
  - Department of Chemical and Biological Sciences, S.N. Bose National Centre for Basic Sciences, Kolkata, West Bengal, 700106, India.
- Ayush Mistry
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India
- Debashruti Maity
  - Department of Chemical and Biological Sciences, S.N. Bose National Centre for Basic Sciences, Kolkata, West Bengal, 700106, India
- Riti Bhatia
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India
- Shreyansh Priyadarshi
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India
- Simran Wadan
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India
- Soham Chakraborty
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India
- Shubhasis Haldar
  - Department of Chemical and Biological Sciences, S.N. Bose National Centre for Basic Sciences, Kolkata, West Bengal, 700106, India.
  - Department of Biology, Trivedi School of Biosciences, Ashoka University, Sonepat, Haryana, 131029, India.
  - Technical Research Centre, S.N. Bose National Centre for Basic Sciences, Kolkata, West Bengal, 700106, India.
15
Li Z, You L, Hermann A, Bier E. Developmental progression of DNA double-strand break repair deciphered by a single-allele resolution mutation classifier. Nat Commun 2024; 15:2629. [PMID: 38521791 PMCID: PMC10960810 DOI: 10.1038/s41467-024-46479-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024] Open
Abstract
DNA double-strand breaks (DSBs) are repaired by a hierarchically regulated network of pathways. Factors influencing the choice of particular repair pathways, however, remain poorly characterized. Here we develop an Integrated Classification Pipeline (ICP) to decompose and categorize CRISPR/Cas9-generated mutations on genomic target sites in complex multicellular insects. The ICP outputs graphic rank-ordered classifications of mutant alleles to visualize discriminating DSB repair fingerprints generated from different target sites and alternative inheritance patterns of CRISPR components. We uncover highly reproducible lineage-specific mutation fingerprints in individual organisms and a developmental progression wherein Microhomology-Mediated End-Joining (MMEJ) or Insertion events predominate during early rapid mitotic cell cycles, switching to distinct subsets of Non-Homologous End-Joining (NHEJ) alleles, and then to Homology-Directed Repair (HDR)-based gene conversion. These repair signatures enable marker-free tracking of specific mutations in dynamic populations, including NHEJ and HDR events within the same samples, for in-depth analysis of diverse gene editing events.
Affiliation(s)
- Zhiqian Li
  - Department of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, 92093, USA
  - Tata Institute for Genetics and Society, University of California, San Diego, La Jolla, CA, 92093, USA
- Lang You
  - Department of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, 92093, USA
  - Tata Institute for Genetics and Society, University of California, San Diego, La Jolla, CA, 92093, USA
- Anita Hermann
  - Department of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, 92093, USA
  - Tata Institute for Genetics and Society, University of California, San Diego, La Jolla, CA, 92093, USA
- Ethan Bier
  - Department of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, 92093, USA.
  - Tata Institute for Genetics and Society, University of California, San Diego, La Jolla, CA, 92093, USA.
16
Sandler RD, Lai L, Dawson S, Cameron S, Lynam A, Sperrin M, Hoo ZH, Wildman MJ. Development of data processing algorithm to calculate adherence for adults with cystic fibrosis using inhaled therapy - a multi-center observational study within the CFHealthHub learning health system. Expert Rev Pharmacoecon Outcomes Res 2024:1-13. [PMID: 38458615 DOI: 10.1080/14737167.2024.2328085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 02/28/2024] [Indexed: 03/10/2024]
Abstract
OBJECTIVES: To develop a robust algorithm to accurately calculate 'daily complete dose counts' for inhaled medicines, used in percent adherence calculations, from electronically-captured nebulizer data within the CFHealthHub Learning Health System. METHODS: A multi-center, cross-sectional study involved participants and clinicians reviewing real-world inhaled medicine usage records and triangulating them with objective nebulizer data to establish a consensus on 'daily complete dose counts.' An algorithm, which used only objective nebulizer data, was then developed using a derivation dataset and evaluated using an internal validation dataset. The agreement and accuracy between the algorithm-derived and consensus-derived 'daily complete dose counts' were examined, with the consensus-derived count as the reference standard. RESULTS: Twelve people with CF participated. The algorithm derived a 'daily complete dose count' by screening out 'invalid' doses (those <60 s in duration or run in cleaning mode), combining all doses starting within 120 s of each other, and then screening out all doses with duration <480 s which were interrupted by power supply failure. The kappa coefficient was 0.85 (0.71-0.91) in the derivation and 0.86 (0.77-0.94) in the validation dataset. CONCLUSIONS: The algorithm demonstrated strong agreement with the participant-clinician consensus, enhancing confidence in CFHealthHub data. Publishing data processing methods can encourage trust in digital endpoints and serve as an exemplar for other projects.
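The three screening steps named in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the CFHealthHub implementation: the `Dose` record and its field names are hypothetical, "within 120 s of each other" is read as each start time being at most 120 s after the previous start in the same group, a combined dose sums the durations of its parts, and it inherits the power-interruption flag of its last part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dose:
    # All field names are hypothetical, chosen for this sketch only.
    start_s: float                 # start time in seconds since midnight
    duration_s: float              # nebulization time in seconds
    cleaning_mode: bool = False    # device was run in cleaning mode
    power_interrupted: bool = False  # dose ended by power supply failure


def daily_complete_dose_count(doses: List[Dose]) -> int:
    # Step 1: screen out invalid doses (<60 s or run in cleaning mode).
    valid = sorted(
        (d for d in doses if d.duration_s >= 60 and not d.cleaning_mode),
        key=lambda d: d.start_s,
    )

    # Step 2: combine doses whose start times fall within 120 s of the
    # previous start (assumed chaining rule; durations are summed).
    merged: List[Dose] = []
    last_start = 0.0
    for d in valid:
        if merged and d.start_s - last_start <= 120:
            prev = merged[-1]
            merged[-1] = Dose(
                prev.start_s,
                prev.duration_s + d.duration_s,
                False,
                d.power_interrupted,
            )
        else:
            merged.append(Dose(d.start_s, d.duration_s, False, d.power_interrupted))
        last_start = d.start_s

    # Step 3: screen out doses shorter than 480 s that were interrupted
    # by power supply failure; everything left counts as complete.
    return sum(
        1 for d in merged if not (d.duration_s < 480 and d.power_interrupted)
    )
```

Under these assumptions, a 200 s interrupted dose that is restarted 100 s later and runs a further 300 s is combined into one 500 s dose and counted once, whereas an isolated 100 s interrupted dose is discarded.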
Affiliation(s)
- Robert D Sandler
  - Adult CF Centre, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - Sheffield Centre for Health and Related Research, The University of Sheffield, Sheffield, UK
- Lana Lai
  - Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK
- Sophie Dawson
  - Wolfson Adult Cystic Fibrosis Centre, Nottingham University Hospitals NHS Trust, Nottingham, UK
- Sarah Cameron
  - Adult CF Centre, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Aoife Lynam
  - Cystic Fibrosis Unit, Southampton University Hospitals NHS Trust, Southampton, UK
- Matthew Sperrin
  - Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK
- Zhe Hui Hoo
  - Adult CF Centre, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Martin J Wildman
  - Adult CF Centre, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - Sheffield Centre for Health and Related Research, The University of Sheffield, Sheffield, UK
17
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024] Open
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.
Affiliation(s)
- Maximilian T Strauss
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Wen-Feng Zeng
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Eugenia Voytik
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Constantin Ammar
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Julia P Schessner
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Florian Meier
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - Functional Proteomics, Jena University Hospital, Jena, Germany
- Sander Willems
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
18
Ross DH, Bhotika H, Zheng X, Smith RD, Burnum-Johnson KE, Bilbao A. Computational tools and algorithms for ion mobility spectrometry-mass spectrometry. Proteomics 2024:e2200436. [PMID: 38438732 DOI: 10.1002/pmic.202200436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/12/2024] [Accepted: 02/14/2024] [Indexed: 03/06/2024]
Abstract
Ion mobility spectrometry-mass spectrometry (IMS-MS or IM-MS) is a powerful analytical technique that combines the gas-phase separation capabilities of IM with the identification and quantification capabilities of MS. IM-MS can differentiate molecules with indistinguishable masses but different structures (e.g., isomers, isobars, molecular classes, and contaminant ions). The importance of this analytical technique is reflected by a staged increase in the number of applications for molecular characterization across a variety of fields, from different MS-based omics (proteomics, metabolomics, lipidomics, etc.) to the structural characterization of glycans, organic matter, proteins, and macromolecular complexes. With the increasing application of IM-MS there is a pressing need for effective and accessible computational tools. This article presents an overview of the most recent free and open-source software tools specifically tailored for the analysis and interpretation of data derived from IM-MS instrumentation. This review enumerates these tools and outlines their main algorithmic approaches, while highlighting representative applications across different fields. Finally, a discussion of current limitations and expectable improvements is presented.
Affiliation(s)
- Dylan H Ross
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Harsh Bhotika
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
- Xueyun Zheng
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Richard D Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Kristin E Burnum-Johnson
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
- Aivett Bilbao
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
19
Singhal V, Chou N, Lee J, Yue Y, Liu J, Chock WK, Lin L, Chang YC, Teo EML, Aow J, Lee HK, Chen KH, Prabhakar S. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 2024; 56:431-441. [PMID: 38413725 PMCID: PMC10937399 DOI: 10.1038/s41588-024-01664-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
Spatial omics data are clustered to define both cell types and tissue domains. We present Building Aggregates with a Neighborhood Kernel and Spatial Yardstick (BANKSY), an algorithm that unifies these two spatial clustering problems by embedding cells in a product space of their own and the local neighborhood transcriptome, representing cell state and microenvironment, respectively. BANKSY's spatial feature augmentation strategy improved performance on both tasks when tested on diverse RNA (imaging, sequencing) and protein (imaging) datasets. BANKSY revealed unexpected niche-dependent cell states in the mouse brain and outperformed competing methods on domain segmentation and cell typing benchmarks. BANKSY can also be used for quality control of spatial transcriptomics data and for spatially aware batch effect correction. Importantly, it is substantially faster and more scalable than existing methods, enabling the processing of millions of cell datasets. In summary, BANKSY provides an accurate, biologically motivated, scalable and versatile framework for analyzing spatially resolved omics data.
Affiliation(s)
- Vipul Singhal
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Nigel Chou
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Joseph Lee
  - Faculty of Science, National University of Singapore, Singapore, Republic of Singapore
- Yifei Yue
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, Republic of Singapore
- Jinyue Liu
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Wan Kee Chock
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Li Lin
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Jonathan Aow
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Hwee Kuan Lee
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
  - School of Computing, National University of Singapore, Singapore, Republic of Singapore
  - Singapore Eye Research Institute, Singapore, Republic of Singapore
  - International Research Laboratory on Artificial Intelligence, Singapore, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Republic of Singapore
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research, Singapore, Republic of Singapore
- Kok Hao Chen
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
- Shyam Prabhakar
  - Spatial and Single Cell Systems Domain, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
  - Population and Global Health, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Republic of Singapore.
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Republic of Singapore.
20
Cetin-Karayumak S, Zhang F, Zurrin R, Billah T, Zekelman L, Makris N, Pieper S, O'Donnell LJ, Rathi Y. Harmonized diffusion MRI data and white matter measures from the Adolescent Brain Cognitive Development Study. Sci Data 2024; 11:249. [PMID: 38413633 PMCID: PMC10899197 DOI: 10.1038/s41597-024-03058-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 02/12/2024] [Indexed: 02/29/2024] Open
Abstract
The Adolescent Brain Cognitive Development (ABCD) Study® has collected data from over 10,000 children across 21 sites, providing insights into adolescent brain development. However, site-specific scanner variability has made it challenging to use diffusion MRI (dMRI) data from this study. To address this, a dataset of harmonized and processed ABCD dMRI data (from release 3) has been created, comprising quality-controlled imaging data from 9,345 subjects, focusing exclusively on the baseline session, i.e., the first time point of the study. This resource required substantial computational time (approx. 50,000 CPU hours) for harmonization, whole-brain tractography, and white matter parcellation. The dataset includes harmonized dMRI data, 800 white matter clusters, 73 anatomically labeled white matter tracts in full and low resolution, and 804 different dMRI-derived measures per subject (72.3 TB total size). Accessible via the NIMH Data Archive, it offers a large-scale dMRI dataset for studying structural connectivity in child and adolescent neurodevelopment. Additionally, several post-harmonization experiments were conducted to demonstrate the success of the harmonization process on the ABCD dataset.
Affiliation(s)
- Suheyla Cetin-Karayumak
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
  - Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA.
- Fan Zhang
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Ryan Zurrin
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Tashrif Billah
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Leo Zekelman
  - Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard University, Boston, Massachusetts, USA
- Nikos Makris
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Lauren J O'Donnell
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
- Yogesh Rathi
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
21
|
Xia L, Lee C, Li JJ. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 2024; 15:1753. [PMID: 38409103 PMCID: PMC10897166 DOI: 10.1038/s41467-024-45891-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 02/06/2024] [Indexed: 02/28/2024] Open
Abstract
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP's 2D embeddings might not reliably inform the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
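The idea of scoring each cell by how well its 2D-embedding neighbors match its pre-embedding neighbors can be illustrated with a small Python sketch. This is a simplified stand-in and not scDEED's statistic: the published method defines its own reliability score and its own thresholds for calling an embedding dubious, whereas the sketch below simply reports the fraction of the k nearest neighbors shared between the two spaces. The function names and the choice of Euclidean distance are assumptions.

```python
import math
from typing import List, Sequence, Set


def knn(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Indices of the k nearest neighbors (Euclidean) of every point."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def neighbor_overlap_scores(
    high_dim: Sequence[Sequence[float]],
    embedding_2d: Sequence[Sequence[float]],
    k: int = 3,
) -> List[float]:
    """Per-cell fraction of k nearest neighbors preserved by the embedding.

    A value near 1 suggests the cell's neighborhood survived the embedding;
    a value near 0 suggests it did not (simplified proxy, see lead-in).
    """
    before = knn(high_dim, k)
    after = knn(embedding_2d, k)
    return [len(a & b) / k for a, b in zip(before, after)]
```

In this simplified setting, hyperparameter tuning would amount to rerunning the embedding with different settings and preferring the one with the fewest low-scoring cells, which mirrors the intuition in the abstract without reproducing the method itself.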
Collapse
Affiliation(s)
- Lucy Xia
- Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
| | - Christy Lee
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA.
- Radcliffe Institute of Advanced Study, Harvard University, Cambridge, MA, USA.
|
22
|
Wei Z, Zhang L, Gao L, Chen J, Peng L, Yang L. Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data. Sci Data 2024; 11:233. [PMID: 38395911 PMCID: PMC10891105 DOI: 10.1038/s41597-024-03066-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 02/13/2024] [Indexed: 02/25/2024] Open
Abstract
Yunling cattle is a new breed of beef cattle bred in Yunnan Province, China, by crossing Brahman, Murray Grey, and Yunnan Yellow cattle. Yunling cattle adapt well to tropical and subtropical climates and maintain good reproductive ability and growth speed under high temperature and high humidity; they also show strong resistance to internal and external parasites and good beef performance. In this study, we generated a high-quality chromosome-level genome assembly of a male Yunling cattle using a combination of short-read sequencing, PacBio HiFi sequencing and Hi-C scaffolding technologies. The genome assembly (3.09 Gb) is anchored to 31 chromosomes (29 autosomes plus X and Y), with a contig N50 of 35.97 Mb and a scaffold N50 of 112.01 Mb. It contains 1.62 Gb of repetitive sequences and 20,660 protein-coding genes. This first construction of the Yunling cattle genome provides a valuable genetic resource that will facilitate further study of the genetic diversity of bovine species and accelerate Yunling cattle breeding efforts.
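For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of how it is commonly computed follows: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are unrelated to the Yunling assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum covers half the assembly.
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), total 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

A higher N50 means that half of the assembled bases sit in longer sequences, which is why the scaffold N50 is much larger than the contig N50 after Hi-C scaffolding.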
Affiliation(s)
- Zaichao Wei
- College of Food Science and Technology, Yunnan Agricultural University, Kunming, China
- College of Big Data, Baoshan University, Baoshan, China
| | - Lilian Zhang
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming, China
- Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming, China
| | - Lutao Gao
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming, China
- Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming, China
| | - Jian Chen
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming, China
- Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming, China
| | - Lin Peng
- College of Big Data, Yunnan Agricultural University, Kunming, China
- Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming, China
- Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming, China
| | - Linnan Yang
- College of Big Data, Yunnan Agricultural University, Kunming, China.
- Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming, China.
- Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming, China.
|
23
|
Tang S, Cui X, Wang R, Li S, Li S, Huang X, Chen S. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat Commun 2024; 15:1629. [PMID: 38388573 PMCID: PMC10884038 DOI: 10.1038/s41467-024-46045-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 02/12/2024] [Indexed: 02/24/2024] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods are proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.
Affiliation(s)
- Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xuejian Cui
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Rongxiang Wang
- Department of Computer Science, University of Virginia, Charlottesville, VA, 22903, USA
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Xin Huang
- Beijing Key Laboratory for Radiobiology, Department of Radiation Biology, Beijing Institute of Radiation Medicine, 100850, Beijing, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
|
24
|
Vargas-Rojas L, Ting TC, Rainey KM, Reynolds M, Wang DR. AgTC and AgETL: open-source tools to enhance data collection and management for plant science research. Front Plant Sci 2024; 15:1265073. [PMID: 38450403 PMCID: PMC10915008 DOI: 10.3389/fpls.2024.1265073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/08/2024]
Abstract
Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier for downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data integration process where data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
Affiliation(s)
- Luis Vargas-Rojas
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
| | - To-Chia Ting
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
| | - Katherine M. Rainey
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
| | - Matthew Reynolds
- Wheat Physiology Group, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Diane R. Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
|
25
|
Pfeifer E, Rocha EPC. Phage-plasmids promote recombination and emergence of phages and plasmids. Nat Commun 2024; 15:1545. [PMID: 38378896 PMCID: PMC10879196 DOI: 10.1038/s41467-024-45757-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 02/01/2024] [Indexed: 02/22/2024] Open
Abstract
Phages and plasmids are regarded as distinct types of mobile genetic elements that drive bacterial evolution by horizontal gene transfer. However, the distinction between both types is blurred by the existence of elements known as prophage-plasmids or phage-plasmids, which transfer horizontally between cells as viruses and vertically within cellular lineages as plasmids. Here, we study gene flow between the three types of elements. We show that the gene repertoire of phage-plasmids overlaps with those of phages and plasmids. By tracking recent recombination events, we find that phage-plasmids exchange genes more frequently with plasmids than with phages, and that direct gene exchange between plasmids and phages is less frequent in comparison. The results suggest that phage-plasmids can mediate gene flow between plasmids and phages, including exchange of mobile element core functions, defense systems, and antibiotic resistance. Moreover, a combination of gene transfer and gene inactivation may result in the conversion of elements. For example, gene loss turns P1-like phage-plasmids into integrative prophages or into plasmids (that are no longer phages). Remarkably, some of the latter have acquired conjugation-related functions to become mobilisable by conjugation. Thus, our work indicates that phage-plasmids can play a key role in the transfer of genes across mobile elements within their hosts, and can act as intermediates in the conversion of one type of element into another.
Affiliation(s)
- Eugen Pfeifer
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, 75015, Paris, France.
| | - Eduardo P C Rocha
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, 75015, Paris, France.
|
26
|
Ovadia D, Segal A, Rabin N. Classification of hand and wrist movements via surface electromyogram using the random convolutional kernels transform. Sci Rep 2024; 14:4134. [PMID: 38374342 PMCID: PMC10876538 DOI: 10.1038/s41598-024-54677-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 02/15/2024] [Indexed: 02/21/2024] Open
Abstract
Prosthetic devices are vital for enhancing personal autonomy and the quality of life for amputees. However, the rejection rate for electric upper-limb prostheses remains high at around 30%, often due to issues like functionality, control, reliability, and cost. Thus, developing reliable, robust, and cost-effective human-machine interfaces is crucial for user acceptance. Machine learning algorithms using Surface Electromyography (sEMG) signal classification hold promise for natural prosthetic control. This study aims to enhance hand and wrist movement classification using sEMG signals, treated as time series data. A novel approach is employed, combining a variation of the Random Convolutional Kernel Transform (ROCKET) for feature extraction with a cross-validation ridge classifier. Traditionally, achieving high accuracy in time series classification required complex, computationally intensive methods. However, recent advances show that simple linear classifiers combined with ROCKET can achieve state-of-the-art accuracy with reduced computational complexity. The algorithm was tested on the UCI sEMG hand movement dataset, as well as on the Ninapro DB5 and DB7 datasets. We demonstrate how the proposed approach delivers high discrimination accuracy with minimal parameter tuning requirements, offering a promising solution to improve prosthetic control and user satisfaction.
Affiliation(s)
- Daniel Ovadia
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
| | - Alex Segal
- Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel.
| | - Neta Rabin
- Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel
|
27
|
Wang J, Dong L, Zheng Z, Zhu Z, Xie B, Xie Y, Li X, Chen B, Li P. Effects of different KRAS mutants and Ki67 expression on diagnosis and prognosis in lung adenocarcinoma. Sci Rep 2024; 14:4085. [PMID: 38374309 PMCID: PMC10876986 DOI: 10.1038/s41598-023-48307-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 11/24/2023] [Indexed: 02/21/2024] Open
Abstract
Lung adenocarcinoma (LUAD) is a prevalent form of non-small cell lung cancer with a rising incidence in recent years. Understanding the mutation characteristics of LUAD is crucial for effective treatment and prediction of this disease. Among the various mutations observed in LUAD, KRAS mutations are particularly common. Different subtypes of KRAS mutations can activate the Ras signaling pathway to varying degrees, potentially influencing the pathogenesis and prognosis of LUAD. This study aims to investigate the relationship between different KRAS mutation subtypes and the pathogenesis and prognosis of LUAD. A total of 63 clinical samples of LUAD were collected for this study. The samples were analyzed using targeted gene sequencing panels to obtain sequencing data. To complement the dataset, additional clinical and sequencing data were obtained from TCGA and MSK. The analysis revealed significantly higher Ki67 immunohistochemical (IHC) scores in patients with missense mutations compared to controls. Moreover, the expression level of KRAS was found to be significantly correlated with Ki67 expression. Enrichment analysis indicated that KRAS missense mutations activated the SWEET_LUNG_CANCER_KRAS_DN and CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_2 pathways. Additionally, patients with KRAS missense mutations and high Ki67 IHC scores exhibited significantly higher tumor mutational burden levels compared to other groups, which suggests they are more likely to be responsive to immune checkpoint inhibitors (ICIs). Based on the data from MSK and TCGA, it was observed that patients with KRAS missense mutations had shorter survival compared to controls, and Ki67 expression level could more accurately predict patient prognosis. In conclusion, when utilizing KRAS mutations as biomarkers for the treatment and prediction of LUAD, it is important to consider the specific KRAS mutant subtypes and Ki67 expression levels. These findings contribute to a better understanding of LUAD and have implications for personalized therapeutic approaches in the management of this disease.
Affiliation(s)
- Jun Wang
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Liwen Dong
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Zhaowei Zheng
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Zhen Zhu
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Baisheng Xie
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Yue Xie
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Xiongwei Li
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
| | - Bing Chen
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China.
| | - Pan Li
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China.
|
28
|
Hayano J, Adachi M, Sasaki F, Yuda E. Quantitative detection of sleep apnea in adults using inertial measurement unit embedded in wristwatch wearable devices. Sci Rep 2024; 14:4050. [PMID: 38374225 PMCID: PMC10876631 DOI: 10.1038/s41598-024-54817-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
Sleep apnea (SA) is associated with risk of cardiovascular disease, cognitive decline, and accidents due to sleepiness, yet the majority (over 80%) of patients remain undiagnosed. Inertial measurement units (IMUs) are built into modern wearable devices and are capable of long-term continuous measurement with low power consumption. We examined if SA can be detected by an IMU embedded in a wristwatch device. In 122 adults who underwent polysomnography (PSG) examinations, triaxial acceleration and triaxial gyro signals from the IMU were recorded during the PSG. Subjects were divided into a training group and a test group (both n = 61). In the training group, an algorithm was developed to extract signals in the respiratory frequency band (0.13-0.70 Hz) and detect respiratory events as transient (10-90 s) decreases in amplitude. The respiratory event frequency estimated by the algorithm correlated with the apnea-hypopnea index (AHI) of the PSG with r = 0.84 in the test group. With the cutoff values determined in the training group, moderate-to-severe SA (AHI ≥ 15) was identified with 85% accuracy and severe SA (AHI ≥ 30) with 89% accuracy in the test group. SA can be quantitatively detected by the IMU embedded in wristwatch wearable devices in adults with suspected SA.
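The event-detection step described above (transient 10-90 s decreases in respiratory amplitude) can be sketched as a simple run-length filter. This is an illustration under stated assumptions, not the authors' algorithm: it takes an amplitude envelope that has already been extracted from the respiratory band, uses the median as the baseline, and treats a drop below half of that baseline as a decrease. The function name `detect_events`, the `drop_fraction` parameter, and the synthetic envelope are hypothetical.

```python
from statistics import median


def detect_events(envelope, fs_hz, drop_fraction=0.5, min_s=10.0, max_s=90.0):
    """Return (start_s, end_s) spans where amplitude stays low for 10-90 s.

    drop_fraction and the median baseline are illustrative assumptions.
    """
    threshold = drop_fraction * median(envelope)
    events = []
    start = None
    # A trailing sentinel closes any low run that reaches the end of the record.
    for i, value in enumerate(list(envelope) + [float("inf")]):
        is_low = value < threshold
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            duration_s = (i - start) / fs_hz
            if min_s <= duration_s <= max_s:
                events.append((start / fs_hz, i / fs_hz))
            start = None
    return events


# Synthetic envelope sampled at 1 Hz: one 20 s drop and one 5 s drop.
envelope = [1.0] * 60 + [0.2] * 20 + [1.0] * 60 + [0.2] * 5 + [1.0] * 60
events = detect_events(envelope, fs_hz=1.0)
print(events)  # prints [(60.0, 80.0)]

# Events per hour of recording, comparable in spirit to an event index.
hours = len(envelope) / 1.0 / 3600
print(len(events) / hours)
```

Only the 20 s drop qualifies as an event; the 5 s drop is shorter than the 10 s minimum and is ignored.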
Affiliation(s)
- Junichiro Hayano
- Heart Beat Science Lab, Inc., Sendai, Japan.
- Emeritus Professor, Nagoya City University, Nagoya, Japan.
| | | | | | - Emi Yuda
- Heart Beat Science Lab, Inc., Sendai, Japan
- Graduate School of Information Sciences, Tohoku University, Sendai, Japan
|
29
|
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024] Open
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling and computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background as well as epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
Affiliation(s)
- Wataru Nakamura
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Makoto Hirata
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoyo Oda
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Division of Laboratory Medicine, National Cancer Center Hospital, Tokyo, Japan
| | - Kenichi Chiba
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ai Okada
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Raúl Nicolás Mateos
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Masahiro Sugawa
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Naoko Iida
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Mineko Ushiama
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Noriko Tanabe
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
| | - Hiromi Sakamoto
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Shigeki Sekine
- Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
| | - Akira Hirasawa
- Department of Clinical Genetics and Genomic Medicine, Okayama University Hospital, Okayama, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
| | - Katsushi Tokunaga
- Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Central Biobank, National Center Biobank Network, Tokyo, Japan
| | - Shin-Ichi Tsujimoto
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Norio Shiba
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Shuichi Ito
- Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
| | - Teruhiko Yoshida
- Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
| | - Yuichi Shiraishi
- Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.
|
30
|
Burger T. Fudging the volcano-plot without dredging the data. Nat Commun 2024; 15:1392. [PMID: 38360828 PMCID: PMC10869345 DOI: 10.1038/s41467-024-45834-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 02/02/2024] [Indexed: 02/17/2024] Open
Affiliation(s)
- Thomas Burger
- Univ. Grenoble Alpes, INSERM, CEA, UA13 BGE, CNRS, CEA, FR2048 ProFI, 38000, Grenoble, France.
|
31
|
Zhang J, Ren Y, Lin L, Xing Y, Ren J. Table tennis motion recognition based on the bat trajectory using varying-length-input convolution neural networks. Sci Rep 2024; 14:3549. [PMID: 38347071 PMCID: PMC10861488 DOI: 10.1038/s41598-024-54150-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/08/2024] [Indexed: 02/15/2024] Open
Abstract
Action recognition has been applied in fields such as smart homes, gaming, traffic management, and security monitoring. Motion recognition is helpful for biomechanical analysis, auxiliary training systems, table tennis robots, motion-sensing games, virtual reality and other fields. In our study, we collected data on table tennis skill motion, created the TTMD6 dataset, and analyzed the characteristics of table tennis paddle trajectories. We propose a motion recognition algorithm to recognize paddle trajectories. Other research has used multijoint data to identify actions, while we use only the paddle trajectory to recognize table tennis skill motions, accelerating the speed of motion recognition. Therefore, it is feasible to use paddle trajectories to recognize table tennis skill motions.
Affiliation(s)
- Jun Zhang
- School of Exercise and Health, Shanghai University of Sport, Shanghai, 200438, China
- School of Sport Communication and Information Technology, Shandong Sport University, Jinan, 250102, Shandong, China
| | - Yuanshi Ren
- China Table Tennis College, Shanghai University of Sport, Shanghai, 200438, China
| | - Liyue Lin
- School of Psychology, Shanghai University of Sport, Shanghai, 200438, China
| | - Yu Xing
- School of Sport Communication and Information Technology, Shandong Sport University, Jinan, 250102, Shandong, China
| | - Jie Ren
- China Table Tennis College, Shanghai University of Sport, Shanghai, 200438, China.
|
32
|
Schmeltz M, Ivanovic A, Schlepütz CM, Wimmer W, Remenschneider AK, Caversaccio M, Stampanoni M, Anschuetz L, Bonnin A. The human middle ear in motion: 3D visualization and quantification using dynamic synchrotron-based X-ray imaging. Commun Biol 2024; 7:157. [PMID: 38326549 PMCID: PMC10850498 DOI: 10.1038/s42003-023-05738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/21/2023] [Indexed: 02/09/2024] Open
Abstract
The characterization of the vibrations of the middle ear ossicles during sound transmission is a focal point in clinical research. However, the small size of the structures, their micrometer-scale movement, and the deep-seated position of the middle ear within the temporal bone make these types of measurements extremely challenging. In this work, dynamic synchrotron-based X-ray phase-contrast microtomography is used on acoustically stimulated intact human ears, allowing for the three-dimensional visualization of entire human eardrums and ossicular chains in motion. A post-gating algorithm is used to temporally resolve the fast micromotions at 128 Hz, coupled with a high-throughput pipeline to process the large tomographic datasets. Seven ex-vivo fresh-frozen human temporal bones in healthy conditions are studied, and the rigid body motions of the ossicles are quantitatively delineated. Clinically relevant regions of the ossicular chain are tracked in 3D, and the amplitudes of their displacement are computed for two acoustic stimuli.
Affiliation(s)
- Margaux Schmeltz
- Paul Scherrer Institute, Swiss Light Source, Villigen, Switzerland.
| | - Aleksandra Ivanovic
- Paul Scherrer Institute, Swiss Light Source, Villigen, Switzerland
- Department of Otorhinolaryngology, Head and Neck Surgery, Inselspital, Bern University Hospital, Bern, Switzerland
- Hearing Research Laboratory, ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
| | | | - Wilhelm Wimmer
- Department of Otorhinolaryngology, Head and Neck Surgery, Inselspital, Bern University Hospital, Bern, Switzerland
- TUM School of Medicine, Klinikum rechts der Isar, Department of Otorhinolaryngology, Munich, Germany
| | - Aaron K Remenschneider
- Department of Otolaryngology, Head and Neck Surgery, Mass. Eye and Ear, Boston Children Hospital, Harvard Medical School, Boston, MA, USA
| | - Marco Caversaccio
- Department of Otorhinolaryngology, Head and Neck Surgery, Inselspital, Bern University Hospital, Bern, Switzerland
- Hearing Research Laboratory, ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
| | - Marco Stampanoni
- Paul Scherrer Institute, Swiss Light Source, Villigen, Switzerland
- Institute for Biomedical Engineering, University and ETH Zürich, Zurich, Switzerland
| | - Lukas Anschuetz
- Department of Otorhinolaryngology, Head and Neck Surgery, Inselspital, Bern University Hospital, Bern, Switzerland
- Hearing Research Laboratory, ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
| | - Anne Bonnin
- Paul Scherrer Institute, Swiss Light Source, Villigen, Switzerland
|
33
|
Geuenich MJ, Gong DW, Campbell KR. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data. Nat Commun 2024; 15:1014. [PMID: 38307875 PMCID: PMC10837127 DOI: 10.1038/s41467-024-45198-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 01/16/2024] [Indexed: 02/04/2024] Open
Abstract
A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader.
Affiliation(s)
- Michael J Geuenich
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
| | - Dae-Won Gong
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada
| | - Kieran R Campbell
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5S 3G3, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada.
- Ontario Institute of Cancer Research, Toronto, ON, M5G 1M1, Canada.
- Vector Institute, Toronto, ON, M5G 1M1, Canada.
| |
|
34
|
Bálint B, Merényi Z, Hegedüs B, Grigoriev IV, Hou Z, Földi C, Nagy LG. ContScout: sensitive detection and removal of contamination from annotated genomes. Nat Commun 2024; 15:936. [PMID: 38296951 PMCID: PMC10831095 DOI: 10.1038/s41467-024-45024-5] [Received: 01/06/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024]
Abstract
Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes and evaluate their impact on phylogenomic analyses.
Affiliation(s)
- Balázs Bálint
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
| | - Zhihao Hou
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
| | - Csenge Földi
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
| | - László G Nagy
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary.
| |
|
35
|
Qu H, Liu K, Zhang L. Research on improved black widow algorithm for medical image denoising. Sci Rep 2024; 14:2514. [PMID: 38291147 PMCID: PMC10828493 DOI: 10.1038/s41598-024-51803-3] [Received: 08/16/2023] [Accepted: 01/09/2024] [Indexed: 02/01/2024]
Abstract
Improving the quality of medical images is crucial for accurate clinical diagnosis; however, medical images are often disrupted by various types of noise, posing challenges to the reliability and diagnostic accuracy of the images. This study aims to enhance the Black Widow optimization algorithm and apply it to the task of denoising medical images to improve both the quality of medical images and the accuracy of diagnostic results. By introducing Tent mapping, we refined the Black Widow optimization algorithm to better adapt to the complex features of medical images. The algorithm's denoising capabilities for various types of noise were enhanced through the combination of multiple filters, all without the need for training each time to achieve preset goals. Simulation results, based on processing a dataset containing 1588 images with Gaussian, salt-and-pepper, Poisson, and speckle noise, demonstrated a reduction in Mean Squared Error (MSE) by 0.439, an increase in Peak Signal-to-Noise Ratio (PSNR) by 4.315, an improvement in Structural Similarity Index (SSIM) by 0.132, an enhancement in Edge-to-Noise Ratio (ENL) by 0.402, and an increase in Edge Preservation Index (EPI) by 0.614. Simulation experiments verified that the proposed algorithm has a certain advantage in terms of computational efficiency. The improvement, incorporating Tent mapping and a combination of multiple filters, successfully elevated the performance of the Black Widow algorithm in medical image denoising, providing an effective solution for enhancing medical image quality and diagnostic accuracy.
Affiliation(s)
- Hepeng Qu
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China
| | - Kun Liu
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China
| | - Lina Zhang
- College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
| |
|
36
|
Liu F, Yuan C, Chen H, Yang F. Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings. Sci Rep 2024; 14:2464. [PMID: 38291341 PMCID: PMC10828400 DOI: 10.1038/s41598-024-53028-w] [Received: 09/06/2023] [Accepted: 01/26/2024] [Indexed: 02/01/2024]
Abstract
Linear B-cell epitopes (BCEs) play a key role in the development of peptide vaccines and immunodiagnostic reagents. Therefore, the accurate identification of linear BCEs is of great importance in the prevention of infectious diseases and the diagnosis of related diseases. The experimental methods used to identify BCEs are both expensive and time-consuming, and they do not meet the demand for identification of large-scale protein sequence data. As a result, there is a need to develop an efficient and accurate computational method to rapidly identify linear BCE sequences. In this work, we developed the new linear BCE prediction method LBCE-BERT. This method is based on peptide chain sequence information and natural language model BERT embedding information, using an XGBoost classifier. The model was trained on three benchmark datasets for hyperparameter selection and was subsequently evaluated on several test datasets. The results indicate that our proposed method outperforms others in terms of AUROC and accuracy. The LBCE-BERT model is publicly available at: https://github.com/Lfang111/LBCE-BERT.
Affiliation(s)
- Fang Liu
- School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, Anhui, China
| | - ChengCheng Yuan
- School of Biomedical Engineering, Anhui Medical University, Hefei, 230030, Anhui, China
| | - Haoqiang Chen
- School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, Anhui, China
| | - Fei Yang
- School of Biomedical Engineering, Anhui Medical University, Hefei, 230030, Anhui, China.
| |
|
37
|
Ennis D, Shmorak S, Jantscher-Krenn E, Yassour M. Longitudinal quantification of Bifidobacterium longum subsp. infantis reveals late colonization in the infant gut independent of maternal milk HMO composition. Nat Commun 2024; 15:894. [PMID: 38291346 PMCID: PMC10827747 DOI: 10.1038/s41467-024-45209-y] [Received: 07/19/2023] [Accepted: 01/15/2024] [Indexed: 02/01/2024]
Abstract
Breast milk contains human milk oligosaccharides (HMOs) that cannot be digested by infants, yet nourish their developing gut microbiome. While Bifidobacterium are the best-known utilizers of individual HMOs, a longitudinal study examining the evolving microbial community at high-resolution coupled with mothers' milk HMO composition is lacking. Here, we developed a high-throughput method to quantify Bifidobacterium longum subsp. infantis (BL. infantis), a proficient HMO-utilizer, and applied it to a longitudinal cohort consisting of 21 mother-infant dyads. We observed substantial changes in the infant gut microbiome over the course of several months, while the HMO composition in mothers' milk remained relatively stable. Although Bifidobacterium species significantly influenced sample variation, no specific HMOs correlated with Bifidobacterium species abundance. Surprisingly, we found that BL. infantis colonization began late in the breastfeeding period both in our cohort and in other geographic locations, highlighting the importance of focusing on BL. infantis dynamics in the infant gut.
Affiliation(s)
- Dena Ennis
- Microbiology & Molecular Genetics Department, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Shimrit Shmorak
- Microbiology & Molecular Genetics Department, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | | | - Moran Yassour
- Microbiology & Molecular Genetics Department, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel.
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
| |
|
38
|
Babaei Rikan S, Sorayaie Azar A, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Wiil UK. Survival prediction of glioblastoma patients using modern deep learning and machine learning techniques. Sci Rep 2024; 14:2371. [PMID: 38287149 PMCID: PMC10824760 DOI: 10.1038/s41598-024-53006-2] [Received: 04/19/2023] [Accepted: 01/25/2024] [Indexed: 01/31/2024]
Abstract
In this study, we utilized data from the Surveillance, Epidemiology, and End Results (SEER) database to predict glioblastoma patients' survival outcomes. To assess dataset skewness and detect feature importance, we applied Pearson's second coefficient test of skewness and the Ordinary Least Squares method, respectively. Using two sampling strategies, holdout and five-fold cross-validation, we developed five machine learning (ML) models alongside a feed-forward deep neural network (DNN) for the multiclass classification and regression prediction of glioblastoma patient survival. After balancing the classification and regression datasets, we obtained 46,340 and 28,573 samples, respectively. Shapley additive explanations (SHAP) were then used to explain the decision-making process of the best model. In both classification and regression tasks, as well as across holdout and cross-validation sampling strategies, the DNN consistently outperformed the ML models. Notably, the accuracies were 90.25% and 90.22% for holdout and five-fold cross-validation, respectively, while the corresponding R² values were 0.6565 and 0.6622. SHAP analysis revealed age at diagnosis as the most influential feature in the DNN's survival predictions. These findings suggest that the DNN holds promise as a practical auxiliary tool for clinicians, aiding them in optimal decision-making concerning the treatment and care trajectories for glioblastoma patients.
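The skewness check named in this abstract can be illustrated with a minimal sketch. It uses the textbook definition of Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. The function name and the sample values are hypothetical and are not taken from the article or from SEER.

```python
import statistics


def pearson_second_skewness(values):
    """Pearson's second (median) skewness coefficient: 3 * (mean - median) / std."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)  # sample standard deviation (needs >= 2 values)
    if std == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3.0 * (mean - median) / std


# Hypothetical survival times in months, with a long right tail.
survival_months = [2, 3, 3, 4, 5, 6, 8, 12, 20, 37]
skew = pearson_second_skewness(survival_months)
print(f"Pearson's second skewness coefficient: {skew:.3f}")
```

A positive value indicates a right-skewed distribution (mean above median), and a value near zero indicates approximate symmetry.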
Affiliation(s)
| | | | - Amin Naemi
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
| | | | - Habibollah Pirnejad
- Erasmus School of Health Policy and Management (ESHPM), Erasmus University Rotterdam, Rotterdam, The Netherlands.
- Patient Safety Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran.
| | - Uffe Kock Wiil
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
| |
|
39
|
Tyler SR, Lozano-Ojalvo D, Guccione E, Schadt EE. Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq. Nat Commun 2024; 15:699. [PMID: 38267438 PMCID: PMC10808220 DOI: 10.1038/s41467-023-43406-9] [Received: 12/13/2022] [Accepted: 11/07/2023] [Indexed: 01/26/2024]
Abstract
While sub-clustering cell populations has become popular in single-cell omics, negative controls for this process are lacking. Popular feature-selection/clustering algorithms fail the null-dataset problem, allowing erroneous subdivisions of homogeneous clusters until nearly each cell is called its own cluster. Using real and synthetic datasets, we find that anti-correlated gene selection reduces or eliminates erroneous subdivisions, increases marker-gene selection efficacy, and efficiently scales to millions of cells.
Affiliation(s)
- Scott R Tyler
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Daniel Lozano-Ojalvo
- Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ernesto Guccione
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
|
40
|
Sun Z, Zhang L, Wang R, Wang Z, Liang X, Gao J. Identification of shared pathogenetic mechanisms between COVID-19 and IC through bioinformatics and system biology. Sci Rep 2024; 14:2114. [PMID: 38267482 PMCID: PMC10808107 DOI: 10.1038/s41598-024-52625-z] [Received: 09/22/2023] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
COVID-19 increased global mortality in 2019. Cystitis became a contributing factor in SARS-CoV-2 and COVID-19 complications. The complex molecular links between cystitis and COVID-19 are unclear. This study investigates COVID-19-associated cystitis (CAC) molecular mechanisms and drug candidates using bioinformatics and systems biology. We obtained the gene expression profiles of IC (GSE11783) and COVID-19 (GSE147507) from the Gene Expression Omnibus (GEO) database, identified the differentially expressed genes (DEGs) common to IC and COVID-19, and extracted a number of key genes from this group. We then conducted Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis on the DEGs. Using the DEGs, we also constructed a protein-protein interaction (PPI) network, a transcription factor (TF)-gene regulatory network, a TF-miRNA regulatory network, and a gene-disease association network. Hub genes were identified and extracted from the PPI network and used to construct nomogram diagnostic prediction models. The DSigDB database was used to predict potential molecular drugs associated with the common DEGs. The diagnostic accuracy of the hub genes and nomogram models for IC and COVID-19 was assessed with receiver operating characteristic (ROC) curves, and the IC dataset (GSE57560) and the COVID-19 dataset (GSE171110) were used to validate the models. In total, 198 overlapping DEGs were found and chosen for further research. FCER1G, ITGAM, LCP2, LILRB2, MNDA, SPI1, and TYROBP were screened as the hub genes. The nomogram model, built using the seven hub genes, demonstrates significant utility as a diagnostic prediction model for both IC and COVID-19. Multiple potential molecular drugs associated with the common DEGs were identified. These pathways, hub genes, and models may provide new perspectives for future research into mechanisms and guide personalised and effective therapeutics for IC patients infected with COVID-19.
Affiliation(s)
- Zhenpeng Sun
- Department of Urology, Qingdao Municipal Hospital, No.5, Donghai Middle Road, Shinan District, Qingdao, 266001, Shandong, China
- Qingdao Medical College, Qingdao University, Qingdao, China
| | - Li Zhang
- Institute of Systems Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
| | - Ruihong Wang
- Department of Outpatient, Qingdao Central Hospital, Qingdao University, Qingdao, China
| | - Zheng Wang
- Zhucheng People's Hospital, Zhucheng, China
| | - Xin Liang
- Department of Urology, Qingdao Municipal Hospital, No.5, Donghai Middle Road, Shinan District, Qingdao, 266001, Shandong, China
| | - Jiangang Gao
- Department of Urology, Qingdao Municipal Hospital, No.5, Donghai Middle Road, Shinan District, Qingdao, 266001, Shandong, China.
| |
|
41
|
Aarthy M, Pandiyan GN, Paramasivan R, Kumar A, Gupta B. Identification and prioritisation of potential vaccine candidates using subtractive proteomics and designing of a multi-epitope vaccine against Wuchereria bancrofti. Sci Rep 2024; 14:1970. [PMID: 38263422 PMCID: PMC10806236 DOI: 10.1038/s41598-024-52457-x] [Received: 06/26/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
This study employed subtractive proteomics and immunoinformatics to analyze the Wuchereria bancrofti proteome and identify potential therapeutic targets, with a focus on designing a vaccine against the parasite species. A comprehensive bioinformatics analysis of the parasite's proteome identified 51 probable therapeutic targets, among which "Kunitz/bovine pancreatic trypsin inhibitor domain-containing protein" was identified as the most promising vaccine candidate. The candidate protein was used to design a multi-epitope vaccine, incorporating B-cell and T-cell epitopes identified through various tools. The vaccine construct underwent extensive analysis of its antigenic, physical, and chemical features, including the determination of secondary and tertiary structures. Docking and molecular dynamics simulations were performed with HLA alleles, Toll-like receptor 4 (TLR4), and TLR3 to assess its potential to elicit the human immune response. Immune simulation analysis confirmed the predicted vaccine's strong binding affinity with immunoglobulins, indicating its potential efficacy in generating an immune response. However, experimental validation and testing of this multi-epitope vaccine construct would be needed to assess its potential against W. bancrofti and even for a broader range of lymphatic filarial infections given the similarities between W. bancrofti and Brugia.
Affiliation(s)
- Murali Aarthy
- ICMR-Vector Control Research Centre (VCRC), Field Station, Madurai, Tamil Nadu, 625002, India
| | - G Navaneetha Pandiyan
- ICMR-Vector Control Research Centre (VCRC), Field Station, Madurai, Tamil Nadu, 625002, India
| | - R Paramasivan
- ICMR-Vector Control Research Centre (VCRC), Field Station, Madurai, Tamil Nadu, 625002, India
| | - Ashwani Kumar
- ICMR-Vector Control Research Centre (VCRC), Puducherry, India
- Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Tandhalam, Chennai, Tamil Nadu, 602105, India
| | - Bhavna Gupta
- ICMR-Vector Control Research Centre (VCRC), Field Station, Madurai, Tamil Nadu, 625002, India.
| |
|
42
|
Leung YY, Naj AC, Chou YF, Valladares O, Schmidt M, Hamilton-Nelson K, Wheeler N, Lin H, Gangadharan P, Qu L, Clark K, Kuzma AB, Lee WP, Cantwell L, Nicaretta H, Haines J, Farrer L, Seshadri S, Brkanac Z, Cruchaga C, Pericak-Vance M, Mayeux RP, Bush WS, Destefano A, Martin E, Schellenberg GD, Wang LS. Human whole-exome genotype data for Alzheimer's disease. Nat Commun 2024; 15:684. [PMID: 38263370 PMCID: PMC10805795 DOI: 10.1038/s41467-024-44781-7] [Received: 12/19/2022] [Accepted: 01/02/2024] [Indexed: 01/25/2024]
Abstract
The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here we present a bioinformatics strategy for joint-calling 20,504 WES samples collected across nine studies and sequenced using ten capture kits in fourteen sequencing centers in the Alzheimer's Disease Sequencing Project. The jointly genotyped variant call format (VCF) file contains only positions within the union of capture kits. The VCF was then processed specifically to account for the batch effects arising from the use of different capture kits from different studies. We identified 8.2 million autosomal variants. Of these, 96.82% are high-quality and are located in 28,579 Ensembl transcripts; 41% are intronic, and 1.8% have CADD scores > 30, indicating high predicted pathogenicity. Here we show that our new strategy can generate high-quality data from these diversely generated WES samples. The improved ability to combine data sequenced in different batches benefits the whole genomics research community.
Affiliation(s)
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| | - Adam C Naj
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yi-Fan Chou
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Otto Valladares
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael Schmidt
- Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- The John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Kara Hamilton-Nelson
- Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- The John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Nicholas Wheeler
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Honghuang Lin
- Department of Medicine, UMass Chan Medical School, Boston, MA, USA
| | - Prabhakaran Gangadharan
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Liming Qu
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kaylyn Clark
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Amanda B Kuzma
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Laura Cantwell
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jonathan Haines
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Lindsay Farrer
- Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Sudha Seshadri
- Boston University School of Medicine, Boston, MA, USA
- The Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center, San Antonio, TX, USA
| | - Zoran Brkanac
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | - Carlos Cruchaga
- Washington University School of Medicine, St. Louis, MO, USA
| | - Margaret Pericak-Vance
- Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- The John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Richard P Mayeux
- Department of Neurology, Taub Institute for Research on Alzheimer's Disease and the Aging Brain and the Gertrude H. Sergievsky Center, Columbia University and the New York Presbyterian Hospital, New York, NY, USA
| | - William S Bush
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Anita Destefano
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Eden Martin
- Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- The John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA
| | - Gerard D Schellenberg
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
43
|
Ashraf H, Waris A, Gilani SO, Shafiq U, Iqbal J, Kamavuako EN, Berrouche Y, Brüls O, Boutaayamou M, Niazi IK. Optimizing the performance of convolutional neural network for enhanced gesture recognition using sEMG. Sci Rep 2024; 14:2020. [PMID: 38263441 PMCID: PMC10805798 DOI: 10.1038/s41598-024-52405-9] [Received: 09/28/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
Deep neural networks (DNNs) have demonstrated higher performance results when compared to traditional approaches for implementing robust myoelectric control (MEC) systems. However, the delay induced by optimising a MEC remains a concern for real-time applications. As a result, an optimised DNN architecture based on fine-tuned hyperparameters is required. This study investigates the optimal configuration of convolutional neural network (CNN)-based MEC by proposing an effective data segmentation technique and a generalised set of hyperparameters. Firstly, two segmentation strategies (disjoint and overlap) and various segment and overlap sizes were studied to optimise segmentation parameters. Secondly, to address the challenge of optimising the hyperparameters of a DNN-based MEC system, the problem has been abstracted as an optimisation problem, and Bayesian optimisation has been used to solve it. From 20 healthy people, ten surface electromyography (sEMG) grasping movements abstracted from daily life were chosen as the target gesture set. With an ideal segment size of 200 ms and an overlap size of 80%, the results show that the overlap segmentation technique outperforms the disjoint segmentation technique (p-value < 0.05). In comparison to manual (12.76 ± 4.66), grid (0.10 ± 0.03), and random (0.12 ± 0.05) search hyperparameters optimisation strategies, the proposed optimisation technique resulted in a mean classification error rate (CER) of 0.08 ± 0.03 across all subjects. In addition, a generalised CNN architecture with an optimal set of hyperparameters is proposed. When tested separately on all individuals, the single generalised CNN architecture produced an overall CER of 0.09 ± 0.03. 
This study's significance lies in its contribution to the field of EMG signal processing by demonstrating the superiority of the overlap segmentation technique, optimizing CNN hyperparameters through Bayesian optimization, and offering practical insights for improving prosthetic control and human-computer interfaces.
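The difference between the disjoint and overlap segmentation strategies described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `segment_signal`, the 1000 Hz sampling rate, and the placeholder signal are assumptions made only for illustration; the 200 ms window and 80% overlap are the values reported above.

```python
def segment_signal(samples, fs_hz, window_ms=200, overlap=0.8):
    """Split a 1-D sample sequence into fixed-length windows.

    overlap=0.0 gives disjoint segmentation; 0 < overlap < 1 gives
    overlapping (sliding-window) segmentation.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    window = int(round(fs_hz * window_ms / 1000))        # samples per window
    step = max(1, int(round(window * (1 - overlap))))    # hop between window starts
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, step)]


fs_hz = 1000                      # hypothetical sampling rate, not from the abstract
signal = list(range(fs_hz))       # one second of placeholder samples

overlapped = segment_signal(signal, fs_hz, window_ms=200, overlap=0.8)
disjoint = segment_signal(signal, fs_hz, window_ms=200, overlap=0.0)
print(len(overlapped), len(disjoint))
```

For the same recording, the overlapping scheme yields several times more windows than the disjoint one, which gives the classifier more training segments and more frequent decisions.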
Affiliation(s)
- Hassan Ashraf
- Laboratory of Movement Analysis (LAM-Motion Lab), University of Liège, Liège, Belgium
| | - Asim Waris
- Department of Biomedical Engineering and Sciences, School of Mechanical and Manufacturing Engineering (SMME), National University of Science and Technology (NUST), Islamabad, 44000, Pakistan.
| | - Syed Omer Gilani
- Department of Electrical, Computer and Biomedical Engineering, Faculty of Engineering, Abu Dhabi University, Abu Dhabi, United Arab Emirates
| | - Uzma Shafiq
- Department of Biomedical Engineering and Sciences, School of Mechanical and Manufacturing Engineering (SMME), National University of Science and Technology (NUST), Islamabad, 44000, Pakistan
| | - Javaid Iqbal
- Department of Biomedical Engineering and Sciences, School of Mechanical and Manufacturing Engineering (SMME), National University of Science and Technology (NUST), Islamabad, 44000, Pakistan
| | | | - Yaakoub Berrouche
- LIS Laboratory, Department of Electronics, Faculty of Technology, Ferhat Abbas University Setif 1, Setif, Algeria
| | - Olivier Brüls
- Laboratory of Movement Analysis (LAM-Motion Lab), University of Liège, Liège, Belgium
| | - Mohamed Boutaayamou
- Laboratory of Movement Analysis (LAM-Motion Lab), University of Liège, Liège, Belgium
| | | |
|
44
|
Sabary O, Yucovich A, Shapira G, Yaakobi E. Reconstruction algorithms for DNA-storage systems. Sci Rep 2024; 14:1951. [PMID: 38263421 PMCID: PMC10806084 DOI: 10.1038/s41598-024-51730-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024] Open
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string passes through the DNA-storage channel, which introduces deletion, insertion, and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, of the kind used for the shortest common supersequence and the longest common subsequence problems, to decode the original string. Our algorithms place no restrictions on the input or the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a new synthesized dataset, and are shown to outperform previous algorithms in reconstruction accuracy.
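As background for the dynamic programming mentioned in this abstract, the following is a minimal sketch of the textbook longest common subsequence (LCS) recurrence applied to two traces. It is not the reconstruction algorithm of the paper; the function name and the example traces are hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Hypothetical traces: trace_b is trace_a with one symbol deleted.
trace_a = "ACGTACGT"
trace_b = "ACGACGT"
shared = longest_common_subsequence(trace_a, trace_b)
print(shared)
```

The table costs O(mn) time and memory for traces of lengths m and n. The LCS of two traces only captures what both copies agree on, so a full reconstruction method has to combine information from many traces.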
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel.
| | - Alexander Yucovich
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Guy Shapira
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| | - Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, 3200003, Haifa, Israel
| |
|
45
|
Wang H, Gao C, Dantona C, Hull B, Sun J. DRG-LLaMA : tuning LLaMA model to predict diagnosis-related group for hospitalized patients. NPJ Digit Med 2024; 7:16. [PMID: 38253711 PMCID: PMC10803802 DOI: 10.1038/s41746-023-00989-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024] Open
Abstract
In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRG assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA's performance correlates with increased model parameters and input context lengths.
Affiliation(s)
- Hanyin Wang
- Division of Hospital Internal Medicine, Mayo Clinic Health System, Mankato, MN, USA
| | - Chufan Gao
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Christopher Dantona
- Enterprise Inpatient Clinical Documentation Integrity, Mayo Clinic, Rochester, MN, USA
| | - Bryan Hull
- Division of Hospital Internal Medicine, Mayo Clinic, Phoenix, AZ, USA
| | - Jimeng Sun
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL, USA.
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Champaign, IL, USA.
| |
|
46
|
Liu M, Srivastava G, Ramanujam J, Brylinski M. Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects. Sci Rep 2024; 14:1668. [PMID: 38238448 PMCID: PMC10796434 DOI: 10.1038/s41598-024-51940-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/11/2024] [Indexed: 01/22/2024] Open
Abstract
Combination therapy has gained popularity in cancer treatment as it enhances the treatment efficacy and overcomes drug resistance. Although machine learning (ML) techniques have become an indispensable tool for discovering new drug combinations, the data on drug combination therapy currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to unbiasedly scale up the existing anti-cancer drug synergy dataset. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we were able to upscale the AZ-DREAM Challenges dataset from 8,798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that ML models trained on the augmented data consistently achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating more diverse and larger-scale drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
Affiliation(s)
- Mengmeng Liu
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
| | - Gopal Srivastava
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
| | - J Ramanujam
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
| | - Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA.
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA.
| |
|
47
|
Ghosh T, Han Y, Raju V, Hossain D, McCrory MA, Higgins J, Boushey C, Delp EJ, Sazonov E. Integrated image and sensor-based food intake detection in free-living. Sci Rep 2024; 14:1665. [PMID: 38238423 PMCID: PMC10796396 DOI: 10.1038/s41598-024-51687-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 01/08/2024] [Indexed: 01/22/2024] Open
Abstract
The first step in any dietary monitoring system is the automatic detection of eating episodes. To detect eating episodes, either sensor data or images can be used, and either method can result in false-positive detection. This study aims to reduce the number of false positives in the detection of eating episodes by a wearable sensor, Automatic Ingestion Monitor v2 (AIM-2). Thirty participants wore the AIM-2 for two days each (pseudo-free-living and free-living). The eating episodes were detected by three methods: (1) recognition of solid foods and beverages in images captured by AIM-2; (2) recognition of chewing from the AIM-2 accelerometer sensor; and (3) hierarchical classification to combine confidence scores from image and accelerometer classifiers. The integration of image- and sensor-based methods achieved 94.59% sensitivity, 70.47% precision, and 80.77% F1-score in the free-living environment, which is significantly better than either of the original methods (8% higher sensitivity). The proposed method successfully reduces the number of false positives in the detection of eating episodes.
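The three figures reported in this abstract are linked by the standard definition of the F1-score as the harmonic mean of precision and sensitivity (recall). The short check below, with a hypothetical helper name `f1_score`, recomputes the F1-score from the reported precision and sensitivity.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the free-living environment.
precision = 0.7047
sensitivity = 0.9459

f1 = f1_score(precision, sensitivity)
print(f"{100 * f1:.2f}%")
```

The result agrees with the reported 80.77% after rounding. Because the harmonic mean is pulled toward the smaller value, the F1-score sits closer to the precision than to the sensitivity.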
Affiliation(s)
- Tonmoy Ghosh
- Electrical and Computer Engineering Department, University of Alabama, Tuscaloosa, AL, 35401, USA.
| | - Yue Han
- Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN, 47907, USA
| | - Viprav Raju
- Electrical and Computer Engineering Department, University of Alabama, Tuscaloosa, AL, 35401, USA
| | - Delwar Hossain
- Electrical and Computer Engineering Department, University of Alabama, Tuscaloosa, AL, 35401, USA
| | - Megan A McCrory
- Department of Health Sciences, Boston University, Boston, MA, 02215, USA
| | - Janine Higgins
- Department of Pediatrics-Endocrinology, University of Colorado, Denver, CO, 80045, USA
| | - Carol Boushey
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA
| | - Edward J Delp
- Electrical and Computer Engineering Department, Purdue University, West Lafayette, IN, 47907, USA
| | - Edward Sazonov
- Electrical and Computer Engineering Department, University of Alabama, Tuscaloosa, AL, 35401, USA
| |
|
48
|
Ali L, Javeed A, Noor A, Rauf HT, Kadry S, Gandomi AH. Parkinson's disease detection based on features refinement through L1 regularized SVM and deep neural network. Sci Rep 2024; 14:1333. [PMID: 38228772 PMCID: PMC10791701 DOI: 10.1038/s41598-024-51600-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 01/07/2024] [Indexed: 01/18/2024] Open
Abstract
In previous studies, replicated and multiple types of speech data have been used for Parkinson's disease (PD) detection. However, two main problems in these studies are lower PD detection accuracy and inappropriate validation methodologies leading to unreliable results. This study discusses the effects of inappropriate validation methodologies used in previous studies and highlights the use of appropriate alternative validation methods that would ensure generalization. To enhance PD detection accuracy, we propose a two-stage diagnostic system that refines the extracted set of features through L1 regularized linear support vector machine and classifies the refined subset of features through a deep neural network. To rigorously evaluate the effectiveness of the proposed diagnostic system, experiments are performed on two different voice recording-based benchmark datasets. For both datasets, the proposed diagnostic system achieves 100% accuracy under leave-one-subject-out (LOSO) cross-validation (CV) and 97.5% accuracy under k-fold CV. The results show that the proposed system outperforms the existing methods regarding PD detection accuracy. The results suggest that the proposed diagnostic system is essential to improving non-invasive diagnostic decision support in PD.
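Why the validation scheme matters for replicated recordings can be illustrated with a minimal sketch of leave-one-subject-out (LOSO) splitting. The generator name and the subject labels are hypothetical, and this is not the authors' code. The point is that every recording of the held-out subject goes to the test fold, so no subject contributes to both training and testing.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical dataset: three subjects with two replicated recordings each.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    print(held_out, train, test)
```

A plain k-fold split over recordings could place two recordings of the same subject on opposite sides of the split, which tends to inflate the measured accuracy.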
Affiliation(s)
- Liaqat Ali
- Department of Electrical Engineering, University of Science and Technology Bannu, Bannu, Pakistan
| | - Ashir Javeed
- Aging Research Center, Karolinska Institutet, Solna, Sweden
| | - Adeeb Noor
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, 80221, Jeddah, Saudi Arabia
| | - Hafiz Tayyab Rauf
- Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent, ST4 2DE, UK
| | - Seifedine Kadry
- Department of Applied Data Science, Noroff University College, Kristiansand, Norway
- Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, 346, United Arab Emirates
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
| | - Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, 2007, Australia.
- University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary.
| |
|
49
|
Kini AS, Prema KV, Pai SN. Early stage black pepper leaf disease prediction based on transfer learning using ConvNets. Sci Rep 2024; 14:1404. [PMID: 38228767 PMCID: PMC10791634 DOI: 10.1038/s41598-024-51884-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 01/10/2024] [Indexed: 01/18/2024] Open
Abstract
Plants are exposed to diseases, insects, and fungi, which cause heavy crop damage and a variety of leaf diseases. Leaf diseases can be diagnosed at an early stage with the aid of a smart computer vision system, so that timely disease prevention can be targeted. Black pepper is a medicinal plant that is extensively used in Ayurvedic medicine because of its therapeutic properties. The proposed work presents a transfer learning technique, implemented with state-of-the-art convolutional neural networks, to predict the presence of prominent diseases in black pepper leaves. Deep neural networks are first trained on the ImageNet dataset available online and then used for prediction on a newly developed black pepper leaf image dataset. The developed dataset consists of real-time leaf images taken candidly in the field and annotated under the supervision of an expert. The leaf diseases considered are anthracnose, slow wilt, early stage phytophthora, phytophthora and yellowing. The hyperparameters tuned for the deep learning models include initial learning rates, optimization algorithm, image batches, epochs, and validation and training data, among others. With a learning rate of 0.001, the accuracy ranges from 99.1 to 99.7% for the Inception V3, GoogleNet, SqueezeNet and Resnet18 models. The proposed Resnet18 model outperforms all other models with 99.67% accuracy. The validation accuracy obtained with these models is high and the validation loss is low. This work offers a deep neural network method for early stage identification and prediction of black pepper leaf diseases, with potential benefit to agriculture.
Affiliation(s)
- Anita S Kini
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, India
| | - K V Prema
- Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education (MAHE), Manipal, India.
| | - Smitha N Pai
- Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, India
| |
|
50
|
Meisburger SP, Ando N. Scaling and merging macromolecular diffuse scattering with mdx2. bioRxiv 2024:2024.01.16.575887. [PMID: 38293202 PMCID: PMC10827198 DOI: 10.1101/2024.01.16.575887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Diffuse scattering is a promising method to gain additional insight into protein dynamics from macromolecular crystallography (MX) experiments. Bragg intensities yield the average electron density, while the diffuse scattering can be processed to obtain a three-dimensional reciprocal space map that is further analyzed to determine correlated motion. To make diffuse scattering techniques more accessible, we have created software for data processing called mdx2 that is both convenient to use and simple to extend and modify. Mdx2 is written in Python, and it interfaces with DIALS to implement self-contained data reduction workflows. Data are stored in NeXus format for software interchange and convenient visualization. Mdx2 can be run on the command line or imported as a package, for instance to encapsulate a complete workflow in a Jupyter notebook for reproducible computing and education. Here, we describe mdx2 version 1.0, a new release incorporating state-of-the-art techniques for data reduction. We describe the implementation of a complete multi-crystal scaling and merging workflow, and test the methods using a high-redundancy dataset from cubic insulin. We show that redundancy can be leveraged during scaling to correct systematic errors and to obtain accurate and reproducible measurements of weak diffuse signals.
Affiliation(s)
- Steve P. Meisburger
- Cornell High Energy Synchrotron Source, Cornell University, Ithaca, New York 14850, USA
| | - Nozomi Ando
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14850, USA
| |
|