26
Emmert-Streib F. Can ChatGPT understand genetics? Eur J Hum Genet 2024; 32:371-372. [PMID: 37407734 PMCID: PMC10999414 DOI: 10.1038/s41431-023-01419-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 07/07/2023]

27
Caniza H, Cáceres JJ, Torres M, Paccanaro A. LanDis: the disease landscape explorer. Eur J Hum Genet 2024; 32:461-465. [PMID: 38200084 PMCID: PMC10999415 DOI: 10.1038/s41431-023-01511-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 11/01/2023] [Accepted: 11/23/2023] [Indexed: 01/12/2024]
Abstract
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool ( https://paccanarolab.org/landis ) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
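The abstract does not say which interactome distance LanDis reports. As an assumption for illustration only, the sketch below uses the network-separation score common in network medicine, s_AB = <d_AB> - (<d_AA> + <d_BB>)/2, computed with breadth-first search on a toy, unweighted interaction graph. Negative values suggest overlapping disease modules and positive values separated ones. The gene names and graph are hypothetical.

```python
from collections import deque
from statistics import mean


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path (hop) distances from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distances(graph, sources, targets, same_module=False):
    """For each source gene, the distance to its nearest reachable target gene."""
    values = []
    for s in sources:
        dist = bfs_distances(graph, s)
        hits = [dist[t] for t in targets if t in dist and not (same_module and t == s)]
        if hits:
            values.append(min(hits))
    return values


def separation(graph, module_a, module_b):
    """Network separation s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2."""
    d_aa = mean(closest_distances(graph, module_a, module_a, same_module=True))
    d_bb = mean(closest_distances(graph, module_b, module_b, same_module=True))
    # <d_AB> pools the closest distances measured from A to B and from B to A.
    d_ab = mean(closest_distances(graph, module_a, module_b)
                + closest_distances(graph, module_b, module_a))
    return d_ab - (d_aa + d_bb) / 2
```

On the path graph a-b-c-d-e, modules {a, b} and {d, e} give s = 1.5 (separated), while {a, b} and {b, c} give s = -0.5 (overlapping).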

28
Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024; 14:7656. [PMID: 38561333 PMCID: PMC10984979 DOI: 10.1038/s41598-024-56324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024]
Abstract
This study focused on the heterogeneity in progress notes written by physicians or nurses. A total of 806 days of progress notes written by physicians or nurses from 83 randomly selected patients hospitalized in the Gastroenterology Department at Kagawa University Hospital from January to December 2021 were analyzed. We extracted symptoms as the International Classification of Diseases (ICD) Chapter 18 (R00-R99, hereinafter R codes) from each progress note using MedNER-J natural language processing software and counted the days one or more symptoms were extracted to calculate the extraction rate. The R-code extraction rate was significantly higher from progress notes by nurses than by physicians (physicians 68.5% vs. nurses 75.2%; p = 0.00112), regardless of specialty. By contrast, the R-code subcategory R10-R19 for digestive system symptoms (44.2 vs. 37.5%, respectively; p = 0.00299) and many chapters of ICD codes for disease names, as represented by Chapter 11 K00-K93 (68.4 vs. 30.9%, respectively; p < 0.001), were frequently extracted from the progress notes by physicians, reflecting their specialty. We believe that understanding the information heterogeneity of medical documents, which can be the basis of medical artificial intelligence, is crucial, and this study is a pioneering step in that direction.

29
Chowdhury D, Mistry A, Maity D, Bhatia R, Priyadarshi S, Wadan S, Chakraborty S, Haldar S. Pan-cancer analyses suggest kindlin-associated global mechanochemical alterations. Commun Biol 2024; 7:372. [PMID: 38548811 PMCID: PMC10978987 DOI: 10.1038/s42003-024-06044-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 03/11/2024] [Indexed: 04/01/2024]
Abstract
Kindlins serve as mechanosensitive adapters, transducing extracellular mechanical cues into intracellular biochemical signals; thus, their perturbations can potentially drive cancer progression. Despite the involvement of kindlins in tumor development, their genetic and mechanochemical characteristics across different cancers remain poorly understood. Here, we thoroughly examined genetic alterations in kindlins across more than 10,000 patients with 33 cancer types. Our findings reveal cancer-specific alterations that are particularly prevalent in advanced tumor stages and at metastatic onset. We observed significant co-alteration between kindlins and the mechanochemical proteome in various tumors, involving activation of cancer-related pathways and associated with adverse survival outcomes. Leveraging normal mode analysis, we predicted the structural consequences of cancer-specific kindlin mutations, highlighting potential impacts on stability and downstream signaling pathways. Our study also uncovered alterations in epithelial-mesenchymal transition markers associated with kindlin activity. This comprehensive analysis provides a resource for guiding future mechanistic investigations and therapeutic strategies that target kindlins in cancer treatment.

30
Li Z, You L, Hermann A, Bier E. Developmental progression of DNA double-strand break repair deciphered by a single-allele resolution mutation classifier. Nat Commun 2024; 15:2629. [PMID: 38521791 PMCID: PMC10960810 DOI: 10.1038/s41467-024-46479-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/27/2024] [Indexed: 03/25/2024]
Abstract
DNA double-strand breaks (DSBs) are repaired by a hierarchically regulated network of pathways. The factors influencing the choice of a particular repair pathway, however, remain poorly characterized. Here we develop an Integrated Classification Pipeline (ICP) to decompose and categorize CRISPR/Cas9-generated mutations at genomic target sites in complex multicellular insects. The ICP outputs graphical, rank-ordered classifications of mutant alleles to visualize the discriminating DSB repair fingerprints generated at different target sites and under alternative inheritance patterns of CRISPR components. We uncover highly reproducible lineage-specific mutation fingerprints in individual organisms and a developmental progression in which Microhomology-Mediated End-Joining (MMEJ) or insertion events predominate during the early rapid mitotic cell cycles, switching to distinct subsets of Non-Homologous End-Joining (NHEJ) alleles, and then to Homology-Directed Repair (HDR)-based gene conversion. These repair signatures enable marker-free tracking of specific mutations in dynamic populations, including NHEJ and HDR events within the same samples, for in-depth analysis of diverse gene editing events.

31
Sandler RD, Lai L, Dawson S, Cameron S, Lynam A, Sperrin M, Hoo ZH, Wildman MJ. Development of data processing algorithm to calculate adherence for adults with cystic fibrosis using inhaled therapy - a multi-center observational study within the CFHealthHub learning health system. Expert Rev Pharmacoecon Outcomes Res 2024:1-13. [PMID: 38458615 DOI: 10.1080/14737167.2024.2328085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 02/28/2024] [Indexed: 03/10/2024]
Abstract
OBJECTIVES To develop a robust algorithm that accurately calculates 'daily complete dose counts' for inhaled medicines, used in percent adherence calculations, from electronically captured nebulizer data within the CFHealthHub Learning Health System. METHODS In a multi-center, cross-sectional study, participants and clinicians reviewed real-world inhaled medicine usage records and triangulated them with objective nebulizer data to reach a consensus on 'daily complete dose counts.' An algorithm that used only objective nebulizer data was then developed on a derivation dataset and evaluated on an internal validation dataset. Agreement and accuracy between the algorithm-derived and consensus-derived 'daily complete dose counts' were examined, with the consensus-derived count as the reference standard. RESULTS Twelve people with CF participated. The algorithm derived a 'daily complete dose count' by screening out 'invalid' doses (those <60 s in duration or run in cleaning mode), combining all doses starting within 120 s of each other, and then screening out all doses with duration <480 s that were interrupted by power supply failure. The kappa coefficient was 0.85 (0.71-0.91) in the derivation dataset and 0.86 (0.77-0.94) in the validation dataset. CONCLUSIONS The algorithm demonstrated strong agreement with the participant-clinician consensus, enhancing confidence in CFHealthHub data. Publishing data processing methods can encourage trust in digital endpoints and serve as an exemplar for other projects.
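The three screening steps in the RESULTS can be sketched as follows. This is a minimal illustration, not the published CFHealthHub implementation: the `Dose` record layout is hypothetical, "within 120 s" is read as "no more than 120 s after the previous dose started", and we assume a merged dose's duration is the sum of its parts and that it counts as power-interrupted only if its last part was.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Dose:
    """Hypothetical nebulizer record; field names are illustrative only."""
    start: datetime
    duration_s: float
    cleaning_mode: bool = False
    power_interrupted: bool = False


def daily_complete_dose_counts(doses, min_valid_s=60, merge_window_s=120,
                               min_complete_s=480):
    # Step 1: screen out 'invalid' doses (too short or run in cleaning mode).
    valid = sorted(
        (d for d in doses if d.duration_s >= min_valid_s and not d.cleaning_mode),
        key=lambda d: d.start,
    )
    # Step 2: combine doses that start within merge_window_s of the previous one.
    # Assumption: merged duration = sum of parts; interruption flag = last part's.
    merged = []
    previous_start = None
    for d in valid:
        if merged and (d.start - previous_start).total_seconds() <= merge_window_s:
            merged[-1].duration_s += d.duration_s
            merged[-1].power_interrupted = d.power_interrupted
        else:
            merged.append(Dose(d.start, d.duration_s, False, d.power_interrupted))
        previous_start = d.start
    # Step 3: screen out short doses that were cut off by a power failure.
    complete = [m for m in merged
                if not (m.duration_s < min_complete_s and m.power_interrupted)]
    # Count the remaining doses per calendar day of their start time.
    return Counter(m.start.date() for m in complete)
```

For example, a 200 s dose interrupted by a power cut and restarted 80 s later for 400 s merges into one 600 s dose and counts as complete, whereas an isolated 300 s power-interrupted dose is discarded.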

32
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024]
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of computational tools can process MS data to derive peptide and protein identifications and quantifications. However, in recent years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Using Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can be processed rapidly, as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and an open GitHub repository for developers.

33
Ross DH, Bhotika H, Zheng X, Smith RD, Burnum-Johnson KE, Bilbao A. Computational tools and algorithms for ion mobility spectrometry-mass spectrometry. Proteomics 2024:e2200436. [PMID: 38438732 DOI: 10.1002/pmic.202200436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/12/2024] [Accepted: 02/14/2024] [Indexed: 03/06/2024]
Abstract
Ion mobility spectrometry-mass spectrometry (IMS-MS or IM-MS) is a powerful analytical technique that combines the gas-phase separation capabilities of IM with the identification and quantification capabilities of MS. IM-MS can differentiate molecules with indistinguishable masses but different structures (e.g., isomers, isobars, molecular classes, and contaminant ions). The importance of this analytical technique is reflected by a staged increase in the number of applications for molecular characterization across a variety of fields, from different MS-based omics (proteomics, metabolomics, lipidomics, etc.) to the structural characterization of glycans, organic matter, proteins, and macromolecular complexes. With the increasing application of IM-MS, there is a pressing need for effective and accessible computational tools. This article presents an overview of the most recent free and open-source software tools specifically tailored for the analysis and interpretation of data derived from IM-MS instrumentation. This review enumerates these tools and outlines their main algorithmic approaches, while highlighting representative applications across different fields. Finally, current limitations and expected improvements are discussed.

34
Singhal V, Chou N, Lee J, Yue Y, Liu J, Chock WK, Lin L, Chang YC, Teo EML, Aow J, Lee HK, Chen KH, Prabhakar S. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat Genet 2024; 56:431-441. [PMID: 38413725 PMCID: PMC10937399 DOI: 10.1038/s41588-024-01664-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 01/16/2024] [Indexed: 02/29/2024]
Abstract
Spatial omics data are clustered to define both cell types and tissue domains. We present Building Aggregates with a Neighborhood Kernel and Spatial Yardstick (BANKSY), an algorithm that unifies these two spatial clustering problems by embedding cells in a product space of their own and the local neighborhood transcriptome, representing cell state and microenvironment, respectively. BANKSY's spatial feature augmentation strategy improved performance on both tasks when tested on diverse RNA (imaging, sequencing) and protein (imaging) datasets. BANKSY revealed unexpected niche-dependent cell states in the mouse brain and outperformed competing methods on domain segmentation and cell typing benchmarks. BANKSY can also be used for quality control of spatial transcriptomics data and for spatially aware batch effect correction. Importantly, it is substantially faster and more scalable than existing methods, enabling the processing of datasets with millions of cells. In summary, BANKSY provides an accurate, biologically motivated, scalable and versatile framework for analyzing spatially resolved omics data.
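The core idea of pairing each cell's own expression with that of its spatial neighborhood can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not BANKSY's implementation: the neighborhood is a uniform mean over the k nearest cells, the sqrt(1 - lambda) and sqrt(lambda) block scaling is chosen here so that `lam` trades off cell state against microenvironment, and the additional neighborhood features of the published method are omitted.

```python
import numpy as np


def neighbor_augmented_features(expr, coords, k=2, lam=0.2):
    """Concatenate each cell's own expression with its neighborhood mean.

    expr: (n_cells, n_genes) expression matrix; coords: (n_cells, 2) positions.
    lam (lambda) in [0, 1] sets how much weight the neighborhood block receives.
    """
    # Pairwise Euclidean distances between cell centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    # Indices of the k spatially nearest cells, then their mean expression.
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]
    neighbor_mean = expr[neighbors].mean(axis=1)
    # Weight the two blocks so lam trades off cell state against microenvironment.
    return np.hstack([np.sqrt(1 - lam) * expr, np.sqrt(lam) * neighbor_mean])
```

The augmented matrix would then be passed to a standard dimensionality-reduction and graph-clustering workflow; with a small `lam` the clusters lean toward cell types, and with a larger `lam` toward tissue domains.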

35
Cetin-Karayumak S, Zhang F, Zurrin R, Billah T, Zekelman L, Makris N, Pieper S, O'Donnell LJ, Rathi Y. Harmonized diffusion MRI data and white matter measures from the Adolescent Brain Cognitive Development Study. Sci Data 2024; 11:249. [PMID: 38413633 PMCID: PMC10899197 DOI: 10.1038/s41597-024-03058-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 02/12/2024] [Indexed: 02/29/2024]
Abstract
The Adolescent Brain Cognitive Development (ABCD) Study® has collected data from over 10,000 children across 21 sites, providing insights into adolescent brain development. However, site-specific scanner variability has made it challenging to use diffusion MRI (dMRI) data from this study. To address this, a dataset of harmonized and processed ABCD dMRI data (from release 3) has been created, comprising quality-controlled imaging data from 9,345 subjects, focusing exclusively on the baseline session, i.e., the first time point of the study. This resource required substantial computational time (approx. 50,000 CPU hours) for harmonization, whole-brain tractography, and white matter parcellation. The dataset includes harmonized dMRI data, 800 white matter clusters, 73 anatomically labeled white matter tracts in full and low resolution, and 804 different dMRI-derived measures per subject (72.3 TB total size). Accessible via the NIMH Data Archive, it offers a large-scale dMRI dataset for studying structural connectivity in child and adolescent neurodevelopment. Additionally, several post-harmonization experiments were conducted to demonstrate the success of the harmonization process on the ABCD dataset.

36
Xia L, Lee C, Li JJ. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 2024; 15:1753. [PMID: 38409103 PMCID: PMC10897166 DOI: 10.1038/s41467-024-45891-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 02/06/2024] [Indexed: 02/28/2024]
Abstract
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that the 2D embeddings produced by t-SNE and UMAP may not reliably reflect the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors, scDEED identifies cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
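To make "similarity between the cell's 2D-embedding neighbors and pre-embedding neighbors" concrete, the sketch below scores each cell by the Jaccard overlap of its k-nearest-neighbor sets before and after embedding. This is a deliberately simpler, swapped-in score for illustration and is not the statistic that scDEED defines; the choice of k is an assumption.

```python
import numpy as np


def knn_sets(points, k):
    """Index sets of the k nearest neighbors of each row (excluding itself)."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)
    nearest = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in nearest]


def neighbor_preservation(pre_embedding, embedding_2d, k=5):
    """Jaccard overlap between each cell's pre-embedding and 2D-embedding neighbors."""
    before = knn_sets(pre_embedding, k)
    after = knn_sets(embedding_2d, k)
    return np.array([len(b & a) / len(b | a) for b, a in zip(before, after)])
```

Cells whose score falls below a chosen cut-off could be flagged as dubious, and comparing the number of flagged cells across t-SNE perplexities or UMAP neighbor counts mirrors, in spirit, the hyperparameter-tuning idea described above.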

37
Wei Z, Zhang L, Gao L, Chen J, Peng L, Yang L. Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data. Sci Data 2024; 11:233. [PMID: 38395911 PMCID: PMC10891105 DOI: 10.1038/s41597-024-03066-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 02/13/2024] [Indexed: 02/25/2024]
Abstract
Yunling cattle is a new breed of beef cattle developed in Yunnan Province, China, by crossing Brahman, Murray Grey and Yunnan Yellow cattle. Yunling cattle are adapted to tropical and subtropical climates, show good reproductive ability and growth rate under high temperature and high humidity, have strong resistance to internal and external parasites, and have good beef performance. In this study, we generated a high-quality chromosome-level genome assembly of a male Yunling animal using a combination of short-read sequencing, PacBio HiFi sequencing and Hi-C scaffolding technologies. The genome assembly (3.09 Gb) is anchored to 31 chromosomes (29 autosomes plus the X and Y chromosomes), with a contig N50 of 35.97 Mb and a scaffold N50 of 112.01 Mb. It contains 1.62 Gb of repetitive sequences and 20,660 protein-coding genes. This first construction of the Yunling cattle genome provides a valuable genetic resource that will facilitate further study of the genetic diversity of bovine species and accelerate Yunling cattle breeding efforts.
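The contig and scaffold N50 values quoted above follow the standard definition: N50 is the largest length L such that sequences of length at least L together cover at least half of the total assembly length. A minimal sketch, using toy lengths rather than the assembly's data:

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
```

For toy lengths 2 to 10 (total 54), the three longest sequences (10 + 9 + 8 = 27) reach half the total, so the N50 is 8. The same function applies to contigs or scaffolds; only the input lengths differ.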

38
Tang S, Cui X, Wang R, Li S, Li S, Huang X, Chen S. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat Commun 2024; 15:1629. [PMID: 38388573 PMCID: PMC10884038 DOI: 10.1038/s41467-024-46045-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 02/12/2024] [Indexed: 02/24/2024]
Abstract
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffer from limitations such as high sparsity and high dimensionality, which pose significant challenges for downstream analyses. Although several methods have been proposed to enhance scCAS data, challenges and limitations still hinder their effectiveness. Here, we propose scCASE, an scCAS data enhancement method based on non-negative matrix factorization that incorporates an iteratively updated cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further extend scCASE to scCASER, which incorporates external reference data to improve enhancement performance.

39
Vargas-Rojas L, Ting TC, Rainey KM, Reynolds M, Wang DR. AgTC and AgETL: open-source tools to enhance data collection and management for plant science research. Front Plant Sci 2024; 15:1265073. [PMID: 38450403 PMCID: PMC10915008 DOI: 10.3389/fpls.2024.1265073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/08/2024]
Abstract
Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier to downstream application. Here, we propose a pipeline to enhance data collection, processing, and management in plant science studies, comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes the steps of an Extract-Transform-Load (ETL) data integration process in which data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL can be applied flexibly across plant science experiments without requiring programming knowledge from the domain scientist, and their functions are executed in Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-governmental organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL for streamlining critical steps from data collection to analysis in the plant sciences.
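The Extract-Transform-Load steps described above can be illustrated with the Python standard library alone. This is a hypothetical sketch, not AgETL's code: the configuration dictionary stands in for AgETL's YAML files, and its keys, the column names, and the SQLite target table are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical configuration. AgETL keeps its settings in YAML files; these keys
# and column names are illustrative only, not AgETL's actual schema.
CONFIG = {
    "table": "observations",
    "column_map": {
        "Plot": "plot_id", "plot_no": "plot_id",
        "Height (cm)": "height_cm", "ht_cm": "height_cm",
    },
}


def extract(csv_texts):
    """Extract: yield raw rows from CSV sources whose headers may differ."""
    for text in csv_texts:
        yield from csv.DictReader(io.StringIO(text))


def transform(rows, config):
    """Transform: rename columns to standard names, drop incomplete rows, cast types."""
    for row in rows:
        std = {config["column_map"].get(key.strip(), key.strip()): (value or "").strip()
               for key, value in row.items() if key is not None}
        if std.get("plot_id") and std.get("height_cm"):
            yield std["plot_id"], float(std["height_cm"])


def load(records, conn, table):
    """Load: write standardized records into SQLite (table name from trusted config)."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (plot_id TEXT, height_cm REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", records)
    conn.commit()
```

Two files with different headers (`Plot,Height (cm)` and `plot_no,ht_cm`) end up in one table with shared column names; a row with a missing height is dropped during the transform step. Keeping the column mapping in configuration rather than code is what lets a domain scientist adapt the pipeline without programming.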

40
Pfeifer E, Rocha EPC. Phage-plasmids promote recombination and emergence of phages and plasmids. Nat Commun 2024; 15:1545. [PMID: 38378896 PMCID: PMC10879196 DOI: 10.1038/s41467-024-45757-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Phages and plasmids are regarded as distinct types of mobile genetic elements that drive bacterial evolution by horizontal gene transfer. However, the distinction between the two types is blurred by the existence of elements known as prophage-plasmids or phage-plasmids, which transfer horizontally between cells as viruses and vertically within cellular lineages as plasmids. Here, we study gene flow between the three types of elements. We show that the gene repertoire of phage-plasmids overlaps with those of phages and plasmids. By tracking recent recombination events, we find that phage-plasmids exchange genes more frequently with plasmids than with phages, and that direct gene exchange between plasmids and phages is comparatively rare. The results suggest that phage-plasmids can mediate gene flow between plasmids and phages, including exchange of mobile element core functions, defense systems, and antibiotic resistance. Moreover, a combination of gene transfer and gene inactivation may result in the conversion of elements. For example, gene loss turns P1-like phage-plasmids into integrative prophages or into plasmids (that are no longer phages). Remarkably, some of the latter have acquired conjugation-related functions and become mobilisable by conjugation. Thus, our work indicates that phage-plasmids can play a key role in the transfer of genes across mobile elements within their hosts, and can act as intermediates in the conversion of one type of element into another.

41
Ovadia D, Segal A, Rabin N. Classification of hand and wrist movements via surface electromyogram using the random convolutional kernels transform. Sci Rep 2024; 14:4134. [PMID: 38374342 PMCID: PMC10876538 DOI: 10.1038/s41598-024-54677-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 02/15/2024] [Indexed: 02/21/2024]
Abstract
Prosthetic devices are vital for enhancing personal autonomy and the quality of life for amputees. However, the rejection rate for electric upper-limb prostheses remains high at around 30%, often due to issues like functionality, control, reliability, and cost. Thus, developing reliable, robust, and cost-effective human-machine interfaces is crucial for user acceptance. Machine learning algorithms using Surface Electromyography (sEMG) signal classification hold promise for natural prosthetic control. This study aims to enhance hand and wrist movement classification using sEMG signals, treated as time series data. A novel approach is employed, combining a variation of the Random Convolutional Kernel Transform (ROCKET) for feature extraction with a cross-validation ridge classifier. Traditionally, achieving high accuracy in time series classification required complex, computationally intensive methods. However, recent advances show that simple linear classifiers combined with ROCKET can achieve state-of-the-art accuracy with reduced computational complexity. The algorithm was tested on the UCI sEMG hand movement dataset, as well as on the Ninapro DB5 and DB7 datasets. We demonstrate how the proposed approach delivers high discrimination accuracy with minimal parameter tuning requirements, offering a promising solution to improve prosthetic control and user satisfaction.
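A simplified, univariate version of the original ROCKET transform can be sketched with NumPy. Random kernels (length 7, 9 or 11, mean-centered normal weights, uniform bias, exponentially scaled dilation) are convolved with each series, and each kernel contributes two features: the maximum response and the proportion of positive values (PPV). This is an illustration, not the authors' ROCKET variation; padding is omitted, and how multichannel sEMG is combined (for example, concatenating per-channel features) is left as an assumption.

```python
import numpy as np


def generate_kernels(series_length, n_kernels, rng):
    """Random (weights, bias, dilation) triples in the style of ROCKET."""
    if series_length < 11:
        raise ValueError("series must be at least as long as the longest kernel")
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # mean-centered weights
        bias = rng.uniform(-1.0, 1.0)
        # Dilation is sampled on an exponential scale so the kernel fits the series.
        max_exponent = np.log2((series_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exponent))
        kernels.append((weights, bias, dilation))
    return kernels


def apply_kernel(x, weights, bias, dilation):
    """Dilated convolution without padding; returns (max, PPV) features."""
    span = (len(weights) - 1) * dilation
    n_out = len(x) - span
    idx = np.arange(n_out)[:, None] + dilation * np.arange(len(weights))[None, :]
    response = x[idx] @ weights + bias
    return response.max(), (response > 0).mean()


def rocket_features(series, kernels):
    """Feature matrix of shape (n_series, 2 * n_kernels)."""
    return np.array([[f for k in kernels for f in apply_kernel(x, *k)] for x in series])
```

The resulting features would then be fed to a linear classifier; scikit-learn's `RidgeClassifierCV`, which selects the regularization strength by cross-validation, is one readily available choice that matches the "cross-validation ridge classifier" described above.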

42
Wang J, Dong L, Zheng Z, Zhu Z, Xie B, Xie Y, Li X, Chen B, Li P. Effects of different KRAS mutants and Ki67 expression on diagnosis and prognosis in lung adenocarcinoma. Sci Rep 2024; 14:4085. [PMID: 38374309 PMCID: PMC10876986 DOI: 10.1038/s41598-023-48307-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 11/24/2023] [Indexed: 02/21/2024]
Abstract
Lung adenocarcinoma (LUAD) is a prevalent form of non-small cell lung cancer with a rising incidence in recent years. Understanding the mutation characteristics of LUAD is crucial for effective treatment and prediction of this disease. Among the various mutations observed in LUAD, KRAS mutations are particularly common. Different subtypes of KRAS mutations can activate the Ras signaling pathway to varying degrees, potentially influencing the pathogenesis and prognosis of LUAD. This study aims to investigate the relationship between different KRAS mutation subtypes and the pathogenesis and prognosis of LUAD. A total of 63 clinical LUAD samples were collected for this study and analyzed using targeted gene sequencing panels. To complement the dataset, additional clinical and sequencing data were obtained from TCGA and MSK. The analysis revealed significantly higher Ki67 immunohistochemistry (IHC) scores in patients with missense mutations compared to controls. Moreover, the expression level of KRAS was significantly correlated with Ki67 expression. Enrichment analysis indicated that KRAS missense mutations activated the SWEET_LUNG_CANCER_KRAS_DN and CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_2 pathways. Additionally, patients with KRAS missense mutations and high Ki67 IHC scores exhibited significantly higher tumor mutational burden levels than other groups, which suggests they are more likely to respond to immune checkpoint inhibitors (ICIs). Based on the data from MSK and TCGA, patients with KRAS missense mutations had shorter survival than controls, and Ki67 expression level predicted patient prognosis more accurately. In conclusion, when utilizing KRAS mutations as biomarkers for the treatment and prediction of LUAD, it is important to consider the specific KRAS mutant subtypes and Ki67 expression levels.
These findings contribute to a better understanding of LUAD and have implications for personalized therapeutic approaches in the management of this disease.

43
Hayano J, Adachi M, Sasaki F, Yuda E. Quantitative detection of sleep apnea in adults using inertial measurement unit embedded in wristwatch wearable devices. Sci Rep 2024; 14:4050. [PMID: 38374225 PMCID: PMC10876631 DOI: 10.1038/s41598-024-54817-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
Sleep apnea (SA) is associated with risk of cardiovascular disease, cognitive decline, and accidents due to sleepiness, yet the majority (over 80%) of patients remain undiagnosed. Inertial measurement units (IMUs) are built into modern wearable devices and are capable of long-term continuous measurement with low power consumption. We examined whether SA can be detected by an IMU embedded in a wristwatch device. In 122 adults who underwent polysomnography (PSG) examinations, triaxial acceleration and triaxial gyroscope signals from the IMU were recorded during the PSG. Subjects were divided into a training group and a test group (n = 61 each). In the training group, an algorithm was developed to extract signals in the respiratory frequency band (0.13-0.70 Hz) and detect respiratory events as transient (10-90 s) decreases in amplitude. The respiratory event frequency estimated by the algorithm correlated with the apnea-hypopnea index (AHI) of the PSG with r = 0.84 in the test group. With the cutoff values determined in the training group, moderate-to-severe SA (AHI ≥ 15) was identified with 85% accuracy and severe SA (AHI ≥ 30) with 89% accuracy in the test group. SA can be quantitatively detected by the IMU embedded in wristwatch wearable devices in adults with suspected SA.
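The two algorithm steps described above (band-limiting to 0.13-0.70 Hz, then finding transient 10-90 s amplitude decreases) can be sketched for a single IMU axis with NumPy. This is a hypothetical illustration, not the published algorithm: the brick-wall FFT filter, the moving-RMS envelope, the global-median baseline and the 50% drop threshold are all assumptions, and the study's actual cutoff values are not reproduced here.

```python
import numpy as np


def bandpass_fft(signal, fs, low_hz=0.13, high_hz=0.70):
    """Keep only the respiratory frequency band by zeroing other FFT bins."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))


def moving_rms(x, window):
    """Amplitude envelope as a centered moving root-mean-square."""
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))


def detect_events(envelope, fs, drop_fraction=0.5, min_s=10, max_s=90):
    """Return (start_s, duration_s) for transient amplitude drops lasting 10-90 s."""
    threshold = drop_fraction * np.median(envelope)  # hypothetical global baseline
    below = envelope < threshold
    events, start = [], None
    for i, flag in enumerate(np.append(below, False)):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            duration = (i - start) / fs
            if min_s <= duration <= max_s:
                events.append((start / fs, duration))
            start = None
    return events


def events_per_hour(events, recording_s):
    """Respiratory event frequency, the quantity compared against the AHI."""
    return len(events) * 3600 / recording_s
```

On a synthetic 0.25 Hz breathing signal sampled at 10 Hz for 10 minutes, with a 30 s and a 5 s stretch at one-tenth amplitude, only the 30 s stretch falls within the 10-90 s window and is reported as an event.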
44
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024]
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling, a computationally driven target enrichment technology that enables more affordable, intensive analysis of target gene regions than standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it by applying it to 33 genomes collected from patients with suspected hereditary cancer. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variation. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, together with their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our TAS-LRS workflow can simultaneously capture monogenic risk variants (including complex structural variations), the polygenic background, and epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
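A polygenic risk score is, in its simplest additive form, a weighted sum of effect-allele dosages. The sketch below shows only that arithmetic, assuming genotype calls (for example from off-target reads) and per-allele weights from some published model are already available; the SNP identifiers and weights are hypothetical, and the abstract does not say which model or missing-data handling the authors used.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum over SNPs of (per-allele weight x effect-allele count).

    dosages: dict SNP id -> effect-allele count (0, 1 or 2).
    weights: dict SNP id -> per-allele effect size (e.g. log odds ratio).
    Returns (score, number_of_snps_used); SNPs without a genotype call are skipped.
    """
    score, used = 0.0, 0
    for snp, beta in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # not genotyped, e.g. too few off-target reads at this position
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage!r} for {snp}")
        score += beta * dosage
        used += 1
    return score, used


# Hypothetical SNP ids and weights, for illustration only.
weights = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}
dosages = {"snp_a": 2, "snp_b": 1}
score, used = polygenic_risk_score(dosages, weights)
print(round(score, 3), used)  # 0.3 2
```

Skipping uncalled SNPs is the simplest choice; scoring tools often mean-impute missing genotypes instead, which keeps scores comparable across samples with different coverage.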
45
Burger T. Fudging the volcano-plot without dredging the data. Nat Commun 2024; 15:1392. [PMID: 38360828 PMCID: PMC10869345 DOI: 10.1038/s41467-024-45834-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 02/02/2024] [Indexed: 02/17/2024]
46
Zhang J, Ren Y, Lin L, Xing Y, Ren J. Table tennis motion recognition based on the bat trajectory using varying-length-input convolution neural networks. Sci Rep 2024; 14:3549. [PMID: 38347071 PMCID: PMC10861488 DOI: 10.1038/s41598-024-54150-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/08/2024] [Indexed: 02/15/2024]
Abstract
Action recognition has been applied in fields such as smart homes, gaming, traffic management, and security monitoring. Motion recognition is useful for biomechanical analysis, auxiliary training systems, table tennis robots, motion-sensing games, virtual reality, and other fields. In this study, we collected table tennis skill-motion data, created the TTMD6 dataset, and analyzed the characteristics of table tennis paddle trajectories. We propose a motion recognition algorithm that recognizes paddle trajectories. Whereas other studies have used multi-joint data to identify actions, we use only the paddle trajectory to recognize table tennis skill motions, which speeds up motion recognition. Our results indicate that it is feasible to recognize table tennis skill motions from paddle trajectories alone.
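The title refers to convolutional networks that accept inputs of varying length, but the abstract does not describe the architecture. One common way to achieve this, shown below as an assumption rather than the authors' design, is a temporal convolution followed by global max pooling: the pooled output has one value per kernel regardless of how many trajectory frames were recorded. The kernels here are hypothetical and untrained; the code is plain Python for readability.

```python
def conv1d_valid(seq, kernel):
    """'Valid' 1D convolution over time (cross-correlation, as in most CNN libraries).

    seq: list of T frames, each a list of C channel values (e.g. paddle x, y, z).
    kernel: list of K frames, each a list of C weights; yields one output channel.
    """
    k = len(kernel)
    out = []
    for t in range(len(seq) - k + 1):
        s = 0.0
        for j in range(k):
            for c, w in enumerate(kernel[j]):
                s += seq[t + j][c] * w
        out.append(s)
    return out


def trajectory_features(seq, kernels):
    """Conv + ReLU + global max pooling: one feature per kernel, whatever len(seq) is."""
    feats = []
    for kernel in kernels:
        if len(seq) < len(kernel):
            raise ValueError("trajectory shorter than kernel")
        feats.append(max(max(0.0, v) for v in conv1d_valid(seq, kernel)))
    return feats


# Hypothetical, untrained kernels of width 3 over 3 channels.
kernels = [[[1, 0, 0]] * 3, [[0, 0, 1]] * 3]
short = [[0.1 * t, 0.0, 0.2] for t in range(5)]
long_ = [[0.1 * t, 0.0, 0.2] for t in range(12)]
print(len(trajectory_features(short, kernels)), len(trajectory_features(long_, kernels)))  # 2 2
```

The fixed-length feature vector can then feed an ordinary classifier layer, which is why global pooling is a frequent choice when clip lengths differ.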
47
Schmeltz M, Ivanovic A, Schlepütz CM, Wimmer W, Remenschneider AK, Caversaccio M, Stampanoni M, Anschuetz L, Bonnin A. The human middle ear in motion: 3D visualization and quantification using dynamic synchrotron-based X-ray imaging. Commun Biol 2024; 7:157. [PMID: 38326549 PMCID: PMC10850498 DOI: 10.1038/s42003-023-05738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/21/2023] [Indexed: 02/09/2024]
Abstract
Characterizing the vibrations of the middle ear ossicles during sound transmission is a focal point of clinical research. However, the small size of these structures, their micrometer-scale movements, and the deep-seated position of the middle ear within the temporal bone make such measurements extremely challenging. In this work, dynamic synchrotron-based X-ray phase-contrast microtomography is applied to acoustically stimulated intact human ears, allowing three-dimensional visualization of entire human eardrums and ossicular chains in motion. A post-gating algorithm is used to temporally resolve the fast micromotions at 128 Hz, coupled with a high-throughput pipeline for processing the large tomographic datasets. Seven healthy ex vivo fresh-frozen human temporal bones are studied, and the rigid-body motions of the ossicles are quantitatively delineated. Clinically relevant regions of the ossicular chain are tracked in 3D, and their displacement amplitudes are computed for two acoustic stimuli.
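Post-gating (retrospective gating) generally means sorting projections that were acquired continuously into bins by their phase within the periodic stimulus, then reconstructing one volume per bin. The abstract does not detail the authors' algorithm; the sketch below assumes 128 Hz is the stimulus frequency, that a phase-zero reference time `t0` is known, and uses a hypothetical 8 phase bins.

```python
def phase_bins(timestamps_s, stimulus_hz=128.0, n_bins=8, t0=0.0):
    """Assign each projection to a phase bin of the periodic acoustic stimulus.

    timestamps_s: acquisition time of each projection in seconds.
    t0: time of a known stimulus phase zero (hypothetical; e.g. from a trigger signal).
    """
    period = 1.0 / stimulus_hz
    bins = []
    for t in timestamps_s:
        phase = ((t - t0) % period) / period  # fraction of the stimulus cycle, in [0, 1)
        bins.append(min(int(phase * n_bins), n_bins - 1))  # guard against rounding up to n_bins
    return bins


def group_projections(bins, n_bins):
    """Projection indices per phase bin; each group would be reconstructed as one 3D volume."""
    groups = [[] for _ in range(n_bins)]
    for idx, b in enumerate(bins):
        groups[b].append(idx)
    return groups


# Projections every 1/1024 s: eight per 128 Hz cycle, so bins cycle through 0..7.
times = [k / 1024 for k in range(16)]
bins = phase_bins(times)
print(bins)
print(group_projections(bins, 8)[3])  # [3, 11]
```

In a real scan the projection angles within each bin must still cover enough of the rotation for a tomographic reconstruction, which is why many stimulus cycles are recorded per rotation.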
48
Geuenich MJ, Gong DW, Campbell KR. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data. Nat Commun 2024; 15:1014. [PMID: 38307875 PMCID: PMC10837127 DOI: 10.1038/s41467-024-45198-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 01/16/2024] [Indexed: 02/04/2024]
Abstract
A crucial step in the analysis of single-cell data is annotating cells with cell types and states. Although a myriad of approaches have been proposed, manually labeling cells to create training datasets remains tedious and time-consuming. In machine learning, active and self-supervised learning methods have been proposed to improve classifier performance while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we comprehensively benchmark active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that performs competitively with existing approaches. In addition, we demonstrate that prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings as a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work is available at https://github.com/camlab-bioml/leader.
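To make the active learning idea concrete, the sketch below implements entropy-based uncertainty sampling, one of the standard active learning strategies: the cells whose predicted cell-type distribution is most uncertain are sent for manual labeling first. This is a generic illustration, not the adaptive reweighting heuristic introduced in the paper (whose details the abstract does not give), and the probability matrix is hypothetical classifier output.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of one cell's predicted cell-type distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_cells_to_label(prob_matrix, budget, already_labeled=()):
    """Uncertainty sampling: indices of the `budget` most uncertain unlabeled cells.

    prob_matrix: one row of class probabilities per cell, from the current classifier.
    """
    labeled = set(already_labeled)
    scored = [(entropy(p), i) for i, p in enumerate(prob_matrix) if i not in labeled]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest entropy first, ties broken by index
    return [i for _, i in scored[:budget]]


# Hypothetical predictions for 4 cells over 3 cell types.
probs = [
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform: most uncertain
    [0.50, 0.50, 0.00],  # torn between two types
    [0.90, 0.10, 0.00],
]
print(select_cells_to_label(probs, budget=2))  # [1, 2]
```

In a full loop, the selected cells are labeled, the classifier is retrained, and selection repeats until the label budget is spent; pure uncertainty sampling can under-sample rare cell types, which is one motivation for the imbalance-aware strategies the study benchmarks.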
49
Bálint B, Merényi Z, Hegedüs B, Grigoriev IV, Hou Z, Földi C, Nagy LG. ContScout: sensitive detection and removal of contamination from annotated genomes. Nat Commun 2024; 15:936. [PMID: 38296951 PMCID: PMC10831095 DOI: 10.1038/s41467-024-45024-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024]
Abstract
Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes and evaluate their impact on phylogenomic analyses.
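The abstract does not describe how ContScout decides which sequences are foreign. One plausible scheme, sketched below purely as an assumption, combines per-protein taxonomic labels (for example from a similarity search against a reference database) into a per-contig vote: a contig dominated by foreign hits is flagged as contamination, while a lone foreign-looking gene on an otherwise native contig is kept as a possible horizontal gene transfer. The function name, threshold, and labels are hypothetical.

```python
from collections import Counter


def flag_contaminated_contigs(protein_calls, expected_taxon, min_foreign_fraction=0.5):
    """Flag contigs whose proteins are mostly assigned to a foreign taxon.

    protein_calls: iterable of (contig_id, taxon_label), one per annotated protein.
    expected_taxon: label of the lineage the genome should belong to (e.g. "Fungi").
    Returns {contig_id: most common foreign taxon} for flagged contigs.
    """
    per_contig = {}
    for contig, taxon in protein_calls:
        per_contig.setdefault(contig, []).append(taxon)

    flagged = {}
    for contig, taxa in per_contig.items():
        foreign = Counter(t for t in taxa if t != expected_taxon)
        n_foreign = sum(foreign.values())
        # Strictly above the threshold: a minority of foreign genes on a native contig is kept
        # (a candidate horizontal gene transfer rather than contamination).
        if n_foreign / len(taxa) > min_foreign_fraction:
            flagged[contig] = foreign.most_common(1)[0][0]
    return flagged


# Hypothetical calls for a fungal genome assembly.
calls = [
    ("c1", "Fungi"), ("c1", "Fungi"), ("c1", "Bacteria"),  # 1/3 foreign: kept
    ("c2", "Bacteria"), ("c2", "Bacteria"), ("c2", "Fungi"),  # 2/3 foreign: flagged
    ("c3", "Plants"), ("c3", "Fungi"),  # exactly 1/2: kept
]
print(flag_contaminated_contigs(calls, "Fungi"))  # {'c2': 'Bacteria'}
```

Voting at the contig level rather than per protein is what lets this kind of scheme separate contamination (whole foreign contigs) from horizontally transferred genes embedded in native sequence.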
50
Qu H, Liu K, Zhang L. Research on improved black widow algorithm for medical image denoising. Sci Rep 2024; 14:2514. [PMID: 38291147 PMCID: PMC10828493 DOI: 10.1038/s41598-024-51803-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 01/09/2024] [Indexed: 02/01/2024]
Abstract
Improving the quality of medical images is crucial for accurate clinical diagnosis; however, medical images are often corrupted by various types of noise, which undermines their reliability and diagnostic accuracy. This study aims to enhance the Black Widow optimization algorithm and apply it to medical image denoising in order to improve both image quality and the accuracy of diagnostic results. By introducing Tent mapping, we refined the Black Widow optimization algorithm to better adapt to the complex features of medical images. Its denoising capability for various types of noise was further enhanced by combining multiple filters, without the need for retraining each time to reach preset goals. Simulation results on a dataset of 1588 images with Gaussian, salt-and-pepper, Poisson, and speckle noise showed a reduction in Mean Squared Error (MSE) of 0.439, an increase in Peak Signal-to-Noise Ratio (PSNR) of 4.315, an improvement in Structural Similarity Index (SSIM) of 0.132, an enhancement in Edge-to-Noise Ratio (ENL) of 0.402, and an increase in Edge Preservation Index (EPI) of 0.614. Simulation experiments also verified that the proposed algorithm offers some advantage in computational efficiency. By incorporating Tent mapping and a combination of multiple filters, the improved algorithm elevated the performance of the Black Widow algorithm in medical image denoising, providing an effective solution for enhancing medical image quality and diagnostic accuracy.
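In metaheuristics such as Black Widow optimization, Tent mapping is typically used to generate the initial population from a chaotic sequence instead of uniform random numbers. The abstract does not give the authors' exact map or where it is applied, so the sketch below assumes the asymmetric tent map x -> x/a (x < a), (1 - x)/(1 - a) otherwise, with a hypothetical a = 0.7 and seed x0 = 0.37, used only for population initialization.

```python
def tent_sequence(x0, n, a=0.7):
    """Iterate the asymmetric tent map x -> x/a if x < a else (1 - x)/(1 - a).

    a = 0.7 is an illustrative choice; the symmetric a = 0.5 map degenerates to 0
    under floating-point iteration, so asymmetric variants are often preferred.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie in (0, 1)")
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs


def tent_init_population(pop_size, lower, upper, x0=0.37, a=0.7):
    """Spread an initial population over [lower, upper] using tent-map values in [0, 1]."""
    dim = len(lower)
    chaos = tent_sequence(x0, pop_size * dim, a)
    return [
        [lower[d] + chaos[i * dim + d] * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(pop_size)
    ]


# Hypothetical search space, e.g. three filter parameters to be tuned.
lower, upper = [0.0, 0.0, -5.0], [1.0, 10.0, 5.0]
population = tent_init_population(20, lower, upper)
print(len(population), len(population[0]))  # 20 3
```

Each row is one candidate solution; the optimizer's usual update rules then take over from this chaotic starting spread.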