1
Garcés-Jiménez A, Polo-Luque ML, Gómez-Pulido JA, Rodríguez-Puyol D, Gómez-Pulido JM. Predictive health monitoring: Leveraging artificial intelligence for early detection of infectious diseases in nursing home residents through discontinuous vital signs analysis. Comput Biol Med 2024; 174:108469. [PMID: 38636331 DOI: 10.1016/j.compbiomed.2024.108469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
This research addresses the problem of detecting acute respiratory, urinary tract, and other infectious diseases in elderly nursing home residents using machine learning algorithms. The study analyzes data extracted from multiple vital signs and other contextual information for diagnostic purposes. The daily data collection process encounters sampling constraints due to weekends, holidays, shift changes, staff turnover, and equipment breakdowns, resulting in numerous nulls, repeated readings, outliers, and meaningless values. The short time series generated also pose a challenge to analysis, preventing the extraction of seasonal information or consistent trends. Blind data collection results in most of the data coming from periods when residents are healthy, resulting in excessively imbalanced data. This study proposes a data cleaning process and then builds a mechanism that reproduces the basal activity of the residents to improve the classification of the disease. The results show that the proposed basal module-assisted machine learning techniques allow anticipating diagnostics 2, 3 or 4 days before doctors decide to start treatment with antibiotics, achieving a performance measured by the area-under-the-curve metric of 0.857. The contributions of this work are: (1) a new data cleaning process; (2) the analysis of contextual information to improve data quality; (3) the generation of a baseline measure for relative comparison; and (4) the use of either binary (disease/no disease) or multiclass classification, differentiating among types of infections and showing the advantages of multiclass versus binary classification. From a medical point of view, the anticipated detection of infectious diseases in institutionalized individuals is brand new.
Affiliation(s)
- Alberto Garcés-Jiménez
- Department of Computer Science, Universidad de Alcalá, Politechnic School, Alcala de Henares, 28805, Spain
- María-Luz Polo-Luque
- Department of Nursing and Physiotherapy, Universidad de Alcalá, Faculty of Medicine and Health Sciences, Alcala de Henares, 28805, Spain
- Juan A Gómez-Pulido
- Department of Technologies of Computers and Communications, Universidad de Extremadura, School of Technology, Cáceres, 10003, Spain
- Diego Rodríguez-Puyol
- Department of Medicine and Medical Specialties, Research Foundation of the University Hospital Príncipe de Asturias, Campus Científico Tecnológico, Alcala de Henares, 28805, Spain
- José M Gómez-Pulido
- Department of Computer Science, Universidad de Alcalá, Politechnic School, Alcala de Henares, 28805, Spain
2
Acharjee A, Singh U, Choudhury SP, Gkoutos GV. The diagnostic potential and barriers of microbiome based therapeutics. Diagnosis (Berl) 2022; 9:411-420. [PMID: 36000189 DOI: 10.1515/dx-2022-0052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/28/2022] [Accepted: 08/03/2022] [Indexed: 02/07/2023]
Abstract
High-throughput technological innovations in the past decade have accelerated research into the trillions of commensal microbes in the gut. The 'omics' technologies used for microbiome analysis are constantly evolving, and large-scale datasets are being produced. Despite the fact that much of the research is still in its early stages, specific microbial signatures have been associated with the promotion of cancer, as well as other diseases such as inflammatory bowel disease and neurodegenerative diseases. It has also been reported that the diversity of the gut microbiome influences the safety and efficacy of medicines. The availability and declining cost of sequencing have made RNA-based diagnostics more common in the microbiome field, necessitating improved data-analytical techniques that fully exploit the resulting rich biological datasets while accounting for their unique characteristics, such as their compositional nature as well as their heterogeneity and sparsity. As a result, the gut microbiome is increasingly being recognised as an important component of personalised medicine, since it not only plays a role in inter-individual variability in health and disease but also represents a potentially modifiable entity or feature that may be addressed by treatments in a personalised way. In this context, machine learning and artificial intelligence-based methods may be able to unveil new insights into biomedical analyses through the generation of models that may be used to predict category labels and continuous values. Furthermore, diagnostic aspects will add value in the identification of non-invasive markers in critical diseases such as cancer.
Affiliation(s)
- Animesh Acharjee
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Institute of Translational Medicine, University of Birmingham, Birmingham, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, UK; MRC Health Data Research UK (HDR UK), Birmingham, UK
- Utpreksha Singh
- Department of Health and Life Sciences, Coventry University, Coventry, UK
- Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Institute of Translational Medicine, University of Birmingham, Birmingham, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, UK; MRC Health Data Research UK (HDR UK), Birmingham, UK; NIHR Experimental Cancer Medicine Centre, Birmingham, UK
3
Munquad S, Si T, Mallik S, Li A, Das AB. Subtyping and grading of lower-grade gliomas using integrated feature selection and support vector machine. Brief Funct Genomics 2022; 21:408-421. [PMID: 35923100 DOI: 10.1093/bfgp/elac025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 06/23/2022] [Accepted: 07/17/2022] [Indexed: 11/13/2022] Open
Abstract
Classifying lower-grade gliomas (LGGs) is a crucial step for accurate therapeutic intervention. The histopathological classification of various subtypes of LGG, including astrocytoma, oligodendroglioma and oligoastrocytoma, suffers from intraobserver and interobserver variability leading to inaccurate classification and greater risk to patient health. We designed an efficient machine learning-based classification framework to diagnose LGG subtypes and grades using transcriptome data. First, we developed an integrated feature selection method based on correlation and support vector machine (SVM) recursive feature elimination. Then, implementation of the SVM classifier achieved superior accuracy compared with other machine learning frameworks. Most importantly, we found that the accuracy of subtype classification is always high (>90%) in a specific grade rather than in mixed grade (~80%) cancer. Differential co-expression analysis revealed higher heterogeneity in mixed grade cancer, resulting in reduced prediction accuracy. Our findings suggest that it is necessary to identify cancer grades and subtypes to attain a higher classification accuracy. Our six-class classification model efficiently predicts the grades and subtypes with an average accuracy of 91% (±0.02). Furthermore, we identify several predictive biomarkers using co-expression, gene set enrichment and survival analysis, indicating our framework is biologically interpretable and can potentially support the clinician.
Affiliation(s)
- Sana Munquad
- Department of Biotechnology, National Institute of Technology Warangal, Warangal 506004, Telangana, India
- Tapas Si
- Department of Computer Science and Engineering, Bankura Unnayani Institute of Engineering, Bankura 722146, West Bengal, India
- Saurav Mallik
- Department of Environmental Epigenetics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Aimin Li
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Asim Bikas Das
- Department of Biotechnology, National Institute of Technology Warangal, Warangal 506004, Telangana, India
4
Munquad S, Si T, Mallik S, Das AB, Zhao Z. A Deep Learning-Based Framework for Supporting Clinical Diagnosis of Glioblastoma Subtypes. Front Genet 2022; 13:855420. [PMID: 35419027 PMCID: PMC9000988 DOI: 10.3389/fgene.2022.855420] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/15/2022] [Accepted: 02/17/2022] [Indexed: 12/12/2022] Open
Abstract
Understanding molecular features that facilitate aggressive phenotypes in glioblastoma multiforme (GBM) remains a major clinical challenge. Accurate diagnosis of GBM subtypes, namely classical, proneural, and mesenchymal, and identification of specific molecular features are crucial for clinicians for systematic treatment. We develop a biologically interpretable and highly efficient deep learning framework based on a convolutional neural network for subtype identification. The classifiers were generated from high-throughput data of different molecular levels, i.e., transcriptome and methylome. Furthermore, an integrated subsystem of transcriptome and methylome data was also used to build the biologically relevant model. Our results show that the deep learning model outperforms traditional machine learning algorithms. Furthermore, to evaluate the biological and clinical applicability of the classification, we performed weighted gene correlation network analysis, gene set enrichment, and survival analysis of the feature genes. We identified the genotype-phenotype relationship of GBM subtypes and the subtype-specific predictive biomarkers for potential diagnosis and treatment.
Affiliation(s)
- Sana Munquad
- Department of Biotechnology, National Institute of Technology Warangal, Warangal, India
- Tapas Si
- Department of Computer Science and Engineering, Bankura Unnayani Institute of Engineering, Bankura, India
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Asim Bikas Das
- Department of Biotechnology, National Institute of Technology Warangal, Warangal, India
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States; Department of Pathology and Laboratory Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States
5
LVQ-KNN: Composition-based DNA/RNA binning of short nucleotide sequences utilizing a prototype-based k-nearest neighbor approach. Virus Res 2018; 258:55-63. [PMID: 30291874 DOI: 10.1016/j.virusres.2018.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/29/2018] [Revised: 09/25/2018] [Accepted: 10/02/2018] [Indexed: 11/22/2022]
Abstract
Unbiased sequencing is an emerging method for characterizing the microbiome in a sample and for detecting unrecognized pathogens. Many software tools are available for the taxonomic classification of such metagenomic datasets. Many of them have satisfactory sensitivity and specificity for known organisms, but they fail if the sample contains unknown organisms, which cannot be detected by similarity-based classification against available databases. However, recognition of unknowns is especially important for the detection of newly emerging pathogens, which are often RNA viruses. Here we present the composition-based analysis tool LVQ-KNN for binning unclassified nucleotide sequence reads into their provenance classes, DNA or RNA. With 5-fold cross-validation, LVQ-KNN reached correct classification rates (CCR) of up to 99.9% for the classification into DNA/RNA. Real datasets yielded CCRs of up to 94.5%. Compared with another composition-based analysis tool, the method reached similar or better classification results. LVQ-KNN is a new tool for DNA/RNA classification of sequence reads from unbiased sequencing approaches that could be applicable to the detection of as yet unknown RNA viruses in metagenomic samples. The source code, training data, and test data for LVQ-KNN are available on GitHub (https://github.com/ab1989/LVQ-KNN).
6
Intrusion detection system for wireless mesh network using multiple support vector machine classifiers with genetic-algorithm-based feature selection. Comput Secur 2018. [DOI: 10.1016/j.cose.2018.04.010] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Indexed: 11/21/2022]
7
Hou T, Liu F, Liu Y, Zou QY, Zhang X, Wang K. Classification of metagenomics data at lower taxonomic level using a robust supervised classifier. Evol Bioinform Online 2015; 11:3-10. [PMID: 25673967 PMCID: PMC4309676 DOI: 10.4137/ebo.s20523] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/28/2014] [Revised: 11/25/2014] [Accepted: 12/14/2014] [Indexed: 11/11/2022] Open
Abstract
As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Several supervised classifiers have been developed to identify the source organism of metagenomic sequences. We have found that the existing supervised classifiers usually cannot discriminate training data from different classes accurately when the data contain outliers. However, the training genomic data (bacterial and archaeal genomes) usually contain a portion of outliers, which come from sequencing errors, phage invasions, highly expressed genes, and other sources. These outliers, which act as noise, hinder the development of classifiers with better prediction accuracy. To solve this problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference of outliers in the training genomic data and then generate more accurate data domain descriptions for each taxonomic class. The experimental results demonstrate that WSVDD is more robust than other classifiers for simulated Sanger and 454 reads with different outlier rates. In addition, in experiments performed on simulated metagenomes and real gut metagenomes, WSVDD also achieved better prediction accuracy than other classifiers.
Affiliation(s)
- Tao Hou
- College of Communications Engineering, Jilin University, Changchun, China
- Fu Liu
- College of Communications Engineering, Jilin University, Changchun, China
- Yun Liu
- College of Communications Engineering, Jilin University, Changchun, China
- Qing Yu Zou
- College of Communications Engineering, Jilin University, Changchun, China
- Xiao Zhang
- College of Communications Engineering, Jilin University, Changchun, China
- Ke Wang
- College of Communications Engineering, Jilin University, Changchun, China