1
AI- and IoT-Enabled Solutions for Healthcare. Sensors (Basel, Switzerland) 2024; 24:2607. [PMID: 38676224 PMCID: PMC11053817 DOI: 10.3390/s24082607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 04/12/2024] [Indexed: 04/28/2024]
Abstract
Patient care and management have entered a new arena, where intelligent technology can assist clinicians in both diagnosis and treatment [...].
2
Digital remote monitoring for screening and early detection of urinary tract infections. NPJ Digit Med 2024; 7:11. [PMID: 38218738 PMCID: PMC10787784 DOI: 10.1038/s41746-023-00995-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/11/2023] [Indexed: 01/15/2024]
Abstract
Urinary Tract Infections (UTIs) are one of the most prevalent bacterial infections in older adults and a significant contributor to unplanned hospital admissions in People Living with Dementia (PLWD), with early detection being crucial due to the predicament of reporting symptoms and limited help-seeking behaviour. The most common diagnostic tool is urine sample analysis, which can be time-consuming and is only employed where UTI clinical suspicion exists. In this method development and proof-of-concept study, participants living with dementia were monitored via low-cost devices in the home that passively measure activity, sleep, and nocturnal physiology. Using 27828 person-days of remote monitoring data (from 117 participants), we engineered features representing symptoms used for diagnosing a UTI. We then evaluate explainable machine learning techniques in passively calculating UTI risk and perform stratification on scores to support clinical translation and allow control over the balance between alert rate and sensitivity and specificity. The proposed UTI algorithm achieves a sensitivity of 65.3% (95% Confidence Interval (CI) = 64.3-66.2) and specificity of 70.9% (68.6-73.1) when predicting UTIs on unseen participants and after risk stratification, a sensitivity of 74.7% (67.9-81.5) and specificity of 87.9% (85.0-90.9). In addition, feature importance methods reveal that the largest contributions to the predictions were bathroom visit statistics, night-time respiratory rate, and the number of previous UTI events, aligning with the literature. Our machine learning method alerts clinicians of UTI risk in subjects, enabling earlier detection and enhanced screening when considering treatment.
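The trade-off between alert rate, sensitivity, and specificity described in this abstract can be illustrated with a minimal sketch. The risk scores, labels, and thresholds below are invented toy values, and the function name is hypothetical; this is not the authors' stratification method, only a demonstration of how raising the alerting threshold changes the three quantities.

```python
from typing import Sequence, Tuple


def alert_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, alert_rate) when alerting on score >= threshold.

    Assumes labels contains at least one positive (1) and one negative (0).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        alert = score >= threshold
        if label == 1 and alert:
            tp += 1
        elif label == 1:
            fn += 1
        elif alert:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    alert_rate = (tp + fp) / len(scores)
    return sensitivity, specificity, alert_rate


# Toy data (hypothetical): 3 UTI-positive days and 5 negative days.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

low = alert_metrics(scores, labels, threshold=0.5)
high = alert_metrics(scores, labels, threshold=0.75)
print("threshold 0.50:", low)
print("threshold 0.75:", high)
```

In this toy case the stricter threshold halves the alert rate and removes the false positives without losing any true positives; on real data the balance would differ.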
3
Quantitative measurement of antibiotic resistance in Mycobacterium tuberculosis reveals genetic determinants of resistance and susceptibility in a target gene approach. Nat Commun 2024; 15:488. [PMID: 38216576 PMCID: PMC10786857 DOI: 10.1038/s41467-023-44325-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 12/08/2023] [Indexed: 01/14/2024]
Abstract
The World Health Organization has a goal of universal drug susceptibility testing for patients with tuberculosis. However, molecular diagnostics to date have focused largely on first-line drugs and predicting susceptibilities in a binary manner (classifying strains as either susceptible or resistant). Here, we used a multivariable linear mixed model alongside whole genome sequencing and a quantitative microtiter plate assay to relate genomic mutations to minimum inhibitory concentration (MIC) in 15,211 Mycobacterium tuberculosis clinical isolates from 23 countries across five continents. We identified 492 unique MIC-elevating variants across 13 drugs, as well as 91 mutations likely linked to hypersensitivity. Our results advance genetics-based diagnostics for tuberculosis and serve as a curated training/testing dataset for development of drug resistance prediction algorithms.
4
Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems in a more efficient way. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they were never used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
5
TIHM: An open dataset for remote healthcare monitoring in dementia. Sci Data 2023; 10:606. [PMID: 37689815 PMCID: PMC10492790 DOI: 10.1038/s41597-023-02519-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 08/30/2023] [Indexed: 09/11/2023]
Abstract
Dementia is a progressive condition that affects cognitive and functional abilities. There is a need for reliable and continuous health monitoring of People Living with Dementia (PLWD) to improve their quality of life and support their independent living. Healthcare services often focus on addressing and treating already established health conditions that affect PLWD. Managing these conditions continuously can inform better decision-making earlier for higher-quality care management for PLWD. The Technology Integrated Health Management (TIHM) project developed a new digital platform to routinely collect longitudinal, observational, and measurement data, within the home and apply machine learning and analytical models for the detection and prediction of adverse health events affecting the well-being of PLWD. This work describes the TIHM dataset collected during the second phase (i.e., feasibility study) of the TIHM project. The data was collected from homes of 56 PLWD and associated with events and clinical observations (daily activity, physiological monitoring, and labels for health-related conditions). The study recorded an average of 50 days of data per participant, totalling 2803 days.
6
On the effectiveness of compact biomedical transformers. Bioinformatics 2023; 39:btad103. [PMID: 36825820 PMCID: PMC10027428 DOI: 10.1093/bioinformatics/btad103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Revised: 12/23/2022] [Accepted: 02/23/2023] [Indexed: 02/25/2023]
Abstract
MOTIVATION Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension and number of layers. The natural language processing community has developed numerous strategies to compress these models utilizing techniques such as pruning, quantization and knowledge distillation, resulting in models that are considerably faster, smaller and subsequently easier to use in practice. By the same token, in this article, we introduce six lightweight models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or continual learning on the PubMed dataset. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to create the best efficient lightweight models that perform on par with their larger counterparts. RESULTS We trained six different models in total, with the largest model having 65 million parameters and the smallest having 15 million, a far lower range of parameters compared with BioBERT's 110M. Based on our experiments on three different biomedical tasks, we found that models distilled from a biomedical teacher and models that have been additionally pre-trained on the PubMed dataset can retain up to 98.8% and 98.6% of the performance of the BioBERT-v1.1, respectively. Overall, our best model below 30 M parameters is BioMobileBERT, while our best models over 30 M parameters are DistilBioBERT and CompactBioBERT, which can keep up to 98.2% and 98.8% of the performance of the BioBERT-v1.1, respectively. AVAILABILITY AND IMPLEMENTATION Code is available at: https://github.com/nlpie-research/Compact-Biomedical-Transformers.
Trained models can be accessed at: https://huggingface.co/nlpie.
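As background for the knowledge distillation mentioned in this abstract, here is a minimal sketch of a generic soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is the textbook formulation and may well differ from the losses the authors actually used; the logits, the temperature value, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class task.
teacher_logits = [1.0, 2.0, 3.0]
matching = distillation_loss([1.0, 2.0, 3.0], teacher_logits)
mismatched = distillation_loss([3.0, 2.0, 1.0], teacher_logits)
print(matching, mismatched)
```

A student that reproduces the teacher's logits incurs zero loss, while a student that ranks the classes differently is penalised.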
7
Privacy-Aware Early Detection of COVID-19 Through Adversarial Training. IEEE J Biomed Health Inform 2022; PP:1249-1258. [PMID: 37015447 PMCID: PMC10824398 DOI: 10.1109/jbhi.2022.3230663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 11/25/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients and may reduce operational strain on hospitals that cope with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect potential cases of coronavirus using routine clinical data (blood tests, and vital signs measurements). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. In spite of this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this study, two machine learning techniques that aim to predict a patient's COVID-19 status are examined. Using adversarial training, robust deep learning architectures are explored with the aim to protect attributes related to demographic information about the patients. The two models examined in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from the Oxford University Hospitals (OUH), Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH), two neural networks are trained and evaluated. These networks predict PCR test results using information from basic laboratory blood tests, and vital signs collected from a patient upon arrival to the hospital. The level of privacy each one of the models can provide is assessed and the efficacy and robustness of the proposed architectures are compared with a relevant baseline. One of the main contributions in this work is the particular focus on the development of effective COVID-19 detection models with built-in mechanisms in order to selectively protect sensitive attributes against adversarial attacks. 
The results on the hold-out test set and external validation confirmed that there was no impact on the generalisability of the model using adversarial learning.
8
Network analysis to identify symptoms clusters and temporal interconnections in oncology patients. Sci Rep 2022; 12:17052. [PMID: 36224203 PMCID: PMC9556713 DOI: 10.1038/s41598-022-21140-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Accepted: 09/22/2022] [Indexed: 12/30/2022]
Abstract
Oncology patients experience numerous co-occurring symptoms during their treatment. The identification of sentinel/core symptoms is a vital prerequisite for therapeutic interventions. In this study, using Network Analysis, we investigated the inter-relationships among 38 common symptoms over time (i.e., a total of six time points over two cycles of chemotherapy) in 987 oncology patients with four different types of cancer (i.e., breast, gastrointestinal, gynaecological, and lung). In addition, we evaluated the associations between and among symptoms and symptoms clusters and examined the strength of these interactions over time. Eight unique symptom clusters were identified within the networks. Findings from this research suggest that changes occur in the relationships and interconnections between and among co-occurring symptoms and symptoms clusters that depend on the time point in the chemotherapy cycle and the type of cancer. The evaluation of the centrality measures provides new insights into the relative importance of individual symptoms within various networks that can be considered as potential targets for symptom management interventions.
9
High fluoroquinolone resistance proportions among multidrug-resistant tuberculosis driven by dominant L2 Mycobacterium tuberculosis clones in the Mumbai Metropolitan Region. Genome Med 2022; 14:95. [PMID: 35989319 PMCID: PMC9394022 DOI: 10.1186/s13073-022-01076-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/18/2022] [Accepted: 06/20/2022] [Indexed: 11/10/2022]
Abstract
Background
Multidrug-resistant (MDR) Mycobacterium tuberculosis complex (MTBC) strains are a serious health problem in India, also contributing to one-fourth of the global MDR tuberculosis (TB) burden. About 36% of the MDR MTBC strains are reported fluoroquinolone (FQ) resistant leading to high pre-extensively drug-resistant (pre-XDR) and XDR-TB (further resistance against bedaquiline and/or linezolid) rates. Still, factors driving the MDR/pre-XDR epidemic in India are not well defined.
Methods
In a retrospective study, we analyzed 1852 consecutive MTBC strains obtained from patients from a tertiary care hospital laboratory in Mumbai by whole genome sequencing (WGS). Univariate and multivariate statistics were used to investigate factors associated with pre-XDR. Core genome multi locus sequence typing, time scaled haplotypic density (THD) method and homoplasy analysis were used to analyze epidemiological success, and positive selection in different strain groups, respectively.
Results
In total, 1016 MTBC strains were MDR, out of which 703 (69.2%) were pre-XDR and 45 (4.4%) were XDR. Cluster rates were high among MDR (57.8%) and pre-XDR/XDR (79%) strains with three dominant L2 (Beijing) strain clusters (Cl 1–3) representing half of the pre-XDR and 40% of the XDR-TB cases. L2 strains were associated with pre-XDR/XDR-TB (P < 0.001) and, particularly Cl 1–3 strains, had high first-line and FQ resistance rates (81.6–90.6%). Epidemic success analysis using THD showed that L2 strains outperformed L1, L3, and L4 strains in short- and long-term time scales. More importantly, L2 MDR and MDR + strains had higher THD success indices than their not-MDR counterparts. Overall, compensatory mutation rates were highest in L2 strains and positive selection was detected in genes of L2 strains associated with drug tolerance (prpB and ppsA) and virulence (Rv2828c). Compensatory mutations in L2 strains were associated with a threefold increase of THD indices, suggesting improved transmissibility.
Conclusions
Our data indicate a drastic increase of FQ resistance, as well as emerging bedaquiline resistance which endangers the success of newly endorsed MDR-TB treatment regimens. Rapid changes in treatment and control strategies are required to contain transmission of highly successful pre-XDR L2 strains in the Mumbai Metropolitan region but presumably also India-wide.
10
A crowd of BashTheBug volunteers reproducibly and accurately measure the minimum inhibitory concentrations of 13 antitubercular drugs from photographs of 96-well broth microdilution plates. eLife 2022; 11:e75046. [PMID: 35588296 PMCID: PMC9286738 DOI: 10.7554/elife.75046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/28/2021] [Accepted: 05/15/2022] [Indexed: 11/28/2022]
Abstract
Tuberculosis is a respiratory disease that is treatable with antibiotics. An increasing prevalence of resistance means that to ensure a good treatment outcome it is desirable to test the susceptibility of each infection to different antibiotics. Conventionally, this is done by culturing a clinical sample and then exposing aliquots to a panel of antibiotics, each being present at a pre-determined concentration, thereby determining if the sample is resistant or susceptible to each antibiotic. The minimum inhibitory concentration (MIC) of a drug is the lowest concentration that inhibits growth and is a more useful quantity but requires each sample to be tested at a range of concentrations for each drug. Using 96-well broth microdilution plates with each well containing a lyophilised pre-determined amount of an antibiotic is a convenient and cost-effective way to measure the MICs of several drugs at once for a clinical sample. Although accurate, this is still an expensive and slow process that requires highly-skilled and experienced laboratory scientists. Here we show that, through the BashTheBug project hosted on the Zooniverse citizen science platform, a crowd of volunteers can reproducibly and accurately determine the MICs for 13 drugs and that simply taking the median or mode of 11-17 independent classifications is sufficient. There is therefore a potential role for crowds to support (but not supplant) the role of experts in antibiotic susceptibility testing.
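The consensus step described above (taking the median or mode of 11-17 independent classifications) can be sketched with the Python standard library. The readings below are invented, and encoding each classification as an integer dilution index is an assumption made for illustration; the project's actual data format is not given here.

```python
from statistics import median_low, mode
from typing import Sequence, Tuple


def consensus_dilution(classifications: Sequence[int]) -> Tuple[int, int]:
    """Return (median, mode) of volunteer readings for one drug on one plate.

    median_low is used so that the result is always a value that some
    volunteer actually reported, even when the number of readings is even.
    """
    return median_low(classifications), mode(classifications)


# Hypothetical readings from 11 volunteers: index of the first well with no growth.
readings = [5, 5, 6, 5, 4, 5, 6, 5, 5, 7, 5]
median_reading, modal_reading = consensus_dilution(readings)
print(median_reading, modal_reading)
```

Both summaries are robust to a few outlying readings, which is presumably why a simple aggregate of this kind can suffice.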
11
An Unsupervised Data-driven Anomaly Detection Approach for Detection of Adverse Health Conditions in People Living with Dementia: Cohort Study (Preprint). JMIR Aging 2022; 5:e38211. [PMID: 36121687 PMCID: PMC9531007 DOI: 10.2196/38211] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/23/2022] [Revised: 07/04/2022] [Accepted: 07/30/2022] [Indexed: 11/16/2022]
Abstract
Background Sensor-based remote health monitoring can be used for the timely detection of health deterioration in people living with dementia with minimal impact on their day-to-day living. Anomaly detection approaches have been widely applied in various domains, including remote health monitoring. However, current approaches are challenged by noisy, multivariate data and low generalizability. Objective This study aims to develop an online, lightweight unsupervised learning–based approach to detect anomalies representing adverse health conditions using activity changes in people living with dementia. We demonstrated its effectiveness over state-of-the-art methods on a real-world data set of 9363 days collected from 15 participant households by the UK Dementia Research Institute between August 2019 and July 2021. Our approach was applied to household movement data to detect urinary tract infections (UTIs) and hospitalizations. Methods We propose and evaluate a solution based on Contextual Matrix Profile (CMP), an exact, ultrafast distance-based anomaly detection algorithm. Using daily aggregated household movement data collected via passive infrared sensors, we generated CMPs for location-wise sensor counts, duration, and change in hourly movement patterns for each patient. We computed a normalized anomaly score in 2 ways: by combining univariate CMPs and by developing a multidimensional CMP. The performance of our method was evaluated relative to Angle-Based Outlier Detection, Copula-Based Outlier Detection, and Lightweight Online Detector of Anomalies. We used the multidimensional CMP to discover and present the important features associated with adverse health conditions in people living with dementia. 
Results The multidimensional CMP yielded, on average, 84.3% recall with 32.1 alerts, or a 5.1% alert rate, offering the best balance of recall and relative precision compared with Copula-Based and Angle-Based Outlier Detection and Lightweight Online Detector of Anomalies when evaluated for UTI and hospitalization. Midnight to 6 AM bathroom activity was shown to be the most important cross-patient digital biomarker of anomalies indicative of UTI, contributing approximately 30% to the anomaly score. We also demonstrated how CMP-based anomaly scoring can be used for a cross-patient view of anomaly patterns. Conclusions To the best of our knowledge, this is the first real-world study to adapt the CMP to continuous anomaly detection in a health care scenario. The CMP inherits the speed, accuracy, and simplicity of the Matrix Profile, providing configurability, the ability to denoise and detect patterns, and explainability to clinical practitioners. We addressed the need for anomaly scoring in multivariate time series health care data by developing the multidimensional CMP. With high sensitivity, a low alert rate, better overall performance than state-of-the-art methods, and the ability to discover digital biomarkers of anomalies, the CMP is a clinically meaningful unsupervised anomaly detection technique extensible to multimodal data for dementia and other health care scenarios.
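To give a feel for distance-based anomaly scoring of daily activity profiles, the sketch below scores each day by its Euclidean distance to the most similar other day, so a day with no close match receives a high score. This is a deliberately simplified stand-in and not the Contextual Matrix Profile algorithm used in the study; the four-value day profiles and the function name are invented for illustration.

```python
import math
from typing import List, Sequence


def nearest_neighbour_scores(days: Sequence[Sequence[float]]) -> List[float]:
    """Score each day by the distance to its closest other day."""
    scores = []
    for i, day in enumerate(days):
        closest = min(
            math.dist(day, other) for j, other in enumerate(days) if j != i
        )
        scores.append(closest)
    return scores


# Hypothetical sensor counts per day: [night bathroom, night hallway, daytime, evening].
days = [
    [0, 1, 5, 2],
    [0, 1, 4, 2],
    [1, 1, 5, 2],
    [6, 7, 5, 2],  # unusually high night-time activity
    [0, 2, 5, 2],
]
scores = nearest_neighbour_scores(days)
most_anomalous = max(range(len(days)), key=scores.__getitem__)
print(most_anomalous, [round(s, 2) for s in scores])
```

Alerting only on the highest-scoring fraction of days is one simple way to control the alert rate, though the thresholds used in the study are not stated here.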
12
Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network. Brief Bioinform 2022; 23:bbac015. [PMID: 35152280 DOI: 10.1093/bib/bbac015] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/28/2021] [Revised: 12/17/2021] [Accepted: 01/12/2022] [Indexed: 12/17/2023]
Abstract
Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant functionality due to its impact on signaling, gene expression, enzyme kinetics, protein stability and interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital as abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM prediction suffer from high computational cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance interpretable deep tabular learning network (TabNet) to improve the prediction of protein p-sites in soybean. Moreover, we use a hybrid feature set of sequential-based features, physicochemical properties and position-specific scoring matrices to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean for the first time. The experimentally verified p-sites data of soybean proteins are collected from the eukaryotic phosphorylation sites database and database post-translational modification. We then remove the redundant set of positive and negative samples by dropping protein sequences with >40% similarity. It is found that the developed techniques achieve >70% accuracy. The results demonstrate that the TabNet model is the best performing classifier using hybrid features and with a window size of 13, resulting in 78.96% sensitivity and 77.24% specificity. The results indicate that the TabNet method has advantages in terms of high performance and interpretability. The proposed technique can automatically analyze the data without any measurement errors and any human intervention. Furthermore, it can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
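The window size of 13 mentioned above refers to a sequence window centred on each candidate residue. A minimal sketch of extracting such windows around serine, threonine, and tyrosine is given below. The padding character, the 1-based positions, the toy sequence, and the function name are assumptions for illustration; the authors' actual feature extraction (physicochemical properties, position-specific scoring matrices) is not reproduced.

```python
from typing import List, Tuple


def candidate_windows(
    sequence: str, window: int = 13, residues: str = "STY", pad: str = "X"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every S/T/Y in the sequence.

    The sequence is padded on both sides so that residues near the termini
    still sit at the centre of a full-length window.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue in residues:
            # padded[i + half] is sequence[i], so this slice is centred on it.
            windows.append((i + 1, residue, padded[i : i + window]))
    return windows


# Hypothetical short peptide, not a real soybean protein.
windows = candidate_windows("MKSAYT")
for position, residue, context in windows:
    print(position, residue, context)
```

Each window can then be encoded into features and labelled as a positive or negative site for a classifier.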
13
Analysing behavioural changes in people with dementia using in-home monitoring technologies. Alzheimers Dement 2022. [PMID: 34971045 DOI: 10.1002/alz.052181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
BACKGROUND Behavioural changes and neuropsychiatric symptoms such as agitation are common in people with dementia. These symptoms impact the quality of life of people with dementia and can increase the stress on caregivers. This study aims to identify the likelihood of having agitation in people affected by dementia (i.e., patients and carers) using routinely collected data from in-home monitoring technologies. We have used a digital platform and analytical methods, developed in our previous study, to generate alerts when changes occur in the digital markers collected using in-home sensing technologies (i.e., vital signs, environmental and activity data). A care monitoring team use the platform and interact with participants and caregivers when an alert is generated. METHOD We have used connected sensory devices to collect environmental markers, including Passive Infra-Red (PIR), smart power plugs for monitoring home appliance use, motion and door sensors. The environmental marker data have been aggregated within each hour and used to train an agitation risk analysis model. We have trained a model using data collected from 88 homes (∼6 months of data from each home). The proposed model has two components: a self-supervised transformation learning and an ensemble classification model for agitation likelihood. Ten different neural network encoders are learned to create pseudo-labels using the samples from the unlabelled data. We use these pseudo-labels to train a classification model with a convolutional block and a decision layer. The trained convolutional block is then used to learn a latent representation of the data for an ensemble classification block. 
RESULTS Compared with baseline models such as LSTM network, Bidirectional LSTM (BiLSTM) network, VGG, ResNet, Inception, Random Forest (RF), Support Vector Machine (SVM) and Gaussian Process (GP) classifiers, the proposed model performs better in sensitivity (recall) and area under the precision-recall curve, with at most 40% improvement. The recall measure using the 10-fold cross-validation technique is 61%. CONCLUSION This method can support early interventions and help develop new pathways to support people affected by dementia. A limitation in our current study is that the environmental and movement data are at the home level and not personalised.
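The METHOD section states that environmental marker data were aggregated within each hour before model training. A minimal sketch of that kind of aggregation follows, assuming each raw event is a (timestamp, sensor name) pair; the event format, sensor names, and function name are hypothetical, and the study's actual pipeline is not described in enough detail to reproduce.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def hourly_counts(events: Iterable[Tuple[datetime, str]]) -> Counter:
    """Count sensor activations per (hour, sensor) bucket."""
    counts: Counter = Counter()
    for timestamp, sensor in events:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, sensor)] += 1
    return counts


# Hypothetical raw events from one home.
events = [
    (datetime(2021, 3, 1, 2, 5), "bathroom"),
    (datetime(2021, 3, 1, 2, 40), "bathroom"),
    (datetime(2021, 3, 1, 2, 15), "kitchen"),
    (datetime(2021, 3, 1, 3, 10), "bathroom"),
]
counts = hourly_counts(events)
for (hour, sensor), n in sorted(counts.items()):
    print(hour.isoformat(), sensor, n)
```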
14
An end-to-end heterogeneous graph attention network for Mycobacterium tuberculosis drug-resistance prediction. Brief Bioinform 2021; 22:6355133. [PMID: 34414415 PMCID: PMC8575050 DOI: 10.1093/bib/bbab299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2021] [Revised: 06/28/2021] [Accepted: 07/16/2021] [Indexed: 11/23/2022]
Abstract
Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method—heterogeneous graph attention network (‘HGAT–AMR’)—to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
15
Development and validation of early warning score systems for COVID-19 patients. Healthc Technol Lett 2021; 8:105-117. [PMID: 34221413 PMCID: PMC8239612 DOI: 10.1049/htl2.12009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2020] [Revised: 02/22/2021] [Accepted: 03/19/2021] [Indexed: 12/15/2022]
Abstract
COVID‐19 is a major, urgent, and ongoing threat to global health. Globally more than 24 million have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important to guiding individual patient treatment and also to ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration defined as the need for advanced respiratory support (high‐flow nasal oxygen, continuous positive airways pressure, non‐invasive ventilation, intubation) within a prediction window of 24 h is evaluated. It is shown that these scores perform sub‐optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes the prediction based on the current and baseline measures of routinely available vital signs and blood tests.
16
Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test. Lancet Digit Health 2021; 3:e78-e87. [PMID: 33509388 PMCID: PMC7831998 DOI: 10.1016/s2589-7500(20)30274-0] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 09/24/2020] [Revised: 10/20/2020] [Accepted: 11/10/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. METHODS We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. FINDINGS We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). INTERPRETATION Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. 
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. FUNDING Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
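The dependence of predictive values on prevalence described in this abstract can be illustrated with the textbook Bayes relation. This is a minimal sketch, not the authors' procedure (they generated test sets with varying prevalences); the function name `predictive_values` is an assumption, and the operating point is the ED model's quoted 77·4% sensitivity and 95·7% specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                   # diseased, flagged positive
    fn = (1.0 - sensitivity) * prevalence           # diseased, missed
    tn = specificity * (1.0 - prevalence)           # healthy, correctly cleared
    fp = (1.0 - specificity) * (1.0 - prevalence)   # healthy, flagged positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# ED model operating point quoted in the abstract, at illustrative prevalences.
for prevalence in (0.01, 0.02, 0.05):
    ppv, npv = predictive_values(0.774, 0.957, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At low prevalence the NPV stays high while the PPV falls, which is why a model of this kind is better suited to ruling the disease out than to confirming it.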
|
17
|
Machine Learning Techniques for Soybean Charcoal Rot Disease Prediction. FRONTIERS IN PLANT SCIENCE 2020; 11:590529. [PMID: 33381132 PMCID: PMC7767839 DOI: 10.3389/fpls.2020.590529] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/01/2020] [Accepted: 11/23/2020] [Indexed: 06/01/2023]
Abstract
Early prediction of pathogen infestation is a key factor in reducing the spread of disease in plants. Macrophomina phaseolina (Tassi) Goid, one of the main causes of charcoal rot disease, significantly suppresses plant productivity. Charcoal rot disease is one of the most severe threats to soybean productivity. Predicting this disease in soybeans with traditional approaches is tedious and impractical. Machine learning (ML) techniques have recently gained substantial traction across numerous domains. ML methods can be applied to detect plant diseases before symptoms fully appear. In this paper, several ML techniques were developed and examined for prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was suggested as inputs to the ML models. All developed ML models achieved better than 90% accuracy. Gradient Tree Boosting (GBT) was the best-performing classifier, obtaining 96.25% sensitivity and 97.33% specificity. Our findings support the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrated the importance of including physiological features in the learning. The collected dataset and source code can be found at https://github.com/Elham-khalili/Soybean-Charcoal-Rot-Disease-Prediction-Dataset-code.
|
18
|
Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking. Front Microbiol 2020; 11:667. [PMID: 32390972 PMCID: PMC7188832 DOI: 10.3389/fmicb.2020.00667] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/02/2020] [Accepted: 03/24/2020] [Indexed: 12/12/2022] Open
Abstract
Resistance prediction and mutation ranking are important tasks in the analysis of Tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) for both predicting phenotypic resistance from whole genome sequences and identifying important mutations for better prediction of four first-line drugs in a dataset of 13402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining our analysis to a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
|
19
|
Two-Step Deep Learning for Estimating Human Sleep Pose Occluded by Bed Covers. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020; 2019:3115-3118. [PMID: 31946547 DOI: 10.1109/embc.2019.8856873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
In this study, a novel sleep pose identification method is proposed for classifying 12 different sleep postures using a two-step deep learning process. In the first step, transfer learning retrains a well-known CNN (VGG-19) to categorise the data into four main pose classes: supine, left, right, and prone. According to the decision made by VGG-19, subsets of the image data are then passed to one of four dedicated sub-class CNNs. As a result, the pose estimate is refined from one of four sleep pose labels to one of 12 sleep pose labels. Ten participants contributed infrared (IR) recordings of 12 pre-defined sleep positions. Participants were covered by a blanket to occlude the original pose and present a more realistic sleep situation. Finally, we compared our results with (1) a traditional CNN trained from scratch and (2) a VGG-19 network retrained in one stage. The average accuracy increased from 74.5% and 78.1%, for (1) and (2) respectively, to 85.6%.
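The routing logic of such a two-step classifier can be sketched as follows. This is an illustrative outline only: the toy functions stand in for the retrained VGG-19 and the four sub-class CNNs, all names are hypothetical, and the even split of three sub-poses per coarse class is an assumption that the abstract does not state.

```python
from typing import Callable, Dict, List

Image = List[float]                   # placeholder for an IR frame
Classifier = Callable[[Image], str]   # maps an image to a label

COARSE_CLASSES = ["supine", "left", "right", "prone"]


def two_step_predict(image: Image, coarse: Classifier,
                     fine: Dict[str, Classifier]) -> str:
    """Route the image to the sub-classifier selected by the coarse prediction."""
    coarse_label = coarse(image)       # step 1: one of the four main pose classes
    return fine[coarse_label](image)   # step 2: refine within that class


# Toy stand-ins so the sketch runs; real models would be trained CNNs.
def toy_coarse(image: Image) -> str:
    return COARSE_CLASSES[int(image[0]) % 4]


def make_toy_fine(coarse_label: str) -> Classifier:
    # Assumption: three sub-poses per coarse class (12 / 4).
    return lambda image: f"{coarse_label}-{int(image[1]) % 3}"


fine_models = {label: make_toy_fine(label) for label in COARSE_CLASSES}
print(two_step_predict([2.0, 4.0], toy_coarse, fine_models))
```

One consequence of this design is that an error in the first step cannot be corrected by the second, since each sub-class model only knows the labels within its own coarse class.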
|
20
|
Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinformatics 2019; 35:2276-2282. [PMID: 30462147 PMCID: PMC6596891 DOI: 10.1093/bioinformatics/bty949] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/22/2018] [Revised: 10/28/2018] [Accepted: 11/19/2018] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Timely identification of Mycobacterium tuberculosis (MTB) resistance to existing drugs is vital to decrease mortality and prevent the amplification of existing antibiotic resistance. Machine learning methods have been widely applied for timely prediction of MTB resistance to a specific drug and for identifying resistance markers. However, they have not been validated on a large cohort of MTB samples from multiple centers across the world in terms of resistance prediction and resistance marker identification. Several machine learning classifiers and linear dimension reduction techniques were developed and compared for a cohort of 13 402 isolates collected from 16 countries across 6 continents and tested for 11 drugs. RESULTS Compared to a conventional molecular diagnostic test, the area under curve of the best machine learning classifier increased for all drugs, especially by 23.11%, 15.22% and 10.14% for pyrazinamide, ciprofloxacin and ofloxacin, respectively (P < 0.01). Logistic regression and gradient tree boosting were found to perform better than other techniques. Moreover, logistic regression/gradient tree boosting with a sparse principal component analysis/non-negative matrix factorization step, compared with the classifier alone, enhanced the best performance in terms of F1-score by 12.54%, 4.61%, 7.45% and 9.58% for amikacin, moxifloxacin, ofloxacin and capreomycin, respectively, as well as increasing the area under curve for amikacin and capreomycin. The results provided a comprehensive comparison of various techniques and confirmed the application of machine learning for better prediction on large, diverse tuberculosis data. Furthermore, mutation ranking showed the possibility of finding new resistance/susceptibility markers. AVAILABILITY AND IMPLEMENTATION The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
21
|
Improving time–frequency domain sleep EEG classification via singular spectrum analysis. J Neurosci Methods 2016; 273:96-106. [DOI: 10.1016/j.jneumeth.2016.08.008] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 11/21/2015] [Revised: 08/10/2016] [Accepted: 08/11/2016] [Indexed: 11/28/2022]
|
22
|
Quaternion Singular Spectrum Analysis of Electroencephalogram With Application in Sleep Analysis. IEEE Trans Neural Syst Rehabil Eng 2016; 24:57-67. [PMID: 26276995 DOI: 10.1109/tnsre.2015.2465177] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
|
23
|
A group decision-making tool for the application of membrane technologies in different water reuse scenarios. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2015; 156:97-108. [PMID: 25839744 DOI: 10.1016/j.jenvman.2015.02.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/23/2014] [Revised: 02/15/2015] [Accepted: 02/28/2015] [Indexed: 06/04/2023]
Abstract
A global challenge of increasing concern is diminishing fresh water resources. A growing practice in many communities to supplement diminishing fresh water availability has been the reuse of water. Novel methods of treating polluted waters, such as membrane-assisted technologies, have recently been developed and successfully implemented in many places. Given the diversity of membrane-assisted technologies available, the current challenge is how to select a reliable alternative among numerous technologies for appropriate water reuse. In this research, a fuzzy-logic-based, multi-criteria group decision-making tool has been developed. This tool has been employed in the selection of appropriate membrane treatment technologies for several non-potable and potable reuse scenarios. Robust criteria, covering technical, environmental, economic and socio-cultural aspects, were selected, and 10 different membrane-assisted technologies were assessed in the tool. The results show that this approach is capable of facilitating systematic and rigorous analysis in the comparison and selection of membrane-assisted technologies for advanced wastewater treatment and reuse.
|
24
|
Complex tensor based blind source separation of EEG for tracking P300 subcomponents. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2015; 2015:6999-7002. [PMID: 26737903 DOI: 10.1109/embc.2015.7320003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Complex tensor factorisation of correlated brain sources is addressed in this paper. The electrical brain responses to motor, sensory, or cognitive stimuli, i.e. event related potentials (ERPs), particularly P300, have been used for cognitive information processing. P300 has two subcomponents, P3a and P3b, which are correlated; therefore, traditional blind source separation approaches cannot solve the problem. In this work, a complex-valued tensor factorisation of electroencephalography (EEG) signals is introduced with the aim of separating P300 subcomponents. The proposed method uses complex-valued statistics to exploit the data correlation. In this way, the variations of P3a and P3b can be tracked for the assessment of the brain state. The results of this work will be compared with those of the spatial principal component analysis (SPCA) method.
|
25
|
Tensor Based Singular Spectrum Analysis for Automatic Scoring of Sleep EEG. IEEE Trans Neural Syst Rehabil Eng 2015; 23:1-9. [DOI: 10.1109/tnsre.2014.2329557] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 11/10/2022]
|
26
|
Comparative Application of Non-negative Decomposition Methods in Classifying Fatigue and Non-fatigue States. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2014. [DOI: 10.1007/s13369-014-1242-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
|