1
|
An explainable AI-assisted web application in cancer drug value prediction. MethodsX 2024; 12:102696. [PMID: 38633421 PMCID: PMC11022087 DOI: 10.1016/j.mex.2024.102696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
In recent years, interest in adopting Explainable Artificial Intelligence (XAI) for healthcare has grown. The proposed system includes:
• An XAI model for cancer drug value prediction. The model provides data that is easy to understand and explain, which is critical for medical decision-making. It also produces accurate projections.
• A model that outperformed existing models owing to extensive training and evaluation on a large dataset of cancer medication chemical compounds.
• Insights into the causation and correlation between the dependent and independent actors in the chemical composition of the cancer cell.
While the model is evaluated on lung cancer data, the architecture offered in the proposed solution is cancer agnostic. It may be scaled out to other cancer cell data if the properties are similar. By combining XAI with a large dataset, the work presents a viable route for customizing treatments and improving patient outcomes in oncology. This research attempts to create a framework in which a user can upload a test case and receive forecasts with explanations, all in a portable PDF report.
|
2
|
Extracting Drug-Protein Relation from Literature Using Ensembles of Biomedical Transformers. Stud Health Technol Inform 2024; 310:639-643. [PMID: 38269887 DOI: 10.3233/shti231043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Automatic extraction of relations between drugs/chemicals and proteins from the ever-growing biomedical literature is required to build up-to-date knowledge bases in biomedicine. To promote the development of automated methods, BioCreative-VII organized a shared task, the DrugProt track, to recognize drug-protein entity relations from PubMed abstracts. We participated in the shared task and leveraged deep learning-based transformer models pre-trained on biomedical data to build ensemble approaches that automatically extract drug-protein relations from biomedical literature. On the main corpus of 10,750 abstracts, our best system obtained an F1-score of 77.60% (ranked 4th among 30 participating teams), and on the large-scale corpus of 2.4M documents, our system achieved a micro-averaged F1-score of 77.32% (ranked 2nd among 9 system submissions). This demonstrates the effectiveness of domain-specific transformer models and ensemble approaches for automatic relation extraction from biomedical literature.
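For readers unfamiliar with the metric reported above, the following is a minimal sketch of a micro-averaged F1-score: true positives, false positives, and false negatives are pooled across all relation classes before precision and recall are computed. The relation labels and counts are hypothetical and are not taken from the DrugProt evaluation.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-class (TP, FP, FN) counts pooled over all classes."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical relation labels and counts, for illustration only.
counts = {"INHIBITOR": (8, 2, 1), "SUBSTRATE": (3, 1, 4)}
print(f"micro-F1 = {micro_f1(counts):.4f}")
```

Because the counts are pooled, frequent relation classes influence the score more than rare ones, which is the usual reason to report it alongside per-class results.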
|
3
|
One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.21.23300380. [PMID: 38196648 PMCID: PMC10775333 DOI: 10.1101/2023.12.21.23300380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Objective: To enhance the accuracy and reliability of diverse medical question-answering (QA) tasks and to investigate efficient approaches to deploying Large Language Model (LLM) technologies, we developed a novel ensemble learning pipeline that uses state-of-the-art LLMs, focusing on improving performance on diverse medical QA datasets.
Materials and Methods: Our study employs three medical QA datasets: PubMedQA, MedQA-USMLE, and MedMCQA, each presenting unique challenges in biomedical question-answering. The proposed LLM-Synergy framework, focusing exclusively on zero-shot cases using LLMs, incorporates two primary ensemble methods. The first is a Boosting-based weighted majority vote ensemble, where decision-making is expedited and refined by assigning variable weights to different LLMs through a boosting algorithm. The second method is Cluster-based Dynamic Model Selection, which dynamically selects the most suitable LLM votes for each query, based on the characteristics of question contexts, using a clustering approach.
Results: The Majority Weighted Vote and Dynamic Model Selection methods demonstrate superior performance compared to individual LLMs across the three medical QA datasets. Specifically, the accuracies are 35.84%, 96.21%, and 37.26% for MedMCQA, PubMedQA, and MedQA-USMLE, respectively, with the Majority Weighted Vote. Correspondingly, the Dynamic Model Selection yields slightly higher accuracies of 38.01%, 96.36%, and 38.13%.
Conclusion: The LLM-Synergy framework, with its two ensemble methods, represents a significant advancement in leveraging LLMs for medical QA tasks and provides an innovative way of efficiently utilizing developments in LLM technologies, customizable for both existing and potential future challenge tasks in biomedical and health informatics research.
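A weighted majority vote of the kind described above can be sketched as follows. This is a minimal illustration only: the model names, weights, and answers are hypothetical, and the weights are simply given rather than learned by a boosting algorithm as in the study.

```python
from collections import defaultdict


def weighted_majority_vote(votes, weights):
    """Return the answer whose supporting models have the largest total weight.

    votes   -- mapping of model name to the answer that model chose
    weights -- mapping of model name to a non-negative weight
    """
    totals = defaultdict(float)
    for model, answer in votes.items():
        totals[answer] += weights[model]
    # Sorting the candidates first makes tie-breaking deterministic (alphabetical).
    return max(sorted(totals), key=lambda answer: totals[answer])


# Hypothetical model names, weights, and answers, for illustration only.
weights = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.3}
votes = {"model_a": "yes", "model_b": "no", "model_c": "no"}
print(weighted_majority_vote(votes, weights))
```

In this toy case the two lower-weighted models together outweigh the single higher-weighted one, so their shared answer is returned.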
|
4
|
Towards Automated COVID-19 Presence and Severity Classification. Stud Health Technol Inform 2023; 302:917-921. [PMID: 37203536 DOI: 10.3233/shti230309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
COVID-19 presence classification and severity prediction via (3D) thorax computed tomography scans have become important tasks in recent times. Especially for capacity planning of intensive care units, predicting the future severity of a COVID-19 patient is crucial. The presented approach follows state-of-the-art techniques to aid medical professionals in these situations. It comprises an ensemble learning strategy via 5-fold cross-validation that includes transfer learning and combines pre-trained 3D versions of ResNet34 and DenseNet121 for COVID-19 classification and severity prediction, respectively. Further, domain-specific preprocessing was applied to optimize model performance. In addition, medical information such as the infection-lung ratio, patient age, and sex was included. The presented model achieves an AUC of 79.0% for predicting COVID-19 severity and an AUC of 83.7% for classifying the presence of an infection, which is comparable with other currently popular methods. This approach is implemented using the AUCMEDI framework and relies on well-known network architectures to ensure robustness and reproducibility.
|
5
|
Movie recommendation model based on probabilistic matrix decomposition using hybrid AdaBoost integration. PeerJ Comput Sci 2023; 9:e1338. [PMID: 37346524 PMCID: PMC10280431 DOI: 10.7717/peerj-cs.1338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/17/2023] [Indexed: 06/23/2023]
Abstract
In recent years, recommendation systems have played a significant role in major streaming video platforms. The probabilistic matrix factorization (PMF) model has advantages in addressing high-dimension problems and rating data sparsity in the recommendation system. However, in practical application, PMF has poor generalization ability and low prediction accuracy. For this reason, this article proposes the Hybrid AdaBoost Ensemble Method. Firstly, we use the membership function and the cluster center selection in fuzzy clustering to calculate the user-item scoring matrix. Secondly, the clustered user-item scoring matrix is trained by the neural network to further improve the scoring prediction accuracy. Finally, for the stability of the model, the AdaBoost integration method is introduced, and the score matrix is used as the base learner; then, the base learner is trained by different neural networks, and finally, the score prediction is obtained by voting results. In this article, we compare and analyze the performance of the proposed model on the MovieLens and FilmTrust datasets. In comparison with the PMF, FCM-PMF, Bagging-BP-PMF, and AdaBoost-SVM-PMF models, several experiments show that the mean absolute error of the proposed model increases by 1.24% and 0.79% compared with the Bagging-BP-PMF model on two different datasets, and the root-mean-square error increases by 2.55% and 1.87%, respectively. Finally, we introduce the weights of different neural network training based learners to improve the stability of the model's score prediction, which also proves the method's universality.
|
6
|
Enhancing Spam Message Classification and Detection Using Transformer-Based Embedding and Ensemble Learning. SENSORS (BASEL, SWITZERLAND) 2023; 23:3861. [PMID: 37112202 PMCID: PMC10146782 DOI: 10.3390/s23083861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 03/31/2023] [Accepted: 04/06/2023] [Indexed: 06/19/2023]
Abstract
Over the last decade, the Short Message Service (SMS) has become a primary communication channel. Nevertheless, its popularity has also given rise to so-called SMS spam. These spam messages are annoying and potentially malicious, exposing SMS users to credential theft and data loss. To mitigate this persistent threat, we propose a new model for SMS spam detection based on pre-trained Transformers and Ensemble Learning. The proposed model uses a text embedding technique that builds on the recent advancements of the GPT-3 Transformer. This technique provides a high-quality representation that can improve detection results. In addition, we used an Ensemble Learning method in which four machine learning models were grouped into one model that performed significantly better than its separate constituent parts. The experimental evaluation of the model was performed using the SMS Spam Collection Dataset. The obtained results showed state-of-the-art performance that exceeded all previous works, with an accuracy that reached 99.91%.
|
7
|
A framework for prediction of personalized pediatric nuclear medical dosimetry based on Machine Learning and Monte Carlo techniques. Phys Med Biol 2023; 68. [PMID: 36921349 DOI: 10.1088/1361-6560/acc4a5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 03/15/2023] [Indexed: 03/17/2023]
Abstract
Goal: A methodology is introduced for the development of an internal dosimetry prediction toolkit for nuclear medical pediatric applications. The proposed study exploits Artificial Intelligence techniques using Monte Carlo simulations as ground truth for accurate prediction of absorbed doses per organ considering personalized anatomical characteristics of any new pediatric patient.
Method: GATE Monte Carlo simulations were performed using a population of computational pediatric models to calculate the specific absorbed dose rates (SADRs) in several organs. A simulated dosimetry database was developed for 28 pediatric phantoms (age range 2-17 years old, both genders) and 5 different radiopharmaceuticals. Machine Learning regression models were trained on the produced simulated dataset, with Leave One Out Cross Validation for the prediction model evaluation. Hyperparameter optimization and ensemble learning techniques for a variation of input features were applied for achieving the best predictive power, leading to the development of a SADR prediction toolkit for any new pediatric patient for the studied organs and radiopharmaceuticals.
Main results: SADR values for 30 organs of interest were calculated via Monte Carlo simulations for 28 pediatric phantoms for the cases of five radiopharmaceuticals. The relative percentage uncertainty in the extracted dose values per organ was lower than 2.7%. An internal dosimetry prediction toolkit which can accurately predict SADRs in 30 organs for five different radiopharmaceuticals, with mean absolute percentage error on the level of 8% was developed, with specific focus on pediatric patients, by using Machine Learning regression algorithms, Single or Multiple organ training and Artificial Intelligence ensemble techniques.
Conclusion: A large dosimetry simulated database was developed and utilized for the training of Machine Learning models. The developed predictive models provide very fast results (< 2 sec) with an accuracy >90% with respect to the ground truth of Monte Carlo, considering personalized anatomical characteristics and the biodistribution of each radiopharmaceutical. The proposed method is applicable to other medical dosimetry applications in different patients' populations.
|
8
|
Ensemble multimodal deep learning for early diagnosis and accurate classification of COVID-19. COMPUTERS & ELECTRICAL ENGINEERING : AN INTERNATIONAL JOURNAL 2022; 103:108396. [PMID: 36160764 PMCID: PMC9485428 DOI: 10.1016/j.compeleceng.2022.108396] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2022] [Revised: 09/08/2022] [Accepted: 09/12/2022] [Indexed: 05/12/2023]
Abstract
Over the past few years, the COVID-19 pandemic has become a lethal disease. Processing the gathered samples requires extra time because early diagnosis of infected individuals relies on medical diagnostic equipment, methodologies, and clinical testing procedures. An innovative multimodal paradigm for the early diagnosis and precise categorization of COVID-19 is put forward as a solution to this issue. A chest X-ray-based model and a cough-based model are used to extract distinguishing features from the prepared chest X-ray image and cough (audio) databases. Public chest X-ray image datasets and the Coswara cough (audio) dataset, containing 92 COVID-19-positive and 1079 healthy subjects, are processed using the deep Uniform-Net and a Convolutional Neural Network (CNN). The weighted sum-rule fusion method and ensemble deep learning algorithms are then used to combine the extracted features. For the early diagnosis of patients, the framework offers an accuracy of 98.67%.
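The weighted sum-rule fusion mentioned above can be illustrated with a short sketch. It assumes, purely for illustration, that each modality's model outputs a per-class probability vector and that fusion happens at the score level; the probabilities and weights below are hypothetical and do not come from the paper.

```python
def weighted_sum_fusion(prob_sets, weights):
    """Fuse per-class probabilities from several models with a normalized weighted sum."""
    total_weight = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total_weight
        for c in range(n_classes)
    ]


# Hypothetical outputs for the classes [COVID-19 positive, healthy].
xray_probs = [0.7, 0.3]
cough_probs = [0.4, 0.6]
fused = weighted_sum_fusion([xray_probs, cough_probs], weights=[0.6, 0.4])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Dividing by the total weight keeps the fused scores on the same scale as the inputs, so the weights only need to express relative trust in each modality.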
|
9
|
Image Quality Classification for Automated Visual Evaluation of Cervical Precancer. MEDICAL IMAGE LEARNING WITH LIMITED AND NOISY DATA : FIRST INTERNATIONAL WORKSHOP, MILLAND 2022, HELD IN CONJUNCTION WITH MICCAI 2022, SINGAPORE, SEPTEMBER 22, 2022, PROCEEDINGS. MILLAND (WORKSHOP) (1ST : 2022 : SINGAPORE) 2022; 13559:206-217. [PMID: 36315110 PMCID: PMC9614805 DOI: 10.1007/978-3-031-16760-7_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/14/2022]
Abstract
Image quality control is a critical element in the process of data collection and cleaning. Both manual and automated analyses alike are adversely impacted by bad quality data. There are several factors that can degrade image quality and, correspondingly, there are many approaches to mitigate their negative impact. In this paper, we address image quality control toward our goal of improving the performance of automated visual evaluation (AVE) for cervical precancer screening. Specifically, we report efforts made toward classifying images into four quality categories ("unusable", "unsatisfactory", "limited", and "evaluable") and improving the quality classification performance by automatically identifying mislabeled and overly ambiguous images. The proposed new deep learning ensemble framework is an integration of several networks that consists of three main components: cervix detection, mislabel identification, and quality classification. We evaluated our method using a large dataset that comprises 87,420 images obtained from 14,183 patients through several cervical cancer studies conducted by different providers using different imaging devices in different geographic regions worldwide. The proposed ensemble approach achieved higher performance than the baseline approaches.
|
10
|
Optimization of Performance by Combining Most Sensitive and Specific Models in Data Science Results in Majority Voting Ensemble. Stud Health Technol Inform 2022; 294:435-439. [PMID: 35612117 DOI: 10.3233/shti220496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Ensemble modeling is an increasingly popular data science technique that combines the knowledge of multiple base learners to enhance predictive performance. In this paper, the idea was to increase predictive performance by holding out three algorithms when testing multiple classifiers: (a) the best overall performing algorithm (based on the harmonic mean of sensitivity and specificity (HMSS) of that algorithm); (b) the most sensitive model; and (c) the most specific model. This approach boils down to majority voting between the predictions of these three base learners. In this exemplary study, a case of identifying a prolonged QT interval after administering a drug-drug interaction with increased risk of QT prolongation (QT-DDI) is presented. Performance measures included accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Overall performance was measured by calculating the HMSS. Results show an increase in all performance measure characteristics compared to the original best performing algorithm, except for specificity where performance remained stable. The presented approach is fairly simple and shows potential to increase predictive performance, even without adjusting the default cut-offs to differentiate between high and low risk cases. Future research should look at a way of combining all tested algorithms, instead of using only three. Similarly, this approach should be tested on a multiclass prediction problem.
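The two ingredients described above, the harmonic mean of sensitivity and specificity (HMSS) and a majority vote over three base learners, can be sketched in a few lines. The prediction vectors are hypothetical stand-ins for the outputs of the best overall, most sensitive, and most specific models; they are not results from the study.

```python
def hmss(sensitivity, specificity):
    """Harmonic mean of sensitivity and specificity."""
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def majority_vote(*predictions):
    """Element-wise majority vote over an odd number of binary prediction lists."""
    n_models = len(predictions)
    return [1 if 2 * sum(row) > n_models else 0 for row in zip(*predictions)]


# Hypothetical binary predictions (1 = high risk) for four cases.
best_overall = [1, 0, 1, 0]
most_sensitive = [1, 1, 1, 0]
most_specific = [0, 0, 1, 0]
ensemble = majority_vote(best_overall, most_sensitive, most_specific)
print(ensemble, round(hmss(0.8, 0.6), 3))
```

With three voters, a case is flagged only when at least two models agree, so the most sensitive model cannot raise an alarm on its own and the most specific model cannot suppress one on its own.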
|
11
|
Multi-Disease Detection in Retinal Imaging Based on Ensembling Heterogeneous Deep Learning Models. Stud Health Technol Inform 2021. [PMID: 34545816 DOI: 10.3233/shti210537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Preventable or undiagnosed visual impairment and blindness affect billions of people worldwide. Automated multi-disease detection models offer great potential to address this problem via clinical decision support in diagnosis. In this work, we proposed an innovative multi-disease detection pipeline for retinal imaging which utilizes ensemble learning to combine the predictive capabilities of several heterogeneous deep convolutional neural network models. Our pipeline includes state-of-the-art strategies like transfer learning, class weighting, real-time image augmentation, and Focal loss utilization. Furthermore, we integrated ensemble learning techniques like heterogeneous deep learning models, bagging via 5-fold cross-validation, and stacked logistic regression models. Through internal and external evaluation, we were able to validate and demonstrate the high accuracy and reliability of our pipeline, as well as its comparability with other state-of-the-art pipelines for retinal disease prediction.
|
12
|
A Cross-validated Ensemble Approach to Robust Hypothesis Testing of Continuous Nonlinear Interactions: Application to Nutrition-Environment Studies. J Am Stat Assoc 2021; 117:561-573. [PMID: 36310839 PMCID: PMC9611147 DOI: 10.1080/01621459.2021.1962889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2019] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 01/03/2023]
Abstract
Gene-environment and nutrition-environment studies often involve testing of high-dimensional interactions between two sets of variables, each having potentially complex nonlinear main effects on an outcome. Construction of a valid and powerful hypothesis test for such an interaction is challenging, due to the difficulty in constructing an efficient and unbiased estimator for the complex, nonlinear main effects. In this work we address this problem by proposing a Cross-validated Ensemble of Kernels (CVEK) that learns the space of appropriate functions for the main effects using a cross-validated ensemble approach. With a carefully chosen library of base kernels, CVEK flexibly estimates the form of the main-effect functions from the data, and encourages test power by guarding against over-fitting under the alternative. The method is motivated by a study on the interaction between metal exposures in utero and maternal nutrition on children's neurodevelopment in rural Bangladesh. The proposed tests identified evidence of an interaction between minerals and vitamins intake and arsenic and manganese exposures. Results suggest that the detrimental effects of these metals are most pronounced at low intake levels of the nutrients, suggesting nutritional interventions in pregnant women could mitigate the adverse impacts of in utero metal exposures on children's neurodevelopment.
|
13
|
Multi-class Breast Cancer Classification using Ensemble of Pretrained models and Transfer Learning. Curr Med Imaging 2021; 18:409-416. [PMID: 33602102 DOI: 10.2174/1573405617666210218101418] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Revised: 12/21/2020] [Accepted: 01/06/2021] [Indexed: 02/08/2023]
Abstract
Aims: Early detection of breast cancer has prevented many deaths. Earlier CAD systems served as a second opinion for radiologists and clinicians. Machine learning and deep learning have brought tremendous changes to medical diagnosis and imaging.
Background: Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall. According to 2018 statistics, there were over 2 million cases worldwide. Belgium and Luxembourg have the highest rate of cancer.
Objective: To propose a method for breast cancer detection using ensemble learning. Both 2-class and 8-class classification are performed.
Method: To deal with imbalanced classification, the authors have proposed an ensemble of pretrained models.
Result: A training accuracy of 98.5% and a test accuracy of 89% are achieved on 8-class classification, and training and test accuracies of 99.1% and 98% are achieved on 2-class classification.
Conclusion: Class DC shows a high number of misclassifications compared with the other classes, owing to the imbalance in the dataset. In future, one can increase the size of the datasets or use different methods. To implement this research work, the authors used 2 Nvidia Tesla V100 GPUs on Google Cloud Platform.
|
14
|
CoVNet-19: A Deep Learning model for the detection and analysis of COVID-19 patients. Appl Soft Comput 2021; 104:107184. [PMID: 33613140 PMCID: PMC7883765 DOI: 10.1016/j.asoc.2021.107184] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/17/2020] [Revised: 02/01/2021] [Accepted: 02/10/2021] [Indexed: 12/20/2022]
Abstract
Background: In the ongoing fight against the novel coronavirus, quick treatment and rapid diagnostic reports have become a high priority. With millions getting infected daily and a fatality rate of 2%, we were motivated to contribute to solving this real-world problem by developing a significant and substantial method for diagnosing COVID-19 patients.
Aim: The exponential growth of COVID-19 cases worldwide has severely affected the health care systems of highly populated countries, which have proportionally fewer medical practitioners, testing kits, and other resources, making it essential to identify infected people. To address these problems, the purpose of this paper is to formulate an accurate, efficient, and time-saving method for detecting COVID-19-positive patients.
Method: In this paper, an Ensemble Deep Convolution Neural Network model, “CoVNet-19”, is proposed that can unveil important diagnostic characteristics to find COVID-19 infected patients using chest X-ray images and help radiologists and medical experts fight this pandemic.
Results: The experimental results show that the overall classification accuracy obtained with the proposed approach for three-class classification among COVID-19, Pneumonia, and Normal is 98.28%, along with an average precision and recall of 98.33% and 98.33%, respectively. In addition, for binary classification between Non-COVID and COVID chest X-ray images, an overall accuracy of 99.71% was obtained.
Conclusion: Having a high diagnostic accuracy, our proposed ensemble Deep Learning classification model can be a productive and substantial contribution to detecting COVID-19 infected patients.
|
15
|
Detection of lung cancer with electronic nose using a novel ensemble learning framework. J Breath Res 2021; 15. [PMID: 33578407 DOI: 10.1088/1752-7163/abe5c9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Accepted: 02/12/2021] [Indexed: 02/02/2023]
Abstract
Breath analysis based on electronic nose (e-nose) is a promising new technology for the detection of lung cancer that is non-invasive, simple to operate and cost-effective. Lung cancer screening by e-nose relies on predictive models established using machine learning methods. However, using only a single machine learning method to detect lung cancer has some disadvantages, including low detection accuracy and high false negative rate. To address these problems, groups of individual learning models with excellent performance were selected from classic models, including Support Vector Machine, Decision Tree, Random Forest, Logistic Regression and K-nearest neighbor regression, to build an ensemble learning framework (PCA-SVE). The output result of the PCA-SVE framework was obtained by voting. To test this approach, we analyzed 214 breath samples measured by e-nose with 11 gas sensors of four types using the proposed PCA-SVE framework. Experimental results indicated that the accuracy, sensitivity, and specificity of the proposed framework were 95.75%, 94.78%, and 96.96%, respectively. This framework overcomes the disadvantages of a single model, thereby providing an improved, practical alternative for exhaled breath analysis by e-nose.
|
16
|
Remote Patient Monitoring Using Radio Frequency Identification (RFID) Technology and Machine Learning for Early Detection of Suicidal Behaviour in Mental Health Facilities. SENSORS 2021; 21:s21030776. [PMID: 33498893 PMCID: PMC7865785 DOI: 10.3390/s21030776] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/21/2020] [Revised: 01/20/2021] [Accepted: 01/20/2021] [Indexed: 11/24/2022]
Abstract
Remote Patient Monitoring (RPM) has gained great popularity as a way to measure vital signs and gather patient-related information in clinics. RPM can be achieved with noninvasive digital technology without hindering a patient’s daily activities and can enhance the efficiency of healthcare delivery in acute clinical settings. In this study, an RPM system was built using radio frequency identification (RFID) technology for early detection of suicidal behaviour in a hospital-based mental health facility. A range of machine learning models, such as Linear Regression, Decision Tree, Random Forest, and XGBoost, were investigated to help determine the optimum fixed positions of RFID reader–antennas in a simulated hospital ward. Empirical experiments showed that Decision Tree had the best performance compared to the Random Forest and XGBoost models. An Ensemble Learning model was also developed that took advantage of these machine learning models based on their individual performance. The research sets a path to analyse dynamic moving RFID tags and build an RPM system to help retrieve patient vital signs such as heart rate, pulse rate, respiration rate, and subtle motions, to make this research state-of-the-art in terms of managing acute suicidal and self-harm behaviour in a mental health ward.
|
17
|
Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2020; 2020. [PMID: 34278382 DOI: 10.1145/3388440.3412462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Single cell RNA sequencing is a powerful technique that measures the gene expression of individual cells in a high throughput fashion. However, because of sequencing inefficiency, the data is unreliable due to dropout events, which are technical artifacts where genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet, effective imputation can be difficult and biased because the data is sparse and high-dimensional, resulting in major distortions in downstream analyses. In this paper, we propose a completely novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, and graphical model estimation.
|
18
|
A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing. Mol Inform 2020; 39:e1900062. [PMID: 32003548 DOI: 10.1002/minf.201900062] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/31/2019] [Accepted: 01/28/2020] [Indexed: 01/19/2023]
Abstract
Drug-target interaction (DTI) plays a crucial role in drug discovery, drug repositioning, and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth in genomic and drug data makes it difficult to identify new associations between drugs and targets. Therefore, we use computational methods, which help accelerate the DTI identification process. Usually, available data sources consisting of known DTIs are used to train a classifier to predict new DTIs. Such datasets often suffer from class imbalance. Therefore, in this study we address two challenges faced by such datasets, i.e., class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study is carried out on four protein classes, namely Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR), and Nuclear Receptor. We encoded the target protein sequence using the dipeptide composition and the drug with a molecular descriptor. A machine learning approach is employed to predict the DTI using wrapper feature selection and the synthetic minority oversampling technique (SMOTE). At best, the ensemble approach achieved accuracies of 95.9%, 93.4%, 90.8%, and 90.6% and precisions of 96.3%, 92.8%, 90.1%, and 90.2% on the Enzyme, Ion Channel, GPCR, and Nuclear Receptor datasets, respectively, when evaluated with 10-fold cross-validation excluding SMOTE samples. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. Features selected by the wrapper method may be important for understanding the DTI for the protein categories in this study. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions.
We provide readers with a standalone package, available at https://github.com/shwetagithub1/predDTI, which provides DTI predictions for new query DTI pairs.
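The dipeptide composition encoding mentioned in the abstract can be illustrated with a minimal sketch. This is not taken from the predDTI package: the function name `dipeptide_composition`, the 20-letter amino acid alphabet, and the choice to skip pairs containing non-standard residues are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); assumed alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order that defines the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Hypothetical helper: each entry is the fraction of overlapping
    residue pairs in the sequence that equal the given dipeptide.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short, or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For a sequence made only of standard residues, the denominator equals the sequence length minus one, which matches the usual definition of dipeptide composition. The resulting fixed-length vector is what makes variable-length protein sequences usable as classifier input.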
|
19
|
Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods. J Appl Stat 2019; 46:2216-2236. [PMID: 32843815 PMCID: PMC7444746 DOI: 10.1080/02664763.2019.1582614] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/02/2018] [Accepted: 02/08/2019] [Indexed: 02/06/2023]
Abstract
The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of SL in its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
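The cross-validated selection step described in this abstract can be sketched as follows. This is a simplified "discrete" variant that picks the single candidate with the lowest cross-validated negative log-likelihood; it is not the authors' implementation, and the two toy candidate learners (`fit_marginal`, `fit_stratified`) and the function `discrete_super_learner` are hypothetical names introduced only for illustration.

```python
import numpy as np


def neg_log_likelihood(y, p, eps=1e-12):
    """Mean Bernoulli negative log-likelihood of predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_marginal(X, y):
    """Toy candidate: predict the overall treatment rate for everyone."""
    rate = y.mean()
    return lambda X_new: np.full(len(X_new), rate)


def fit_stratified(X, y):
    """Toy candidate: treatment rate within each level of the first
    (binary) covariate, with add-one smoothing."""
    rates = {}
    for level in (0, 1):
        mask = X[:, 0] == level
        rates[level] = (y[mask].sum() + 1) / (mask.sum() + 2)
    return lambda X_new: np.where(X_new[:, 0] == 1, rates[1], rates[0])


def discrete_super_learner(X, y, library, n_folds=5, seed=0):
    """Select the candidate with the lowest cross-validated risk.

    library maps a name to a fit function returning a predict function.
    Returns (best name, per-candidate CV risk, best model refit on all data).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_risk = {}
    for name, fit in library.items():
        losses = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k]
            )
            predict = fit(X[train], y[train])
            losses.append(neg_log_likelihood(y[test], predict(X[test])))
        cv_risk[name] = float(np.mean(losses))
    best = min(cv_risk, key=cv_risk.get)
    return best, cv_risk, library[best](X, y)
```

Here `y` plays the role of the binary treatment indicator, so the selected model's predictions are propensity score estimates. The full Super Learner typically fits a weighted combination of the candidates rather than choosing one; the selection-only version is shown because it is the shortest correct instance of the cross-validation idea.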
|
20
|
The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification. J Appl Stat 2018; 45:2800-2818. [PMID: 31631918 PMCID: PMC6800663 DOI: 10.1080/02664763.2018.1441383] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 04/05/2017] [Accepted: 02/11/2018] [Indexed: 10/17/2022]
Abstract
Artificial neural networks have been successfully applied to a variety of machine learning tasks, including image recognition, semantic segmentation, and machine translation. However, few studies have fully investigated ensembles of artificial neural networks. In this work, we investigated multiple widely used ensemble methods, including unweighted averaging, majority voting, the Bayes Optimal Classifier, and the (discrete) Super Learner, for image recognition tasks, with deep neural networks as candidate algorithms. We designed several experiments in which the candidate algorithms were the same network structure at different model checkpoints within a single training process, networks with the same structure trained multiple times stochastically, and networks with different structures. In addition, we studied the over-confidence phenomenon of neural networks and its impact on the ensemble methods. Across all of our experiments, the Super Learner achieved the best performance among the ensemble methods in this study.
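Two of the ensemble methods named above, unweighted averaging and majority voting, can disagree when one network is over-confident. The following is a minimal sketch of both rules; the function names and the `(n_models, n_samples, n_classes)` array layout are assumptions for illustration, not the authors' code.

```python
import numpy as np


def unweighted_average(probs):
    """Average class probabilities across models, then take the argmax.

    probs has shape (n_models, n_samples, n_classes).
    """
    return probs.mean(axis=0).argmax(axis=1)


def majority_vote(probs):
    """Each model votes for its top class; the most-voted class wins.

    Ties are broken in favor of the lowest class index.
    """
    votes = probs.argmax(axis=2)  # shape (n_models, n_samples)
    n_samples, n_classes = probs.shape[1], probs.shape[2]
    counts = np.zeros((n_samples, n_classes), dtype=int)
    rows = np.arange(n_samples)
    for model_votes in votes:
        counts[rows, model_votes] += 1
    return counts.argmax(axis=1)


# One sample, two classes: one very confident model against two mild ones.
probs = np.array([
    [[0.99, 0.01]],
    [[0.40, 0.60]],
    [[0.40, 0.60]],
])
avg_label = unweighted_average(probs)   # confident model dominates the mean
vote_label = majority_vote(probs)       # two of three models win the vote
```

In this toy case averaging follows the single confident model while voting follows the majority, which is one way over-confidence can affect an ensemble's output.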
|
21
|
Abstract
BACKGROUND: With growing interest in identifying and extracting chemical entities from academic articles, many approaches have been proposed to solve this problem. In this work we describe a probabilistic framework that allows the output of multiple information extraction systems to be combined in a systematic way. The identified entities are assigned a probability score that reflects the extractors' confidence, without the need for each individual extractor to generate a probability score. We quantitatively compared the performance of multiple chemical tokenizers to measure the effect of tokenization on extraction accuracy. Later, a single Conditional Random Fields (CRF) extractor that utilizes the best-performing tokenizer is built using a unique collection of features such as word embeddings and Soundex codes, which, to the best of our knowledge, have not been explored in this context before.
RESULTS: The ensemble of multiple extractors outperformed each extractor's individual performance during the CHEMDNER challenge. When the runs were optimized to favor recall, the ensemble approach achieved the second-highest recall on unseen entities. As for the single CRF model with novel features, the extractor achieves an F1 score of 83.3% on the test set, without any post-processing or abbreviation matching.
CONCLUSIONS: Ensemble information extraction is effective when multiple standalone extractors are to be used, and produces higher performance than individual off-the-shelf extractors. The novel features introduced in the single CRF model are sufficient to achieve a very competitive F1 score using a simple standalone extractor.
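To make the Soundex feature concrete, here is a minimal sketch of the standard American Soundex code, which maps similar-sounding tokens to the same four-character key. The abstract does not say which Soundex variant was used or how the code was fed to the CRF, so this is an illustrative assumption rather than the authors' feature extractor.

```python
def soundex(word):
    """Return the four-character American Soundex code of a word.

    Illustrative helper: non-letter characters are ignored, and an
    empty string is returned when the word contains no letters.
    """
    codes = {}
    groups = (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels have no code, so they reset prev and allow a repeat.
        prev = code
    # Pad with zeros or truncate to exactly four characters.
    return (result + "000")[:4]
```

Under this scheme `soundex("Robert")` and `soundex("Rupert")` both produce `R163`, so a token-level feature built from the code can group spelling variants of the same name.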
|