301
|
Ecological analysis of gamasid mites on the body surface of Norway rats (Rattus norvegicus) in Yunnan Province, Southwest China. Biologia (Bratisl) 2019. [DOI: 10.2478/s11756-019-00383-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
|
302
|
Sohail A, Arif F. Supervised and unsupervised algorithms for bioinformatics and data science. Progress in Biophysics and Molecular Biology 2019; 151:14-22. [PMID: 31816343 DOI: 10.1016/j.pbiomolbio.2019.11.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/26/2019] [Revised: 09/25/2019] [Accepted: 11/27/2019] [Indexed: 01/16/2023]
Abstract
Bioinformatics is a vast, ever-evolving field of research built on millions of algorithms designed for a variety of data banks. Such algorithms are either supervised or unsupervised. In this article, a detailed overview of supervised and unsupervised techniques is presented with the aid of examples. The aim of this article is to give readers a basic understanding of state-of-the-art models, which are key ingredients of explainable machine learning in the field of bioinformatics.
Affiliation(s)
- Ayesha Sohail
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, 54000, Pakistan.
| | - Fatima Arif
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, 54000, Pakistan
| |
|
303
|
Alexander J, Edwards RA, Manca L, Grugni R, Bonfanti G, Emir B, Whalen E, Watt S, Brodsky M, Parsons B. Integrating Machine Learning With Microsimulation to Classify Hypothetical, Novel Patients for Predicting Pregabalin Treatment Response Based on Observational and Randomized Data in Patients With Painful Diabetic Peripheral Neuropathy. Pragmat Obs Res 2019; 10:67-76. [PMID: 31802967 PMCID: PMC6827520 DOI: 10.2147/por.s214412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/04/2019] [Accepted: 10/15/2019] [Indexed: 11/23/2022] Open
Abstract
Purpose Variability in patient treatment responses can be a barrier to effective care. Utilization of available patient databases may improve the prediction of treatment responses. We evaluated machine learning methods to predict novel, individual patient responses to pregabalin for painful diabetic peripheral neuropathy, utilizing an agent-based modeling and simulation platform that integrates real-world observational study (OS) data and randomized clinical trial (RCT) data. Patients and methods The best supervised machine learning methods were selected (through literature review) and combined in a novel way for aligning patients with relevant subgroups that best enable prediction of pregabalin responses. Data were derived from a German OS of pregabalin (N=2642) and nine international RCTs (N=1320). Coarsened exact matching of OS and RCT patients was used and a hierarchical cluster analysis was implemented. We tested which machine learning methods would best align candidate patients with specific clusters that predict their pain scores over time. Cluster alignments would trigger assignments of cluster-specific time-series regressions with lagged variables as inputs in order to simulate "virtual" patients and generate 1000 trajectory variations for given novel patients. Results Instance-based machine learning methods (k-nearest neighbor, supervised fuzzy c-means) were selected for quantitative analyses. Each method alone correctly classified 56.7% and 39.1% of patients, respectively. An "ensemble method" (combining both methods) correctly classified 98.4% and 95.9% of patients in the training and testing datasets, respectively. Conclusion An ensemble combination of two instance-based machine learning techniques best accommodated different data types (dichotomous, categorical, continuous) and performed better than either technique alone in assigning novel patients to subgroups for predicting treatment outcomes using microsimulation. 
Assignment of novel patients to a cluster of similar patients has the potential to improve prediction of patient outcomes for chronic conditions in which initial treatment response can be incorporated using microsimulation. Clinical trial registries www.clinicaltrials.gov: NCT00156078, NCT00159679, NCT00143156, NCT00553475.
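As a rough illustration of the instance-based classification this abstract describes, the sketch below implements a plain k-nearest-neighbor vote. It is not the authors' implementation: the function name `knn_predict`, the toy feature vectors, the cluster labels, and the use of Euclidean distance are all assumptions made for illustration, and the fuzzy c-means and ensemble steps are not shown.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Assign `query` to the majority label among its k nearest training points.

    `train` is a list of (features, label) pairs. Distance is Euclidean,
    which is an assumption; the abstract does not state the metric used.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical patients: (baseline pain score, age in decades) -> cluster label
patients = [
    ((7.0, 5.5), "cluster_A"),
    ((6.5, 6.0), "cluster_A"),
    ((7.2, 5.8), "cluster_A"),
    ((3.0, 4.0), "cluster_B"),
    ((2.5, 4.5), "cluster_B"),
    ((3.2, 3.8), "cluster_B"),
]

novel_patient = (6.8, 5.7)
print(knn_predict(patients, novel_patient))
```

With mixed dichotomous, categorical, and continuous inputs such as those mentioned in the abstract, the distance function would need to be chosen with more care than this two-feature toy suggests.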
Affiliation(s)
- Joe Alexander
- Global Medical Affairs, Pfizer Inc, New York, NY 10017, USA
| | - Roger A Edwards
- Health Services Consulting Corporation, Boxborough, MA 01719, USA
| | | | | | | | - Birol Emir
- Global Statistics, Pfizer Inc, New York, NY 10017, USA
| | - Ed Whalen
- Global Statistics, Pfizer Inc, New York, NY 10017, USA
| | - Steve Watt
- Global Medical Affairs, Pfizer Inc, New York, NY 10017, USA
| | - Marina Brodsky
- Global Medical Affairs, Pfizer Inc, Groton, CT 06340, USA
| | - Bruce Parsons
- Global Medical Product Evaluation, Pfizer Inc, New York, NY 10017, USA
| |
|
304
|
Predicting Hepatitis B Virus Infection Based on Health Examination Data of Community Population. International Journal of Environmental Research and Public Health 2019; 16:ijerph16234842. [PMID: 31810204 PMCID: PMC6926879 DOI: 10.3390/ijerph16234842] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/01/2019] [Revised: 11/26/2019] [Accepted: 11/27/2019] [Indexed: 01/14/2023]
Abstract
Despite a decline in the prevalence of hepatitis B in China, the disease burden remains high. Large populations unaware of infection risk often fail to meet the ideal treatment window, resulting in poor prognosis. The purpose of this study was to develop and evaluate models identifying high-risk populations who should be tested for hepatitis B surface antigen. Data came from a large community-based health screening, including 97,173 individuals, with an average age of 54.94. A total of 33 indicators were collected as model predictors, including demographic characteristics, routine blood indicators, and liver function. Borderline-Synthetic minority oversampling technique (SMOTE) was conducted to preprocess the data and then four predictive models, namely, the extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and logistic regression (LR) algorithms, were developed. The positive rate of hepatitis B surface antigen (HBsAg) was 8.27%. The area under the receiver operating characteristic curves for XGBoost, RF, DT, and LR models were 0.779, 0.752, 0.619, and 0.742, respectively. The Borderline-SMOTE XGBoost combined model outperformed the other models, which correctly predicted 13,637/19,435 cases (sensitivity 70.8%, specificity 70.1%), and the variable importance plot of XGBoost model indicated that age was of high importance. The prediction model can be used to accurately identify populations at high risk of hepatitis B infection that should adopt timely appropriate medical treatment measures.
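The figures reported in this abstract can be cross-checked with a little arithmetic. Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so the sketch below compares the reported 13,637/19,435 correct predictions with the value implied by the reported sensitivity, specificity, and HBsAg positive rate. It assumes, without confirmation from the source, that the positive rate in the evaluation set matches the overall 8.27%; the function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(sensitivity, specificity, prevalence):
    """Accuracy = sensitivity * P(positive) + specificity * P(negative)."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Values taken from the abstract
reported_accuracy = 13_637 / 19_435

# Assumption: evaluation-set prevalence equals the overall HBsAg positive rate
implied_accuracy = overall_accuracy(
    sensitivity=0.708, specificity=0.701, prevalence=0.0827
)

print(f"reported: {reported_accuracy:.3f}, implied: {implied_accuracy:.3f}")
```

Under that assumption the two values agree to about 0.70, which suggests the reported counts and rates are mutually consistent.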
|
305
|
Nagaraj SB, Sidorenkov G, van Boven JFM, Denig P. Predicting short- and long-term glycated haemoglobin response after insulin initiation in patients with type 2 diabetes mellitus using machine-learning algorithms. Diabetes Obes Metab 2019; 21:2704-2711. [PMID: 31453664 PMCID: PMC6899933 DOI: 10.1111/dom.13860] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/25/2019] [Revised: 07/30/2019] [Accepted: 08/20/2019] [Indexed: 01/04/2023]
Abstract
AIM To assess the potential of supervised machine-learning techniques to identify clinical variables for predicting short-term and long-term glycated haemoglobin (HbA1c) response after insulin treatment initiation in patients with type 2 diabetes mellitus (T2DM). MATERIALS AND METHODS We included patients with T2DM from the Groningen Initiative to Analyse Type 2 diabetes Treatment (GIANTT) database who started insulin treatment between 2007 and 2013 and had a minimum follow-up of 2 years. Short- and long-term responses at 6 (±2) and 24 (±2) months after insulin initiation, respectively, were assessed. Patients were defined as good responders if they had a decrease in HbA1c ≥ 5 mmol/mol or reached the recommended level of HbA1c ≤ 53 mmol/mol. Twenty-four baseline clinical variables were used for the analysis and an elastic net regularization technique was used for variable selection. The performance of three traditional machine-learning algorithms was compared for the prediction of short- and long-term responses and the area under the receiver-operating characteristic curve (AUC) was used to assess the performance of the prediction models. RESULTS The elastic net regularization-based generalized linear model, which included baseline HbA1c and estimated glomerular filtration rate, correctly classified short- and long-term HbA1c response after treatment initiation, with AUCs of 0.80 (95% CI 0.78-0.83) and 0.81 (95% CI 0.79-0.84), respectively, and outperformed the other machine-learning algorithms. Using baseline HbA1c alone, AUCs of 0.71 (95% CI 0.65-0.73) and 0.72 (95% CI 0.66-0.75) were obtained for predicting short-term and long-term response, respectively. CONCLUSIONS Machine-learning algorithms performed well in the prediction of an individual's short-term and long-term HbA1c response using baseline clinical variables.
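The good-responder definition in this abstract is a simple two-part rule, and it can be written out directly. The sketch below encodes it as stated (a decrease of at least 5 mmol/mol, or a follow-up value of at most 53 mmol/mol). The function name `is_good_responder` and the example values are hypothetical, and the sketch does not reproduce the authors' handling of measurement windows or missing data.

```python
def is_good_responder(baseline_hba1c, followup_hba1c):
    """Apply the abstract's good-responder rule to one patient.

    Both values are in mmol/mol. A patient counts as a good responder if
    HbA1c fell by at least 5 mmol/mol or the follow-up value is at most
    53 mmol/mol.
    """
    decrease = baseline_hba1c - followup_hba1c
    return decrease >= 5 or followup_hba1c <= 53


# Hypothetical (baseline, follow-up) pairs for illustration only
examples = [(70, 64), (55, 52), (70, 68)]
for baseline, followup in examples:
    print(baseline, followup, is_good_responder(baseline, followup))
```

Because the two criteria are joined by "or", a patient who starts close to target can qualify with a small absolute change, which is worth keeping in mind when reading the reported AUCs.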
Affiliation(s)
- Sunil B. Nagaraj
- Department of Clinical Pharmacy and Pharmacology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Grigory Sidorenkov
- Department of Clinical Pharmacy and Pharmacology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Department of Epidemiology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Job F. M. van Boven
- Department of Clinical Pharmacy and Pharmacology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Petra Denig
- Department of Clinical Pharmacy and Pharmacology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| |
|
306
|
A Review of Methodological Approaches for Developing Diagnostic Algorithms for Diabetes Screening. J Nurs Meas 2019; 27:433-457. [PMID: 31871284 DOI: 10.1891/1061-3749.27.3.433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
BACKGROUND AND PURPOSE Diagnostic algorithms are invaluable tools for diabetes screening. This review aimed to evaluate and identify the most robust methodological approaches for developing diagnostic algorithms for diabetes screening. METHODS Following a literature search, the methodological quality of algorithm development studies was evaluated using the TRIPOD guidelines (Collins, Reitsma, Altman, & Moons, 2015). RESULTS Methods used for developing the algorithms included logistic regression models, classification and regression trees, Random Forest and TreeNet, Artificial Neural Networks, and Naïve Bayes. Methodological issues in algorithm development studies related to the handling of missing values, reporting of recruitment methods, categorization of continuous variables, and statistical controls. CONCLUSIONS Most studies exhibited critical methodological flaws and poor adherence to reporting standards. Diabetes screening algorithms can easily be made available electronically and used by nurses at minimal cost, even in underserved areas.
|
307
|
Neto C, Brito M, Lopes V, Peixoto H, Abelha A, Machado J. Application of Data Mining for the Prediction of Mortality and Occurrence of Complications for Gastric Cancer Patients. Entropy 2019. [PMCID: PMC7514508 DOI: 10.3390/e21121163] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/29/2023]
Abstract
The development of malignant cells that can grow in any part of the stomach, known as gastric cancer, is one of the most common causes of death worldwide. To increase the survival rate of patients with this condition, it is essential to improve the decision-making process so that treatment strategies are selected better and more efficiently. With the large amount of information now held by hospital institutions, it is possible to use data mining algorithms to improve healthcare delivery. Thus, this study, using the CRISP methodology, aims to predict not only the mortality associated with this disease, but also the occurrence of any complication following surgery. A set of classification models was tested and compared in order to improve prediction accuracy. The study showed that, on the one hand, the J48 algorithm using oversampling is the best technique for predicting mortality in gastric cancer patients, with an accuracy of approximately 74%. On the other hand, the random forest algorithm using oversampling gives the best results when predicting the possible occurrence of complications among gastric cancer patients after their in-hospital stays, with an accuracy of approximately 83%.
Affiliation(s)
- Cristiana Neto
- Algoritmi Research Center, University of Minho, 4710-057 Braga, Portugal
| | - Maria Brito
- Algoritmi Research Center, University of Minho, 4710-057 Braga, Portugal
| | - Vítor Lopes
- São João Hospital Center, 4200-319 Porto, Portugal
| | - Hugo Peixoto
- Algoritmi Research Center, University of Minho, 4710-057 Braga, Portugal
| | - António Abelha
- Algoritmi Research Center, University of Minho, 4710-057 Braga, Portugal
| | - José Machado
- Algoritmi Research Center, University of Minho, 4710-057 Braga, Portugal
| |
|
308
|
Mahabub A. A robust voting approach for diabetes prediction using traditional machine learning techniques. SN Applied Sciences 2019. [DOI: 10.1007/s42452-019-1759-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/05/2023] Open
|
309
|
Kearney E, Wojcik A, Babu D. Artificial intelligence in genetic services delivery: Utopia or apocalypse? J Genet Couns 2019; 29:8-17. [PMID: 31749317 DOI: 10.1002/jgc4.1192] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/03/2019] [Revised: 10/26/2019] [Accepted: 10/28/2019] [Indexed: 11/08/2022]
Abstract
Artificial intelligence (AI) technologies have a long history, with increasing presence and potential in society and medicine. Much of the medical literature is highly optimistic about AI and machine learning, but fears also exist that healthcare professionals will be replaced by machines. AI remains mysterious for many practitioners, so this paper aims to unwind both hype and fear related to the technology for genetics professionals. After an historical introduction to AI in understandable and practical terms, we review its limitations. Building upon this foundation, we discuss current AI applications in medicine, including genomics and genetic counseling, offering grounded ideas about the impact and role of AI in genetic counseling and delivery of genetic services. Since AI is already being used in genomics today, now is the time to fundamentally understand what it is, how it is being used, what its limitations are, and how it will continue to be integrated into genetics as we look ahead.
|
310
|
You Y, Doubova SV, Pinto-Masis D, Pérez-Cuevas R, Borja-Aburto VH, Hubbard A. Application of machine learning methodology to assess the performance of DIABETIMSS program for patients with type 2 diabetes in family medicine clinics in Mexico. BMC Med Inform Decis Mak 2019; 19:221. [PMID: 31718638 PMCID: PMC6852791 DOI: 10.1186/s12911-019-0950-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/01/2019] [Accepted: 10/25/2019] [Indexed: 12/05/2022] Open
Abstract
Background The study aimed to assess the performance of a multidisciplinary-team diabetes care program called DIABETIMSS on glycemic control of type 2 diabetes (T2D) patients, by using available observational patient data and machine-learning-based targeted learning methods. Methods We analyzed electronic health records and laboratory databases from 2012 to 2016 for T2D patients from six family medicine clinics (FMCs) delivering the DIABETIMSS program and five FMCs providing routine care. All FMCs belong to the Mexican Institute of Social Security and are in Mexico City and the State of Mexico. The primary outcome was glycemic control. The study covariates included patient sex, age, anthropometric data, history of glycemic control, diabetic complications and comorbidity. We measured the effects of the DIABETIMSS program through 1) simple unadjusted mean differences; 2) differences adjusted via standard logistic regression; and 3) differences adjusted via targeted machine learning. We treated the data as a serial cross-sectional study, conducted a standard principal components analysis to explore the distribution of covariates among clinics, and performed a regression tree on data transformed to use the prediction model to identify patient sub-groups in whom the program was most successful. To explore the robustness of the machine learning approaches, we conducted a set of simulations and a sensitivity analysis with process-of-care indicators as possible confounders. Results The study included 78,894 T2D patients, of whom 37,767 received care through DIABETIMSS. The impact of DIABETIMSS ranged, among clinics, from 2 to 8% improvement in glycemic control, with an overall (pooled) estimate of 5% improvement. T2D patients with fewer complications derive greater benefit from DIABETIMSS than those with more complications. At the FMCs delivering the conventional model, the predicted impacts were similar to those observed empirically in the DIABETIMSS clinics.
The sensitivity analysis did not change the overall average estimate across clinics. Conclusions The DIABETIMSS program produced a small but significant improvement in glycemic control. The use of machine learning methods both yields population-level effects and pinpoints the sub-groups of patients the program benefits the most. These methods exploit the potential of routine observational patient data within complex healthcare systems to inform decision-makers.
Affiliation(s)
- Yue You
- Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, USA
| | - Svetlana V Doubova
- Epidemiology and Health Services Research Unit, CMN Siglo XXI, Mexican Institute of Social Security, Av. Cuauhtemoc 330, Col. Doctores, Mexico City, Mexico.
| | - Diana Pinto-Masis
- Interamerican Development Bank, 1300 New York Ave NW, Washington DC, 20577E, USA
| | | | | | - Alan Hubbard
- Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, USA
| |
|
311
|
Liu Y, Ye S, Xiao X, Sun C, Wang G, Wang G, Zhang B. Machine Learning For Tuning, Selection, And Ensemble Of Multiple Risk Scores For Predicting Type 2 Diabetes. Risk Manag Healthc Policy 2019; 12:189-198. [PMID: 31807099 PMCID: PMC6842709 DOI: 10.2147/rmhp.s225762] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/02/2019] [Accepted: 10/08/2019] [Indexed: 12/31/2022] Open
Abstract
Background This study proposes the use of machine learning algorithms to improve the accuracy of type 2 diabetes predictions using non-invasive risk score systems. Methods We evaluated and compared the prediction accuracies of existing non-invasive risk score systems using the data from the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals: A Longitudinal Study). Two simple risk scores were established on the basis of logistic regression. Machine learning techniques (ensemble methods) were used to improve prediction accuracies by combining the individual score systems. Results Existing score systems from Western populations generally performed worse than the scores from Eastern populations. The two newly established score systems performed better than most existing score systems but slightly worse than the Chinese score system. Using ensemble methods with model selection algorithms yielded better prediction accuracy than all the simple score systems. Conclusion Our proposed machine learning methods can be used to improve the accuracy of screening for undiagnosed type 2 diabetes and identifying high-risk patients.
Affiliation(s)
- Yujia Liu
- Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China
| | - Shangyuan Ye
- Department of Population Medicine, Harvard Pilgrim Health Care and Harvard Medical School, Boston, MA, USA
| | - Xianchao Xiao
- Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China
| | - Chenglin Sun
- Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China
| | - Gang Wang
- Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China
| | - Guixia Wang
- Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People's Republic of China
| | - Bo Zhang
- Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
| |
|
312
|
Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial Intelligence Applications in Type 2 Diabetes Mellitus Care: Focus on Machine Learning Methods. Healthc Inform Res 2019; 25:248-261. [PMID: 31777668 PMCID: PMC6859270 DOI: 10.4258/hir.2019.25.4.248] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/14/2019] [Revised: 10/06/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022] Open
Abstract
Objectives The incidence of type 2 diabetes mellitus has increased significantly in recent years. As artificial intelligence applications in healthcare have developed, they have come to be used for diagnosis, therapeutic decision making, and outcome prediction, especially in type 2 diabetes mellitus. This study aimed to identify the artificial intelligence (AI) applications for type 2 diabetes mellitus care. Methods This is a review conducted in 2018. We searched the PubMed, Web of Science, and Embase scientific databases, based on a combination of related MeSH terms. The article selection process was based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Finally, 31 articles were selected after inclusion and exclusion criteria were applied. Data were gathered using a data extraction form, then summarized and reported based on the study objectives. Results The main applications of AI for type 2 diabetes mellitus care were screening and diagnosis in different stages. Among all of the reviewed AI methods, machine learning methods were the most commonly applied techniques, at 71% (n = 22). Many applications took multi-method forms (23%). Among the applications of machine learning algorithms, support vector machine (21%) and naive Bayesian (19%) were the most commonly used methods. The most important variables used in the selected studies were body mass index, fasting blood sugar, blood pressure, HbA1c, triglycerides, low-density lipoprotein, high-density lipoprotein, and demographic variables. Conclusions It is recommended to select optimal algorithms by testing various techniques. Support vector machine and naive Bayesian might achieve better performance than other applications due to the type of variables and targets in diabetes-related outcomes classification.
Affiliation(s)
- Shahabeddin Abhari
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Sharareh R Niakan Kalhori
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Mehdi Ebrahimi
- Department of Internal Medicine, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hajar Hasannejadasl
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Ali Garavand
- Department of Health Information Management and Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| |
|
313
|
Fiorini S, Hajati F, Barla A, Girosi F. Predicting diabetes second-line therapy initiation in the Australian population via time span-guided neural attention network. PLoS One 2019; 14:e0211844. [PMID: 31626666 PMCID: PMC6799900 DOI: 10.1371/journal.pone.0211844] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/22/2019] [Accepted: 09/18/2019] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION The first line of treatment for people with Diabetes mellitus is metformin. However, over the course of the disease metformin may fail to achieve appropriate glycemic control, and a second-line therapy may become necessary. In this paper we introduce Tangle, a time span-guided neural attention model that can accurately and timely predict the upcoming need for a second-line diabetes therapy from administrative data in the Australian adult population. The method is suitable for designing automatic therapy review recommendations for patients and their providers without the need to collect clinical measures. DATA We analyzed seven years of de-identified records (2008-2014) of the 10% publicly available linked sample of Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) electronic databases of Australia. METHODS By design, Tangle inherits the representational power of pre-trained word embedding, such as GloVe, to encode sequences of claims with the related MBS codes. Moreover, the proposed attention mechanism natively exploits the information hidden in the time span between two successive claims (measured in number of days). We compared the proposed method against state-of-the-art sequence classification methods. RESULTS Tangle outperforms state-of-the-art recurrent neural networks, including attention-based models. In particular, when the proposed time span-guided attention strategy is coupled with pre-trained embedding methods, the model performance reaches an Area Under the ROC Curve of 90%, an improvement of almost 10 percentage points over an attentionless recurrent architecture. IMPLEMENTATION Tangle is implemented in Python using Keras and it is hosted on GitHub at https://github.com/samuelefiorini/tangle.
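Several entries in this list, including this one, report performance as an Area Under the ROC Curve. As a reminder of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. It is a generic illustration and not code from the Tangle repository; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and true labels (1 = second-line therapy needed)
scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]

print(round(roc_auc(scores, labels), 3))
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based formulations are the usual choice for large claim datasets.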
Affiliation(s)
| | - Farshid Hajati
- School of Information Technology and Engineering, MIT Sydney, Sydney, New South Wales, Australia
- Translational Health Research Institute, Western Sydney University, Penrith, New South Wales, Australia
- Capital Markets CRC, Sydney, New South Wales, Australia
| | - Annalisa Barla
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy
| | - Federico Girosi
- School of Information Technology and Engineering, MIT Sydney, Sydney, New South Wales, Australia
- Translational Health Research Institute, Western Sydney University, Penrith, New South Wales, Australia
- Capital Markets CRC, Sydney, New South Wales, Australia
- Digital Health CRC, Sydney, New South Wales, Australia
| |
|
314
|
Lai H, Huang H, Keshavjee K, Guergachi A, Gao X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr Disord 2019; 19:101. [PMID: 31615566 PMCID: PMC6794897 DOI: 10.1186/s12902-019-0436-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 12/23/2018] [Accepted: 09/30/2019] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Diabetes Mellitus is an increasingly prevalent chronic disease characterized by the body's inability to metabolize glucose. The objective of this study was to build an effective predictive model with high sensitivity and selectivity to better identify Canadian patients at risk of having Diabetes Mellitus based on patient demographic data and the laboratory results during their visits to medical facilities. METHODS Using the most recent records of 13,309 Canadian patients aged between 18 and 90 years, along with their laboratory information (age, sex, fasting blood glucose, body mass index, high-density lipoprotein, triglycerides, blood pressure, and low-density lipoprotein), we built predictive models using Logistic Regression and Gradient Boosting Machine (GBM) techniques. The area under the receiver operating characteristic curve (AROC) was used to evaluate the discriminatory capability of these models. We used the adjusted threshold method and the class weight method to improve sensitivity, that is, the proportion of Diabetes Mellitus patients correctly predicted by the model. We also compared these models to other machine learning techniques such as Decision Tree and Random Forest. RESULTS The AROC for the proposed GBM model is 84.7% with a sensitivity of 71.6% and the AROC for the proposed Logistic Regression model is 84.0% with a sensitivity of 73.4%. The GBM and Logistic Regression models perform better than the Random Forest and Decision Tree models. CONCLUSIONS The ability of our model to predict patients with Diabetes using some commonly used lab results is high with satisfactory sensitivity. These models can be built into an online computer program to help physicians predict which patients will develop diabetes in the future and provide the necessary preventive interventions.
The model was developed and validated on the Canadian population, making it more specific and powerful when applied to Canadian patients than existing models developed from US or other populations. Fasting blood glucose, body mass index, high-density lipoprotein, and triglycerides were the most important predictors in these models.
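The adjusted threshold method mentioned in this abstract can be illustrated in a few lines: lowering the probability cut-off at which a patient is flagged as positive raises sensitivity, at the cost of more false positives. The sketch below is a generic illustration under stated assumptions and not the authors' code; the function name `sensitivity_at`, the predicted probabilities, and the labels are all hypothetical.

```python
def sensitivity_at(threshold, probabilities, labels):
    """Proportion of true positives whose predicted probability meets the threshold."""
    positives = [p for p, y in zip(probabilities, labels) if y == 1]
    if not positives:
        raise ValueError("no positive cases")
    return sum(p >= threshold for p in positives) / len(positives)


# Hypothetical predicted probabilities and true labels (1 = diabetes)
probs = [0.92, 0.61, 0.44, 0.38, 0.27, 0.55, 0.30, 0.12, 0.08, 0.05]
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Sensitivity rises as the decision threshold is lowered
for t in (0.5, 0.35, 0.25):
    print(t, sensitivity_at(t, probs, truth))
```

In practice the threshold would be chosen on validation data while also tracking specificity, since a cut-off low enough to catch every case will also flag many patients without diabetes.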
Affiliation(s)
- Hang Lai
- Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
- The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
| | - Huaxiong Huang
- Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
- The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
| | - Karim Keshavjee
- The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6 Canada
| | - Aziz Guergachi
- Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
- The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
- Ted Rogers School of Management - Information Technology Management, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3 Canada
| | - Xin Gao
- Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
- The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
|
315
|
Álvarez JD, Matias-Guiu JA, Cabrera-Martín MN, Risco-Martín JL, Ayala JL. An application of machine learning with feature selection to improve diagnosis and classification of neurodegenerative disorders. BMC Bioinformatics 2019; 20:491. [PMID: 31601182 PMCID: PMC6788103 DOI: 10.1186/s12859-019-3027-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 03/12/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022] Open
Abstract
Background The analysis of health and medical data is crucial for improving diagnostic precision, treatment, and prevention. In this field, machine learning techniques play a key role. However, health data acquired from digital machines are high-dimensional, and not all of them are relevant for a particular disease. Primary Progressive Aphasia (PPA) is a neurodegenerative syndrome including several specific diseases, and it is a good model to implement machine learning analyses. In this work, we applied five feature selection algorithms to identify the set of relevant features from 18F-fluorodeoxyglucose positron emission tomography images of the main areas affected by PPA from patient records. We also ran classification and clustering algorithms before and after the feature selection process to contrast both results with those obtained in a previous work. We aimed to find the best classifier and the most relevant features from the WEKA tool in order to propose a framework for automated diagnostic support. The dataset contains data from 150 FDG-PET imaging studies of 91 patients with a clinical prognosis of PPA, who were examined twice, and 28 controls. Our method comprises six different stages: (i) feature extraction, (ii) expert knowledge supervision, (iii) classification process, (iv) comparing classification results for feature selection, (v) clustering process after feature selection, and (vi) comparing clustering results with those obtained in a previous work. Results Experimental tests confirmed clustering results from a previous work. Although classification results for some algorithms are not decisive for reducing features precisely, Principal Component Analysis (PCA) results exhibited similar or even better performances when compared to those obtained with all features.
Conclusions Although reducing the dimensionality does not mean a general improvement, the set of features is almost halved and results are better or quite similar. Finally, it is interesting how these results expose a finer grain classification of patients according to the neuroanatomy of their disease.
Affiliation(s)
- Josefa Díaz Álvarez
- Dep. of Computer Architecture and Communications, Universidad de Extremadura, Mérida-Badajoz, Spain.
| | - Jordi A Matias-Guiu
- Dep. of Neurology, Hospital Clinico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, Madrid, Spain
| | - María Nieves Cabrera-Martín
- Dep. of Neurology, Hospital Clinico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, Madrid, Spain
| | - José L Risco-Martín
- Dep. of Computer Architecture and Automation, Universidad Complutense, Madrid, Spain
| | - José L Ayala
- Dep. of Computer Architecture and Automation, Universidad Complutense, Madrid, Spain
|
316
|
Shinners L, Aggar C, Grace S, Smith S. Exploring healthcare professionals' understanding and experiences of artificial intelligence technology use in the delivery of healthcare: An integrative review. Health Informatics J 2019; 26:1225-1236. [PMID: 31566454 DOI: 10.1177/1460458219874641] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 11/15/2022]
Abstract
BACKGROUND The integration of artificial intelligence (AI) into our digital healthcare system is seen as a significant strategy to contain Australia's rising healthcare costs, support clinical decision making, manage chronic disease burden and support our ageing population. With the increasing roll-out of 'digital hospitals', electronic medical records, new data capture and analysis technologies, as well as a digitally enabled health consumer, the Australian healthcare workforce is required to become digitally literate to manage the significant changes in the healthcare landscape. To ensure that new innovations such as AI are inclusive of clinicians, an understanding of how the technology will impact the healthcare professions is imperative. METHOD In order to explore the complex phenomenon of healthcare professionals' understanding and experiences of AI use in the delivery of healthcare, an integrative review inclusive of quantitative and qualitative studies was undertaken in June 2018. RESULTS One study met all inclusion criteria. This was an observational study that used a questionnaire to measure healthcare professionals' intrinsic motivation in adoption behaviour when using an artificially intelligent medical diagnosis support system (AIMDSS). DISCUSSION The study found that healthcare professionals were less likely to use AI in the delivery of healthcare if they did not trust the technology or understand how it was used to improve patient outcomes or the delivery of care specific to the healthcare setting. The perception that AI would replace them in the healthcare setting was not evident. This may be because AI is not yet at the forefront of technology use in the healthcare setting. More research is needed to examine the experiences and perceptions of healthcare professionals using AI in the delivery of healthcare.
|
317
|
Early detection and risk assessment for chronic disease with irregular longitudinal data analysis. J Biomed Inform 2019; 96:103231. [DOI: 10.1016/j.jbi.2019.103231] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/26/2018] [Revised: 05/09/2019] [Accepted: 06/11/2019] [Indexed: 12/22/2022]
|
318
|
Mezzatesta S, Torino C, Meo PD, Fiumara G, Vilasi A. A machine learning-based approach for predicting the outbreak of cardiovascular diseases in patients on dialysis. Computer Methods and Programs in Biomedicine 2019; 177:9-15. [PMID: 31319965 DOI: 10.1016/j.cmpb.2019.05.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 11/28/2018] [Revised: 04/15/2019] [Accepted: 05/09/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Patients with End-Stage Kidney Disease (ESKD) have a unique cardiovascular risk. This study aims at predicting, with a certain precision, death and cardiovascular diseases in dialysis patients. METHODS To achieve our aim, machine learning techniques have been used. Two datasets have been taken into consideration: the first is an Italian dataset obtained from the Istituto di Fisiologia Clinica of Consiglio Nazionale delle Ricerche of Reggio Calabria; the second is an American dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) repository. From each one we obtained 5 datasets, according to the outcome of interest. We tested different types of algorithm (both linear and non-linear), but the final choice was to use Support Vector Machine. In particular, we obtained the best performance using the non-linear SVC with RBF kernel algorithm, optimizing it with GridSearch. GridSearch is an algorithm that searches for the best combination of hyper-parameters (in our case, the best pair (C, γ)) in order to improve the accuracy of the algorithm. RESULTS The non-linear SVC with RBF kernel algorithm, optimized with GridSearch, achieved an accuracy of 95.25% in the Italian dataset and of 92.15% in the American dataset, in a timeframe of 2.5 years, in the prediction of Ischaemic Heart Disease. A worse performance was obtained for the other outcomes. CONCLUSIONS The machine learning-based approach applied in our study is able to predict, with a high accuracy, the outbreak of cardiovascular diseases in patients on dialysis.
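The grid search over the pair (C, γ) described in this abstract can be sketched in plain Python. This is only an illustration of exhaustive search, not the authors' pipeline: `toy_score` is a hypothetical stand-in for the cross-validated accuracy of an RBF-kernel SVC, and the grid values are assumptions. The `rbf_kernel` helper shows where γ enters the model.

```python
import itertools
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2) for two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def grid_search(score_fn, grid):
    """Evaluate score_fn on every parameter combination and keep the best."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical accuracy surface peaking at C = 10, gamma = 0.01.

    In a real study this would train the classifier with (C, gamma) and
    return its cross-validated accuracy.
    """
    return (1.0
            - 0.01 * (math.log10(C) - 1) ** 2
            - 0.01 * (math.log10(gamma) + 2) ** 2)


# Assumed logarithmic grid; the abstract does not list the values searched.
grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
best_params, best_score = grid_search(toy_score, grid)
```

The cost grows with the product of the grid sizes (here 4 × 3 = 12 evaluations), so logarithmically spaced values are a common way to cover several orders of magnitude cheaply.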
Affiliation(s)
- Sabrina Mezzatesta
- Department of Mathematics and Computer Science, Physical Sciences and Earth Sciences, University of Messina, Messina, Italy
| | - Claudia Torino
- Institute of Clinical Physiology - Reggio Calabria Unit, Laboratory of Bioinformatics, National Research Council, Italy
| | - Pasquale De Meo
- Department of Ancient and Modern Civilizations, University of Messina, Messina, Italy
| | - Giacomo Fiumara
- Department of Mathematics and Computer Science, Physical Sciences and Earth Sciences, University of Messina, Messina, Italy
| | - Antonio Vilasi
- Institute of Clinical Physiology - Reggio Calabria Unit, Laboratory of Bioinformatics, National Research Council, Italy.
|
319
|
Serdar MA, Serteser M, Ucal Y, Karpuzoglu HF, Aksungar FB, Coskun A, Kilercik M, Ünsal İ, Özpınar A. An Assessment of HbA1c in Diabetes Mellitus and Pre-diabetes Diagnosis: a Multi-centered Data Mining Study. Appl Biochem Biotechnol 2019; 190:44-56. [DOI: 10.1007/s12010-019-03080-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/18/2019] [Accepted: 07/05/2019] [Indexed: 10/26/2022]
|
320
|
Alizadehsani R, Abdar M, Roshanzamir M, Khosravi A, Kebria PM, Khozeimeh F, Nahavandi S, Sarrafzadegan N, Acharya UR. Machine learning-based coronary artery disease diagnosis: A comprehensive review. Comput Biol Med 2019; 111:103346. [PMID: 31288140 DOI: 10.1016/j.compbiomed.2019.103346] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 05/07/2019] [Revised: 06/26/2019] [Accepted: 06/26/2019] [Indexed: 02/02/2023]
Abstract
Coronary artery disease (CAD) is the most common cardiovascular disease (CVD) and often leads to a heart attack. It annually causes millions of deaths and billions of dollars in financial losses worldwide. Angiography, which is invasive and risky, is the standard procedure for diagnosing CAD. Alternatively, machine learning (ML) techniques have been widely used in the literature as fast, affordable, and noninvasive approaches for CAD detection. The results that have been published on ML-based CAD diagnosis differ substantially in terms of the analyzed datasets, sample sizes, features, location of data collection, performance metrics, and applied ML techniques. Due to these fundamental differences, achievements in the literature cannot be generalized. This paper conducts a comprehensive and multifaceted review of all relevant studies that were published between 1992 and 2019 for ML-based CAD diagnosis. The impacts of various factors, such as dataset characteristics (geographical location, sample size, features, and the stenosis of each coronary artery) and applied ML techniques (feature selection, performance metrics, and method) are investigated in detail. Finally, the important challenges and shortcomings of ML-based CAD diagnosis are discussed.
Affiliation(s)
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia.
| | - Moloud Abdar
- Département d'informatique, Université du Québec à Montréal, Montréal, Québec, Canada
| | - Mohamad Roshanzamir
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
| | - Abbas Khosravi
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
| | - Parham M Kebria
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
| | - Fahime Khozeimeh
- Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Saeid Nahavandi
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
| | - Nizal Sarrafzadegan
- Faculty of Medicine, SPPH, University of British Columbia, Vancouver, BC, Canada; Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Khorram Ave, Isfahan, Iran
| | - U Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore; Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Singapore; Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Malaysia
|
321
|
Abstract
Artificial intelligence/Machine learning (AI/ML) is transforming all spheres of our life, including the healthcare system. Application of AI/ML has the potential to vastly enhance the reach of diabetes care, thereby making it more efficient. The huge burden of diabetes cases in India represents a unique set of problems, and provides us with a unique opportunity in terms of potential availability of data. Harnessing this data using electronic medical records, by all physicians, can put India at the forefront of research in this area. Application of AI/ML would provide insights into our problems and may help us to devise tailor-made solutions for them.
Affiliation(s)
- Rajiv Singla
- Department of Endocrinology, Kalpavriksh Healthcare, Dwarka, India
| | - Ankush Singla
- Department of Health Informatics, Kalpavriksh Healthcare, Dwarka, India
| | - Yashdeep Gupta
- Department of Endocrinology, All India Institute of Medical Sciences, Delhi, India
| | - Sanjay Kalra
- Department of Endocrinology, BRIDE, Karnal, Haryana, India
|
322
|
Woldaregay AZ, Årsand E, Walderhaug S, Albers D, Mamykina L, Botsis T, Hartvigsen G. Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes. Artif Intell Med 2019; 98:109-134. [DOI: 10.1016/j.artmed.2019.07.007] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 04/26/2018] [Revised: 08/22/2018] [Accepted: 07/19/2019] [Indexed: 10/26/2022]
|
323
|
Idowu PA, Balogiun JA. Fuzzy Logic-Based Predictive Model for the Risk of Type 2 Diabetes Mellitus. International Journal of E-Health and Medical Communications 2019. [DOI: 10.4018/ijehmc.2019070104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
This article presents a predictive model that can be used for the early detection of Type 2 Diabetes Mellitus (T2DM) using fuzzy logic. In order to formulate the model, risk factors associated with the risk of T2DM were elicited. The predictive model was formulated using fuzzy triangular membership functions, following which the rules needed for the inference engine were elicited from experts. The model was simulated using the MATLAB Fuzzy Logic Toolbox. The results of the study showed that a sensitivity of 11.67% and 100% precision for the low risk was recorded for both cases, specificity of 41.67% compared to 48.33% for the moderate risk, while there was 0% and 13.33% for the high risk. In conclusion, this model will help the doctor to know what course of preventive action to take for a patient with high risk and what advice to give to those with low and moderate risk, so that the occurrence of the disease can be prevented altogether, thereby reducing the number of people dying from Type 2 Diabetes Mellitus worldwide.
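A triangular membership function of the kind mentioned in this abstract can be sketched in a few lines of Python. The abstract does not give the variables or breakpoints used, so the body-mass-index fuzzy sets below (`BMI_SETS` and their breakpoints) are purely hypothetical and serve only to show how a crisp input maps to degrees of membership.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at the feet a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


# Hypothetical fuzzy sets for body mass index (kg/m^2); the breakpoints are
# illustrative assumptions, not values taken from the article.
BMI_SETS = {
    "low_risk": (15.0, 20.0, 25.0),
    "moderate_risk": (22.5, 27.5, 32.5),
    "high_risk": (30.0, 35.0, 40.0),
}


def fuzzify(x, fuzzy_sets):
    """Return the degree of membership of x in every fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}


memberships = fuzzify(24.0, BMI_SETS)
```

A value of 24.0 belongs partly to both `low_risk` and `moderate_risk` in this sketch, which is the overlap an inference engine's rules would then combine.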
|
324
|
Hathaway QA, Roth SM, Pinti MV, Sprando DC, Kunovac A, Durr AJ, Cook CC, Fink GK, Cheuvront TB, Grossman JH, Aljahli GA, Taylor AD, Giromini AP, Allen JL, Hollander JM. Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics. Cardiovasc Diabetol 2019; 18:78. [PMID: 31185988 PMCID: PMC6560734 DOI: 10.1186/s12933-019-0879-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 03/13/2019] [Accepted: 05/29/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Diabetes mellitus is a chronic disease that impacts an increasing percentage of people each year. Among its comorbidities, diabetics are two to four times more likely to develop cardiovascular diseases. While HbA1c remains the primary diagnostic for diabetics, its ability to predict long-term health outcomes across diverse demographics, ethnic groups, and at a personalized level is limited. The purpose of this study was to provide a model for precision medicine through the implementation of machine-learning algorithms using multiple cardiac biomarkers as a means for predicting diabetes mellitus development. METHODS Right atrial appendages from 50 patients, 30 non-diabetic and 20 type 2 diabetic, were procured from the WVU Ruby Memorial Hospital. Machine-learning was applied to physiological, biochemical, and sequencing data for each patient. Supervised learning implementing SHapley Additive exPlanations (SHAP) allowed binary (no diabetes or type 2 diabetes) and multiple classification (no diabetes, prediabetes, and type 2 diabetes) of the patient cohort with and without the inclusion of HbA1c levels. Findings were validated through Logistic Regression (LR), Linear Discriminant Analysis (LDA), Gaussian Naïve Bayes (NB), Support Vector Machine (SVM), and Classification and Regression Tree (CART) models with tenfold cross validation. RESULTS Total nuclear methylation and hydroxymethylation were highly correlated to diabetic status, with nuclear methylation and mitochondrial electron transport chain (ETC) activities achieving superior testing accuracies in the predictive model (~84% testing, binary). Mitochondrial DNA SNPs found in the D-Loop region (SNP-73G, -16126C, and -16362C) were highly associated with diabetes mellitus. The CpG island of transcription factor A, mitochondrial (TFAM) revealed CpG24 (chr10:58385262, P = 0.003) and CpG29 (chr10:58385324, P = 0.001) as markers correlating with diabetic progression.
When combining the most predictive factors from each set, total nuclear methylation and CpG24 methylation were the best diagnostic measures in both binary and multiple classification sets. CONCLUSIONS Using machine-learning, we were able to identify novel as well as the most relevant biomarkers associated with type 2 diabetes mellitus by integrating physiological, biochemical, and sequencing datasets. Ultimately, this approach may be used as a guideline for future investigations into disease pathogenesis and novel biomarker discovery.
Affiliation(s)
- Quincy A Hathaway
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Skyler M Roth
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Mark V Pinti
- West Virginia University School of Pharmacy, Morgantown, WV, 26505, USA
| | - Daniel C Sprando
- West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Amina Kunovac
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Andrya J Durr
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Chris C Cook
- Cardiovascular and Thoracic Surgery, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Garrett K Fink
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
| | - Tristen B Cheuvront
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Jasmine H Grossman
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Ghadah A Aljahli
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - Andrew D Taylor
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Andrew P Giromini
- West Virginia University School of Medicine, Morgantown, WV, 26505, USA
| | - Jessica L Allen
- Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV, 26505, USA
| | - John M Hollander
- Division of Exercise Physiology, West Virginia University School of Medicine, PO Box 9227, 1 Medical Center Drive, Morgantown, WV, 26505, USA.
- Mitochondria, Metabolism & Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, WV, 26505, USA.
|
325
|
A Comprehensive Medical Decision–Support Framework Based on a Heterogeneous Ensemble Classifier for Diabetes Prediction. Electronics 2019. [DOI: 10.3390/electronics8060635] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Early diagnosis of diabetes mellitus (DM) is critical to prevent its serious complications. An ensemble of classifiers is an effective way to enhance classification performance, which can be used to diagnose complex diseases, such as DM. This paper proposes an ensemble framework to diagnose DM by optimally employing multiple classifiers based on bagging and random subspace techniques. The proposed framework combines seven of the most suitable and heterogeneous data mining techniques, each with a separate set of suitable features. These techniques are k-nearest neighbors, naïve Bayes, decision tree, support vector machine, fuzzy decision tree, artificial neural network, and logistic regression. The framework is designed by selecting, for every sub-dataset, the most suitable feature set and the most accurate classifier. It was evaluated using a real dataset collected from electronic health records of Mansura University Hospitals (Mansura, Egypt). The resulting framework achieved 90% accuracy, 90.2% recall, and 94.9% precision. We evaluated and compared the proposed framework with many other classification algorithms. An analysis of the results indicated that the proposed ensemble framework significantly outperforms all other classifiers. It is a successful step towards constructing a personalized decision support system, which could help physicians in daily clinical practice.
|
326
|
Recent Development on Detection Methods for the Diagnosis of Diabetic Retinopathy. Symmetry (Basel) 2019. [DOI: 10.3390/sym11060749] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/19/2022] Open
Abstract
Diabetic retinopathy (DR) is a complication of diabetes that exists throughout the world. DR occurs due to a high level of glucose in the blood, which causes alterations in the retinal microvasculature. DR can lead to complete vision loss without any early warning symptoms. However, early screening through computer-assisted diagnosis (CAD) tools and proper treatment have the ability to control the prevalence of DR. Manual inspection of morphological changes in retinal anatomic parts is a tedious and challenging task. Therefore, many CAD systems were developed in the past to assist ophthalmologists in observing inter- and intra-variations. In this paper, a recent review of state-of-the-art CAD systems for diagnosis of DR is presented. We describe all those CAD systems that have been developed by various computational intelligence and image processing techniques. The limitations and future trends of current CAD systems are also described in detail to help researchers. Moreover, potential CAD systems are also compared in terms of statistical parameters to quantitatively evaluate them. The comparison results indicate that there is still a need for the development of accurate CAD systems to assist in the clinical diagnosis of diabetic retinopathy.
|
327
|
A novel single-sensor-based method for the detection of gait-cycle breakdown and freezing of gait in Parkinson's disease. J Neural Transm (Vienna) 2019; 126:1029-1036. [PMID: 31154512 DOI: 10.1007/s00702-019-02020-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/12/2019] [Accepted: 05/22/2019] [Indexed: 12/14/2022]
Abstract
Objective measurement of walking speed and gait deficits is an important clinical tool in chronic illness management. We previously reported in Parkinson's disease that different types of gait tests can now be implemented and administered in the clinic or at home using Ambulosono smartphone-sensor technology, whereby movement sensing protocols can be standardized under voice instruction. However, a common challenge that remains for such wearable sensor systems is how meaningful data can be extracted from seemingly "noisy" raw sensor data, and how to do so with a high level of accuracy and efficiency. Here, we describe a novel pattern recognition algorithm for the automated detection of gait-cycle breakdown and freezing episodes. Ambulosono-gait-cycle-breakdown-and-freezing-detection (Free-D) integrates a nonlinear m-dimensional phase-space data extraction method with machine learning and Monte Carlo analysis for model building and pattern generalization. We first trained Free-D using a small number of data samples obtained from thirty participants during freezing of gait tests. We then tested the accuracy of Free-D via Monte Carlo cross-validation. We found Free-D to be remarkably effective at detecting gait-cycle breakdown, with mode error rates of 0% and mean error rates < 5%. We also demonstrate the utility of Free-D by applying it to continuous holdout traces not used for either training or testing, and found it was able to identify gait-cycle breakdown and freezing events of varying duration. These results suggest that advanced artificial intelligence and automation tools can be developed to enhance the quality, efficiency, and expansion of wearable sensor data processing capabilities to meet market and industry demand.
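Monte Carlo cross-validation, the evaluation scheme named in this abstract, repeatedly draws a random train/test split, refits the model, and records the test error, so that both the mean and the mode of the error rates can be reported. The sketch below is an assumption-laden illustration and not the Free-D algorithm: the one-dimensional data, the midpoint classifier (`fit_midpoint`, `predict_midpoint`), the 200 rounds, and the 30% test fraction are all hypothetical.

```python
import random
from statistics import mean, mode


def monte_carlo_cv(samples, fit, predict, n_rounds=200, test_fraction=0.3, seed=0):
    """Return the test error rate of each of n_rounds random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(samples) * test_fraction))
    error_rates = []
    for _ in range(n_rounds):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        errors = sum(predict(model, x) != y for x, y in test)
        error_rates.append(errors / n_test)
    return error_rates


def fit_midpoint(train):
    """Toy classifier: threshold halfway between the two class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2


def predict_midpoint(threshold, x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return int(x > threshold)


# Hypothetical 1-D feature with slightly overlapping classes (15 samples each).
samples = ([(i * 0.1, 0) for i in range(15)]
           + [(1.2 + i * 0.1, 1) for i in range(15)])

error_rates = monte_carlo_cv(samples, fit_midpoint, predict_midpoint)
mean_error = mean(error_rates)
mode_error = mode(error_rates)
```

Unlike k-fold cross-validation, the splits here are drawn independently, so a sample may appear in several test sets or in none; the number of rounds is chosen freely rather than being tied to the number of folds.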
|
328
|
Gautam R, Kaur P, Sharma M. A comprehensive review on nature inspired computing algorithms for the diagnosis of chronic disorders in human beings. Progress in Artificial Intelligence 2019. [DOI: 10.1007/s13748-019-00191-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/04/2023]
|
329
|
Towards more Accessible Precision Medicine: Building a more Transferable Machine Learning Model to Support Prognostic Decisions for Micro- and Macrovascular Complications of Type 2 Diabetes Mellitus. J Med Syst 2019; 43:185. [PMID: 31098679 DOI: 10.1007/s10916-019-1321-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/23/2018] [Accepted: 05/01/2019] [Indexed: 01/22/2023]
Abstract
Although machine learning models are increasingly being developed for clinical decision support for patients with type 2 diabetes, the adoption of these models into clinical practice remains limited. Currently, machine learning (ML) models are constructed on local healthcare systems and validated internally, with no expectation that they would validate externally; thus, they are rarely transferable to a different healthcare system. In this work, we (1) demonstrate that even a complex ML model built on a national cohort can be transferred to two local healthcare systems, (2) show that a model constructed on a local healthcare system's cohort is difficult to transfer, (3) examine the impact of training cohort size on transferability, and (4) discuss criteria for external validity. We built a model using our previously published Multi-Task Learning-based methodology on a national cohort extracted from OptumLabs® Data Warehouse and transferred the model to two local healthcare systems (i.e., University of Minnesota Medical Center and Mayo Clinic) for external evaluation. The model remained valid when applied to the local patient populations and performed as well as locally constructed models (concordance: .73-.92), demonstrating transferability. The performance of the locally constructed models decreased substantially when applied to each other's healthcare system (concordance: .62-.90). We believe that our modeling approach, in which a model is learned from a national cohort and is externally validated, produces a transferable model, allowing patients at smaller healthcare systems to benefit from precision medicine.
|
330
|
Rodrigues CHP, Bruni AT. In silico toxicity as a tool for harm reduction: A study of new psychoactive amphetamines and cathinones in the context of criminal science. Sci Justice 2019; 59:234-247. [PMID: 31054814 DOI: 10.1016/j.scijus.2018.11.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/02/2018] [Revised: 11/08/2018] [Accepted: 11/18/2018] [Indexed: 11/15/2022]
Abstract
The emergence of new psychoactive substances (NPS) has raised many issues in the context of law enforcement and public drug policies. In this scenario, interdisciplinary studies are crucial to the decision-making process in the field of criminal science. Unfortunately, information about how NPS affect people's health is lacking even though knowledge about the toxic potential of these substances is essential: the more information about these drugs, the greater the possibility of avoiding damage within the scope of a harm reduction policy. Traditional analytical methods may be inaccessible in the field of forensic science because they are relatively expensive and time-consuming. In this sense, less costly and faster in silico methodologies can be useful strategies. In this work, we submitted computer-calculated toxicity values of various amphetamines and cathinones to an unsupervised multivariate analysis, namely Principal Component Analysis (PCA), and to the supervised techniques Soft Independent Modeling of Class Analogy (SIMCA) and Partial Least Squares-Discriminant Analysis (PLS-DA) to evaluate how these two NPS groups behave. We studied how theoretical and experimental values are correlated by PLS regression. Although experimental data were available for only a small number of molecules, the correlation values reproduced literature values. The in silico method efficiently provided information about the drugs. On the basis of our findings, the technical information presented here can be used in decision-making regarding harm reduction policies and help to fulfill the objectives of criminal science.
Affiliation(s)
- Caio Henrique Pinke Rodrigues
  - Departamento de Química, Faculdade de Filosofia Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Brazil
- Aline Thaís Bruni
  - Departamento de Química, Faculdade de Filosofia Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Brazil; Instituto Nacional de Ciência e Tecnologia Forense (INCT Forense), Ribeirão Preto, SP, Brazil.
|
331
|
Research on Classification of Tibetan Medical Syndrome in Chronic Atrophic Gastritis. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9081664] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/24/2023]
Abstract
Classification association rules, which integrate association rules with classification, play an important role in data mining. However, constructing the classification model and predicting new instances can take a long time because of the large number of rules generated during association rule mining, which also consumes substantial system resources. Therefore, this paper proposes a classification model based on atomic classification association rules and applies it to construct a classification model of a Tibetan medical syndrome for a common plateau disease, chronic atrophic gastritis. First, the idea of “relative support” is introduced, and a constraint-based Apriori algorithm is used to mine the strong atomic classification association rules between symptoms and syndrome, from which a knowledge base of Tibetan medical clinics is constructed. Second, the classification model of the Tibetan medical syndrome is built after pruning and prioritizing the rules, and the idea of “partial classification” and an “easy first, difficult later” strategy are introduced to predict this Tibetan medical syndrome. Finally, the effectiveness of the classification model is validated and compared with the CBA algorithm and four traditional classification algorithms. The experimental results show that the proposed method can construct the classification model of the Tibetan medical syndrome and classify with it in a shorter time, with fewer but more understandable rules, while achieving a higher accuracy of 92.8%.
|
332
|
Triantafyllidis AK, Tsanas A. Applications of Machine Learning in Real-Life Digital Health Interventions: Review of the Literature. J Med Internet Res 2019; 21:e12286. [PMID: 30950797 PMCID: PMC6473205 DOI: 10.2196/12286] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Received: 09/21/2018] [Revised: 01/07/2019] [Accepted: 01/26/2019] [Indexed: 12/21/2022] Open
Abstract
Background: Machine learning has attracted considerable research interest toward developing smart digital health interventions. These interventions have the potential to revolutionize health care and lead to substantial outcomes for patients and medical professionals. Objective: Our objective was to review the literature on applications of machine learning in real-life digital health interventions, aiming to improve the understanding of researchers, clinicians, engineers, and policy makers in developing robust and impactful data-driven interventions in the health care domain. Methods: We searched the PubMed and Scopus bibliographic databases with terms related to machine learning, to identify real-life studies of digital health interventions incorporating machine learning algorithms. We grouped those interventions according to their target (ie, target condition), study design, number of enrolled participants, follow-up duration, primary outcome and whether this had been statistically significant, machine learning algorithms used in the intervention, and outcome of the algorithms (eg, prediction). Results: Our literature search identified 8 interventions incorporating machine learning in a real-life research setting, of which 3 (37%) were evaluated in a randomized controlled trial and 5 (63%) in a pilot or experimental single-group study. The interventions targeted depression prediction and management, speech recognition for people with speech disabilities, self-efficacy for weight loss, detection of changes in biopsychosocial condition of patients with multiple morbidity, stress management, treatment of phantom limb pain, smoking cessation, and personalized nutrition based on glycemic response. The average number of enrolled participants in the studies was 71 (range 8-214), and the average follow-up study duration was 69 days (range 3-180). Of the 8 interventions, 6 (75%) showed statistical significance (at the P=.05 level) in health outcomes. Conclusions: This review found that digital health interventions incorporating machine learning algorithms in real-life studies can be useful and effective. Given the low number of studies identified in this review and that they did not follow a rigorous machine learning evaluation methodology, we urge the research community to conduct further studies in intervention settings following evaluation principles and demonstrating the potential of machine learning in clinical practice.
Affiliation(s)
- Andreas K Triantafyllidis
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece; Lab of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Athanasios Tsanas
  - Usher Institute of Population Health Sciences and Informatics, Medical School, University of Edinburgh, Edinburgh, United Kingdom; Mathematical Institute, University of Oxford, Oxford, United Kingdom
|
333
|
Adadi A, Adadi S, Berrada M. Gastroenterology Meets Machine Learning: Status Quo and Quo Vadis. Adv Bioinformatics 2019; 2019:1870975. [PMID: 31065266 PMCID: PMC6466966 DOI: 10.1155/2019/1870975] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/09/2018] [Accepted: 02/24/2019] [Indexed: 12/16/2022] Open
Abstract
Machine learning has undergone a transition phase from being a pure statistical tool to being one of the main drivers of modern medicine. In gastroenterology, this technology is motivating a growing number of studies that rely on these innovative methods to deal with critical issues related to this practice. Hence, in the light of the burgeoning research on the use of machine learning in gastroenterology, a systematic review of the literature is timely. In this work, we present the results gleaned through a systematic review of prominent gastroenterology literature using machine learning techniques. Based on the analysis of 88 journal articles, we delimit the scope of application, we discuss current limitations including bias, lack of transparency, accountability, and data availability, and we put forward future avenues.
Affiliation(s)
- Amina Adadi
  - Computer and Interdisciplinary Physics Laboratory, Sidi Mohamed Ben Abdellah University, Fez 30050, Morocco
- Safae Adadi
  - Service of Hepatology and Gastroenterology, Hassan II University Hospital of Fez, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- Mohammed Berrada
  - Computer and Interdisciplinary Physics Laboratory, Sidi Mohamed Ben Abdellah University, Fez 30050, Morocco
|
334
|
Jiménez-Carvelo AM, González-Casado A, Bagur-González MG, Cuadros-Rodríguez L. Alternative data mining/machine learning methods for the analytical evaluation of food quality and authenticity - A review. Food Res Int 2019; 122:25-39. [PMID: 31229078 DOI: 10.1016/j.foodres.2019.03.063] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 02/25/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/31/2022]
Abstract
In recent years, the variety and volume of data acquired by modern analytical instruments in order to conduct a better authentication of food has dramatically increased. Several pattern recognition tools have been developed to deal with the large volume and complexity of available trial data. The most widely used methods are principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), soft independent modelling by class analogy (SIMCA), k-nearest neighbours (kNN), parallel factor analysis (PARAFAC), and multivariate curve resolution-alternating least squares (MCR-ALS). Nevertheless, there are alternative data treatment methods, such as support vector machine (SVM), classification and regression tree (CART) and random forest (RF), that show a great potential and more advantages compared to conventional ones. In this paper, we explain the background of these methods and review and discuss the reported studies in which these three methods have been applied in the area of food quality and authenticity. In addition, we clarify the technical terminology used in this particular area of research.
Affiliation(s)
- Ana M Jiménez-Carvelo
  - Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain.
- Antonio González-Casado
  - Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
- M Gracia Bagur-González
  - Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
- Luis Cuadros-Rodríguez
  - Department of Analytical Chemistry, Faculty of Science, University of Granada, C/ Fuentenueva s/n, E-18071 Granada, Spain
|
335
|
Singh N, Singh P. A novel Bagged Naïve Bayes-Decision Tree approach for multi-class classification problems. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-169937] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Affiliation(s)
- Namrata Singh
  - Department of Computer Science and Engineering, National Institute of Technology, Raipur, Chhattisgarh, India
- Pradeep Singh
  - Department of Computer Science and Engineering, National Institute of Technology, Raipur, Chhattisgarh, India
|
336
|
Sohail A, Younas M, Bhatti Y, Li Z, Tunç S, Abid M. Analysis of Trabecular Bone Mechanics Using Machine Learning. Evol Bioinform Online 2019; 15:1176934318825084. [PMID: 30936677 PMCID: PMC6434438 DOI: 10.1177/1176934318825084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/03/2018] [Accepted: 12/17/2018] [Indexed: 12/20/2022] Open
Abstract
"Bone remodeling" is a dynamic process, and multiphase analysis combined with a forecasting algorithm can help biologists and orthopedists to interpret laboratory-generated results and to apply them in improving applications in the fields of "drug design, treatment, and therapy" of diseased bones. The metastasized bone microenvironment has always remained a challenging puzzle for researchers. In this research, a multiphase computational model is interfaced with an artificial intelligence algorithm in a hybrid manner. Trabecular surface remodeling is presented in this article with the aid of videographic footage, and the associated parametric thresholds are derived from artificial intelligence and clinical data.
Affiliation(s)
- Ayesha Sohail
  - Department of Mathematics, Comsats University Islamabad, Lahore, Pakistan
- Muhammad Younas
  - Department of Mathematics, Comsats University Islamabad, Lahore, Pakistan
- Yousaf Bhatti
  - Department of Mathematics, Comsats University Islamabad, Lahore, Pakistan
- Zhiwu Li
  - Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau; School of Electro-Mechanical Engineering, Xidian University, Xi'an, China
- Sümeyye Tunç
  - Physiotherapy, IMU Vocational School, Istanbul Medipol University, Fatih, Istanbul, Turkey
- Muhammad Abid
  - Interdisciplinary Research Centre, COMSATS University Islamabad, Wah Cantonment, Pakistan
|
337
|
Anuradha, Singh A, Gupta G. ANT_FDCSM: A novel fuzzy rule miner derived from ant colony meta-heuristic for diagnosis of diabetic patients. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-172240] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Affiliation(s)
- Anuradha
  - Departments of CSE and IT, The NorthCap University, Gurgaon, India
- Akansha Singh
  - School of Computing Science and Engineering, Galgotias University, Greater Noida, India
|
338
|
Pyenson B, Alston M, Gomberg J, Han F, Khandelwal N, Dei M, Son M, Vora J. Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency. JOURNAL OF HEALTH ECONOMICS AND OUTCOMES RESEARCH 2019; 6:32-46. [PMID: 32685578 PMCID: PMC7299452 DOI: 10.36469/9727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/11/2023]
Abstract
BACKGROUND: Exocrine pancreatic insufficiency (EPI) is a serious condition characterized by a lack of functional exocrine pancreatic enzymes and the resultant inability to properly digest nutrients. EPI can be caused by a variety of disorders, including chronic pancreatitis, pancreatic cancer, and celiac disease. EPI remains underdiagnosed because of the nonspecific nature of clinical symptoms, lack of an ideal diagnostic test, and the inability to easily identify affected patients using administrative claims data. OBJECTIVES: To develop a machine learning model that identifies patients in a commercial medical claims database who likely have EPI but are undiagnosed. METHODS: A machine learning algorithm was developed in Scikit-learn, a Python module. The study population, selected from the 2014 Truven MarketScan® Commercial Claims Database, consisted of patients with EPI-prone conditions. Patients were labeled with 290 condition category flags and split into actual positive EPI cases, actual negative EPI cases, and unlabeled cases. The study population was then randomly divided into a training subset and a testing subset. The training subset was used to determine the performance metrics of 27 models and to select the highest performing model, and the testing subset was used to evaluate performance of the best machine learning model. RESULTS: The study population consisted of 2088 actual positive EPI cases, 1077 actual negative EPI cases, and 437 530 unlabeled cases. In the best performing model, the precision, recall, and accuracy were 0.91, 0.80, and 0.86, respectively. The best-performing model estimated that the number of patients likely to have EPI was about 12 times the number of patients directly identified as EPI-positive through a claims analysis in the study population. The most important features in assigning EPI probability were the presence or absence of diagnosis codes related to pancreatic and digestive conditions. CONCLUSIONS: Machine learning techniques demonstrated high predictive power in identifying patients with EPI and could facilitate an enhanced understanding of its etiology and help to identify patients for possible diagnosis and treatment.
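For readers less familiar with the three metrics reported in this abstract, the sketch below shows how precision, recall, and accuracy follow from the four cells of a confusion matrix. The function name and the counts are hypothetical and chosen purely for illustration; they are not the study's data, and the study itself used Scikit-learn rather than this hand-written helper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of the predicted positives, share that are real
    recall = tp / (tp + fn)     # of the real positives, share that were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical held-out counts, not taken from the paper.
precision, recall, accuracy = classification_metrics(tp=45, fp=5, fn=15, tn=35)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Precision and recall are defined only on the labeled positive and negative cases, which is why the unlabeled cases in the study population do not enter these figures.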
Affiliation(s)
- Feng Han
  - Milliman, New York, NY, during study
|
339
|
Wang Y, Wang Z, Zhang H. Identification of diagnostic biomarker in patients with gestational diabetes mellitus based on transcriptome-wide gene expression and pattern recognition. J Cell Biochem 2019; 120:1503-1510. [PMID: 30168213 DOI: 10.1002/jcb.27279] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Accepted: 06/28/2018] [Indexed: 01/24/2023]
Abstract
Gestational diabetes mellitus (GDM) is a growing threat in pregnancy. In this study, we set up an automatic screening method combining transcriptomic databases and support vector machine (SVM)-based pattern recognition to select biomarkers that can be used in predicting and preventing GDM in gravidas. We screened 63 samples (32 GDM samples and 31 normal controls) in the GEO database for GDM-specific biomarkers. Differentially expressed genes between patients with GDM and normal controls were identified using the edgeR package. Enrichment analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery. The regulatory gene network was constructed based on the KEGG pathway database. Genes in the hub of the network were selected as specific biomarkers of GDM and further validated through document investigation. Finally, the GDM prediction model was verified using SVMs. In total, 189 probes corresponding to 69 genes that were differentially expressed between GDM and controls were identified by the edgeR package. Nineteen pathways were clustered by KEGG enrichment analysis and were integrated into a regulatory network containing 572 nodes and 1874 edges. The intersection of the 50 hub genes extracted from the network and the 69 differential genes identified by edgeR was a collection of six genes, including members of the HLA superfamily. In the SVM model, the six genes had a good capacity for predicting GDM in both the training data set (area under curve [AUC] is 0.781) and the testing data set (AUC is 0.710) and had been reported to be associated with GDM. We found that the collection of six genes can potentially be applied as a biomarker for GDM diagnosis.
Affiliation(s)
- Yeping Wang
  - Department of Obstetrics and Gynecology, Wenzhou People's Hospital, Wenzhou Maternal and Child Health Care Hospital, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, Zhejiang, China
- Zuo Wang
  - Department of Obstetrics and Gynecology, Wenzhou People's Hospital, Wenzhou Maternal and Child Health Care Hospital, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, Zhejiang, China
- Hongping Zhang
  - Department of Obstetrics and Gynecology, Wenzhou People's Hospital, Wenzhou Maternal and Child Health Care Hospital, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, Zhejiang, China
|
340
|
Marozas M, Sosunkevič S, Francaitė-Daugėlienė M, Veličkienė D, Lukoševičius A. Algorithm for diabetes risk evaluation from past gestational diabetes data. Technol Health Care 2019; 26:637-648. [PMID: 30040772 DOI: 10.3233/thc-181325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Gestational diabetes mellitus (GDM) is defined as glucose intolerance that is diagnosed during pregnancy, leading to possible complications for both mother and fetus. The aim of this study was to build an objective method to evaluate diabetes mellitus (DM) risk from past GDM data recorded 15 years ago and to find a short list of the most informative indicators. The dataset consists of demographic, lifestyle, clinical, genetic and pregnancy-related information recorded 15 years ago. Because of the large time gap, the data are limited and have missing values (MVs). Follow-up tests were performed to see whether DM or impaired metabolism had developed after a pregnancy with previously diagnosed GDM. The research steps involve pre-processing the data to evaluate MVs, finding the most informative attributes, and testing standard classification algorithms to combine into the most effective voting meta-algorithm. Initially, the attributes and records with a large number of MVs were rejected. A small percentage (2.04%) was imputed using regression-based methods. The dataset was prepared for two scenarios: classification into two classes (1-healthy; 2-impaired metabolism including DM) and three classes (1-healthy; 2-impaired metabolism; 3-DM). A voting meta-algorithm combining the best of 21 algorithms from five different groups (Bayesian, regression, lazy, rule, and decision trees) makes classification more objective and independent of preferences. Relative frequency of occurrence (RFO) analysis of attributes combined with the voting meta-algorithm helped to find the optimal number of attributes giving the best possible classification result. The algorithm applied to the two-class dataset with 12 selected attributes produced an accuracy of 75.85 and AUC = 0.82 with a standard error of 0.11. Similarly, for the three-class dataset, 9 attributes were selected, reaching a classification accuracy of 63.77 and AUC = 0.76 with a standard error of 0.1. Meta-algorithm-based classification of limited anamnestic GDM-related data for DM prediction is proving to be effective. Testing multiple algorithms and performing RFO analysis appears to be a natural and objective way of selecting the most informative attributes and evaluating their importance.
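The voting meta-algorithm described in this abstract combines the predictions of several base classifiers into one decision. The sketch below shows plain majority voting only, which is one common form of such a scheme; the three rule-based "classifiers", their thresholds, and the feature names are hypothetical stand-ins and are neither the authors' algorithms nor clinical cut-offs. Class labels follow the abstract's two-class coding (1 = healthy, 2 = impaired metabolism including DM).

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most classifiers.
    Ties are broken in favour of the earliest classifier in the list."""
    votes = [clf(sample) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    for vote in votes:
        if counts[vote] == top:
            return vote


# Hypothetical base classifiers; thresholds are illustrative only.
def by_glucose(x):
    return 2 if x["fasting_glucose"] >= 5.6 else 1


def by_bmi(x):
    return 2 if x["bmi"] >= 30 else 1


def by_age(x):
    return 2 if x["age"] >= 45 else 1


sample = {"fasting_glucose": 5.9, "bmi": 27.0, "age": 50}
print(majority_vote([by_glucose, by_bmi, by_age], sample))
```

With an odd number of base classifiers and two classes, ties cannot occur; the tie-break rule matters only for the three-class scenario or an even number of voters.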
Affiliation(s)
- Mindaugas Marozas
  - Biomedical Engineering Institute, Kaunas University of Technology, Kaunas, Lithuania
- Sergej Sosunkevič
  - Biomedical Engineering Institute, Kaunas University of Technology, Kaunas, Lithuania
- Džilda Veličkienė
  - Institute of Endocrinology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Arunas Lukoševičius
  - Biomedical Engineering Institute, Kaunas University of Technology, Kaunas, Lithuania
|
341
|
Moonian O, Jodheea-Jutton A, Khedo KK, Baichoo S, Nagowah SD, Nagowah L, Mungloo-Dilmohamud Z, Cheerkoot-Jalim S. Recent advances in computational tools and resources for the self-management of type 2 diabetes. Inform Health Soc Care 2019; 45:77-95. [PMID: 30653364 DOI: 10.1080/17538157.2018.1559168] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
Abstract
Background: While healthcare systems are investing resources in type 2 diabetes patients, self-management is becoming the new trend for these patients. Due to the pervasiveness of computing devices, a number of computerized systems are emerging to support the self-management of patients. Objective: The primary objective of this review is to identify and categorize the computational tools that exist for the self-management of type 2 diabetes, and to identify challenges that need to be addressed. Results: The tools have been categorized into web applications, mobile applications, games and ubiquitous diabetes management systems. We provide a detailed description of the salient features of each category along with a comparison of the various tools, listing their challenges and practical implications. A list of platforms that can be used to develop new tools for the self-management of type 2 diabetes, namely mobile applications development, sensor development, cloud computing, social media, and machine learning and predictive analysis platforms, is also provided. Discussions: This paper identifies a number of challenges in the existing categories of computational tools and consequently presents possible avenues for future research. Failure to address these issues will negatively affect the adoption rate of the self-management tools and applications.
Affiliation(s)
- Oveeyen Moonian
  - Department of Digital Technologies, FoICDT, University of Mauritius
- Kavi Kumar Khedo
  - Department of Digital Technologies, FoICDT, University of Mauritius
- Leckraj Nagowah
  - Department of Software and Information Systems, FoICDT, University of Mauritius
- Sudha Cheerkoot-Jalim
  - Department of Information and Communication Technologies, FoICDT, University of Mauritius
|
342
|
Nirala N, Periyasamy R, Singh BK, Kumar A. Detection of type-2 diabetes using characteristics of toe photoplethysmogram by applying support vector machine. Biocybern Biomed Eng 2019. [DOI: 10.1016/j.bbe.2018.09.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
|
343
|
Bennett CC. REMOVED: Artificial intelligence for diabetes case management: The intersection of physical and mental health. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.100191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023] Open
|
344
|
Fiarni C, Sipayung EM, Maemunah S. Analysis and Prediction of Diabetes Complication Disease using Data Mining Algorithm. Procedia Computer Science 2019. [DOI: 10.1016/j.procs.2019.11.144] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
|
345
|
|
346
|
An S, Malhotra K, Dilley C, Han-Burgess E, Valdez JN, Robertson J, Clark C, Westover MB, Sun J. Predicting drug-resistant epilepsy - A machine learning approach based on administrative claims data. Epilepsy Behav 2018; 89:118-125. [PMID: 30412924 PMCID: PMC6461470 DOI: 10.1016/j.yebeh.2018.10.013] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 07/24/2018] [Revised: 10/04/2018] [Accepted: 10/08/2018] [Indexed: 11/28/2022]
Abstract
Patients with drug-resistant epilepsy (DRE) are at high risk of morbidity and mortality, yet their referral to specialist care is frequently delayed. The ability to identify patients at high risk of DRE at the time of treatment initiation, and to subsequently steer their treatment pathway toward more personalized interventions, has high clinical utility. Here, we aim to demonstrate the feasibility of developing algorithms for predicting DRE using machine learning methods. Longitudinal, intersected data sourced from US pharmacy, medical, and adjudicated hospital claims from 1,376,756 patients from 2006 to 2015 were analyzed; 292,892 met inclusion criteria for epilepsy, and 38,382 were classified as having DRE using a proxy measure for drug resistance. Patients were characterized using 1270 features reflecting demographics, comorbidities, medications, procedures, epilepsy status, and payer status. Data from 175,735 randomly selected patients were used to train three algorithms and from the remainder to assess the trained models' predictive power. A model with only age and sex was used as a benchmark. The best model, random forest, achieved an area under the receiver operating characteristic curve (95% confidence interval [CI]) of 0.764 (0.759, 0.770), compared with 0.657 (0.651, 0.663) for the benchmark model. Moreover, predicted probabilities for DRE were well-calibrated with the observed frequencies in the data. The model predicted drug resistance approximately 2 years before patients in the test dataset had failed two antiepileptic drugs (AEDs). Machine learning models constructed using claims data predicted which patients are likely to fail ≥3 AEDs and are at risk of developing DRE at the time of the first AED prescription. The use of such models can ensure that patients with predicted DRE receive specialist care with potentially more aggressive therapeutic interventions from diagnosis, to help reduce the serious sequelae of DRE.
Affiliation(s)
- Sungtae An
  - Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
- Kunal Malhotra
  - Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
- Jeffrey N Valdez
  - Georgia Institute of Technology, College of Computing, Atlanta, GA, USA
- M Brandon Westover
  - Massachusetts General Hospital, Department of Neurology, Boston, MA, USA
- Jimeng Sun
  - Georgia Institute of Technology, College of Computing, Atlanta, GA, USA.
|
347
|
Maeta K, Nishiyama Y, Fujibayashi K, Gunji T, Sasabe N, Iijima K, Naito T. Prediction of Glucose Metabolism Disorder Risk Using a Machine Learning Algorithm: Pilot Study. JMIR Diabetes 2018; 3:e10212. [PMID: 30478026 PMCID: PMC6288596 DOI: 10.2196/10212] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/24/2018] [Revised: 08/16/2018] [Accepted: 10/17/2018] [Indexed: 01/10/2023] Open
Abstract
Background: A 75-g oral glucose tolerance test (OGTT) provides important information about glucose metabolism, although the test is expensive and invasive. Complete OGTT information, such as 1-hour and 2-hour postloading plasma glucose and immunoreactive insulin levels, may be useful for predicting the future risk of diabetes or glucose metabolism disorders (GMD), which includes both diabetes and prediabetes.
Objective: We trained several classification models for predicting the risk of developing diabetes or GMD using data from thousands of OGTTs and a machine learning technique (XGBoost). The receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for the trained classification models are reported, along with the sensitivity and specificity determined by the cutoff values of the Youden index. We compared the performance of the machine learning techniques with logistic regressions (LR), which are traditionally used in medical research studies.
Methods: Data were collected from subjects who underwent multiple OGTTs during comprehensive check-up medical examinations conducted at a single facility in Tokyo, Japan, from May 2006 to April 2017. For each examination, a subject was diagnosed with diabetes or prediabetes according to the American Diabetes Association guidelines. Given the data, 2 studies were conducted: predicting the risk of developing diabetes (study 1) or GMD (study 2). For each study, to apply supervised machine learning methods, the required label data was prepared. If a subject was diagnosed with diabetes or GMD at least once during the period, then that subject’s data obtained in previous trials were classified into the risk group (y=1). After data processing, 13,581 and 6760 OGTTs were analyzed for study 1 and study 2, respectively. For each study, a randomly chosen subset representing 80% of the data was used for training 9 classification models and the remaining 20% was used for evaluating the models. Three classification models, A to C, used XGBoost with various input variables, some including OGTT data. The other 6 classification models, D to I, used LR for comparison.
Results: For study 1, the AUC values ranged from 0.78 to 0.93. For study 2, the AUC values ranged from 0.63 to 0.78. The machine learning approach using XGBoost showed better performance compared with traditional LR methods. The AUC values increased when the full OGTT variables were included. In our analysis using a particular setting of input variables, XGBoost showed that the OGTT variables were more important than fasting plasma glucose or glycated hemoglobin.
Conclusions: A machine learning approach, XGBoost, showed better prediction accuracy compared with LR, suggesting that advanced machine learning methods are useful for detecting the early signs of diabetes or GMD. The prediction accuracy increased when all OGTT variables were added. This indicates that complete OGTT information is important for predicting the future risk of diabetes and GMD accurately.
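The cutoff selection mentioned in this abstract can be illustrated with a small sketch. The Youden index is J = sensitivity + specificity - 1, and the cutoff that maximizes J is taken as the operating point. The function name `youden_cutoff`, the scores, and the labels below are hypothetical and chosen only for illustration; this is not the authors' code or data.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A sample is predicted positive when its score is >= cutoff.
    On ties in J, the lowest cutoff is kept.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best = None
    for cutoff in sorted(set(scores)):
        # True positives: risk-group samples at or above the cutoff.
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        # True negatives: non-risk samples below the cutoff.
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical predicted risks and true labels (1 = risk group).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(scores, labels)
print(cutoff, j, sens, spec)
```

In this toy data two cutoffs share the same J, and the sketch keeps the lower one; a real analysis would need to state its own tie-breaking rule.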
Affiliation(s)
- Katsutoshi Maeta
  - Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Yu Nishiyama
  - Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Kazutoshi Fujibayashi
  - Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
- Toshiaki Gunji
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Noriko Sasabe
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Kimiko Iijima
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Toshio Naito
  - Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
|
348
|
Valdés MG, Galván-Femenía I, Ripoll VR, Duran X, Yokota J, Gavaldà R, Rafael-Palou X, de Cid R. Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data. BMC SYSTEMS BIOLOGY 2018; 12:97. [PMID: 30458782 PMCID: PMC6245589 DOI: 10.1186/s12918-018-0615-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
Abstract
BACKGROUND: During the last decade, interest in applying machine learning algorithms to genomic data has increased in many bioinformatics applications. Analyzing this type of data entails difficulties in managing high-dimensional data, handling class imbalance for knowledge extraction, identifying important features, and classifying individuals. In this study, we propose a general framework to tackle these challenges with different machine learning algorithms and techniques. We apply a configuration of this framework to lung cancer patients, identifying genetic signatures for classifying response to drug treatment. We intersect these relevant SNPs with the GWAS Catalog of the National Human Genome Research Institute and explore the RegulomeDB and GTEx databases for functional analysis purposes.
RESULTS: The machine learning based solution proposed in this study is a scalable and flexible alternative to the classical univariate regression approach for analyzing large-scale data. From 36 experiments executed using the machine learning framework design, we obtain good classification performance from the top 5 models with the highest cross-validation score and the smallest standard deviation. A total of 1224 SNPs corresponding to the key features from the top 20 models (cross-validation F1 mean >= 0.65) were compared with the GWAS Catalog, finding no intersection with genome-wide significant reported hits. From these, new genetic signatures in MAE, CEP104, PRKCZ and ADRB2 show relevant biological regulatory functionality related to lung physiology.
CONCLUSIONS: We have defined a machine learning framework using a large, unbalanced dataset of SNP arrays and imputed genotyping data from a pharmacogenomics study in lung cancer patients subjected to first-line platinum-based treatment. This approach found genome signals with no genome-wide significance in the univariate regression approach (GWAS Catalog) that are valuable for classifying patients, only a few of them with related biological function. The effects of these variants can be explained by the recently proposed omnigenic model hypothesis, which states that complex traits can be influenced not only by the "core genes", mainly found by the genome-wide significant SNPs, but also by the remaining genes outside the "core pathways" with apparently unrelated biological functionality.
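The model selection step described in the results (highest cross-validation score, smallest standard deviation, and a mean F1 threshold of 0.65) can be sketched as follows. The abstract does not say how the two criteria were combined, so the ordering used here (mean first, standard deviation as tie-breaker) is an assumption, and the function `rank_models`, the model names, and the per-fold scores are all hypothetical.

```python
from statistics import mean, stdev


def rank_models(cv_scores, min_mean=0.65):
    """Keep models with mean F1 >= min_mean; sort by highest mean, then smallest std.

    Means are rounded to 3 decimals for sorting so that floating-point noise
    does not decide the order (an illustrative choice, not from the paper).
    """
    summary = []
    for name, folds in cv_scores.items():
        m = mean(folds)
        s = stdev(folds)
        if m >= min_mean:
            summary.append((name, m, s))
    summary.sort(key=lambda row: (-round(row[1], 3), row[2]))
    return summary


# Hypothetical per-fold F1 scores for four candidate models.
cv_scores = {
    "model_a": [0.70, 0.72, 0.68, 0.71, 0.69],
    "model_b": [0.60, 0.62, 0.58, 0.61, 0.59],
    "model_c": [0.75, 0.55, 0.80, 0.65, 0.75],
    "model_d": [0.66, 0.67, 0.65, 0.66, 0.66],
}
ranked = rank_models(cv_scores)
for name, m, s in ranked:
    print(f"{name}: mean F1 = {m:.3f}, std = {s:.3f}")
```

Here `model_b` falls below the threshold and is dropped, while `model_a` and `model_c` share the same mean and are separated by their spread across folds.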
Affiliation(s)
- María Gabriela Valdés
  - Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
- Iván Galván-Femenía
  - PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
- Vicent Ribas Ripoll
  - Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
- Xavier Duran
  - PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
- Jun Yokota
  - PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). CancerGenome Biology, Badalona, Spain
- Ricard Gavaldà
  - Universitat Politècnica de Catalunya, Barcelona, Spain
  - Barcelona Graduate School of Mathematics, BGSMath, Barcelona, Spain
- Xavier Rafael-Palou
  - Eurecat. Technology Centre of Catalonia, Av. Diagonal 177, 9th floor, Barcelona, 08018 Spain
- Rafael de Cid
  - PMPPC-IGTP. Programa de Medicina Predictiva i Personalitzada del Càncer - Institut Germans Trias i Pujol (IGTP). Genomes for Life - GCAT lab Group, Badalona, Spain
|
349
|
Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet 2018; 9:515. [PMID: 30459809 PMCID: PMC6232260 DOI: 10.3389/fgene.2018.00515] [Citation(s) in RCA: 212] [Impact Index Per Article: 30.3] [Received: 07/29/2018] [Accepted: 10/12/2018] [Indexed: 12/30/2022] Open
Abstract
Diabetes mellitus is a chronic disease characterized by hyperglycemia, and it may cause many complications. Given the growing morbidity in recent years, the number of diabetic patients worldwide is projected to reach 642 million by 2040, which means that one in ten adults will have diabetes. This alarming figure needs great attention. With its rapid development, machine learning has been applied to many aspects of medical health. In this study, we used decision tree, random forest and neural network models to predict diabetes mellitus. The dataset is hospital physical examination data from Luzhou, China, and it contains 14 attributes. Five-fold cross-validation was used to examine the models. To verify the universal applicability of the methods, we chose some of the better-performing methods to conduct independent test experiments. We randomly selected data from 68,994 healthy people and diabetic patients, respectively, as the training set. Because the data were imbalanced, we randomly extracted data five times, and the result is the average of these five experiments. We used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to reduce the dimensionality. The results showed that prediction with random forest could reach the highest accuracy (ACC = 0.8084) when all the attributes were used.
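One plausible reading of the repeated random extraction described in this abstract is balanced undersampling: draw equally sized random subsets from each class several times, evaluate on each draw, and average the results. The sketch below assumes that reading; the helper names `balanced_draws` and `average_over_draws`, the toy labels, and the placeholder evaluation function are hypothetical and are not taken from the paper.

```python
import random


def balanced_draws(labels, n_draws=5, seed=0):
    """Return n_draws index lists, each with equal counts of class 0 and class 1."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    k = min(len(idx0), len(idx1))  # size of the smaller class
    draws = []
    for _ in range(n_draws):
        sample = rng.sample(idx0, k) + rng.sample(idx1, k)
        rng.shuffle(sample)
        draws.append(sample)
    return draws


def average_over_draws(labels, evaluate, n_draws=5, seed=0):
    """Average evaluate(indices) over the balanced draws."""
    draws = balanced_draws(labels, n_draws, seed)
    return sum(evaluate(d) for d in draws) / n_draws


# Toy imbalanced labels: 90 healthy (0) and 10 diabetic (1) records.
labels = [0] * 90 + [1] * 10
draws = balanced_draws(labels)

# Placeholder metric: fraction of positives in each draw.
# A real experiment would train and score a model on each draw instead.
avg = average_over_draws(labels, lambda d: sum(labels[i] for i in d) / len(d))
print(len(draws), avg)
```

Fixing the seed keeps the five draws reproducible, which makes the averaged result comparable across models.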
Affiliation(s)
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Kaiyang Qu
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Yamei Luo
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
- Dehui Yin
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
- Ying Ju
  - School of Information Science and Technology, Xiamen University, Xiamen, China
- Hua Tang
  - Department of Pathophysiology, School of Basic Medicine, Southwest Medical University, Luzhou, China
|
350
|
Murphree DH, Arabmakki E, Ngufor C, Storlie CB, McCoy RG. Stacked classifiers for individualized prediction of glycemic control following initiation of metformin therapy in type 2 diabetes. Comput Biol Med 2018; 103:109-115. [PMID: 30347342 DOI: 10.1016/j.compbiomed.2018.10.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/08/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 01/11/2023]
Abstract
OBJECTIVE: Metformin is the preferred first-line medication for management of type 2 diabetes and prediabetes. However, over a third of patients experience primary or secondary therapeutic failure. We developed machine learning models to predict which patients initially prescribed metformin will achieve and maintain control of their blood glucose after one year of therapy.
MATERIALS AND METHODS: We performed a retrospective analysis of administrative claims data for 12,147 commercially-insured adults and Medicare Advantage beneficiaries with prediabetes or diabetes. Several machine learning models were trained using variables available at the time of metformin initiation to predict achievement and maintenance of hemoglobin A1c (HbA1c) < 7.0% after one year of therapy.
RESULTS: AUC performances based on five-fold cross-validation ranged from 0.58 to 0.75. The most influential variables driving the predictions were baseline HbA1c, starting metformin dosage, and presence of diabetes with complications.
CONCLUSIONS: Machine learning models can effectively predict primary or secondary metformin treatment failure within one year. This information can help identify effective individualized treatment strategies. Most of the implemented models outperformed traditional logistic regression, highlighting the potential for applying machine learning to problems in medicine.
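To make the reported cross-validated AUC values concrete, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half, and then averages it across folds. The functions `auc` and `mean_auc` and all scores and labels are hypothetical illustrations, not the authors' pipeline or data.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over held-out folds, each given as (scores, labels)."""
    return sum(auc(s, y) for s, y in folds) / len(folds)


# Hypothetical held-out predictions for two folds (1 = glycemic control achieved).
fold_1 = ([0.2, 0.4, 0.4, 0.7, 0.9], [0, 0, 1, 0, 1])
fold_2 = ([0.1, 0.8], [0, 1])
fold_1_auc = auc(*fold_1)
cv_auc = mean_auc([fold_1, fold_2])
print(fold_1_auc, cv_auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; a rank-based formulation would be the usual choice for larger data.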
Affiliation(s)
- Dennis H Murphree
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Elaheh Arabmakki
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Che Ngufor
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Curtis B Storlie
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Rozalina G McCoy
  - Division of Community Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
  - Division of Health Care Policy & Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
  - Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN 55905, USA
|