1
Choong C, Brnabic A, Chinthammit C, Ravuri M, Terrell K, Kan H. Applying machine learning approaches for predicting obesity risk using US health administrative claims database. BMJ Open Diabetes Res Care 2024; 12:e004193. [PMID: 39327067 DOI: 10.1136/bmjdrc-2024-004193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
INTRODUCTION Body mass index (BMI) is inadequately recorded in US administrative claims databases. We aimed to validate the sensitivity and positive predictive value (PPV) of BMI-related diagnosis codes using an electronic medical records (EMR) claims-linked database. Additionally, we applied machine learning (ML) to identify features in US claims databases to predict obesity status. RESEARCH DESIGN AND METHODS This observational, retrospective analysis included 692 119 people ≥18 years of age, with ≥1 BMI reading in MarketScan Explorys Claims-EMR data (January 2013-December 2019). Claims-based obesity status was compared with EMR-based BMI (gold standard) to assess BMI-related diagnosis code sensitivity and PPV. Logistic regression (LR), penalized LR with L1 penalty (Least Absolute Shrinkage and Selection Operator), extreme gradient boosting (XGBoost) and random forest, with features drawn from insurance claims, were trained to predict obesity status (BMI≥30 kg/m²) from EMR as the gold standard. Model performance was compared using several metrics, including the area under the receiver operating characteristic curve. The best-performing model was applied to assess feature importance. Obesity risk scores were computed from the best model generated from the claims database and compared against the BMI recorded in the EMR. RESULTS The PPV of diagnosis codes from claims alone remained high over the study period (85.4-89.2%); sensitivity was low (16.8-44.8%). XGBoost performed the best at predicting obesity with the highest area under the curve (AUC; 79.4%) and the lowest Brier score. The number of obesity diagnoses and obesity diagnoses from inpatient settings were the most important predictors of obesity. XGBoost showed an AUC of 74.1% when trained without an obesity diagnosis. CONCLUSIONS Obesity prevalence is under-reported in claims databases. ML models, with or without explicit obesity, show promise in improving obesity prediction accuracy compared with obesity codes alone. Improved obesity status prediction may assist practitioners and payors to estimate the burden of obesity and investigate the potential unmet needs of current treatments.
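The comparison of claims-based obesity codes against EMR-recorded BMI described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `sensitivity_and_ppv`, the variable names, and the toy records are all assumptions made for illustration, and only the BMI cutoff of 30 kg/m² comes from the abstract.

```python
def sensitivity_and_ppv(claims_flag, emr_bmi, bmi_cutoff=30.0):
    """Compare a claims-based obesity flag with EMR BMI (treated as gold standard).

    claims_flag: iterable of bool, True if a claim carries an obesity code.
    emr_bmi:     iterable of float, BMI recorded in the EMR for the same person.
    """
    tp = fp = fn = 0
    for flagged, bmi in zip(claims_flag, emr_bmi):
        truly_obese = bmi >= bmi_cutoff
        if flagged and truly_obese:
            tp += 1          # code present, EMR confirms obesity
        elif flagged and not truly_obese:
            fp += 1          # code present, EMR BMI below cutoff
        elif truly_obese:
            fn += 1          # obesity in EMR but no code in claims
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy records, not study data.
claims_flag = [True, False, False, True, False, False]
emr_bmi = [34.2, 31.0, 27.5, 29.1, 36.8, 22.4]
sens, ppv = sensitivity_and_ppv(claims_flag, emr_bmi)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

In this toy set, most people with an EMR BMI of 30 or more carry no obesity code, so sensitivity is low even though half of the coded cases are confirmed. That is the same kind of gap between PPV and sensitivity that the abstract reports.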
Affiliation(s)
- Casey Choong
  - Eli Lilly and Company, Indianapolis, Indiana, USA
- Alan Brnabic
  - Eli Lilly and Company, Indianapolis, Indiana, USA
- Meena Ravuri
  - Eli Lilly and Company, Indianapolis, Indiana, USA
- Hong Kan
  - Eli Lilly and Company, Indianapolis, Indiana, USA

2
Mahfouz M, Mahfouz Y, Harmouche-Karaki M, Matta J, Younes H, Helou K, Finan R, Abi-Tayeh G, Meslimani M, Moussa G, Chahrour N, Osseiran C, Skaiki F, Narbonne JF. Utilizing machine learning to classify persistent organic pollutants in the serum of pregnant women: a predictive modeling approach. Environmental Science and Pollution Research International 2024; 31:52980-52995. [PMID: 39168932 DOI: 10.1007/s11356-024-34684-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 08/07/2024] [Indexed: 08/23/2024]
Abstract
Polychlorinated biphenyls (PCBs), organochlorine pesticides (OCPs), polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs), and per- and poly-fluoroalkyl substances (PFAS) are persistent organic pollutants (POPs) that remain detrimental to critical subpopulations, namely pregnant women. Required tests for biomonitoring are quite expensive. Moreover, statistical models aiming to discover the relationships between pollutant levels and human characteristics have their limitations. Therefore, the objective of this study is to use machine learning predictive models to further examine the pollutants' predictors, while comparing the models. Levels of 33 congeners were measured in the serum of 269 pregnant women, from whom data were collected regarding sociodemographic, dietary, environmental, and anthropometric characteristics. Several machine learning algorithms were compared in Python for each pollutant: support vector machine (SVM), random forest, XGBoost, and neural networks. The aforementioned characteristics were included in the models as features. Prediction accuracy, precision, recall, F1-score, area under the ROC curve (AUC), sensitivity, and specificity were retrieved to compare the models with one another and across pollutants. The highest performing model for all pollutants was Random Forest. Results showed a moderate to acceptable performance and discriminative power among all POPs, with the OCPs' model performing slightly better than all other models. Top related features for each model were also presented using SHAP analysis, detailing the predictors' negative or positive impact on the model. In conclusion, developing such a tool is of major importance in a context of limited financial and research resources. Nevertheless, machine learning models should always be interpreted with caution by exploring all evaluation metrics.
Affiliation(s)
- Maya Mahfouz
  - Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon.
- Yara Mahfouz
  - Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
- Mireille Harmouche-Karaki
  - Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
- Joseph Matta
  - Industrial Research Institute, Lebanese University Campus, Baabda, Hadath, Lebanon, P.O. Box 112806
- Hassan Younes
  - Institut Polytechnique UniLaSalle, Collège Santé, Equipe PANASH, Membre de l'ULR 7519, Université d'Artois, 19 Rue Pierre Waguet, 60026, Beauvais, France
- Khalil Helou
  - Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
- Ramzi Finan
  - Hotel-Dieu de France, Saint Joseph University of Beirut Hospital, Blvd Alfred Naccache, Beirut, Lebanon, P.O. Box 166830
- Georges Abi-Tayeh
  - Hotel-Dieu de France, Saint Joseph University of Beirut Hospital, Blvd Alfred Naccache, Beirut, Lebanon, P.O. Box 166830
- Ghada Moussa
  - Department of Obstetrics and Gynecology, Chtoura Hospital, Beqaa, Lebanon
- Nada Chahrour
  - Department of Obstetrics and Gynecology, SRH University Hospital, Nabatieh, Lebanon
- Camille Osseiran
  - Department of Obstetrics and Gynecology, Kassab Hospital, Saida, Lebanon
- Farouk Skaiki
  - Department of Molecular Biology, General Management, Al Karim Medical Laboratories, Saida, Lebanon
- Jean-François Narbonne
  - Laboratoire de Physico-Toxico Chimie Des Systèmes Naturels, University of Bordeaux, 33405, Talence, CEDEX, France

3
Oss Boll H, Amirahmadi A, Ghazani MM, Morais WOD, Freitas EPD, Soliman A, Etminani F, Byttner S, Recamonde-Mendoza M. Graph neural networks for clinical risk prediction based on electronic health records: A survey. J Biomed Inform 2024; 151:104616. [PMID: 38423267 DOI: 10.1016/j.jbi.2024.104616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 02/21/2024] [Accepted: 02/23/2024] [Indexed: 03/02/2024]
Abstract
OBJECTIVE This study aims to comprehensively review the use of graph neural networks (GNNs) for clinical risk prediction based on electronic health records (EHRs). The primary goal is to provide an overview of the state-of-the-art of this subject, highlighting ongoing research efforts and identifying existing challenges in developing effective GNNs for improved prediction of clinical risks. METHODS A search was conducted in the Scopus, PubMed, ACM Digital Library, and Embase databases to identify relevant English-language papers that used GNNs for clinical risk prediction based on EHR data. The study includes original research papers published between January 2009 and May 2023. RESULTS Following the initial screening process, 50 articles were included in the data collection. A significant increase in publications from 2020 was observed, with most selected papers focusing on diagnosis prediction (n = 36). The study revealed that the graph attention network (GAT) (n = 19) was the most prevalent architecture, and MIMIC-III (n = 23) was the most common data resource. CONCLUSION GNNs are relevant tools for predicting clinical risk by accounting for the relational aspects among medical events and entities and managing large volumes of EHR data. Future studies in this area may address challenges such as EHR data heterogeneity, multimodality, and model interpretability, aiming to develop more holistic GNN models that can produce more accurate predictions, be effectively implemented in clinical settings, and ultimately improve patient care.
Affiliation(s)
- Heloísa Oss Boll
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil; School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden.
- Ali Amirahmadi
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Mirfarid Musavian Ghazani
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Wagner Ourique de Morais
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Edison Pignaton de Freitas
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil
- Amira Soliman
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Farzaneh Etminani
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Stefan Byttner
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre (HCPA), Av. Protásio Alves, 211, Bloco C, Porto Alegre, 90035-903, RS, Brazil

4
Bali V, Turzhitsky V, Schelfhout J, Paudel M, Hulbert E, Peterson-Brandt J, Hertzberg J, Kelly NR, Patel RH. Machine learning to identify chronic cough from administrative claims data. Sci Rep 2024; 14:2449. [PMID: 38291064 PMCID: PMC10828499 DOI: 10.1038/s41598-024-51522-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024] Open
Abstract
Accurate identification of patient populations is an essential component of clinical research, especially for medical conditions such as chronic cough that are inconsistently defined and diagnosed. We aimed to develop and compare machine learning models to identify chronic cough from medical and pharmacy claims data. In this retrospective observational study, we compared 3 machine learning algorithms based on XGBoost, logistic regression, and neural network approaches using a large claims and electronic health record database. Of the 327,423 patients who met the study criteria, 4,818 had chronic cough based on linked claims-electronic health record data. The XGBoost model showed the best performance, achieving a receiver operating characteristic area under the curve (ROC-AUC) of 0.916. We selected a cutoff that favors a high positive predictive value (PPV) to minimize false positives, resulting in a sensitivity, specificity, PPV, and negative predictive value of 18.0%, 99.6%, 38.7%, and 98.8%, respectively, on the held-out testing set (n = 82,262). Logistic regression and neural network models achieved slightly lower ROC-AUCs of 0.907 and 0.838, respectively. The XGBoost and logistic regression models maintained their robust performance in subgroups of individuals with higher rates of chronic cough. Machine learning algorithms are one way of identifying conditions that are not coded in medical records, and can help identify individuals with chronic cough from claims data with a high degree of classification value.
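The effect of choosing a cutoff that favors PPV, as this abstract describes, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's pipeline: `confusion_metrics`, the labels, the scores, and both cutoffs are hypothetical.

```python
def confusion_metrics(y_true, scores, cutoff):
    """Sensitivity, specificity, PPV and NPV when scores >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= cutoff
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical labels and model scores, not study data.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]

lenient = confusion_metrics(y_true, scores, cutoff=0.50)
strict = confusion_metrics(y_true, scores, cutoff=0.85)
print("cutoff 0.50:", lenient)
print("cutoff 0.85:", strict)
```

Raising the cutoff removes the false positive in this toy set, so PPV and specificity go up while sensitivity drops. The abstract reports the same direction of trade-off: high specificity with low sensitivity.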
Affiliation(s)
- Vishal Bali
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA.
- Vladimir Turzhitsky
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA
- Jonathan Schelfhout
  - Center for Observational and Real-World Evidence (CORE), Merck & Co, Rahway, NJ, USA
- Misti Paudel
  - Health Economics and Outcomes Research (HEOR), Optum Insight, Eden Prairie, MN, USA
- Erin Hulbert
  - Health Economics and Outcomes Research (HEOR), Optum Insight, Eden Prairie, MN, USA

5
Jadhav P, Sears T, Floan G, Joskowitz K, Nienow S, Cruz S, David M, de Cos V, Choi P, Ignacio RC. Application of a Machine Learning Algorithm in Prediction of Abusive Head Trauma in Children. J Pediatr Surg 2024; 59:80-85. [PMID: 37858394 DOI: 10.1016/j.jpedsurg.2023.09.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 09/07/2023] [Indexed: 10/21/2023]
Abstract
PURPOSE We explored the application of a machine learning algorithm for the timely detection of potential abusive head trauma (AHT) using the first free-text note of an encounter and demographic information. METHODS First free-text physician notes and demographic information were collected for children under 5 years of age at a Level 1 Trauma Center. The control group, which included patients with head/neck injury, was compared to those with AHT diagnosed by the Child Protective Team. Differential scores accounted for words overrepresented in AHT patient vs. control notes. Sentiment scores were reflective of note positivity/negativity and subjectivity scores accounted for note subjectivity/objectivity. The composite scores reflected the patient's differential score modified by the subjectivity score. Composite, sentiment, and subjectivity scores combined with demographic information trained a Random Forest (RF) machine learning algorithm to predict AHT. RESULTS Final composite scores with demographic information were highly associated with AHT in a test dataset. The control group included 587 patients and the test group included 193 patients. Combining composite scores with demographic information into the RF model improved AHT classification area under the curve (AUC) from 0.68 to 0.78, with an overall accuracy of 84%. Feature importance analysis of our RF model revealed that composite score, sentiment, age, and subjectivity were the most impactful predictors of AHT. The sentiment was not significantly different between control and AHT notes (p = 0.87), while subjectivity trended higher for AHT notes (p = 0.081). CONCLUSION We conclude that a machine learning algorithm can recognize patterns within free-text notes and demographic information that aid in AHT detection in children. LEVEL OF EVIDENCE III.
Affiliation(s)
- Priyanka Jadhav
  - University of California San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Timothy Sears
  - Department of Bioinformatics and Systems Biology Graduate Program, University of California San Diego, 9500 Gilman Drive, San Diego, CA, 92093, USA
- Gretchen Floan
  - Department of General Surgery, Naval Medical Center San Diego, 34800 Bob Wilson Dr, San Diego, CA, 92134, USA
- Katie Joskowitz
  - Rady Children's Hospital San Diego, 3020 Children's Way, San Diego, CA, 92123, USA
- Shalon Nienow
  - Department of Pediatrics, Division of Child Abuse Pediatrics, University of California-San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA; Chadwick Center for Children and Families at Rady Childrens Hospital, 3665 Kearny Villa Road, Suite 500, San Diego, CA, 92123, USA
- Sheena Cruz
  - University of California San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Maya David
  - Tulane University School of Medicine, 1430 Tulane Ave, New Orleans, LA, 70112, USA
- Víctor de Cos
  - University of California San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Pam Choi
  - Department of General Surgery, Naval Medical Center San Diego, 34800 Bob Wilson Dr, San Diego, CA, 92134, USA
- Romeo C Ignacio
  - University of California San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA; Division of Pediatric Surgery, Department of Surgery, University of California San Diego School of Medicine, 9500 Gilman Dr, La Jolla, CA, 92093, USA.

6
Paramasivam G, Sanmugam A, Palem VV, Sevanan M, Sairam AB, Nachiappan N, Youn B, Lee JS, Nallal M, Park KH. Nanomaterials for detection of biomolecules and delivering therapeutic agents in theragnosis: A review. Int J Biol Macromol 2024; 254:127904. [PMID: 37939770 DOI: 10.1016/j.ijbiomac.2023.127904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 10/30/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023]
Abstract
Nanomaterials are increasingly used to deliver therapeutic agents in living systems. Nanotechnology serves as a complement through applications such as nano-porous structures, functionalized nanomaterials, quantum dots, carbon nanomaterials, and polymeric nanostructures. These applications are at an early stage but have already enabled several diagnostic and therapeutic uses in clinical practice. This review conveys the importance of nanomaterials in post-genomic applications, which include the design of immunosensors, immunoassays, and drug delivery. In this view, genomics is a molecular tool containing large databases that are useful in choosing an apt molecular inhibitor, such as a drug, ligand, or antibody target, in the drug delivery process. This study identifies the expression of genes and proteins in the analysis and classification of diseases. Experimentally, the study analyses the design of a disease model. In particular, drug delivery is a promising area for treating cancer. The identified drugs enter clinical trials (Phases I, II, and III). Genomic information contributes essential inputs to Phase I trials and helps candidates advance to Phase II and III trials. In such cases, biomarkers play a crucial role by monitoring the unique pathological process. Genetic engineering with recombinant DNA techniques can be employed to develop genetically engineered disease models. Delivering drugs to a specific site is a challenging problem that can be addressed using nanoparticles. Therefore, genomics is considered a vast molecular tool to identify drugs in personalized medicine for cancer therapy.
Affiliation(s)
- Gokul Paramasivam
  - Department of Biotechnology, Saveetha School of Engineering, Saveetha Institute of Medical & Technical Sciences (SIMATS), Saveetha Nagar, Thandalam, Chennai 602105, Tamil Nadu, India.
- Anandhavelu Sanmugam
  - Department of Applied Chemistry, Sri Venkateswara College of Engineering, Pennalur, Sriperumbudur 602117, Tamil Nadu, India
- Vishnu Vardhan Palem
  - Department of Biotechnology, Saveetha School of Engineering, Saveetha Institute of Medical & Technical Sciences (SIMATS), Saveetha Nagar, Thandalam, Chennai 602105, Tamil Nadu, India
- Murugan Sevanan
  - Department of Biotechnology, Karunya Institute of Technology and Sciences, Karunya Nagar, Coimbatore 641114, Tamil Nadu, India
- Ananda Babu Sairam
  - Department of Applied Chemistry, Sri Venkateswara College of Engineering, Pennalur, Sriperumbudur 602117, Tamil Nadu, India
- Nachiappan Nachiappan
  - Department of Applied Chemistry, Sri Venkateswara College of Engineering, Pennalur, Sriperumbudur 602117, Tamil Nadu, India
- BuHyun Youn
  - Department of Biological Sciences, Pusan National University, Busan 46241, Republic of Korea
- Jung Sub Lee
  - Department of Orthopaedic Surgery, Biomedical Research Institute, Pusan National University Hospital, Busan 46241, Republic of Korea; School of Medicine, Pusan National University, Busan 46241, Republic of Korea
- Muthuchamy Nallal
  - Department of Chemistry, Pusan National University, Busan 46241, Republic of Korea.
- Kang Hyun Park
  - Department of Chemistry, Pusan National University, Busan 46241, Republic of Korea.

7
Sampa MB, Biswas T, Rahman MS, Aziz NHBA, Hossain MN, Aziz NAA. A Machine Learning Web App to Predict Diabetic Blood Glucose Based on a Basic Noninvasive Health Checkup, Sociodemographic Characteristics, and Dietary Information: Case Study. JMIR Diabetes 2023; 8:e49113. [PMID: 37999944 DOI: 10.2196/49113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 09/28/2023] [Accepted: 10/11/2023] [Indexed: 11/25/2023] Open
Abstract
BACKGROUND Over the past few decades, diabetes has become a serious public health concern worldwide, particularly in Bangladesh. Advances in artificial intelligence can be leveraged for the prediction of blood glucose levels for better health management. However, the practical validity of machine learning (ML) techniques for predicting health parameters using data from low- and middle-income countries, such as Bangladesh, is very low. Specifically, Bangladesh lacks research using ML techniques to predict blood glucose levels based on basic noninvasive clinical measurements and dietary and sociodemographic information. OBJECTIVE To formulate strategies for public health planning and the control of diabetes, this study aimed to develop a personalized ML model that predicts the blood glucose level of urban corporate workers in Bangladesh. METHODS Based on the basic noninvasive health checkup test results, dietary information, and sociodemographic characteristics of 271 employees of the Bangladeshi Grameen Bank complex, 5 well-known ML models, namely, linear regression, boosted decision tree regression, neural network, decision forest regression, and Bayesian linear regression, were used to predict blood glucose levels. Continuous blood glucose data were used in this study to train the model, and the trained model was then used to predict new blood glucose values. RESULTS Boosted decision tree regression demonstrated the greatest predictive performance of all evaluated models (root mean squared error=2.30). This means that, on average, our model's predicted blood glucose level deviated from the actual blood glucose level by around 2.30 mg/dL. The mean blood glucose value of the population studied was 128.02 mg/dL (SD 56.92), indicating a borderline result for the majority of the samples (normal value: 140 mg/dL). This suggests that the individuals should be monitoring their blood glucose levels regularly. CONCLUSIONS This ML-enabled web application for blood glucose prediction helps individuals to self-monitor their health condition. The application was developed with communities in remote areas of low- and middle-income countries, such as Bangladesh, in mind. These areas typically lack health facilities and have an insufficient number of qualified doctors and nurses. The web-based application is a simple, practical, and effective solution that can be adopted by the community. Use of the web application can save money on medical expenses, time, and health management expenses. The created system also aids in achieving the Sustainable Development Goals, particularly in ensuring that everyone in the community enjoys good health and well-being and lowering total morbidity and mortality.
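The root mean squared error (RMSE) used in this abstract to rank the regression models can be written out as a short Python sketch. The function names and the glucose values below are hypothetical and only illustrate the calculation. The mean absolute error (MAE) is included for comparison because RMSE weights large errors more heavily and is never smaller than MAE.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical blood glucose values in mg/dL, not study data.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [101.0, 118.0, 143.0, 160.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(f"RMSE={rmse_value:.2f} mg/dL, MAE={mae_value:.2f} mg/dL")
```

Both quantities are in the units of the target (mg/dL here), so they can be read against the spread of the measured values.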
Affiliation(s)
- Masuda Begum Sampa
  - Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia
  - Department of Computer Science and Engineering, Faculty of Science, Engineering and Technology, University of Science and Technology Chittagong, Chattogram, Bangladesh
- Topu Biswas
  - Department of Computer Science and Engineering, Faculty of Science, Engineering and Technology, University of Science and Technology Chittagong, Chattogram, Bangladesh
- Md Siddikur Rahman
  - Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
- Nor Hidayati Binti Abdul Aziz
  - Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia
- Md Nazmul Hossain
  - Department of Marketing, Faculty of Business Studies, University of Dhaka, Dhaka, Bangladesh
- Nor Azlina Ab Aziz
  - Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia

8
Zhou D, Xie J, Wang J, Zong J, Fang Q, Luo F, Zhang T, Ma H, Cao L, Yin H, Yin S, Li S. Establishment of a differential diagnosis method and an online prediction platform for AOSD and sepsis based on gradient boosting decision trees algorithm. Arthritis Res Ther 2023; 25:220. [PMID: 37974244 PMCID: PMC10652592 DOI: 10.1186/s13075-023-03207-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023] Open
Abstract
OBJECTIVE The differential diagnosis between adult-onset Still's disease (AOSD) and sepsis has always been a challenge. In this study, a machine learning model for differential diagnosis of AOSD and sepsis was developed, and an online platform was built to facilitate the clinical application of the model. METHODS All data were collected from 42 AOSD patients and 50 sepsis patients admitted to Affiliated Hospital of Xuzhou Medical University from December 2018 to December 2021. In addition, 5 AOSD patients and 10 sepsis patients diagnosed in our hospital after March 2022 were collected for external validation. All models were built using the scikit-learn library (version 1.0.2) in Python (version 3.9.7), and feature selection was performed using the SHAP (Shapley Additive exPlanation) package developed in Python. RESULTS The results showed that the gradient boosting decision tree (GBDT) optimization model based on arthralgia, ferritin × lymphocyte count, white blood cell count, ferritin × platelet count, and α1-acid glycoprotein/creatine kinase could well identify AOSD and sepsis. The training set interaction test (AUC: 0.9916, ACC: 0.9457, Sens: 0.9556, Spec: 0.9578) and the external validation also achieved satisfactory results (AUC: 0.9800, ACC: 0.9333, Sens: 0.8000, Spec: 1.000). We named this discrimination method AIADSS (AI-assisted discrimination of Still's disease and Sepsis) and created an online service platform for practical operation; the website is http://cppdd.cn/STILL1/. CONCLUSION We created a method for the identification of AOSD and sepsis based on machine learning. This method can provide a reference for clinicians to formulate the next diagnosis and treatment plan.
Affiliation(s)
- Dongmei Zhou
  - The First Clinical College of Xuzhou Medical University, Xuzhou, 221004, China.
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China.
- Jingzhi Xie
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Jiarui Wang
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, 221004, China
- Juan Zong
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Quanquan Fang
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Fei Luo
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Ting Zhang
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Hua Ma
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Lina Cao
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China
- Hanqiu Yin
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China.
- Songlou Yin
  - Department of Rheumatology and Immunology, Affiliated Hospital of Xuzhou Medical University, Jiangsu Province, China.
- Shuyan Li
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, 221004, China.

9
Winkelman J, Nguyen D, vanSonnenberg E, Kirk A, Lieberman S. Artificial Intelligence (AI) in pediatric endocrinology. J Pediatr Endocrinol Metab 2023; 36:903-908. [PMID: 37589444 DOI: 10.1515/jpem-2023-0287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 08/03/2023] [Indexed: 08/18/2023]
Abstract
Artificial Intelligence (AI) is integrating itself throughout the medical community. AI's ability to analyze complex patterns and interpret large amounts of data will have considerable impact on all areas of medicine, including pediatric endocrinology. In this paper, we review and update the current studies of AI in pediatric endocrinology. Specific topics that are addressed include diabetes management, bone growth, metabolism, obesity, and puberty. The goal of the paper is to help pediatric endocrinologists become knowledgeable about and comfortable with AI.
Affiliation(s)
- Diep Nguyen
  - University of Arizona College of Medicine Phoenix, Phoenix, USA
- Eric vanSonnenberg
  - University of Arizona College of Medicine Phoenix, Phoenix, USA
  - From the Departments of Radiology, University of Arizona College of Medicine Phoenix, Phoenix, USA
  - Student Affairs, University of Arizona College of Medicine Phoenix, Phoenix, USA
- Alison Kirk
  - University of Arizona College of Medicine Phoenix, Phoenix, USA
  - Student Affairs, University of Arizona College of Medicine Phoenix, Phoenix, USA
  - Pediatrics, University of Arizona College of Medicine Phoenix, Phoenix, USA
- Steven Lieberman
  - University of Arizona College of Medicine Phoenix, Phoenix, USA
  - Internal Medicine (Division of Endocrinology), University of Arizona College of Medicine Phoenix, Phoenix, USA

10
Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023; 177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) applications have developed over the past years in various fields, including the processing of clinical free text for named entity recognition and relation extraction. However, developments over the last few years have been so rapid that there is currently no overview of them. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments. METHODS We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries). RESULTS We included in the review 94 studies with 30 studies published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool. DISCUSSION Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings and their translation into practice, and highlights the need for robust clinical evaluation.
Affiliation(s)
- David Fraile Navarro
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
- Kiran Ijaz
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Dana Rezazadegan
  - Department of Computer Science and Software Engineering, School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
- Hania Rahimi-Ardabili
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Mark Dras
  - Department of Computing, Macquarie University, Sydney, Australia
- Enrico Coiera
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Shlomo Berkovsky
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia

11
Sajjadi SF, Sacre JW, Chen L, Wild SH, Shaw JE, Magliano DJ. Algorithms to define diabetes type using data from administrative databases: A systematic review of the evidence. Diabetes Res Clin Pract 2023; 203:110859. [PMID: 37517777 DOI: 10.1016/j.diabres.2023.110859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 07/06/2023] [Accepted: 07/28/2023] [Indexed: 08/01/2023]
Abstract
AIMS To find the best-performing algorithms to distinguish type 1 and type 2 diabetes in administrative data. METHODS Embase and MEDLINE databases were searched from January 2000 until January 2023. Papers evaluating the performance of algorithms to define type 1 and type 2 diabetes by reporting diagnostic metrics against a range of reference standards were selected. Study quality was evaluated using the Quality Assessment of Diagnostic Accuracy Studies. RESULTS Of the 24 studies meeting the eligibility criteria, 19 demonstrated a low risk of bias and low concerns about the applicability of the study population across all domains. Algorithms considering multiple diabetes diagnostic codes alone were sensitive and specific approaches to classify diabetes type (both metrics >92.1% for type 1 diabetes; >86.9% for type 2 diabetes). Among the top 10-performing algorithms to detect type 1 and type 2 diabetes, 70% and 100% featured multiple criteria, respectively. Information on insulin use was more sensitive and specific for detecting diabetes type than were criteria based on use of oral hypoglycaemic agents. CONCLUSIONS Algorithms based on multiple diabetes diagnostic codes and insulin use are the most accurate approaches to distinguish type 1 from type 2 diabetes using administrative data. Approaches with more than one criterion may also increase sensitivity in distinguishing diabetes type.
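As an illustration of the two diagnostic metrics this abstract reports, the following minimal sketch computes sensitivity and specificity of a classification algorithm against a reference standard. The function name and the toy label vectors are hypothetical and are not taken from the reviewed studies; 1 stands for "condition present" and 0 for "condition absent".

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired binary labels.

    reference: labels from the reference standard (1 = present, 0 = absent)
    predicted: labels assigned by the algorithm under evaluation
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # true negatives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 reference positives, 6 reference negatives.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the share of reference positives the algorithm recovers; specificity is the share of reference negatives it correctly leaves out.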
Affiliation(s)
- Seyedeh Forough Sajjadi
  - Baker Heart and Diabetes Institute, Melbourne, Australia; Monash University, School of Public Health and Preventive Medicine, Melbourne, Australia.
- Julian W Sacre
  - Baker Heart and Diabetes Institute, Melbourne, Australia; Monash University, School of Public Health and Preventive Medicine, Melbourne, Australia
- Lei Chen
  - Baker Heart and Diabetes Institute, Melbourne, Australia
- Sarah H Wild
  - Usher Institute, University of Edinburgh, Teviot Place, Edinburgh EH8 9AG, Scotland
- Jonathan E Shaw
  - Baker Heart and Diabetes Institute, Melbourne, Australia; Monash University, School of Public Health and Preventive Medicine, Melbourne, Australia
- Dianna J Magliano
  - Baker Heart and Diabetes Institute, Melbourne, Australia; Monash University, School of Public Health and Preventive Medicine, Melbourne, Australia

12
Wu Y, Min H, Li M, Shi Y, Ma A, Han Y, Gan Y, Guo X, Sun X. Effect of Artificial Intelligence-based Health Education Accurately Linking System (AI-HEALS) for Type 2 diabetes self-management: protocol for a mixed-methods study. BMC Public Health 2023; 23:1325. [PMID: 37434126 DOI: 10.1186/s12889-023-16066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/06/2023] [Indexed: 07/13/2023] Open
Abstract
BACKGROUND Patients with type 2 diabetes (T2DM) have an increasing need for personalized and precise management as medical technology advances. Artificial intelligence (AI) technologies on mobile devices are gradually being developed in a variety of healthcare fields. As an AI field, the knowledge graph (KG) is being developed to extract and store structured knowledge from massive data sets. It has great prospects for T2DM medical information retrieval, clinical decision-making, and individual intelligent question answering (QA), but has yet to be thoroughly researched in T2DM intervention. Therefore, we designed an artificial intelligence-based health education accurately linking system (AI-HEALS) to evaluate whether the AI-HEALS-based intervention could help patients with T2DM improve their self-management abilities and blood glucose control in primary healthcare. METHODS This is a nested mixed-method study that includes a community-based cluster-randomized control trial and personal in-depth interviews. Individuals with T2DM between the ages of 18 and 75 will be recruited from 40-45 community health centers in Beijing, China. Participants will receive either standard diabetes primary care (SDPC) (control, 3 months) or SDPC plus the AI-HEALS online health education program (intervention, 3 months). The AI-HEALS runs on the WeChat service platform and includes a KBQA system, a system for recording and monitoring physiological indicators and lifestyle, medication and blood glucose monitoring reminders, and automated, personalized message sending. Data on sociodemography, medical examination, blood glucose, and self-management behavior will be collected at baseline, as well as 1, 3, 6, 12, and 18 months later. The primary outcome is the reduction in HbA1c levels. Secondary outcomes include changes in self-management behavior, social cognition, psychology, T2DM skills, and health literacy. Furthermore, the cost-effectiveness of the AI-HEALS-based intervention will be evaluated. DISCUSSION The KBQA system is an innovative and cost-effective technology for health education and promotion for T2DM patients, but it is not yet widely used in T2DM interventions. This trial will provide evidence on the efficacy of AI and mHealth-based personalized interventions in primary care for improving T2DM outcomes and self-management behaviors. TRIAL REGISTRATION Biomedical Ethics Committee of Peking University: IRB00001052-22,058, 2022/06/06; Clinical Trials: ChiCTR2300068952, 02/03/2023.
Affiliation(s)
- Yibo Wu
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Hewei Min
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Mingzi Li
  - School of Nursing, Peking University, Beijing, China
- Yuhui Shi
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Aijuan Ma
  - Beijing Center for Disease Control and Prevention, Beijing, China
- Yumei Han
  - Beijing Medical Examination Center, Beijing, China
- Yadi Gan
  - Daxing District Center for Disease Control and Prevention of Beijing, Beijing, China
- Xiaohui Guo
  - Peking University First Hospital, Beijing, China
- Xinying Sun
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China.

13
Ying W. Phenomic Studies on Diseases: Potential and Challenges. PHENOMICS (CHAM, SWITZERLAND) 2023; 3:285-299. [PMID: 36714223 PMCID: PMC9867904 DOI: 10.1007/s43657-022-00089-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 03/28/2021] [Revised: 11/21/2022] [Accepted: 11/24/2022] [Indexed: 01/23/2023]
Abstract
The rapid development of research fields such as multi-omics and artificial intelligence (AI) has made it possible to acquire and analyze the multi-dimensional big data of human phenomes. Increasing evidence indicates that phenomics can provide a revolutionary strategy for discovering new risk factors, diagnostic biomarkers and precision therapies for diseases, and that it holds profound advantages over conventional approaches for realizing precision medicine: first, the big data of patients' phenomes can provide remarkably richer information than that of the genomes; second, phenomic studies on diseases may expose the correlations among cross-scale and multi-dimensional phenomic parameters as well as the mechanisms underlying the correlations; and third, phenomics-based studies are big data-driven studies, which can significantly enhance the possibility and efficiency of generating novel discoveries. However, phenomic studies on human diseases are still at an early developmental stage and face multiple major challenges and tasks: first, there is a significant deficiency in analytical and modeling approaches for analyzing the multi-dimensional data of human phenomes; second, it is crucial to establish universal standards for the acquisition and management of patients' phenomic data; third, new methods and devices for acquiring patients' phenomic data in clinical settings should be developed; fourth, it is important to establish regulatory and ethical guidelines for phenomic studies on diseases; and fifth, it is important to develop effective international cooperation. It is expected that phenomic studies on diseases will profoundly and comprehensively enhance our capacity for the prevention, diagnosis and treatment of diseases.
Affiliation(s)
- Weihai Ying
  - Med-X Research Institute and School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030 China
  - Collaborative Innovation Center for Genetics and Development, Shanghai, 200043 China

14
Yun K, He T, Zhen S, Quan M, Yang X, Man D, Zhang S, Wang W, Han X. Development and validation of explainable machine-learning models for carotid atherosclerosis early screening. J Transl Med 2023; 21:353. [PMID: 37246225 DOI: 10.1186/s12967-023-04093-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/28/2023] [Indexed: 05/30/2023] Open
Abstract
BACKGROUND Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China. METHODS A total of 69,601 health check-up records from the health examination center of the First Hospital of China Medical University (Shenyang, China) were collected between 2018 and 2019. For the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms, including decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear), were used to construct CAS screening models. The area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR) were used as measures of model performance. The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model. RESULTS A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model achieved the highest performance metrics with auROC of 0.860 (95% CI 0.839-0.880) in the internal validation dataset and 0.851 (95% CI 0.837-0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol. CONCLUSIONS The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention.
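To make the auROC metric in this abstract concrete, here is a minimal sketch that computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and toy scores are hypothetical and unrelated to the study's data. The quadratic pairwise loop is chosen for clarity, not efficiency.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: binary ground truth (1 = positive, 0 = negative)
    scores: model scores, where higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(auroc(labels, scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.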
Affiliation(s)
- Ke Yun
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
- Tao He
  - Neusoft Research Institute, Neusoft Corporation, Shenyang, Liaoning Province, China
- Shi Zhen
  - Department of Software Engineering, Northeastern University, Shenyang, Liaoning Province, China
- Meihui Quan
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
- Xiaotao Yang
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
- Dongliang Man
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
- Shuang Zhang
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
- Wei Wang
  - Department of Physical Examination Center, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China.
- Xiaoxu Han
  - National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China.
  - Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China.
  - Laboratory Medicine Innovation Unit, Chinese Academy of Medical Sciences, Shenyang, Liaoning Province, China.
  - NHC Key Laboratory of AIDS Immunology (China Medical University), The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China.

15
Banaye Yazdipour A, Masoorian H, Ahmadi M, Mohammadzadeh N, Ayyoubzadeh SM. Predicting the toxicity of nanoparticles using artificial intelligence tools: a systematic review. Nanotoxicology 2023; 17:62-77. [PMID: 36883698 DOI: 10.1080/17435390.2023.2186279] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/09/2023]
Abstract
Nanoparticles have been used extensively in different scientific fields. Because nanoparticles may have destructive effects on the environment or on biological systems, toxicity evaluation is a crucial phase in studying nanomaterial safety. However, experimental approaches for toxicity assessment of various nanoparticles are expensive and time-consuming. Thus, an alternative technique, such as artificial intelligence (AI), could be valuable for predicting nanoparticle toxicity. In this review, therefore, AI tools for the toxicity assessment of nanomaterials were investigated. To this end, a systematic search was performed on the PubMed, Web of Science, and Scopus databases. Articles were included or excluded based on pre-defined inclusion and exclusion criteria, and duplicate studies were excluded. Finally, twenty-six studies were included. The majority of the studies were conducted on metal oxide and metallic nanoparticles. In addition, Random Forest (RF) and Support Vector Machine (SVM) were the most frequently used methods in the included studies. Most of the models demonstrated acceptable performance. Overall, AI could provide a robust, fast, and low-cost tool for the evaluation of nanoparticle toxicity.
Affiliation(s)
- Alireza Banaye Yazdipour
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  - Students' Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran, Iran
- Hoorie Masoorian
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Mahnaz Ahmadi
  - Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Niloofar Mohammadzadeh
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran

16
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem is quite important, and techniques are still needed to address this underlying problem. Future studies should address key limitations in studies, primarily in identifying lupus nephritis, suicide attempts, and perinatal self-harm, and in ICD-9 classification.
Affiliation(s)
- Elias Hossain
  - School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh.
- Rajib Rana
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Niall Higgins
  - School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
- Jeffrey Soar
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Prabal Datta Barua
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Anthony R Pisani
  - Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
- Kathryn Turner
  - School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia

17
Dweekat OY, Lam SS. Optimized design of hybrid genetic algorithm with multilayer perceptron to predict patients with diabetes. Soft comput 2023. [DOI: 10.1007/s00500-023-07876-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]

18
Keller MS, Qureshi N, Albertson E, Pevnick J, Brandt N, Bui A, Sarkisian CA. Comparing risk prediction models aimed at predicting hospitalizations for adverse drug events in community dwelling older adults: a protocol paper. RESEARCH SQUARE 2023:rs.3.rs-2429369. [PMID: 36711695 PMCID: PMC9882666 DOI: 10.21203/rs.3.rs-2429369/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Background The objective of this paper is to describe the creation, validation, and comparison of two risk prediction modeling approaches for community-dwelling older adults to identify individuals at highest risk for adverse drug event-related hospitalizations. One approach will use traditional statistical methods, the second will use a machine learning approach. Methods We will construct medication, clinical, health care utilization, and other variables known to be associated with adverse drug event-related hospitalizations. To create the cohort, we will include older adults (≥ 65 years of age) empaneled to a primary care physician within the Cedars-Sinai Health System primary care clinics with polypharmacy (≥ 5 medications) or at least 1 medication commonly implicated in ADEs (certain oral hypoglycemics, anti-coagulants, anti-platelets, and insulins). We will use a Fine-Gray Cox proportional hazards model for one risk modeling approach and DataRobot, a data science and analytics platform, to run and compare several widely used supervised machine learning algorithms, including Random Forest, Support Vector Machine, Extreme Gradient Boosting (XGBoost), Decision Tree, Naïve Bayes, and K-Nearest Neighbors. We will use a variety of metrics to compare model performance and to assess the risk of algorithmic bias. Discussion In conclusion, we hope to develop a pragmatic model that can be implemented in the primary care setting to risk stratify older adults to further optimize medication management.
Affiliation(s)
- Alex Bui
  - David Geffen School of Medicine: University of California Los Angeles David Geffen School of Medicine
- Catherine A Sarkisian
  - David Geffen School of Medicine: University of California Los Angeles David Geffen School of Medicine

19
Hospital selection framework for remote MCD patients based on fuzzy q-rung orthopair environment. Neural Comput Appl 2023; 35:6185-6196. [PMID: 36415285 PMCID: PMC9672551 DOI: 10.1007/s00521-022-07998-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 10/25/2022] [Indexed: 11/18/2022]
Abstract
This research proposes a novel mobile health-based hospital selection framework for remote patients with multi-chronic diseases based on wearable body medical sensors that use the Internet of Things. The proposed framework uses two powerful multi-criteria decision-making (MCDM) methods, namely fuzzy-weighted zero-inconsistency and fuzzy decision by opinion score method for criteria weighting and hospital ranking. The development of both methods is based on a Q-rung orthopair fuzzy environment to address the uncertainty issues associated with the case study in this research. The other MCDM issues of multiple criteria, various levels of significance and data variation are also addressed. The proposed framework comprises two main phases, namely identification and development. The first phase discusses the telemedicine architecture selected, patient dataset used and decision matrix integrated. The development phase discusses criteria weighting by q-ROFWZIC and hospital ranking by q-ROFDOSM and their sub-associated processes. Weighting results by q-ROFWZIC indicate that the time of arrival criterion is the most significant across all experimental scenarios with (0.1837, 0.183, 0.230, 0.276, 0.335) for (q = 1, 3, 5, 7, 10), respectively. Ranking results indicate that Hospital (H-4) is the best-ranked hospital in all experimental scenarios. Both methods were evaluated based on systematic ranking and sensitivity analysis, thereby confirming the validity of the proposed framework.

20
Zhang S, Yin Q, Wang J. Elevator dynamic monitoring and early warning system based on machine learning algorithm. IET NETWORKS 2022. [DOI: 10.1049/ntw2.12077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Affiliation(s)
- Shuai Zhang
  - Department of Information Engineering, Weifang Engineering Vocational College, Weifang, Shandong, China
- Qiangguo Yin
  - Department of Information Engineering, Weifang Engineering Vocational College, Weifang, Shandong, China
- Jinlong Wang
  - Department of Information Engineering, Weifang Engineering Vocational College, Weifang, Shandong, China

21
Hahn SJ, Kim S, Choi YS, Lee J, Kang J. Prediction of type 2 diabetes using genome-wide polygenic risk score and metabolic profiles: A machine learning analysis of population-based 10-year prospective cohort study. EBioMedicine 2022; 86:104383. [PMID: 36462406 PMCID: PMC9713286 DOI: 10.1016/j.ebiom.2022.104383] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/06/2022] [Revised: 11/09/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND Previous work on predicting type 2 diabetes by integrating clinical and genetic factors has mostly focused on the Western population. In this study, we use genome-wide polygenic risk score (gPRS) and serum metabolite data for type 2 diabetes risk prediction in the Asian population. METHODS Data of 1425 participants from the Korean Genome and Epidemiology Study (KoGES) Ansan-Ansung cohort were used in this study. For gPRS analysis, genotypic and clinical information from KoGES health examinee (n = 58,701) and KoGES cardiovascular disease association (n = 8105) sub-cohorts were included. Linkage disequilibrium analysis identified 239,062 genetic variants that were used to determine the gPRS, while the metabolites were selected using the Boruta algorithm. We used bootstrapped cross-validation to evaluate logistic regression and random forest (RF)-based machine learning models. Finally, associations of gPRS and selected metabolites with the values of homeostatic model assessment of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) were further estimated. FINDINGS During the follow-up period (8.3 ± 2.8 years), 331 participants (23.2%) were diagnosed with type 2 diabetes. The areas under the curves of the RF-based models were 0.844, 0.876, and 0.883 for the model using only demographic and clinical factors, model including the gPRS, and model with both gPRS and metabolites, respectively. Incorporation of additional parameters in the latter two models improved the classification by 11.7% and 4.2% respectively. While gPRS was significantly associated with HOMA-B value, most metabolites had a significant association with HOMA-IR value. INTERPRETATION Incorporating both gPRS and metabolite data led to enhanced type 2 diabetes risk prediction by capturing distinct etiologies of type 2 diabetes development. 
An RF-based model using clinical factors, gPRS, and metabolites predicted type 2 diabetes risk more accurately than the logistic regression-based model. FUNDING This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (No. 2019M3E5D1A02070863 and 2022R1C1C1005458). This work was also supported by the 2020 Research Fund (1.200098.01) of UNIST (Ulsan National Institute of Science & Technology).
Affiliation(s)
- Seok-Ju Hahn
  - Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Suhyeon Kim
  - Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Young Sik Choi
  - Division of Endocrinology, Department of Internal Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, Busan 49267, Republic of Korea
- Junghye Lee
  - Department of Industrial Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Corresponding author. Department of Industrial Engineering & Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulsan, 44919, Republic of Korea
- Jihun Kang
  - Department of Family Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, Busan 49267, Republic of Korea
  - Corresponding author. Department of Family Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, 262 Gamcheon-ro, Busan 49267, Republic of Korea

22
Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022; 12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022]
Abstract
This study identified the risk factors for type 2 diabetes (T2D) and proposed a machine learning (ML) technique for predicting T2D. The risk factors for T2D were identified by multiple logistic regression (MLR) using the p-value (p<0.05). Then, five ML-based techniques, including logistic regression, naïve Bayes, J48, multilayer perceptron, and random forest (RF), were employed to predict T2D. This study utilized two publicly available datasets, derived from the National Health and Nutrition Examination Survey, 2009-2010 and 2011-2012. About 4922 respondents with 387 T2D patients were included in the 2009-2010 dataset, whereas 4936 respondents with 373 T2D patients were included in the 2011-2012 dataset. This study identified six risk factors (age, education, marital status, SBP, smoking, and BMI) for 2009-2010 and nine risk factors (age, race, marital status, SBP, DBP, direct cholesterol, physical activity, smoking, and BMI) for 2011-2012. The RF-based classifier obtained 95.9% accuracy, 95.7% sensitivity, 95.3% F-measure, and 0.946 area under the curve.
Affiliation(s)
- Md. Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- N.A.M Faisal Ahmed
  - Institute of Education and Research, University of Rajshahi, Rajshahi, Bangladesh

23
Liu F, Demosthenes P. Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol 2022; 22:287. [PMID: 36335315 PMCID: PMC9636688 DOI: 10.1186/s12874-022-01768-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 04/08/2022] [Accepted: 10/22/2022] [Indexed: 11/07/2022]
Abstract
Background
The increased adoption of the internet, social media, wearable devices, e-health services, and other technology-driven services in medicine and healthcare has led to the rapid generation of various types of digital data, providing a valuable data source beyond the confines of traditional clinical trials, epidemiological studies, and lab-based experiments.
Methods
We provide a brief overview of the types and sources of real-world data (RWD) and the common models and approaches used to analyze them. We discuss the challenges and opportunities of using real-world data for evidence-based decision making. This review does not aim to be comprehensive or to cover all aspects of the intriguing topic of RWD (from both the research and practical perspectives) but serves as a primer and provides useful sources for readers who are interested in this topic.
Results and Conclusions
Real-world data hold great potential for generating real-world evidence for designing and conducting confirmatory trials and answering questions that may not be addressed otherwise. The voluminosity and complexity of real-world data also call for the development of more appropriate, sophisticated, and innovative data processing and analysis techniques while maintaining scientific rigor in research findings, and for attention to data ethics to harness the power of real-world data.

24
Xia S, Zhang Y, Peng B, Hu X, Zhou L, Chen C, Lu C, Chen M, Pang C, Dai Y, Ji J. Detection of mild cognitive impairment in type 2 diabetes mellitus based on machine learning using privileged information. Neurosci Lett 2022; 791:136908. [DOI: 10.1016/j.neulet.2022.136908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Revised: 09/28/2022] [Accepted: 10/04/2022] [Indexed: 01/21/2023]

25
Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. International Journal of Environmental Research and Public Health 2022; 19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]
Abstract
Soft-computing and statistical learning models have gained substantial momentum in predicting type 2 diabetes mellitus (T2DM) disease. This paper reviews recent soft-computing and statistical learning models in T2DM using a meta-analysis approach. We searched for papers using soft-computing and statistical learning models focused on T2DM published between 2010 and 2021 on three different search engines. Of 1215 studies identified, 34 with 136952 patients met our inclusion criteria. The pooled algorithm's performance was able to predict T2DM with an overall accuracy of 0.86 (95% confidence interval [CI] of [0.82, 0.89]). The classification of diabetes prediction was significantly greater in models with a screening and diagnosis (pooled proportion [95% CI] = 0.91 [0.74, 0.97]) when compared to models with nephropathy (pooled proportion = 0.48 [0.76, 0.89] to 0.88 [0.83, 0.91]). For the prediction of T2DM, the decision trees (DT) models had a pooled accuracy of 0.88 [95% CI: 0.82, 0.92], and the neural network (NN) models had a pooled accuracy of 0.85 [95% CI: 0.79, 0.89]. Meta-regression did not provide any statistically significant findings for the heterogeneous accuracy in studies with different diabetes predictions, sample sizes, and impact factors. Additionally, ML models showed high accuracy for the prediction of T2DM. The predictive accuracy of ML algorithms in T2DM is promising, mainly through DT and NN models. However, there is heterogeneity among ML models. We compared the results and models and concluded that this evidence might help clinicians interpret data and implement optimum models for their dataset for T2DM prediction.
Affiliation(s)
- Micheal O. Olusanya
  - Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley 8300, South Africa
  - Correspondence:
- Ropo Ebenezer Ogunsakin
  - Biostatistics Unit, Discipline of Public Health Medicine, School of Nursing & Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Meenu Ghai
  - Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Matthew Adekunle Adeleke
  - Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa

26
Aplicaciones de aprendizaje automático en salud [Machine learning applications in health]. Revista Médica Clínica Las Condes 2022. [DOI: 10.1016/j.rmclc.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

27
Althomsons SP, Winglee K, Heilig CM, Talarico S, Silk B, Wortham J, Hill AN, Navin TR. Using Machine Learning Techniques and National Tuberculosis Surveillance Data to Predict Excess Growth in Genotyped Tuberculosis Clusters. Am J Epidemiol 2022; 191:1936-1943. [PMID: 35780450 PMCID: PMC10790200 DOI: 10.1093/aje/kwac117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 05/05/2022] [Accepted: 06/28/2022] [Indexed: 02/01/2023]
Abstract
The early identification of clusters of persons with tuberculosis (TB) that will grow to become outbreaks creates an opportunity for intervention in preventing future TB cases. We used surveillance data (2009-2018) from the United States, statistically derived definitions of unexpected growth, and machine-learning techniques to predict which clusters of genotype-matched TB cases are most likely to continue accumulating cases above expected growth within a 1-year follow-up period. We developed a model to predict which clusters are likely to grow on a training and testing data set that was generalizable to a validation data set. Our model showed that characteristics of clusters were more important than the social, demographic, and clinical characteristics of the patients in those clusters. For instance, the time between cases before unexpected growth was identified as the most important of our predictors. A faster accumulation of cases increased the probability of excess growth being predicted during the follow-up period. We have demonstrated that combining the characteristics of clusters and cases with machine learning can add to existing tools to help prioritize which clusters may benefit most from public health interventions. For example, consideration of an entire cluster, not only an individual patient, may assist in interrupting ongoing transmission.
Affiliation(s)
- Sandy P. Althomsons
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Kathryn Winglee
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Charles M. Heilig
  - Center for Surveillance, Epidemiology, and Laboratory Services, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Sarah Talarico
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Benjamin Silk
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Jonathan Wortham
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Andrew N. Hill
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States
- Thomas R. Navin
  - Division of TB Elimination, National Center for HIV, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, United States

28
Kurasawa H, Waki K, Chiba A, Seki T, Hayashi K, Fujino A, Haga T, Noguchi T, Ohe K. Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development. JMIR Bioinformatics and Biotechnology 2022; 3:e37951. [PMID: 38935955 PMCID: PMC11135228 DOI: 10.2196/37951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 06/19/2022] [Accepted: 09/02/2022] [Indexed: 06/29/2024]
Abstract
BACKGROUND Treatment discontinuation (TD) is one of the major prognostic issues in diabetes care, and several models have been proposed to predict a missed appointment that may lead to TD in patients with diabetes by using binary classification models for the early detection of TD and for providing intervention support for patients. However, as binary classification models output the probability of a missed appointment occurring within a predetermined period, they are limited in their ability to estimate the magnitude of TD risk in patients with inconsistent intervals between appointments, making it difficult to prioritize patients for whom intervention support should be provided. OBJECTIVE This study aimed to develop a machine-learned prediction model that can output a TD risk score defined by the length of time until TD and prioritize patients for intervention according to their TD risk. METHODS This model included patients with diagnostic codes indicative of diabetes at the University of Tokyo Hospital between September 3, 2012, and May 17, 2014. The model was internally validated with patients from the same hospital from May 18, 2014, to January 29, 2016. The data used in this study included 7551 patients who visited the hospital after January 1, 2004, and had diagnostic codes indicative of diabetes. In particular, data that were recorded in the electronic medical records between September 3, 2012, and January 29, 2016, were used. The main outcome was the TD of a patient, which was defined as missing a scheduled clinical appointment and having no hospital visits within 3 times the average number of days between the visits of the patient and within 60 days. The TD risk score was calculated by using the parameters derived from the machine-learned ranking model. 
The prediction capacity was evaluated by using test data with the C-index for the performance of ranking patients, area under the receiver operating characteristic curve, and area under the precision-recall curve for discrimination, in addition to a calibration plot. RESULTS The means (95% confidence limits) of the C-index, area under the receiver operating characteristic curve, and area under the precision-recall curve for the TD risk score were 0.749 (0.655, 0.823), 0.758 (0.649, 0.857), and 0.713 (0.554, 0.841), respectively. The observed and predicted probabilities were correlated with the calibration plots. CONCLUSIONS A TD risk score was developed for patients with diabetes by combining a machine-learned method with electronic medical records. The score calculation can be integrated into medical records to identify patients at high risk of TD, which would be useful in supporting diabetes care and preventing TD.
Affiliation(s)
- Kayo Waki
  - Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Akihiro Chiba
  - Nippon Telegraph and Telephone Corporation, Tokyo, Japan
  - NTT DOCOMO, INC, Tokyo, Japan
- Tomohisa Seki
  - Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Akinori Fujino
  - Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- Tsuneyuki Haga
  - Nippon Telegraph and Telephone Corporation, Tokyo, Japan
  - NTT-AT IPS Corporation, Kanagawa, Japan
- Takashi Noguchi
  - National Center for Child Health and Development, Tokyo, Japan
- Kazuhiko Ohe
  - Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan

29
Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, R Dickson J, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022; 24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable. OBJECTIVE The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status. METHODS In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping was evaluated. RESULTS NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and significant speed up (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). 
In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features. CONCLUSIONS NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
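The interrater agreement figures quoted in this abstract are Cohen κ values. As a rough illustration of how such a statistic is computed, the following is a minimal sketch in plain Python; the function name `cohen_kappa` and the two rating lists are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical cognitive-status ratings from two reviewers (not study data).
rater_1 = ["normal", "impaired", "impaired", "undetermined", "normal", "normal"]
rater_2 = ["normal", "impaired", "normal", "undetermined", "normal", "impaired"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Kappa discounts the agreement that two raters would reach by chance given how often each uses every label, which is why it is reported here rather than raw percent agreement.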
Affiliation(s)
- Ayush Noori
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Colin Magdamo
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Xiao Liu
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Tanish Tyagi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Zhaozhi Li
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Akhil Kondepudi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Haitham Alabsi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Emily Rudmann
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
- Douglas Wilcox
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Laura Brenner
  - Harvard Medical School, Boston, MA, United States
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
- Gregory K Robbins
  - Harvard Medical School, Boston, MA, United States
  - Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
- Lidia Moura
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Sahar Zafar
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Nicole M Benson
  - Harvard Medical School, Boston, MA, United States
  - Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
  - McLean Hospital, Belmont, MA, United States
- John Hsu
  - Harvard Medical School, Boston, MA, United States
  - Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
- John R Dickson
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Alberto Serrano-Pozo
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Bradley T Hyman
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Deborah Blacker
  - Harvard Medical School, Boston, MA, United States
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
- M Brandon Westover
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Shibani S Mukerji
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
- Sudeshna Das
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States

30
Thompson M, Hill BL, Rakocz N, Chiang JN, Geschwind D, Sankararaman S, Hofer I, Cannesson M, Zaitlen N, Halperin E. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. NPJ Genom Med 2022; 7:50. [PMID: 36008412 PMCID: PMC9411568 DOI: 10.1038/s41525-022-00320-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/16/2021] [Accepted: 07/18/2022] [Indexed: 12/20/2022]
Abstract
Inference of clinical phenotypes is a fundamental task in precision medicine, and has therefore been heavily investigated in recent years in the context of electronic health records (EHR) using a large arsenal of machine learning techniques, as well as in the context of genetics using polygenic risk scores (PRS). In this work, we considered the epigenetic analog of PRS, methylation risk scores (MRS), a linear combination of methylation states. We measured methylation across a large cohort (n = 831) of diverse samples in the UCLA Health biobank, for which both genetic and complete EHR data are available. We constructed MRS for 607 phenotypes spanning diagnoses, clinical lab tests, and medication prescriptions. When added to a baseline set of predictive features, MRS significantly improved the imputation of 139 outcomes, whereas the PRS improved only 22 (median improvement for methylation 10.74%, 141.52%, and 15.46% in medications, labs, and diagnosis codes, respectively, whereas genotypes only improved the labs at a median increase of 18.42%). We added significant MRS to state-of-the-art EHR imputation methods that leverage the entire set of medical records, and found that including MRS as a medical feature in the algorithm significantly improves EHR imputation in 37% of lab tests examined (median R² increase 47.6%). Finally, we replicated several MRS in multiple external studies of methylation (minimum p-value of 2.72 × 10⁻⁷) and replicated 22 of 30 tested MRS internally in two separate cohorts of different ethnicity. Our publicly available results and weights show promise for methylation risk scores as clinical and scientific tools.
Affiliation(s)
- Mike Thompson
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Brian L Hill
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Nadav Rakocz
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Jeffrey N Chiang
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Daniel Geschwind
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Ira Hofer
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Maxime Cannesson
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Eran Halperin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA

31
Wang M, Lin Z, Li R, Li Y, Su J. Predicting disease progress with imprecise lab test results. Artif Intell Med 2022; 132:102373. [DOI: 10.1016/j.artmed.2022.102373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 05/16/2022] [Accepted: 07/28/2022] [Indexed: 11/02/2022]

32
Freda PJ, Kranzler HR, Moore JH. Novel digital approaches to the assessment of problematic opioid use. BioData Min 2022; 15:14. [PMID: 35840990 PMCID: PMC9284824 DOI: 10.1186/s13040-022-00301-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Accepted: 06/30/2022] [Indexed: 11/16/2022]
Abstract
The opioid epidemic continues to contribute to loss of life through overdose and significant social and economic burdens. Many individuals who develop problematic opioid use (POU) do so after being exposed to prescribed opioid analgesics. Therefore, it is important to accurately identify and classify risk factors for POU. In this review, we discuss the etiology of POU and highlight novel approaches to identifying its risk factors. These approaches include the application of polygenic risk scores (PRS) and diverse machine learning (ML) algorithms used in tandem with data from electronic health records (EHR), clinical notes, patient demographics, and digital footprints. The implementation and synergy of these types of data and approaches can greatly assist in reducing the incidence of POU and opioid-related mortality by increasing the knowledge base of patient-related risk factors, which can help to improve prescribing practices for opioid analgesics.
Affiliation(s)
- Philip J Freda
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, CA, 90069, USA
- Henry R Kranzler
  - University of Pennsylvania, Center for Studies of Addiction, 3535 Market St., Suite 500 and Crescenz VAMC, 3800 Woodland Ave., Philadelphia, PA, 19104, USA
- Jason H Moore
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, CA, 90069, USA

33
Application of machine learning methods for the prediction of true fasting status in patients performing blood tests. Sci Rep 2022; 12:11929. [PMID: 35831336 PMCID: PMC9279373 DOI: 10.1038/s41598-022-15161-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 06/20/2022] [Indexed: 11/28/2022]
Abstract
The fasting blood glucose (FBG) values extracted from electronic medical records (EMR) are assumed valid in existing research, which may cause diagnostic bias due to misclassification of fasting status. We proposed a machine learning (ML) algorithm to predict the fasting status of blood samples. This cross-sectional study was conducted using the EMR of a medical center from 2003 to 2018, and a total of 2,196,833 ontological FBGs from the outpatient service were enrolled. The theoretical true fasting status was identified by comparing the values of ontological FBG with average glucose levels derived from concomitantly tested HbA1c based on multiple criteria. In addition to multiple logistic regression, we extracted 67 features to predict the fasting status by eXtreme Gradient Boosting (XGBoost). The discrimination and calibration of the prediction models were also assessed. Real-world performance was gauged by the prevalence of ineffective glucose measurement (IGM). Of the 784,340 ontologically labeled fasting samples, 77.1% were considered theoretical FBGs. The median (IQR) glucose and HbA1c levels of ontological and theoretical fasting samples in patients without diabetes mellitus (DM) were 94.0 (87.0, 102.0) mg/dL and 5.6 (5.4, 5.9)%, and 92.0 (86.0, 99.0) mg/dL and 5.6 (5.4, 5.9)%, respectively. In the parsimonious approach, XGBoost showed comparable calibration and an AUROC of 0.887, versus 0.868 for multiple logistic regression, and identified glucose level, home-to-hospital distance, age, and concomitant serum creatinine and lipid testing as important predictors. The prevalence of IGM dropped from 27.8% based on ontological FBGs to 0.48% by using algorithm-verified FBGs. The proposed ML algorithm or multiple logistic regression model aids in verification of the fasting status.
|
34
|
Yi H. Efficient machine learning algorithm for electroencephalogram modeling in brain–computer interfaces. Neural Comput Appl 2022. [DOI: 10.1007/s00521-020-04861-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
|
35
|
Yiadom MYAB, Gong W, Patterson BW, Baugh CW, Mills AM, Gavin N, Podolsky SR, Salazar G, Mumma BE, Tanski M, Hadley K, Azzo C, Dorner SC, Ulintz A, Liu D. Fallacy of Median Door‐to‐ECG Time: Hidden Opportunities for STEMI Screening Improvement. J Am Heart Assoc 2022; 11:e024067. [PMID: 35492001 PMCID: PMC9238601 DOI: 10.1161/jaha.121.024067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Background ST‐segment elevation myocardial infarction (STEMI) guidelines recommend screening arriving emergency department (ED) patients for an early ECG in those with symptoms concerning for myocardial ischemia. Process measures target median door‐to‐ECG (D2E) time of 10 minutes. Methods and Results This 3‐year descriptive retrospective cohort study, including 676 ED‐diagnosed patients with STEMI from 10 geographically diverse facilities across the United States, examines an alternative approach to quantifying performance: proportion of patients meeting the goal of D2E≤10 minutes. We also identified characteristics associated with D2E>10 minutes and estimated the proportion of patients with screening ECG occurring during intake, triage, and main ED care periods. We found overall median D2E was 7 minutes (IQR:4–16; range: 0–1407 minutes; range of ED medians: 5–11 minutes). Proportion of patients with D2E>10 minutes was 37.9% (ED range: 21.5%–57.1%). Patients with D2E>10 minutes, compared to those with D2E≤10 minutes, were more likely female (32.8% versus 22.6%, P=0.005), Black (23.4% versus 12.4%, P=0.005), non‐English speaking (24.6% versus 19.5%, P=0.032), diabetic (40.2% versus 30.2%, P=0.010), and less frequently reported chest pain (63.3% versus 87.4%, P<0.001). ECGs were performed during ED intake in 62.1% of visits, ED triage in 25.3%, and main ED care in 12.6%. Conclusions Examining D2E>10 minutes can identify opportunities to improve care for more ED patients with STEMI. Our findings suggest sex, race, language, and diabetes are associated with STEMI diagnostic delays. Moving the acquisition of ECGs completed during triage to intake could achieve the D2E≤10 minutes goal for 87.4% of ED patients with STEMI. Sophisticated screening, accounting for differential risk and diversity in STEMI presentations, may further improve timely detection.
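The contrast between a median-based target and a proportion-based target can be made concrete with a few lines of code. The sketch below is only an illustration: the door-to-ECG times are made-up values, and `d2e_summary` is a hypothetical helper, not something from the study.

```python
from statistics import median


def d2e_summary(times_min, goal_min=10):
    """Summarize door-to-ECG times by median and by share of patients over the goal."""
    over_goal = sum(1 for t in times_min if t > goal_min)
    return {
        "median": median(times_min),
        "share_over_goal": over_goal / len(times_min),
    }


# Made-up times (minutes) with a long right tail, as is typical of process data.
times = [3, 4, 5, 6, 7, 8, 12, 20, 45, 90]
summary = d2e_summary(times)
# The median (7.5 min) meets a 10-minute target, yet 40% of patients exceed it.
```

A skewed distribution lets the median look acceptable while a large minority of patients wait longer than the goal, which is the pattern the abstract reports (median 7 minutes, 37.9% over 10 minutes).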
Affiliation(s)
- Wu Gong
  - Department of Biostatistics Vanderbilt University Medical Center Nashville TN
- Brian W. Patterson
  - Department of Emergency Medicine University of Wisconsin School of Medicine and Public Health Madison WI
- Christopher W. Baugh
  - Department of Emergency Medicine Brigham and Women’s Hospital, Harvard Medical School Boston MA
- Angela M. Mills
  - Department of Emergency Medicine Columbia University College of Physicians and Surgeons New York NY
- Nicholas Gavin
  - Department of Emergency Medicine Columbia University College of Physicians and Surgeons New York NY
- Seth R. Podolsky
  - Legacy Health Portland OR
  - Elson S. Floyd College of Medicine at Washington State University Spokane WA
- Gilberto Salazar
  - Department of Emergency Medicine Parkland Hospital, University of Texas Southwestern Medical Center Dallas TX
- Bryn E. Mumma
  - Department of Emergency Medicine University of California Davis, School of Medicine Sacramento CA
- Mary Tanski
  - Department of Emergency Medicine Oregon Health & Sciences University Portland OR
- Kelsea Hadley
  - School of Medicine American University of the Caribbean Cupecoy Sint Maarten
- Caitlin Azzo
  - Department of Emergency Medicine University of Pennsylvania Philadelphia PA
- Stephen C. Dorner
  - Department of Emergency Medicine Massachusetts General Hospital, Harvard Medical School Boston MA
- Alexander Ulintz
  - Department of Emergency Medicine Indiana University School of Medicine Indianapolis IN
- Dandan Liu
  - Department of Biostatistics Vanderbilt University Medical Center Nashville TN
|
36
|
Shao W, Luo X, Zhang Z, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data. BMC Bioinformatics 2022; 23:140. [PMID: 35439945 PMCID: PMC9019947 DOI: 10.1186/s12859-022-04680-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients. RESULTS The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster. Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters. CONCLUSIONS To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.
Affiliation(s)
- Wei Shao
  - Indiana University School of Medicine, 1101 W 10th Street, Indianapolis, IN, 46202, USA
- Xiao Luo
  - Purdue School of Engineering and Technology, IUPUI, ET 301L, 799 W. Michigan Street, Indianapolis, IN, 46202, USA
- Zuoyi Zhang
  - Indiana University School of Medicine, 1101 W 10th Street, Indianapolis, IN, 46202, USA
- Zhi Han
  - Indiana University School of Medicine, 1101 W 10th Street, Indianapolis, IN, 46202, USA
  - Regenstrief Institute, Inc., Indianapolis, IN, USA
- Vasu Chandrasekaran
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., Kenilworth, NJ, USA
- Vladimir Turzhitsky
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., Kenilworth, NJ, USA
- Vishal Bali
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., Kenilworth, NJ, USA
- Jarod Baker
  - Regenstrief Institute, Inc., Indianapolis, IN, USA
- Carmen La Rosa
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., Kenilworth, NJ, USA
- Jessica Weaver
  - Center for Observational and Real-World Evidence, Merck & Co., Inc., Kenilworth, NJ, USA
- Paul Dexter
  - Indiana University School of Medicine, 1101 W 10th Street, Indianapolis, IN, 46202, USA
  - Regenstrief Institute, Inc., Indianapolis, IN, USA
  - Eskenazi Health, Indianapolis, IN, USA
- Kun Huang
  - Indiana University School of Medicine, 1101 W 10th Street, Indianapolis, IN, 46202, USA
  - Regenstrief Institute, Inc., Indianapolis, IN, USA
|
37
|
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. ADVANCES IN COMPUTATIONAL INTELLIGENCE 2022; 2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in the disease prevalence has been witnessed irrespective of age groups, geography or gender. Second, the disease dynamics are very complex in terms of multifactorial risks involved, initial asymptomatic period, different short-term and long-term complications posing serious health threat and related co-morbidities. Majority of its risk factors are lifestyle habits like physical inactivity, lack of exercise, high body mass index (BMI), poor diet, smoking except some inevitable ones like family history of diabetes, ethnic predisposition, ageing etc. Nowadays, machine learning (ML) is increasingly being applied for alleviation of diabetes health burden and many research works have been proposed in the literature to offer clinical decision support in different application areas as well. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. Firstly, we present the medical gaps in diabetes knowledge base, guidelines and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review the ML research works in three different application areas namely—(1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), (3) prognosis (from normoglycemia/prior morbidity to incident diabetes and prognosis of incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in the existing ML methodologies for diabetes to be addressed in future. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
- Shantala Devi Patil
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
|
38
|
Perez-Lebel A, Varoquaux G, Le Morvan M, Josse J, Poline JB. Benchmarking missing-values approaches for predictive models on health databases. Gigascience 2022; 11:6568998. [PMID: 35426912 PMCID: PMC9012100 DOI: 10.1093/gigascience/giac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Revised: 11/30/2021] [Accepted: 01/25/2022] [Indexed: 11/14/2022] Open
Abstract
Background
As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values. These large databases are well suited to train machine learning models, e.g., for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative—rather than generative—modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics.
Results
Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: 4 electronic health record datasets, 1 population brain imaging database, 1 health survey, and 2 intensive care surveys. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values—with missing incorporated attribute—leads to robust, fast, and well-performing predictive modeling.
Conclusions
Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
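The recommendation to pair imputation with indicator columns can be sketched in plain Python. This is a minimal illustration, assuming mean imputation and a row-of-lists data layout; `impute_with_indicator` is a hypothetical helper, not code from the benchmark.

```python
import math


def impute_with_indicator(rows):
    """Mean-impute NaN values per column and append one 0/1 missingness
    flag per original column, so a model can still see what was missing."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fall back to 0.0 for a column with no observed values (an assumption).
        means.append(sum(observed) / len(observed) if observed else 0.0)

    out = []
    for r in rows:
        values = [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        flags = [1.0 if math.isnan(v) else 0.0 for v in r]
        out.append(values + flags)
    return out


nan = float("nan")
X = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]
X_imp = impute_with_indicator(X)
# Each row is now [col0, col1, col0_was_missing, col1_was_missing].
```

Libraries offer the same idea off the shelf; for example, scikit-learn's `SimpleImputer` accepts `add_indicator=True`, and its histogram-based gradient boosting estimators accept NaN inputs directly, which corresponds to the "native support" option the abstract favors.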
Affiliation(s)
- Alexandre Perez-Lebel
  - McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada
  - Inria Saclay – Île-de-France, Parietal team, 1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau, France
  - Mila - Quebec Artificial Intelligence Institute, 6666 Saint-Urbain Street, Montréal, QC H2S 3H1, Canada
- Gaël Varoquaux
  - McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada
  - Inria Saclay – Île-de-France, Parietal team, 1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau, France
  - Mila - Quebec Artificial Intelligence Institute, 6666 Saint-Urbain Street, Montréal, QC H2S 3H1, Canada
- Marine Le Morvan
  - Inria Saclay – Île-de-France, Parietal team, 1 Rue Honoré d'Estienne d'Orves, 91120 Palaiseau, France
- Julie Josse
  - Inria Montpellier, Bâtiment 5, 860 Rue de St-Priest, 34090 Montpellier, France
  - IDESP Institut Desbrest d’Épidémiologie et de Santé Publique, Campus Santé, IURC, 641 avenue du Doyen Gaston Giraud, 34090 Montpellier, France
- Jean-Baptiste Poline
  - McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada
|
39
|
Cardozo G, Pintarelli GB, Andreis GR, Lopes ACW, Marques JLB. Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. BIOMED RESEARCH INTERNATIONAL 2022; 2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]
Abstract
Most patients with diabetes mellitus are asymptomatic, which leads to delayed and more complex treatment. At the same time, most individuals are routinely subjected to standard clinical laboratory examinations, which create large health datasets over a lifetime. Computer processing has been used to search for health anomalies and predict diseases using clinical examinations. This work studied machine learning models to support the screening of diabetes through routine laboratory tests using data from laboratory tests of 62,496 patients. The classification and regression models used were the K-nearest neighbor, support vector machines, Bayes naïve, random forest models, and artificial neural networks. Glycated hemoglobin, a test used for diabetes diagnosis, was used as the target. Regression models calculated glycated hemoglobin directly and were later classified. The performance of classification computer models has been studied under various subdataset partitions and combinations (e.g., healthy, prediabetic, and diabetes, as well as no healthy and no diabetes). The best single performance was achieved with the artificial neural network model when detecting prediabetes or diabetes. The artificial neural network classification model scored 78.1%, 78.7%, and 78.4% for sensitivity, precision, and F1 scores, respectively, when identifying no healthy group. Other models also had good results, depending on what is desired. Machine learning-based models can predict glycated hemoglobin values from routine laboratory tests and can be used as a screening tool to refer a patient for further testing.
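The three reported scores are linked: the F1 score is the harmonic mean of precision and sensitivity (recall). A short check, using only the figures quoted in the abstract, shows that the reported values are mutually consistent. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the artificial neural network classifier.
sensitivity = 0.781
precision = 0.787
f1 = f1_score(precision, sensitivity)
f1_percent = round(100 * f1, 1)  # 78.4, matching the reported F1 score
```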
Affiliation(s)
- Glauco Cardozo
  - Academic Department of Health and Services, Federal Institute of Santa Catarina, Florianopolis, SC 88020-300, Brazil
  - Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
- Guilherme Brasil Pintarelli
  - Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
- Guilherme Rettore Andreis
  - Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
- Jefferson Luiz Brum Marques
  - Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
|
40
|
Wu C, Zhou T, Tian Y, Wu J, Li J, Liu Z. A method for the early prediction of chronic diseases based on short sequential medical data. Artif Intell Med 2022; 127:102262. [DOI: 10.1016/j.artmed.2022.102262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2020] [Revised: 02/18/2022] [Accepted: 02/23/2022] [Indexed: 11/30/2022]
|
41
|
Development of a visual attention based decision support system for autism spectrum disorder screening. Int J Psychophysiol 2022; 173:69-81. [PMID: 35007668 DOI: 10.1016/j.ijpsycho.2022.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/24/2021] [Revised: 12/14/2021] [Accepted: 01/04/2022] [Indexed: 11/24/2022]
Abstract
Visual attention of young children with autism spectrum disorder (ASD) has been well documented in the literature for the past 20 years. In this study, we developed a Decision Support System (DSS) that uses machine learning (ML) techniques to identify young children with ASD from typically developing (TD) children. Study participants included 26 to 36 months old young children with ASD (n = 61) and TD children (n = 72). The results showed that the proposed DSS achieved up to 87.5% success rate in the early assessment of ASD in young children. Findings suggested that visual attention is a unique, promising biomarker for early assessment of ASD. Study results were discussed, and suggestions for future research were provided.
|
42
|
Haneef R, Tijhuis M, Thiébaut R, Májek O, Pristaš I, Tolenan H, Gallay A. Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques. Arch Public Health 2022; 80:9. [PMID: 34983651 PMCID: PMC8725299 DOI: 10.1186/s13690-021-00770-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/27/2021] [Accepted: 12/17/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. However, the estimation of health indicators from linked administrative data is challenging due to several reasons such as variability in data sources and data collection methods resulting in reduced interoperability at various levels and timeliness, availability of a large number of variables, lack of skills and capacity to link and analyze big data. The main objective of this study is to develop the methodological guidelines calculating population-based health indicators to guide European countries using linked data and/or machine learning (ML) techniques with new methods. METHOD We have performed the following step-wise approach systematically to develop the methodological guidelines: i. Scientific literature review, ii. Identification of inspiring examples from European countries, and iii. Developing the checklist of guidelines contents. RESULTS We have developed the methodological guidelines, which provide a systematic approach for studies using linked data and/or ML-techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations. CONCLUSIONS This is the first study to develop the methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would support researchers to adopt and develop a systematic approach for high-quality research methods. 
There is a need for high-quality research methodologies using more linked data and ML-techniques to develop a structured cross-disciplinary approach for improving the population health information and thereby the population health.
Affiliation(s)
- Romana Haneef
  - Department of Non-Communicable Diseases and Injuries, Santé Publique France, Saint-Maurice, France
- Mariken Tijhuis
  - National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Rodolphe Thiébaut
  - Bordeaux University, Bordeaux School of Public Health, Bordeaux, France
  - INSERM / INRIA SISTM team, Bordeaux Population health, Bordeaux, France
  - Medical Information Department, Bordeaux University Hospital, Bordeaux, France
- Ondřej Májek
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Ivan Pristaš
  - National Institute of public health, division of health informatics and biostatistics, Zagreb, Croatia
- Hanna Tolenan
  - Finnish Institute for Health and Welfare (THL), Helsinki, Finland
- Anne Gallay
  - Department of Non-Communicable Diseases and Injuries, Santé Publique France, Saint-Maurice, France
|
43
|
McKnite AM, Job KM, Nelson R, Sherwin CM, Watt KM, Brewer SC. Medication based machine learning to identify subpopulations of pediatric hemodialysis patients in an electronic health record database. INFORMATICS IN MEDICINE UNLOCKED 2022; 34. [PMID: 36405250 PMCID: PMC9674326 DOI: 10.1016/j.imu.2022.101104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022] Open
Abstract
Electronic health records (EHRs) have given rise to large and complex databases of medical information that have the potential to become powerful tools for clinical research. However, differences in coding systems and the detail and accuracy of the information within EHRs can vary across institutions. This makes it challenging to identify subpopulations of patients and limits the widespread use of multi-institutional databases. In this study, we leveraged machine learning to identify patterns in medication usage among hospitalized pediatric patients receiving renal replacement therapy and created a predictive model that successfully differentiated between intermittent (iHD) and continuous renal replacement therapy (CRRT) hemodialysis patients. We trained six machine learning algorithms (logistical regression, Naïve Bayes, k-nearest neighbor, support vector machine, random forest, and gradient boosted trees) using patient records from a multi-center database (n = 533) and prescribed medication ingredients (n = 228) as features to discriminate between the two hemodialysis types. Predictive skill was assessed using a 5-fold cross-validation, and the algorithms showed a range of performance from 0.7 balanced accuracy (logistical regression) to 0.86 (random forest). The two best performing models were further tested using an independent single-center dataset and achieved 84–87% balanced accuracy. This model overcomes issues inherent within large databases and will allow us to utilize and combine historical records, significantly increasing population size and diversity within both iHD and CRRT populations for future clinical studies. Our work demonstrates the utility of using medications alone to accurately differentiate subpopulations of patients in large datasets, allowing codes to be transferred between different coding systems. 
This framework has the potential to be used to distinguish other subpopulations of patients where discriminatory ICD codes are not available, permitting more detailed insights and new lines of research.
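Balanced accuracy, the metric quoted above, is the mean of the per-class recalls, which keeps a majority class from dominating the score. The sketch below is a plain-Python illustration with made-up labels; the class names simply echo the two therapy types in the abstract, and `balanced_accuracy` is a hypothetical helper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Made-up, imbalanced example: 8 iHD records and 2 CRRT records.
y_true = ["iHD"] * 8 + ["CRRT"] * 2
y_pred = ["iHD"] * 8 + ["iHD", "CRRT"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)
# Plain accuracy is 0.9, but balanced accuracy is 0.75 because half of the
# minority class was misclassified.
```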
|
44
|
Khademi F, Rabbani M, Motameni H, Akbari E. A weighted ensemble classifier based on WOA for classification of diabetes. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06481-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
45
|
Brady V, Whisenant M, Wang X, Ly VK, Zhu G, Aguilar D, Wu H. Characterization of Symptoms and Symptom Clusters for Type 2 Diabetes Using a Large Nationwide Electronic Health Record Database. Diabetes Spectr 2022; 35:159-170. [PMID: 35668892 PMCID: PMC9160545 DOI: 10.2337/ds21-0064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
OBJECTIVE A variety of symptoms may be associated with type 2 diabetes and its complications. Symptoms in chronic diseases may be described in terms of prevalence, severity, and trajectory and often co-occur in groups, known as symptom clusters, which may be representative of a common etiology. The purpose of this study was to characterize type 2 diabetes-related symptoms using a large nationwide electronic health record (EHR) database. METHODS We acquired Cerner Health Facts, a nationwide EHR database. The type 2 diabetes cohort (n = 1,136,301 patients) was identified using a rule-based phenotype method. A multistep procedure was then used to identify type 2 diabetes-related symptoms based on International Classification of Diseases, 9th and 10th revisions, diagnosis codes. Type 2 diabetes-related symptoms and co-occurring symptom clusters, including their temporal patterns, were characterized based on the longitudinal EHR data. RESULTS Patients had a mean age of 61.4 years, 51.2% were female, and 70.0% were White. Among 1,136,301 patients, there were 8,008,276 occurrences of 59 symptoms. The most frequently reported symptoms included pain, heartburn, shortness of breath, fatigue, and swelling, which occurred in 21-60% of the patients. We also observed over-represented type 2 diabetes symptoms, including difficulty speaking, feeling confused, trouble remembering, weakness, and drowsiness/sleepiness. Some of these are rare and difficult to detect by traditional patient-reported outcomes studies. CONCLUSION To the best of our knowledge, this is the first study to use a nationwide EHR database to characterize type 2 diabetes-related symptoms and their temporal patterns. Fifty-nine symptoms, including both over-represented and rare diabetes-related symptoms, were identified.
Affiliation(s)
- Veronica Brady
  - Cizik School of Nursing, The University of Texas Health Science Center at Houston, Houston, TX
- Meagan Whisenant
  - Cizik School of Nursing, The University of Texas Health Science Center at Houston, Houston, TX
- Xueying Wang
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
- Vi K. Ly
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
- Gen Zhu
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
- David Aguilar
  - McGovern School of Medicine, The University of Texas Health Science Center at Houston, Houston, TX
- Hulin Wu
  - School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX
  - Corresponding author: Hulin Wu,
|
46
|
Different Data Mining Approaches Based Medical Text Data. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:1285167. [PMID: 34912530 PMCID: PMC8668297 DOI: 10.1155/2021/1285167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Accepted: 11/18/2021] [Indexed: 12/15/2022]
Abstract
The amount of medical text data is increasing dramatically. Medical text data record the progress of medicine and contain a large amount of medical knowledge. As natural language, they are semistructured, high-dimensional, and high in data volume, and they cannot participate in arithmetic operations. Extracting useful knowledge or information from the total available data is therefore a very important task, and various data mining techniques can be used to do so. In the current study, we reviewed different approaches to medical text data mining. The advantages and shortcomings of each technique across the different processes of medical text data mining were analyzed. We also explored the applications of algorithms that provide insights to users and enable them to use the resources for specific challenges in medical text data. Further, the main challenges in medical text data mining were discussed. The findings of this paper can help researchers choose reasonable techniques for mining medical text data and make them aware of the main challenges in medical text data mining.
|
47
|
Sharma T, Shah M. A comprehensive review of machine learning techniques on diabetes detection. Vis Comput Ind Biomed Art 2021; 4:30. [PMID: 34862560 PMCID: PMC8642577 DOI: 10.1186/s42492-021-00097-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/09/2021] [Accepted: 10/29/2021] [Indexed: 12/14/2022] Open
Abstract
Diabetes mellitus has been an increasing concern owing to its high morbidity, and the average age of individuals affected by this disease has now decreased to the mid-twenties. Given the high prevalence, it is necessary to address this problem effectively. Many researchers and doctors have now developed detection techniques based on artificial intelligence to better approach problems that are missed due to human errors. Data mining techniques with algorithms such as density-based spatial clustering of applications with noise and ordering points to identify the cluster structure, the use of machine vision systems to learn data on facial images, gain better features for model training, and diagnosis via presentation of iridocyclitis for detection of the disease through iris patterns have been deployed by various practitioners. Machine learning classifiers such as support vector machines, logistic regression, and decision trees have been comparatively discussed by various authors. Deep learning models such as artificial neural networks and recurrent neural networks have been considered, with primary focus on long short-term memory and convolutional neural network architectures in comparison with other machine learning models. Various parameters such as the root-mean-square error, mean absolute errors, area under curves, and graphs with varying criteria are commonly used. In this study, challenges pertaining to data inadequacy and model deployment are discussed. The future scope of such methods has also been discussed, and new methods are expected to enhance the performance of existing models, allowing them to attain greater insight into the conditions on which the prevalence of the disease depends.
Affiliation(s)
- Toshita Sharma
  - Department of Electronics and Communication Technology, Nirma University, 382481, Ahmedabad, Gujarat, India
- Manan Shah
  - Department of Chemical Engineering, School of Technology, Pandit Deendayal Energy University, 382426, Gandhinagar, Gujarat, India.
|
48
|
Tao K, Li J, Li J, Shan W, Yan H, Lu Y. Estimation of Heart Rate Using Regression Models and Artificial Neural Network in Middle-Aged Adults. Front Physiol 2021; 12:742754. [PMID: 34658928 PMCID: PMC8514712 DOI: 10.3389/fphys.2021.742754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 09/07/2021] [Indexed: 11/13/2022] Open
Abstract
Purpose: Heart rate is the most commonly used indicator in clinical medicine to assess the functionality of the cardiovascular system. Most studies have focused on age-based equations to estimate the maximal heart rate, neglecting multiple factors that affect the accuracy of the prediction. Methods: We studied 121 middle-aged adults at an average age of 57.2 years with an average body mass index (BMI) of 25.9. The participants performed on a power bike with a starting wattage of 0 W that was increased by 25 W every 3 min until the experiment terminated. Ambulatory blood pressure and electrocardiography were monitored through gas metabolic analyzers for safety concerns. Six descriptive characteristics of participants were observed, which were further analyzed using a multivariate regression model and an artificial neural network (ANN). Results: The input variables for the multivariate regression model and ANN were selected by correlation for the reduction of dimension. The accuracy of estimation by multivariate regression model and ANN was 9.74 and 9.42%, respectively, which outperformed the traditional age-based model (with an accuracy of 10.31%). Conclusion: This study provides comprehensive approaches to estimate the maximal heart rate using multiple indicators, revealing that both the multivariate regression model and ANN incorporated with age, resting heart rate (RHR), and second-order heart rate (SOHR) are more accurate than univariate models.
Affiliation(s)
- Kuan Tao
  - School of Sports Engineering, Beijing Sport University, Beijing, China
- Jiahao Li
  - School of Sport Medicine and Physical Therapy, Beijing Sport University, Beijing, China
- Jiajin Li
  - School of Sport Medicine and Physical Therapy, Beijing Sport University, Beijing, China
- Wei Shan
  - China Institute of Sport and Health Science, Beijing Sport University, Beijing, China
- Huiping Yan
  - School of Sport Medicine and Physical Therapy, Beijing Sport University, Beijing, China
- Yifan Lu
  - School of Sport Medicine and Physical Therapy, Beijing Sport University, Beijing, China; Key Laboratory of Sports and Physical Fitness of the Ministry of Education, Beijing Sport University, Beijing, China
|
49
|
Luo X, Gandhi P, Zhang Z, Shao W, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Comput Methods Programs Biomed 2021; 210:106395. [PMID: 34525412 DOI: 10.1016/j.cmpb.2021.106395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 08/30/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Chronic cough (CC) affects approximately 10% of adults. Many disease states are associated with chronic cough, such as asthma, upper airway cough syndrome, bronchitis, and gastroesophageal reflux disease. The lack of an ICD code specific for chronic cough makes it challenging to identify such patients from electronic health records (EHRs). For clinical and research purposes, computational methods using EHR data are urgently needed to identify chronic cough cases. This research aims to investigate the data representations and deep learning algorithms for chronic cough prediction. METHODS Utilizing real-world EHR data from a large academic healthcare system from October 2005 to September 2015, we investigated Natural Language Representation of the EHR data and systematically evaluated deep learning and traditional machine learning models to predict chronic cough patients. We built these machine learning models using structured data (medication and diagnosis) and unstructured data (clinical notes). RESULTS The sensitivity and specificity of a transformer-based deep learning algorithm, specifically BERT with attention model, were 0.856 and 0.866, respectively, using structured data (medication and diagnosis). Sensitivity and specificity improved to 0.952 and 0.930 when we combined structured data with symptoms extracted from clinical notes. We further found that the attention mechanism of deep learning models can be used to extract important features that drive the prediction decisions. Compared with our previously published rule-based algorithm, the deep learning algorithm can identify more chronic cough patients with structured data. CONCLUSIONS By applying deep learning models, chronic cough patients can be reliably identified for prospective or retrospective research through medication and diagnosis data, widely available in EHR and electronic claims data, thus improving the generalizability of the patient identification algorithm. Deep learning models can identify chronic cough patients with even higher sensitivity and specificity when structured and unstructured EHR data are utilized. We anticipate language-based data representation and deep learning models developed in this research could also be productively used for other disease prediction and case identification.
Affiliation(s)
- Xiao Luo
  - Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
- Priyanka Gandhi
  - Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
- Zuoyi Zhang
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
- Wei Shao
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
- Zhi Han
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States.
- Vasu Chandrasekaran
  - Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
- Vladimir Turzhitsky
  - Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
- Vishal Bali
  - Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
- Anna R Roberts
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States.
- Megan Metzger
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States.
- Jarod Baker
  - Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States.
- Carmen La Rosa
  - Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
- Jessica Weaver
  - Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
- Paul Dexter
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States; Eskenazi Health, 720 Eskenazi Ave, Indianapolis, IN 46202, United States.
- Kun Huang
  - Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States.
|
50
|
Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021; 28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs. MATERIALS AND METHODS We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms. RESULTS Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations. DISCUSSION The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease. CONCLUSION Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.
Affiliation(s)
- Hossein Estiri
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Zachary H Strasser
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
- Shawn N Murphy
  - Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
|