1
Bandyopadhyay A, Albashayreh A, Zeinali N, Fan W, Gilbertson-White S. Using real-world electronic health record data to predict the development of 12 cancer-related symptoms in the context of multimorbidity. JAMIA Open 2024; 7:ooae082. [PMID: 39282082 PMCID: PMC11397936 DOI: 10.1093/jamiaopen/ooae082] [Received: 06/07/2024] [Revised: 08/09/2024] [Accepted: 09/05/2024] [Indexed: 09/18/2024] [Open access]
Abstract
Objective: This study uses electronic health record (EHR) data to predict 12 common cancer symptoms, assessing the efficacy of machine learning (ML) models in identifying symptom influencers.
Materials and Methods: We analyzed EHR data of 8156 adults diagnosed with cancer who underwent cancer treatment from 2017 to 2020. Structured and unstructured EHR data were sourced from the Enterprise Data Warehouse for Research at the University of Iowa Hospital and Clinics. Several predictive models, including logistic regression, random forest (RF), and XGBoost, were employed to forecast symptom development. The performances of the models were evaluated by F1-score and area under the curve (AUC) on the testing set. The SHapley Additive exPlanations framework was used to interpret these models and identify the predictive risk factors associated with fatigue as an exemplar.
Results: The RF model exhibited superior performance with a macro average AUC of 0.755 and an F1-score of 0.729 in predicting a range of cancer-related symptoms. For instance, the RF model achieved an AUC of 0.954 and an F1-score of 0.914 for pain prediction. Key predictive factors identified included clinical history, cancer characteristics, treatment modalities, and patient demographics depending on the symptom. For example, the odds ratio (OR) for fatigue was significantly influenced by allergy (OR = 2.3, 95% CI: 1.8-2.9) and colitis (OR = 1.9, 95% CI: 1.5-2.4).
Discussion: Our research emphasizes the critical integration of multimorbidity and patient characteristics in modeling cancer symptoms, revealing the considerable influence of chronic conditions beyond cancer itself.
Conclusion: We highlight the potential of ML for predicting cancer symptoms, suggesting a pathway for integrating such models into clinical systems to enhance personalized care and symptom management.
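The abstract reports per-symptom and macro-averaged AUC and F1 values. As a minimal sketch of how such figures can be computed, the snippet below evaluates two toy symptoms and averages the per-symptom scores. The data, the 0.5 decision threshold, and the reading of "macro average" as an unweighted mean over symptoms are assumptions for illustration, not details taken from the paper.

```python
from typing import Dict, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(per_symptom: Dict[str, float]) -> float:
    """Unweighted mean of one score per symptom."""
    return sum(per_symptom.values()) / len(per_symptom)


# Toy labels and predicted probabilities (hypothetical, not study data).
toy = {
    "pain": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "fatigue": ([1, 0, 1, 0], [0.6, 0.7, 0.4, 0.2]),
}
aucs = {name: auc(y, s) for name, (y, s) in toy.items()}
f1s = {name: f1_score(y, [1 if v >= 0.5 else 0 for v in s]) for name, (y, s) in toy.items()}
macro_auc = macro_average(aucs)
macro_f1 = macro_average(f1s)
print(macro_auc, macro_f1)
```

A macro average treats every symptom equally regardless of prevalence, which is why a very strong single-symptom score (such as the one reported for pain) can coexist with a lower overall average.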
Affiliation(s)
- Anindita Bandyopadhyay
  - Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States
- Alaa Albashayreh
  - College of Nursing, University of Iowa, Iowa City, IA 52242, United States
- Nahid Zeinali
  - Department of Informatics, University of Iowa, Iowa City, IA 52242, United States
- Weiguo Fan
  - Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States
2
Lam HYI, Ong XE, Mutwil M. Large language models in plant biology. Trends in Plant Science 2024; 29:1145-1155. [PMID: 38797656 DOI: 10.1016/j.tplants.2024.04.013] [Received: 01/30/2024] [Revised: 04/29/2024] [Accepted: 04/30/2024] [Indexed: 05/29/2024]
Abstract
Large language models (LLMs), such as ChatGPT, have taken the world by storm. However, LLMs are not limited to human language and can be used to analyze sequential data, such as DNA, protein, and gene expression. The resulting foundation models can be repurposed to identify the complex patterns within the data, resulting in powerful, multipurpose prediction tools able to predict the state of cellular systems. This review outlines the different types of LLMs and showcases their recent uses in biology. Since LLMs have not yet been embraced by the plant community, we also cover how these models can be deployed for the plant kingdom.
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Xing Er Ong
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
3
Kamis A, Gadia N, Luo Z, Ng SX, Thumbar M. Obtaining the Most Accurate, Explainable Model for Predicting Chronic Obstructive Pulmonary Disease: Triangulation of Multiple Linear Regression and Machine Learning Methods. JMIR AI 2024; 3:e58455. [PMID: 39207843 PMCID: PMC11393512 DOI: 10.2196/58455] [Received: 03/15/2024] [Revised: 07/09/2024] [Accepted: 07/10/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND: Lung disease is a severe problem in the United States. Despite the decreasing rates of cigarette smoking, chronic obstructive pulmonary disease (COPD) continues to be a health burden in the United States. In this paper, we focus on COPD in the United States from 2016 to 2019.
OBJECTIVE: We gathered a diverse set of non-personally identifiable information from public data sources to better understand and predict COPD rates at the core-based statistical area (CBSA) level in the United States. Our objective was to compare linear models with machine learning models to obtain the most accurate and interpretable model of COPD.
METHODS: We integrated non-personally identifiable information from multiple Centers for Disease Control and Prevention sources and used them to analyze COPD with different types of methods. We included cigarette smoking, a well-known contributing factor, and race/ethnicity because health disparities among different races and ethnicities in the United States are also well known. The models also included the air quality index, education, employment, and economic variables. We fitted models with both multiple linear regression and machine learning methods.
RESULTS: The most accurate multiple linear regression model has variance explained of 81.1%, mean absolute error of 0.591, and symmetric mean absolute percentage error of 9.666. The most accurate machine learning model has variance explained of 85.7%, mean absolute error of 0.456, and symmetric mean absolute percentage error of 6.956. Overall, cigarette smoking and household income are the strongest predictor variables. Moderately strong predictors include education level and unemployment level, as well as American Indian or Alaska Native, Black, and Hispanic population percentages, all measured at the CBSA level.
CONCLUSIONS: This research highlights the importance of using diverse data sources as well as multiple methods to understand and predict COPD. The most accurate model was a gradient boosted tree, which captured nonlinearities in a model whose accuracy is superior to the best multiple linear regression. Our interpretable models suggest ways that individual predictor variables can be used in tailored interventions aimed at decreasing COPD rates in specific demographic and ethnographic communities. Gaps in understanding the health impacts of poor air quality, particularly in relation to climate change, suggest a need for further research to design interventions and improve public health.
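The results above are reported as mean absolute error (MAE) and symmetric mean absolute percentage error (sMAPE). The sketch below shows one common way to compute both. Several sMAPE definitions are in use; the version here, which divides each absolute error by the mean of the absolute actual and predicted values and reports a percentage, is an assumption and may differ from the one the authors used. The sample values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE in percent: mean of |a - p| / ((|a| + |p|) / 2) * 100.

    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom:
            total += abs(a - p) / denom
    return 100 * total / len(actual)


# Hypothetical COPD rates (percent) for three areas.
actual_rates = [5.0, 6.0, 8.0]
predicted_rates = [5.5, 6.0, 7.0]
mae_value = mean_absolute_error(actual_rates, predicted_rates)
smape_value = smape(actual_rates, predicted_rates)
print(mae_value, smape_value)
```

MAE is expressed in the units of the outcome, while sMAPE is scale-free, which is why the two are often reported together when comparing models.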
Affiliation(s)
- Arnold Kamis
  - Brandeis International Business School, Brandeis University, Waltham, MA, United States
- Nidhi Gadia
  - Brandeis International Business School, Brandeis University, Waltham, MA, United States
- Zilin Luo
  - Brandeis International Business School, Brandeis University, Waltham, MA, United States
- Shu Xin Ng
  - Brandeis International Business School, Brandeis University, Waltham, MA, United States
- Mansi Thumbar
  - Brandeis International Business School, Brandeis University, Waltham, MA, United States
4
Shao Y, Huang B, Du L, Wang P, Li Z, Liu Z, Zhou L, Song Y, Chen X, Fang Z. Reliable automatic sleep stage classification based on hybrid intelligence. Comput Biol Med 2024; 173:108314. [PMID: 38513392 DOI: 10.1016/j.compbiomed.2024.108314] [Received: 08/30/2023] [Revised: 02/10/2024] [Accepted: 03/12/2024] [Indexed: 03/23/2024]
Abstract
Sleep staging is a vital aspect of sleep assessment, serving as a critical tool for evaluating the quality of sleep and identifying sleep disorders. Manual sleep staging is a laborious process, while automatic sleep staging is seldom utilized in clinical practice due to issues related to the inadequate accuracy and interpretability of classification results in automatic sleep staging models. In this work, a hybrid intelligent model is presented for automatic sleep staging, which integrates data intelligence and knowledge intelligence, to attain a balance between accuracy, interpretability, and generalizability in the sleep stage classification. Specifically, it is built on any combination of typical electroencephalography (EEG) and electrooculography (EOG) channels, including a temporal fully convolutional network based on the U-Net architecture and a multi-task feature mapping structure. The experimental results show that, compared to current interpretable automatic sleep staging models, our model achieves a Macro-F1 score of 0.804 on the ISRUC dataset and 0.780 on the Sleep-EDFx dataset. Moreover, we use knowledge intelligence to address issues of excessive jumps and unreasonable sleep stage transitions in the coarse sleep graphs obtained by the model. We also explore the different ways knowledge intelligence affects coarse sleep graphs by combining different sleep graph correction methods. Our research can offer convenient support for sleep physicians, indicating its significant potential in improving the efficiency of clinical sleep staging.
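The abstract mentions correcting "excessive jumps" in the coarse sleep graph but does not specify the rules. Purely as an illustration of the idea, the sketch below applies one hypothetical rule: an epoch whose two neighbours agree with each other but not with it is relabelled to match them. The rule, the stage labels, and the function names are assumptions, not the authors' method.

```python
from typing import List


def smooth_isolated_epochs(stages: List[str]) -> List[str]:
    """Relabel single epochs that differ from two identical neighbours.

    Neighbours are read from the original sequence so that one correction
    never cascades into the next.
    """
    smoothed = list(stages)
    for i in range(1, len(stages) - 1):
        prev_stage, next_stage = stages[i - 1], stages[i + 1]
        if prev_stage == next_stage and stages[i] != prev_stage:
            smoothed[i] = prev_stage
    return smoothed


def count_transitions(stages: List[str]) -> int:
    """Number of epoch-to-epoch stage changes in a hypnogram."""
    return sum(1 for a, b in zip(stages, stages[1:]) if a != b)


# Hypothetical coarse hypnogram, one label per epoch.
coarse = ["W", "N1", "W", "N2", "N2", "REM", "N2", "N2"]
corrected = smooth_isolated_epochs(coarse)
print(corrected, count_transitions(coarse), count_transitions(corrected))
```

A rule this blunt would also erase genuine brief arousals, which is one reason a real system would need domain knowledge about which transitions are physiologically plausible.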
Affiliation(s)
- Yizi Shao
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Bokai Huang
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Lidong Du
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, Beijing, China
- Peng Wang
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, Beijing, China
- Zhenfeng Li
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, Beijing, China
- Zhe Liu
  - Hunan VentMed Medical Technology Co., Ltd, Shaoyang, China
- Lei Zhou
  - Qingpu Branch of Zhongshan Hospital, Fudan University, Shanghai, China
- Yuanlin Song
  - Zhongshan Hospital Fudan University, Shanghai, China
- Xianxiang Chen
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, Beijing, China
- Zhen Fang
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, Beijing, China
5
Lin E, Lin CH, Lane HY. Inference of social cognition in schizophrenia patients with neurocognitive domains and neurocognitive tests using automated machine learning. Asian J Psychiatr 2024; 91:103866. [PMID: 38128351 DOI: 10.1016/j.ajp.2023.103866] [Received: 08/11/2023] [Revised: 12/07/2023] [Accepted: 12/09/2023] [Indexed: 12/23/2023]
Abstract
AIM: It has been suggested that a single neurocognitive domain or neurocognitive test can be used to determine the overall cognitive function in schizophrenia using machine learning algorithms. It is unknown whether social cognition in schizophrenia patients can be estimated with machine learning based on neurocognitive domains or neurocognitive tests.
METHODS: To predict social cognition in schizophrenia, we applied an automated machine learning (AutoML) framework resulting from the analysis of predictive factors such as six neurocognitive domain scores and nine neurocognitive test scores of 380 schizophrenia patients in the Taiwanese population. Four clinical parameters (i.e., age, gender, subgroup, and education) were also used as predictive factors. We utilized an AutoML framework called Tree-based Pipeline Optimization Tool (TPOT) to generate predictive pipelines automatically.
RESULTS: The analysis revealed that all neurocognitive domains and tests except the reasoning and problem solving domain/test showed significant associations with social cognition. In addition, a TPOT-generated pipeline can best predict social cognition in schizophrenia using seven predictive factors, including five neurocognitive domains (i.e., speed of processing, sustained attention, working memory, verbal learning and memory, and visual learning and memory) and two clinical parameters (i.e., age and gender). This predictive pipeline consists of machine learning algorithms such as function transformers, an approximate feature map, independent component analysis, and linear regression.
CONCLUSION: The study indicates that an AutoML framework such as TPOT may provide a promising way to produce truly effective machine learning pipelines for predicting social cognition in schizophrenia using neurocognitive domains and/or neurocognitive tests.
Affiliation(s)
- Eugene Lin
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA; Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan
- Chieh-Hsin Lin
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan; Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung, Taiwan; School of Medicine, Chang Gung University, Taoyuan, Taiwan
- Hsien-Yuan Lane
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan; Department of Psychiatry, China Medical University Hospital, Taichung, Taiwan; Brain Disease Research Center, China Medical University Hospital, Taichung, Taiwan; Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung, Taiwan
6
Chulián S, Stolz BJ, Martínez-Rubio Á, Blázquez Goñi C, Rodríguez Gutiérrez JF, Caballero Velázquez T, Molinos Quintana Á, Ramírez Orellana M, Castillo Robleda A, Fuster Soler JL, Minguela Puras A, Martínez Sánchez MV, Rosa M, Pérez-García VM, Byrne HM. The shape of cancer relapse: Topological data analysis predicts recurrence in paediatric acute lymphoblastic leukaemia. PLoS Comput Biol 2023; 19:e1011329. [PMID: 37578973 PMCID: PMC10468039 DOI: 10.1371/journal.pcbi.1011329] [Received: 09/29/2022] [Revised: 08/30/2023] [Accepted: 07/05/2023] [Indexed: 08/16/2023] [Open access]
Abstract
Although children and adolescents with acute lymphoblastic leukaemia (ALL) have high survival rates, approximately 15-20% of patients relapse. Risk of relapse is routinely estimated at diagnosis by biological factors, including flow cytometry data. This high-dimensional data is typically manually assessed by projecting it onto a subset of biomarkers. Cell density and "empty spaces" in 2D projections of the data, i.e. regions devoid of cells, are then used for qualitative assessment. Here, we use topological data analysis (TDA), which quantifies shapes, including empty spaces, in data, to analyse pre-treatment ALL datasets with known patient outcomes. We combine these fully unsupervised analyses with Machine Learning (ML) to identify significant shape characteristics and demonstrate that they accurately predict risk of relapse, particularly for patients previously classified as 'low risk'. We independently confirm the predictive power of CD10, CD20, CD38, and CD45 as biomarkers for ALL diagnosis. Based on our analyses, we propose three increasingly detailed prognostic pipelines for analysing flow cytometry data from ALL patients depending on technical and technological availability: 1. Visual inspection of specific biological features in biparametric projections of the data; 2. Computation of quantitative topological descriptors of such projections; 3. A combined analysis, using TDA and ML, in the four-parameter space defined by CD10, CD20, CD38 and CD45. Our analyses readily extend to other haematological malignancies.
Affiliation(s)
- Salvador Chulián
  - Department of Mathematics, Universidad de Cádiz, Puerto Real (Cádiz), Spain
  - Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
  - Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Bernadette J. Stolz
  - Mathematical Institute, University of Oxford, Oxford, United Kingdom
  - Laboratory for Topology and Neuroscience, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Álvaro Martínez-Rubio
  - Department of Mathematics, Universidad de Cádiz, Puerto Real (Cádiz), Spain
  - Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
  - Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Cristina Blázquez Goñi
  - Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
  - Department of Paediatric Haematology and Oncology, Hospital Universitario de Jerez, Jerez de la Frontera (Cádiz), Spain
  - Department of Haematology, Hospital Universitario Vírgen del Rocío, Instituto de Biomedicina de Sevilla (IBIS), Sevilla, Spain
- Juan F. Rodríguez Gutiérrez
  - Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
  - Department of Paediatric Haematology and Oncology, Hospital Universitario de Jerez, Jerez de la Frontera (Cádiz), Spain
- Teresa Caballero Velázquez
  - Department of Haematology, Hospital Universitario Vírgen del Rocío, Instituto de Biomedicina de Sevilla (IBIS), Sevilla, Spain
  - CSIC, University of Sevilla, Sevilla, Spain
- Águeda Molinos Quintana
  - Department of Haematology, Hospital Universitario Vírgen del Rocío, Instituto de Biomedicina de Sevilla (IBIS), Sevilla, Spain
  - CSIC, University of Sevilla, Sevilla, Spain
- Manuel Ramírez Orellana
  - Department of Paediatric Haematology and Oncology, Hospital Infantil Universitario Niño Jesús - Instituto Investigación Sanitaria La Princesa, Madrid, Spain
- Ana Castillo Robleda
  - Department of Paediatric Haematology and Oncology, Hospital Infantil Universitario Niño Jesús - Instituto Investigación Sanitaria La Princesa, Madrid, Spain
- José Luis Fuster Soler
  - Department of Paediatric Haematology and Oncology, Hospital Clínico Universitario Virgen de la Arrixaca - Instituto Murciano de Investigación Biosanitaria (IMIB), Murcia, Spain
- Alfredo Minguela Puras
  - Immunology Service, Hospital Clínico Universitario Virgen de la Arrixaca - Instituto Murciano de Investigación Biosanitaria (IMIB), Murcia, Spain
- María V. Martínez Sánchez
  - Immunology Service, Hospital Clínico Universitario Virgen de la Arrixaca - Instituto Murciano de Investigación Biosanitaria (IMIB), Murcia, Spain
- María Rosa
  - Department of Mathematics, Universidad de Cádiz, Puerto Real (Cádiz), Spain
  - Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
  - Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Víctor M. Pérez-García
  - Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Ciudad Real, Spain
  - Instituto de Matemática Aplicada a la Ciencia y la Ingeniería (IMACI), Universidad de Castilla-La Mancha, Ciudad Real, Spain
  - ETSI Industriales, Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Helen M. Byrne
  - Mathematical Institute, University of Oxford, Oxford, United Kingdom
7
Ouifak H, Idri A. On the performance and interpretability of Mamdani and Takagi-Sugeno-Kang based neuro-fuzzy systems for medical diagnosis. Scientific African 2023. [DOI: 10.1016/j.sciaf.2023.e01610] [Indexed: 03/04/2023] [Open access]
8
Bull JA, Byrne HM. Quantification of spatial and phenotypic heterogeneity in an agent-based model of tumour-macrophage interactions. PLoS Comput Biol 2023; 19:e1010994. [PMID: 36972297 PMCID: PMC10079237 DOI: 10.1371/journal.pcbi.1010994] [Received: 09/14/2022] [Revised: 04/06/2023] [Accepted: 03/04/2023] [Indexed: 03/29/2023] [Open access]
Abstract
We introduce a new spatial statistic, the weighted pair correlation function (wPCF). The wPCF extends the existing pair correlation function (PCF) and cross-PCF to describe spatial relationships between points marked with combinations of discrete and continuous labels. We validate its use through application to a new agent-based model (ABM) which simulates interactions between macrophages and tumour cells. These interactions are influenced by the spatial positions of the cells and by macrophage phenotype, a continuous variable that ranges from anti-tumour to pro-tumour. By varying model parameters that regulate macrophage phenotype, we show that the ABM exhibits behaviours which resemble the 'three Es of cancer immunoediting': Equilibrium, Escape, and Elimination. We use the wPCF to analyse synthetic images generated by the ABM. We show that the wPCF generates a 'human readable' statistical summary of where macrophages with different phenotypes are located relative to both blood vessels and tumour cells. We also define a distinct 'PCF signature' that characterises each of the three Es of immunoediting, by combining wPCF measurements with the cross-PCF describing interactions between vessels and tumour cells. By applying dimension reduction techniques to this signature, we identify its key features and train a support vector machine classifier to distinguish between simulation outputs based on their PCF signature. This proof-of-concept study shows how multiple spatial statistics can be combined to analyse the complex spatial features that the ABM generates, and to partition them into interpretable groups. The intricate spatial features produced by the ABM are similar to those generated by state-of-the-art multiplex imaging techniques which distinguish the spatial distribution and intensity of multiple biomarkers in biological tissue regions. 
Applying methods such as the wPCF to multiplex imaging data would exploit the continuous variation in biomarker intensities and generate more detailed characterisation of the spatial and phenotypic heterogeneity in tissue samples.
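To make the idea of a cross-PCF and its weighted extension more concrete, the sketch below counts points of one type in an annulus around points of another type and normalises by the count expected under complete spatial randomness. The weighted variant replaces each count with a kernel weight on a continuous mark. This is a simplified illustration only: it ignores boundary corrections, and the triangular kernel and the normalisation by total weight are assumptions rather than the exact definition given in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _in_annulus(a: Point, b: Point, r: float, dr: float) -> bool:
    """True when the distance from a to b lies in [r, r + dr)."""
    return r <= math.hypot(a[0] - b[0], a[1] - b[1]) < r + dr


def cross_pcf(points_a: Sequence[Point], points_b: Sequence[Point],
              r: float, dr: float, area: float) -> float:
    """Observed B-counts around A points divided by the CSR expectation."""
    annulus_area = math.pi * ((r + dr) ** 2 - r ** 2)
    expected = len(points_a) * (len(points_b) / area) * annulus_area
    observed = sum(1 for a in points_a for b in points_b if _in_annulus(a, b, r, dr))
    return observed / expected


def triangular_weight(target: float, mark: float, width: float) -> float:
    """Assumed kernel: 1 at mark == target, falling linearly to 0 at distance width."""
    return max(0.0, 1.0 - abs(target - mark) / width)


def weighted_pcf(points_a: Sequence[Point], points_b: Sequence[Point],
                 marks_b: Sequence[float], target: float, width: float,
                 r: float, dr: float, area: float) -> float:
    """Cross-PCF in which each B point counts in proportion to its kernel weight."""
    weights: List[float] = [triangular_weight(target, m, width) for m in marks_b]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    annulus_area = math.pi * ((r + dr) ** 2 - r ** 2)
    expected = len(points_a) * (total_weight / area) * annulus_area
    observed = sum(w for a in points_a for b, w in zip(points_b, weights)
                   if _in_annulus(a, b, r, dr))
    return observed / expected


# Hypothetical layout: one vessel, two macrophages with phenotypes 0.0 and 1.0.
vessels = [(5.0, 5.0)]
macrophages = [(6.0, 5.0), (5.0, 7.0)]
phenotypes = [0.0, 1.0]
g_cross = cross_pcf(vessels, macrophages, r=0.5, dr=1.0, area=100.0)
g_weighted = weighted_pcf(vessels, macrophages, phenotypes, target=0.0, width=0.5,
                          r=0.5, dr=1.0, area=100.0)
print(g_cross, g_weighted)
```

Values above 1 indicate more neighbours at that distance than a random arrangement would give; sweeping `r` and `target` produces the kind of two-dimensional summary the abstract calls "human readable".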
Affiliation(s)
- Joshua A. Bull
  - Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Helen M. Byrne
  - Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
  - Ludwig Institute for Cancer Research, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
9
Lu SC, Swisher CL, Chung C, Jaffray D, Sidey-Gibbons C. On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol 2023; 13:1129380. [PMID: 36925929 PMCID: PMC10013157 DOI: 10.3389/fonc.2023.1129380] [Received: 12/21/2022] [Accepted: 02/14/2023] [Indexed: 03/04/2023] [Open access]
Abstract
Machine learning-based tools are capable of guiding individualized clinical management and decision-making by providing predictions of a patient's future health state. Through their ability to model complex nonlinear relationships, ML algorithms can often outperform traditional statistical prediction approaches, but the use of nonlinear functions can mean that ML techniques may also be less interpretable than traditional statistical methodologies. While there are benefits of intrinsic interpretability, many model-agnostic approaches now exist and can provide insight into the way in which ML systems make decisions. In this paper, we describe how different algorithms can be interpreted and introduce some techniques for interpreting complex nonlinear algorithms.
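One widely used model-agnostic approach of the kind the abstract alludes to is permutation importance: shuffle one feature at a time and measure how much a performance metric drops. The sketch below is a minimal illustration under stated assumptions; the toy model, data, accuracy metric, and function names are hypothetical and are not drawn from the paper.

```python
import random
from typing import Callable, List, Sequence


def accuracy(model: Callable[[Sequence[float]], int],
             rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, y in zip(rows, labels) if model(row) == y) / len(rows)


def permutation_importance(model: Callable[[Sequence[float]], int],
                           rows: List[List[float]], labels: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Sequence[float]) -> int:
    """Hypothetical black box that only looks at the first feature."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 2.0], [0.7, 9.0]]
labels = [toy_model(row) for row in rows]
importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Because the toy model ignores the second feature, shuffling it never changes a prediction and its importance is exactly zero, whereas shuffling the first feature can only lower accuracy.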
Affiliation(s)
- Sheng-Chieh Lu
  - Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Christine L Swisher
  - The Ronin Project, San Mateo, CA, United States
  - The Lawrence J. Ellison Institute for Transformative Medicine, Los Angeles, CA, United States
- Caroline Chung
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- David Jaffray
  - Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Chris Sidey-Gibbons
  - Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
10
Lenatti M, Carlevaro A, Guergachi A, Keshavjee K, Mongelli M, Paglialonga A. A novel method to derive personalized minimum viable recommendations for type 2 diabetes prevention based on counterfactual explanations. PLoS One 2022; 17:e0272825. [PMCID: PMC9671330 DOI: 10.1371/journal.pone.0272825] [Received: 07/26/2022] [Accepted: 11/02/2022] [Indexed: 11/19/2022] [Open access]
Abstract
Despite the growing availability of artificial intelligence models for predicting type 2 diabetes, there is still a lack of personalized approaches to quantify minimum viable changes in biomarkers that may help reduce the individual risk of developing disease. The aim of this article is to develop a new method, based on counterfactual explanations, to generate personalized recommendations to reduce the one-year risk of type 2 diabetes. Ten routinely collected biomarkers extracted from Electronic Medical Records of 2791 patients at low risk and 2791 patients at high risk of type 2 diabetes were analyzed. Two regions characterizing the two classes of patients were estimated using a Support Vector Data Description classifier. Counterfactual explanations (i.e., minimal changes in input features able to change the risk class) were generated for patients at high risk and evaluated using performance metrics (availability, validity, actionability, similarity, and discriminative power) and a qualitative survey administered to seven expert clinicians. Results showed that, on average, the requested minimum viable changes implied a significant reduction of fasting blood sugar, systolic blood pressure, and triglycerides and a significant increase of high-density lipoprotein in patients at risk of diabetes. A significant reduction in body mass index was also recommended in most of the patients at risk, except in females without hypertension. In general, greater changes were recommended in hypertensive patients compared to non-hypertensive ones. The experts were overall satisfied with the proposed approach although in some cases the proposed recommendations were deemed insufficient to reduce the risk in a clinically meaningful way. Future research will focus on a larger set of biomarkers and different comorbidities, also incorporating clinical guidelines whenever possible. 
Development of additional mathematical and clinical validation approaches will also be of paramount importance.
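The abstract defines a counterfactual explanation as a minimal change in input features that flips the risk class. The sketch below illustrates that idea for the simplest possible case, a linear risk score, where the smallest Euclidean change has a closed form. This deliberately swaps in a linear classifier for the Support Vector Data Description model used in the paper, and the weights, features, and margin are hypothetical.

```python
from typing import List, Sequence


def linear_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Risk score w.x + b; a positive score is treated as 'high risk'."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def minimal_counterfactual(weights: Sequence[float], bias: float,
                           x: Sequence[float], margin: float = 0.1) -> List[float]:
    """Smallest L2 change to x that moves the score to -margin.

    For a linear score the shortest path to the boundary runs along the
    weight vector, so x' = x - ((score + margin) / ||w||^2) * w.
    Inputs already classified as low risk are returned unchanged.
    """
    score = linear_score(weights, bias, x)
    if score <= 0:
        return list(x)
    norm_sq = sum(w * w for w in weights)
    if norm_sq == 0:
        raise ValueError("a zero weight vector cannot change the class")
    step = (score + margin) / norm_sq
    return [v - step * w for v, w in zip(x, weights)]


# Hypothetical standardised biomarkers and weights.
weights = [1.0, 2.0]
bias = -1.0
patient = [2.0, 1.0]
counterfactual = minimal_counterfactual(weights, bias, patient)
new_score = linear_score(weights, bias, counterfactual)
print(counterfactual, new_score)
```

The closed form treats every feature as freely adjustable. The actionability metric mentioned in the abstract reflects the extra constraint that recommendations should only involve biomarkers a patient can realistically change.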
Affiliation(s)
- Marta Lenatti
  - Institute of Electronics, Information Engineering and Telecommunications (IEIIT), National Research Council of Italy (CNR), Rome, Italy
- Alberto Carlevaro
  - Institute of Electronics, Information Engineering and Telecommunications (IEIIT), National Research Council of Italy (CNR), Rome, Italy
  - Department of Electrical, Electronics and Telecommunications Engineering and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Aziz Guergachi
  - Ted Rogers School of Management, Toronto Metropolitan University, Toronto, Canada
  - Ted Rogers School of Information Technology Management, Toronto Metropolitan University, Toronto, Canada
  - Department of Mathematics and Statistics, York University, Toronto, Canada
- Karim Keshavjee
  - Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Maurizio Mongelli
  - Institute of Electronics, Information Engineering and Telecommunications (IEIIT), National Research Council of Italy (CNR), Rome, Italy
- Alessia Paglialonga
  - Institute of Electronics, Information Engineering and Telecommunications (IEIIT), National Research Council of Italy (CNR), Rome, Italy