1
Kryukov M, Moriarty KP, Villamea M, O'Dwyer I, Chow O, Dormont F, Hernandez R, Bar-Joseph Z, Rufino B. Proxy endpoints - bridging clinical trials and real world data. J Biomed Inform 2024; 158:104723. [PMID: 39299565 DOI: 10.1016/j.jbi.2024.104723] [Received: 06/20/2024] [Revised: 08/15/2024] [Accepted: 09/03/2024] [Indexed: 09/22/2024]
Abstract
OBJECTIVE: Disease severity scores, or endpoints, are routinely measured during Randomized Controlled Trials (RCTs) to closely monitor the effect of treatment. In real-world clinical practice, although a larger set of patients is observed, the specific RCT endpoints are often not captured, which makes it hard to utilize real-world data (RWD) to evaluate drug efficacy in larger populations. METHODS: To overcome this challenge, we developed an ensemble technique which learns proxy models of disease endpoints in RWD. Using a multi-stage learning framework applied to RCT data, we first identify features considered significant drivers of disease available within RWD. To create endpoint proxy models, we use Explainable Boosting Machines (EBMs) which allow for both end-user interpretability and modeling of non-linear relationships. RESULTS: We demonstrate our approach on two diseases, rheumatoid arthritis (RA) and atopic dermatitis (AD). As we show, our combined feature selection and prediction method achieves good results for both disease areas, improving upon prior methods proposed for predictive disease severity scoring. CONCLUSION: Having disease severity over time for a patient is important to further disease understanding and management. Our results open the door to more use cases in the space of RA and AD such as treatment effect estimates or prognostic scoring on RWD. Our framework may be extended beyond RA and AD to other diseases where the severity score is not well measured in electronic health records.
Affiliation(s)
- Maxim Kryukov
- Data & Computational Science, R&D, Sanofi, Barcelona, Spain.
- Kathleen P Moriarty
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
- Ingrid O'Dwyer
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
- Ohn Chow
- Clinical Immunology and Inflammation, R&D, Sanofi, 450 Water St, Cambridge, MA 02141, United States.
- Flavio Dormont
- Clinical Real World Evidence, R&D, Sanofi, 46 Av. de la Grande Armée, Paris, 75017, Île-de-France, France.
- Ramon Hernandez
- Clinical Real World Evidence, R&D, Sanofi, 46 Av. de la Grande Armée, Paris, 75017, Île-de-France, France.
- Ziv Bar-Joseph
- Data & Computational Science, R&D, Sanofi, 450 Water St, Cambridge, MA 02141, United States.
- Brandon Rufino
- Data & Computational Science, R&D, Sanofi, 240 Richmond Street West, 3rd Floor, Toronto, M5V 1V6, Ontario, Canada.
2
Zuo D, Yang L, Jin Y, Qi H, Liu Y, Ren L. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Med Inform Decis Mak 2023; 23:276. [PMID: 38031071 PMCID: PMC10688055 DOI: 10.1186/s12911-023-02377-z] [Received: 07/21/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023]
Abstract
Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. The results showed that the AdaBoost algorithm had the best performance of the eleven algorithms in predicting breast cancer recurrence, and it was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
Affiliation(s)
- Duo Zuo
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Lexin Yang
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Yu Jin
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- Tongji University Cancer Center, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Huan Qi
- China Mobile Group Tianjin Company Limited, Tianjin, 300308, China
- Yahui Liu
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Li Ren
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China.
- National Clinical Research Center for Cancer, Tianjin, 300060, China.
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China.
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China.
3
Wang DC, Xu WD, Qin Z, Fu L, Lan YY, Liu XY, Huang AF. Systemic lupus erythematosus with high disease activity identification based on machine learning. Inflamm Res 2023; 72:1909-1918. [PMID: 37725103 DOI: 10.1007/s00011-023-01793-1] [Received: 06/18/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/21/2023]
Abstract
OBJECTIVE: Clinical evaluation of systemic lupus erythematosus (SLE) disease activity is limited and inconsistent, and high disease activity seriously affects SLE patients. This study aims to build a machine learning model to identify SLE patients with high disease activity. METHOD: A total of 1014 SLE patients with low disease activity and 453 SLE patients with high disease activity were included. A total of 94 clinical and laboratory variables and 17 meteorological indicators were collected. After data preprocessing, mutual information and MultiSURF were used to evaluate feature importance and select features. The selected features were used for machine learning modeling. Model performance was evaluated and verified with a series of binary classification metrics. RESULTS: Integrated feature selection identified hematuria, proteinuria, pyuria, low complement, precipitation, sunlight and other features for model construction. After hyperparameter optimization, the LGB model performed best (ROC: AUC = 0.930; PRC: AUC = 0.911, APS = 0.913; balanced accuracy: 0.856), and naive Bayes performed worst (ROC: AUC = 0.849; PRC: AUC = 0.719, APS = 0.714; balanced accuracy: 0.705). Finally, the selected features showed good consistency in the composite feature importance bar plot. CONCLUSION: SLE patients with high disease activity can be identified with a simple machine learning pipeline, especially the LGB model based on proteinuria, hematuria, pyuria and other features screened out by collective feature selection.
Affiliation(s)
- Da-Cheng Wang
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- Wang-Dong Xu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China.
- Zhen Qin
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
- Lu Fu
- Laboratory Animal Center, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- You-Yu Lan
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
- Xiao-Yan Liu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
- An-Fang Huang
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China.