1
Sun B, Yew PY, Chi CL, Song M, Loth M, Zhang R, Straka RJ. Development and application of pharmacological statin-associated muscle symptoms phenotyping algorithms using structured and unstructured electronic health records data. JAMIA Open 2023; 6:ooad087. [PMID: 37881784 PMCID: PMC10597587 DOI: 10.1093/jamiaopen/ooad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 08/03/2023] [Accepted: 10/03/2023] [Indexed: 10/27/2023]
Abstract
Importance: Statins are widely prescribed cholesterol-lowering medications in the United States, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), which can lead to discontinuation.
Objectives: In this study, we aimed to develop and validate a pharmacological SAMS clinical phenotyping algorithm using electronic health record (EHR) data from Minnesota Fairview.
Materials and Methods: We retrieved structured and unstructured EHR data of statin users and manually ascertained a gold standard set of SAMS cases and controls in 200 patients by applying the published SAMS-Clinical Index tool to clinical notes. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We applied the best-performing algorithm to the statin cohort to identify SAMS.
Results: We identified 16 889 patients who started statins in the Fairview EHR system from 2010 to 2020. The combined rule-based (CRB) algorithm, which used both clinical notes and structured data criteria, performed similarly to the machine learning algorithms, with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we estimated the prevalence of pharmacological SAMS at 1.9% and identified selected risk factors, including female gender, coronary artery disease, hypothyroidism, and use of immunosuppressants or fibrates.
Discussion and Conclusion: Our study developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create a SAMS case/control cohort for further analysis, which could lead to the development of a SAMS risk prediction model.
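The reported F1 score follows from the reported precision and recall as their harmonic mean: F1 = 2PR/(P + R) = 2(0.85)(0.71)/(0.85 + 0.71), which is about 0.77. The sketch below checks that arithmetic and shows how algorithm flags could be scored against a manually ascertained gold standard. The `PatientFeatures` fields, the OR-combination in `combined_rule_flag`, and all function names are hypothetical illustrations, not the published CRB rule definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PatientFeatures:
    # Hypothetical per-patient flags; the study's exact criteria are not reproduced here.
    sams_icd_code: bool      # an assumed SAMS-related ICD code is present
    statin_allergy: bool     # statin recorded as an allergy/intolerance
    ck_elevated: bool        # creatine kinase above an assumed threshold
    note_keyword_hit: bool   # SAMS keyword mention found in clinical notes


def combined_rule_flag(p: PatientFeatures) -> bool:
    """Assumed OR-combination of structured criteria and a clinical-note criterion."""
    structured_hit = p.sams_icd_code or p.statin_allergy or p.ck_elevated
    return structured_hit or p.note_keyword_hit


def precision_recall_f1(predicted: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Score algorithm flags against gold-standard case (True) / control (False) labels."""
    tp = sum(1 for y_hat, y in zip(predicted, gold) if y_hat and y)
    fp = sum(1 for y_hat, y in zip(predicted, gold) if y_hat and not y)
    fn = sum(1 for y_hat, y in zip(predicted, gold) if not y_hat and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported CRB values: precision 0.85, recall 0.71.
print(round(f1_from_precision_recall(0.85, 0.71), 2))  # 0.77
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs (recall, here) than an arithmetic mean would.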
Affiliation(s)
- Boguang Sun
  - Department of Experimental and Clinical Pharmacology, University of Minnesota College of Pharmacy, Minneapolis, MN 55455, United States
- Pui Ying Yew
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
- Chih-Lin Chi
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
  - School of Nursing, University of Minnesota, Minneapolis, MN 55455, United States
- Meijia Song
  - School of Nursing, University of Minnesota, Minneapolis, MN 55455, United States
- Matt Loth
  - Center for Learning Health System Sciences, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Rui Zhang
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
  - Center for Learning Health System Sciences, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Robert J Straka
  - Department of Experimental and Clinical Pharmacology, University of Minnesota College of Pharmacy, Minneapolis, MN 55455, United States
2
Peng L, Luo G, Walker A, Zaiman Z, Jones EK, Gupta H, Kersten K, Burns JL, Harle CA, Magoc T, Shickel B, Steenburg SD, Loftus T, Melton GB, Gichoya JW, Sun J, Tignanelli CJ. Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals. J Am Med Inform Assoc 2022; 30:54-63. [PMID: 36214629 PMCID: PMC9619688 DOI: 10.1093/jamia/ocac188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/25/2022] [Revised: 08/31/2022] [Accepted: 10/07/2022] [Indexed: 12/31/2022]
Abstract
OBJECTIVE: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without sharing data. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 2019 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations.
MATERIALS AND METHODS: We leverage an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data were pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model with FedAvg and 3 personalized FL variations (FedProx, FedBN, and FedAMP).
RESULTS: We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation compared with personalized FL algorithms. FedAvg did, however, have improved generalizability compared with personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation.
CONCLUSION: FedAvg can significantly improve the generalization of the model compared with personalized FL algorithms, but at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internally and externally validated algorithms.
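FedAvg, the baseline federated algorithm named above, forms the shared model as a sample-size-weighted average of client parameters, w = sum over clients k of (n_k / n) w_k. A minimal sketch in plain Python follows, assuming each client's model is a dictionary of flat parameter vectors. The `skip_keys` hook, which leaves selected entries (here, hypothetical batch-normalization keys such as `bn.mean`) out of the average so each client keeps its own copy, approximates the FedBN idea. This is an illustration only, not the Clara Train SDK implementation used in the study.

```python
from typing import Callable, Dict, List

# Assumed representation: {parameter name: flat list of values}.
Params = Dict[str, List[float]]


def fedavg(
    client_params: List[Params],
    client_sizes: List[int],
    skip_keys: Callable[[str], bool] = lambda key: False,
) -> Params:
    """Sample-size-weighted average of client parameters: w = sum_k (n_k / n) * w_k.

    Keys for which skip_keys(key) is True are left out of the shared model,
    so each client keeps its own copy (FedBN-style handling of batch-norm state).
    """
    total = sum(client_sizes)
    shared: Params = {}
    for key in client_params[0]:
        if skip_keys(key):
            continue
        length = len(client_params[0][key])
        shared[key] = [
            sum(n / total * params[key][i] for params, n in zip(client_params, client_sizes))
            for i in range(length)
        ]
    return shared


def personalize(shared: Params, local: Params) -> Params:
    """Start from a client's local parameters and overwrite every aggregated key."""
    merged = dict(local)
    merged.update(shared)
    return merged


clients = [
    {"conv.weight": [1.0, 2.0], "bn.mean": [0.0]},   # client A, 1 training sample
    {"conv.weight": [3.0, 4.0], "bn.mean": [10.0]},  # client B, 3 training samples
]
print(fedavg(clients, [1, 3]))
# {'conv.weight': [2.5, 3.5], 'bn.mean': [7.5]}
print(personalize(fedavg(clients, [1, 3], skip_keys=lambda k: k.startswith("bn.")), clients[1]))
# {'conv.weight': [2.5, 3.5], 'bn.mean': [10.0]}
```

In plain FedAvg every client receives the same averaged batch-norm statistics; in the FedBN-style variant client B keeps its own `bn.mean`, which is one way personalization can trade some generalizability for a better fit to local data.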
Affiliation(s)
- Le Peng
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Gaoxiang Luo
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew Walker
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Zachary Zaiman
  - Department of Computer Science, Emory University, Atlanta, Georgia, USA
- Emma K Jones
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Hemant Gupta
  - Fairview Health Services, Minneapolis, Minnesota, USA
- John L Burns
  - The School of Medicine, Indiana University, Indianapolis, Indiana, USA
- Christopher A Harle
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Tanja Magoc
  - University of Florida College of Medicine, Gainesville, Florida, USA
- Benjamin Shickel
  - Department of Medicine, University of Florida, Gainesville, Florida, USA
  - Intelligent Critical Care Center, University of Florida, Gainesville, Florida, USA
- Scott D Steenburg
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Tyler Loftus
  - Intelligent Critical Care Center, University of Florida, Gainesville, Florida, USA
  - Department of Surgery, University of Florida, Gainesville, Florida, USA
- Genevieve B Melton
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
  - Fairview Health Services, Minneapolis, Minnesota, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, Minnesota, USA
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Ju Sun
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Christopher J Tignanelli
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, Minnesota, USA
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
3
Cai M, Yue M, Chen T, Liu J, Forno E, Lu X, Billiar T, Celedón J, McKennan C, Chen W, Wang J. Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution. Bioinformatics 2022; 38:3004-3010. [PMID: 35438146 PMCID: PMC9991889 DOI: 10.1093/bioinformatics/btac279] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/10/2022] [Revised: 03/22/2022] [Accepted: 04/13/2022] [Indexed: 01/04/2023]
Abstract
MOTIVATION: Tissue-level omics data such as transcriptomics and epigenomics are averages across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings, and simulation-based benchmarking studies have shown no universally best deconvolution approach. Ensemble methods have been attempted, but they only aggregate multiple single-cell references or reference-free deconvolution methods.
RESULTS: To achieve robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies, which are based on simulations, we compiled four large real datasets totaling 4937 tissue samples with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses, such as differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data.
AVAILABILITY AND IMPLEMENTATION: EnsDeconv is freely available as an R package from https://github.com/randel/EnsDeconv. The RNA microarray data from the TRAUMA study are available in GEO (GSE36809). The demographic and clinical phenotypes can be shared on reasonable request to the corresponding authors. The RNA-seq data from the EVAPR study cannot be shared publicly due to the privacy of individuals who participated in the clinical research, in compliance with the IRB approval at the University of Pittsburgh. The RNA microarray data from the FHS study are available from dbGaP (phs000007.v32.p13). The RNA-seq data from the ROS study were downloaded from the AD Knowledge Portal.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
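The core ensemble step is combining many single-configuration estimates for the same sample into one fraction per cell type, in a way that resists a few badly wrong runs. The sketch below does not reproduce EnsDeconv's CTS robust regression; as a plainly labeled stand-in, it combines the runs for each cell type with a Huber M-estimate of location (assumed tuning constant 1.345, MAD-based scale), clips negative values to zero, and renormalizes so the fractions sum to one. For scale, fully crossing every option listed in the abstract would give 11 × 10 × 5 × 5 × 2 = 5,500 configurations; that count is illustrative, since the abstract does not say every combination is used. The cell-type labels and run values are made up.

```python
import math
import statistics
from typing import Dict, List

# Illustrative size of a fully crossed grid:
# methods x references x marker selections x normalizations x transformations.
FULL_GRID_SIZE = math.prod([11, 10, 5, 5, 2])


def huber_location(values: List[float], k: float = 1.345, tol: float = 1e-10, max_iter: int = 100) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging (a stand-in robust combiner)."""
    mu = statistics.median(values)
    # MAD scaled by 1.4826 is a robust, normal-consistent estimate of spread.
    scale = 1.4826 * statistics.median([abs(v - mu) for v in values])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Points within k scale units get weight 1; farther points are down-weighted.
        weights = [1.0 if abs(v - mu) <= k * scale else k * scale / abs(v - mu) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def ensemble_fractions(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine one sample's cell-type fractions from several deconvolution runs."""
    combined = {ct: max(huber_location([run[ct] for run in runs]), 0.0) for ct in runs[0]}
    total = sum(combined.values())
    return {ct: v / total for ct, v in combined.items()} if total > 0 else combined


runs = [
    {"T": 0.30, "B": 0.70},
    {"T": 0.32, "B": 0.68},
    {"T": 0.31, "B": 0.69},
    {"T": 0.90, "B": 0.10},  # an outlying run
]
print(FULL_GRID_SIZE)  # 5500
print(ensemble_fractions(runs))  # T stays close to the three agreeing runs
```

A plain mean of the T column would be pulled to about 0.46 by the single outlying run; the down-weighting keeps the combined estimate near the consensus of the other runs, which is the kind of robustness the ensemble is meant to provide.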
Affiliation(s)
- Manqi Cai
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Molin Yue
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Tianmeng Chen
  - Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Jinling Liu
  - Department of Engineering Management and Systems Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA
  - Department of Biological Sciences, Missouri University of Science and Technology, Rolla, MO 65409, USA
- Erick Forno
  - Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Timothy Billiar
  - Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Juan Celedón
  - Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Chris McKennan
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Wei Chen
  - Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Jiebiao Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA