1
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1 % (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4 % (N = 77) of all studies; however, only 25.9 % (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
2
Meid AD, Wirbka L, Groll A, Haefeli WE. Can Machine Learning from Real-World Data Support Drug Treatment Decisions? A Prediction Modeling Case for Direct Oral Anticoagulants. Med Decis Making 2021; 42:587-598. [PMID: 34911402 PMCID: PMC9189725 DOI: 10.1177/0272989x211064604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Decision making for the "best" treatment is particularly challenging in situations in which individual patient response to drugs can largely differ from average treatment effects. By estimating individual treatment effects (ITEs), we aimed to demonstrate how strokes, major bleeding events, and a composite of both could be reduced by model-assisted recommendations for a particular direct oral anticoagulant (DOAC).

METHODS: In German claims data for the calendar years 2014-2018, we selected 29 901 new users of the DOACs rivaroxaban and apixaban. Random forests considered binary events within 1 year to estimate ITEs under each DOAC according to the X-learner algorithm with 29 potential effect modifiers; treatment recommendations were based on these estimated ITEs. Model performance was evaluated by the c-for-benefit statistics, absolute risk reduction (ARR), and absolute risk difference (ARD) by trial emulation.

RESULTS: A significant proportion of patients would be recommended a different treatment option than they actually received. The stroke model significantly discriminated patients for higher benefit and thus indicated improved decisions by reduced outcomes (c-for-benefit: 0.56; 95% confidence interval [0.52; 0.60]). In the group with apixaban recommendation, the model also improved the composite endpoint (ARR: 1.69 % [0.39; 2.97]). In trial emulations, model-assisted recommendations significantly reduced the composite event rate (ARD: -0.78 % [-1.40; -0.03]).

CONCLUSIONS: If prescribers are undecided about the potential benefits of different treatment options, ITEs can support decision making, especially if evidence is inconclusive, risk-benefit profiles of therapeutic alternatives differ significantly, and the patients' complexity deviates from "typical" study populations. In the exemplary case for DOACs and potentially in other situations, the significant impact could also become practically relevant if recommendations were available in an automated way as part of decision making.

Highlights
- It was possible to calculate individual treatment effects (ITEs) from routine claims data for rivaroxaban and apixaban, and the characteristics between the groups with recommendation for one or the other option differed significantly.
- ITEs resulted in recommendations that were significantly superior to usual (observed) treatment allocations in terms of absolute risk reduction, both separately for stroke and in the composite endpoint of stroke and major bleeding.
- When similar patients from routine data were selected (precision cohorts) for patients with a strong recommendation for one option or the other, those similar patients under the respective recommendation showed a significantly better prognosis compared with the alternative option.
- Many steps may still be needed on the way to clinical practice, but the principle of decision support developed from routine data may point the way toward future decision-making processes.
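The three-stage structure of the X-learner named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random forests are swapped for a toy base learner (mean outcome per covariate stratum), a single categorical covariate stands in for the 29 effect modifiers, and the names `fit_stratum_mean`, `x_learner`, and the example records are hypothetical.

```python
from collections import defaultdict

def fit_stratum_mean(xs, ys):
    """Toy base learner (assumption): predict the mean of ys within each stratum."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    overall = sum(ys) / len(ys)  # fallback for strata never seen in training
    means = {x: sums[x] / counts[x] for x in sums}
    return lambda x: means.get(x, overall)

def x_learner(records):
    """records: (stratum, treated, event) tuples with treated and event in {0, 1}.
    Returns a function mapping a stratum to an estimated treatment effect."""
    x1 = [x for x, t, _ in records if t == 1]
    y1 = [y for _, t, y in records if t == 1]
    x0 = [x for x, t, _ in records if t == 0]
    y0 = [y for _, t, y in records if t == 0]

    # Stage 1: one outcome model per treatment arm.
    mu1 = fit_stratum_mean(x1, y1)
    mu0 = fit_stratum_mean(x0, y0)

    # Stage 2: impute individual effects with the other arm's model,
    # then fit one effect model per arm on the imputed effects.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]
    tau1 = fit_stratum_mean(x1, d1)
    tau0 = fit_stratum_mean(x0, d0)

    # Stage 3: combine both effect models, weighted by the propensity score g.
    g = fit_stratum_mean([x for x, _, _ in records], [t for _, t, _ in records])
    return lambda x: g(x) * tau0(x) + (1 - g(x)) * tau1(x)

# Hypothetical data: event = 1 means a stroke or bleed within one year.
records = (
    [("old", 1, e) for e in (0, 0, 1, 0)] + [("old", 0, e) for e in (1, 1, 0, 1)]
    + [("young", 1, e) for e in (1, 1)] + [("young", 0, e) for e in (0, 1, 0, 0)]
)
tau = x_learner(records)
```

With events coded as 1, a negative estimate suggests fewer events under the option coded `treated = 1`, so a recommendation rule could simply pick the option with the lower predicted event risk. With this toy base learner the estimate reduces to a per-stratum difference in event rates; a flexible learner such as a random forest is what lets the effect vary across many covariates.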
Affiliation(s)
- Andreas D Meid
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
- Lucas Wirbka
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
- Andreas Groll
  - Department of Statistics, TU Dortmund University, Dortmund, Germany
- Walter E Haefeli
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
3
Zhong H, Loukides G, Pissis SP. Clustering demographics and sequences of diagnosis codes. IEEE J Biomed Health Inform 2021; 26:2351-2359. [PMID: 34797768 DOI: 10.1109/jbhi.2021.3129461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
A Relational-Sequential dataset (or RS-dataset for short) contains records comprising a patient's values in demographic attributes and their sequence of diagnosis codes. The task of clustering an RS-dataset is helpful for analyses ranging from pattern mining to classification. However, existing methods are not appropriate for this task. Thus, we initiate a study of how an RS-dataset can be clustered effectively and efficiently. We formalize the task of clustering an RS-dataset as an optimization problem. At the heart of the problem is a distance measure we design to quantify the pairwise similarity between records of an RS-dataset. Our measure uses a tree structure that encodes hierarchical relationships between records, based on their demographics, as well as an edit-distance-like measure that captures both the sequentiality and the semantic similarity of diagnosis codes. We also develop an algorithm which first identifies k representative records (centers), for a given k, and then constructs clusters, each containing one center and the records that are closer to that center than to the other centers. Experiments using two Electronic Health Record datasets demonstrate that our algorithm constructs compact and well-separated clusters, which preserve meaningful relationships between demographics and sequences of diagnosis codes, while being efficient and scalable.
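The two-phase idea in this abstract (pick k centers, then assign every record to its nearest center) can be sketched as follows. The sketch makes strong simplifying assumptions and is not the authors' algorithm: it ignores demographics and semantic similarity, uses plain Levenshtein distance over code sequences, and selects centers with a farthest-first heuristic. The function names and the example codes are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of diagnosis codes."""
    prev = list(range(len(b) + 1))
    for i, code_a in enumerate(a, 1):
        cur = [i]
        for j, code_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # delete code_a
                cur[j - 1] + 1,                    # insert code_b
                prev[j - 1] + (code_a != code_b),  # substitute or match
            ))
        prev = cur
    return prev[-1]

def cluster(records, k):
    """Return {center_index: [member_indices]} for k centers."""
    # Phase 1: farthest-first traversal (assumption), starting from record 0.
    centers = [0]
    while len(centers) < k:
        farthest = max(
            range(len(records)),
            key=lambda i: min(edit_distance(records[i], records[c]) for c in centers),
        )
        centers.append(farthest)

    # Phase 2: assign each record to its closest center.
    clusters = {c: [] for c in centers}
    for i, rec in enumerate(records):
        nearest = min(centers, key=lambda c: edit_distance(rec, records[c]))
        clusters[nearest].append(i)
    return clusters

# Hypothetical sequences of ICD-10-style codes, one list per patient.
records = [
    ["I10", "E11", "N18"],
    ["I10", "E11"],
    ["J45", "J30"],
    ["J45", "J30", "L20"],
]
result = cluster(records, 2)
```

Treating each code as an atomic symbol keeps the example short; a measure like the one described above would additionally score two different but related codes as closer than two unrelated ones.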
4
Personalized treatment options for chronic diseases using precision cohort analytics. Sci Rep 2021; 11:1139. [PMID: 33441956 PMCID: PMC7806725 DOI: 10.1038/s41598-021-80967-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/21/2020] [Accepted: 12/31/2020] [Indexed: 12/15/2022]
Abstract
To support point-of-care decision making by presenting outcomes of past treatment choices for cohorts of similar patients based on observational data from electronic health records (EHRs), a machine-learning precision cohort treatment option (PCTO) workflow consisting of (1) data extraction, (2) similarity model training, (3) precision cohort identification, and (4) treatment options analysis was developed. The similarity model is used to dynamically create a cohort of similar patients, to inform clinical decisions about an individual patient. The workflow was implemented using EHR data from a large health care provider for three different highly prevalent chronic diseases: hypertension (HTN), type 2 diabetes mellitus (T2DM), and hyperlipidemia (HL). A retrospective analysis demonstrated that treatment options with better outcomes were available for a majority of cases (75%, 74%, 85% for HTN, T2DM, HL, respectively). The models for HTN and T2DM were deployed in a pilot study with primary care physicians, who used them during clinic visits. A novel data-analytic workflow was developed to create patient-similarity models that dynamically generate personalized treatment insights at the point-of-care. By leveraging both knowledge-driven treatment guidelines and data-driven EHR data, physicians can incorporate real-world evidence in their medical decision-making process when considering treatment options for individual patients.
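Steps (3) and (4) of the workflow described above can be illustrated with a minimal sketch. It assumes an unweighted Euclidean distance in place of the trained similarity model, and the field names (`features`, `treatment`, `controlled`), the function names, and the patient values are all hypothetical.

```python
import math
from collections import defaultdict

def precision_cohort(index_features, patients, size):
    """Step 3 (sketch): the `size` patients most similar to the index patient."""
    return sorted(
        patients, key=lambda p: math.dist(index_features, p["features"])
    )[:size]

def outcome_rates(cohort):
    """Step 4 (sketch): share of patients with a good outcome per treatment."""
    totals = defaultdict(lambda: [0, 0])  # treatment -> [good outcomes, patients]
    for p in cohort:
        totals[p["treatment"]][0] += p["controlled"]
        totals[p["treatment"]][1] += 1
    return {t: good / n for t, (good, n) in totals.items()}

# Hypothetical records: features are (age, systolic blood pressure).
patients = [
    {"features": (60, 150), "treatment": "A", "controlled": 1},
    {"features": (62, 152), "treatment": "A", "controlled": 1},
    {"features": (61, 149), "treatment": "B", "controlled": 0},
    {"features": (59, 151), "treatment": "B", "controlled": 1},
    {"features": (30, 120), "treatment": "B", "controlled": 1},
]
cohort = precision_cohort((60, 150), patients, size=4)
rates = outcome_rates(cohort)
```

The fifth patient is far from the index patient and falls outside the cohort, so it does not influence the comparison. A raw comparison like this does not adjust for confounding, and features on different scales would need to be normalized before any distance is meaningful.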
5
Meid AD, Ruff C, Wirbka L, Stoll F, Seidling HM, Groll A, Haefeli WE. Using the Causal Inference Framework to Support Individualized Drug Treatment Decisions Based on Observational Healthcare Data. Clin Epidemiol 2020; 12:1223-1234. [PMID: 33173350 PMCID: PMC7646479 DOI: 10.2147/clep.s274466] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/11/2020] [Accepted: 10/08/2020] [Indexed: 01/02/2023]
Abstract
When healthcare professionals have the choice between several drug treatments for their patients, they often experience considerable decision uncertainty because many decisions simply have no single “best” choice. The challenges are manifold and include that guideline recommendations focus on randomized controlled trials whose populations do not necessarily correspond to specific patients in everyday treatment. Further reasons may be insufficient evidence on outcomes, lack of direct comparison of distinct options, and the need to individually balance benefits and risks. All these situations occur in routine care, and their outcomes are mirrored in routine data, which could thus be used to guide decisions. We propose a concept to facilitate decision-making by exploiting this wealth of information. Our working example for illustration assumes that the response to a particular (drug) treatment can substantially differ between individual patients depending on their characteristics (heterogeneous treatment effects, HTE), and that decisions will be more precise if they are based on real-world evidence of HTE considering this information. However, such methods must account for confounding by indication and effect measure modification, e.g., by adequately using machine learning methods or parametric regressions to estimate individual responses to pharmacological treatments. The better a model assesses the underlying HTE, the more accurate the predicted probabilities of treatment response. After probabilities for treatment-related benefit and harm have been calculated, decision rules can be applied and patient preferences can be considered to provide individual recommendations. Emulated trials in observational data are a straightforward technique to predict the effects of such decision rules when applied in routine care. Prediction-based decision rules from routine data have the potential to efficiently supplement clinical guidelines and support healthcare professionals in creating personalized treatment plans using decision support tools.
Affiliation(s)
- Andreas D Meid
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany
- Carmen Ruff
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany
- Lucas Wirbka
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany
- Felicitas Stoll
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany
- Hanna M Seidling
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany; Cooperation Unit Clinical Pharmacy, University of Heidelberg, Heidelberg 69120, Germany
- Andreas Groll
  - Department of Statistics, TU Dortmund University, Dortmund 44227, Germany
- Walter E Haefeli
  - Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69120, Germany; Cooperation Unit Clinical Pharmacy, University of Heidelberg, Heidelberg 69120, Germany