1
Jafari E, Blackman MH, Karnes JH, Van Driest SL, Crawford DC, Choi L, McDonough CW. Using electronic health records for clinical pharmacology research: Challenges and considerations. Clin Transl Sci 2024; 17:e13871. [PMID: 38943244 PMCID: PMC11213823 DOI: 10.1111/cts.13871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 07/01/2024] Open
Abstract
Electronic health records (EHRs) contain a vast array of phenotypic data on large numbers of individuals, often collected over decades. Due to the wealth of information, EHR data have emerged as a powerful resource to make first discoveries and identify disparities in our healthcare system. While the number of EHR-based studies has exploded in recent years, most of these studies are directed at associations with disease rather than pharmacotherapeutic outcomes, such as drug response or adverse drug reactions. This is largely due to challenges specific to deriving drug-related phenotypes from the EHR. There is great potential for EHR-based discovery in clinical pharmacology research, and there is a critical need to address specific challenges related to accurate and reproducible derivation of drug-related phenotypes from the EHR. This review provides a detailed evaluation of challenges and considerations for deriving drug-related data from EHRs. We provide an examination of EHR-based computable phenotypes and discuss cutting-edge approaches to map medication information for clinical pharmacology research, including medication-based computable phenotypes and natural language processing. We also discuss additional considerations such as data structure, heterogeneity and missing data, rare phenotypes, and diversity within the EHR. By further understanding the complexities associated with conducting clinical pharmacology research using EHR-based data, investigators will be better equipped to design thoughtful studies with more reproducible results. Progress in utilizing EHRs for clinical pharmacology research should lead to significant advances in our ability to understand differential drug response and predict adverse drug reactions.
Affiliation(s)
- Eissa Jafari
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
  - Department of Pharmacy Practice, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
- Marisa H. Blackman
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason H. Karnes
  - Department of Pharmacy Practice and Science, University of Arizona R. Ken Coit College of Pharmacy, Tucson, Arizona, USA
- Sara L. Van Driest
  - Department of Pediatrics, Vanderbilt University Medical Center (VUMC), Nashville, Tennessee, USA
  - Present address: All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
- Dana C. Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
  - Department of Genetics and Genome Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Leena Choi
  - Department of Biostatistics and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Caitrin W. McDonough
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
2
Shuey MM, Lee KM, Keaton J, Khankari NK, Breeyear JH, Walker VM, Miller DR, Heberer KR, Reaven PD, Clarke SL, Lee J, Lynch JA, Vujkovic M, Edwards TL. A genetically supported drug repurposing pipeline for diabetes treatment using electronic health records. EBioMedicine 2023; 94:104674. [PMID: 37399599 PMCID: PMC10328805 DOI: 10.1016/j.ebiom.2023.104674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 06/06/2023] [Accepted: 06/07/2023] [Indexed: 07/05/2023] Open
Abstract
BACKGROUND The identification of new uses for existing drug therapies has the potential to identify treatments for comorbid conditions that have the added benefit of glycemic control while also providing a rapid, low-cost approach to drug (re)discovery. METHODS We developed and tested a genetically-informed drug-repurposing pipeline for diabetes management. This approach mapped genetically-predicted gene expression signals from the largest genome-wide association study for type 2 diabetes mellitus to drug targets using publicly available databases to identify drug-gene pairs. These drug-gene pairs were then validated using a two-step approach: 1) a self-controlled case-series (SCCS) using electronic health records from a discovery and replication population, and 2) Mendelian randomization (MR). FINDINGS After filtering on sample size, 20 candidate drug-gene pairs were validated and various medications demonstrated evidence of glycemic regulation including two anti-hypertensive classes: angiotensin-converting enzyme inhibitors as well as calcium channel blockers (CCBs). The CCBs demonstrated the strongest evidence of glycemic reduction in both validation approaches (SCCS HbA1c and glucose reduction: -0.11%, p = 0.01 and -0.85 mg/dL, p = 0.02, respectively; MR: OR = 0.84, 95% CI = 0.81, 0.87, p = 5.0 × 10^-25). INTERPRETATION Our results support CCBs as a strong candidate medication for blood glucose reduction in addition to cardiovascular disease reduction. Further, these results support the adaptation of this approach for use in future drug-repurposing efforts for other conditions. FUNDING National Institutes of Health, Medical Research Council Integrative Epidemiology Unit at the University of Bristol, UK Medical Research Council, American Heart Association, and Department of Veterans Affairs (VA) Informatics and Computing Infrastructure and VA Cooperative Studies Program.
Affiliation(s)
- Megan M Shuey
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Kyung Min Lee
  - VA Informatics and Computer Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA
- Jacob Keaton
  - Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA; Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Nikhil K Khankari
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Joseph H Breeyear
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Nashville VA Medical Center, Nashville, TN, USA
- Venexia M Walker
  - Medical Research Council, Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, UK; Population Health Sciences, University of Bristol, Bristol, UK; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Donald R Miller
  - Center for Healthcare Organization and Implementation Research, Bedford VA Healthcare System, Bedford, MA, USA; Center for Population Health, Department of Biomedical and Nutritional Sciences, University of Massachusetts, Lowell, MA, USA
- Kent R Heberer
  - VA Palo Alto Health Care System, Palo Alto, CA, USA; Departments of Medicine and Endocrinology, Stanford University School of Medicine, Stanford, CA, USA
- Peter D Reaven
  - Phoenix VA Health Care System, Phoenix, AZ, USA; College of Medicine, University of Arizona, Phoenix, AZ, USA
- Shoa L Clarke
  - Departments of Medicine and Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Jennifer Lee
  - VA Palo Alto Health Care System, Palo Alto, CA, USA
- Julie A Lynch
  - VA Informatics and Computer Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA; School of Medicine, University of Utah, Salt Lake City, UT, USA
- Marijana Vujkovic
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA; Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Todd L Edwards
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Nashville VA Medical Center, Nashville, TN, USA
3
Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun 2021; 12:711. [PMID: 33514699 PMCID: PMC7846756 DOI: 10.1038/s41467-021-20910-4] [Citation(s) in RCA: 98] [Impact Index Per Article: 32.7] [Received: 10/29/2019] [Accepted: 12/28/2020] [Indexed: 12/20/2022] Open
Abstract
Sepsis is a leading cause of death in hospitals. Early prediction and diagnosis of sepsis, which is critical in reducing mortality, is challenging as many of its signs and symptoms are similar to other less critical conditions. We develop an artificial intelligence algorithm, SERA algorithm, which uses both structured data and unstructured clinical notes to predict and diagnose sepsis. We test this algorithm with independent, clinical notes and achieve high predictive accuracy 12 hours before the onset of sepsis (AUC 0.94, sensitivity 0.87 and specificity 0.87). We compare the SERA algorithm against physician predictions and show the algorithm's potential to increase the early detection of sepsis by up to 32% and reduce false positives by up to 17%. Mining unstructured clinical notes is shown to improve the algorithm's accuracy compared to using only clinical measures for early warning 12 to 48 hours before the onset of sepsis.
4
Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, Zhao Y, Sohn S, Liu H. Clinical concept extraction: A methodology review. J Biomed Inform 2020; 109:103526. [PMID: 32768446 PMCID: PMC7746475 DOI: 10.1016/j.jbi.2020.103526] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 03/02/2020] [Revised: 07/30/2020] [Accepted: 08/02/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Affiliation(s)
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- David Chen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Huan He
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sungrim Moon
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Kevin J Peterson
  - Department of Information Technology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Liwei Wang
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yiqing Zhao
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sunghwan Sohn
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
5
Liu S, Nie W, Gao D, Yang H, Yan J, Hao T. Clinical quantitative information recognition and entity-quantity association from Chinese electronic medical records. Int J Mach Learn Cyb 2020. [DOI: 10.1007/s13042-020-01160-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
6
Shuey M, Perkins B, Nian H, Yu C, Luther JM, Brown N. Retrospective cohort study to characterise the blood pressure response to spironolactone in patients with apparent therapy-resistant hypertension using electronic medical record data. BMJ Open 2020; 10:e033100. [PMID: 32461291 PMCID: PMC7259833 DOI: 10.1136/bmjopen-2019-033100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023] Open
Abstract
OBJECTIVE Identify blood pressure (BP) response to spironolactone in patients with apparent therapy-resistant hypertension (aTRH) using electronic medical records (EMRs) in order to estimate response in a real-world clinical setting. DESIGN Developed an algorithm to determine BP and electrolyte response to spironolactone for use in a retrospective cohort study. SETTING An academic medical centre in Nashville, Tennessee. POPULATION Patients with aTRH prescribed spironolactone. MAIN OUTCOME MEASURES Baseline BP and BP response, determined as the change in mean systolic BP (SBP) and diastolic BP (DBP) following spironolactone initiation. Additional response measures were serum sodium, potassium and creatinine, estimated glomerular filtration rate, haemoglobin A1c (HbA1c), glucose, high-density lipoprotein, low-density lipoprotein and triglycerides. Demographic characteristics included race, age, gender, body mass index (BMI), diabetes mellitus, chronic kidney disease stage 3, ischaemic heart disease and smoking. RESULTS The mean decreases in SBP and DBP were 8.1 and 3.4 mm Hg, consistent with clinical trial data. Using a mean decrease in SBP of 5 mm Hg or in DBP of 2 mm Hg to define 'responders', 30.3% of patients did not respond. In univariable analyses, responders had higher BMI, baseline SBP, DBP, sodium and HbA1c, and lower creatinine. In multivariable analysis, responders were older and had significantly higher BMI and baseline SBP and DBP, and lower potassium. Increases in potassium and creatinine following spironolactone were larger in responders. When BP was evaluated as a continuous variable, decreases in SBP and DBP correlated with baseline BP, decrease in sodium and increases in potassium and creatinine following spironolactone. The decrease in SBP was associated with decreasing glucose in European Americans. CONCLUSIONS We developed an algorithm to assess BP response to a commonly prescribed medication for aTRH using EMRs. 
Electrolyte changes associated with the BP response to spironolactone are consistent with its mechanism of action of blocking the mineralocorticoid receptor and decreasing epithelial sodium channel activity.
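The responder definition in this abstract (a mean decrease in SBP of 5 mm Hg or in DBP of 2 mm Hg) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `is_responder`, the `(sbp, dbp)` tuple layout, and the inclusive `>=` comparison are assumptions made for illustration, and the abstract does not say how the baseline and follow-up windows were chosen.

```python
from statistics import mean

# Thresholds quoted in the abstract (mm Hg).
SBP_DROP_THRESHOLD = 5.0
DBP_DROP_THRESHOLD = 2.0


def is_responder(baseline, follow_up):
    """Classify one patient from (sbp, dbp) readings in mm Hg.

    baseline, follow_up: non-empty lists of (sbp, dbp) tuples taken before
    and after spironolactone initiation. The layout and the inclusive
    comparison are assumptions of this sketch.
    """
    sbp_drop = mean(s for s, _ in baseline) - mean(s for s, _ in follow_up)
    dbp_drop = mean(d for _, d in baseline) - mean(d for _, d in follow_up)
    # A patient counts as a responder if either mean decrease meets its threshold.
    return sbp_drop >= SBP_DROP_THRESHOLD or dbp_drop >= DBP_DROP_THRESHOLD


# Hypothetical readings, not data from the study.
patient_a = is_responder([(150, 90), (154, 94)], [(144, 91), (146, 92)])
patient_b = is_responder([(150, 90)], [(148, 89)])
```

Here `patient_a` has a mean SBP decrease of 7 mm Hg and is classified as a responder, while `patient_b` (decreases of 2 and 1 mm Hg) is not.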
Affiliation(s)
- Megan Shuey
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Bradley Perkins
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Hui Nian
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Chang Yu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- James M Luther
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Nancy Brown
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
  - Department of Medicine, Yale School of Medicine, New Haven, CT, United States
7
Fu S, Leung LY, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inform Decis Mak 2020; 20:60. [PMID: 32228556 PMCID: PMC7106829 DOI: 10.1186/s12911-020-1072-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/08/2019] [Accepted: 03/12/2020] [Indexed: 01/14/2023] Open
Abstract
Background The rapid adoption of electronic health records (EHRs) holds great promise for advancing medicine through practice-based knowledge discovery. However, the validity of EHR-based clinical research is questionable due to poor research reproducibility caused by the heterogeneity and complexity of healthcare institutions and EHR systems, the cross-disciplinary nature of the research team, and the lack of standard processes and best practices for conducting EHR-based clinical research. Method We developed a data abstraction framework to standardize the process for multi-site EHR-based clinical studies aiming to enhance research reproducibility. The framework was implemented for a multi-site EHR-based research project, the ESPRESSO project, with the goal to identify individuals with silent brain infarctions (SBI) at Tufts Medical Center (TMC) and Mayo Clinic. The heterogeneity of healthcare institutions, EHR systems, documentation, and process variation in case identification was assessed quantitatively and qualitatively. Result We discovered a significant variation in the patient populations, neuroimaging reporting, EHR systems, and abstraction processes across the two sites. The prevalence of SBI for patients over age 50 for TMC and Mayo is 7.4 and 12.5% respectively. There is a variation regarding neuroimaging reporting where TMC are lengthy, standardized and descriptive while Mayo’s reports are short and definitive with more textual variations. Furthermore, differences in the EHR system, technology infrastructure, and data collection process were identified. Conclusion The implementation of the framework identified the institutional and process variations and the heterogeneity of EHRs across the sites participating in the case study. The experiment demonstrates the necessity to have a standardized process for data abstraction when conducting EHR-based clinical studies.
Affiliation(s)
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Lester Y Leung
  - Department of Neurology, Tufts Medical Center, Boston, MA, USA
- Paul R Kingsbury
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- David M Kent
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
8
Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Applicability of Machine Learning Methods to Multi-label Medical Text Classification. Lecture Notes in Computer Science 2020. [PMCID: PMC7303696 DOI: 10.1007/978-3-030-50423-6_38] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Structuring medical text using international standards allows to improve interoperability and quality of predictive modelling. Medical text classification task facilitates information extraction. In this work we investigate the applicability of several machine learning models and classifier chains (CC) to medical unstructured text classification. The experimental study was performed on a corpus of 11671 manually labeled Russian medical notes. The results showed that using CC strategy allows to improve classification performance. Ensemble of classifier chains based on linear SVC showed the best result: 0.924 micro F-measure, 0.872 micro precision and 0.927 micro recall.
9
Sinnott JA, Cai F, Yu S, Hejblum BP, Hong C, Kohane IS, Liao KP. PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc 2019; 25:1359-1365. [PMID: 29788308 DOI: 10.1093/jamia/ocy056] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/09/2017] [Accepted: 04/23/2018] [Indexed: 12/24/2022] Open
Abstract
Objective Standard approaches for large scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, the variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach which converts diagnosis codes into a probability of a phenotype (PheProb). We hypothesized that this alternate approach for defining phenotypes would improve power for genetic association studies. Methods The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on the number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real world EHR cohort. In the latter, we tested the association between low density lipoprotein cholesterol (LDL-C) genetic risk alleles known for association with hyperlipidemia and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared. Results Among n = 1462 subjects in the real world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = .001. Conclusions PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in the billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in Phenome-Wide Association Studies.
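The idea of replacing a code-count threshold with a probability can be sketched as follows. The abstract says only that patients are clustered into 2 groups without supervision and assigned a probability from their number of diagnosis codes; it does not describe the model. The two-component Poisson mixture fitted by EM below is a stand-in chosen for illustration, not the PheProb model itself, and the function names and example counts are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k codes at rate lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def phenotype_probability(k, params):
    """Posterior probability that a patient with k codes is in the high-count group."""
    pi_hi, lam_lo, lam_hi = params
    hi = pi_hi * poisson_pmf(k, lam_hi)
    lo = (1.0 - pi_hi) * poisson_pmf(k, lam_lo)
    total = hi + lo
    return hi / total if total > 0 else 0.5


def fit_two_group_mixture(code_counts, n_iter=200):
    """Fit a two-component Poisson mixture by EM (illustrative stand-in model).

    Returns (pi_hi, lam_lo, lam_hi): the weight of the high-count group and
    the mean code count of each group.
    """
    n = len(code_counts)
    params = (0.5, 0.5, float(max(max(code_counts), 1)))
    for _ in range(n_iter):
        # E-step: responsibility of the high-count group for each patient.
        resp = [phenotype_probability(k, params) for k in code_counts]
        w_hi = sum(resp)
        w_lo = n - w_hi
        if w_hi < 1e-9 or w_lo < 1e-9:
            break  # one group is empty; keep the previous estimate
        # M-step: re-estimate group means and the mixing weight.
        lam_hi = max(sum(r * k for r, k in zip(resp, code_counts)) / w_hi, 1e-6)
        lam_lo = max(sum((1 - r) * k for r, k in zip(resp, code_counts)) / w_lo, 1e-6)
        params = (w_hi / n, lam_lo, lam_hi)
    return params


# Hypothetical per-patient counts of a phenotype's diagnosis codes.
counts = [0, 0, 1, 0, 1, 0, 0, 1, 8, 10, 9, 12, 0, 1]
params = fit_two_group_mixture(counts)
probs = {k: phenotype_probability(k, params) for k in (0, 3, 10)}
```

Unlike a fixed cutoff such as ≥2 codes, a patient with an intermediate count receives an intermediate probability, which a downstream association test can use as a continuous outcome.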
Affiliation(s)
- Fiona Cai
  - Stuyvesant High School, New York City, NY, USA
- Sheng Yu
  - Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China
- Boris P Hejblum
  - Univ. Bordeaux, ISPED, Inserm BPH 1219, Inria SISTM, Bordeaux, France
- Chuan Hong
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Isaac S Kohane
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Children's Hospital Boston, Boston, MA, USA
- Katherine P Liao
  - Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA, USA
10
Oni-Orisan A, Hoffmann TJ, Ranatunga D, Medina MW, Jorgenson E, Schaefer C, Krauss RM, Iribarren C, Risch N. Characterization of Statin Low-Density Lipoprotein Cholesterol Dose-Response Using Electronic Health Records in a Large Population-Based Cohort. Circulation: Genomic and Precision Medicine 2019; 11:e002043. [PMID: 30354326 DOI: 10.1161/circgen.117.002043] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/30/2022]
Abstract
BACKGROUND Low-density lipoprotein cholesterol (LDL-C) response to statin therapy has not been fully elucidated in real-world populations. The primary objective of this study was to characterize statin LDL-C dose-response and its heritability in a large, multiethnic population of statin users. METHODS We determined the effect of statin dosing on lipid measures utilizing electronic health records in 33 139 statin users from the Kaiser Permanente GERA cohort (Genetic Epidemiology Research on Adult Health and Aging). The relationship between statin defined daily dose and lipid parameter response (percent change) was determined. RESULTS Defined daily dose and LDL-C response was associated in a log-linear relationship (β, -6.17; SE, 0.09; P<10^-300) which remained significant after adjusting for prespecified covariates (adjusted β, -5.59; SE, 0.12; P<10^-300). Statin type, sex, age, smoking status, diabetes mellitus, and East Asian race/ethnicity were significant independent predictors of statin-induced changes in LDL-C. Based on a variance-component method within the subset of statin users who had at least 1 first-degree relative who was also a statin user (n=1036), heritability of statin LDL-C response was estimated at 11.7% (SE, 8.6%; P=0.087). CONCLUSIONS Using electronic health record data, we observed a statin LDL-C dose-response consistent with the rule of 6% from prior clinical trial data. Clinical and demographic predictors of statin LDL-C response exhibited highly significant but modest effects. Finally, statin-induced changes in LDL-C were not found to be strongly inherited. Ultimately, these findings demonstrate (1) the utility of electronic health records as a reliable source to generate robust phenotypes for pharmacogenomic research and (2) the potential role of statin precision medicine in lipid management.
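The log-linear dose-response reported above can be made concrete with a small worked example. The abstract does not state the base of the logarithm; the sketch assumes β is expressed per doubling of the defined daily dose (log base 2), which is how the rule of 6% is usually phrased (each doubling of a statin dose lowers LDL-C by roughly a further 6%). The function name `additional_ldl_change` is hypothetical.

```python
import math

# Coefficients quoted in the abstract (percent change in LDL-C).
BETA_UNADJUSTED = -6.17
BETA_ADJUSTED = -5.59


def additional_ldl_change(dose_ratio, beta=BETA_UNADJUSTED):
    """Extra percent change in LDL-C when the dose is multiplied by dose_ratio.

    Assumes the log-linear model uses log base 2, i.e. beta is the change
    per doubling of dose. That base is an assumption of this sketch.
    """
    if dose_ratio <= 0:
        raise ValueError("dose_ratio must be positive")
    return beta * math.log2(dose_ratio)


doubling = additional_ldl_change(2)       # one doubling of dose
quadrupling = additional_ldl_change(4)    # two doublings of dose
```

Under that assumption, one doubling corresponds to an additional 6.17% reduction and two doublings to 12.34%, in line with the rule of 6%.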
Affiliation(s)
- Akinyemi Oni-Orisan
  - Department of Clinical Pharmacy (A.O.), University of California, San Francisco, CA; Institute for Human Genetics (A.O., T.J.H., N.R.), University of California, San Francisco, CA
- Thomas J Hoffmann
  - Institute for Human Genetics (A.O., T.J.H., N.R.), University of California, San Francisco, CA; Department of Epidemiology and Biostatistics (T.J.H., C.I., N.R.), University of California, San Francisco, CA
- Dilrini Ranatunga
  - Kaiser Permanente Northern California Division of Research, Oakland, CA (D.R., E.J., C.S., C.I., N.R.)
- Marisa W Medina
  - Children's Hospital Oakland Research Institute, Oakland, CA (M.W.M., R.M.K.)
- Eric Jorgenson
  - Kaiser Permanente Northern California Division of Research, Oakland, CA (D.R., E.J., C.S., C.I., N.R.)
- Catherine Schaefer
  - Kaiser Permanente Northern California Division of Research, Oakland, CA (D.R., E.J., C.S., C.I., N.R.)
- Ronald M Krauss
  - Department of Medicine (R.M.K.), University of California, San Francisco, CA; Children's Hospital Oakland Research Institute, Oakland, CA (M.W.M., R.M.K.)
- Carlos Iribarren
  - Department of Epidemiology and Biostatistics (T.J.H., C.I., N.R.), University of California, San Francisco, CA; Kaiser Permanente Northern California Division of Research, Oakland, CA (D.R., E.J., C.S., C.I., N.R.)
- Neil Risch
  - Institute for Human Genetics (A.O., T.J.H., N.R.), University of California, San Francisco, CA; Department of Epidemiology and Biostatistics (T.J.H., C.I., N.R.), University of California, San Francisco, CA; Kaiser Permanente Northern California Division of Research, Oakland, CA (D.R., E.J., C.S., C.I., N.R.)
11
Bottinor WJ, Shuey MM, Manouchehri A, Farber-Eger EH, Xu M, Nair D, Salem JE, Wang TJ, Brittain EL. Renin-Angiotensin-Aldosterone System Modulates Blood Pressure Response During Vascular Endothelial Growth Factor Receptor Inhibition. JACC: CardioOncology 2019; 1:14-23. [PMID: 32984850 PMCID: PMC7513950 DOI: 10.1016/j.jaccao.2019.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023]
Abstract
Objectives This study postulated that antihypertensive therapy with renin-angiotensin-aldosterone system (RAAS) inhibition may mitigate vascular endothelial growth factor inhibitor (VEGFi)–mediated increases in blood pressure more effectively than other antihypertensive medications in patients receiving VEGFi therapy. Background VEGFi therapy is commonly used in the treatment of cancer. One common side effect of VEGFi therapy is elevated blood pressure. Evidence suggests that the RAAS may be involved in VEGFi-mediated increases in blood pressure. Methods This retrospective cohort analysis was performed using a de-identified version of the electronic health record at Vanderbilt University Medical Center in Nashville, Tennessee. Subjects with cancer who were exposed to VEGFi therapy were identified, and blood pressure and medication data were extracted. Changes in mean systolic and diastolic blood pressure in response to VEGFi therapy in patients receiving RAAS inhibitor (RAASi) therapy before VEGFi initiation were compared with changes in mean systolic and diastolic blood pressure in patients not receiving RAASi therapy before VEGFi initiation. Results Mean systolic and diastolic blood pressure rose in both groups after VEGFi use; however, patients who had RAASi therapy before VEGFi initiation had a significantly lower increase in systolic blood pressure as compared with patients with no RAASi therapy (2.46 mm Hg [95% confidence interval: 0.7 to 4.2] compared with 4.56 mm Hg [95% confidence interval: 3.5 to 5.6], respectively; p = 0.034). Conclusions In a real-world clinical population, RAASi therapy before VEGFi initiation may ameliorate VEGFi-mediated increases in blood pressure. Randomized clinical trials are needed to further our understanding of the role of RAASi therapy in VEGFi-mediated increases in blood pressure.
Affiliation(s)
- Wendy J Bottinor
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
- Megan M Shuey
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
- Ali Manouchehri
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
- Eric H Farber-Eger
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
- Meng Xu
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee
- Devika Nair
  - Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Joe-Elie Salem
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
  - Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee
  - Department of Clinical Pharmacology, University of the Sorbonne, Assistance Publique Hôpitaux de Paris, Institut National de la Santé et de la Recherche Médicale CIC 14-21, Pitié-Salpêtrière Hospital, Paris, France
- Thomas J Wang
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
- Evan L Brittain
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee

12
Fu S, Leung LY, Wang Y, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports. JMIR Med Inform 2019; 7:e12109. [PMID: 31066686 PMCID: PMC6524454 DOI: 10.2196/12109] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 09/05/2018] [Revised: 02/26/2019] [Accepted: 03/30/2019] [Indexed: 01/25/2023] Open
Abstract
Background: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be due to vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by offering preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports. Objective: This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center. Methods: Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for rule-based systems, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models adopted were convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. The gold standard dataset includes 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts) corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing. Results: Among the 400 reports selected to determine interannotator agreement, 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. Conclusions: We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.
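The five metrics reported in this abstract all derive from a 2×2 confusion table. The following is a minimal Python sketch of that arithmetic. The function name `diagnostic_metrics` and the counts are illustrative assumptions, picked only because they round to values close to the reported rule-based SBI figures; they are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-table counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on true cases
        "specificity": tn / (tn + fp),   # recall on true non-cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts (an assumption, NOT taken from the paper).
metrics = diagnostic_metrics(tp=37, fp=0, tn=290, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With no false positives, specificity and PPV are both exactly 1, while the three missed cases lower sensitivity and NPV. This is the pattern seen in the reported rule-based results.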
Affiliation(s)
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Lester Y Leung
  - Department of Neurology, Tufts Medical Center, Boston, MA, United States
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Anne-Olivia Raulli
  - Department of Neurology, Tufts Medical Center, Boston, MA, United States
- David F Kallmes
  - Department of Radiology, Mayo Clinic, Rochester, MN, United States
- Kristoff B Nelson
  - Department of Neurology, Tufts Medical Center, Boston, MA, United States
- Michael S Clark
  - Department of Radiology, Mayo Clinic, Rochester, MN, United States
- Paul R Kingsbury
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- David M Kent
  - Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States

13
Weissenkampen JD, Jiang Y, Eckert S, Jiang B, Li B, Liu DJ. Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits. CURRENT PROTOCOLS IN HUMAN GENETICS 2019; 101:e83. [PMID: 30849219 PMCID: PMC6455968 DOI: 10.1002/cphg.83] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/19/2022]
Abstract
With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Yu Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Scott Eckert
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Bibo Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Bingshan Li
  - Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN
- Dajiang J. Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA

14
Chan A, Chien I, Moseley E, Salman S, Kaminer Bourland S, Lamas D, Walling AM, Tulsky JA, Lindvall C. Deep learning algorithms to identify documentation of serious illness conversations during intensive care unit admissions. Palliat Med 2019; 33:187-196. [PMID: 30427267 DOI: 10.1177/0269216318810421] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/15/2022]
Abstract
Background: Timely documentation of care preferences is an endorsed quality indicator for seriously ill patients admitted to intensive care units. Clinicians document their conversations about these preferences as unstructured free text in clinical notes from electronic health records. Aim: To apply deep learning algorithms for automated identification of serious illness conversations documented in physician notes during intensive care unit admissions. Design: Using a retrospective dataset of physician notes, clinicians annotated all text documenting patient care preferences (goals of care or code status limitations), communication with family, and full code status. Clinician-coded text was used to train algorithms to identify documentation and to validate algorithms. The validated algorithms were deployed to assess the percentage of intensive care unit admissions of patients aged ⩾75 that had care preferences documented within the first 48 h. Setting/participants: Patients admitted to one of five intensive care units. Results: Algorithm performance was calculated by comparing machine-identified documentation to clinician-coded documentation. For detecting care preference documentation at the note level, the algorithm had F1-score of 0.92 (95% confidence interval, 0.89 to 0.95), sensitivity of 93.5% (95% confidence interval, 90.0% to 98.0%), and specificity of 91.0% (95% confidence interval, 86.4% to 95.3%). Applied to 1350 admissions of patients aged ⩾75, we found that 64.7% of patient intensive care unit admissions had care preferences documented within the first 48 h. Conclusion: Deep learning algorithms identified patient care preference documentation with sensitivity and specificity approaching that of clinicians and computed in a tiny fraction of time. Future research should determine the generalizability of these methods in multiple healthcare systems.
Affiliation(s)
- Alex Chan
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Isabel Chien
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Edward Moseley
  - College of Science and Mathematics, University of Massachusetts Boston, Boston, MA, USA
- Saad Salman
  - Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Daniela Lamas
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Anne M Walling
  - Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Palliative Care, VA Greater Los Angeles Healthcare System, Los Angeles, CA, USA
- James A Tulsky
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
  - Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Charlotta Lindvall
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA
  - Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA

15
Dietrich G, Krebs J, Liman L, Fette G, Ertl M, Kaspar M, Störk S, Puppe F. Replicating medication trend studies using ad hoc information extraction in a clinical data warehouse. BMC Med Inform Decis Mak 2019; 19:15. [PMID: 30658633 PMCID: PMC6339317 DOI: 10.1186/s12911-018-0729-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/27/2018] [Accepted: 12/21/2018] [Indexed: 11/16/2022] Open
Abstract
Background: Medication trend studies show the changes of medication over the years and may be replicated using a clinical Data Warehouse (CDW). Even nowadays, a lot of the patient information, like medication data, in the EHR is stored in the format of free text. As the conventional approach of information extraction (IE) demands a high developmental effort, we used ad hoc IE instead. This technique queries information and extracts it on the fly from texts contained in the CDW. Methods: We present a generalizable approach of ad hoc IE for pharmacotherapy (medications and their daily dosage) presented in hospital discharge letters. We added import and query features to the CDW system, like error-tolerant queries to deal with misspellings and proximity search for the extraction of the daily dosage. During the data integration process in the CDW, negated, historical and non-patient context data are filtered. For the replication studies, we used a drug list grouped by ATC (Anatomical Therapeutic Chemical Classification System) codes as input for queries to the CDW. Results: We achieve an F1 score of 0.983 (precision 0.997, recall 0.970) for extracting medication from discharge letters and an F1 score of 0.974 (precision 0.977, recall 0.972) for extracting the dosage. We replicated three published medical trend studies for hypertension, atrial fibrillation and chronic kidney disease. Overall, 93% of the main findings could be replicated, 68% of sub-findings, and 75% of all findings. One study could be completely replicated with all main and sub-findings. Conclusion: A novel approach for ad hoc IE is presented. It is very suitable for basic medical texts like discharge letters and finding reports. Ad hoc IE is by definition more limited than conventional IE and does not claim to replace it, but it substantially exceeds the search capabilities of many CDWs and it is convenient to conduct replication studies fast and with high quality.
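The two query features named in this abstract, error-tolerant matching and proximity search, can be illustrated with a small Python sketch. This is not the authors' CDW implementation. It substitutes the standard-library `difflib.SequenceMatcher` for whatever fuzzy matching the CDW uses, and it extracts a single dose rather than a daily dosage. The function `find_medication`, the `window` and `cutoff` parameters, the `UNITS` set, and the sample letter text are all hypothetical.

```python
import difflib
import re

DOSE = re.compile(r"^\d+(?:[.,]\d+)?$")   # e.g. "5" or "2.5"
UNITS = {"mg", "g", "µg", "ml"}           # assumed unit vocabulary


def find_medication(text, drug, window=4, cutoff=0.8):
    """Return (matched_token, dose, unit) for tokens that fuzzily match `drug`.

    A dose is accepted only if a number followed by a unit appears within
    `window` tokens after the drug mention (a simple proximity search).
    """
    tokens = re.findall(r"[^\W\d_]+|\d+(?:[.,]\d+)?", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        similarity = difflib.SequenceMatcher(None, tok, drug.lower()).ratio()
        if similarity < cutoff:
            continue  # not close enough to the queried drug name
        dose = unit = None
        nearby = tokens[i + 1 : i + 1 + window]
        for j, cand in enumerate(nearby[:-1]):
            if DOSE.match(cand) and nearby[j + 1] in UNITS:
                dose, unit = cand, nearby[j + 1]
                break
        hits.append((tok, dose, unit))
    return hits


# Hypothetical discharge-letter fragment with a misspelled drug name.
letter = "Discharge medication: Ramiprill 5 mg once daily, Bisoprolol 2.5 mg."
print(find_medication(letter, "ramipril"))
print(find_medication(letter, "bisoprolol"))
```

The similarity cutoff trades recall on misspellings against false matches on unrelated words. A production system would also need the negation and historical-context filtering that the abstract describes.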
Affiliation(s)
- Georg Dietrich
  - Computer Science, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Jonathan Krebs
  - Computer Science, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Leon Liman
  - Computer Science, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Georg Fette
  - Computer Science, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
  - Comprehensive Heart Failure Center, University and University Hospital of Würzburg, Am Schwarzenberg 15, Würzburg, 97078, Germany
- Maximilian Ertl
  - Service Center Medical Informatics, University Hospital of Würzburg, Schweinfurter Strasse 4, Würzburg, 97078, Germany
- Mathias Kaspar
  - Comprehensive Heart Failure Center, University and University Hospital of Würzburg, Am Schwarzenberg 15, Würzburg, 97078, Germany
- Stefan Störk
  - Comprehensive Heart Failure Center, University and University Hospital of Würzburg, Am Schwarzenberg 15, Würzburg, 97078, Germany
- Frank Puppe
  - Computer Science, University of Würzburg, Am Hubland, Würzburg, 97074, Germany

16
Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, Amin S, Liu H. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak 2019; 19:1. [PMID: 30616584 PMCID: PMC6322223 DOI: 10.1186/s12911-018-0723-6] [Citation(s) in RCA: 138] [Impact Index Per Article: 27.6] [Received: 05/23/2018] [Accepted: 12/10/2018] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human efforts to create labeled training data and conduct feature engineering. In this study, we propose a clinical text classification paradigm using weak supervision and deep representation to reduce these human efforts. METHODS We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use the pre-trained word embeddings as deep representation features for training machine learning models. Since machine learning is trained on labels generated by the automatic NLP algorithm, this training process is called weak supervision. We evaluate the paradigm's effectiveness on two institutional case studies at Mayo Clinic: smoking status classification and proximal femur (hip) fracture classification, and one case study using a public dataset: the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models, namely, Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN), using this paradigm. Precision, recall, and F1 score are used as metrics to evaluate performance. RESULTS CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms. We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of training data, and the paradigm might not be effective for complex multiclass classification tasks. CONCLUSION The proposed clinical text classification paradigm could reduce human efforts of labeled training data creation and feature engineering for applying machine learning to clinical text classification by leveraging weak supervision and deep representation. The experiments validated the effectiveness of the paradigm on two institutional tasks and one shared clinical text classification task.
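The weak-supervision idea in this abstract (rules generate training labels, then a statistical model learns from them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword rules in `rule_label` and the example notes are invented, and a bag-of-words naive Bayes classifier stands in for the word-embedding features and CNN/SVM/RF/MLPNN models the paper actually evaluates.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def rule_label(text):
    """Toy rule-based labeler (hypothetical rules); returns None when no rule fires."""
    t = text.lower()
    if re.search(r"\b(never smoked|non-?smoker|denies smoking)\b", t):
        return "non-smoker"
    if re.search(r"\b(smokes|current smoker|pack[- ]years?)\b", t):
        return "smoker"
    return None


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in for the paper's models)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            s = math.log(self.class_counts[label])
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((counts[w] + 1) / total)
            return s

        return max(self.class_counts, key=score)


# Invented example notes; no human-assigned labels are used anywhere.
notes = [
    "Patient never smoked and exercises daily.",
    "Non-smoker, no alcohol use.",
    "Denies smoking; drinks socially.",
    "Current smoker with chronic cough.",
    "Smokes one pack of cigarettes per day.",
    "20 pack-years of tobacco use.",
    "Follow-up for knee pain.",  # no rule fires, so it is left out of training
]
weak = [(n, rule_label(n)) for n in notes]
train = [(n, y) for n, y in weak if y is not None]
model = NaiveBayes().fit(*zip(*train))
prediction = model.predict("Heavy tobacco and cigarettes use")
print(prediction)
```

The test sentence matches none of the keyword rules, yet the trained model can still classify it from co-occurring words. This is the kind of additional pattern the abstract says a learned model can capture beyond the rules.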
Affiliation(s)
- Yanshan Wang
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Sunghwan Sohn
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Sijia Liu
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Feichen Shen
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Liwei Wang
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Elizabeth J. Atkinson
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Shreyasee Amin
  - Division of Rheumatology, Department of Medicine, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
  - Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA
- Hongfang Liu
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st ST SW, Rochester, MN 55905 USA

17
Smoller JW. The use of electronic health records for psychiatric phenotyping and genomics. Am J Med Genet B Neuropsychiatr Genet 2018; 177:601-612. [PMID: 28557243 PMCID: PMC6440216 DOI: 10.1002/ajmg.b.32548] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 03/19/2017] [Accepted: 04/20/2017] [Indexed: 12/22/2022]
Abstract
The widespread adoption of electronic health records (EHRs) in healthcare systems has created a vast and continuously growing resource of clinical data and provides new opportunities for population-based research. In particular, the linking of EHRs to biospecimens and genomic data in biobanks may help address what has become a rate-limiting step for genetic research: the need for large sample sizes. The principal roadblock to capitalizing on these resources is the need to establish the validity of phenotypes extracted from the EHR. For psychiatric genetic research, this represents a particular challenge given that diagnosis is based on patient reports and clinician observations that may not be well-captured in billing codes or narrative records. This review addresses the opportunities and pitfalls in EHR-based phenotyping with a focus on their application to psychiatric genetic research. A growing number of studies have demonstrated that diagnostic algorithms with high positive predictive value can be derived from EHRs, especially when structured data are supplemented by text mining approaches. Such algorithms enable semi-automated phenotyping for large-scale case-control studies. In addition, the scale and scope of EHR databases have been used successfully to identify phenotypic subgroups and derive algorithms for longitudinal risk prediction. EHR-based genomics are particularly well-suited to rapid look-up replication of putative risk genes, studies of pleiotropy (phenome-wide association studies or PheWAS), investigations of genetic networks and overlap across the phenome, and pharmacogenomic research. EHR phenotyping has been relatively under-utilized in psychiatric genomic research but may become a key component of efforts to advance precision psychiatry.
Affiliation(s)
- Jordan W. Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA

18
Wong A, Plasek JM, Montecalvo SP, Zhou L. Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges. Pharmacotherapy 2018; 38:822-841. [PMID: 29884988 DOI: 10.1002/phar.2151] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 01/12/2023]
Abstract
The safety of medication use has been a priority in the United States since the late 1930s. Recently, it has gained prominence due to the increasing amount of data suggesting that a large amount of patient harm is preventable and can be mitigated with effective risk strategies that have not been sufficiently adopted. Adverse events from medications are part of clinical practice, but the ability to identify a patient's risk and to minimize that risk must be a priority. The ability to identify adverse events has been a challenge due to limitations of available data sources, which are often free text. The use of natural language processing (NLP) may help to address these limitations. NLP is the artificial intelligence domain of computer science that uses computers to manipulate unstructured data (i.e., narrative text or speech data) in the context of a specific task. In this narrative review, we illustrate the fundamentals of NLP and discuss NLP's application to medication safety in four data sources: electronic health records, Internet-based data, published literature, and reporting systems. Given the magnitude of available data from these sources, a growing area is the use of computer algorithms to help automatically detect associations between medications and adverse effects. The main benefit of NLP is in the time savings associated with automation of various medication safety tasks such as the medication reconciliation process facilitated by computers, as well as the potential for near-real-time identification of adverse events for postmarketing surveillance such as those posted on social media that would otherwise go unanalyzed. NLP is limited by a lack of data sharing between health care organizations due to insufficient interoperability capabilities, inhibiting large-scale adverse event monitoring across populations. 
We anticipate that future work in this area will focus on the integration of data sources from different domains to improve the ability to identify potential adverse events more quickly and to improve clinical decision support with regard to a patient's estimated risk for specific adverse events at the time of medication prescription or review.
Affiliation(s)
- Adrian Wong
  - Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts
- Joseph M Plasek
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah
- Li Zhou
  - Harvard Medical School, Boston, Massachusetts

19
Shuey MM, Gandelman JS, Chung CP, Nian H, Yu C, Denny JC, Brown NJ. Characteristics and treatment of African-American and European-American patients with resistant hypertension identified using the electronic health record in an academic health centre: a case-control study. BMJ Open 2018; 8:e021640. [PMID: 29950471 PMCID: PMC6020960 DOI: 10.1136/bmjopen-2018-021640] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE To identify patients with hypertension with resistant and controlled blood pressure (BP) using electronic health records (EHRs) in order to elucidate practices in the real-world clinical treatment of hypertension and to enable future genetic studies. DESIGN Using EHRs, we developed and validated algorithms to identify patients with resistant and controlled hypertension. SETTING An academic medical centre in Nashville, Tennessee. POPULATION European-American (EA) and African-American (AA) patients with hypertension. MAIN OUTCOME MEASURES Demographic characteristics: race, age, gender, body mass index, outpatient BPs and the history of diabetes mellitus, chronic kidney disease stage 3, ischaemic heart disease, transient ischaemic attack, atrial fibrillation and sleep apnoea. MEDICATION TREATMENT All antihypertensive medication classes prescribed to a patient at the time of classification and ever prescribed following classification. RESULTS The algorithms had performance metrics exceeding 92%. The prevalence of resistant hypertension in the total hypertensive population was 7.3% in EA and 10.5% in AA. At diagnosis, AA were younger, heavier, more often female and had a higher incidence of type 2 diabetes and higher BPs than EA. AA with resistant hypertension were more likely to be treated with vasodilators, dihydropyridine calcium channel blockers and alpha-2 agonists while EA were more likely to be treated with angiotensin receptor blockers, renin inhibitors and beta blockers. Mineralocorticoid receptor antagonists use was increased in patients treated with more than four antihypertensive medications compared with patients treated with three (12.4% vs 2.6% in EA, p<0.001; 12.3% vs 2.8% in AA, p<0.001). The number of patients treated with a mineralocorticoid receptor antagonist increased to 37.4% in EA and 41.2% in AA over a mean follow-up period of 7.4 and 8.7 years, respectively. 
CONCLUSIONS Clinical treatment of resistant hypertension differs in EA and AA patients. These results demonstrate the feasibility of identifying resistant hypertension using an EHR.
Affiliation(s)
- Megan M Shuey
  - Department of Pharmacology, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Jocelyn S Gandelman
  - Department of Medicine, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Cecilia P Chung
  - Department of Medicine, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Hui Nian
  - Department of Biostatistics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Chang Yu
  - Department of Biostatistics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Nancy J Brown
  - Department of Pharmacology, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, Tennessee, USA

20
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 316] [Impact Index Per Article: 52.7] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recently published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications; however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Affiliation(s)
- Yanshan Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Feichen Shen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Naveed Afzal
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sijia Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Yuqun Zeng
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Saeed Mehrabi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
21
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023]
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics. CONCLUSION Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
22
Sanchez Bocanegra CL, Sevillano Ramos JL, Rizo C, Civit A, Fernandez-Luque L. HealthRecSys: A semantic content-based recommender system to complement health videos. BMC Med Inform Decis Mak 2017; 17:63. [PMID: 28506225 PMCID: PMC5433022 DOI: 10.1186/s12911-017-0431-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 11/11/2016] [Accepted: 03/24/2017] [Indexed: 11/17/2022]
Abstract
BACKGROUND The Internet, and its popularity, continues to grow at an unprecedented pace. Watching videos online is very popular; it is estimated that 500 h of video are uploaded onto YouTube, a video-sharing service, every minute and that, by 2019, video formats will comprise more than 80% of Internet traffic. Health-related videos are very popular on YouTube, but their quality is always a matter of concern. One approach to enhancing the quality of online videos is to provide additional educational health content, such as websites, to support health consumers. This study investigates the feasibility of building a content-based recommender system that links health consumers to reputable health educational websites from MedlinePlus for a given health video from YouTube. METHODS The dataset for this study includes a collection of health-related videos and their available metadata. Semantic technologies (such as SNOMED-CT and Bio-ontology) were used to recommend health websites from MedlinePlus. A total of 26 health professionals participated in evaluating 253 recommended links for a total of 53 videos about general health, hypertension, or diabetes. The relevance of the recommended health websites from MedlinePlus to the videos was measured using information retrieval metrics such as the normalized discounted cumulative gain and precision at K. RESULTS The majority of websites recommended by our system for health videos were relevant, based on ratings by health professionals. The normalized discounted cumulative gain was between 46% and 90% for the different topics. CONCLUSIONS Our study demonstrates the feasibility of using a semantic content-based recommender system to enrich YouTube health videos. Evaluation with end-users, in addition to healthcare professionals, will be required to identify the acceptance of these recommendations in a nonsimulated information-seeking context.
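The two relevance metrics named in this abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions (log2 discount, linear gain); the abstract does not state which variant the authors used, and the ratings list and function names below are hypothetical, not taken from the study.

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k ranked links whose rating is greater than zero."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each rating is divided by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical expert ratings for five recommended links, in ranked order
# (1 = judged relevant, 0 = judged not relevant).
ratings = [1, 0, 1, 1, 0]

print(precision_at_k(ratings, 3))
print(ndcg_at_k(ratings, 5))
```

A ranking that already lists all relevant links first yields a normalized value of 1.0, which is why the metric rewards placing relevant websites near the top.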
Affiliation(s)
- Anton Civit
- Department of Architecture and Computer Technology, Universidad de Sevilla, Seville, Spain
- Luis Fernandez-Luque
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar Foundation, PO Box 5825, Doha, Qatar
23
Teixeira PL, Wei WQ, Cronin RM, Mo H, VanHouten JP, Carroll RJ, LaRose E, Bastarache LA, Rosenbloom ST, Edwards TL, Roden DM, Lasko TA, Dart RA, Nikolai AM, Peissig PL, Denny JC. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J Am Med Inform Assoc 2016; 24:162-171. [PMID: 27497800 DOI: 10.1093/jamia/ocw071] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 11/01/2015] [Revised: 04/03/2016] [Accepted: 04/07/2016] [Indexed: 12/11/2022]
Abstract
OBJECTIVE Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. MATERIALS AND METHODS We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. RESULTS Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operator characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. CONCLUSION This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.
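The AUC values reported in this abstract can be read as the probability that a randomly chosen case receives a higher algorithm score than a randomly chosen control. The following Python sketch computes AUC that way, by direct pairwise comparison; it is an illustration of the metric only, not the authors' evaluation code, and the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical phenotyping scores (e.g., a normalized sum of code, medication,
# and vital-sign evidence) with chart-review labels (1 = hypertensive).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]

print(auc(scores, labels))
```

The pairwise form is quadratic in cohort size, which is acceptable for a reviewed set of a few hundred charts; a rank-based formulation would be preferable for large cohorts.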
Affiliation(s)
- Pedro L Teixeira
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Huan Mo
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Jacob P VanHouten
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Eric LaRose
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
- Lisa A Bastarache
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Todd L Edwards
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, TN, USA
- Thomas A Lasko
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Richard A Dart
- Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 N Oak Ave-MLR, Marshfield, WI 54449, USA
- Anne M Nikolai
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
- Peggy L Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
24
Perera G, Broadbent M, Callard F, Chang CK, Downs J, Dutta R, Fernandes A, Hayes RD, Henderson M, Jackson R, Jewell A, Kadra G, Little R, Pritchard M, Shetty H, Tulloch A, Stewart R. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open 2016; 6:e008721. [PMID: 26932138 PMCID: PMC4785292 DOI: 10.1136/bmjopen-2015-008721] [Citation(s) in RCA: 316] [Impact Index Per Article: 39.5] [Indexed: 01/30/2023]
Abstract
PURPOSE The South London and Maudsley National Health Service (NHS) Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register and its Clinical Record Interactive Search (CRIS) application were developed in 2008, generating a research repository of real-time, anonymised, structured and open-text data derived from the electronic health record system used by SLaM, a large mental healthcare provider in southeast London. In this paper, we update this register's descriptive data, and describe the substantial expansion and extension of the data resource since its original development. PARTICIPANTS Descriptive data were generated from the SLaM BRC Case Register on 31 December 2014. Currently, there are over 250,000 patient records accessed through CRIS. FINDINGS TO DATE Since 2008, the most significant developments in the SLaM BRC Case Register have been the introduction of natural language processing to extract structured data from open-text fields, linkages to external sources of data, and the addition of a parallel relational database (Structured Query Language) output. Natural language processing applications to date have brought in new and hitherto inaccessible data on cognitive function, education, social care receipt, smoking, diagnostic statements and pharmacotherapy. In addition, through external data linkages, large volumes of supplementary information have been accessed on mortality, hospital attendances and cancer registrations. FUTURE PLANS Coupled with robust data security and governance structures, electronic health records provide potentially transformative information on mental disorders and outcomes in routine clinical care. The SLaM BRC Case Register continues to grow as a database, with approximately 20,000 new cases added each year, in addition to extension of follow-up for existing cases. 
Data linkages and natural language processing present important opportunities to enhance this type of research resource further, achieving both volume and depth of data. However, research projects still need to be carefully tailored, so that they take into account the nature and quality of the source information.
Affiliation(s)
- Gayan Perera
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Chin-Kuo Chang
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Johnny Downs
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Rina Dutta
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Andrea Fernandes
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Richard D Hayes
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Max Henderson
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Richard Jackson
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Amelia Jewell
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Giouliana Kadra
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Ryan Little
- South London and Maudsley NHS Foundation Trust, London, UK
- Megan Pritchard
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Hitesh Shetty
- South London and Maudsley NHS Foundation Trust, London, UK
- Alex Tulloch
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
- Robert Stewart
- King's College London (Institute of Psychiatry, Psychology and Neuroscience), London, UK
25
Laper SM, Restrepo NA, Crawford DC. The Challenges in Using Electronic Health Records for Pharmacogenomics and Precision Medicine Research. Pac Symp Biocomput 2016; 21:369-80. [PMID: 26776201 PMCID: PMC4720980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Access and utilization of electronic health records with extensive medication lists and genetic profiles is rapidly advancing discoveries in pharmacogenomics. In this study, we analyzed ~116,000 variants on the Illumina Metabochip for response to antihypertensive and lipid lowering medications in African American adults from BioVU, the Vanderbilt University Medical Center's biorepository linked to de-identified electronic health records. Our study population included individuals who were prescribed an antihypertensive or lipid lowering medication, and who had both pre- and post-medication blood pressure or low-density lipoprotein cholesterol (LDL-C) measurements, respectively. Among those with pre- and post-medication systolic and diastolic blood pressure measurements (n=2,268), the average change in systolic and diastolic blood pressure was -0.6 mm Hg and -0.8 mm Hg, respectively. Among those with pre- and post-medication LDL-C measurements (n=1,244), the average change in LDL-C was -26.3 mg/dL. SNPs were tested for an association with change and percent change in blood pressure or blood levels of LDL-C. After adjustment for multiple testing, we did not observe any significant associations, and we were not able to replicate previously reported associations, such as in APOE and LPA, from the literature. The present study illustrates the benefits and challenges with using electronic health records linked to biorepositories for pharmacogenomic studies.
Affiliation(s)
- Sarah M. Laper
- Eastern Virginia Medical School, Norfolk, VA, 23507, USA
- Nicole A. Restrepo
- Center for Human Genetics Research, Vanderbilt University, 519 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA
- Dana C. Crawford
- Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Suite 2527, Cleveland, OH 44106, USA
26
Warner JL, Levy MA, Neuss MN. ReCAP: Feasibility and Accuracy of Extracting Cancer Stage Information From Narrative Electronic Health Record Data. J Oncol Pract 2015; 12:157-8; e169-7. [PMID: 26306621 DOI: 10.1200/jop.2015.004622] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]
Abstract
PURPOSE Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs). Such documentation results in tedious and time-consuming abstraction efforts by tumor registrars and other secondary users. This information may be amenable to extraction by automated methods. METHODS We developed a natural language processing algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR. These methods were developed in a training set of patients with lung cancer, independently validated in a test set of patients with lung cancer, and compared with the gold standard of Vanderbilt Cancer Registry–determined stage (when available). RESULTS In the combined data set of 2,323 patients (training set, n = 1,103; validation set, n = 1,220), 751,880 documents were analyzed. A stage statement was extracted from 2,239 (98.6%) patient EHRs (median, 24 documents per patient). Stage discordance was common, affecting 83.6% of these EHRs. Nevertheless, algorithmically derived stage accuracy was high in the validation set (κ = 0.906; 95% CI, 0.873 to 0.939), when including notes generated within 14 weeks from diagnosis. CONCLUSION Accurate stage determination can be achieved through automated methods applied to narrative text, despite the frequent presence of discordance in such data. Our results also indicate that stage can be automatically captured in a shorter timeframe than the 6-month window used by cancer registries, as early as 5 weeks from diagnosis. These methods may be generalizable to large narrative cancer data sets.
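Agreement between the algorithm-derived stage and the registry stage is summarized in this abstract with a κ statistic. The Python sketch below computes unweighted Cohen's kappa, which corrects raw agreement for agreement expected by chance. The abstract does not say whether a weighted or unweighted form was used, so the unweighted form is an assumption here, and the stage lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical stages for six patients: registry (reference) vs. algorithm.
registry = ["I", "II", "III", "IV", "IV", "I"]
algorithm = ["I", "II", "III", "IV", "III", "I"]

print(cohens_kappa(registry, algorithm))
```

Because cancer stage is ordinal, a weighted kappa that penalizes a III-versus-IV disagreement less than a I-versus-IV disagreement may be a reasonable alternative.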
Affiliation(s)
- Jeremy L Warner
- Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN
- Mia A Levy
- Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN
- Michael N Neuss
- Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN
27
Abstract
Consensus practice guidelines and the implementation of clinical therapeutic advances are usually based on the results of large, randomized clinical trials (RCTs). However, RCTs generally inform us on an average treatment effect for a presumably homogeneous population, but therapeutic interventions rarely benefit the entire population targeted. Indeed, multiple RCTs have demonstrated that interindividual variability exists both in drug response and in the development of adverse effects. The field of pharmacogenomics promises to deliver the right drug to the right patient. Substantial progress has been made in this field, with advances in technology, statistical and computational methods, and the use of cell and animal model systems. However, clinical implementation of pharmacogenetic principles has been difficult because RCTs demonstrating benefit are lacking. For patients, the potential benefits of performing such trials include the individualization of therapy to maximize efficacy and minimize adverse effects. These trials would also enable investigators to reduce sample size and hence contain costs for trial sponsors. Multiple ethical, legal, and practical issues need to be considered for the conduct of genotype-based RCTs. Whether pre-emptive genotyping embedded in electronic health records will preclude the need for performing genotype-based RCTs remains to be seen.
Affiliation(s)
- Naveen L Pereira
- Division of Cardiovascular Diseases, Department of Internal Medicine, 200 First Street SW, Rochester, MN 55905, USA
- Daniel J Sargent
- Department of Biomedical Statistics and Informatics, Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Michael E Farkouh
- Peter Munk Cardiac Centre and Heart and Stroke Richard Lewer Centre, University of Toronto, 585 University Avenue, Toronto, ON M5G 2N2, Canada
- Charanjit S Rihal
- Division of Cardiovascular Diseases, Department of Internal Medicine, 200 First Street SW, Rochester, MN 55905, USA
28
Crawford DC, Goodloe R, Farber-Eger E, Boston J, Pendergrass SA, Haines JL, Ritchie MD, Bush WS. Leveraging Epidemiologic and Clinical Collections for Genomic Studies of Complex Traits. Hum Hered 2015. [PMID: 26201699 DOI: 10.1159/000381805] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/23/2022]
Abstract
BACKGROUND/AIMS Present-day limited resources demand DNA and phenotyping alternatives to the traditional prospective population-based epidemiologic collections. METHODS To accelerate genomic discovery with an emphasis on diverse populations, we--as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study--accessed all non-European American samples (n = 15,863) available in BioVU, the Vanderbilt University biorepository linked to de-identified electronic medical records, for genomic studies as part of the larger Population Architecture using Genomics and Epidemiology (PAGE) I study. Given previous studies have cautioned against the secondary use of clinically collected data compared with epidemiologically collected data, we present here a characterization of EAGLE BioVU, including the billing and diagnostic (ICD-9) code distributions for adult and pediatric patients as well as comparisons made for select health metrics (body mass index, glucose, HbA1c, HDL-C, LDL-C, and triglycerides) with the population-based National Health and Nutrition Examination Surveys (NHANES) linked to DNA samples (NHANES III, n = 7,159; NHANES 1999-2002, n = 7,839). RESULTS Overall, the distributions of billing and diagnostic codes suggest this clinical sample is a mixture of healthy and sick patients like that expected for a contemporary American population. CONCLUSION Little bias is observed among health metrics, suggesting this clinical collection is suitable for genomic studies along with traditional epidemiologic cohorts.
Affiliation(s)
- Dana C Crawford
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
29
Li L. The potential of translational bioinformatics approaches for pharmacology research. Br J Clin Pharmacol 2015; 80:862-7. [PMID: 25753093 DOI: 10.1111/bcp.12622] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/07/2014] [Revised: 02/11/2015] [Accepted: 02/15/2015] [Indexed: 12/17/2022]
Abstract
The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of transformational bioinformatics to research in numerous pharmacology subdisciplines.
Affiliation(s)
- Lang Li
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN; Indiana Institute of Personalized Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
30
The effects of electronic medical record phenotyping details on genetic association studies: HDL-C as a case study. BioData Min 2015; 8:15. [PMID: 25969697 PMCID: PMC4428098 DOI: 10.1186/s13040-015-0048-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/22/2014] [Accepted: 04/28/2015] [Indexed: 02/01/2023]
Abstract
Background Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data coupled with the challenges associated with searching unstructured clinical notes requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C). Results We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions. Conclusions These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
31
Castro VM, Minnier J, Murphy SN, Kohane I, Churchill SE, Gainer V, Cai T, Hoffnagle AG, Dai Y, Block S, Weill SR, Nadal-Vicens M, Pollastri AR, Rosenquist JN, Goryachev S, Ongur D, Sklar P, Perlis RH, Smoller JW. Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry 2015; 172:363-72. [PMID: 25827034 PMCID: PMC4441333 DOI: 10.1176/appi.ajp.2014.14030423] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Indexed: 11/30/2022]
Abstract
OBJECTIVE The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. METHOD EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. RESULTS The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. CONCLUSIONS Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
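The validation metrics quoted in this abstract follow from a two-by-two comparison of EHR-based classification against the interview diagnosis. As a minimal Python sketch of the standard definitions, with hypothetical counts that are not taken from the study:

```python
def positive_predictive_value(true_pos, false_pos):
    """Share of EHR-classified cases confirmed by the reference diagnosis."""
    return true_pos / (true_pos + false_pos)

def specificity(true_neg, false_pos):
    """Share of reference-negative subjects that the classifier also calls negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation counts (illustrative only).
true_pos, false_pos, true_neg = 40, 10, 90

print(positive_predictive_value(true_pos, false_pos))
print(specificity(true_neg, false_pos))
```

PPV depends on how common the condition is in the sampled population, so a value estimated in one health system may not transfer directly to another with different prevalence.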
Affiliation(s)
- Victor M. Castro
  - Partners Research Information Systems and Computing, Oregon Health & Science University, Portland, OR
- Jessica Minnier
  - Department of Public Health & Preventive Medicine, Oregon Health & Science University, Portland, OR
- Shawn N. Murphy
  - Partners Research Information Systems and Computing, Oregon Health & Science University, Portland, OR
  - Laboratory of Computer Science and Department of Neurology, Massachusetts General Hospital, Boston, MA
- Isaac Kohane
  - Center for Biomedical Informatics, Harvard Medical School, Boston, MA
- Vivian Gainer
  - Partners Research Information Systems and Computing, Oregon Health & Science University, Portland, OR
- Tianxi Cai
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA
- Alison G. Hoffnagle
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Yael Dai
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Stefanie Block
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Sydney R. Weill
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Mireya Nadal-Vicens
  - Center for Anxiety and Traumatic Stress Disorders, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Alisha R. Pollastri
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- J. Niels Rosenquist
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Sergey Goryachev
  - Partners Research Information Systems and Computing, Oregon Health & Science University, Portland, OR
- Pamela Sklar
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY
- Roy H. Perlis
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
  - Center for Experimental Drugs and Diagnostics, Massachusetts General Hospital, Boston, MA
- Jordan W. Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
32
Lin C, Karlson EW, Dligach D, Ramirez MP, Miller TA, Mo H, Braggs NS, Cagan A, Gainer V, Denny JC, Savova GK. Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc 2015; 22:e151-61. [PMID: 25344930 PMCID: PMC5901122 DOI: 10.1136/amiajnl-2014-002642] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 01/09/2014] [Revised: 08/14/2014] [Accepted: 08/22/2014] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVES To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities. MATERIALS AND METHODS Codified information and a string-matching algorithm were applied to a RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method. For features, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with regard to the date of transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at patient level in a training set (N=480 patients) and evaluated against a test set (N=120 patients). RESULTS The system achieved positive predictive value (PPV) 0.756, sensitivity 0.919, F1 score 0.829 on the test set, which was significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task, and adding temporal information, all proved highly effective. CONCLUSIONS Automated methotrexate-induced liver toxicity phenotype discovery for patients with RA based on structured and unstructured information in the EMR shows accurate results. Our work demonstrates that adding temporal features significantly improved classification results.
Affiliation(s)
- Chen Lin
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Elizabeth W Karlson
  - Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Dmitriy Dligach
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Monica P Ramirez
  - Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Timothy A Miller
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Huan Mo
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- Natalie S Braggs
  - Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Andrew Cagan
  - Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Vivian Gainer
  - Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Guergana K Savova
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
33
Pradhan S, Elhadad N, South BR, Martinez D, Christensen L, Vogel A, Suominen H, Chapman WW, Savova G. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc 2015; 22:143-54. [PMID: 25147248 PMCID: PMC4433360 DOI: 10.1136/amiajnl-2013-002544] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 12/16/2013] [Revised: 07/16/2014] [Accepted: 07/21/2014] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners. MATERIALS AND METHODS We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text--199 clinical notes for training and 99 for testing (roughly 180 K words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance. RESULTS For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59. DISCUSSION Most of the participating systems used a hybrid approach by supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources. CONCLUSIONS The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.
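The F1 score reported above is the harmonic mean of precision and recall. The following minimal Python sketch (the function name `f1_score` is illustrative, not taken from the evaluation lab's tooling) shows how the best Task 1a figure follows from the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the best-performing Task 1a system.
best_task1a = f1_score(precision=0.80, recall=0.71)
print(round(best_task1a, 2))
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a system cannot reach a high F1 by maximizing precision or recall alone.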
Affiliation(s)
- Sameer Pradhan
  - Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Amy Vogel
  - Columbia University, New York, New York, USA
- Hanna Suominen
  - NICTA, The Australian National University, and University of Canberra, Canberra, Australian Capital Territory, Australia
- Guergana Savova
  - Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts, USA
34
Denny JC. Surveying Recent Themes in Translational Bioinformatics: Big Data in EHRs, Omics for Drugs, and Personal Genomics. Yearb Med Inform 2014; 9:199-205. [PMID: 25123743 PMCID: PMC4287076 DOI: 10.15265/iy-2014-0015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023] Open
Abstract
OBJECTIVE To provide a survey of recent progress in the use of large-scale biologic data to impact clinical care, and the impact the reuse of electronic health record data has made in genomic discovery. METHOD Survey of key themes in translational bioinformatics, primarily from 2012 and 2013. RESULT This survey focuses on four major themes: the growing use of Electronic Health Records (EHRs) as a source for genomic discovery, adoption of genomics and pharmacogenomics in clinical practice, the possible use of genomic technologies for drug repurposing, and the use of personal genomics to guide care. CONCLUSION Reuse of abundant clinical data for research is speeding discovery, and implementation of genomic data into clinical medicine is impacting care with new classes of data rarely used previously in medicine.
Affiliation(s)
- J C Denny
  - Joshua C. Denny, MD, MS, 2525 West End Ave - Suite 672, Nashville, TN 37213, USA, E-mail:
35
Abstract
OBJECTIVES Implementation of Electronic Health Record (EHR) systems continues to expand. The massive number of patient encounters results in high amounts of stored data. Transforming clinical data into knowledge to improve patient care has been the goal of biomedical informatics professionals for many decades, and this work is now increasingly recognized outside our field. In reviewing the literature for the past three years, we focus on "big data" in the context of EHR systems and we report on some examples of how secondary use of data has been put into practice. METHODS We searched PubMed database for articles from January 1, 2011 to November 1, 2013. We initiated the search with keywords related to "big data" and EHR. We identified relevant articles and additional keywords from the retrieved articles were added. Based on the new keywords, more articles were retrieved and we manually narrowed down the set utilizing predefined inclusion and exclusion criteria. RESULTS Our final review includes articles categorized into the themes of data mining (pharmacovigilance, phenotyping, natural language processing), data application and integration (clinical decision support, personal monitoring, social media), and privacy and security. CONCLUSION The increasing adoption of EHR systems worldwide makes it possible to capture large amounts of clinical data. There is an increasing number of articles addressing the theme of "big data", and the concepts associated with these articles vary. The next step is to transform healthcare big data into actionable knowledge.
Affiliation(s)
- M K Ross
  - Lucila Ohno-Machado, Division of Biomedical Informatics, 9500 Gilman Drive, MC 0505, La Jolla, California, 92037-0505, USA, Tel: +1 858 822 4931, E-mail:
36
Xu H, Aldrich MC, Chen Q, Liu H, Peterson NB, Dai Q, Levy M, Shah A, Han X, Ruan X, Jiang M, Li Y, Julien JS, Warner J, Friedman C, Roden DM, Denny JC. Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality. J Am Med Inform Assoc 2014; 22:179-91. [PMID: 25053577 PMCID: PMC4433365 DOI: 10.1136/amiajnl-2014-002649] [Citation(s) in RCA: 141] [Impact Index Per Article: 14.1] [Indexed: 01/09/2023] Open
Abstract
Objectives Drug repurposing, which finds new indications for existing drugs, has received great attention recently. The goal of our work is to assess the feasibility of using electronic health records (EHRs) and automated informatics methods to efficiently validate a recent drug repurposing association of metformin with reduced cancer mortality. Methods By linking two large EHRs from Vanderbilt University Medical Center and Mayo Clinic to their tumor registries, we constructed a cohort including 32 415 adults with a cancer diagnosis at Vanderbilt and 79 258 cancer patients at Mayo from 1995 to 2010. Using automated informatics methods, we further identified type 2 diabetes patients within the cancer cohort and determined their drug exposure information, as well as other covariates such as smoking status. We then estimated HRs for all-cause mortality and their associated 95% CIs using stratified Cox proportional hazard models. HRs were estimated according to metformin exposure, adjusted for age at diagnosis, sex, race, body mass index, tobacco use, insulin use, cancer type, and non-cancer Charlson comorbidity index. Results Among all Vanderbilt cancer patients, metformin was associated with a 22% decrease in overall mortality compared to other oral hypoglycemic medications (HR 0.78; 95% CI 0.69 to 0.88) and with a 39% decrease compared to type 2 diabetes patients on insulin only (HR 0.61; 95% CI 0.50 to 0.73). Diabetic patients on metformin also had a 23% improved survival compared with non-diabetic patients (HR 0.77; 95% CI 0.71 to 0.85). These associations were replicated using the Mayo Clinic EHR data. Many site-specific cancers including breast, colorectal, lung, and prostate demonstrated reduced mortality with metformin use in at least one EHR. 
Conclusions EHR data suggested that the use of metformin was associated with decreased mortality after a cancer diagnosis compared with diabetic and non-diabetic cancer patients not on metformin, indicating its potential as a chemotherapeutic regimen. This study serves as a model for robust and inexpensive validation studies for drug repurposing signals using EHR data.
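The percentage decreases quoted in the Results follow from the hazard ratios as (1 − HR) × 100. A minimal Python sketch of that arithmetic is shown here; the function name `percent_reduction` and the dictionary labels are illustrative only, and the conversion describes a relative reduction in the hazard, not an absolute change in survival.

```python
def percent_reduction(hazard_ratio: float) -> float:
    """Relative reduction in hazard, in percent, implied by a hazard ratio."""
    return (1.0 - hazard_ratio) * 100.0


# Hazard ratios reported for the Vanderbilt cohort (labels are shorthand).
reported_hrs = {
    "metformin vs other oral hypoglycemics": 0.78,
    "metformin vs insulin only": 0.61,
    "metformin vs non-diabetic patients": 0.77,
}

for comparison, hr in reported_hrs.items():
    print(f"{comparison}: HR {hr:.2f} -> {round(percent_reduction(hr))}% lower hazard")
```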
Affiliation(s)
- Hua Xu
  - The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA
- Melinda C Aldrich
  - Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Qingxia Chen
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Hongfang Liu
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Neeraja B Peterson
  - Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Qi Dai
  - Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Mia Levy
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Anushi Shah
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Xue Han
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Xiaoyang Ruan
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Min Jiang
  - The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA
- Ying Li
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Jamii St Julien
  - Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Jeremy Warner
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Carol Friedman
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Dan M Roden
  - Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
37
Rosenbloom ST, Harris P, Pulley J, Basford M, Grant J, DuBuisson A, Rothman RL. The Mid-South clinical Data Research Network. J Am Med Inform Assoc 2014; 21:627-32. [PMID: 24821742 PMCID: PMC4078290 DOI: 10.1136/amiajnl-2014-002745] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/04/2022] Open
Abstract
The Mid-South Clinical Data Research Network (CDRN) encompasses three large health systems: (1) Vanderbilt Health System (VU) with electronic medical records for over 2 million patients, (2) the Vanderbilt Healthcare Affiliated Network (VHAN) which currently includes over 40 hospitals, hundreds of ambulatory practices, and over 3 million patients in the Mid-South, and (3) Greenway Medical Technologies, with access to 24 million patients nationally. Initial goals of the Mid-South CDRN include: (1) expansion of our VU data network to include the VHAN and Greenway systems, (2) developing data integration/interoperability across the three systems, (3) improving our current tools for extracting clinical data, (4) optimization of tools for collection of patient-reported data, and (5) expansion of clinical decision support. By 18 months, we anticipate our CDRN will robustly support projects in comparative effectiveness research, pragmatic clinical trials, and other key research areas and have the capacity to share data and health information technology tools nationally.
Affiliation(s)
- S Trent Rosenbloom
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jill Pulley
  - Office of Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Office of Personalized Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Melissa Basford
  - Office of Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason Grant
  - Vanderbilt Health Affiliated Network, Nashville, Tennessee, USA
- Russell L Rothman
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Center for Health Services Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
38
Wei WQ, Feng Q, Weeke P, Bush W, Waitara MS, Iwuchukwu OF, Roden DM, Wilke RA, Stein CM, Denny JC. Creation and Validation of an EMR-based Algorithm for Identifying Major Adverse Cardiac Events while on Statins. AMIA Jt Summits Transl Sci Proc 2014; 2014:112-9. [PMID: 25717410 PMCID: PMC4333709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Statin medications are often prescribed to ameliorate a patient's risk of cardiovascular events due in part to cholesterol reduction. We developed and evaluated an algorithm that can accurately identify subjects with major adverse cardiac events (MACE) while on statins using electronic medical record (EMR) data. The algorithm also identifies subjects experiencing their first MACE while on statins for primary prevention. The algorithm achieved 90% to 97% PPVs in identification of MACE cases as compared against physician review. By applying the algorithm to EMR data in BioVU, cases and controls were identified and used subsequently to replicate known associations with eight genetic variants. We replicated 6/8 previously reported genetic associations with cardiovascular diseases or lipid metabolism disorders. Our results demonstrated that the algorithm can be used to accurately identify subjects with MACE and MACE while on statins. Consequently, future studies can be conducted to investigate and validate the relationship between statins and MACE using real-world clinical data.
Affiliation(s)
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Qiping Feng
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Peter Weeke
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- William Bush
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN
- Magarya S. Waitara
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Otito F. Iwuchukwu
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Dan M. Roden
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
  - Oates Institute for Experimental Therapeutics, Vanderbilt University, Nashville, TN
  - Office of Personalized Medicine, Vanderbilt University, Nashville, TN
- Charles M Stein
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Joshua C. Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
39
Masanz J, Pakhomov SV, Xu H, Wu ST, Chute CG, Liu H. Open Source Clinical NLP - More than Any Single System. AMIA Jt Summits Transl Sci Proc 2014; 2014:76-82. [PMID: 25954581 PMCID: PMC4419764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP's mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice.
Affiliation(s)
- Serguei V. Pakhomov
  - College of Pharmacy and Institute for Health Informatics, University of Minnesota
- Hua Xu
  - School of Biomedical Informatics in The University of Texas Health Science Center at Houston
40
Jiang M, Wu Y, Shah A, Priyanka P, Denny JC, Xu H. Extracting and standardizing medication information in clinical text - the MedEx-UIMA system. AMIA Jt Summits Transl Sci Proc 2014; 2014:37-42. [PMID: 25954575 PMCID: PMC4419757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to the ISO standard. We processed 826 documents with both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing their results. Using two manually annotated test sets that contained 300 drug entries from medication lists and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5%, respectively, for encoding drug names to the corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1%, respectively, for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to the ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.
Affiliation(s)
- Min Jiang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, US
- Yonghui Wu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, US
- Anushi Shah
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, TN, US
- Priyanka Priyanka
  - School of Public Health, The University of Texas Health Science Center at Houston, TX, US
- Joshua C. Denny
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, TN, US
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, US
41
Bui DDA, Zeng-Treitler Q. Learning regular expressions for clinical text classification. J Am Med Inform Assoc 2014; 21:850-7. [PMID: 24578357 DOI: 10.1136/amiajnl-2013-002411] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES Natural language processing (NLP) applications typically use regular expressions that have been developed manually by human experts. Our goal is to automate both the creation and utilization of regular expressions in text classification. METHODS We designed a novel regular expression discovery (RED) algorithm and implemented two text classifiers based on RED. The RED+ALIGN classifier combines RED with an alignment algorithm, and RED+SVM combines RED with a support vector machine (SVM) classifier. Two clinical datasets were used for testing and evaluation: the SMOKE dataset, containing 1091 text snippets describing smoking status; and the PAIN dataset, containing 702 snippets describing pain status. We performed 10-fold cross-validation to calculate accuracy, precision, recall, and F-measure metrics. In the evaluation, an SVM classifier was trained as the control. RESULTS The two RED classifiers achieved 80.9-83.0% in overall accuracy on the two datasets, which is 1.3-3% higher than SVM's accuracy (p<0.001). Similarly, small but consistent improvements have been observed in precision, recall, and F-measure when RED classifiers are compared with SVM alone. More significantly, RED+ALIGN correctly classified many instances that were misclassified by the SVM classifier (8.1-10.3% of the total instances and 43.8-53.0% of SVM's misclassifications). CONCLUSIONS Machine-generated regular expressions can be effectively used in clinical text classification. The regular expression-based classifier can be combined with other classifiers, like SVM, to improve classification performance.
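For readers unfamiliar with regular-expression text classifiers, the Python sketch below illustrates the general idea on smoking-status snippets. It is not the RED algorithm: RED discovers expressions automatically from labeled data, whereas the patterns, labels, and the `classify_smoking_status` function here are hand-written, hypothetical examples chosen only to show how ordered patterns map a snippet to a class.

```python
import re

# Hypothetical, hand-written patterns checked in order; the first match wins.
# (RED would learn such expressions from annotated snippets instead.)
SMOKING_PATTERNS = [
    ("non-smoker", re.compile(r"\b(never smoked|non-?smoker|denies (smoking|tobacco))\b", re.IGNORECASE)),
    ("former smoker", re.compile(r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b", re.IGNORECASE)),
    ("current smoker", re.compile(r"\b(current smoker|currently smokes|smokes)\b", re.IGNORECASE)),
]


def classify_smoking_status(snippet: str) -> str:
    """Return the label of the first matching pattern, or 'unknown' if none match."""
    for label, pattern in SMOKING_PATTERNS:
        if pattern.search(snippet):
            return label
    return "unknown"


for text in ["Denies tobacco use.", "Former smoker, quit in 2005.", "Smokes 1 pack per day."]:
    print(text, "->", classify_smoking_status(text))
```

Pattern order matters in a first-match design like this one; negated and historical mentions are checked before the broader current-use pattern so that they are not misread as active smoking.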
Affiliation(s)
- Duy Duc An Bui
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
  - VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
- Qing Zeng-Treitler
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
  - VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
42
Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform 2014; 52:28-35. [PMID: 24534443 DOI: 10.1016/j.jbi.2014.02.003] [Citation(s) in RCA: 174] [Impact Index Per Article: 17.4] [Received: 07/21/2013] [Revised: 12/21/2013] [Accepted: 02/04/2014] [Indexed: 01/04/2023]
Abstract
The last decade has seen an exponential growth in the quantity of clinical data collected nationwide, triggering an increase in opportunities to reuse the data for biomedical research. The Vanderbilt research data warehouse framework consists of identified and de-identified clinical data repositories, fee-for-service custom services, and tools built atop the data layer to assist researchers across the enterprise. Providing resources dedicated to research initiatives benefits not only the research community, but also clinicians, patients and institutional leadership. This work provides a summary of our approach to the secondary use of clinical data for research, including a description of key components and a list of lessons learned, designed to assist others assembling similar services and infrastructure.
43
Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc 2014; 20:e206-11. [PMID: 24302669 DOI: 10.1136/amiajnl-2013-002428] [Citation(s) in RCA: 165] [Impact Index Per Article: 16.5] [Indexed: 11/03/2022] Open
Affiliation(s)
- Jyotishman Pathak
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
|
44
|
Oetjens M, Bush WS, Birdwell KA, Dilks HH, Bowton EA, Denny JC, Wilke RA, Roden DM, Crawford DC. Utilization of an EMR-biorepository to identify the genetic predictors of calcineurin-inhibitor toxicity in heart transplant recipients. Pac Symp Biocomput 2014:253-64. [PMID: 24297552 PMCID: PMC3923429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Calcineurin inhibitors (CIs) are immunosuppressive agents prescribed to patients after solid organ transplant to prevent rejection. Although these drugs have been transformative for allograft survival, long-term use is complicated by side effects including nephrotoxicity. Given the narrow therapeutic index of CIs, therapeutic drug monitoring is used to prevent acute rejection from underdosing and acute toxicity from overdosing, but drug monitoring does not alleviate long-term side effects. Patients on calcineurin inhibitors for long periods almost universally experience declines in renal function, and a subpopulation of transplant recipients ultimately develop chronic kidney disease that may progress to end stage renal disease attributable to calcineurin inhibitor toxicity (CNIT). Pharmacogenomics has the potential to identify patients who are at high risk for developing advanced chronic kidney disease caused by CNIT and to provide them with existing alternate immunosuppressive therapy. In this study we utilized BioVU, Vanderbilt University Medical Center's DNA biorepository linked to de-identified electronic medical records, to identify a cohort of 115 heart transplant recipients prescribed calcineurin inhibitors and to search for genetic risk factors for CNIT. We identified 37 cases of nephrotoxicity in our cohort, defining nephrotoxicity as a monthly median estimated glomerular filtration rate (eGFR) < 30 mL/min/1.73 m2 at least six months post-transplant for at least three consecutive months. All heart transplant patients were genotyped on the Illumina ADME Core Panel, a pharmacogenomic genotyping platform that assays 184 variants across 34 genes. In Cox regression analysis adjusting for age at transplant, pre-transplant chronic kidney disease, pre-transplant diabetes, and the three most significant principal components (PCs), we did not identify any markers that met our multiple-testing threshold.
As a secondary analysis we also modeled post-transplant eGFR directly with linear mixed models adjusted for age at transplant, cyclosporine use, median BMI, and the three most significant principal components. While no SNPs met our threshold for significance, a SNP previously identified in genetic studies of the dosing of tacrolimus, CYP3A5 rs776746, replicated in an adjusted analysis at an uncorrected p-value of 0.02 (coeff(S.E.)=14.60(6.41)). While larger independent studies will be required to further validate this finding, this study underscores the EMR's usefulness as a resource for longitudinal pharmacogenetic study designs.
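The nephrotoxicity case definition in this abstract (monthly median eGFR below 30 mL/min/1.73 m2 for at least three consecutive months, starting no earlier than six months post-transplant) can be read as a small computable phenotype. The following is a minimal sketch of that rule, not the authors' implementation. The function name `is_nephrotoxicity_case`, the input format (integer months since transplant paired with an eGFR value), and the choice that a month with no measurement breaks a run are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def is_nephrotoxicity_case(measurements, threshold=30.0, min_month=6, run_length=3):
    """Hypothetical helper: apply the abstract's case rule to one patient.

    measurements: iterable of (months_since_transplant, egfr) pairs, where
    months_since_transplant is an integer month index (an assumption).
    """
    # Group eGFR values by month, ignoring anything before min_month.
    by_month = defaultdict(list)
    for month, egfr in measurements:
        if month >= min_month:
            by_month[month].append(egfr)

    run = 0
    previous_month = None
    for month in sorted(by_month):
        low = median(by_month[month]) < threshold
        adjacent = previous_month is not None and month == previous_month + 1
        if low:
            # Extend the run only if the prior month was also low and adjacent.
            # Assumption: a month without any measurement breaks the run.
            run = run + 1 if (adjacent and run > 0) else 1
            if run >= run_length:
                return True
        else:
            run = 0
        previous_month = month
    return False
```

Treating unmeasured months as run-breaking is the conservative reading; a real study would need to state how gaps in monitoring are handled, since EHR lab data are irregularly sampled.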
Affiliation(s)
- William S. Bush
- Department of Biomedical Informatics, Center for Human Genetics Research
- Holli H. Dilks
- Vanderbilt Technologies for Advanced Genomics Core Facility
- Dan M. Roden
- Department of Medicine, Department of Pharmacology, Office of Personalized Medicine
- Dana C. Crawford
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Ave, Nashville, TN 37212, United States of America
|
45
|
SLCO1B1 genetic variant associated with statin-induced myopathy: a proof-of-concept study using the clinical practice research datalink. Clin Pharmacol Ther 2013; 94:695-701. [PMID: 23942138 PMCID: PMC3831180 DOI: 10.1038/clpt.2013.161] [Citation(s) in RCA: 113] [Impact Index Per Article: 10.3] [Received: 05/14/2013] [Accepted: 08/02/2013] [Indexed: 01/14/2023]
Abstract
This study aimed to determine whether patients with statin-induced myopathy could be identified using the United Kingdom Clinical Practice Research Datalink, whether DNA could be obtained, and whether previously reported associations of statin myopathy with the SLCO1B1 c.521T>C and COQ2 rs4693075 polymorphisms could be replicated. Seventy-seven statin-induced myopathy patients (serum creatine phosphokinase (CPK) > 4× upper limit of normal (ULN)) and 372 statin-tolerant controls were identified and recruited. Multiple logistic regression analysis showed the SLCO1B1 c.521T>C single-nucleotide polymorphism to be a significant risk factor (P = 0.009), with an odds ratio (OR) per variant allele of 2.06 (1.32–3.15) for all myopathy and 4.09 (2.06–8.16) for severe myopathy (CPK > 10× ULN, and/or rhabdomyolysis; n = 23). COQ2 rs4693075 was not associated with myopathy. Meta-analysis showed an association between c.521T>C and simvastatin-induced myopathy, although power for other statins was limited. Our data replicate the association of SLCO1B1 variants with statin-induced myopathy. Furthermore, we demonstrate how electronic medical records provide a time- and cost-efficient means of recruiting patients with severe adverse drug reactions for pharmacogenetic studies.
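The CPK thresholds quoted in this abstract (myopathy at CPK > 4× ULN; severe myopathy at CPK > 10× ULN and/or rhabdomyolysis) amount to a simple laboratory-based case rule. A minimal sketch follows, assuming a hypothetical function `classify_statin_myopathy` and assuming that a recorded rhabdomyolysis diagnosis is available as a boolean flag; it illustrates the thresholds only and is not the study's actual case-finding procedure.

```python
def classify_statin_myopathy(cpk, uln, rhabdomyolysis=False):
    """Hypothetical helper: label one CPK result using the abstract's thresholds.

    cpk and uln must be in the same units; uln is the laboratory's
    upper limit of normal and must be positive.
    """
    if uln <= 0:
        raise ValueError("uln must be positive")
    ratio = cpk / uln
    # Severe: CPK > 10x ULN and/or rhabdomyolysis.
    if ratio > 10 or rhabdomyolysis:
        return "severe myopathy"
    # Any myopathy: CPK > 4x ULN.
    if ratio > 4:
        return "myopathy"
    return "below CPK threshold"
```

Expressing the result as a ratio to each laboratory's own ULN, rather than an absolute CPK cut-off, keeps the rule portable across sites with different reference ranges.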
|
46
|
Bartlett G, Antoun J, Zgheib NK. Theranostics in primary care: pharmacogenomics tests and beyond. Expert Rev Mol Diagn 2013; 12:841-55. [PMID: 23249202 DOI: 10.1586/erm.12.115] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/20/2023]
Abstract
Theranostics represents a broadening in the scope of personalized medicine to include companion diagnostics for health interventions ranging from drugs to vaccines, as well as individual susceptibility to disease. Surprisingly, in the course of this broadening of personalized medicine discourse, relatively little attention has been paid to primary care (as compared with tertiary healthcare settings) despite its vast patient population and being a crucial entry point to health services. Recent advances in pharmacogenomics (PGx), a classical theranostics application whereby genotyping and/or gene expression-based tests are used for targeted or optimal therapy, revealed new opportunities to characterize more precisely human genomic variation and the ways in which it contributes to person-to-person and population variations in drug response. In the immediate foreseeable future, the primary-care physicians are expected to play an ever increasing crucial role in PGx-based prescribing in order to reduce the rates of adverse drug events and improve drug efficacy, yet PGx testing in primary care remains limited. In this article, the authors review the advances in PGx applications, the barriers for their adoption in the clinic from a primary care point of view and the efforts that are being undertaken to move PGx forward in this hitherto neglected application context of theranostic medicine. Finally, the authors propose several salient recommendations, including a 5-year forecast, to accelerate the current convergence between PGx and primary care.
Affiliation(s)
- Gillian Bartlett
- Department of Family Medicine, Faculty of Medicine, McGill University, Montreal, QC, Canada
|
47
|
Wu Y, Lei J, Wei WQ, Tang B, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Zheng K, Xu H. Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages. Stud Health Technol Inform 2013; 192:662-6. [PMID: 23920639 PMCID: PMC4957806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Worldwide adoption of electronic medical record (EMR) databases in health care has generated an unprecedented amount of clinical data available electronically. There has been an increasing trend in US and western institutions towards collaborating with China on medical research using EMR data. However, few studies have investigated characteristics of EMR data in China and how they differ from the data in US hospitals. As an initial step towards differentiating EMR data in Chinese and US systems, this study attempts to understand system and cultural differences that may exist between Chinese and English clinical documents. We collected inpatient discharge summaries from one Chinese and three US institutions and manually analyzed three major clinical components in text: medical problems, tests, and treatments. We reported comparison results at the document level and section level and discussed potential reasons for observed differences. Documenting and understanding differences in clinical reports from US and Chinese EMRs is important for cross-country collaborations. Our study also provided valuable insights for developing natural language processing tools for Chinese clinical text.
Affiliation(s)
- Yonghui Wu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jianbo Lei
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA; Center for Medical Informatics, Peking University, Beijing, China
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Buzhou Tang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Joshua C. Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- S. Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Randolph A. Miller
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Dario A. Giuse
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Kai Zheng
- Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
48
|
Ronquillo JG. How the electronic health record will change the future of health care. Yale J Biol Med 2012; 85:379-86. [PMID: 23012585 PMCID: PMC3447201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Genetic testing is expected to play a critical role in patient care in the near future. Advances in genomic research have the potential to impact medicine in very tangible and direct ways, from carrier screening to disease diagnosis and prognosis to targeted treatments and personalized medicine. However, numerous barriers to widespread adoption of genetic testing continue to exist, and health information technology will be a critical means of addressing these challenges. Electronic health records (EHRs) are a digital replacement for the traditional paper-based patient chart designed to improve the quality of patient care. EHRs have become increasingly essential to managing the wealth of existing clinical information that now includes genetic information extracted from the patient genome. The EHR is capable of changing health care in the future by transforming the way physicians use genomic information in the practice of medicine.
|
49
|
Roden DM, Xu H, Denny JC, Wilke RA. Electronic medical records as a tool in clinical pharmacology: opportunities and challenges. Clin Pharmacol Ther 2012; 91:1083-86. [PMID: 22534870 DOI: 10.1038/clpt.2012.42] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Abstract
The development and increasing sophistication of electronic medical record (EMR) systems hold the promise of not only improving patient care but also providing unprecedented opportunities for discovery in the fields of basic, translational, and implementation sciences. Clinical pharmacology research in the EMR environment has only recently started to become a reality, with EMRs becoming increasingly populated, methods to mine drug response and other phenotypes becoming more sophisticated, and links being established with DNA repositories.
Affiliation(s)
- D M Roden
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA.
|
50
|
Samwald M, Coulet A, Huerga I, Powers RL, Luciano JS, Freimuth RR, Whipple F, Pichler E, Prud'hommeaux E, Dumontier M, Marshall MS. Semantically enabling pharmacogenomic data for the realization of personalized medicine. Pharmacogenomics 2012; 13:201-12. [PMID: 22256869 DOI: 10.2217/pgs.11.179] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023] Open
Abstract
Understanding how each individual's genetics and physiology influence pharmaceutical response is crucial to the realization of personalized medicine, and the discovery and validation of pharmacogenomic biomarkers is key to its success. However, integration of genotype and phenotype knowledge in medical information systems remains a critical challenge. The inability to easily and accurately integrate the results of biomolecular studies with patients' medical records and clinical reports prevents us from realizing the full potential of pharmacogenomic knowledge for both drug development and clinical practice. Herein, we describe approaches using Semantic Web technologies, in which pharmacogenomic knowledge relevant to drug development and medical decision support is represented in such a way that it can be efficiently accessed both by software and human experts. We suggest that this approach increases the utility of data, and that such computational technologies will become an essential part of personalized medicine, alongside diagnostics and pharmaceutical products.
Affiliation(s)
- Matthias Samwald
- Department of Medical Statistics & Bioinformatics, Leiden University Medical Center/Informatics Institute, University of Amsterdam, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
|