1
Zhang H, Jethani N, Jones S, Genes N, Major VJ, Jaffe IS, Cardillo AB, Heilenbach N, Ali NF, Bonanni LJ, Clayburn AJ, Khera Z, Sadler EC, Prasad J, Schlacter J, Liu K, Silva B, Montgomery S, Kim EJ, Lester J, Hill TM, Avoricani A, Chervonski E, Davydov J, Small W, Chakravartty E, Grover H, Dodson JA, Brody AA, Aphinyanaphongs Y, Masurkar A, Razavian N. Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores. medRxiv 2024:2023.07.10.23292373. [PMID: 38405784 PMCID: PMC10888985 DOI: 10.1101/2023.07.10.23292373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Importance Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests like MMSE and CDR.
Objective Evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates.
Methods Our data consisted of 135,307 clinical notes (Jan 12th, 2010 to May 24th, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2; 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation.
Results For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rates of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rates of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of wrong test reported instead of MMSE, and 19 cases of reporting a wrong date.
Conclusions In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and outperformed LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
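The accuracy, sensitivity, true-negative rate, and precision quoted in the abstract above are all ratios over a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code: the `Confusion` class, the `extraction_metrics` function, and the counts in `example` are hypothetical, and the abstract does not state how each extraction outcome was assigned to a cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of reviewed extraction outcomes (hypothetical container)."""
    tp: int  # score/date reported and judged correct
    fp: int  # score/date reported but judged wrong
    tn: int  # nothing reported and nothing present
    fn: int  # score/date present but missed


def extraction_metrics(c: Confusion) -> dict[str, float]:
    """Standard confusion-matrix ratios; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),          # also called recall
        "true_negative_rate": c.tn / (c.tn + c.fp),   # also called specificity
        "precision": c.tp / (c.tp + c.fp),
    }


# Made-up counts for illustration only; they are not taken from the study.
example = Confusion(tp=80, fp=20, tn=90, fn=10)
metrics = extraction_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A low precision alongside a high true-negative rate, as reported for CDR, is what these formulas give when true positives are rare relative to true negatives, so a small number of false positives weighs heavily on precision.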
Affiliation(s)
- Abraham A Brody
  - NYU Rory Meyers College of Nursing, NYU Grossman School of Medicine
2
Khan MS, Usman MS, Talha KM, Van Spall HGC, Greene SJ, Vaduganathan M, Khan SS, Mills NL, Ali ZA, Mentz RJ, Fonarow GC, Rao SV, Spertus JA, Roe MT, Anker SD, James SK, Butler J, McGuire DK. Leveraging electronic health records to streamline the conduct of cardiovascular clinical trials. Eur Heart J 2023; 44:1890-1909. [PMID: 37098746 DOI: 10.1093/eurheartj/ehad171] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/17/2022] [Revised: 02/05/2023] [Accepted: 03/07/2023] [Indexed: 04/27/2023]
Abstract
Conventional randomized controlled trials (RCTs) can be expensive, time intensive, and complex to conduct. Trial recruitment, participation, and data collection can burden participants and research personnel. In the past two decades, there have been rapid technological advances and an exponential growth in digitized healthcare data. Embedding RCTs, including cardiovascular outcome trials, into electronic health record systems or registries may streamline screening, consent, randomization, follow-up visits, and outcome adjudication. Moreover, wearable sensors (i.e. health and fitness trackers) provide an opportunity to collect data on cardiovascular health and risk factors in unprecedented detail and scale, while growing internet connectivity supports the collection of patient-reported outcomes. There is a pressing need to develop robust mechanisms that facilitate data capture from diverse databases and guidance to standardize data definitions. Importantly, the data collection infrastructure should be reusable to support multiple cardiovascular RCTs over time. Systems, processes, and policies will need to have sufficient flexibility to allow interoperability between different sources of data acquisition. Clinical research guidelines, ethics oversight, and regulatory requirements also need to evolve. This review highlights recent progress towards the use of routinely generated data to conduct RCTs and discusses potential solutions for ongoing barriers. There is a particular focus on methods to utilize routinely generated data for trials while complying with regional data protection laws. The discussion is supported with examples of cardiovascular outcome trials that have successfully leveraged the electronic health record, web-enabled devices or administrative databases to conduct randomized trials.
Affiliation(s)
- Muhammad Shahzeb Khan
  - Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA
- Muhammad Shariq Usman
  - Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA
- Khawaja M Talha
  - Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA
- Harriette G C Van Spall
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Population Health Research Institute, Hamilton, ON, Canada
- Stephen J Greene
  - Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA
  - Duke Clinical Research Institute, Durham, NC, USA
- Muthiah Vaduganathan
  - Cardiovascular Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Sadiya S Khan
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Nicholas L Mills
  - BHF Centre for Cardiovascular Science, University of Edinburgh, Chancellors Building, Royal Infirmary of Edinburgh, Edinburgh, Scotland, UK
  - Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
- Ziad A Ali
  - DeMatteis Cardiovascular Institute, St Francis Hospital and Heart Center, Roslyn, NY, USA
- Robert J Mentz
  - Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA
  - Duke Clinical Research Institute, Durham, NC, USA
- Gregg C Fonarow
  - Division of Cardiology, University of California Los Angeles, Los Angeles, CA, USA
- Sunil V Rao
  - Division of Cardiology, New York University Langone Health System, New York, NY, USA
- John A Spertus
  - Department of Cardiology, Saint Luke's Mid America Heart Institute, Kansas City, MO, USA
  - Kansas City's Healthcare Institute for Innovations in Quality, University of Missouri, Kansas, MO, USA
- Matthew T Roe
  - Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA
  - Duke Clinical Research Institute, Durham, NC, USA
- Stefan D Anker
  - Department of Cardiology (CVK), Berlin Institute of Health Center for Regenerative Therapies (BCRT), and German Centre for Cardiovascular Research (DZHK) Partner Site Berlin, Charité Universitätsmedizin, Berlin, Germany
- Stefan K James
  - Department of Medical Sciences, Scientific Director UCR, Uppsala University, Uppsala, Uppland, Sweden
- Javed Butler
  - Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA
  - Baylor Scott & White Research Institute, Dallas, TX, USA
- Darren K McGuire
  - Division of Cardiology, Department of Internal Medicine, UT Southwestern Medical Center and Parkland Health and Hospital System, Dallas, TX, USA
3
Pacheco JA, Rasmussen LV, Wiley K, Person TN, Cronkite DJ, Sohn S, Murphy S, Gundelach JH, Gainer V, Castro VM, Liu C, Mentch F, Lingren T, Sundaresan AS, Eickelberg G, Willis V, Furmanchuk A, Patel R, Carrell DS, Deng Y, Walton N, Satterfield BA, Kullo IJ, Dikilitas O, Smith JC, Peterson JF, Shang N, Kiryluk K, Ni Y, Li Y, Nadkarni GN, Rosenthal EA, Walunas TL, Williams MS, Karlson EW, Linder JE, Luo Y, Weng C, Wei W. Evaluation of the portability of computable phenotypes with natural language processing in the eMERGE network. Sci Rep 2023; 13:1971. [PMID: 36737471 PMCID: PMC9898520 DOI: 10.1038/s41598-023-27481-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023]
Abstract
The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned by: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or the same, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreement, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms is essential to support local customizations.
Affiliation(s)
- Ken Wiley
  - National Human Genome Research Institute, Bethesda, USA
- David J Cronkite
  - Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Cong Liu
  - Columbia University, New York, USA
- Frank Mentch
  - Children's Hospital of Philadelphia, Philadelphia, USA
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- David S Carrell
  - Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Yu Deng
  - Northwestern University, Evanston, USA
- Yizhao Ni
  - Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Yikuan Li
  - Northwestern University, Evanston, USA
- Yuan Luo
  - Northwestern University, Evanston, USA
- WeiQi Wei
  - Vanderbilt University Medical Center, Nashville, USA
4
Ghanzouri I, Amal S, Ho V, Safarnejad L, Cabot J, Brown-Johnson CG, Leeper N, Asch S, Shah NH, Ross EG. Performance and usability testing of an automated tool for detection of peripheral artery disease using electronic health records. Sci Rep 2022; 12:13364. [PMID: 35922657 PMCID: PMC9349186 DOI: 10.1038/s41598-022-17180-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/07/2022] [Accepted: 07/21/2022] [Indexed: 11/18/2022]
Abstract
Peripheral artery disease (PAD) is a common cardiovascular disorder that is frequently underdiagnosed, which can lead to poorer outcomes due to lower rates of medical optimization. We aimed to develop an automated tool to identify undiagnosed PAD and evaluate physician acceptance of a dashboard representation of risk assessment. Data were derived from electronic health records (EHR). We developed and compared traditional risk score models to novel machine learning models. For usability testing, primary and specialty care physicians were recruited and interviewed until thematic saturation. Data from 3168 patients with PAD and 16,863 controls were utilized. Results showed a deep learning model that utilized time engineered features outperformed random forest and traditional logistic regression models (average AUCs 0.96, 0.91 and 0.81, respectively), P < 0.0001. Of interviewed physicians, 75% were receptive to an EHR-based automated PAD model. Feedback emphasized workflow optimization, including integrating risk assessments directly into the EHR, using dashboard designs that minimize clicks, and providing risk assessments for clinically complex patients. In conclusion, we demonstrate that EHR-based machine learning models can accurately detect risk of PAD and that physicians are receptive to automated risk detection for PAD. Future research aims to prospectively validate model performance and impact on patient outcomes.
Affiliation(s)
- I Ghanzouri
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- S Amal
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- V Ho
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- L Safarnejad
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- J Cabot
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- C G Brown-Johnson
  - Department of Medicine, Primary Care and Population Health, Stanford, CA, USA
- N Leeper
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- S Asch
  - Department of Medicine, Primary Care and Population Health, Stanford, CA, USA
- N H Shah
  - Department of Medicine, Center for Biomedical Informatics Research, Stanford University School of Medicine, 780 Welch Road, CJ350, Stanford, CA, 94305, USA
- E G Ross
  - Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Medicine, Center for Biomedical Informatics Research, Stanford University School of Medicine, 780 Welch Road, CJ350, Stanford, CA, 94305, USA
5
van Zuydam NR, Stiby A, Abdalla M, Austin E, Dahlström EH, McLachlan S, Vlachopoulou E, Ahlqvist E, Di Liao C, Sandholm N, Forsblom C, Mahajan A, Robertson NR, Rayner NW, Lindholm E, Sinisalo J, Perola M, Kallio M, Weiss E, Price J, Paterson A, Klein B, Salomaa V, Palmer CN, Groop PH, Groop L, McCarthy MI, de Andrade M, Morris AP, Hopewell JC, Colhoun HM, Kullo IJ. Genome-Wide Association Study of Peripheral Artery Disease. Circ Genom Precis Med 2021; 14:e002862. [PMID: 34601942 PMCID: PMC8542067 DOI: 10.1161/circgen.119.002862] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/25/2019] [Accepted: 08/31/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND Peripheral artery disease (PAD) affects >200 million people worldwide and is associated with high mortality and morbidity. We sought to identify genomic variants associated with PAD overall and in the contexts of diabetes and smoking status.
METHODS We identified genetic variants associated with PAD and then meta-analyzed with published summary statistics from the Million Veterans Program and UK Biobank to replicate their findings. Next, we ran stratified genome-wide association analysis in ever smokers, never smokers, individuals with diabetes, and individuals with no history of diabetes and corresponding interaction analyses, to identify variants that modify the risk of PAD by diabetic or smoking status.
RESULTS We identified 5 genome-wide significant (P_association ≤ 5×10⁻⁸) associations with PAD in 449 548 (N_cases = 12 086) individuals of European ancestry near LPA (lipoprotein [a]), CDKN2BAS1 (CDKN2B antisense RNA 1), SH2B3 (SH2B adaptor protein 3) - PTPN11 (protein tyrosine phosphatase non-receptor type 11), HDAC9 (histone deacetylase 9), and CHRNA3 (cholinergic receptor nicotinic alpha 3 subunit) loci (which overlapped previously reported associations). Meta-analysis with variants previously associated with PAD showed that 18 of 19 published variants remained genome-wide significant. In individuals with diabetes, rs116405693 at the CCSER1 (coiled-coil serine rich protein 1) locus was associated with PAD (odds ratio [95% CI], 1.51 [1.32-1.74], P_diabetes = 2.5×10⁻⁹, P_interaction_with_diabetes = 5.3×10⁻⁷). Furthermore, in smokers, rs12910984 at the CHRNA3 locus was associated with PAD (odds ratio [95% CI], 1.15 [1.11-1.19], P_smokers = 9.3×10⁻¹⁰, P_interaction_with_smoking = 3.9×10⁻⁵).
CONCLUSIONS Our analyses confirm the published genetic associations with PAD and identify novel variants that may influence susceptibility to PAD in the context of diabetes or smoking status.
Affiliation(s)
- Natalie R. van Zuydam
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Sweden (N.R.v.Z.)
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine (N.R.v.Z., A.M., N.R.R., N.W.R., M.I.M.), University of Oxford, United Kingdom
- Alexander Stiby
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health (A.S., J.C.H.), University of Oxford, United Kingdom
- Moustafa Abdalla
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
- Erin Austin
  - Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic, Rochester, MN (E. Austin, M.d.A., I.J.K.)
- Emma H. Dahlström
  - Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland (E.H.D., N.S., C.F., P.-H.G.)
  - Abdominal Center, Nephrology (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
  - Helsinki University Hospital, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
- Stela McLachlan
  - Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, United Kingdom (S.M., E.W., J.P.)
- Efthymia Vlachopoulou
  - Department of Medicine, Helsinki University Central Hospital (E.V.), University of Helsinki, Finland
- Emma Ahlqvist
  - Genomics, Diabetes and Endocrinology, Lund University Diabetes Centre, Malmö, Sweden (E. Ahlqvist, E.L., L.G.)
- Chen Di Liao
  - Dalla Lana School of Public Health, University of Toronto, ON, Canada (C.D.L., A.P.)
  - Genetics & Genome Biology, SickKids, Toronto, ON, Canada (C.D.L., A.P.)
- Niina Sandholm
  - Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland (E.H.D., N.S., C.F., P.-H.G.)
  - Abdominal Center, Nephrology (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
  - Helsinki University Hospital, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
- Carol Forsblom
  - Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland (E.H.D., N.S., C.F., P.-H.G.)
  - Abdominal Center, Nephrology (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
  - Helsinki University Hospital, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
- Anubha Mahajan
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine (N.R.v.Z., A.M., N.R.R., N.W.R., M.I.M.), University of Oxford, United Kingdom
  - Now with Genentech, South San Francisco, CA (A.M., M.I.M.)
- Neil R. Robertson
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine (N.R.v.Z., A.M., N.R.R., N.W.R., M.I.M.), University of Oxford, United Kingdom
- N. William Rayner
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine (N.R.v.Z., A.M., N.R.R., N.W.R., M.I.M.), University of Oxford, United Kingdom
  - Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom (N.W.R.)
- Eero Lindholm
  - Genomics, Diabetes and Endocrinology, Lund University Diabetes Centre, Malmö, Sweden (E. Ahlqvist, E.L., L.G.)
- Juha Sinisalo
  - Heart and Lung Center (J.S.), University of Helsinki, Finland
- Markus Perola
  - Institute for Molecular Medicine Finland (FIMM) (M.P., L.G.), University of Helsinki, Finland
  - Finnish Institute for Health and Welfare, Helsinki, Finland (M.P., V.S.)
- Milla Kallio
  - Vascular Surgery, Abdominal Center (M.K.), University of Helsinki, Finland
- Emily Weiss
  - Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, United Kingdom (S.M., E.W., J.P.)
- Jackie Price
  - Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, United Kingdom (S.M., E.W., J.P.)
- Andrew Paterson
  - Dalla Lana School of Public Health, University of Toronto, ON, Canada (C.D.L., A.P.)
  - Genetics & Genome Biology, SickKids, Toronto, ON, Canada (C.D.L., A.P.)
- Barbara Klein
  - Ocular Epidemiology Research Group, University of Wisconsin-Madison (B.K.)
- Veikko Salomaa
  - Finnish Institute for Health and Welfare, Helsinki, Finland (M.P., V.S.)
- Colin N.A. Palmer
  - Pat Macpherson Centre for Pharmacogenetics and Pharmacogenomics, Ninewells Hospital and Medical School, University of Dundee, United Kingdom (C.N.A.P.)
- Per-Henrik Groop
  - Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland (E.H.D., N.S., C.F., P.-H.G.)
  - Abdominal Center, Nephrology (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
  - Helsinki University Hospital, Research Program for Clinical and Molecular Metabolism, Faculty of Medicine (E.H.D., N.S., C.F., P.-H.G.), University of Helsinki, Finland
  - Department of Medicine, Central Clinical School, Monash University, Melbourne, Victoria, Australia (P.-H.G.)
- Leif Groop
  - Institute for Molecular Medicine Finland (FIMM) (M.P., L.G.), University of Helsinki, Finland
  - Genomics, Diabetes and Endocrinology, Lund University Diabetes Centre, Malmö, Sweden (E. Ahlqvist, E.L., L.G.)
- Mark I. McCarthy
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine (N.R.v.Z., A.M., N.R.R., N.W.R., M.I.M.), University of Oxford, United Kingdom
  - Oxford NIHR Biomedical Research Centre, Oxford University Hospitals Trust, United Kingdom (M.I.M.)
  - Now with Genentech, South San Francisco, CA (A.M., M.I.M.)
- Mariza de Andrade
  - Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic, Rochester, MN (E. Austin, M.d.A., I.J.K.)
- Andrew P. Morris
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine (N.R.v.Z., M.A., A.M., N.R.R., N.W.R., M.I.M., A.P.M.), University of Oxford, United Kingdom
  - Department of Biostatistics, University of Liverpool, United Kingdom (A.P.M.)
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, United Kingdom (A.P.M.)
- Jemma C. Hopewell
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health (A.S., J.C.H.), University of Oxford, United Kingdom
- Helen M. Colhoun
  - Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital Campus, United Kingdom (H.M.C.)
- Iftikhar J. Kullo
  - Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic, Rochester, MN (E. Austin, M.d.A., I.J.K.)
6
Saadatagah S, Pasha AK, Alhalabi L, Sandhyavenu H, Farwati M, Smith CY, Wood‐Wentz CM, Bailey KR, Kullo IJ. Coronary Heart Disease Risk Associated with Primary Isolated Hypertriglyceridemia; a Population-Based Study. J Am Heart Assoc 2021; 10:e019343. [PMID: 34032140 PMCID: PMC8483538 DOI: 10.1161/jaha.120.019343] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/10/2020] [Accepted: 03/23/2021] [Indexed: 12/18/2022]
Abstract
Background Hypertriglyceridemia is associated with increased risk of coronary heart disease but the association is often attributed to concomitant metabolic abnormalities. We investigated the epidemiology of primary isolated hypertriglyceridemia (PIH) and associated cardiovascular risk in a population-based setting.
Methods and Results We identified adults with at least one triglyceride level ≥500 mg/dL between 1998 and 2015 in Olmsted County, Minnesota. We also identified age- and sex-matched controls with triglyceride levels <150 mg/dL. There were 3329 individuals with elevated triglyceride levels; after excluding those with concomitant hypercholesterolemia, a secondary cause of high triglycerides, age <18 years or an incomplete record, 517 patients (49.4±14.0 years, 72.0% men) had PIH (triglyceride 627.6±183.6 mg/dL). The age- and sex-adjusted prevalence of PIH in adults was 0.80% (0.72-0.87); the diagnosis was recorded in 60%, 46% were on a lipid-lowering medication for primary prevention and a triglyceride level <150 mg/dL was achieved in 24.1%. The association of PIH with coronary heart disease was attenuated but remained significant after adjustment for demographic, socioeconomic, and conventional cardiovascular risk factors (hazard ratio [HR], 1.53; 95% CI, 1.06-2.20; P=0.022). There was no statistically significant association between PIH and cerebrovascular disease (HR, 1.06; 95% CI, 0.65-1.73; P=0.813), peripheral artery disease (HR, 1.27; 95% CI, 0.43-3.75; P=0.668), or the composite end point of all 3 (HR, 1.28; 95% CI, 0.92-1.80; P=0.148) in adjusted models.
Conclusions PIH was associated with incident coronary heart disease events (although there was attenuation after adjustment for conventional risk factors), supporting a causal role for triglycerides in coronary heart disease. The condition is relatively prevalent but awareness and control are low.
Affiliation(s)
- Ahmed K. Pasha
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Lubna Alhalabi
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Medhat Farwati
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Carin Y. Smith
  - Division of Clinical Trials and Biostatistics, Mayo Clinic, Rochester, MN
- Kent R. Bailey
  - Division of Clinical Trials and Biostatistics, Mayo Clinic, Rochester, MN
- Iftikhar J. Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
  - Gonda Vascular Center, Mayo Clinic, Rochester, MN
7
Abstract
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has been already applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which if realized can revolutionize patient outcomes.
8
Abstract
PURPOSE OF REVIEW Healthcare has already been impacted by the fourth industrial revolution, exemplified by tip-of-the-spear technologies such as artificial intelligence and quantum computing. Yet there is much to be accomplished, as systems remain suboptimal and full interoperability of digital records is not realized. Given the footprint of technology in healthcare, the field of clinical immunology will certainly see improvements related to these tools.
RECENT FINDINGS Biomedical informatics spans the gamut of technology in biomedicine. Within this distinct field, advances are being made that allow for engineering of systems to automate disease detection, create computable phenotypes and improve record portability. Within clinical immunology, technologies are emerging along these lines and are expected to continue.
SUMMARY This review highlights advancements in digital health including learning health systems, electronic phenotyping, artificial intelligence and use of registries. Technological advancements for improving diagnosis and care of patients with primary immunodeficiency diseases are also highlighted.
|
9
|
Palmer MR, Kim DS, Crosslin DR, Stanaway IB, Rosenthal EA, Carrell DS, Cronkite DJ, Gordon A, Du X, Li YK, Williams MS, Weng C, Feng Q, Li R, Pendergrass SA, Hakonarson H, Fasel D, Sohn S, Sleiman P, Handelman SK, Speliotes E, Kullo IJ, Larson EB, Jarvik GP. Loci identified by a genome-wide association study of carotid artery stenosis in the eMERGE network. Genet Epidemiol 2020; 45:4-15. [PMID: 32964493 PMCID: PMC7891640 DOI: 10.1002/gepi.22360] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/27/2020] [Revised: 07/29/2020] [Accepted: 08/11/2020] [Indexed: 12/21/2022]
Abstract
Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UKB, we replicated a previously known association and identified a novel locus associated with CAAD.
Affiliation(s)
- Melody R Palmer
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington, USA
| | - Daniel S Kim
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
| | - David R Crosslin
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington, USA
| | - Ian B Stanaway
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, Washington, USA
| | - Elisabeth A Rosenthal
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington, USA
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
| | - David J Cronkite
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
| | - Adam Gordon
- Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA
| | - Xiaomeng Du
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
| | - Yatong K Li
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
| | - Marc S Williams
- Genomic Medicine Institute, Geisinger, Danville, Pennsylvania, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Qiping Feng
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Rongling Li
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, Maryland, USA
| | | | - Hakon Hakonarson
- Department of Pediatrics, The Center for Applied Genomics, Children's Hospital of Philadelphia, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - David Fasel
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | | | - Patrick Sleiman
- Department of Pediatrics, The Children's Hospital of Philadelphia, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Samuel K Handelman
- Division of Gastroenterology, Department of Internal Medicine and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Elizabeth Speliotes
- Division of Gastroenterology, Department of Internal Medicine and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Eric B Larson
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
| | -
- The electronic Medical Records and GEnomics Network, NHGRI, NIH, Bethesda, Maryland, USA
| | - Gail P Jarvik
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, Washington, USA
| |
|
10
|
Weissler EH, Lippmann SJ, Smerek MM, Ward RA, Kansal A, Brock A, Sullivan RC, Long C, Patel MR, Greiner MA, Hardy NC, Curtis LH, Jones WS. Model-Based Algorithms for Detecting Peripheral Artery Disease Using Administrative Data From an Electronic Health Record Data System: Algorithm Development Study. JMIR Med Inform 2020; 8:e18542. [PMID: 32663152 PMCID: PMC7468640 DOI: 10.2196/18542] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/03/2020] [Revised: 06/21/2020] [Accepted: 06/28/2020] [Indexed: 12/18/2022]
Abstract
Background Peripheral artery disease (PAD) affects 8 to 10 million Americans, who face significantly elevated risks of both mortality and major limb events such as amputation. Unfortunately, PAD is relatively underdiagnosed, undertreated, and underresearched, leading to wide variations in treatment patterns and outcomes. Efforts to improve PAD care and outcomes have been hampered by persistent difficulties identifying patients with PAD for clinical and investigatory purposes. Objective The aim of this study is to develop and validate a model-based algorithm to detect patients with peripheral artery disease (PAD) using data from an electronic health record (EHR) system. Methods An initial query of the EHR in a large health system identified all patients with PAD-related diagnosis codes for any encounter during the study period. Clinical adjudication of PAD diagnosis was performed by chart review on a random subgroup. A binary logistic regression to predict PAD was built and validated using a least absolute shrinkage and selection operator (LASSO) approach in the adjudicated patients. The algorithm was then applied to the nonsampled records to further evaluate its performance. Results The initial EHR data query using 406 diagnostic codes yielded 15,406 patients. Overall, 2500 patients were randomly selected for ground truth PAD status adjudication. In the end, 108 code flags remained after removing rarely- and never-used codes. We entered these code flags plus administrative encounter, imaging, procedure, and specialist flags into a LASSO model. The area under the curve for this model was 0.862. Conclusions The algorithm we constructed has two main advantages over other approaches to the identification of patients with PAD. First, it was derived from a broad population of patients with many different PAD manifestations and treatment pathways across a large health system. 
Second, our model does not rely on clinical notes and can be applied in situations in which only administrative billing data (eg, large administrative data sets) are available. A combination of diagnosis codes and administrative flags can accurately identify patients with PAD in large cohorts.
Affiliation(s)
- Elizabeth Hope Weissler
- Division of Vascular and Endovascular Surgery, Duke University School of Medicine, Durham, NC, United States
| | - Steven J Lippmann
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States
| | - Michelle M Smerek
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States
| | - Rachael A Ward
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States
| | - Aman Kansal
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States
| | - Adam Brock
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States
| | - Robert C Sullivan
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States
| | - Chandler Long
- Division of Vascular and Endovascular Surgery, Duke University School of Medicine, Durham, NC, United States
| | - Manesh R Patel
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States; Department of Medicine, Duke University School of Medicine, Durham, NC, United States; Duke Clinical Research Institute, Durham, NC, United States
| | - Melissa A Greiner
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States
| | - N Chantelle Hardy
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States
| | - Lesley H Curtis
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States; Duke Clinical Research Institute, Durham, NC, United States
| | - W Schuyler Jones
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, United States; Department of Medicine, Duke University School of Medicine, Durham, NC, United States; Duke Clinical Research Institute, Durham, NC, United States
| |
|
11
|
Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform 2019; 99:103293. [PMID: 31542521 DOI: 10.1016/j.jbi.2019.103293] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/14/2019] [Revised: 08/26/2019] [Accepted: 09/19/2019] [Indexed: 11/21/2022]
Abstract
BACKGROUND Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms. METHODS We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks. 
CONCLUSION This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize the portability in regards to knowledge, interpretation and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
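As a rough illustration of the KIP arithmetic described in this abstract, the sketch below scores a clause on the three aspects (0 to 2 each) and sums them into a total between 0 and 6. The class name `ClauseScore`, the function `mean_kip`, and the example clauses are hypothetical and are not taken from the eMERGE study; only the score ranges and the reported averages come from the abstract.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClauseScore:
    """Hypothetical container for one clause's K, I and P scores (0, 1 or 2 each)."""

    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self) -> None:
        # Each aspect is scored 0 (easy), 1 (moderate) or 2 (difficult).
        for name in ("knowledge", "interpretation", "programming"):
            value = getattr(self, name)
            if value not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1 or 2, got {value!r}")

    @property
    def kip(self) -> int:
        # Total KIP score for the clause, in the range 0 to 6.
        return self.knowledge + self.interpretation + self.programming


def mean_kip(clauses):
    """Average clause-wise KIP score over an iterable of ClauseScore objects."""
    return mean(clause.kip for clause in clauses)


# Made-up clauses, for illustration only.
example_clauses = [ClauseScore(1, 0, 2), ClauseScore(0, 0, 0), ClauseScore(2, 1, 0)]
print(mean_kip(example_clauses))

# Consistency check on the abstract's figures: the component means
# (K 0.64, I 0.33, P 0.40) sum to the reported overall mean of 1.37.
reported_total = round(0.64 + 0.33 + 0.40, 2)
print(reported_total)
```

Because the total is a plain sum, the mean of the totals equals the sum of the component means, which is why the three reported averages add up to 1.37.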
|
12
|
Gon Y, Yamamoto K, Mochizuki H. The Accuracy of Diagnostic Codes in Electronic Medical Records in Japan. J Med Syst 2019; 43:315. [PMID: 31494721 DOI: 10.1007/s10916-019-1450-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/18/2019] [Accepted: 09/03/2019] [Indexed: 11/24/2022]
Affiliation(s)
- Yasufumi Gon
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan.
| | - Keiichi Yamamoto
- Department of Medical Informatics, Wakayama Medical University, 811-1, Kimiidera, Wakayama, 641-8509, Japan
| | - Hideki Mochizuki
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
| |
|
13
|
Taylor CO, Lemke KW, Richards TM, Roe KD, He T, Arruda-Olson A, Carrell D, Denny JC, Hripcsak G, Kiryluk K, Kullo I, Larson EB, Peissig P, Walton NA, Wei-Qi W, Ye Z, Chute CG, Weiner JP. Comorbidity Characterization Among eMERGE Institutions: A Pilot Evaluation with the Johns Hopkins Adjusted Clinical Groups® System. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2019; 2019:145-152. [PMID: 31258966 PMCID: PMC6568092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Electronic health records (EHR) are valuable to define phenotype selection algorithms used to identify cohorts of patients for sequencing or genome wide association studies (GWAS). To date, the electronic medical records and genomics (eMERGE) network institutions have developed and applied such algorithms to identify cohorts with associated DNA samples used to discover new genetic associations. For complex diseases, there are benefits to stratifying cohorts using comorbidities in order to identify their genetic determinants. The objective of this study was to: (a) characterize comorbidities in a range of phenotype-selected cohorts using the Johns Hopkins Adjusted Clinical Groups® (ACG®) System, (b) assess the frequency of important comorbidities in three commonly studied GWAS phenotypes, and (c) compare the comorbidity characterization of cases and controls. Our analysis demonstrates a framework to characterize comorbidities using the ACG system and identified differences in mean chronic condition count among GWAS cases and controls. Thus, we believe there is great potential to use the ACG system to characterize comorbidities among genetic cohorts selected based on EHR phenotypes.
Affiliation(s)
- Casey Overby Taylor
- Johns Hopkins University School of Medicine
- Johns Hopkins University School of Public Health
| | | | | | | | - Ting He
- Johns Hopkins University School of Medicine
| | | | - David Carrell
- Kaiser Permanente Washington Health Research Institute
| | | | | | | | | | - Eric B Larson
- Kaiser Permanente Washington Health Research Institute
| | | | | | | | | | - Christopher G Chute
- Johns Hopkins University School of Medicine
- Johns Hopkins University School of Public Health
| | | |
|
14
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). 
This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
| | - Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
| | - Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
| | - Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
| |
|
15
|
Chen HH, Petty LE, Bush W, Naj AC, Below JE. GWAS and Beyond: Using Omics Approaches to Interpret SNP Associations. CURRENT GENETIC MEDICINE REPORTS 2019; 7:30-40. [PMID: 33312764 PMCID: PMC7731888 DOI: 10.1007/s40142-019-0159-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
PURPOSE OF REVIEW Neurodegenerative diseases, neuropsychiatric disorders, and related traits have highly complex etiologies but are also highly heritable and identifying the causal genes and biological pathways underlying these traits may advance the development of treatments and preventive strategies. While many genome-wide association studies (GWAS) have successfully identified variants contributing to polygenic neurodegenerative and neuropsychiatric phenotypes including Alzheimer's disease (AD), schizophrenia (SCZ), and bipolar disorder (BPD) amongst others, interpreting the biological roles of significantly-associated variants in the genetic architecture of these traits remains a significant challenge. Here we review several 'omics' approaches which attempt to bridge the gap from associated genetic variants to phenotype by helping define the functional roles of GWAS loci in the development of neuropsychiatric disorders and traits. RECENT FINDINGS Several common 'omics' approaches have been applied to examine neuropsychiatric traits, such as nearest-gene mapping, trans-ethnic fine mapping, annotation enrichment analysis, transcriptomic analysis, and pathway analysis, and each of these approaches has strengths and limitations in providing insight into biological mechanisms. One popular emerging method is the examination of tissue-specific genetically-regulated gene expression (GReX), which aggregates the genetic variants' effects at the gene-level. Furthermore, proteomic, metabolomic, and microbiomic studies and phenome-wide association studies will further enhance our understanding of neuropsychiatric traits. SUMMARY GWAS has been applied to neuropsychiatric traits for a decade, but our understanding about the biological function of identified variants remains limited. 
Today, technological advancements have created analytical approaches for integrating transcriptomics, metabolomics, proteomics, pharmacology and toxicology as tools for understanding the functional roles of genetic variants. These data, as well as the broader clinical information provided by electronic health records, can provide additional insight and complement genomic analyses.
Affiliation(s)
- Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Lauren E. Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - William Bush
- Institute for Computational Biology, Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Adam C. Naj
- Department of Biostatistics, Epidemiology, and Informatics; Department of Pathology and Laboratory Medicine; Center for Clinical Epidemiology and Biostatistics; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jennifer E. Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
16
|
Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:139-153. [PMID: 29994486 PMCID: PMC6388621 DOI: 10.1109/tcbb.2018.2849968] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Indexed: 06/08/2023]
Abstract
This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.
Affiliation(s)
- Zexian Zeng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
| | - Yu Deng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
| | - Xiaoyu Li
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
| | - Tristan Naumann
- Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139.
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
| |
|
17
|
Moussa Pacha H, Mallipeddi VP, Afzal N, Moon S, Kaggal VC, Kalra M, Oderich GS, Wennberg PW, Rooke TW, Scott CG, Kullo IJ, McBane RD, Nishimura RA, Chaudhry R, Liu H, Arruda-Olson AM. Association of Ankle-Brachial Indices With Limb Revascularization or Amputation in Patients With Peripheral Artery Disease. JAMA Netw Open 2018; 1:e185547. [PMID: 30646276 PMCID: PMC6324363 DOI: 10.1001/jamanetworkopen.2018.5547] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022]
Abstract
IMPORTANCE The prevalence and morbidity of peripheral artery disease (PAD) are high, with limb outcomes including revascularization and amputation. In community-dwelling patients with PAD, the role of noninvasive evaluation for risk assessment and rates of limb outcomes have not been established to date. OBJECTIVE To evaluate whether ankle-brachial indices are associated with limb outcomes in community-dwelling patients with PAD. DESIGN, SETTING, AND PARTICIPANTS A population-based, observational, test-based cohort study of patients was performed from January 1, 1998, to December 31, 2014. Data analysis was conducted from July 15 to December 15, 2017. Participants included a community-based cohort of 1413 patients with PAD from Olmsted County, Minnesota, identified by validated algorithms deployed to electronic health records. Automated algorithms identified limb outcomes used to build Cox proportional hazards regression models. Ankle-brachial indices and presence of poorly compressible arteries were electronically identified from digital data sets. Guideline-recommended management strategies within 6 months of diagnosis were also electronically retrieved, including therapy with statins, antiplatelet agents, angiotensin-converting enzyme inhibitors or angiotensin-receptor blockers, and smoking abstention. MAIN OUTCOMES AND MEASURES Ankle-brachial index (index ≤0.9 indicates PAD; <0.5, severe PAD; and ≥1.40, poorly compressible arteries) and limb revascularization or amputation. RESULTS Of 1413 patients, 633 (44.8%) were women; mean (SD) age was 70.8 (13.3) years. A total of 283 patients (20.0%) had severe PAD (ankle-brachial indices <0.5) and 350 (24.8%) had poorly compressible arteries (ankle-brachial indices ≥1.4); 780 (55.2%) individuals with less than severe disease formed the reference group. 
Only 32 of 283 patients (11.3%) with severe disease and 68 of 350 patients (19.4%) with poorly compressible arteries were receiving 4 guideline-recommended management strategies. In the severe disease subgroup, the 1-year event rate for revascularization was 32.4% (90 events); in individuals with poorly compressible arteries, the 1-year amputation rate was 13.9% (47 events). In models adjusted for age, sex, and critical limb ischemia, poorly compressible arteries were associated with amputation (hazard ratio [HR], 3.12; 95% CI, 2.16-4.50; P < .001) but not revascularization (HR, 0.91; 95% CI, 0.69-1.20; P = .49). In contrast, severe disease was associated with revascularization (HR, 2.69; 95% CI, 2.15-3.37; P < .001) but not amputation (HR, 1.30; 95% CI, 0.82-2.07; P = .27). CONCLUSIONS AND RELEVANCE Community-dwelling patients with severe PAD or poorly compressible arteries have high rates of revascularization or limb loss, respectively. Guideline-recommended management strategies for secondary risk prevention are underused in the community.
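The ankle-brachial index cut-offs quoted in this abstract can be written as a small lookup, shown here only as a sketch. The function name `classify_abi` and the label strings are hypothetical, the severe-disease cut-off of 0.5 is the one given in the Results, and the abstract does not say how values strictly between 0.9 and 1.4 were handled, so the final branch is an assumption rather than the authors' rule.

```python
def classify_abi(abi: float) -> str:
    """Map an ankle-brachial index to the categories named in the abstract.

    Hypothetical helper: thresholds come from the abstract, labels do not.
    """
    if abi < 0:
        raise ValueError("ankle-brachial index cannot be negative")
    if abi < 0.5:
        return "severe PAD"
    if abi <= 0.9:
        return "PAD, less than severe"
    if abi >= 1.4:
        return "poorly compressible arteries"
    # Assumption: values in (0.9, 1.4) are not covered by the stated cut-offs.
    return "not covered by the stated thresholds"


for value in (0.42, 0.75, 1.10, 1.45):
    print(value, classify_abi(value))
```

The checks run from the lowest threshold upward so that a value below 0.5 is labelled severe before the broader ≤0.9 rule can match it.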
Affiliation(s)
- Homam Moussa Pacha
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Vishnu P. Mallipeddi
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Naveed Afzal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Sungrim Moon
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Vinod C. Kaggal
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Manju Kalra
- Division of Vascular Surgery, Department of Surgery, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Gustavo S. Oderich
- Division of Vascular Surgery, Department of Surgery, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Paul W. Wennberg
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Thom W. Rooke
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Christopher G. Scott
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Iftikhar J. Kullo
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Robert D. McBane
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Rick A. Nishimura
- Department of Cardiovascular Medicine, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Rajeev Chaudhry
- Division of Primary Care Medicine and Center of Translational Informatics and Knowledge Management, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
| | - Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, Minnesota
|
18
|
Missing Data, Data Cleansing, and Treatment From a Primary Study: Implications for Predictive Models. Comput Inform Nurs 2018; 36:367-371. [PMID: 30095571 DOI: 10.1097/cin.0000000000000473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
19
|
A Network-Biology Informed Computational Drug Repositioning Strategy to Target Disease Risk Trajectories and Comorbidities of Peripheral Artery Disease. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2018; 2017:108-117. [PMID: 29888052 PMCID: PMC5961807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Currently, drug discovery approaches focus on the design of therapies that alleviate an index symptom by reengineering the underlying biological mechanism in agonistic or antagonistic fashion. For example, medicines are routinely developed to target an essential gene that drives the disease mechanism. Therapeutic overloading where patients get multiple medications to reduce the primary and secondary side effect burden is standard practice. This single-symptom based approach may not be scalable, as we understand that diseases are more connected than random and molecular interactions drive disease comorbidities. In this work, we present a proof-of-concept drug discovery strategy by combining network biology, disease comorbidity estimates, and computational drug repositioning, by targeting the risk factors and comorbidities of peripheral artery disease, a vascular disease associated with high morbidity and mortality. Individualized risk estimation and recommending disease sequelae based therapies may help to lower the mortality and morbidity of peripheral artery disease.
|
20
|
Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semantics 2018; 9:12. [PMID: 29602312 PMCID: PMC5877394 DOI: 10.1186/s13326-018-0179-8] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2017] [Accepted: 02/14/2018] [Indexed: 01/22/2023] Open
Abstract
Background Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Affiliation(s)
- Aurélie Névéol
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
| | | | - Sumithra Velupillai
- School of Computer Science and Communication, KTH, Stockholm, Sweden.,Institute of Psychiatry, Psychology and Neuroscience, King's College, London, UK
| | - Guergana Savova
- Children's Hospital Boston and Harvard Medical School, Boston, Massachusetts, USA
| | - Pierre Zweigenbaum
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
|
21
|
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 340] [Impact Index Per Article: 56.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recent published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications, however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Affiliation(s)
- Yanshan Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Liwei Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sungrim Moon
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Feichen Shen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Naveed Afzal
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sijia Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Yuqun Zeng
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Saeed Mehrabi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.
|
22
|
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics. CONCLUSION Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
| | - James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States.,Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
| | - James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States.,Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
|
23
|
Gon Y, Kabata D, Yamamoto K, Shintani A, Todo K, Mochizuki H, Sakaguchi M. Validation of an algorithm that determines stroke diagnostic code accuracy in a Japanese hospital-based cancer registry using electronic medical records. BMC Med Inform Decis Mak 2017; 17:157. [PMID: 29202795 PMCID: PMC5715513 DOI: 10.1186/s12911-017-0554-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 11/19/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND This study aimed to validate an algorithm that determines stroke diagnostic code accuracy, in a hospital-based cancer registry, using electronic medical records (EMRs) in Japan. METHODS The subjects were 27,932 patients enrolled in the hospital-based cancer registry of Osaka University Hospital, between January 1, 2007 and December 31, 2015. The ICD-10 (international classification of diseases, 10th revision) diagnostic codes for stroke were extracted from the EMR database. Specifically, the codes were subarachnoid hemorrhage (I60); intracerebral hemorrhage (I61); cerebral infarction (I63); and other transient cerebral ischemic attacks and related syndromes and transient cerebral ischemic attack (unspecified) (G458 and G459, respectively). Diagnostic codes, both "definite" and "suspected," and brain imaging information were extracted from the database. We set the algorithm with the combination of the diagnostic code and/or the brain imaging information, and manually reviewed the presence or absence of the acute cerebrovascular disease with medical charts. RESULTS A total of 2654 diagnostic codes, 1991 "definite" and 663 "suspected," were identified. After excluding duplicates, the numbers of "definite" and "suspected" diagnostic codes were 912 and 228, respectively. The proportion of the presence of the disease in the "definite" diagnostic code was 22%; this rose to 51% with the combination of the diagnostic code and the use of brain imaging information. When adding the interval of when brain imaging was performed (within 30 days and within 1 day) to the diagnostic code, the proportion increased to 84% and 90%, respectively. In the algorithm of "definite" diagnostic code, history of stroke was the most common reason for an inaccurate diagnostic code, but in the algorithm of "definite" diagnostic code and the use of brain imaging within 1 day, stroke mimics were the most frequent.
CONCLUSIONS Combining the diagnostic code and clinical examination improved the proportion of the presence of disease in the diagnostic code and achieved appropriate accuracy for research. Clinical research using EMRs requires outcome validation prior to conducting a study.
Affiliation(s)
- Yasufumi Gon
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan.
| | - Daijiro Kabata
- Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan
| | - Keichi Yamamoto
- Department of Drug and Food Clinical Evaluation, Osaka City University Graduate School of Medicine, Osaka, Japan
| | - Ayumi Shintani
- Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan
| | - Kenichi Todo
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
| | - Hideki Mochizuki
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
| | - Manabu Sakaguchi
- Department of Neurology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
|
24
|
Esteban S, Rodríguez Tablado M, Peper FE, Mahumud YS, Ricci RI, Kopitowski KS, Terrasa SA. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017; 152:53-70. [PMID: 29054261 DOI: 10.1016/j.cmpb.2017.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Revised: 08/19/2017] [Accepted: 09/13/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Recent progression towards precision medicine has encouraged the use of electronic health records (EHRs) as a source for large amounts of data, which is required for studying the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms allow patients to be classified automatically according to their particular electronic phenotype, thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (only using standardized problems, rule-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)), for the categorization of patients according to their diabetic status (diabetics, not diabetics and inconclusive; Diabetes of any type) using information extracted from EHRs. METHODS Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 & <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set. RESULTS The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the Feedforward Neural Network (0.9, 95% CI 0.85, 0.94).
The best performing learner was the stacked generalization meta-learner that reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98). CONCLUSIONS The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. The implementation of these algorithms enables the exploitation of the data of thousands of patients accurately.
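The Kappa coefficients reported in this abstract measure agreement between an algorithm's labels and the manual classification, corrected for agreement expected by chance. The following is a minimal Python sketch assuming the unweighted Cohen's kappa, which the abstract does not specify; the function name and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(reference)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Assumption: both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    manual = ["diabetic", "diabetic", "non-diabetic", "non-diabetic"]
    algorithm = ["diabetic", "non-diabetic", "non-diabetic", "non-diabetic"]
    print(cohen_kappa(manual, algorithm))
```

With these toy labels the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5; the confidence intervals quoted in the abstract would need a separate procedure that is not sketched here.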
Affiliation(s)
- Santiago Esteban
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.; Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
| | | | - Francisco E Peper
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
| | - Yamila S Mahumud
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
| | - Ricardo I Ricci
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
| | - Karin S Kopitowski
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.; Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
| | - Sergio A Terrasa
- Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.; Public Health Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
|
25
|
Sohn S, Wang Y, Wi CI, Krusemark EA, Ryu E, Ali MH, Juhn YJ, Liu H. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions. J Am Med Inform Assoc 2017; 25:353-359. [PMID: 29202185 PMCID: PMC7378885 DOI: 10.1093/jamia/ocx138] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2017] [Revised: 09/20/2017] [Accepted: 10/25/2017] [Indexed: 12/11/2022] Open
Abstract
Objective To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods Birth cohorts from Mayo Clinic and Sanford Children’s Hospital (SCH) were used in this study (n = 298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results There exist notable lexical variations (word-level similarity = 0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity = 0.944, asthma-related concept similarity = 0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.
Affiliation(s)
- Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Yanshan Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Chung-Il Wi
- Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA
| | | | - Euijung Ryu
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Mir H Ali
- Department of Pediatrics, Sanford Children's Hospital, Sioux Falls, SD, USA
| | - Young J Juhn
- Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA
| | - Hongfang Liu
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
|
26
|
Arruda-Olson AM, Moussa Pacha H, Afzal N, Abram S, Lewis BR, Isseh I, Haddad R, Scott CG, Bailey K, Liu H, Rooke TW, Kullo IJ. Burden of hospitalization in clinically diagnosed peripheral artery disease: A community-based study. Vasc Med 2017; 23:23-31. [PMID: 29068255 DOI: 10.1177/1358863x17736152] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
The burden and predictors of hospitalization over time in community-based patients with peripheral artery disease (PAD) have not been established. This study evaluates the frequency, reasons and predictors of hospitalization over time in community-based patients with PAD. We assembled an inception cohort of 1798 PAD cases from Olmsted County, MN, USA (mean age 71.2 years, 44% female) from 1 January 1998 through 31 December 2011 who were followed until 2014. Two age- and sex-matched controls (n = 3596) were identified for each case. ICD-9 codes were used to ascertain the primary reasons for hospitalization. Patients were censored at death or last follow-up. The most frequent reasons for hospitalization were non-cardiovascular: 68% of 8706 hospitalizations in cases and 78% of 8005 hospitalizations in controls. A total of 1533 (85%) cases and 2286 (64%) controls (p < 0.001) were hospitalized at least once; 1262 (70%) cases and 1588 (44%) controls (p < 0.001) ≥ two times. In adjusted models, age, prior hospitalization and comorbid conditions were independently associated with increased risk of recurrent hospitalizations in both groups. In cases, severe PAD (ankle-brachial index < 0.5) (HR: 1.25; 95% CI: 1.15, 1.36) and poorly compressible arteries (HR: 1.26; 95% CI: 1.16, 1.38) were each associated with increased risk for recurrent hospitalization. We demonstrate an increased rate of hospitalization in community-based patients with PAD and identify predictors of recurrent hospitalizations. These observations may inform strategies to reduce the burden of hospitalization of PAD patients.
Affiliation(s)
| | - Homam Moussa Pacha
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Naveed Afzal
- 2 Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, USA
| | - Sara Abram
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Bradley R Lewis
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Iyad Isseh
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Raad Haddad
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Christopher G Scott
- 2 Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, USA
| | - Kent Bailey
- 2 Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, USA
| | - Hongfang Liu
- 2 Department of Health Sciences Research, Mayo Clinic and Mayo Foundation, Rochester, MN, USA
| | - Thom W Rooke
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
| | - Iftikhar J Kullo
- 1 Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, MN, USA
|
27
|
Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. Yearb Med Inform 2017; 26:38-52. [PMID: 28480475 PMCID: PMC6239225 DOI: 10.15265/iy-2017-007] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Indexed: 12/30/2022] Open
Abstract
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potentials for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
Affiliation(s)
- S. M. Meystre
- Medical University of South Carolina, Charleston, SC, USA
| | - C. Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Switzerland
| | - T. Bürkle
- University of Applied Sciences, Bern, Switzerland
| | - G. Tognola
- Institute of Electronics, Computer and Telecommunication Engineering, Italian Natl. Research Council IEIIT-CNR, Milan, Italy
| | - A. Budrionis
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
| | - C. U. Lehmann
- Departments of Biomedical Informatics and Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
|
28
|
Lin FPY, Pokorny A, Teng C, Epstein RJ. TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records. Sci Rep 2017; 7:6918. [PMID: 28761061 PMCID: PMC5537364 DOI: 10.1038/s41598-017-07111-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2016] [Accepted: 06/21/2017] [Indexed: 12/13/2022] Open
Abstract
Vast amounts of clinically relevant text-based variables lie undiscovered and unexploited in electronic medical records (EMR). To exploit this untapped resource, and thus facilitate the discovery of informative covariates from unstructured clinical narratives, we have built a novel computational pipeline termed Text-based Exploratory Pattern Analyser for Prognosticator and Associator discovery (TEPAPA). This pipeline combines semantic-free natural language processing (NLP), regular expression induction, and statistical association testing to identify conserved text patterns associated with outcome variables of clinical interest. When we applied TEPAPA to a cohort of head and neck squamous cell carcinoma patients, plausible concepts known to be correlated with human papilloma virus (HPV) status were identified from the EMR text, including site of primary disease, tumour stage, pathologic characteristics, and treatment modalities. Similarly, correlates of other variables (including gender, nodal status, recurrent disease, smoking and alcohol status) were also reliably recovered. Using highly-associated patterns as covariates, a patient's HPV status was classifiable using a bootstrap analysis with a mean area under the ROC curve of 0.861, suggesting its predictive utility in supporting EMR-based phenotyping tasks. These data support using this integrative approach to efficiently identify disease-associated factors from unstructured EMR narratives, and thus to efficiently generate testable hypotheses.
Affiliation(s)
- Frank Po-Yen Lin
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia.
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.
| | - Adrian Pokorny
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia
| | - Christina Teng
- Department of Medical Oncology, Liverpool Hospital, Liverpool, Sydney, NSW, Australia
| | - Richard J Epstein
- Department of Oncology, St Vincent's Hospital & The Kinghorn Cancer Centre, Darlinghurst, NSW, Australia
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
|
29
|
Goodloe R, Farber-Eger E, Boston J, Crawford DC, Bush WS. Reducing Clinical Noise for Body Mass Index Measures Due to Unit and Transcription Errors in the Electronic Health Record. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2017; 2017:102-111. [PMID: 28815116 PMCID: PMC5543370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
Abstract
Body mass index (BMI) is an important outcome and covariate adjustment for many clinical association studies. Accurate assessment of BMI, therefore, is a critical part of many study designs. Electronic health records (EHRs) are a growing source of clinical data for research purposes, and have proven useful for identifying and replicating genetic associations. EHR-based data collected for clinical and billing purposes have several unique properties, including a high degree of heterogeneity or "clinical noise." In this work, we propose a new method for reducing the problems of transcription and recording error for height and weight and apply these methods to a subset of the Vanderbilt University Medical Center biorepository known as EAGLE BioVU (n=15,863). After processing, we show that the distribution of BMI from EAGLE BioVU closely matches population-based estimates from the National Health and Nutrition Examination Surveys (NHANES), and that our approach retains far more data points than traditional outlier detection methods.
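The abstract does not spell out the cleaning rules, so the following is only a minimal sketch of the general idea: compute BMI from height and weight, and try common unit mix-ups before discarding an implausible record. The function names, the conversion candidates, and all plausibility ranges are assumptions made for illustration; they are not the authors' method.

```python
from typing import Optional

# Hypothetical plausibility windows for adults; all three are assumptions.
HEIGHT_CM_RANGE = (100.0, 250.0)
WEIGHT_KG_RANGE = (20.0, 350.0)
BMI_RANGE = (10.0, 100.0)

LB_TO_KG = 0.45359237
IN_TO_CM = 2.54


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms over height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def _within(value: float, bounds: tuple) -> bool:
    low, high = bounds
    return low <= value <= high


def repair_bmi(weight: float, height: float) -> Optional[float]:
    """Return a plausible BMI, trying unit conversions in order, else None."""
    candidates = [
        (weight, height),                        # kg and cm, as recorded
        (weight * LB_TO_KG, height),             # weight was in pounds
        (weight, height * IN_TO_CM),             # height was in inches
        (weight * LB_TO_KG, height * IN_TO_CM),  # both were imperial
    ]
    for w_kg, h_cm in candidates:
        if not (_within(w_kg, WEIGHT_KG_RANGE) and _within(h_cm, HEIGHT_CM_RANGE)):
            continue
        value = bmi(w_kg, h_cm)
        if _within(value, BMI_RANGE):
            return value
    return None  # no interpretation is plausible; treat as unrecoverable
```

A record such as 70 (weight) and 69 (height) fails as kilograms and centimetres but passes once the height is read as inches. Records that are plausible under more than one interpretation cannot be disambiguated by range checks alone, which is one reason a real pipeline would need more context than this sketch uses.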
Affiliation(s)
- Robert Goodloe
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Eric Farber-Eger
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Jonathan Boston
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Dana C. Crawford
  - Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- William S. Bush
  - Institute for Computational Biology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
30
|
Ye Z, Austin E, Schaid DJ, Bailey KR, Pellikka PA, Kullo IJ. A DAB2IP genotype: sex interaction is associated with abdominal aortic aneurysm expansion. J Investig Med 2017; 65:1077-1082. [DOI: 10.1136/jim-2016-000404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 04/25/2017] [Indexed: 02/06/2023]
Abstract
A faster expansion rate of abdominal aortic aneurysm (AAA) increases the risk of rupture. Women are at higher risk of rupture than men, but the mechanisms underlying this increased risk are unknown. We investigated whether genetic variants that influence susceptibility for AAA (CDKN2A-2B, SORT1, DAB2IP, LRP1 and LDLR) are associated with AAA expansion and whether these associations differ by sex in 650 patients with AAA (mean age 70±8 years, 17% women) enrolled in the Mayo Clinic Vascular Disease Biorepository. Women had a mean aneurysm expansion 0.41 mm/year greater than men after adjustment for baseline AAA size. In addition to baseline size, mean arterial pressure (MAP), non-diabetic status, SORT1-rs599839[G] and DAB2IP-rs7025486[A] were associated with greater aneurysm expansion (all p<0.05). The associations of MAP and rs599839[G] were similar in both sexes, while the associations of baseline size, pulse pressure (PP) and rs7025486[A] were stronger in women than men (all p for sex interaction ≤0.02). A three-way interaction of PP × sex × rs7025486[A] was noted in a full-factorial analysis (p=0.007) independent of baseline size and MAP. In the high PP group (≥median), women had a mean growth rate 0.68 mm/year greater per [A] of rs7025486 than men (p for sex interaction=0.003), whereas there was no difference in the low PP group (p for sex interaction=0.8). We demonstrate that variants DAB2IP-rs7025486[A] and SORT1-rs599839[G] are associated with AAA expansion. The association of rs7025486[A] is stronger in women than men and amplified by high PP, contributing to sex differences in aneurysm expansion.
|
31
|
Sohn S, Larson DW, Habermann EB, Naessens JM, Alabbad JY, Liu H. Detection of clinically important colorectal surgical site infection using Bayesian network. J Surg Res 2017; 209:168-173. [PMID: 28032554 PMCID: PMC5391146 DOI: 10.1016/j.jss.2016.09.058] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/07/2016] [Revised: 09/15/2016] [Accepted: 09/28/2016] [Indexed: 11/19/2022]
Abstract
BACKGROUND Despite extensive efforts to monitor and prevent surgical site infections (SSIs), real-time surveillance of clinical practice has been sparse and expensive or nonexistent. However, natural language processing (NLP) and machine learning (i.e., Bayesian network analysis) may provide the methodology necessary to approach this issue in a new way. We investigated the ability to identify SSIs after colorectal surgery (CRS) through an automated detection system using a Bayesian network. MATERIALS AND METHODS Patients who underwent CRS from 2010 to 2012 and were captured in our institutional American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) comprised our cohort. A Bayesian network was applied to detect SSIs using risk factors from ACS-NSQIP data and keywords extracted from clinical notes by NLP. Two surgeons provided expertise informing the Bayesian network to identify clinically meaningful SSIs (CM-SSIs) occurring within 30 d after surgery. RESULTS We used data from 751 CRS cases experiencing 67 (8.9%) SSIs and 78 (10.4%) CM-SSIs. Our Bayesian network detected ACS-NSQIP-captured SSIs with a receiver operating characteristic area under the curve of 0.827, but this value increased to 0.892 when using surgeon-identified CM-SSIs. CONCLUSIONS A Bayesian network coupled with NLP has the potential to be used in real-time SSI surveillance. Moreover, surgeons identified CM-SSI not captured under current NSQIP definitions. Future efforts to expand CM-SSI identification may lead to improved and potentially automated approaches to survey for postoperative SSI in clinical practice.
Affiliation(s)
- Sunghwan Sohn
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- David W Larson
  - Division of Colorectal Surgery, Department of Surgery, Mayo Clinic, Rochester, Minnesota
- Elizabeth B Habermann
  - Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- James M Naessens
  - Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Jasim Y Alabbad
  - Division of Colorectal Surgery, Department of Surgery, Mayo Clinic, Rochester, Minnesota
- Hongfang Liu
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
|
32
|
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2017; 41:81-93. [PMID: 27859628 PMCID: PMC5154867 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but also allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variants and may have limited power to detect association, especially for variants with low minor allele frequency. We propose the Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene, NR1D2.
Affiliation(s)
- Zhong Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
  - Baker Institute for Animal Health, Cornell University, Ithaca, New York, United States of America
  - Center for Computational Biology, Beijing Forestry University, Beijing, China
- Ke Xu
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
  - VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xinyu Zhang
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
  - VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xiaowei Wu
  - Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
|
33
|
Shameer K, Badgeley MA, Miotto R, Glicksberg BS, Morgan JW, Dudley JT. Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Brief Bioinform 2017; 18:105-124. [PMID: 26876889 PMCID: PMC5221424 DOI: 10.1093/bib/bbv118] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 09/14/2015] [Revised: 11/27/2015] [Indexed: 01/01/2023] Open
Abstract
Monitoring and modeling biomedical, health care and wellness data from individuals and converging data on a population scale have tremendous potential to improve understanding of the transition from the healthy state of human physiology to the disease setting. Wellness monitoring devices and companion software applications capable of generating alerts and sharing data with health care providers or social networks are now available. The accessibility and clinical utility of such data for disease or wellness research are currently limited. Designing methods for streaming data capture, real-time data aggregation, machine learning, predictive analytics and visualization solutions to integrate wellness or health monitoring data elements with the electronic medical records (EMRs) maintained by health care providers permits better utilization. Integration of population-scale biomedical, health care and wellness data would help to stratify patients for active health management and to understand clinically asymptomatic patients and underlying illness trajectories. In this article, we discuss various health-monitoring devices, their ability to capture the unique state of health represented in a patient and their application in individualized diagnostics, prognosis, clinical or wellness intervention. We also discuss examples of translational bioinformatics approaches to integrating patient-generated data with existing EMRs, personal health records, patient portals and clinical data repositories. Briefly, translational bioinformatics methods, tools and resources are at the center of these advances in implementing real-time biomedical and health care analytics in the clinical setting. Furthermore, these advances are poised to play a significant role in clinical decision-making and implementation of data-driven medicine and wellness care.
Affiliation(s)
- Marcus A Badgeley
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Riccardo Miotto
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Benjamin S Glicksberg
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Joseph W Morgan
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Joel T Dudley
  - Harris Center for Precision Wellness, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Health Evidence and Policy, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
34
|
Safarova MS, Liu H, Kullo IJ. Rapid identification of familial hypercholesterolemia from electronic health records: The SEARCH study. J Clin Lipidol 2016; 10:1230-9. [PMID: 27678441 DOI: 10.1016/j.jacl.2016.08.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 03/04/2016] [Revised: 07/28/2016] [Accepted: 08/01/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND Little is known about prevalence, awareness, and control of familial hypercholesterolemia (FH) in the United States. OBJECTIVE To address these knowledge gaps, we developed an ePhenotyping algorithm for rapid identification of FH in electronic health records (EHRs) and deployed it in the Screening Employees And Residents in the Community for Hypercholesterolemia (SEARCH) study. METHODS We queried a database of 131,000 individuals seen between 1993 and 2014 in primary care practice to identify 5992 (mean age 52 ± 13 years, 42% men) patients with low-density lipoprotein cholesterol (LDL-C) ≥190 mg/dL, triglycerides <400 mg/dL and without secondary causes of hyperlipidemia. RESULTS Our EHR-based algorithm ascertained the Dutch Lipid Clinic Network criteria for FH using structured data sets and natural language processing for family history and presence of FH stigmata on physical examination. Blinded expert review revealed positive and negative predictive values for the SEARCH algorithm at 94% and 97%, respectively. The algorithm identified 32 definite and 391 probable cases with an overall FH prevalence of 0.32% (1:310). Only 55% of the FH cases had a diagnosis code relevant to FH. Mean LDL-C at the time of FH ascertainment was 237 mg/dL; at follow-up, 70% (298 of 423) of patients were on lipid-lowering treatment with 80% achieving an LDL-C ≤100 mg/dL. Of treated FH patients with premature CHD, only 22% (48 of 221) achieved an LDL-C ≤70 mg/dL. CONCLUSIONS In a primary care setting, we found the prevalence of FH to be 1:310 with low awareness and control. Further studies are needed to assess whether automated detection of FH in EHR improves patient outcomes.
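The headline figures in this abstract follow from a few divisions, which can be checked directly. The short sketch below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts taken from the abstract.
screened = 131_000             # individuals in the queried database
definite, probable = 32, 391   # FH cases identified by the algorithm
fh_cases = definite + probable

prevalence = fh_cases / screened   # fraction of screened individuals with FH
one_in = screened / fh_cases       # the same figure expressed as "1 in N"

treated_share = 298 / fh_cases     # FH patients on lipid-lowering treatment
chd_at_goal = 48 / 221             # treated FH patients with premature CHD at LDL-C <= 70 mg/dL

print(f"FH cases: {fh_cases}")
print(f"prevalence: {prevalence:.2%} (1:{one_in:.0f})")
print(f"on treatment at follow-up: {treated_share:.0%}")
print(f"premature CHD at LDL-C goal: {chd_at_goal:.0%}")
```

The results agree with the reported 423 cases, 0.32% (1:310) prevalence, 70% treated, and 22% at goal.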
Affiliation(s)
- Maya S Safarova
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
|
35
|
Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder. PLoS One 2016; 11:e0159621. [PMID: 27472449 PMCID: PMC4966969 DOI: 10.1371/journal.pone.0159621] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 01/19/2016] [Accepted: 07/06/2016] [Indexed: 12/25/2022] Open
Abstract
Objective Cohort selection is challenging for large-scale electronic health record (EHR) analyses, as International Classification of Diseases 9th edition (ICD-9) diagnostic codes are notoriously unreliable disease predictors. Our objective was to develop, evaluate, and validate an automated algorithm for determining an Autism Spectrum Disorder (ASD) patient cohort from EHR. We demonstrate its utility via the largest investigation to date of the co-occurrence patterns of medical comorbidities in ASD. Methods We extracted ICD-9 codes and concepts derived from the clinical notes. A gold standard patient set was labeled by clinicians at Boston Children’s Hospital (BCH) (N = 150) and Cincinnati Children’s Hospital and Medical Center (CCHMC) (N = 152). Two algorithms were created: (1) rule-based implementing the ASD criteria from Diagnostic and Statistical Manual of Mental Diseases 4th edition, (2) predictive classifier. The positive predictive values (PPV) achieved by these algorithms were compared to an ICD-9 code baseline. We clustered the patients based on grouped ICD-9 codes and evaluated subgroups. Results The rule-based algorithm produced the best PPV: (a) BCH: 0.885 vs. 0.273 (baseline); (b) CCHMC: 0.840 vs. 0.645 (baseline); (c) combined: 0.864 vs. 0.460 (baseline). A validation at Children’s Hospital of Philadelphia yielded 0.848 (PPV). Clustering analyses of comorbidities on the three-site large cohort (N = 20,658 ASD patients) identified psychiatric, developmental, and seizure disorder clusters. Conclusions In a large cross-institutional cohort, co-occurrence patterns of comorbidities in ASDs provide further hypothetical evidence for distinct courses in ASD. The proposed automated algorithms for cohort selection open avenues for other large-scale EHR studies and individualized treatment of ASD.
|
36
|
Lingren T, Thaker V, Brady C, Namjou B, Kennebeck S, Bickel J, Patibandla N, Ni Y, Van Driest SL, Chen L, Roach A, Cobb B, Kirby J, Denny J, Bailey-Davis L, Williams MS, Marsolo K, Solti I, Holm IA, Harley J, Kohane IS, Savova G, Crimmins N. Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers. Appl Clin Inform 2016; 7:693-706. [PMID: 27452794 DOI: 10.4338/aci-2016-01-ra-0015] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/03/2016] [Accepted: 06/15/2016] [Indexed: 01/12/2023] Open
Abstract
OBJECTIVE The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1-5.99 years) using structured and unstructured data from the electronic health record (EHR). INTRODUCTION Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high-precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants causing severe early onset obesity. DATA AND METHODS Rule-based and machine-learning-based algorithms were developed using structured and unstructured data from two EHR databases from Boston Children's Hospital (BCH) and Cincinnati Children's Hospital and Medical Center (CCHMC). Exclusion criteria including medications or comorbid diagnoses were defined. Machine learning algorithms were developed using cross-site training and testing in addition to experimenting with natural language processing features. RESULTS Precision was emphasized for a high-fidelity cohort. The rule-based algorithm performed the best overall, 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes. CONCLUSIONS Detecting severe early childhood obesity is essential for the intervention potential in children at the highest long-term risk of developing comorbidities related to obesity, and excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further such phenotyping efforts inform future practical application in health care environments utilizing clinical decision support.
Affiliation(s)
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH 45229-3039, Phone: 513-803-9032, Fax: 513-636-2056
|
37
|
Griffis D, Shivade C, Fosler-Lussier E, Lai AM. A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016; 2016:88-97. [PMID: 27570656 PMCID: PMC5001746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
Affiliation(s)
- Denis Griffis
  - Department of Computer Science and Engineering
  - National Institutes of Health, Rehabilitation Medicine Department, Mark O. Hatfield Clinical Research Center, Bethesda, MD
- Albert M. Lai
  - Department of Computer Science and Engineering
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH
  - National Institutes of Health, Rehabilitation Medicine Department, Mark O. Hatfield Clinical Research Center, Bethesda, MD
|
38
|
Mowery DL, Chapman BE, Conway M, South BR, Madden E, Keyhani S, Chapman WW. Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 2016; 7:26. [PMID: 27175226 PMCID: PMC4863379 DOI: 10.1186/s13326-016-0065-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/21/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the United States, 795,000 people suffer strokes each year; 10-15% of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite the process of manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, thus potentially reducing effort, costs, and time. METHODS In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (sections), report formats (structures) and linguistic descriptions (expressions) from Veteran Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors for two report types: radiology (RAD) and text integration utility (TIU) notes. RESULTS We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88%), specificity (84%), and negative predictive value (95%) and reasonable positive predictive value (70%). For TIU notes, pyConText performed with high specificity (87%) and negative predictive value (92%), reasonable sensitivity (73%), and moderate positive predictive value (58%). pyConText performed with the highest sensitivity when processing the full report rather than the Findings or Impression sections independently. CONCLUSION We conclude that pyConText can reduce chart review efforts by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings from the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.
Affiliation(s)
- Danielle L. Mowery
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
- Brian E. Chapman
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
- Mike Conway
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- Brett R. South
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
- Erin Madden
  - San Francisco Veteran Affair Health Care System, San Francisco, CA USA
- Salomeh Keyhani
  - San Francisco Veteran Affair Health Care System, San Francisco, CA USA
- Wendy W. Chapman
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - IDEAS Center, Veteran Affair Health Care System, Salt Lake City, UT USA
|
39
|
Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 2016; 17:129-45. [PMID: 26875678 DOI: 10.1038/nrg.2015.36] [Citation(s) in RCA: 168] [Impact Index Per Article: 21.0] [Indexed: 12/24/2022]
Abstract
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome-phenome relationship.
|
40
|
Afzal N, Sohn S, Abram S, Liu H, Kullo IJ, Arruda-Olson AM. Identifying Peripheral Arterial Disease Cases Using Natural Language Processing of Clinical Notes. ... IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS. IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2016; 2016:126-131. [PMID: 28111640 PMCID: PMC5248569 DOI: 10.1109/bhi.2016.7455851] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]
Abstract
Peripheral arterial disease (PAD) is a chronic disease that affects millions of people worldwide. Ascertaining PAD status from clinical notes by manual chart review is labor intensive and time consuming. In this paper, we describe a natural language processing (NLP) algorithm for automated ascertainment of PAD status from clinical notes using predetermined criteria. We developed and evaluated our system against a gold standard that was created by medical experts based on manual chart review. Our system ascertained PAD status from clinical notes with high sensitivity (0.96), positive predictive value (0.92), negative predictive value (0.99) and specificity (0.98). NLP approaches can be used for rapid, efficient and automated ascertainment of PAD cases with implications for patient care and epidemiologic research.
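The four measures reported here are all ratios over a 2×2 confusion matrix built from the system output and the gold standard. As a minimal sketch, the class below computes them; the class name is illustrative, and the counts in the usage lines are hypothetical values chosen only to land near the reported figures. They are not the study's data, which the abstract does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # system says PAD, gold standard says PAD
    fp: int  # system says PAD, gold standard says no PAD
    fn: int  # system says no PAD, gold standard says PAD
    tn: int  # system says no PAD, gold standard says no PAD

    def sensitivity(self) -> float:
        """Share of true cases that the system finds."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of true non-cases that the system clears."""
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value: share of flagged notes that are true cases."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value: share of cleared notes that are true non-cases."""
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only (not taken from the paper).
cm = ConfusionMatrix(tp=96, fp=8, fn=4, tn=392)
print(round(cm.sensitivity(), 2), round(cm.ppv(), 2),
      round(cm.npv(), 2), round(cm.specificity(), 2))
```

Note that PPV and NPV depend on how common the condition is in the evaluated notes, whereas sensitivity and specificity do not, so the predictive values of one cohort do not transfer automatically to another.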
Affiliation(s)
- Naveed Afzal
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
- Sunghwan Sohn
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
- Sara Abram
  - Division of Cardiovascular Diseases, Mayo Clinic, Rochester MN
- Hongfang Liu
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester MN
|
41
|
Ye Z, Austin E, Schaid DJ, Kullo IJ. A multi-locus genetic risk score for abdominal aortic aneurysm. Atherosclerosis 2016; 246:274-9. [PMID: 26820802 DOI: 10.1016/j.atherosclerosis.2015.12.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/28/2015] [Revised: 10/02/2015] [Accepted: 12/21/2015] [Indexed: 12/20/2022]
Abstract
BACKGROUND We investigated whether a multi-locus genetic risk score (GRS) was associated with presence and progression of abdominal aortic aneurysm (AAA) in a case-control study. METHODS AND RESULTS The study comprised 1124 patients with AAA (74 ± 8 years, 83% men, 52% of them with a maximal AAA size ≤ 5 cm) and 6524 non-cases (67 ± 11 years, 58% men) from the Mayo Vascular Disease Biorepository. AAA was defined as infrarenal abdominal aorta diameter ≥ 3.0 cm or history of AAA repair. Non-cases were participants without known AAA. A GRS was calculated using 4 SNPs associated with AAA at genome-wide significance (P ≤ 10^-8). The GRS was associated with the presence of AAA after adjustment for age, sex, cardiovascular risk factors, atherosclerotic cardiovascular diseases and family history of aortic aneurysm: odds ratio (OR, 95% confidence interval, CI) 1.06 (1.04-1.09, p < 0.001). Adding GRS to conventional risk factors improved the association of presence of AAA (net reclassification index 14%, p < 0.001). In a subset of patients with AAA who had ≥ 2 imaging studies (n = 651, mean (SE) growth rate 2.47 (0.11) mm/year during a mean time interval of 5.41 years), GRS, baseline size, diabetes and family history were each associated with aneurysm growth rate in univariate association (all p < 0.05). The estimated mean aneurysm growth rate was 0.50 mm/year higher in those with GRS > median (5.78) than those with GRS ≤ median (p = 0.01), after adjustment for baseline size (p < 0.001), diabetes (p = 0.046) and family history of aortic aneurysm (p = 0.02). CONCLUSIONS A multi-locus GRS was associated with presence of AAA and greater aneurysm expansion.
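The abstract states that the GRS was built from 4 SNPs but not how the alleles were combined. A common construction, shown here purely as an assumption, is a sum of risk-allele counts (0, 1 or 2 per SNP), optionally weighted per SNP. The function name, the SNP labels, and the weights below are hypothetical placeholders and do not come from the paper.

```python
from typing import Mapping, Optional


def genetic_risk_score(
    allele_counts: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Sum risk-allele counts across SNPs, optionally weighted per SNP.

    allele_counts maps a SNP label to the number of risk alleles carried (0, 1 or 2).
    weights, if given, maps the same labels to per-allele weights; without
    weights every risk allele contributes 1.
    """
    score = 0.0
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2, got {count}")
        weight = 1.0 if weights is None else weights[snp]
        score += weight * count
    return score


# Hypothetical genotype and weights, for illustration only.
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}
example_weights = {"snp_a": 0.5, "snp_b": 1.0, "snp_c": 2.0, "snp_d": 1.5}

unweighted = genetic_risk_score(genotype)
weighted = genetic_risk_score(genotype, example_weights)
```

With four SNPs an unweighted score can only take integer values from 0 to 8, so a non-integer median such as the reported 5.78 would be consistent with some form of weighting or rescaling, although the abstract does not say which.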
Affiliation(s)
- Zi Ye
  - Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA
- Erin Austin
  - Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA; Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
- Daniel J Schaid
  - Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
  - Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
|
42
|
Zhang R, Manohar N, Arsoniadis E, Wang Y, Adam TJ, Pakhomov SV, Melton GB. Evaluating Term Coverage of Herbal and Dietary Supplements in Electronic Health Records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1361-1370. [PMID: 26958277 PMCID: PMC4765597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Herbal and dietary supplement consumption has rapidly expanded in recent years. Due to the pharmacological and metabolic characteristics of some supplements, they can interact with prescription medications, potentially leading to clinically important and preventable adverse reactions. Electronic health record (EHR) systems provide a valuable source from which drug-supplement interactions can be mined and assessed for their clinical effects. A fundamental prerequisite is a functional understanding of supplement documentation in the EHR and the associated supplement coverage in major online databases. To address this, clinical notes and corresponding medication lists from an integrated healthcare system were extracted and compared with online databases. Overall, about 40% of listed medications are supplements, most of which are included in medication lists as nutritional or miscellaneous products. Gaps were found between supplement and standard medication terminologies, making it difficult to achieve robust supplement documentation in EHR systems. In addition, in the clinical notes we identified supplements that were not mentioned in the medication lists.
Affiliation(s)
- Rui Zhang
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN; Department of Surgery; University of Minnesota, Minneapolis, MN
- Nivedha Manohar
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN
- Elliot Arsoniadis
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN; Department of Surgery; University of Minnesota, Minneapolis, MN
- Yan Wang
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN
- Terrence J Adam
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN; College of Pharmacy; University of Minnesota, Minneapolis, MN
- Serguei V Pakhomov
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN; College of Pharmacy; University of Minnesota, Minneapolis, MN
- Genevieve B Melton
  - Institute for Health Informatics; University of Minnesota, Minneapolis, MN; Department of Surgery; University of Minnesota, Minneapolis, MN
|
43
|
Han D, Wang S, Jiang C, Jiang X, Kim HE, Sun J, Ohno-Machado L. Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 2015; 22:1153-63. [PMID: 26555018 PMCID: PMC5009912 DOI: 10.1093/jamia/ocv157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/31/2015] [Revised: 09/08/2015] [Accepted: 09/14/2015] [Indexed: 01/26/2023] Open
Abstract
Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics. Its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We used tensors (i.e., multidimensional arrays) to represent the interaction among topics, time and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of the most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, articles related to the MeSH terms Methods, Organization & Administration, and Algorithms increased significantly in both number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
Affiliation(s)
- Dong Han
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Shuang Wang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Chao Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Xiaoqian Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Hyeon-Eui Kim
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Jimeng Sun
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, S30313, USA
- Lucila Ohno-Machado
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
|
44
|
Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015; 22:1220-30. [PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 02/09/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). METHODS A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. RESULTS We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. CONCLUSION A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
Affiliation(s)
- Huan Mo
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- William K Thompson
  - Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Jennifer A Pacheco
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Richard Kiefer
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Qian Zhu
  - Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA
- Jie Xu
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Enid Montague
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Todd Lingren
  - Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Frank D Mentch
  - Center for Applied Genomics, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Yizhao Ni
  - Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Firas H Wehbe
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peggy L Peissig
  - Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
- Gerard Tromp
  - Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Stellenbosch, Cape Town, South Africa
- Christopher G Chute
  - Division of General Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
- Jyotishman Pathak
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Peter Speltz
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Abel N Kho
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Gail P Jarvik
  - Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cosmin A Bejan
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Marc S Williams
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Kenneth Borthwick
  - The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Terrie E Kitchner
  - Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
- Dan M Roden
  - Department of Medicine, Vanderbilt University, Nashville, TN, USA Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Paul A Harris
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
|
45
|
Kullo IJ, Leeper NJ. The genetic basis of peripheral arterial disease: current knowledge, challenges, and future directions. Circ Res 2015; 116:1551-60. [PMID: 25908728 DOI: 10.1161/circresaha.116.303518] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/28/2022]
Abstract
Several risk factors for atherosclerotic peripheral arterial disease (PAD), such as dyslipidemia, diabetes mellitus, and hypertension, are heritable. However, predisposition to PAD may be influenced by genetic variants acting independently of these risk factors. Identification of such genetic variants will provide insights into underlying pathophysiologic mechanisms and facilitate the development of novel diagnostic and therapeutic approaches. In contrast to coronary heart disease, relatively few genetic variants that influence susceptibility to PAD have been discovered. This may be, in part, because of greater clinical and genetic heterogeneity in PAD. In this review, we (1) provide an update on the current state of knowledge about the genetic basis of PAD, including results of family studies and candidate gene, linkage as well as genome-wide association studies; (2) highlight the challenges in investigating the genetic basis of PAD and possible strategies to overcome these challenges; and (3) discuss the potential of genome sequencing, RNA sequencing, differential gene expression, epigenetic profiling, and systems biology in increasing our understanding of the molecular genetics of PAD.
Affiliation(s)
- Iftikhar J Kullo
  - From the Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN (I.J.K.); and Department of Vascular Surgery, Stanford, Stanford, CA (N.J.L.).
- Nicholas J Leeper
  - From the Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN (I.J.K.); and Department of Vascular Surgery, Stanford, Stanford, CA (N.J.L.)
|
46
|
Lin C, Karlson EW, Dligach D, Ramirez MP, Miller TA, Mo H, Braggs NS, Cagan A, Gainer V, Denny JC, Savova GK. Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc 2015; 22:e151-61. [PMID: 25344930 PMCID: PMC5901122 DOI: 10.1136/amiajnl-2014-002642] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 01/09/2014] [Revised: 08/14/2014] [Accepted: 08/22/2014] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVES To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities. MATERIALS AND METHODS Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method. For features, the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with regard to the date of transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at patient level in a training set (N=480 patients) and evaluated against a test set (N=120 patients). RESULTS The system achieved positive predictive value (PPV) 0.756, sensitivity 0.919, F1 score 0.829 on the test set, which was significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task and adding temporal information, all proved highly effective. CONCLUSIONS Automated methotrexate-induced liver toxicity phenotype discovery for patients with RA based on structured and unstructured information in the EMR shows accurate results. Our work demonstrates that adding temporal features significantly improved classification results.
Affiliation(s)
- Chen Lin
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Elizabeth W Karlson
  - Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Dmitriy Dligach
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - *CL, EWK and DD are co-first authors
- Monica P Ramirez
  - Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Timothy A Miller
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Huan Mo
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- Natalie S Braggs
  - Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Andrew Cagan
  - Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Vivian Gainer
  - Research Computing, Partners HealthCare, Boston, Massachusetts, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Guergana K Savova
  - Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
|
47
|
Carroll RJ, Eyler AE, Denny JC. Intelligent use and clinical benefits of electronic health records in rheumatoid arthritis. Expert Rev Clin Immunol 2015; 11:329-37. [PMID: 25660652 DOI: 10.1586/1744666x.2015.1009895] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
In the past 10 years, electronic health records (EHRs) have had a growing impact on clinical care. EHRs efficiently capture and reuse clinical information, which can directly benefit patient care by guiding treatments and providing effective reminders for best practices. The increased adoption has also led to more complex implementations, including robust, disease-specific tools, such as for rheumatoid arthritis (RA). In addition, the data collected through normal clinical care are also used in secondary research, helping to refine patient treatment for the future. Although few studies have directly demonstrated benefits for direct clinical care of RA, the opposite is true for EHR-based research: RA has been a particularly fertile ground for clinical and genomic research that has typically leveraged advanced informatics methods to accurately define RA populations. We discuss the clinical impact of EHRs in RA treatment and their impact on secondary research, and provide recommendations for improved utility in future EHR installations.
Affiliation(s)
- Robert J Carroll
  - Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
|
48
|
Pradhan S, Elhadad N, South BR, Martinez D, Christensen L, Vogel A, Suominen H, Chapman WW, Savova G. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc 2015; 22:143-54. [PMID: 25147248 PMCID: PMC4433360 DOI: 10.1136/amiajnl-2013-002544] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 12/16/2013] [Revised: 07/16/2014] [Accepted: 07/21/2014] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners. MATERIALS AND METHODS We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text: 199 clinical notes for training and 99 for testing (roughly 180K words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance. RESULTS For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59. DISCUSSION Most of the participating systems used a hybrid approach by supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources. CONCLUSIONS The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.
Affiliation(s)
- Sameer Pradhan
  - Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Amy Vogel
  - Columbia University, New York, New York, USA
- Hanna Suominen
  - NICTA, The Australian National University, and University of Canberra, Canberra, Australian Capital Territory, Australia
- Guergana Savova
  - Boston Children's Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
49
|
Kohane IS. An autism case history to review the systematic analysis of large-scale data to refine the diagnosis and treatment of neuropsychiatric disorders. Biol Psychiatry 2015; 77:59-65. [PMID: 25034947 PMCID: PMC4260993 DOI: 10.1016/j.biopsych.2014.05.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/30/2014] [Revised: 05/05/2014] [Accepted: 05/22/2014] [Indexed: 01/18/2023]
Abstract
Analysis of large-scale systems of biomedical data provides a perspective on neuropsychiatric disease that may be otherwise elusive. Described here is an analysis of three large-scale systems of data from autism spectrum disorder (ASD) and of ASD research as an exemplar of what might be achieved from study of such data. First is the biomedical literature that highlights the fact that there are two very successful but quite separate research communities and findings pertaining to genetics and the molecular biology of ASD. There are those studies positing ASD causes that are related to immunological dysregulation and those related to disorders of synaptic function and neuronal connectivity. Second is the emerging use of electronic health record systems and other large clinical databases that allow the data acquired during the course of care to be used to identify distinct subpopulations, clinical trajectories, and pathophysiological substructures of ASD. These systems reveal subsets of patients with distinct clinical trajectories, some of which are immunologically related and others which follow pathologies conventionally thought of as neurological. The third is genome-wide genomic and transcriptomic analyses which show molecular pathways that overlap neurological and immunological mechanisms. The convergence of these three large-scale data perspectives illustrates the scientific leverage that large-scale data analyses can provide in guiding researchers in an approach to the diagnosis of neuropsychiatric disease that is inclusive and comprehensive.
Affiliation(s)
- Isaac S Kohane
  - Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
|
50
|
Shameer K, Klee EW, Dalenberg AK, Kullo IJ. Whole Exome Sequencing Implicates an INO80D Mutation in a Syndrome of Aortic Hypoplasia, Premature Atherosclerosis, and Arterial Stiffness. Circ Cardiovasc Genet 2014; 7:607-14. [DOI: 10.1161/circgenetics.113.000233] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/16/2022]
Abstract
BACKGROUND Massively parallel, high-throughput sequencing technology is helping to generate new insights into the genetic basis of human diseases. We used whole exome sequencing to identify the mutation underlying a syndrome affecting 2 siblings with aortic hypoplasia, calcific atherosclerosis, systolic hypertension, and premature cataract. METHODS AND RESULTS Exonic regions were captured and sequenced using a next-generation sequencing platform to generate 100-base paired-end reads. A computational genomic data analysis pipeline was used to perform quality control, align reads to a reference genome, and identify genetic variants; findings were confirmed using a different exome analysis pipeline. The 2 siblings were homozygous for a rare missense mutation (Ser818Cys) in INO80D, a subunit of the human INO80 chromatin remodeling complex. Homozygosity mapping and Sanger sequencing confirmed that the mutation is located in one of the runs of homozygosity on chromosome 2. INO80D encodes a key subunit of the human INO80 complex, a multiprotein complex involved in DNA binding, chromatin modification, organization of chromosome structure, and ATP-dependent nucleosome sliding. By introducing a new disulphide bond in the protein product and also disrupting the composition of low-complexity regions, the Ser818Cys mutation may affect INO80D function, protein–protein interactions, and chromatin remodeling. CONCLUSIONS Our findings suggest a link between the Ser818Cys mutation in INO80D, a subunit of the human INO80 chromatin remodeling complex, and accelerated arterial aging.
Affiliation(s)
- Khader Shameer
  - From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester MN
- Eric W. Klee
  - From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester MN
- Angela K. Dalenberg
  - From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester MN
- Iftikhar J. Kullo
  - From the Division of Cardiovascular Diseases (K.S., A.K.D., I.J.K.) and Biomedical Statistics and Informatics, Departments of Health Sciences Research and Laboratory Medicine and Pathology (E.W.K.), Mayo Clinic, Rochester MN
|