251
Murphy SN, Herrick C, Wang Y, Wang TD, Sack D, Andriole KP, Wei J, Reynolds N, Plesniak W, Rosen BR, Pieper S, Gollub RL. High throughput tools to access images from clinical archives for research. J Digit Imaging 2016; 28:194-204. [PMID: 25316195 PMCID: PMC4359193 DOI: 10.1007/s10278-014-9733-9]
Abstract
Historically, medical images collected in the course of clinical care have been difficult to access for secondary research studies. While there is tremendous potential value in the large volume of studies contained in clinical image archives, Picture Archiving and Communication Systems (PACS) are designed to optimize clinical operations and workflow. Search capabilities in PACS are basic, limiting their use for population studies, and duplication of archives for research is costly. To address this need, we augment the Informatics for Integrating Biology and the Bedside (i2b2) open source software, providing investigators with the tools necessary to query and integrate medical record and clinical research data. Over 100 healthcare institutions have installed this suite of software tools, which allows investigators to search medical record metadata, including images, for specific types of patients. In this report, we describe a new Medical Imaging Informatics Bench to Bedside (mi2b2) module (www.mi2b2.org), available now as an open source addition to the i2b2 software platform, that allows medical imaging examinations collected during routine clinical care to be made available to translational investigators directly from their institution's clinical PACS for research and educational use in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Omnibus Rule. Access governance within the mi2b2 module is customizable per institution and PACS, minimizing impact on clinical systems. Currently in active use at our institutions, this new technology has already been used to facilitate access to thousands of clinical MRI brain studies representing specific patient phenotypes for use in research.
Affiliation(s)
- Shawn N Murphy
- Research IS and Computing, Partners HealthCare, Charlestown, MA, 02129, USA
252
Bates J, Fodeh SJ, Brandt CA, Womack JA. Classification of radiology reports for falls in an HIV study cohort. J Am Med Inform Assoc 2016; 23:e113-7. [PMID: 26567329 PMCID: PMC4954638 DOI: 10.1093/jamia/ocv155]
Abstract
OBJECTIVE To identify patients in a human immunodeficiency virus (HIV) study cohort who have fallen by applying supervised machine learning methods to radiology reports of the cohort. METHODS We used the Veterans Aging Cohort Study Virtual Cohort (VACS-VC), an electronic health record-based cohort of 146 530 veterans for whom radiology reports were available (N=2 977 739). We created a reference standard of radiology reports, represented each report by a feature set of words and Unified Medical Language System concepts, and then developed several support vector machine (SVM) classifiers for falls. We compared mutual information (MI) ranking and embedded feature selection approaches. The SVM classifier with MI feature selection was chosen to classify all radiology reports in VACS-VC. RESULTS Our SVM classifier with MI feature selection achieved an area under the curve score of 97.04 on the test set. When applied to all the radiology reports in VACS-VC, 80 416 of these reports were classified as positive for a fall. Of these, 11 484 were associated with a fall-related external cause of injury code (E-code) and 68 932 were not, corresponding to 29 280 patients with potential fall-related injuries who could not have been found using E-codes. DISCUSSION Feature selection was crucial to improving the classifier's performance. Feature selection with MI allowed us to select the number of discriminative features to use for classification, in contrast to the embedded feature selection method, in which the number of features is chosen automatically. CONCLUSION Machine learning is an effective method of identifying patients who have suffered a fall. The development of this classifier supplements the clinical researcher's toolkit and reduces dependence on under-coded structured electronic health record data.
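The mutual information (MI) ranking step described in this abstract can be illustrated with a small sketch. The snippet below ranks binary word features by MI against a fall/no-fall label, as a toy stand-in for the feature selection the authors applied before SVM training; the words, data, and function names are hypothetical, not taken from the study.

```python
import math
from collections import Counter

def mutual_information(feature_vals, labels):
    """Mutual information (in nats) between a binary feature and a binary label."""
    n = len(labels)
    joint = Counter(zip(feature_vals, labels))
    f_marg = Counter(feature_vals)
    l_marg = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        p_joint = c / n
        p_indep = (f_marg[f] / n) * (l_marg[l] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi

# Toy "reports": presence/absence of three words; label 1 = fall documented.
words = ["fell", "screening", "fracture"]
X = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0), (1, 0, 1), (0, 1, 0)]
y = [1, 1, 0, 0, 1, 0]

# Rank features by MI and keep the top two for a downstream classifier.
ranked = sorted(range(len(words)),
                key=lambda j: -mutual_information([row[j] for row in X], y))
top_features = [words[j] for j in ranked[:2]]
```

In the study itself the same idea is applied to tens of thousands of word and UMLS-concept features, and the retained features feed a support vector machine.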
Affiliation(s)
- Jonathan Bates
- Yale School of Medicine, New Haven, CT; VA Connecticut Healthcare System, West Haven, CT
- Cynthia A Brandt
- Yale School of Medicine, New Haven, CT; VA Connecticut Healthcare System, West Haven, CT
- Julie A Womack
- Yale School of Nursing, West Haven, CT; VA Connecticut Healthcare System, West Haven, CT
253
Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol 2016; 13:350-9. [PMID: 27009423 DOI: 10.1038/nrcardio.2016.42]
Abstract
The potential for big data analytics to improve cardiovascular quality of care and patient outcomes is tremendous. However, the application of big data in health care is at a nascent stage, and the evidence to date demonstrating that big data analytics will improve care and outcomes is scant. This Review provides an overview of the data sources and methods that comprise big data analytics, and describes eight areas of application of big data analytics to improve cardiovascular care, including predictive modelling for risk and resource use, population management, drug and medical device safety surveillance, disease and treatment heterogeneity, precision medicine and clinical decision support, quality of care and performance measurement, and public health and research applications. We also delineate the important challenges for big data applications in cardiovascular care, including the need for evidence of effectiveness and safety, the methodological issues such as data quality and validation, and the critical importance of clinical integration and proof of clinical utility. If big data analytics are shown to improve quality of care and patient outcomes, and can be successfully implemented in cardiovascular practice, big data will fulfil its potential as an important component of a learning health-care system.
Affiliation(s)
- John S Rumsfeld
- University of Colorado School of Medicine, 13001 East 17th Place, Aurora, Colorado 80045, USA; VA Eastern Colorado Health System, Cardiology (111B), 1055 Clermont Street, Denver, Colorado 80220, USA
- Karen E Joynt
- Brigham and Women's Hospital, 75 Francis Street, Boston, Massachusetts 02115, USA; Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, USA
- Thomas M Maddox
- University of Colorado School of Medicine, 13001 East 17th Place, Aurora, Colorado 80045, USA; VA Eastern Colorado Health System, Cardiology (111B), 1055 Clermont Street, Denver, Colorado 80220, USA
254
Shah RU, Merz CNB. Publicly Available Data: Crowd Sourcing to Identify and Reduce Disparities. J Am Coll Cardiol 2016; 66:1973-1975. [PMID: 26515999 DOI: 10.1016/j.jacc.2015.08.884]
Affiliation(s)
- Rashmee U Shah
- University of Utah, Cardiovascular Medicine, Salt Lake City, Utah
- C Noel Bairey Merz
- Barbra Streisand Women's Heart Center, Cedars-Sinai Heart Institute, Los Angeles, California
255
Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health. Adv Exp Med Biol 2016; 939:139-166. [PMID: 27807747 DOI: 10.1007/978-981-10-1503-8_7]
Abstract
The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
256
Identifying Patients at Risk for Aortic Stenosis Through Learning from Multimodal Data. Lect Notes Comput Sci 2016. [DOI: 10.1007/978-3-319-46726-9_28]
257
Jeanquartier F, Jean-Quartier C, Kotlyar M, Tokar T, Hauschild AC, Jurisica I, Holzinger A. Machine Learning for In Silico Modeling of Tumor Growth. Lect Notes Comput Sci 2016. [DOI: 10.1007/978-3-319-50478-0_21]
258
Schuler A, Liu V, Wan J, Callahan A, Udell M, Stark DE, Shah NH. Discovering patient phenotypes using generalized low rank models. Pac Symp Biocomput 2016; 21:144-55. [PMID: 26776181 PMCID: PMC4836913]
Abstract
The practice of medicine is predicated on discovering commonalities or distinguishing characteristics among patients to inform corresponding treatment. Given a patient grouping (hereafter referred to as a phenotype), clinicians can implement a treatment pathway accounting for the underlying cause of disease in that phenotype. Traditionally, phenotypes have been discovered by intuition, experience in practice, and advancements in basic science, but these approaches are often heuristic, labor intensive, and can take decades to produce actionable knowledge. Although our understanding of disease has progressed substantially in the past century, there are still important domains in which our phenotypes are murky, such as in behavioral health or in hospital settings. To accelerate phenotype discovery, researchers have used machine learning to find patterns in electronic health records, but have often been thwarted by missing data, sparsity, and data heterogeneity. In this study, we use a flexible framework called Generalized Low Rank Modeling (GLRM) to overcome these barriers and discover phenotypes in two sources of patient data. First, we analyze data from the 2010 Healthcare Cost and Utilization Project National Inpatient Sample (NIS), which contains upwards of 8 million hospitalization records consisting of administrative codes and demographic information. Second, we analyze a small (N=1746), local dataset documenting the clinical progression of autism spectrum disorder patients using granular features from the electronic health record, including text from physician notes. We demonstrate that low rank modeling successfully captures known and putative phenotypes in these vastly different datasets.
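As a rough illustration of the low-rank idea (not the authors' GLRM implementation, which supports many loss functions and regularizers), the sketch below fits a rank-1 quadratic-loss factorization to a matrix with a missing entry by alternating least squares; the data and names are invented.

```python
def als_rank1(A, mask, iters=200):
    """Rank-1 low rank model with quadratic loss: fit A[i][j] ~ u[i]*v[j]
    using only the observed entries (mask[i][j] == True)."""
    m, n = len(A), len(A[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):  # update row factors with column factors fixed
            num = sum(v[j] * A[i][j] for j in range(n) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(n) if mask[i][j]) or 1.0
            u[i] = num / den
        for j in range(n):  # update column factors with row factors fixed
            num = sum(u[i] * A[i][j] for i in range(m) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(m) if mask[i][j]) or 1.0
            v[j] = num / den
    return u, v

# Tiny "patients x codes" matrix with one unobserved cell.
A = [[2.0, 4.0], [3.0, 6.0]]
mask = [[True, True], [True, False]]
u, v = als_rank1(A, mask)
imputed = u[1] * v[1]  # the model's estimate for the missing entry
```

Phenotype discovery then reads groups off the fitted factors: patients with similar row-factor values share a latent pattern, and the factorization also imputes the unobserved cells, which is how this family of models tolerates missing data.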
Affiliation(s)
- Alejandro Schuler
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
- Vincent Liu
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
- Joe Wan
- Computer Science, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA
- Alison Callahan
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
- Madeleine Udell
- Center for the Mathematics of Information, California Institute of Technology, Pasadena, CA 91125, USA
- David E. Stark
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
259
Klann JG, Phillips LC, Turchin A, Weiler S, Mandl KD, Murphy SN. A numerical similarity approach for using retired Current Procedural Terminology (CPT) codes for electronic phenotyping in the Scalable Collaborative Infrastructure for a Learning Health System (SCILHS). BMC Med Inform Decis Mak 2015; 15:104. [PMID: 26655696 PMCID: PMC4676189 DOI: 10.1186/s12911-015-0223-x]
Abstract
BACKGROUND Interoperable phenotyping algorithms, needed to identify patient cohorts meeting eligibility criteria for observational studies or clinical trials, require medical data in a consistent structured, coded format. Data heterogeneity limits such algorithms' applicability. Existing approaches are often not widely interoperable, or have low sensitivity due to reliance on the lowest common denominator (ICD-9 diagnoses). In the Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) we endeavor to use the widely available Current Procedural Terminology (CPT) procedure codes with ICD-9. Unfortunately, CPT changes drastically year to year: codes are retired and replaced. Longitudinal analysis requires grouping retired and current codes. BioPortal provides a navigable CPT hierarchy, which we imported into the Informatics for Integrating Biology and the Bedside (i2b2) data warehouse and analytics platform. However, this hierarchy does not include retired codes. METHODS We compared BioPortal's 2014AA CPT hierarchy with Partners Healthcare's SCILHS datamart, comprising three million patients' data over 15 years. 573 CPT codes were not present in 2014AA (6.5 million occurrences). No existing terminology provided hierarchical linkages for these missing codes, so we developed a method that automatically places missing codes in the most specific "grouper" category, using the numerical similarity of CPT codes. Two informaticians reviewed the results. We incorporated the final table into our i2b2 SCILHS/PCORnet ontology, deployed it at seven sites, and performed a gap analysis and an evaluation against several phenotyping algorithms. RESULTS The reviewers found the method placed the code correctly with 97% precision when considering only miscategorizations ("correctness precision") and 52% precision using a gold standard of optimal placement ("optimality precision"). High correctness precision meant that codes were placed in a reasonable hierarchical position that a reviewer can quickly validate. Lower optimality precision meant that codes were often not placed in the optimal hierarchical subfolder. The seven sites encountered few occurrences of codes outside our ontology, 93% of which comprised just four codes. Our hierarchical approach correctly grouped retired and non-retired codes in most cases and extended the temporal reach of several important phenotyping algorithms. CONCLUSIONS We developed a simple, easily validated, automated method to place retired CPT codes into the BioPortal CPT hierarchy. This complements existing hierarchical terminologies, which do not include retired codes. The approach's utility is confirmed by the high correctness precision and successful grouping of retired with non-retired codes.
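The numerical-similarity idea can be sketched as follows: treat each grouper as a numeric code range and assign a retired code to the grouper it falls inside (preferring the narrowest range) or, failing that, the nearest one. The grouper names and ranges below are hypothetical, not the BioPortal hierarchy.

```python
def place_retired_code(code, groupers):
    """Pick the grouper whose numeric range best matches a retired CPT code.
    groupers maps name -> (low, high). A code inside a range scores distance 0
    (narrowest range wins the tie); otherwise the nearest boundary wins."""
    def score(name):
        lo, hi = groupers[name]
        if lo <= code <= hi:
            return (0, hi - lo)  # inside: prefer the most specific range
        return (min(abs(code - lo), abs(code - hi)), hi - lo)
    return min(groupers, key=score)

# Hypothetical grouper ranges for illustration only.
groupers = {
    "Surgery - Cardiovascular System": (33010, 37799),
    "Radiology - Diagnostic":          (70010, 76499),
    "Medicine - Cardiovascular":       (92920, 93799),
}
```

For example, a retired code numbered 93720 would land inside the "Medicine - Cardiovascular" range, while 76800 would be assigned to "Radiology - Diagnostic" as the nearest range.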
Affiliation(s)
- Jeffrey G Klann
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
- Alexander Turchin
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Harvard Clinical Research Institute, Boston, MA, USA
- Kenneth D Mandl
- Harvard Medical School, Boston, MA, USA; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Shawn N Murphy
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
260
Low YS, Gallego B, Shah NH. Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records. J Comp Eff Res 2015; 5:179-92. [PMID: 26634383 PMCID: PMC4933592 DOI: 10.2217/cer.15.53]
Abstract
Aims: Electronic health records (EHR), containing rich clinical histories of large patient populations, can provide evidence for clinical decisions when evidence from trials and literature is absent. To enable such observational studies from EHR in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. This study compares the performance of 18 automatic confounder control methods. Methods: Methods include propensity scores, direct adjustment by machine learning, similarity matching and resampling in two simulated and one real-world EHR datasets. Results & conclusions: Direct adjustment by lasso regression and ensemble models involving multiple resamples have performance comparable to expert-based propensity scores and thus, may help provide real-time EHR-based evidence for timely clinical decisions.
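One of the simplest members of the method family this abstract compares, a propensity score used as an inverse-probability weight, can be sketched as below. The logistic model and toy data are invented for illustration; the study itself benchmarks 18 far more elaborate variants.

```python
import math

def fit_logistic(X, t, lr=1.0, steps=2000):
    """Propensity model P(treated | x) by plain gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # intercept + one weight per covariate
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            p = 1 / (1 + math.exp(-(w[0] + sum(a * b for a, b in zip(w[1:], xi)))))
            err = p - ti
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def ipw_ate(X, t, y, w):
    """Average treatment effect with inverse-probability-of-treatment weights."""
    s1 = n1 = s0 = n0 = 0.0
    for xi, ti, yi in zip(X, t, y):
        p = 1 / (1 + math.exp(-(w[0] + sum(a * b for a, b in zip(w[1:], xi)))))
        if ti:
            s1 += yi / p; n1 += 1 / p
        else:
            s0 += yi / (1 - p); n0 += 1 / (1 - p)
    return s1 / n1 - s0 / n0

# Confounded toy data: covariate c drives both treatment and outcome;
# the true treatment effect is 1, but the naive difference is badly biased.
X = [[1]] * 10 + [[0]] * 10
t = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
y = [3.0] * 8 + [2.0] * 2 + [1.0] * 2 + [0.0] * 8
naive = (sum(yi for yi, ti in zip(y, t) if ti) / 10
         - sum(yi for yi, ti in zip(y, t) if not ti) / 10)   # 2.2, biased
ate = ipw_ate(X, t, y, fit_logistic(X, t))                   # close to 1.0
```

Weighting each subject by the inverse of their estimated treatment probability rebalances the confounder across arms, which is why the weighted contrast recovers the true effect here.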
Affiliation(s)
- Yen Sia Low
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
- Blanca Gallego
- Center for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Nigam Haresh Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
261
Kotfila C, Uzuner Ö. A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases. J Biomed Inform 2015; 58 Suppl:S92-S102. [PMID: 26241355 PMCID: PMC4994187 DOI: 10.1016/j.jbi.2015.07.016]
Abstract
Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) using individual feature spaces as well as combinations of these feature spaces on two small training corpora (730 and 790 documents) and a combined (1520 documents) training corpus. We assess the importance of feature spaces and training data size on SVM model performance. We show that inclusion of semantically-informed features does not statistically improve performance for these models. The addition of training data has weak effects of mixed statistical significance across disease classes suggesting larger corpora are not necessary to achieve relatively high performance with these models.
Affiliation(s)
- Christopher Kotfila
- Informatics Department, University at Albany, State University of New York, Albany, NY, USA
- Özlem Uzuner
- Department of Information Studies, University at Albany, State University of New York, NY, USA
262
Escudié JB, Jannot AS, Zapletal E, Cohen S, Malamut G, Burgun A, Rance B. Reviewing 741 patients records in two hours with FASTVISU. AMIA Annu Symp Proc 2015; 2015:553-559. [PMID: 26958189 PMCID: PMC4765586]
Abstract
The secondary use of electronic health records opens up new perspectives. They provide researchers with structured data and unstructured data, including free-text reports. Many applications have been developed to leverage knowledge from free-text reports, but manual review of documents is still a complex process. We developed FASTVISU, a web-based application to assist clinicians in reviewing documents. We used FASTVISU to review a set of 6340 documents from 741 patients suffering from celiac disease. A first automated selection pruned the original set to 847 documents from 276 patients' records. The records were reviewed by two trained physicians to identify the presence of 15 auto-immune diseases. The two reviewers took two hours and two and a half hours, respectively, to evaluate the entire corpus. Inter-annotator agreement was high (Cohen's kappa of 0.89). FASTVISU is a user-friendly modular solution to validate entities extracted by NLP methods from free-text documents stored in clinical data warehouses.
Affiliation(s)
- Jean-Baptiste Escudié
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM UMRS1138, Paris Descartes University, Paris, France
- Anne-Sophie Jannot
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM UMRS1138, Paris Descartes University, Paris, France
- Eric Zapletal
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France
- Sarah Cohen
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM UMRS1138, Paris Descartes University, Paris, France
- Georgia Malamut
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France
- Anita Burgun
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM UMRS1138, Paris Descartes University, Paris, France
- Bastien Rance
- University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM UMRS1138, Paris Descartes University, Paris, France
263
Dligach D, Miller T, Savova GK. Semi-supervised Learning for Phenotyping Tasks. AMIA Annu Symp Proc 2015; 2015:502-511. [PMID: 26958183 PMCID: PMC4765699]
Abstract
Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of the required manual chart review.
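The weighting idea can be sketched with a semi-supervised Naive Bayes classifier, a common companion model for EM-based text classification (the study's own feature sets and phenotypes are far richer). The factor `lam` multiplies the soft counts contributed by unlabeled examples, in the spirit of the "augmented EM" described above; all data and names below are invented.

```python
import math

def fit_nb(data):
    """Bernoulli Naive Bayes from weighted soft-labeled data:
    data = list of (features, p_class1, instance_weight)."""
    n1 = sum(p * w for _, p, w in data)
    n0 = sum((1 - p) * w for _, p, w in data)
    d = len(data[0][0])
    prior1 = (n1 + 1) / (n1 + n0 + 2)  # Laplace smoothing throughout
    t1 = [(sum(p * w * f[j] for f, p, w in data) + 1) / (n1 + 2) for j in range(d)]
    t0 = [(sum((1 - p) * w * f[j] for f, p, w in data) + 1) / (n0 + 2) for j in range(d)]
    return prior1, t1, t0

def posterior1(f, model):
    """P(class 1 | features) under the fitted model."""
    prior1, t1, t0 = model
    def loglik(prior, theta):
        return math.log(prior) + sum(
            math.log(theta[j] if f[j] else 1 - theta[j]) for j in range(len(f)))
    return 1 / (1 + math.exp(loglik(1 - prior1, t0) - loglik(prior1, t1)))

def augmented_em(labeled, unlabeled, lam=0.1, iters=10):
    """EM that downweights unlabeled data by lam (lam=1 recovers basic EM)."""
    hard = [(f, float(y), 1.0) for f, y in labeled]
    model = fit_nb(hard)
    for _ in range(iters):
        soft = [(f, posterior1(f, model), lam) for f in unlabeled]  # E-step
        model = fit_nb(hard + soft)                                 # M-step
    return model

labeled = [((1, 0), 1), ((0, 1), 0)]          # two chart-reviewed notes
unlabeled = [(1, 0), (1, 0), (0, 1), (0, 1)]  # plentiful unreviewed notes
model = augmented_em(labeled, unlabeled, lam=0.5)
```

With only two labeled notes, the unlabeled pool sharpens the class-conditional word probabilities, which is the savings in chart review the abstract describes.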
Affiliation(s)
- Dmitriy Dligach
- Boston Children's Hospital and Harvard Medical School, Boston, MA
- Timothy Miller
- Boston Children's Hospital and Harvard Medical School, Boston, MA
264
Han D, Wang S, Jiang C, Jiang X, Kim HE, Sun J, Ohno-Machado L. Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 2015; 22:1153-63. [PMID: 26555018 PMCID: PMC5009912 DOI: 10.1093/jamia/ocv157]
Abstract
Biomedical informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics, and its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We used tensors (i.e., multidimensional arrays) to represent the interaction among topics, time, and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of the most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, articles related to the MeSH terms Methods, Organization & Administration, and Algorithms increased significantly in both number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
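As a toy illustration of tensor decomposition (the study decomposes a higher-rank topics x time x citations tensor; the snippet below is a generic rank-1 sketch with invented data), alternating least squares recovers one multiplicative component of a 3-way array:

```python
def cp_rank1(T, iters=20):
    """Rank-1 CP decomposition by alternating least squares:
    T[i][j][k] ~ a[i] * b[j] * c[k] for a 3-way tensor given as nested lists."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(iters):
        for i in range(I):  # update each factor with the other two fixed
            num = sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
            den = sum((b[j] * c[k]) ** 2 for j in range(J) for k in range(K))
            a[i] = num / den
        for j in range(J):
            num = sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
            den = sum((a[i] * c[k]) ** 2 for i in range(I) for k in range(K))
            b[j] = num / den
        for k in range(K):
            num = sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
            den = sum((a[i] * b[j]) ** 2 for i in range(I) for j in range(J))
            c[k] = num / den
    return a, b, c

# Exactly rank-1 toy "topic x year x citation-window" tensor.
A, B, C = [1.0, 2.0], [1.0, 3.0], [2.0, 1.0]
T = [[[A[i] * B[j] * C[k] for k in range(2)] for j in range(2)] for i in range(2)]
a, b, c = cp_rank1(T)
```

Each recovered factor triple is then read as one trend: a loading per topic, a profile over years, and a citation profile, which is the kind of interpretation the study performs on its components.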
Affiliation(s)
- Dong Han
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Shuang Wang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Chao Jiang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Xiaoqian Jiang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Hyeon-Eui Kim
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Jimeng Sun
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30313, USA
- Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
265
Cai X, Perez-Concha O, Coiera E, Martin-Sanchez F, Day R, Roffe D, Gallego B. Real-time prediction of mortality, readmission, and length of stay using electronic health record data. J Am Med Inform Assoc 2015; 23:553-61. [PMID: 26374704 DOI: 10.1093/jamia/ocv110]
Abstract
OBJECTIVE To develop a predictive model for real-time predictions of length of stay, mortality, and readmission for hospitalized patients using electronic health records (EHRs). MATERIALS AND METHODS A Bayesian network model was built to estimate the probability of a hospitalized patient being "at home," in the hospital, or dead for each of the next 7 days. The network utilizes patient-specific administrative and laboratory data and is updated each time a new pathology test result becomes available. Electronic health records from 32 634 patients admitted to a Sydney metropolitan hospital via the emergency department from July 2008 through December 2011 were used. The model was trained on data from the earlier years and tested on the 2011 data. RESULTS The model achieved an average daily accuracy of 80% and area under the receiver operating characteristic curve (AUROC) of 0.82. The model's predictive ability was highest within 24 hours from prediction (AUROC = 0.83) and decreased slightly with time. Death was the most predictable outcome, with a daily average accuracy of 93% and AUROC of 0.84. DISCUSSION We developed the first non-disease-specific model that simultaneously predicts remaining days of hospitalization, death, and readmission as part of the same outcome. By providing a future daily probability for each outcome class, we enable the visualization of future patient trajectories. Among these, it is possible to identify trajectories indicating expected discharge, expected continuing hospitalization, expected death, and possible readmission. CONCLUSIONS Bayesian networks can model EHRs to provide real-time forecasts for patient outcomes, which provide richer information than traditional independent point predictions of length of stay, death, or readmission, and can thus better support decision making.
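The day-by-day outcome probabilities can be illustrated, in drastically simplified form, as a Markov chain over the three states the abstract names (at home, in hospital, dead). The real model is a Bayesian network updated with laboratory results; the transition numbers below are invented.

```python
def day_ahead_probs(p0, T, days=7):
    """Propagate a state distribution through a daily transition matrix:
    p_{d+1}[j] = sum_i p_d[i] * T[i][j]. States: 0=home, 1=hospital, 2=dead."""
    probs, p = [], p0[:]
    for _ in range(days):
        p = [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]
        probs.append(p)
    return probs

# Invented daily transition probabilities; death is an absorbing state.
T = [
    [0.90, 0.08, 0.02],  # home -> home / readmitted / dead
    [0.30, 0.60, 0.10],  # hospital -> discharged / still in / dead
    [0.00, 0.00, 1.00],  # dead -> dead
]
week = day_ahead_probs([0.0, 1.0, 0.0], T)  # patient currently hospitalized
```

Reading `week[d]` across the 7 days gives exactly the kind of per-day trajectory the abstract describes: a discharge probability that rises, a continuing-hospitalization probability that falls, and a death probability that can only accumulate.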
Affiliation(s)
- Xiongcai Cai
- School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
- Oscar Perez-Concha
- Centre of Health Informatics, AIHI, Macquarie University, Sydney, Australia
- Enrico Coiera
- Centre of Health Informatics, AIHI, Macquarie University, Sydney, Australia
- Richard Day
- School of Medical Sciences, The University of New South Wales, Sydney, Australia
- David Roffe
- Information Technology Service Centre, St Vincent's Hospital, Sydney, Australia
- Blanca Gallego
- Centre of Health Informatics, AIHI, Macquarie University, Sydney, Australia
266
|
Chen Q, Li H, Tang B, Wang X, Liu X, Liu Z, Liu S, Wang W, Deng Q, Zhu S, Chen Y, Wang J. An automatic system to identify heart disease risk factors in clinical texts over time. J Biomed Inform 2015; 58 Suppl:S158-S163. [PMID: 26362344 DOI: 10.1016/j.jbi.2015.09.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Revised: 08/22/2015] [Accepted: 09/01/2015] [Indexed: 02/04/2023]
Abstract
Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many methods have been proposed to identify risk factors associated with heart disease; however, none have attempted to identify all of them. In 2014, the National Center for Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and to track its progression across sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in the patient's medical history was required. Our participation led to the development of a hybrid pipeline system combining machine learning-based and rule-based approaches. Evaluation on the challenge corpus showed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.
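As a minimal illustration of the rule-based half of such a hybrid pipeline, the sketch below tags a few risk-factor mentions with regular expressions. The categories and patterns are invented and far simpler than the challenge's full tag-and-attribute schema (which also tracks time attributes across longitudinal records).

```python
import re

# Hypothetical rule set: each tag maps to a pattern for surface mentions.
RULES = {
    "HYPERTENSION": re.compile(r"\b(hypertension|HTN)\b", re.I),
    "DIABETES":     re.compile(r"\b(diabetes|DM)\b", re.I),
    "SMOKER":       re.compile(r"\bsmok(er|ing|es)\b", re.I),
    "MEDICATION":   re.compile(r"\b(aspirin|metformin|statin)\b", re.I),
}

def tag_risk_factors(note: str):
    """Return (tag, matched text, character offset) triples found in a note."""
    hits = []
    for tag, pattern in RULES.items():
        for m in pattern.finditer(note):
            hits.append((tag, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])

note = "Pt with HTN and type 2 diabetes, former smoker, takes metformin daily."
for tag, text, pos in tag_risk_factors(note):
    print(tag, text, pos)
```

In the published systems, output like this is typically merged with machine-learned (e.g. CRF or SVM) predictions, which is where most of the reported F1 comes from.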
Affiliation(s)
- Qingcai Chen: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Haodi Li: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Buzhou Tang: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Xiaolong Wang: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Xin Liu: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Zengjian Liu: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Shu Liu: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Weida Wang: Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
- Qiwen Deng: The Sixth People's Hospital of Shenzhen, Shenzhen 518052, China
- Suisong Zhu: The Sixth People's Hospital of Shenzhen, Shenzhen 518052, China
- Yangxin Chen: Department of Cardiology, Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou 510120, China
- Jingfeng Wang: Department of Cardiology, Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou 510120, China
|
267
|
Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015; 22:1220-30. [PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). METHODS A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. RESULTS We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. CONCLUSION A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
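Desideratum (4), modeling phenotype logic with set operations and relational algebra over queryable clinical data, can be made concrete with a toy sketch. The patient sets and the case definition below are invented for illustration only.

```python
# Hypothetical query results: sets of patient IDs returned by structured queries.
patients_with_icd = {"p1", "p2", "p3", "p5"}   # >=1 relevant ICD diagnosis code
patients_with_two_icd = {"p1", "p3"}           # >=2 codes on distinct days
patients_on_drug = {"p1", "p2", "p4"}          # on a disease-specific medication
patients_excluded = {"p2"}                     # e.g. a conflicting diagnosis

# Invented case definition, expressed purely with set algebra:
# (>=2 ICD codes) OR (>=1 ICD code AND medication), minus exclusions.
cases = (patients_with_two_icd
         | (patients_with_icd & patients_on_drug)) - patients_excluded
print(sorted(cases))  # -> ['p1', 'p3']
```

Because the logic is plain set algebra over query results, the same definition stays computable whether the underlying queries run against one site's schema or a common data model, which is the portability the desiderata are after.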
Affiliation(s)
- Huan Mo: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- William K Thompson: Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
- Luke V Rasmussen: Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Jennifer A Pacheco: Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Guoqian Jiang: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Richard Kiefer: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Qian Zhu: Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA
- Jie Xu: Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Enid Montague: Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Todd Lingren: Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Frank D Mentch: Center for Applied Genomics, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Yizhao Ni: Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
- Firas H Wehbe: Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peggy L Peissig: Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
- Gerard Tromp: Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Stellenbosch, Cape Town, South Africa
- Christopher G Chute: Division of General Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
- Jyotishman Pathak: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Joshua C Denny: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Peter Speltz: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Abel N Kho: Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Gail P Jarvik: Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cosmin A Bejan: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Marc S Williams: Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Kenneth Borthwick: The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Terrie E Kitchner: Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
- Dan M Roden: Department of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Paul A Harris: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
|
268
|
Wei WQ, Teixeira PL, Mo H, Cronin RM, Warner JL, Denny JC. Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J Am Med Inform Assoc 2015; 23:e20-7. [PMID: 26338219 DOI: 10.1093/jamia/ocv130] [Citation(s) in RCA: 126] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2015] [Accepted: 07/15/2015] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE To evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Disease (ICD) diagnosis codes, primary notes, and specific medications. MATERIALS AND METHODS We conducted the evaluation using de-identified Vanderbilt EHR data. We preselected ten diseases: atrial fibrillation, Alzheimer's disease, breast cancer, gout, human immunodeficiency virus infection, multiple sclerosis, Parkinson's disease, rheumatoid arthritis, and types 1 and 2 diabetes mellitus. For each disease, patients were classified into seven categories based on the presence of evidence in diagnosis codes, primary notes, and specific medications. Twenty-five patients per disease category (a total of 175 patients for each disease, 1750 patients across all ten diseases) were randomly selected for manual chart review. Review results were used to estimate the positive predictive value (PPV), sensitivity, and F-score for each EHR component alone and in combination. RESULTS The PPVs of single components were inconsistent and inadequate for accurate phenotyping (0.06-0.71). Using two or more ICD codes improved the average PPV to 0.84. We observed more stable and higher accuracy when using at least two components (mean ± standard deviation: 0.91 ± 0.08). Primary notes offered the best sensitivity (0.77). The sensitivity of ICD codes was 0.67. Again, two or more components provided reasonably high and stable sensitivity (0.59 ± 0.16). Overall, the best performance (F-score: 0.70 ± 0.12) was achieved by using two or more components. Although the overall performance of using ICD codes alone (0.67 ± 0.14) was only slightly lower than that of two or more components, its PPV (0.71 ± 0.13) was substantially worse (0.91 ± 0.08). CONCLUSION Multiple EHR components provide more consistent and higher performance than a single one for the selected phenotypes. We suggest considering multiple EHR components in future phenotyping designs in order to obtain ideal results.
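The reported metrics follow the standard definitions. The sketch below computes PPV, sensitivity, and F-score from hypothetical chart-review counts, chosen only to echo the single-component versus multi-component contrast; the counts themselves are invented, not the study's data.

```python
def ppv_sens_f(tp, fp, fn):
    """Positive predictive value, sensitivity, and F-score from review counts."""
    ppv = tp / (tp + fp)           # PPV = precision
    sens = tp / (tp + fn)          # sensitivity = recall
    f = 2 * ppv * sens / (ppv + sens)
    return ppv, sens, f

# Invented counts illustrating the trade-off reported in the abstract:
# a single component with higher sensitivity but lower PPV, versus a
# two-component rule with higher PPV but lower sensitivity.
examples = {"ICD alone": (67, 29, 33), "ICD + notes": (59, 6, 41)}
for label, (tp, fp, fn) in examples.items():
    ppv, sens, f = ppv_sens_f(tp, fp, fn)
    print(f"{label}: PPV={ppv:.2f} sensitivity={sens:.2f} F={f:.2f}")
```

The F-score is the harmonic mean of the two, which is why a rule that trades a little sensitivity for a large PPV gain can still come out ahead overall.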
Affiliation(s)
- Wei-Qi Wei: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Pedro L Teixeira: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Huan Mo: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Robert M Cronin: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Jeremy L Warner: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Joshua C Denny: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
|
269
|
Daniel C, Choquet R. Information Technology for Clinical, Translational and Comparative Effectiveness Research. Findings from the Yearbook 2015 Section on Clinical Research Informatics. Yearb Med Inform 2015; 10:178-82. [PMID: 26293866 DOI: 10.15265/iy-2015-030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
OBJECTIVES To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and clinical care. METHOD We provide a synopsis of the articles selected for the IMIA Yearbook 2015, from which we attempt to derive a synthetic overview of current and future activities in the field. As last year, a first selection step was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor evaluated the set of 1,594 articles separately, and the evaluation results were merged to retain 15 articles for peer review. RESULTS The selection and evaluation process for this Yearbook's section on Bioinformatics and Translational Informatics yielded four excellent articles on data management and genome medicine, mainly tool-based papers. In the first article, the authors present PPISURV, a tool for uncovering the role of specific genes in cancer survival outcome. The second article describes the classifier PredictSNP, which combines six well-performing tools for predicting disease-related mutations. In the third article, by presenting a high-coverage map of the human proteome using high-resolution mass spectrometry, the authors highlight the need for mass spectrometry to complement genome annotation. The fourth article is also related to patient survival and decision support: the authors present data-mining methods for large-scale datasets of past transplants, with the objective of identifying chances of survival. CONCLUSIONS Current research activities continue to attest to the convergence of Bioinformatics and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care. Indeed, there is a need for powerful tools for managing and interpreting complex, large-scale genomic and biological datasets, but also for user-friendly tools that clinicians can use in daily practice. These research and development efforts all contribute to the challenge of translating results into clinical impact and personalized medicine.
Affiliation(s)
- C Daniel: Christel Daniel, MD, PhD, INSERM UMRS 1142, CCS Patient, Assistance Publique - Hôpitaux de Paris, 5 rue Santerre, 75012 Paris, France; Tel: +33 1 48 04 20 29
|
270
|
Velupillai S, Mowery D, South BR, Kvist M, Dalianis H. Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis. Yearb Med Inform 2015; 10:183-93. [PMID: 26293867 PMCID: PMC4587060 DOI: 10.15265/iy-2015-009] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVES We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. METHODS We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. RESULTS Significant articles published within this time span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection on the most recent developments and potential areas of future NLP development and applications. CONCLUSIONS There have been increasing advances within the key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.
Affiliation(s)
- S Velupillai: Sumithra Velupillai, Department of Computer and Systems Sciences, Stockholm University, Postbox 7003, 164 07 Kista, Sweden; Tel: +46 8 161 174; Fax: +46 8 703 9025
|
271
|
Xu J, Rasmussen LV, Shaw PL, Jiang G, Kiefer RC, Mo H, Pacheco JA, Speltz P, Zhu Q, Denny JC, Pathak J, Thompson WK, Montague E. Review and evaluation of electronic health records-driven phenotype algorithm authoring tools for clinical and translational research. J Am Med Inform Assoc 2015. [PMID: 26224336 DOI: 10.1093/jamia/ocv070] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVE To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS Candidate phenotype authoring tools were identified through (1) a literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these identified tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION Positive trends identified in the evaluation included: algorithms can be represented in both computable and human-readable formats, and most tools offer a web interface for easy access. However, issues were also identified: many tools lacked advanced logic functions for authoring complex algorithms; the ability to construct queries that leveraged unstructured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in terms of the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.
Affiliation(s)
- Jie Xu: Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Luke V Rasmussen: Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Pamela L Shaw: Galter Health Science Library, Clinical and Translational Sciences Institute (NUCATS), Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Guoqian Jiang: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Richard C Kiefer: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Huan Mo: Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jennifer A Pacheco: Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peter Speltz: Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Qian Zhu: Department of Information Systems, University of Maryland, Baltimore County (UMBC), Baltimore, MD, USA
- Joshua C Denny: Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jyotishman Pathak: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- William K Thompson: Center for Biomedical Research Informatics, NorthShore University Health System, Evanston, IL, USA
- Enid Montague: Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
272
|
Krasowski MD, Schriever A, Mathur G, Blau JL, Stauffer SL, Ford BA. Use of a data warehouse at an academic medical center for clinical pathology quality improvement, education, and research. J Pathol Inform 2015; 6:45. [PMID: 26284156 PMCID: PMC4530506 DOI: 10.4103/2153-3539.161615] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2015] [Accepted: 05/22/2015] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND Pathology data contained within the electronic health record (EHR) and laboratory information system (LIS) of hospitals represent a potentially powerful resource to improve clinical care. However, existing reporting tools within commercial EHR and LIS software may not be able to efficiently and rapidly mine data for quality improvement and research applications. MATERIALS AND METHODS We present our experience using a data warehouse produced collaboratively between an academic medical center and a private company. The data warehouse contains data from the EHR, LIS, admission/discharge/transfer system, and billing records, and can be accessed using a self-service data access tool known as Starmaker. The Starmaker software allows users to apply complex Boolean logic, include and exclude rules, unit conversion and reference scaling, and value aggregation through a straightforward visual interface. More complex queries can be written by users experienced with Structured Query Language. Queries can use biomedical ontologies such as Logical Observation Identifiers Names and Codes and Systematized Nomenclature of Medicine. RESULTS We present examples of successful Starmaker searches, mostly in the realm of microbiology and clinical chemistry/toxicology. These searches were either very difficult or essentially infeasible with the reporting tools of the EHR and LIS used in the medical center. One of the main strengths of Starmaker searches is rapid results, with typical searches covering 5 years taking only 1-2 min. A "Run Count" feature quickly outputs the number of cases meeting criteria, allowing searches to be refined before patient-identifiable data are downloaded. The Starmaker tool is available to pathology residents and fellows, some of whom use it for quality improvement and scholarly projects. CONCLUSION A data warehouse has significant potential for improving utilization of clinical pathology testing. Software that can access a data warehouse through a straightforward visual interface can be incorporated into pathology training programs.
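As a rough illustration of the kind of LOINC-coded, "Run Count"-style query such a warehouse supports, here is a self-contained SQLite sketch. The schema and rows are invented (Starmaker itself is a visual tool, not this API); LOINC 2160-0 denotes serum creatinine.

```python
import sqlite3

# In-memory stand-in for one warehouse table of lab results.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE lab_result "
    "(patient_id TEXT, loinc TEXT, value REAL, unit TEXT, taken TEXT)"
)
con.executemany(
    "INSERT INTO lab_result VALUES (?, ?, ?, ?, ?)",
    [
        ("p1", "2160-0", 0.9, "mg/dL", "2014-02-01"),   # creatinine, normal
        ("p1", "2160-0", 2.4, "mg/dL", "2014-06-11"),   # creatinine, elevated
        ("p2", "718-7", 13.5, "g/dL", "2014-03-05"),    # hemoglobin
    ],
)

# "Run Count"-style question: how many distinct patients ever had a
# creatinine result above 2.0 mg/dL?
(count,) = con.execute(
    "SELECT COUNT(DISTINCT patient_id) FROM lab_result "
    "WHERE loinc = '2160-0' AND value > 2.0"
).fetchone()
print(count)  # -> 1
```

Coding results by LOINC rather than by local test names is what lets one query span instruments and years, which is exactly why such warehouse searches outrun the native EHR/LIS reporting tools.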
Affiliation(s)
- Matthew D Krasowski: Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Gagan Mathur: Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- John L Blau: Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Stephanie L Stauffer: Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Bradley A Ford: Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
|
273
|
Leaman R, Khare R, Lu Z. Challenges in clinical natural language processing for automated disorder normalization. J Biomed Inform 2015; 57:28-37. [PMID: 26187250 DOI: 10.1016/j.jbi.2015.07.010] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2014] [Revised: 06/18/2015] [Accepted: 07/08/2015] [Indexed: 01/06/2023]
Abstract
BACKGROUND Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated lower performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions. METHODS We use closure properties to compare the richness of the vocabulary in clinical narrative text to that of biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data. RESULTS We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders. We apply our system, DNorm-C, to locate and normalize disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, F-score=0.753. For the normalization task (strict span+concept) it achieves precision=0.712, recall=0.637, F-score=0.672. The improvements described in this article increase the NER F-score by 0.039 and the normalization F-score by 0.036. We also describe a high-recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision. DISCUSSION We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are frequent causes of error, as are mentions that the annotators could not map to the controlled vocabulary. CONCLUSION Disorder mentions in clinical narrative text use a rich vocabulary that results in high term variation, which we believe to be a primary cause of the reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high-performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable to a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.)
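The pairwise learning-to-rank idea can be sketched with a toy perceptron-style ranker over token-pair features. This is a drastic simplification of DNorm's actual model (which learns a similarity matrix over TF-IDF vectors), and the training examples and features below are invented.

```python
from collections import defaultdict

def pair_feats(mention, candidate):
    """Features are (mention token, candidate token) pairs."""
    return [(mw, cw)
            for mw in mention.lower().split()
            for cw in candidate.lower().split()]

def score(w, mention, candidate):
    return sum(w[p] for p in pair_feats(mention, candidate))

# Toy training data: (mention, correct concept name, incorrect candidate).
train = [
    ("cardiac failure", "heart failure", "renal failure"),
    ("kidney failure", "renal failure", "heart failure"),
]

# Pairwise updates: whenever the correct candidate is not ranked above the
# incorrect one, reward correct pairs and penalize incorrect ones.
w = defaultdict(float)
for _ in range(5):
    for mention, correct, wrong in train:
        if score(w, mention, correct) <= score(w, mention, wrong):
            for p in pair_feats(mention, correct):
                w[p] += 1.0
            for p in pair_feats(mention, wrong):
                w[p] -= 1.0

# The ranker learns synonym pairs like ("cardiac", "heart") from the data,
# so it can prefer the right concept despite surface term variation.
concepts = ["heart failure", "renal failure"]
best = max(concepts, key=lambda c: score(w, "cardiac failure", c))
print(best)  # -> heart failure
```

The point of the pairwise setup is that term variation ("cardiac" vs. "heart") is learned directly from ranked training pairs rather than hand-coded in a lexicon, which is what makes the approach portable to the rich clinical vocabulary the abstract describes.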
Affiliation(s)
- Robert Leaman: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Ritu Khare: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Zhiyong Lu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
|
274
|
Gallego B, Walter SR, Day RO, Dunn AG, Sivaraman V, Shah N, Longhurst CA, Coiera E. Bringing cohort studies to the bedside: framework for a ‘green button’ to support clinical decision-making. J Comp Eff Res 2015; 4:191-197. [DOI: 10.2217/cer.15.12] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
When providing care, clinicians are expected to take note of clinical practice guidelines, which offer recommendations based on the available evidence. However, guidelines may not apply to individual patients with comorbidities, as such patients are typically excluded from clinical trials. Guidelines also tend not to provide relevant evidence on risks, secondary effects, and long-term outcomes. For many of these patients, querying the electronic health records of similar patients may provide an alternative source of evidence to inform decision-making. It is important to develop methods to support these personalized observational studies at the point of care, to understand when these methods may provide valid results, and to validate and integrate their findings with those from clinical trials.
Affiliation(s)
- Blanca Gallego: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
- Scott R Walter: Centre for Health Systems & Safety Research, Australian Institute of Health Innovation, Macquarie University, Australia
- Richard O Day: St Vincent's Clinical School, University of New South Wales, St Vincent's Hospital, Sydney, Australia
- Adam G Dunn: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
- Vijay Sivaraman: Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia
- Nigam Shah: Biomedical Informatics Research, Stanford School of Medicine, CA 94305-5479, USA
- Enrico Coiera: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
|
275
|
Rasmussen LV, Kiefer RC, Mo H, Speltz P, Thompson WK, Jiang G, Pacheco JA, Xu J, Zhu Q, Denny JC, Montague E, Pathak J. A Modular Architecture for Electronic Health Record-Driven Phenotyping. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2015; 2015:147-51. [PMID: 26306258 PMCID: PMC4525215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Increasing interest in and experience with electronic health record (EHR)-driven phenotyping has yielded multiple challenges that are at present only partially addressed. Many solutions require the adoption of a single software platform, often with an additional cost of mapping existing patient and phenotypic data to multiple representations. We propose a set of guiding design principles and a modular software architecture to bridge the gap to a standardized phenotype representation, dissemination and execution. Ongoing development leveraging this proposed architecture has shown its ability to address existing limitations.
Affiliation(s)
- Huan Mo
- Vanderbilt University, Nashville, TN
- Jie Xu
- Northwestern University, Chicago, IL
- Qian Zhu
- University of Maryland Baltimore County, Baltimore, MD

276

Wagholikar KB, MacLaughlin KL, Chute CG, Greenes RA, Liu H, Chaudhry R. Granular Quality Reporting for Cervical Cytology Testing. AMIA Jt Summits Transl Sci Proc 2015; 2015:178-82. [PMID: 26306264 PMCID: PMC4525216]
Abstract
Quality reporting for cervical cancer prevention focuses on patients with normal cervical cytology and excludes patients with cytological abnormalities, who may be at higher risk. The major obstacles to more granular reporting are the complexity of surveillance guidelines and free-text data. We performed an automated chart review to compare cytology testing rates for patients with 'atypical squamous cells of undetermined significance' (ASCUS) cytology against the rates for patients with normal cytology. We modeled the surveillance guidelines and extracted information from free-text cytology reports to perform this study on 28,101 female patients. Our results show that patients with ASCUS cytology had a significantly higher adherence rate (94.9%) than patients with normal cytology (90.4%). Overall, our study indicates that the quality of care varies significantly between high- and average-risk patients, and it demonstrates the use of health information technology for more granular reporting of cervical cytology testing.
Affiliation(s)
- Kavishwar B. Wagholikar
- Biomedical Statistics and Informatics, Arizona State University and Health Science Research, Mayo Clinic Scottsdale
- Kathy L. MacLaughlin
- Family Medicine, Arizona State University and Health Science Research, Mayo Clinic Scottsdale
- Christopher G. Chute
- Biomedical Statistics and Informatics, Arizona State University and Health Science Research, Mayo Clinic Scottsdale
- Robert A. Greenes
- Biomedical Informatics, Arizona State University and Health Science Research, Mayo Clinic Scottsdale
- Hongfang Liu
- Biomedical Statistics and Informatics, Arizona State University and Health Science Research, Mayo Clinic Scottsdale
- Rajeev Chaudhry
- Primary Care Internal Medicine, Mayo Clinic Rochester, Arizona State University and Health Science Research, Mayo Clinic Scottsdale

277

Yahi A, Tatonetti NP. A knowledge-based, automated method for phenotyping in the EHR using only clinical pathology reports. AMIA Jt Summits Transl Sci Proc 2015; 2015:64-8. [PMID: 26306239 PMCID: PMC4525265]
Abstract
The secondary use of electronic health records (EHR) represents unprecedented opportunities for biomedical discovery. Central to this goal is EHR phenotyping, also known as cohort identification, which remains a significant challenge. Complex phenotypes often require multivariate and multi-scale analyses, ultimately leading to manually created phenotype definitions. We present Ontology-driven Reports-based Phenotyping from Unique Signatures (ORPheUS), an automated approach to EHR phenotyping. We identify unique signatures of abnormal clinical pathology reports that correspond to pre-defined medical terms from biomedical ontologies. By using only the clinical pathology, or "lab", reports, we are able to mitigate clinical biases, enabling researchers to explore other dimensions of the EHR. We used ORPheUS to generate signatures for 858 diseases and validated them against reference cohorts for Type 2 Diabetes Mellitus (T2DM) and Atrial Fibrillation (AF). Our results suggest that our approach, using solely clinical pathology reports, is effective as a primary screening tool for automated clinical phenotyping.
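The screening step this abstract describes can be illustrated very simply: a disease "signature" is a set of abnormal-lab flags, and a patient screens positive when their abnormal results cover the signature. The signatures, flag names, and patients below are invented for illustration; the paper's actual signature construction is far more involved.

```python
# Hypothetical disease signatures over abnormal-lab flags.
SIGNATURES = {
    "T2DM": {"glucose_high", "hba1c_high"},
    "AF":   {"inr_abnormal", "bnp_high"},
}

# Hypothetical patients: each maps to the set of abnormal results
# extracted from their clinical pathology ("lab") reports.
PATIENT_LABS = {
    "p1": {"glucose_high", "hba1c_high", "ldl_high"},
    "p2": {"glucose_high"},
    "p3": {"inr_abnormal", "bnp_high"},
}

def screen(patient_labs, signatures):
    """Return {disease: sorted patient ids whose abnormal labs cover the signature}."""
    return {
        disease: sorted(pid for pid, labs in patient_labs.items() if sig <= labs)
        for disease, sig in signatures.items()
    }

cohorts = screen(PATIENT_LABS, SIGNATURES)
```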
Affiliation(s)
- Alexandre Yahi
- Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Columbia University, New York, NY, USA
- Nicholas P. Tatonetti
- Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Columbia University, New York, NY, USA

278

Halpern Y, Choi Y, Horng S, Sontag D. Using Anchors to Estimate Clinical State without Labeled Data. AMIA Annu Symp Proc 2014; 2014:606-615. [PMID: 25954366 PMCID: PMC4419996]
Abstract
We present a novel framework for learning to estimate and predict clinical state variables without labeled data. The resulting models can be used for electronic phenotyping, triggering clinical decision support, and cohort selection. The framework relies on key observations that we characterize and term "anchor variables". By specifying anchor variables, an expert encodes a certain amount of domain knowledge about the problem while the rest of learning proceeds in an unsupervised manner. The ability to build anchors upon standardized ontologies and the framework's ability to learn from unlabeled data promote generalizability across institutions. We additionally develop a user interface to enable experts to choose anchor variables in an informed manner. The framework is applied to electronic medical record-based phenotyping to enable real-time decision support in the emergency department. We validate the learned models using a prospectively gathered set of gold-standard responses from emergency physicians for nine clinically relevant variables.
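A toy rendering of the anchor idea (not the paper's full method): assume the token "nebulizer" anchors the latent state "asthma", so documents containing it serve as noisy positive labels, and we score every other word by how much more often it appears with the anchor than without. All documents and tokens below are made up.

```python
# Hypothetical ED notes, each reduced to a set of tokens.
DOCS = [
    {"wheezing", "nebulizer"},
    {"wheezing", "cough", "nebulizer"},
    {"fracture", "xray"},
    {"cough", "fever"},
]
ANCHOR = "nebulizer"  # assumed to imply the latent state "asthma"

def word_scores(docs, anchor):
    """P(word | anchor present) - P(word | anchor absent): a crude signal
    for which non-anchor words track the latent state."""
    pos = [d for d in docs if anchor in d]
    neg = [d for d in docs if anchor not in d]
    vocab = {w for d in docs for w in d} - {anchor}
    return {
        w: sum(w in d for d in pos) / len(pos) - sum(w in d for d in neg) / len(neg)
        for w in vocab
    }

scores = word_scores(DOCS, ANCHOR)
```

Words scoring high ("wheezing") generalize the anchor; words scoring low ("fracture") argue against the state. The published framework learns a full probabilistic model from this starting point.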
Affiliation(s)
- Steven Horng
- Beth Israel Deaconess Medical Center, Boston, MA

279

Nadkarni GN, Gottesman O, Linneman JG, Chase H, Berg RL, Farouk S, Nadukuru R, Lotay V, Ellis S, Hripcsak G, Peissig P, Weng C, Bottinger EP. Development and validation of an electronic phenotyping algorithm for chronic kidney disease. AMIA Annu Symp Proc 2014; 2014:907-916. [PMID: 25954398 PMCID: PMC4419875]
Abstract
Twenty-six million Americans are estimated to have chronic kidney disease (CKD), with increased risk for cardiovascular disease and end-stage renal disease. CKD is frequently undiagnosed and patients are unaware, hampering intervention. A tool for accurate and timely identification of CKD from electronic medical records (EMR) could improve healthcare quality and identify patients for research. As members of the eMERGE (electronic medical records and genomics) Network, we developed an automated phenotyping algorithm that can be deployed to rapidly identify diabetic and/or hypertensive CKD cases and controls in health systems with EMRs. It uses diagnostic codes, laboratory results, medication and blood pressure records, and textual information culled from notes. Validation statistics demonstrated a positive predictive value of 96% and a negative predictive value of 93.3%. Similar results were obtained on implementation by two independent eMERGE member institutions. The algorithm dramatically outperformed identification by ICD-9-CM codes alone, which achieved positive and negative predictive values of only 63% and 54%, respectively.
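A rule in the spirit of such an algorithm (the thresholds, codes, and field names below are illustrative, not the published definition) might require persistently low eGFR plus a qualifying diagnosis code:

```python
from datetime import date

# Hypothetical qualifying ICD-9 codes (diabetes, hypertension).
QUALIFYING_CODES = {"250.00", "401.9"}

def is_ckd_case(egfr_results, dx_codes):
    """egfr_results: list of (date, value in mL/min/1.73m^2);
    dx_codes: set of the patient's ICD-9 codes.
    CKD requires low eGFR (<60) sustained over >=90 days plus a
    qualifying diabetes or hypertension code."""
    low_dates = sorted(d for d, v in egfr_results if v < 60)
    chronic = bool(low_dates) and (low_dates[-1] - low_dates[0]).days >= 90
    return chronic and bool(dx_codes & QUALIFYING_CODES)
```

The published algorithm layers medications, blood pressure, and note text on top of this kind of lab/code core, which is what lifts the predictive values above code-only identification.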
Affiliation(s)
- Herbert Chase
- Marshfield Clinic Research Foundation, Marshfield, WI
- Samira Farouk
- Icahn School Of Medicine at Mount Sinai, New York, NY
- Vaneet Lotay
- Icahn School Of Medicine at Mount Sinai, New York, NY
- Steve Ellis
- Icahn School Of Medicine at Mount Sinai, New York, NY
- Peggy Peissig
- Marshfield Clinic Research Foundation, Marshfield, WI
- Chunhua Weng
- Columbia University Medical Center, New York, NY

280
Jackson RG, Ball M, Patel R, Hayes RD, Dobson RJB, Stewart R. TextHunter--A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research. AMIA Annu Symp Proc 2014; 2014:729-38. [PMID: 25954379 PMCID: PMC4420012]
Abstract
Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises both increased sample size and data richness - therefore unprecedented study power. However, in many medical domains, large amounts of potentially valuable data are contained within the free text clinical narrative. Manually reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in real world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data, construction of concept extraction machine learning models and their application to documents. Using confidence thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99% in real world use cases.
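The precision/recall trade-off that TextHunter's confidence thresholds exploit can be sketched directly: sweep thresholds over scored predictions and keep the lowest one whose precision meets the target, since that maximizes recall. The scored data below is made up.

```python
def threshold_for_precision(scored, target=0.9):
    """scored: list of (confidence, is_true_positive) from labelled data.
    Returns (threshold, precision, recall) for the lowest threshold whose
    precision reaches the target, or None if no threshold does."""
    total_pos = sum(1 for _, y in scored if y)
    for t in sorted({c for c, _ in scored}):
        kept = [(c, y) for c, y in scored if c >= t]
        tp = sum(1 for _, y in kept if y)
        precision = tp / len(kept)
        if precision >= target:
            # Thresholds ascend, so the first hit keeps the most items
            # and therefore has the highest recall at this precision.
            return t, precision, tp / total_pos
    return None

# Hypothetical classifier output on a labelled validation set.
scored = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True)]
best = threshold_for_precision(scored, target=0.9)
```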
Affiliation(s)
- Michael Ball
- King's College London (Institute of Psychiatry), London, UK
- Rashmi Patel
- King's College London (Institute of Psychiatry), London, UK
- Robert Stewart
- King's College London (Institute of Psychiatry), London, UK

281

Soguero-Ruiz C, Hindberg K, Rojo-Alvarez JL, Skrovseth SO, Godtliebsen F, Mortensen K, Revhaug A, Lindsetmo RO, Augestad KM, Jenssen R. Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records. IEEE J Biomed Health Inform 2014; 20:1404-15. [PMID: 25312965 DOI: 10.1109/jbhi.2014.2361688]
Abstract
The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested in feature selection and the corresponding medical interpretation of the features. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential of feature selection strategies. The purpose is earlier detection of AL, predicting it from data generated in the EHR before the actual complication occurs. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) a computation-intensive statistical criterion (bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC surgery (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision-making phase.
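The core of weight-based feature selection with a linear maximum-margin classifier can be sketched on toy data (the paper's statistical selection criteria are not reproduced here): train a linear classifier on bag-of-words counts with hinge loss, then rank features by the magnitude of their learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 toy "documents" with 5 word-count features; by construction, only
# feature 0 carries the label signal, the rest are noise.
X = rng.integers(0, 3, size=(200, 5)).astype(float)
y = np.where(X[:, 0] >= 1, 1.0, -1.0)

Xb = np.hstack([X, np.ones((200, 1))])   # append a bias column
w = np.zeros(6)
lr, lam = 0.05, 0.01
for _ in range(500):                     # subgradient descent on hinge loss
    mask = y * (Xb @ w) < 1              # margin violators
    grad = lam * w
    if mask.any():
        grad = grad - (y[mask, None] * Xb[mask]).mean(axis=0)
    w -= lr * grad

feature_weights = w[:5]                  # drop the bias weight
ranking = np.argsort(-np.abs(feature_weights))  # most informative first
```

In the bag-of-words setting each weight corresponds to a word, so the ranking doubles as a medically interpretable list of predictive terms.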
282

Hansen MM, Miron-Shatz T, Lau AYS, Paton C. Big Data in Science and Healthcare: A Review of Recent Literature and Perspectives. Contribution of the IMIA Social Media Working Group. Yearb Med Inform 2014; 9:21-6. [PMID: 25123717 DOI: 10.15265/iy-2014-0004]
Abstract
OBJECTIVES As technology continues to evolve across industries such as healthcare, science, education, and gaming, a sophisticated concept known as Big Data is surfacing. The concept of analytics aims to understand data. We set out to portray and discuss perspectives on the evolving use of Big Data in science and healthcare, and to examine some of the opportunities and challenges. METHODS A literature review was conducted to highlight the implications associated with the use of Big Data in scientific research and healthcare innovations, both on a large and small scale. RESULTS Scientists and healthcare providers may learn from one another when it comes to understanding the value of Big Data and analytics. Small data, derived by patients and consumers, also requires analytics to become actionable. Connectivism provides a framework for the use of Big Data and analytics in the areas of science and healthcare. This theory assists individuals in recognizing and synthesizing how human connections are driving the increase in data. Despite the volume and velocity of Big Data, it is truly about technology connecting humans and assisting them to construct knowledge in new ways. CONCLUSIONS The concept of Big Data and its associated analytics are to be taken seriously when approaching the use of vast volumes of both structured and unstructured data in science and healthcare. Future exploration of issues surrounding data privacy, confidentiality, and education is needed. A greater focus on data from social media, the quantified-self movement, and the application of analytics to "small data" would also be useful.
Affiliation(s)
- M M Hansen
- School of Nursing and Health Professions, University of San Francisco, San Francisco, California, USA

283

Richesson RL, Horvath MM, Rusincovitch SA. Clinical research informatics and electronic health record data. Yearb Med Inform 2014; 9:215-23. [PMID: 25123746 DOI: 10.15265/iy-2014-0009]
Abstract
OBJECTIVES The goal of this survey is to discuss the impact of the growing availability of electronic health record (EHR) data on the evolving field of Clinical Research Informatics (CRI), which is the union of biomedical research and informatics. RESULTS Major challenges for the use of EHR-derived data for research include the lack of standard methods for ensuring that data quality, completeness, and provenance are sufficient to assess the appropriateness of its use for research. Areas that need continued emphasis include methods for integrating data from heterogeneous sources, guidelines (including explicit phenotype definitions) for using these data in both pragmatic clinical trials and observational investigations, strong data governance to better understand and control quality of enterprise data, and promotion of national standards for representing and using clinical data. CONCLUSIONS The use of EHR data has become a priority in CRI. Awareness of underlying clinical data collection processes will be essential in order to leverage these data for clinical research and patient care, and will require multi-disciplinary teams representing clinical research, informatics, and healthcare operations. Considerations for the use of EHR data provide a starting point for practical applications and a CRI research agenda, which will be facilitated by CRI's key role in the infrastructure of a learning healthcare system.
Affiliation(s)
- R L Richesson
- Duke University School of Nursing, 2007 Pearson Bldg, 311 Trent Drive, Durham, NC, 27710, USA

284

Incorporating patient-reported outcome measures into the electronic health record for research: application using the Patient Health Questionnaire (PHQ-9). Qual Life Res 2014; 24:295-303. [DOI: 10.1007/s11136-014-0764-y]

285

Rasmussen LV. The electronic health record for translational research. J Cardiovasc Transl Res 2014; 7:607-14. [PMID: 25070682 DOI: 10.1007/s12265-014-9579-z]
Abstract
With growing adoption and use, the electronic health record (EHR) represents a rich source of clinical data that also offers many benefits for secondary use in biomedical research. Such benefits include access to a more comprehensive medical history, cost reductions, and increased efficiency in conducting research, as well as opportunities to evaluate new and expanded populations for sufficient statistical power. Existing work utilizing EHR data has uncovered some complexities and considerations for their use but, more importantly, has also generated practical lessons and solutions. Given an understanding of EHR data use in cardiovascular research, expanded adoption of this data source offers great potential to further transform the research landscape.
Affiliation(s)
- Luke V Rasmussen
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

286

Ho JC, Ghosh J, Steinhubl SR, Stewart WF, Denny JC, Malin BA, Sun J. Limestone: high-throughput candidate phenotype generation via tensor factorization. J Biomed Inform 2014; 52:199-211. [PMID: 25038555 DOI: 10.1016/j.jbi.2014.07.001]
Abstract
The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to the medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. Furthermore, existing approaches are often disease-centric and specialized to the idiosyncrasies of the information technology and/or business practices of a single healthcare organization. In this paper, we propose Limestone, a nonnegative tensor factorization method to derive phenotype candidates with virtually no human supervision. Limestone represents the data source interactions naturally using tensors (a generalization of matrices). In particular, we investigate the interaction of diagnoses and medications among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and medications. Using the proposed method, multiple phenotypes can be identified simultaneously from data. We demonstrate the capability of Limestone on a cohort of 31,815 patient records from the Geisinger Health System. The dataset spans 7 years of longitudinal patient records and was initially constructed for a heart failure onset prediction study. Our experiments demonstrate the robustness, stability, and conciseness of Limestone-derived phenotypes. Our results show that using only 40 phenotypes, we can outperform the original 640 features (169 diagnosis categories and 471 medication types) to achieve an area under the receiver operator characteristic curve (AUC) of 0.720 (95% CI 0.715 to 0.725). Moreover, in consultation with a medical expert, we confirmed that 82% of the top 50 candidates automatically extracted by Limestone are clinically meaningful.
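Limestone factorizes a patient x diagnosis x medication tensor; as a self-contained stand-in, this sketch runs a tiny nonnegative matrix factorization (multiplicative updates) on a flattened patient x (diagnosis, medication) co-occurrence matrix. The data, rank, and update count are made up, and the paper's tensor machinery is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# 8 patients, 6 (diagnosis, medication) pair counts, with two planted
# "phenotypes": patients 0-3 load on pairs 0-2, patients 4-7 on pairs 3-5.
V = np.zeros((8, 6))
V[:4, :3] = np.outer(rng.integers(1, 4, 4), rng.integers(1, 4, 3))
V[4:, 3:] = np.outer(rng.integers(1, 4, 4), rng.integers(1, 4, 3))

rank = 2
W = rng.random((8, rank)) + 0.1          # patient loadings
H = rng.random((rank, 6)) + 0.1          # candidate phenotypes
for _ in range(500):                     # multiplicative updates (Frobenius)
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Each row of H is a candidate phenotype over (diagnosis, medication)
# pairs; each row of W says how strongly a patient expresses it.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The nonnegativity constraint is what makes the factors readable as additive phenotype definitions rather than signed components.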
Affiliation(s)
- Joyce C Ho
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, United States
- Joydeep Ghosh
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, United States
- Steve R Steinhubl
- Scripps Translational Science Institute, Scripps Health, La Jolla, CA 92037, United States
- Walter F Stewart
- Sutter Health Research, Development, and Dissemination Team, Sutter Health, Walnut Creek, CA 94598, United States
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, United States; Department of Medicine, Vanderbilt University, Nashville, TN 37232, United States
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, United States; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37232, United States
- Jimeng Sun
- School of Computational Science and Engineering at College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, United States

287

Köpcke F, Prokosch HU. Employing computers for the recruitment into clinical trials: a comprehensive systematic review. J Med Internet Res 2014; 16:e161. [PMID: 24985568 PMCID: PMC4128959 DOI: 10.2196/jmir.3446]
Abstract
BACKGROUND Medical progress depends on the evaluation of new diagnostic and therapeutic interventions within clinical trials. Clinical trial recruitment support systems (CTRSS) aim to improve the recruitment process in terms of effectiveness and efficiency. OBJECTIVE The goals were to (1) create an overview of all CTRSS reported until the end of 2013, (2) find and describe similarities in design, (3) theorize on the reasons for different approaches, and (4) examine whether projects were able to illustrate the impact of CTRSS. METHODS We searched PubMed titles, abstracts, and keywords for terms related to CTRSS research. Query results were classified according to clinical context, workflow integration, knowledge and data sources, reasoning algorithm, and outcome. RESULTS A total of 101 papers on 79 different systems were found; most lacked details in one or more categories. Three types of CTRSS dominated: (1) systems for the retrospective identification of trial participants based on existing clinical data, typically through Structured Query Language (SQL) queries on relational databases; (2) systems that monitored the appearance of a key event in an existing health information technology component, where the occurrence of the event triggered a comprehensive eligibility test for a patient or was communicated directly to the researcher; and (3) independent systems that required a user to enter patient data into an interface to trigger an eligibility assessment. Whereas older systems required the treating physician to act on the patient's behalf, it is now becoming increasingly popular to offer this possibility directly to the patient. CONCLUSIONS Many CTRSS are designed to fit the existing infrastructure of a clinical care provider or the particularities of a trial. We conclude that the success of a CTRSS depends more on successful workflow integration than on sophisticated reasoning and data processing algorithms. Furthermore, the most recent literature suggests that an increase in recruited patients and improvements in recruitment efficiency can be expected, although the former depends on the error rate of the recruitment process being replaced. Finally, to increase the quality of future CTRSS reports, we propose a checklist of items that should be included.
Affiliation(s)
- Felix Köpcke
- Center for Information and Communication, University Hospital Erlangen, Erlangen, Germany