Reference Citation Analysis

For: Denny JC, Miller RA, Waitman LR, Arrieta MA, Peterson JF. Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor. Int J Med Inform 2009;78 Suppl 1:S34-42. [PMID: 18938105 DOI: 10.1016/j.ijmedinf.2008.09.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 02/21/2008] [Revised: 07/18/2008] [Accepted: 09/02/2008] [Indexed: 11/20/2022]

Cited by Other Article(s)

van Assen M, Tariq A, Razavi AC, Yang C, Banerjee I, De Cecco CN. Fusion Modeling: Combining Clinical and Imaging Data to Advance Cardiac Care. Circ Cardiovasc Imaging 2023;16:e014533. [PMID: 38073535 PMCID: PMC10754220 DOI: 10.1161/circimaging.122.014533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]

Cai J, Chen S, Guo S, Wang S, Li L, Liu X, Zheng K, Liu Y, Chen S. RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records. BMC Med Inform Decis Mak 2023;23:126. [PMID: 37464410 PMCID: PMC10353087 DOI: 10.1186/s12911-023-02239-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Accepted: 07/13/2023] [Indexed: 07/20/2023]

Abstract

BACKGROUND

The ovarian reserve is a reservoir for reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline characterized by abnormal ovarian reserve tests is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital's electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload.

METHODS

We presented RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automate the screening of premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as a text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable disease diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation.
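
The extraction step can be pictured with a minimal sketch of rule-based RE matching. The sketch makes three assumptions. First, the notes are in English. Second, the two ovarian reserve markers are FSH and AMH. Third, the patterns are hypothetical. RegEMR's actual expressions target Chinese EMRs and are not reproduced in this abstract.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns (not RegEMR's): a marker name, an optional ":" or "=",
# then a numeric value and its unit. Both the value and the unit are captured.
PATTERNS: Dict[str, re.Pattern] = {
    "FSH": re.compile(r"\bFSH\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mIU/mL|IU/L)", re.IGNORECASE),
    "AMH": re.compile(r"\bAMH\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(ng/mL)", re.IGNORECASE),
}


def extract_values(note: str) -> Dict[str, List[Tuple[float, str]]]:
    """Return every (value, unit) match for each marker, in order of appearance."""
    results: Dict[str, List[Tuple[float, str]]] = {}
    for name, pattern in PATTERNS.items():
        results[name] = [(float(m.group(1)), m.group(2)) for m in pattern.finditer(note)]
    return results


note = "Day 3 labs: FSH 12.4 mIU/mL, LH 5.1 mIU/mL; AMH: 0.8 ng/mL."
print(extract_values(note))  # {'FSH': [(12.4, 'mIU/mL')], 'AMH': [(0.8, 'ng/mL')]}
```

Each pattern captures the numeric value and its unit separately. Repeated measurements in one note are therefore listed in order, which is what tracing hormone levels over time requires.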

RESULTS

The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515). This did not differ significantly (paired t test p > 0.05) from machine-generated REs, whose performance could be affected by training set size and annotation portions. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518-0.9884) and ultrasonographic measures (F-score 0.9472-0.9822). Applying the extracted information to the proposed diagnostic model, the program obtained an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For each specific disease, the automatic diagnosis agreed with the clinical diagnosis in 76% of patients, and the kappa coefficient was 0.63.
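
The F-score and kappa coefficient reported above can be computed as in the following sketch. The labels in the usage lines are invented for illustration and are not study data.

```python
from collections import Counter
from typing import Sequence


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: for each label, multiply the two raters' marginal counts, then sum.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


automatic = ["A", "A", "B", "B"]
clinical = ["A", "B", "B", "B"]
print(cohens_kappa(automatic, clinical))  # 0.5
```

Raw agreement in this example is 75%. The raters' label frequencies alone predict 50% agreement by chance, so kappa comes out at 0.5. This illustrates why a 76% agreement rate can correspond to a kappa of 0.63 rather than 0.76.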

CONCLUSION

A Chinese NLP system named RegEMR was developed to automatically identify high risk of early ovarian aging and diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers.

Epstein RH, Jean YK, Dudaryk R, Freundlich RE, Walco JP, Mueller DA, Banks SE. Natural Language Mapping of Electrocardiogram Interpretations to a Standardized Ontology. Methods Inf Med 2021;60:104-109. [PMID: 34610644 DOI: 10.1055/s-0041-1736312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]

Abstract

BACKGROUND

Interpretations of the electrocardiogram (ECG) are often prepared using software outside the electronic health record (EHR) and imported via an interface as a narrative note. Thus, natural language processing is required to create a computable representation of the findings. Challenges include misspellings, nonstandard abbreviations, jargon, and equivocation in diagnostic interpretations.

OBJECTIVES

Our objective was to develop an algorithm to reliably and efficiently extract such information and map it to the standardized ECG ontology developed jointly by the American Heart Association, the American College of Cardiology Foundation, and the Heart Rhythm Society. The algorithm was designed to be easily modifiable for use with EHRs and ECG reporting systems other than the ones studied.

METHODS

An algorithm using natural language processing techniques was developed in structured query language to extract and map quantitative and diagnostic information from ECG narrative reports to the cardiology societies' standardized ECG ontology. The algorithm was developed using a training dataset of 43,861 ECG reports and applied to a test dataset of 46,873 reports.
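
The published algorithm is written in SQL with lookup tables. The Python sketch below is only an illustration of the quantitative-extraction step: it pulls rates and intervals out of a narrative report. The report phrasing and patterns are hypothetical assumptions, not the authors' code.

```python
import re
from typing import Dict, Optional

# Hypothetical phrasings such as "Vent. rate 72 BPM" or "QTc: 452 ms".
FIELDS: Dict[str, str] = {
    "ventricular_rate_bpm": r"\bvent(?:ricular|\.)?\s*rate\s*[:=]?\s*(\d+)\s*bpm",
    "atrial_rate_bpm": r"\batrial\s*rate\s*[:=]?\s*(\d+)\s*bpm",
    "pr_ms": r"\bPR\s*(?:interval)?\s*[:=]?\s*(\d+)\s*ms",
    "qtc_ms": r"\bQTc\s*(?:interval)?\s*[:=]?\s*(\d+)\s*ms",
}


def extract_measurements(report: str) -> Dict[str, Optional[int]]:
    """Return the first value found for each field, or None if it is absent."""
    values: Dict[str, Optional[int]] = {}
    for field_name, pattern in FIELDS.items():
        match = re.search(pattern, report, flags=re.IGNORECASE)
        values[field_name] = int(match.group(1)) if match else None
    return values


report = "Sinus rhythm. Vent. rate 72 BPM; PR interval 160 ms; QTc: 452 ms."
print(extract_measurements(report))
# {'ventricular_rate_bpm': 72, 'atrial_rate_bpm': None, 'pr_ms': 160, 'qtc_ms': 452}
```

A missing field is returned as None rather than 0. This keeps an absent measurement distinguishable from a recorded value.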

RESULTS

Accuracy, precision, recall, and the F1-measure were all 100% in the test dataset for the extraction of quantitative data (e.g., PR and QTc interval, atrial and ventricular heart rate). Performance for matches in each diagnostic category of the standardized ECG ontology was above 99% in the test dataset. The processing speed was approximately 20,000 reports per minute. We externally validated the algorithm on reports from another institution that used a different ECG reporting system and found similar performance.

CONCLUSION

The developed algorithm had high performance for creating a computable representation of ECG interpretations. Software and lookup tables are provided that can easily be modified for local customization and for use with other EHR and ECG reporting systems. This algorithm has utility for research and in clinical decision-support where incorporation of ECG findings is desired.

Robinson JR, Wei WQ, Roden DM, Denny JC. Defining Phenotypes from Clinical Data to Drive Genomic Research. Annu Rev Biomed Data Sci 2018;1:69-92. [PMID: 34109303 DOI: 10.1146/annurev-biodatasci-080917-013335] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022]

Ye Y, Larrat EP, Caffrey AR. Algorithms used to identify ventricular arrhythmias and sudden cardiac death in retrospective studies: a systematic literature review. Ther Adv Cardiovasc Dis 2017;12:39-51. [PMID: 29224509 DOI: 10.1177/1753944717745493] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/29/2022]

Karnes JH, Shaffer CM, Cronin R, Bastarache L, Gaudieri S, James I, Pavlos R, Steiner H, Mosley JD, Mallal S, Denny JC, Phillips EJ, Roden DM. Influence of Human Leukocyte Antigen (HLA) Alleles and Killer Cell Immunoglobulin-Like Receptors (KIR) Types on Heparin-Induced Thrombocytopenia (HIT). Pharmacotherapy 2017;37:1164-1171. [PMID: 28688202 PMCID: PMC5600645 DOI: 10.1002/phar.1983] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/17/2022]

Abstract

Heparin-induced thrombocytopenia (HIT) is an unpredictable, life-threatening, immune-mediated reaction to heparin. Variation in human leukocyte antigen (HLA) genes is now used to prevent immune-mediated adverse drug reactions. Combinations of HLA alleles and killer cell immunoglobulin-like receptors (KIR) are associated with multiple autoimmune diseases and infections. The objective of this study is to evaluate the association of HLA alleles and KIR types, alone or in the presence of different HLA ligands, with HIT. HIT cases and heparin-exposed controls were identified in BioVU, an electronic health record coupled to a DNA biobank. HLA sequencing and KIR type imputation using Illumina OMNI-Quad data were performed. Odds ratios for HLA alleles, KIR types, and HLA*KIR interactions were determined with conditional logistic regression in the overall population and by race/ethnicity. Analysis was restricted to KIR types and HLA alleles with a frequency greater than 0.01. The p values for HLA and KIR associations were corrected using a false discovery rate threshold of q<0.05, and HLA*KIR interactions were considered significant at p<0.05. Sixty-five HIT cases and 350 matched controls were identified. No statistical differences in baseline characteristics were observed between cases and controls. The HLA-DRB3*01:01 allele was significantly associated with HIT in the overall population (odds ratio 2.81 [1.57-5.02], p=2.1×10⁻⁴, q=0.02) and in individuals with European ancestry, independent of other alleles. No KIR types were associated with HIT, although a significant interaction was observed between KIR2DS5 and the HLA-C1 KIR binding group (p=0.03). The HLA-DRB3*01:01 allele was identified as a potential risk factor for HIT. This class II HLA gene and allele represent biologically plausible candidates for influencing HIT pathogenesis. We found limited evidence of the role of KIR types in HIT pathogenesis. Replication and further study of the HLA-DRB3*01:01 association is necessary.
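
The abstract states that HLA and KIR p values were corrected using a false discovery rate threshold of q<0.05. It does not name the procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method for turning raw p values into q values.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so q values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


raw = [0.01, 0.04, 0.03, 0.20]
print([q < 0.05 for q in bh_qvalues(raw)])  # [True, False, False, False]
```

Only the smallest p value survives the correction here. The raw values of 0.03 and 0.04 both rise above 0.05 once four tests are accounted for.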

Affiliation(s)

Jason H Karnes Department of Pharmacy Practice and Science, University of Arizona College of Pharmacy, Tucson, AZ; Sarver Heart Center, Tucson, AZ
Christian M Shaffer Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
Robert Cronin Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville
Lisa Bastarache Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville
Silvana Gaudieri School of Anatomy, Physiology and Human Biology, University of Western Australia, Nedlands, Western Australia, Australia; Division of Infectious Diseases, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN; Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
Ian James Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
Rebecca Pavlos Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia
Heidi Steiner Department of Pharmacy Practice and Science, University of Arizona College of Pharmacy, Tucson, AZ
Jonathan D Mosley Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
Simon Mallal Division of Infectious Diseases, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN; Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch, Western Australia, Australia; Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN
Joshua C Denny Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville
Elizabeth J Phillips Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN
Dan M Roden Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville; Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, TN

Cronin RM, Fabbri D, Denny JC, Rosenbloom ST, Jackson GP. A comparison of rule-based and machine learning approaches for classifying patient portal messages. Int J Med Inform 2017;105:110-120. [PMID: 28750904 DOI: 10.1016/j.ijmedinf.2017.06.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/15/2017] [Revised: 06/13/2017] [Accepted: 06/20/2017] [Indexed: 12/28/2022]

Abstract

OBJECTIVE

Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care.

MATERIALS AND METHODS

We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracy in identifying individual communication types within portal messages using the area under the receiver operating characteristic curve (AUC). Portal messages often contain more than one type of communication, so we used the Jaccard Index to evaluate prediction of all communication types within single messages. We extracted the variables of importance for the random forest classifiers.
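
The Jaccard Index scores a multi-label prediction by the overlap between the predicted and true label sets. The sketch below assumes that per-message scores are averaged across messages. The abstract does not state how the scores were aggregated.

```python
from typing import Iterable, Set, Tuple


def jaccard(predicted: Set[str], actual: Set[str]) -> float:
    """|intersection| / |union| of two label sets; two empty sets count as a match."""
    if not predicted and not actual:
        return 1.0
    return len(predicted & actual) / len(predicted | actual)


def mean_jaccard(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Average per-message Jaccard score over (predicted, actual) pairs."""
    scores = [jaccard(predicted, actual) for predicted, actual in pairs]
    return sum(scores) / len(scores)


messages = [
    ({"medical", "logistical"}, {"medical"}),  # one extra label: 1/2 = 0.5
    ({"social"}, {"social"}),                  # exact match: 1.0
]
print(mean_jaccard(messages))  # 0.75
```

Unlike per-type AUC, this score penalizes both missed and spurious labels within the same message.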

RESULTS

The best-performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). For individual communication subtypes, the best-performing approach was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, 0.674; Naïve Bayes, 0.799; random forests, 0.859; and logistic regression, 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning' and 'evening', and Idea_or_Concept, which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much'; informational: 'question', 'mean').

CONCLUSIONS

This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.

Kuo TT, Rao P, Maehara C, Doan S, Chaparro JD, Day ME, Farcas C, Ohno-Machado L, Hsu CN. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2017;2016:1880-1889. [PMID: 28269947 PMCID: PMC5333200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]

Teixeira PL, Wei WQ, Cronin RM, Mo H, VanHouten JP, Carroll RJ, LaRose E, Bastarache LA, Rosenbloom ST, Edwards TL, Roden DM, Lasko TA, Dart RA, Nikolai AM, Peissig PL, Denny JC. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J Am Med Inform Assoc 2016;24:162-171. [PMID: 27497800 DOI: 10.1093/jamia/ocw071] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 11/01/2015] [Revised: 04/03/2016] [Accepted: 04/07/2016] [Indexed: 12/11/2022]

Abstract

OBJECTIVE

Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites.

MATERIALS AND METHODS

We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic.

RESULTS

Random forests using billing codes, medications, vitals, and concepts had the best performance, with a median area under the receiver operating characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar.
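
AUC can be read as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below computes it directly from that definition, counting ties as one half. It is an illustration, not the study's evaluation code: the study reports median AUCs, and the resampling scheme behind those medians is not described in this abstract.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two hypertensive cases (1) and two controls (0).
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in cohort size. It is kept here for readability; a rank-based computation gives the same value more efficiently on large cohorts.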

CONCLUSION

This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.

Affiliation(s)

Pedro L Teixeira Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Robert M Cronin Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Huan Mo Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Jacob P VanHouten Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
Robert J Carroll Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Eric LaRose Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
Lisa A Bastarache Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
S Trent Rosenbloom Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
Todd L Edwards Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Dan M Roden Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, TN, USA
Thomas A Lasko Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
Richard A Dart Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 N Oak Ave-MLR, Marshfield, WI 54449, USA
Anne M Nikolai Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
Peggy L Peissig Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA

Luo Y, Szolovits P. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records. Biomedical Informatics Insights 2016;8:29-38. [PMID: 27478379 PMCID: PMC4954589 DOI: 10.4137/bii.s38916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/04/2016] [Revised: 06/13/2016] [Accepted: 06/22/2016] [Indexed: 11/07/2022]

Sauer BC, Jones BE, Globe G, Leng J, Lu CC, He T, Teng CC, Sullivan P, Zeng Q. Performance of a Natural Language Processing (NLP) Tool to Extract Pulmonary Function Test (PFT) Reports from Structured and Semistructured Veteran Affairs (VA) Data. eGEMS 2016;4:1217. [PMID: 27376095 PMCID: PMC4909376 DOI: 10.13063/2327-9214.1217] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]

Denny JC, Spickard A, Speltz PJ, Porier R, Rosenstiel DE, Powers JS. Using natural language processing to provide personalized learning opportunities from trainee clinical notes. J Biomed Inform 2015;56:292-9. [PMID: 26070431 DOI: 10.1016/j.jbi.2015.06.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/13/2014] [Revised: 06/01/2015] [Accepted: 06/03/2015] [Indexed: 12/20/2022]

Liu M, Hu Y, Tang B. Role of text mining in early identification of potential drug safety issues. Methods Mol Biol 2015;1159:227-51. [PMID: 24788270 DOI: 10.1007/978-1-4939-0709-0_13] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]

Karnes JH, Cronin RM, Rollin J, Teumer A, Pouplard C, Shaffer CM, Blanquicett C, Bowton EA, Cowan JD, Mosley JD, Van Driest SL, Weeke PE, Wells QS, Bakchoul T, Denny JC, Greinacher A, Gruel Y, Roden DM. A genome-wide association study of heparin-induced thrombocytopenia using an electronic medical record. Thromb Haemost 2014;113:772-81. [PMID: 25503805 DOI: 10.1160/th14-08-0670] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/13/2014] [Accepted: 10/27/2014] [Indexed: 12/20/2022]

Rosenbloom ST, Harris P, Pulley J, Basford M, Grant J, DuBuisson A, Rothman RL. The Mid-South clinical Data Research Network. J Am Med Inform Assoc 2014;21:627-32. [PMID: 24821742 PMCID: PMC4078290 DOI: 10.1136/amiajnl-2014-002745] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 11/04/2022]

Bui DDA, Zeng-Treitler Q. Learning regular expressions for clinical text classification. J Am Med Inform Assoc 2014;21:850-7. [PMID: 24578357 DOI: 10.1136/amiajnl-2013-002411] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/04/2022]

Smith J, Denny J, Chen Q, Nian H, Spickard III A, Rosenbloom ST, Miller RA. Lessons learned from developing a drug evidence base to support pharmacovigilance. Appl Clin Inform 2013;4:596-617. [PMID: 24454585 PMCID: PMC3885918 DOI: 10.4338/aci-2013-08-ra-0062] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/23/2013] [Accepted: 11/06/2013] [Indexed: 12/14/2022]

Suh KS, Sarojini S, Youssif M, Nalley K, Milinovikj N, Elloumi F, Russell S, Pecora A, Schecter E, Goy A. Tissue banking, bioinformatics, and electronic medical records: the front-end requirements for personalized medicine. Journal of Oncology 2013;2013:368751. [PMID: 23818899 PMCID: PMC3683471 DOI: 10.1155/2013/368751] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 02/07/2013] [Revised: 05/03/2013] [Accepted: 05/07/2013] [Indexed: 11/26/2022]

Theobald CN, Stover DG, Choma NN, Hathaway J, Green JK, Peterson NB, Sponsler KC, Vasilevskis EE, Kripalani S, Sergent J, Brown NJ, Denny JC. The effect of reducing maximum shift lengths to 16 hours on internal medicine interns' educational opportunities. Academic Medicine: Journal of the Association of American Medical Colleges 2013;88:512-518. [PMID: 23425987 PMCID: PMC3638874 DOI: 10.1097/acm.0b013e318285800f] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/01/2023]

Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 2013;127:1377-85. [PMID: 23463857 DOI: 10.1161/circulationaha.112.000604] [Citation(s) in RCA: 148] [Impact Index Per Article: 13.5] [Indexed: 12/16/2022]

Abstract

BACKGROUND

ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias.

METHODS AND RESULTS

We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium QRS genome-wide association study meta-analysis. Twenty-three single-nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 single-nucleotide polymorphisms were in the chromosome 3 SCN5A and SCN10A loci, where the most significant single-nucleotide polymorphisms were rs1805126 in SCN5A with P=1.2×10⁻⁸ (eMERGE) and P=2.5×10⁻²⁰ (CHARGE) and rs6795970 in SCN10A with P=6×10⁻⁶ (eMERGE) and P=5×10⁻²⁷ (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies on variants in these 5 loci in 13859 European Americans to search for diagnoses associated with these markers. Phenome-wide association study identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5272 "heart-healthy" study population.
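
A phenome-wide association study inverts the usual GWAS question: one variant is tested against many diagnoses. The sketch below illustrates that loop with an unadjusted Pearson chi-square test on a 2×2 carrier-by-diagnosis table. The published analyses may have used different models and covariates. The diagnosis codes and patient IDs here are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def phewas(carriers: Set[str], patients: Set[str],
           diagnoses: Dict[str, Set[str]]) -> List[Tuple[str, float, float]]:
    """Test one variant against every diagnosis; return (code, odds ratio, p) sorted by p."""
    noncarriers = patients - carriers
    results = []
    for code, cases in diagnoses.items():
        a = len(carriers & cases)     # carriers with the diagnosis
        b = len(carriers - cases)     # carriers without it
        c = len(noncarriers & cases)  # non-carriers with the diagnosis
        d = len(noncarriers - cases)  # non-carriers without it
        if b * c:
            odds_ratio = (a * d) / (b * c)
        else:
            odds_ratio = float("inf") if a * d else float("nan")
        results.append((code, odds_ratio, chi2_2x2(a, b, c, d)))
    return sorted(results, key=lambda r: r[2])


patients = {f"p{i}" for i in range(1, 9)}
carriers = {"p1", "p2", "p3", "p4"}
diagnoses = {"AF": {"p1", "p2", "p3", "p5"}, "other": {"p1", "p5"}}
for code, odds_ratio, p in phewas(carriers, patients, diagnoses):
    print(code, round(odds_ratio, 2), round(p, 3))
```

A real PheWAS tests hundreds of diagnosis codes, so the resulting p values would also need a multiple-testing correction.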

CONCLUSIONS

We conclude that DNA biobanks coupled to electronic medical records not only provide a platform for genome-wide association study but also may allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The phenome-wide association study approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.

Denny JC. Chapter 13: Mining electronic health records in the genomics era. PLoS Comput Biol 2012;8:e1002823. [PMID: 23300414 PMCID: PMC3531280 DOI: 10.1371/journal.pcbi.1002823] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 12/11/2022]

Abstract

Abstract: The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation. 
Because EHR data include a broad sampling of clinically relevant phenotypic information, they may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
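The abstract above describes combining structured data (billing codes) with NLP output from narrative text to assemble high-validity case and control sets. The following is a minimal Python sketch of that idea, not the chapter's own algorithm: the `PatientRecord` type, the `classify` function, the two-code threshold, and the assumption that an upstream NLP step has already produced a set of non-negated concept mentions are all hypothetical. Patients with conflicting evidence are set aside rather than forced into either group, echoing the abstract's note on manual chart review.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical, simplified view of one patient's EHR data.
    billing_codes: list[str] = field(default_factory=list)  # one entry per coded encounter
    nlp_mentions: set[str] = field(default_factory=set)     # concepts asserted (non-negated) in notes


def classify(record: PatientRecord, target_codes: set[str], target_concept: str,
             min_code_count: int = 2) -> str:
    """Return 'case', 'control', or 'excluded' for one patient (hypothetical rule)."""
    code_hits = sum(1 for code in record.billing_codes if code in target_codes)
    text_hit = target_concept in record.nlp_mentions
    if code_hits >= min_code_count and text_hit:
        return "case"      # structured and narrative evidence agree
    if code_hits == 0 and not text_hit:
        return "control"   # no evidence of the condition in either source
    return "excluded"      # conflicting or weak evidence: candidate for chart review
```

For example, `classify(PatientRecord(["427.31", "427.31"], {"atrial fibrillation"}), {"427.31"}, "atrial fibrillation")` returns `"case"`, while a record with a single matching code and no supporting text mention is returned as `"excluded"`. Requiring agreement between sources trades cohort size for precision, which is the trade-off the abstract implies when it speaks of high-validity collections.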


Liu M, McPeek Hinz ER, Matheny ME, Denny JC, Schildcrout JS, Miller RA, Xu H. Comparative analysis of pharmacovigilance methods in the detection of adverse drug reactions using electronic medical records. J Am Med Inform Assoc 2012;20:420-6. [PMID: 23161894 DOI: 10.1136/amiajnl-2012-001119] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022]

Denny JC, Choma NN, Peterson JF, Miller RA, Bastarache L, Li M, Peterson NB. Natural language processing improves identification of colorectal cancer testing in the electronic medical record. Med Decis Making 2012;32:188-197. [PMID: 21393557 PMCID: PMC9616628 DOI: 10.1177/0272989x11400418] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 09/16/2023]

Stenner SP, Johnson KB, Denny JC. PASTE: patient-centered SMS text tagging in a medication management system. J Am Med Inform Assoc 2011;19:368-74. [PMID: 21984605 DOI: 10.1136/amiajnl-2011-000484] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/03/2022]

Xu H, AbdelRahman S, Lu Y, Denny JC, Doan S. Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences. J Biomed Inform 2011;44:1068-75. [PMID: 21856440 DOI: 10.1016/j.jbi.2011.08.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/29/2011] [Revised: 07/26/2011] [Accepted: 08/07/2011] [Indexed: 11/20/2022]

Kho AN, Pacheco JA, Peissig PL, Rasmussen L, Newton KM, Weston N, Crane PK, Pathak J, Chute CG, Bielinski SJ, Kullo IJ, Li R, Manolio TA, Chisholm RL, Denny JC. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 2011;3:79re1. [PMID: 21508311 PMCID: PMC3690272 DOI: 10.1126/scitranslmed.3001807] [Citation(s) in RCA: 246] [Impact Index Per Article: 18.9] [Indexed: 11/02/2022]

Wilke RA, Xu H, Denny JC, Roden DM, Krauss RM, McCarty CA, Davis RL, Skaar T, Lamba J, Savova G. The emerging role of electronic medical records in pharmacogenomics. Clin Pharmacol Ther 2011;89:379-86. [PMID: 21248726 DOI: 10.1038/clpt.2010.260] [Citation(s) in RCA: 130] [Impact Index Per Article: 10.0] [Indexed: 01/24/2023]

Denny JC, Ritchie MD, Crawford DC, Schildcrout JS, Ramirez AH, Pulley JM, Basford MA, Masys DR, Haines JL, Roden DM. Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science. Circulation 2010;122:2016-21. [PMID: 21041692 PMCID: PMC2991609 DOI: 10.1161/circulationaha.110.948828] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Indexed: 01/01/2023]

Xu H, Lu Y, Jiang M, Liu M, Denny JC, Dai Q, Peterson NB. Mining biomedical literature for terms related to epidemiologic exposures. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2010;2010:897-901. [PMID: 21347108 PMCID: PMC3041399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]

Ramirez AH, Schildcrout JS, Blakemore DL, Masys DR, Pulley JM, Basford MA, Roden DM, Denny JC. Modulators of normal electrocardiographic intervals identified in a large electronic medical record. Heart Rhythm 2010;8:271-7. [PMID: 21044898 DOI: 10.1016/j.hrthm.2010.10.034] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 08/06/2010] [Accepted: 10/26/2010] [Indexed: 10/18/2022]

Abstract

BACKGROUND

Traditional electrocardiographic (ECG) reference ranges were derived from studies in communities or clinical trial populations. The distribution of ECG parameters in a large population presenting to a healthcare system has not been studied.

OBJECTIVE

The purpose of this study was to define the contribution of age, race, gender, height, body mass index, and type 2 diabetes mellitus to normal ECG parameters in a population presenting to a healthcare system.

METHODS

Study subjects were obtained from the Vanderbilt Synthetic Derivative, a de-identified image of the electronic medical record (EMR), containing more than 20 years of records on 1.7 million subjects. We identified 63,177 unique subjects with an ECG that was read as "normal" by the reviewing cardiologist. Using combinations of natural language processing and laboratory and billing code queries, we identified a subset of 32,949 subjects without cardiovascular disease, interfering medications, or abnormal electrolytes. The ethnic makeup was 77% Caucasian, 13% African American, 1% Hispanic, 1% Asian, and 8% unknown.

RESULTS

The ranges that included 95% of normal values were 125-196 ms for the PR interval, 69-103 ms for QRS duration, 365-458 ms for the QT interval corrected with the Bazett formula (QTc), and 54-96 bpm for heart rate. Linear regression modeling of patient characteristic effects reproduced known age and gender effects and identified novel associations with race, body mass index, and type 2 diabetes mellitus. A web-based application for patient-specific normal ranges is available online at http://biostat.mc.vanderbilt.edu/ECGPredictionInterval.

CONCLUSION

Analysis of a large set of EMR-derived normal ECGs reproduced known associations, found new relationships, and established patient-specific normal ranges. Such knowledge informs clinical and genetic research and may improve understanding of normal cardiac physiology.
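The RESULTS above express the QT interval corrected with the Bazett formula, QTc = QT / √RR, where RR is the interval between beats in seconds (60 / heart rate in bpm). The sketch below shows that correction and one way to compute a range covering the central 95% of a sample. It assumes the reported ranges are the 2.5th to 97.5th percentiles, which the abstract does not state explicitly; the function names are hypothetical, and the study's patient-specific ranges came from regression modeling that this sketch does not reproduce.

```python
import math
import statistics


def qtc_bazett(qt_ms: float, heart_rate_bpm: float) -> float:
    """Bazett correction: QTc = QT / sqrt(RR), with RR expressed in seconds."""
    rr_seconds = 60.0 / heart_rate_bpm
    return qt_ms / math.sqrt(rr_seconds)


def central_95_range(values: list[float]) -> tuple[float, float]:
    """Return the 2.5th and 97.5th percentiles of a sample (assumed definition)."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]
```

At a heart rate of 60 bpm the RR interval is exactly 1 s, so `qtc_bazett(400, 60)` returns `400.0`; at 75 bpm (RR = 0.8 s) a measured QT of 360 ms corrects to roughly 402.5 ms, which falls inside the 365-458 ms range reported above.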


Denny JC, Peterson JF, Choma NN, Xu H, Miller RA, Bastarache L, Peterson NB. Extracting timing and status descriptors for colonoscopy testing from electronic medical records. J Am Med Inform Assoc 2010;17:383-8. [PMID: 20595304 DOI: 10.1136/jamia.2010.004804] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022]

Liu M, Denny JC, Mani S, Chen Y, Hu Y, Xu H. Identifying potential drugs that induce QT prolongation using electronic medical records. BMC Bioinformatics 2010. [PMCID: PMC3290073 DOI: 10.1186/1471-2105-11-s4-p2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Ritchie MD, Denny JC, Crawford DC, Ramirez AH, Weiner JB, Pulley JM, Basford MA, Brown-Gentry K, Balser JR, Masys DR, Haines JL, Roden DM. Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. Am J Hum Genet 2010;86:560-72. [PMID: 20362271 DOI: 10.1016/j.ajhg.2010.03.003] [Citation(s) in RCA: 255] [Impact Index Per Article: 18.2] [Received: 12/11/2009] [Revised: 02/18/2010] [Accepted: 03/01/2010] [Indexed: 11/20/2022]

Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc 2010;17:19-24. [PMID: 20064797 DOI: 10.1197/jamia.m3378] [Citation(s) in RCA: 303] [Impact Index Per Article: 21.6] [Indexed: 11/10/2022]

Denny JC, Bastarache L, Sastre EA, Spickard A. Tracking medical students' clinical experiences using natural language processing. J Biomed Inform 2009;42:781-9. [PMID: 19236956 PMCID: PMC5490452 DOI: 10.1016/j.jbi.2009.02.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 06/04/2008] [Revised: 02/10/2009] [Accepted: 02/13/2009] [Indexed: 10/21/2022]

Discerning tumor status from unstructured MRI reports--completeness of information in existing reports and utility of automated natural language processing. J Digit Imaging 2009;23:119-32. [PMID: 19484309 PMCID: PMC2837158 DOI: 10.1007/s10278-009-9215-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 01/06/2009] [Revised: 04/07/2009] [Accepted: 05/02/2009] [Indexed: 11/24/2022]

Abstract

Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. Although NLP has proven useful for disease detection, its utility in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000–2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool used a support vector machine model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, and 50.3% lacked data on the magnitude of change when progression or regression was detectable. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.
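The abstract above evaluates NLP against human classification using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The sketch below shows how these four measures follow from a 2x2 confusion table of true/false positives and negatives. It is a simplification under stated assumptions: the study reports *mean* predictive values, which suggests averaging across several tumor status categories, whereas this hypothetical `diagnostic_metrics` function handles a single binary comparison, and the counts in the usage line are invented for illustration, not taken from the study.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    sensitivity: float  # TP / (TP + FN): share of true positives the system finds
    specificity: float  # TN / (TN + FP): share of true negatives correctly rejected
    ppv: float          # TP / (TP + FP): how often a positive call is right
    npv: float          # TN / (TN + FN): how often a negative call is right


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute the four standard measures from a binary confusion table."""
    return Metrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )
```

With hypothetical counts `diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)`, sensitivity is 0.8 and specificity is 0.9, while PPV (80/90) and NPV (90/110) also depend on how many positive and negative reports the test set happens to contain, which is why predictive values shift with prevalence even when sensitivity and specificity do not.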
