1. Exploring Patterns of Transportation-Related CO2 Emissions Using Machine Learning Methods. SUSTAINABILITY 2022. [DOI: 10.3390/su14084588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
While the transportation sector is one of the largest economic growth drivers for many countries, the adverse impacts of transportation on air quality are also well-noted, especially in developing countries. Carbon dioxide (CO2) emissions are one of the direct results of a transportation sector powered by burning fossil-based fuels. Detailed knowledge of CO2 emissions produced by the transportation sectors in various countries is essential for these countries to revise their future energy investments and policies. In this framework, three machine learning algorithms, ordinary least squares regression (OLS), support vector machine (SVM), and gradient boosting regression (GBR), are used to forecast transportation-based CO2 emissions. Both socioeconomic factors and transportation factors are included as features in the study. We study the top 30 CO2 emissions-producing countries, including the Tier 1 group (the top five countries, accounting for 61% of global CO2 emissions production) and the Tier 2 group (the next 25 countries, accounting for 35% of total CO2 emissions production). We evaluate our models using four-fold cross-validation and report four frequently used statistical metrics (R2, MAE, rRMSE, and MAPE). Of the three machine learning algorithms, the GBR model with features combining socioeconomic and transportation factors (GBR_ALL) has the best performance, with an R2 value of 0.9943, rRMSE of 0.1165, and MAPE of 0.1408. We also find that both transportation features and socioeconomic features are important for transportation-based CO2 emission prediction. Transportation features are more important when modeling all 30 countries, while socioeconomic features (especially GDP and population) are more important when modeling Tier 1 and Tier 2 countries.
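The four metrics named in this abstract, and the four-fold split, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `regression_metrics` and `k_fold_indices` are hypothetical, the toy numbers are invented for illustration, and rRMSE is assumed here to mean RMSE divided by the mean of the observed values (the abstract does not give its definition, and other normalizations exist).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, rRMSE and MAPE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # Assumption: relative RMSE = RMSE / mean of observed values.
        "rrmse": rmse / mean_true,
        # MAPE as a fraction (not multiplied by 100); requires non-zero observations.
        "mape": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


def k_fold_indices(n_samples, k=4):
    """Yield (train_indices, test_indices) pairs for an interleaved k-fold split."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


# Toy values, invented purely to exercise the functions.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
folds = list(k_fold_indices(8, k=4))
```

In a four-fold setup, each model would be fitted on the `train` indices and scored on the `test` indices of every fold, with the metrics averaged across folds.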
2. Liu F, Zheng X, Yu H, Tjia J. Neural Multi-Task Learning for Adverse Drug Reaction Extraction. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:756-762. [PMID: 33936450 PMCID: PMC8075418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A reliable and searchable knowledge database of adverse drug reactions (ADRs) is highly important and valuable for improving patient safety at the point of care. In this paper, we proposed a neural multi-task learning system, NeuroADR, to extract ADRs as well as relevant modifiers from free-text drug labels. Specifically, the NeuroADR system exploited a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, where interactions among the learned deep encoder representations from different subtasks are explored. Different from the conventional HMTL approach, NeuroADR adopted a novel task decomposition strategy to generate auxiliary subtasks for more inter-task interactions and integrated a new label encoding schema for better handling discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.
Affiliation(s)
- Feifan Liu
  - University of Massachusetts Medical School, Worcester, MA, USA
- Xiaoyu Zheng
  - University of Massachusetts Medical School, Worcester, MA, USA
- Hong Yu
  - University of Massachusetts Lowell, Lowell, MA, USA
- Jennifer Tjia
  - University of Massachusetts Medical School, Worcester, MA, USA
3. Sutphin C, Lee K, Yepes AJ, Uzuner Ö, McInnes BT. Adverse drug event detection using reason assignments in FDA drug labels. J Biomed Inform 2020; 110:103552. [PMID: 32890727 DOI: 10.1016/j.jbi.2020.103552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 08/27/2020] [Accepted: 08/29/2020] [Indexed: 10/23/2022]
Abstract
Adverse drug events (ADEs) are unintended incidents that involve the taking of a medication. ADEs pose significant health and financial problems worldwide. Information about ADEs can inform health care and improve patient safety. However, much of this information is buried in narrative texts and needs to be extracted with Natural Language Processing techniques, in order to be useful to computerized methods. ADEs can be found on drug labels, contained in the different sections such as descriptions of the drug's active components or more prominently in descriptions of studied side-effects. Extracting these automatically could be useful in triaging and processing drug reports. In this paper, we present three base methods consisting of a Conditional Random Field (CRF), a bi-directional Long Short Term Memory unit with a CRF layer (biLSTM+CRF), and a pre-trained Bi-directional Encoder Representations from Transformers (BERT) model. We also present several ensembles of the CRF and biLSTM+CRF methods for extracting ADEs and their Reason from FDA drug labels. We show that all three methods perform well on our task, and that combining the models through different ensemble methods can improve results, providing increases in recall for the majority class and improving precision for all other classes. We also show the potential of framing ADE extraction from drug labels as a multi-class classification task on the Reason, or type, of ADE.
Affiliation(s)
- Corey Sutphin
  - Virginia Commonwealth University, Richmond, VA, USA
- Kahyun Lee
  - George Mason University, Fairfax, VA, USA
4. Malec SA, Boyce RD. Exploring Novel Computable Knowledge in Structured Drug Product Labels. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020; 2020:403-412. [PMID: 32477661 PMCID: PMC7233092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
This paper introduces a database derived from Structured Product Labels (SPLs). SPLs are legally mandated snapshots containing information on all drugs released to market in the United States. Since publication is not required for pre-trial findings, we hypothesize that SPLs may contain knowledge absent in the literature, and hence "novel." SemMedDB is an existing database of computable knowledge derived from the literature. If SPL content could be similarly transformed, novel clinically relevant assertions in the SPLs could be identified through comparison with SemMedDB. After we derive a database (containing 4,297,481 assertions), we compare the extracted content with SemMedDB for recent FDA drug approvals. We find that novelty between the SPLs and the literature is nuanced, due to the redundancy of SPLs. Highlighting areas for improvement and future work, we conclude that SPLs contain a wealth of novel knowledge relevant to research and complementary to the literature.
Affiliation(s)
- Scott A Malec
  - University of Pittsburgh Department of Biomedical Informatics, Pittsburgh, PA
- Richard D Boyce
  - University of Pittsburgh Department of Biomedical Informatics, Pittsburgh, PA
5. Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] [Open Access]
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
  - Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
  - Naturalis Biodiversity Center, Leiden, The Netherlands
- Hiroyuki Mishima
  - Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
- Shin Kawano
  - Database Center for Life Science, Tokyo, Japan
- Yuki Moriya
  - Database Center for Life Science, Tokyo, Japan
- Hongyan Wu
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Erick Antezana
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Nobuyuki P. Aoki
  - Faculty of Science and Engineering, SOKA University, Tokyo, Japan
- Kazuharu Arakawa
  - Institute for Advanced Biosciences, Keio University, Tokyo, Japan
- Jerven T. Bolleman
  - SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Raoul J. P. Bonnal
  - Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
- Kees Burger
  - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
- Hirokazu Chiba
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Kevin B. Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
  - Université Paris-Saclay, LIMSI, CNRS, Paris, France
- Gang Fu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Naohisa Goto
  - Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Tudor Groza
  - St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- Colin Hercus
  - Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Kotone Itaya
  - Institute for Advanced Biosciences, Keio University, Tokyo, Japan
- Nick Juty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Jee-Hyub Kim
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Akira R. Kinjo
  - Institute for Protein Research, Osaka University, Osaka, Japan
- Masaaki Kotera
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Kouji Kozaki
  - The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
- Tatsuya Kushida
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Thomas Lütteke
  - Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
  - Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
- Attayeb Mohsen
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Hiroshi Mori
  - Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Yuki Naito
  - Database Center for Life Science, Tokyo, Japan
- Naoki Nishida
  - Department of Systems Science, Osaka University, Osaka, Japan
- Hiroyo Nishide
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Soichi Ogishima
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Tazro Ohta
  - Database Center for Life Science, Tokyo, Japan
- Shujiro Okuda
  - Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
- Philip Prathipati
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Pjotr Prins
  - University Medical Center Utrecht, Utrecht, The Netherlands
  - University of Tennessee Health Science Center, Memphis, USA
- Núria Queralt-Rosinach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Shinya Suzuki
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Tsuyosi Tabata
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mark Thompson
  - Leiden University Medical Center, Leiden, The Netherlands
- Ikuo Uchiyama
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Bruno Vieira
  - WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
- Chih-Hsuan Wei
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Mark Wilkinson
  - Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
- Kazutoshi Yoshitake
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Maastricht, The Netherlands
- Kenjiro Kosaki
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
- Toshihisa Takagi
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
6. Santiso S, Perez A, Casillas A. Exploring Joint AB-LSTM With Embedded Lemmas for Adverse Drug Reaction Discovery. IEEE J Biomed Health Inform 2019; 23:2148-2155. [DOI: 10.1109/jbhi.2018.2879744] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/08/2022]
7. Jagannatha A, Liu F, Liu W, Yu H. Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0). Drug Saf 2019; 42:99-111. [PMID: 30649735 PMCID: PMC6860017 DOI: 10.1007/s40264-018-0762-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 01/19/2023]
Abstract
INTRODUCTION This work describes the Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) corpus and provides an overview of the MADE 1.0 2018 challenge for extracting medication, indication, and adverse drug events (ADEs) from electronic health record (EHR) notes. OBJECTIVE The goal of MADE is to provide a set of common evaluation tasks to assess the state of the art for natural language processing (NLP) systems applied to EHRs supporting drug safety surveillance and pharmacovigilance. We also provide benchmarks on the MADE dataset using the system submissions received in the MADE 2018 challenge. METHODS The MADE 1.0 challenge has released an expert-annotated cohort of medication and ADE information comprising 1089 fully de-identified longitudinal EHR notes from 21 randomly selected patients with cancer at the University of Massachusetts Memorial Hospital. Using this cohort as a benchmark, the MADE 1.0 challenge designed three shared NLP tasks. The named entity recognition (NER) task identifies medications and their attributes (dosage, route, duration, and frequency), indications, ADEs, and severity. The relation identification (RI) task identifies relations between the named entities: medication-indication, medication-ADE, and attribute relations. The third shared task (NER-RI) evaluates NLP models that perform the NER and RI tasks jointly. In total, 11 teams from four countries participated in at least one of the three shared tasks, and 41 system submissions were received. RESULTS The best systems' F1 scores for NER, RI, and NER-RI were 0.82, 0.86, and 0.61, respectively. Ensemble classifiers using the team submissions improved the performance further, with F1 scores of 0.85, 0.87, and 0.66 for the three tasks, respectively. CONCLUSION MADE results show that recent progress in NLP has led to remarkable improvements in NER and RI tasks for the clinical domain. However, some room for improvement remains, particularly in the NER-RI task.
Affiliation(s)
- Abhyuday Jagannatha
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
- Feifan Liu
  - Department of Quantitative Health Sciences and Radiology, University of Massachusetts Medical School, Worcester, MA, USA
- Weisong Liu
  - Department of Computer Science, University of Massachusetts, 220 Pawtucket St., Lowell, MA, 01854-2874, USA
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Hong Yu
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
  - Department of Computer Science, University of Massachusetts, 220 Pawtucket St., Lowell, MA, 01854-2874, USA
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
  - Bedford VAMC, Bedford, MA, USA
8. Lamy JB, Berthelot H, Favre M, Ugon A, Duclos C, Venot A. Using visual analytics for presenting comparative information on new drugs. J Biomed Inform 2017; 71:58-69. [DOI: 10.1016/j.jbi.2017.04.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/01/2017] [Revised: 04/26/2017] [Accepted: 04/27/2017] [Indexed: 10/19/2022]
9. Moreno I, Boldrini E, Moreda P, Romá-Ferri MT. DrugSemantics: A corpus for Named Entity Recognition in Spanish Summaries of Product Characteristics. J Biomed Inform 2017. [PMID: 28624642 DOI: 10.1016/j.jbi.2017.06.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
For the healthcare sector, it is critical to exploit the vast amount of textual health-related information. Nevertheless, healthcare providers have difficulty benefiting from such a quantity of data during pharmacotherapeutic care. The problem is that such information is stored in different sources and the time available to consult them is limited. In this context, Natural Language Processing techniques can be applied to efficiently transform textual data into structured information so that it can be used in critical healthcare applications that help physicians with their daily workload, such as decision support systems, cohort identification, and patient management. Any development of these techniques requires annotated corpora. However, there is a lack of such resources in this domain and, in most cases, the few available concern English. This paper presents the definition and creation of the DrugSemantics corpus, a collection of Summaries of Product Characteristics in Spanish. It was manually annotated with pharmacotherapeutic named entities, as detailed in the DrugSemantics annotation scheme. Annotators were a Registered Nurse (RN) and two students from the Degree in Nursing. The quality of the DrugSemantics corpus has been assessed by measuring its annotation reliability (overall F=79.33% [95%CI: 78.35-80.31]), as well as its annotation precision (overall P=94.65% [95%CI: 94.11-95.19]). In addition, the gold-standard construction process is described in detail. In total, our corpus contains more than 2000 named entities, 780 sentences and 226,729 tokens. Lastly, a Named Entity Classification module trained on DrugSemantics is presented to demonstrate the quality of our corpus and to serve as an example of how to use it.
Affiliation(s)
- Isabel Moreno
  - Department of Software and Computing Systems, University of Alicante, Alicante, Spain
- Ester Boldrini
  - Department of Software and Computing Systems, University of Alicante, Alicante, Spain
- Paloma Moreda
  - Department of Software and Computing Systems, University of Alicante, Alicante, Spain
10. Sharp ME. Toward a comprehensive drug ontology: extraction of drug-indication relations from diverse information sources. J Biomed Semantics 2017; 8:2. [PMID: 28069052 PMCID: PMC5223332 DOI: 10.1186/s13326-016-0110-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 07/14/2015] [Accepted: 12/16/2016] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND Drug ontologies could help pharmaceutical researchers overcome information overload and speed the pace of drug discovery, thus benefiting the industry and patients alike. Drug-disease relations, specifically drug-indication relations, are a prime candidate for representation in ontologies. There is a wealth of available drug-indication information, but structuring and integrating it is challenging. RESULTS We created a drug-indication database (DID) of data from 12 openly available, commercially available, and proprietary information sources, integrated by terminological normalization to UMLS and other authorities. Across sources, there are 29,964 unique raw drug/chemical names, 10,938 unique raw indication "target" terms, and 192,008 unique raw drug-indication pairs. Drug/chemical name normalization to CAS numbers or UMLS concepts reduced the unique name count to 91 or 85% of the raw count, respectively, 84% if combined. Indication "target" normalization to UMLS "phenotypic-type" concepts reduced the unique term count to 57% of the raw count. The 12 sources of raw data varied widely in coverage (numbers of unique drug/chemical and indication concepts and relations) generally consistent with the idiosyncrasies of each source, but had strikingly little overlap, suggesting that we successfully achieved source/raw data diversity. CONCLUSIONS The DID is a database of structured drug-indication relations intended to facilitate building practical, comprehensive, integrated drug ontologies. The DID itself is not an ontology, but could be converted to one more easily than the contributing raw data. Our methodology could be adapted to the creation of other structured drug-disease databases such as for contraindications, precautions, warnings, and side effects.
Affiliation(s)
- Mark E Sharp
  - Scientific Information Management, Merck Research Laboratories, 770 Sumneytown Pike, West Point, Philadelphia, PA, 19486, USA
11. Martínez P, Martínez JL, Segura-Bedmar I, Moreno-Schneider J, Luna A, Revert R. Turning user generated health-related content into actionable knowledge through text analytics services. COMPUT IND 2016. [DOI: 10.1016/j.compind.2015.10.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/30/2022]
12. Adverse drug reactions in Colombian patients, 2007-2013: Analysis of population databases. BIOMEDICA 2016; 36:59-66. [PMID: 27622439 DOI: 10.7705/biomedica.v36i1.2781] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/30/2015] [Revised: 06/23/2015] [Indexed: 11/21/2022]
Abstract
INTRODUCTION Recognizing adverse drug reactions (ADRs) is becoming more important in clinical practice. OBJECTIVE To determine the frequency of adverse drug reactions and suspected ADRs among the population affiliated to the Colombian health system and to describe the drugs, reactions and associated variables. MATERIALS AND METHODS We reviewed databases of ADRs and suspected ADRs for drugs dispensed by Audifarma, S.A., for both inpatient and outpatient care from 2007 to 2013. Variables included ADR report date, city, drug, the drug's Anatomical Therapeutic Chemical (ATC) classification, ADR severity, ADR type, ADR classification and ADR probability according to the World Health Organization's definitions. RESULTS We obtained 5,342 reports for 468 different drugs. The ATC groups with the most reports were anti-infectives for systemic use (25.5%), nervous system agents (17.1%) and cardiovascular system drugs (15.0%). The drugs with the highest number of reports were metamizole (4.2%), enalapril (3.8%), clarithromycin (2.8%), warfarin (2.5%) and ciprofloxacin (2.4%). The most common ADRs, classified following the World Health Organization adverse reaction terminology, were skin and appendages disorders (35.3%), general disorders (14.2%) and gastrointestinal system disorders (11.8%). Overall, 49.4% of the ADRs were classified as "moderate" and 45.1% as "mild". CONCLUSION An increasing number of ADR reports was found, consistent with a worldwide trend. Differences between inpatient and outpatient ADR reports were found when compared to scientific publications. The information on ADR reports, mainly gathered by the Instituto Nacional de Vigilancia de Medicamentos y Alimentos - Invima, should be made public for academic and institutional use.
13. Rodriguez LM, Fushman DD. Automatic Classification of Structured Product Labels for Pregnancy Risk Drug Categories, a Machine Learning Approach. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1093-1102. [PMID: 26958248 PMCID: PMC4765680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
With regular expressions and manual review, 18,342 FDA-approved drug product labels were processed to determine if the five standard pregnancy drug risk categories were mentioned in the label. After excluding 81 drugs with multiple-risk categories, 83% of the labels had a risk category within the text and 17% of the labels did not. We trained a Sequential Minimal Optimization algorithm on the labels containing pregnancy risk information segmented into standard document sections. For the evaluation of the classifier on the testing set, we used the Micromedex drug risk categories. The precautions section had the best performance for assigning drug risk categories, achieving Accuracy 0.79, Precision 0.66, Recall 0.64 and F1 measure 0.65. Missing pregnancy risk categories could be suggested using machine learning algorithms trained on the existing publicly available pregnancy risk information.
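The relationship between the precision, recall and F1 figures reported in this abstract can be checked with a short Python sketch. It assumes the standard definition of F1 as the harmonic mean of precision and recall, which the abstract does not state explicitly; the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the precautions section in the abstract.
f1 = f1_score(0.66, 0.64)
print(round(f1, 2))  # 0.65, matching the reported F1 measure
```

Because published precision and recall values are already rounded, an F1 recomputed this way can differ from a reported figure in the last digit.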
Affiliation(s)
- Laritza M Rodriguez
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Dina Demner Fushman
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD
14. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions. J Biomed Inform 2015; 56:318-32. [DOI: 10.1016/j.jbi.2015.06.016] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 10/24/2014] [Revised: 05/21/2015] [Accepted: 06/23/2015] [Indexed: 11/20/2022]
15. Segura-Bedmar I, Martínez P, Revert R, Moreno-Schneider J. Exploring Spanish health social media for detecting drug effects. BMC Med Inform Decis Mak 2015; 15 Suppl 2:S6. [PMID: 26100267 PMCID: PMC4474583 DOI: 10.1186/1472-6947-15-s2-s6] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Adverse drug reactions (ADRs) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have devoted great interest to the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available so that both healthcare professionals and patients can report possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications. METHODS In this work we present a system for detecting drug effects (which include both adverse drug reactions and drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs as well as their effects, automatically built from drug package inserts gathered from online websites. We then applied a distant-supervision method using the database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information of the sentences. RESULTS Regarding Relation Extraction of drugs and their effects, the distant supervision approach achieved a recall of 0.59 and a precision of 0.48. CONCLUSIONS The task of extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications that is not usually included in lexical resources in languages other than English.
|
16
|
Li Q, Spooner SA, Kaiser M, Lingren N, Robbins J, Lingren T, Tang H, Solti I, Ni Y. An end-to-end hybrid algorithm for automated medication discrepancy detection. BMC Med Inform Decis Mak 2015; 15:37. [PMID: 25943550 PMCID: PMC4427951 DOI: 10.1186/s12911-015-0160-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 12/15/2014] [Accepted: 04/27/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: In this study we developed and implemented state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims are: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data. METHODS: We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) an ML algorithm to identify medication entities from clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) an NLP-based, hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed. RESULTS: The hybrid algorithm achieved 95.0%/91.6%/93.3% of P/R/F on medication entity detection and 98.7%/99.4%/99.1% of P/R/F on attribute linkage. The medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. By combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively. The error analysis on algorithm outputs identified challenges to be addressed in order to improve medication discrepancy detection. CONCLUSION: By leveraging ML and NLP technologies, an end-to-end, computerized algorithm achieves promising results in reconciling medications between clinical notes and discharge prescriptions.
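As a reading aid, the P/R/F triples quoted in this abstract can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the reported F-value is the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` and the dictionary labels are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F-value" is the balanced (beta = 1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) in percent, as quoted in the abstract.
reported = {
    "entity detection": (95.0, 91.6, 93.3),
    "attribute linkage": (98.7, 99.4, 99.1),
    "matched medications": (92.4, 90.7, 91.5),
    "discrepant medications, matching step": (88.6, 82.5, 85.5),
    "discrepant medications, end to end": (71.5, 65.2, 68.2),
}

for task, (p, r, f_reported) in reported.items():
    f_computed = f_measure(p, r)
    # Compare the recomputed value with the published one.
    print(f"{task}: computed {f_computed:.2f}, reported {f_reported}")
```

Under that assumption, each recomputed value lands within 0.1 percentage points of the published figure, which is what one would expect if the published P and R were themselves rounded.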
Affiliation(s)
- Qi Li
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Stephen Andrew Spooner
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA; Chief Medical Information Officer, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Megan Kaiser
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Nataline Lingren
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Jessica Robbins
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Todd Lingren
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Huaxiu Tang
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
- Imre Solti
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA; James M. Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Yizhao Ni
  - Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, 45229-3039, USA
|
17
|
Ni Y, Wright J, Perentesis J, Lingren T, Deleger L, Kaiser M, Kohane I, Solti I. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients. BMC Med Inform Decis Mak 2015; 15:28. [PMID: 25881112 PMCID: PMC4407835 DOI: 10.1186/s12911-015-0149-3] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 08/13/2014] [Accepted: 03/24/2015] [Indexed: 11/22/2022]
Abstract
Background: Manual eligibility screening (ES) for a clinical trial typically requires a labor-intensive review of patient records that utilizes many resources. Leveraging state-of-the-art natural language processing (NLP) and information extraction (IE) technologies, we sought to improve the efficiency of physician decision-making in clinical trial enrollment. In order to markedly reduce the pool of potential candidates for staff screening, we developed an automated ES algorithm to identify patients who meet core eligibility characteristics of an oncology clinical trial. Methods: We collected narrative eligibility criteria from ClinicalTrials.gov for 55 clinical trials actively enrolling oncology patients in our institution between 12/01/2009 and 10/31/2011. In parallel, our ES algorithm extracted clinical and demographic information from the Electronic Health Record (EHR) data fields to represent profiles of all 215 oncology patients admitted to cancer treatment during the same period. The automated ES algorithm then matched the trial criteria with the patient profiles to identify potential trial-patient matches. Matching performance was validated on a reference set of 169 historical trial-patient enrollment decisions, and workload, precision, recall, negative predictive value (NPV) and specificity were calculated. Results: Without automation, an oncologist would need to review 163 patients per trial on average to replicate the historical patient enrollment for each trial. This workload is reduced by 85% to 24 patients when using automated ES (precision/recall/NPV/specificity: 12.6%/100.0%/100.0%/89.9%). Without automation, an oncologist would need to review 42 trials per patient on average to replicate the patient-trial matches that occur in the retrospective data set. With automated ES this workload is reduced by 90% to four trials (precision/recall/NPV/specificity: 35.7%/100.0%/100.0%/95.5%). Conclusion: By leveraging NLP and IE technologies, automated ES could dramatically increase the trial screening efficiency of oncologists and enable participation of small practices, which are often left out of trial enrollment. The algorithm has the potential to significantly reduce the effort to execute clinical research at a point in time when new initiatives of the cancer care community intend to greatly expand both the access to trials and the number of available trials. Electronic supplementary material: The online version of this article (doi:10.1186/s12911-015-0149-3) contains supplementary material, which is available to authorized users.
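The workload reductions quoted in this abstract follow from simple arithmetic on the reported review counts. The sketch below recomputes them; the helper name `workload_reduction` is hypothetical and the inputs are the rounded averages given in the abstract, so the results are approximate.

```python
def workload_reduction(before: float, after: float) -> float:
    """Fraction of review workload removed: (before - after) / before."""
    if before <= 0:
        raise ValueError("baseline workload must be positive")
    return (before - after) / before


# Patients reviewed per trial: 163 without automation, 24 with automated ES.
per_trial = workload_reduction(163, 24)
# Trials reviewed per patient: 42 without automation, 4 with automated ES.
per_patient = workload_reduction(42, 4)

print(f"per trial: {per_trial:.1%}")
print(f"per patient: {per_patient:.1%}")
```

Rounded to whole percentages, these come to 85% and 90%, in line with the figures stated in the abstract.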
Affiliation(s)
- Yizhao Ni
  - Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA
- Jordan Wright
  - Cancer and Blood Disease Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- John Perentesis
  - Cancer and Blood Disease Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA
- Louise Deleger
  - Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA
- Megan Kaiser
  - Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA
- Isaac Kohane
  - Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Imre Solti
  - Cincinnati Children's Hospital Medical Center, Department of Biomedical Informatics, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH, USA; James M Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
|
18
|
Khare R, Wei CH, Lu Z. Automatic extraction of drug indications from FDA drug labels. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:787-794. [PMID: 25954385 PMCID: PMC4419914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Extracting computable indications, i.e. drug-disease treatment relationships, from narrative drug resources is key to building a gold-standard drug indication repository. The extraction problem has two steps: disease named-entity recognition (NER), which identifies disease mentions in a free-text description, and disease classification, which distinguishes indications from other disease mentions in the description. While many tools exist for disease NER, disease classification is mostly achieved through human annotations. For example, we recently resorted to human annotations to prepare a corpus, LabeledIn, capturing structured indications from the drug labels submitted to the FDA by pharmaceutical companies. In this study, we present an automatic end-to-end framework to extract structured and normalized indications from FDA drug labels. In addition to automatic disease NER, a key component of our framework is a machine learning method that is trained on the LabeledIn corpus to classify the NER-computed disease mentions as "indication vs. non-indication." In experiments with 500 drug labels, our end-to-end system delivered an 86.3% F1-measure in drug indication extraction, a 17% improvement over the baseline. Further analysis shows that the indication classifier delivers a performance comparable to that of human experts and that the remaining errors are mostly due to disease NER (more than 50%). Given its performance, we conclude that our end-to-end approach has the potential to significantly reduce human annotation costs.
Affiliation(s)
- Ritu Khare
  - National Center for Biotechnology Information (NCBI), NIH, Bethesda, MD 20894
- Chih-Hsuan Wei
  - National Center for Biotechnology Information (NCBI), NIH, Bethesda, MD 20894
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), NIH, Bethesda, MD 20894
|
19
|
Lehmann HP. From Text Tagging to Decision Support. Med Decis Making 2014; 34:414-6. [DOI: 10.1177/0272989x14529847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|