1
Davarpanah MA, Adatorwovor R, Mansoori Y, Ramsheh FSR, Parsa A, Hajiani M, Faramarzi H, Kavuluru R, Asadipooya K. Combination of spironolactone and sitagliptin improves clinical outcomes of outpatients with COVID-19: a prospective cohort study. J Endocrinol Invest 2024; 47:235-243. [PMID: 37354247 DOI: 10.1007/s40618-023-02141-0] [Received: 02/06/2023] [Accepted: 06/16/2023] [Indexed: 06/26/2023]
Abstract
BACKGROUND There is evidence that sitagliptin and spironolactone can potentially improve the clinical outcomes of COVID-19 cases. In this observational study of acutely symptomatic outpatients with COVID-19, we investigated the effects of spironolactone and sitagliptin on disease outcomes. METHODS This is a prospective, naturally randomized cohort study. We followed patients with mild to moderate symptomatic COVID-19 who were treated with either combination therapy (spironolactone 100 mg daily and sitagliptin 100 mg daily) or standard therapy (steroid, antiviral, and/or supportive care) for up to 30 days. The primary outcome was hospitalization rate. The secondary outcomes included ER visits, duration of disease, and complications such as hypoglycemia, low blood pressure, or altered mental status. RESULTS Of the 206 patients referred to clinics randomly, 103 received standard therapy and 103 received combination therapy. There were no significant differences in baseline characteristics, except for a slightly higher clinical score in the control group (6.92 ± 4.01 control, 4.87 ± 2.92 combination; P < 0.0001). Combination therapy was associated with a lower admission rate (5.8% combination, 22.3% control; P = 0.0011), fewer ER visits (7.8% combination, 23.3% control; P = 0.0021), and a shorter average duration of symptoms (6.67 ± 2.30 days combination, 18.71 ± 6.49 days control; P ≤ 0.0001). CONCLUSIONS The combination of sitagliptin and spironolactone reduced the duration of COVID-19 infection and the number of hospital visits more than standard therapeutic approaches in outpatients with COVID-19. The effects of the combination of sitagliptin and spironolactone in COVID-19 patients should be further verified in a double-blind, randomized, placebo-controlled trial.
Affiliation(s)
- M A Davarpanah
  - Shiraz HIV/AIDS Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- R Adatorwovor
  - Department of Biostatistics, University of Kentucky, Lexington, KY, USA
- Y Mansoori
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- F S R Ramsheh
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- A Parsa
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- M Hajiani
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- H Faramarzi
  - Department of Community Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- R Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- K Asadipooya
  - Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, Barnstable Brown Diabetes and Obesity Center, University of Kentucky, 2195 Harrodsburg Rd, Suite 125, Lexington, KY, 40504, USA.
2
Liu S, Wen A, Wang L, He H, Fu S, Miller R, Williams A, Harris D, Kavuluru R, Liu M, Abu-el-Rub N, Schutte D, Zhang R, Rouhizadeh M, Osborne JD, He Y, Topaloglu U, Hong SS, Saltz JH, Schaffter T, Pfaff E, Chute CG, Duong T, Haendel MA, Fuentes R, Szolovits P, Xu H, Liu H. An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C). J Am Med Inform Assoc 2023; 30:2036-2040. [PMID: 37555837 PMCID: PMC10654844 DOI: 10.1093/jamia/ocad134] [Received: 12/12/2022] [Revised: 06/28/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023] Open
Abstract
Despite recent methodological advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.
Affiliation(s)
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Robert Miller
  - Tufts Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
- Andrew Williams
  - Tufts Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
- Daniel Harris
  - Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
- Ramakanth Kavuluru
  - Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
- Mei Liu
  - Department of Internal Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
- Noor Abu-el-Rub
  - Department of Internal Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dalton Schutte
  - Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Rui Zhang
  - Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Masoud Rouhizadeh
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, USA
- John D Osborne
  - Department of Computer Science, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Umit Topaloglu
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Stephanie S Hong
  - Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Joel H Saltz
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
- Emily Pfaff
  - Department of Medicine, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, USA
- Tim Duong
  - Department of Radiology, Albert Einstein College of Medicine, Bronx, New York, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, Colorado, USA
- Peter Szolovits
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Hua Xu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
3
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. medRxiv 2023:2023.05.05.23289524. [PMID: 37205575 PMCID: PMC10187451 DOI: 10.1101/2023.05.05.23289524] [Indexed: 05/21/2023]
Abstract
Objective The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool. Discussion Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.
Affiliation(s)
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Sean Finan
  - Boston Children's Hospital, Boston, MA, USA and Harvard Medical School, Boston, MA, USA
- Zhou Yuan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Eric B Durbin
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jong Cheol Jeong
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Isaac Hands
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- David Rust
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jeremy L Warner
  - Lifespan Health System, Providence, RI, USA
  - Legorreta Cancer Center at Brown University, Providence, RI, USA
- Guergana Savova
  - Boston Children's Hospital, Boston, MA, USA and Harvard Medical School, Boston, MA, USA
4
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. JCO Clin Cancer Inform 2023; 7:e2300156. [PMID: 38113411 PMCID: PMC10752457 DOI: 10.1200/cci.23.00156] [Received: 08/18/2023] [Revised: 10/04/2023] [Accepted: 10/04/2023] [Indexed: 12/21/2023] Open
Abstract
PURPOSE Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
Affiliation(s)
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Sean Finan
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Zhou Yuan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Eric B. Durbin
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jong Cheol Jeong
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Isaac Hands
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- David Rust
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jeremy L. Warner
  - Lifespan Health System, Providence, RI
  - Legorreta Cancer Center at Brown University, Providence, RI
- Guergana Savova
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
5
Wang L, He H, Wen A, Moon S, Fu S, Peterson KJ, Ai X, Liu S, Kavuluru R, Liu H. Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis. JMIR Med Inform 2023; 11:e48072. [PMID: 37368483 PMCID: PMC10337517 DOI: 10.2196/48072] [Received: 04/13/2023] [Revised: 05/25/2023] [Accepted: 06/01/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus consisting of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system can improve the recall of FH information on the BioCreative/N2C2 FH challenge data set, while the F1 score varied but remained comparable. CONCLUSIONS The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
Affiliation(s)
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Kevin J Peterson
  - Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Xuguang Ai
  - Department of Computer Science, University of Kentucky, Lexington, KY, United States
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
6
Ai X, Kavuluru R. End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies. IEEE Int Conf Healthc Inform 2023; 2023:610-618. [PMID: 38274947 PMCID: PMC10809256 DOI: 10.1109/ichi57859.2023.00108] [Indexed: 01/27/2024]
Abstract
End-to-end relation extraction (E2ERE) is an important task in information extraction, more so for biomedicine as scientific literature continues to grow exponentially. E2ERE typically involves identifying entities (named entity recognition (NER)) and their associated relations, whereas most RE tasks simply assume that the entities are provided upfront and end up performing relation classification. E2ERE is inherently more difficult than RE alone given the potential snowball effect of errors from NER leading to more errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017), which identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks, including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. In this effort, we employ a span-based pipeline approach to produce a new state-of-the-art E2ERE performance on the ChemProt dataset, resulting in > 4% improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regard to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.
Affiliation(s)
- Xuguang Ai
  - Department of Computer Science, University of Kentucky, Lexington, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, Lexington, USA
7
Jiang Y, Kavuluru R. End-to-End n-ary Relation Extraction for Combination Drug Therapies. IEEE Int Conf Healthc Inform 2023; 2023:72-80. [PMID: 38283165 PMCID: PMC10814995 DOI: 10.1109/ichi57859.2023.00021] [Indexed: 01/30/2024]
Abstract
Combination drug therapies are treatment regimens that involve two or more drugs, administered more commonly for patients with cancer, HIV, malaria, or tuberculosis. Currently there are over 350K articles in PubMed that use the combination drug therapy MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an n-ary relation extraction problem. Unlike the general n-ary setting where n is fixed (e.g., drug-gene-mutation relations where n = 3), extracting combination therapies is a special setting where n ≥ 2 is dynamic, depending on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of 66.7% on the CombDrugExt test set for positive (or effective) combinations. This is an absolute ≈ 5% F1-score improvement even over the prior best relation classification score with spotted drug entities (hence, not end-to-end). Thus our effort introduces a first, state-of-the-art model for end-to-end extraction that is already superior to the best prior non-end-to-end model for this task. Our model seamlessly extracts all drug entities and relations in a single pass and is highly suitable for dynamic n-ary extraction scenarios.
Affiliation(s)
- Yuhang Jiang
  - Department of Computer Science, University of Kentucky, Lexington, KY USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Dept. of Internal Medicine, Univ. of Kentucky, Lexington, KY USA
8
Fouladvand S, Talbert J, Dwoskin LP, Bush H, Meadows AL, Peterson LE, Mishra YR, Roggenkamp SK, Wang F, Kavuluru R, Chen J. A Comparative Effectiveness Study on Opioid Use Disorder Prediction Using Artificial Intelligence and Existing Risk Models. IEEE J Biomed Health Inform 2023; PP. [PMID: 37037255 DOI: 10.1109/jbhi.2023.3265920] [Indexed: 04/12/2023]
Abstract
Opioid use disorder (OUD) is a leading cause of death in the United States, placing a tremendous burden on patients, their families, and health care systems. Artificial intelligence (AI) can be harnessed with available healthcare data to produce automated OUD prediction tools. In this retrospective study, we developed AI-based models for OUD prediction and showed that AI can predict OUD more effectively than existing clinical tools, including the unweighted opioid risk tool (ORT). The data cover 474,208 patients over 10 years; 269,748 were females with an average age of 56.78 years. Cases are prescription opioid users with at least one diagnosis of OUD or at least one prescription for buprenorphine or methadone. Controls are prescription opioid users with no OUD diagnoses or buprenorphine or methadone prescriptions. On 100 randomly selected test sets including 47,396 patients, our proposed transformer-based AI model predicted OUD more effectively (AUC = 0.742 ± 0.021) than logistic regression (AUC = 0.651 ± 0.025), random forest (AUC = 0.679 ± 0.026), xgboost (AUC = 0.690 ± 0.027), a long short-term memory model (AUC = 0.706 ± 0.026), a transformer (AUC = 0.725 ± 0.024), and the unweighted ORT model (AUC = 0.559 ± 0.025). Our results show that embedding AI algorithms into clinical care may assist clinicians in risk stratification and management of patients receiving opioid therapy.
9
Ward PJ, Young AM, Slavova S, Liford M, Daniels L, Lucas R, Kavuluru R. Deep Neural Networks for Fine-Grained Surveillance of Overdose Mortality. Am J Epidemiol 2023; 192:257-266. [PMID: 36222700 DOI: 10.1093/aje/kwac180] [Received: 07/30/2021] [Revised: 08/16/2022] [Accepted: 10/10/2022] [Indexed: 02/07/2023] Open
Abstract
Surveillance of drug overdose deaths relies on death certificates for identification of the substances that caused death. Drugs and drug classes can be identified through the International Classification of Diseases, Tenth Revision (ICD-10), codes present on death certificates. However, ICD-10 codes do not always provide high levels of specificity in drug identification. To achieve more fine-grained identification of substances on death certificates, the free-text cause-of-death section, completed by the medical certifier, must be analyzed. Current methods for analyzing free-text death certificates rely solely on lookup tables for identifying specific substances, which must be frequently updated and maintained. To improve identification of drugs on death certificates, a deep-learning named-entity recognition model was developed using data from the Kentucky Drug Overdose Fatality Surveillance System (2014-2019); it achieved an F1-score of 99.13%. This model can identify new drug misspellings and novel substances that are not present on current surveillance lookup tables, enhancing the surveillance of drug overdose deaths.
10
Phuong J, Riches NO, Calzoni L, Datta G, Duran D, Lin AY, Singh RP, Solomonides AE, Whysel NY, Kavuluru R. Toward informatics-enabled preparedness for natural hazards to minimize health impacts of climate change. J Am Med Inform Assoc 2022; 29:2161-2167. [PMID: 36094062 PMCID: PMC9667167 DOI: 10.1093/jamia/ocac162] [Received: 04/18/2022] [Revised: 08/21/2022] [Accepted: 08/30/2022] [Indexed: 09/14/2023] Open
Abstract
Natural hazards (NHs) associated with climate change have been increasing in frequency and intensity. These acute events impact humans both directly and through their effects on social and environmental determinants of health. Rather than relying on a fully reactive incident response disposition, it is crucial to ramp up preparedness initiatives for worsening case scenarios. In this perspective, we review the landscape of NH effects on human health and explore the potential of health informatics to address associated challenges, specifically from a preparedness angle. We outline important components of a health informatics agenda for hazard preparedness involving hazard-disease associations, social determinants of health, and hazard forecasting models, and call for novel methods to integrate them toward projecting healthcare needs in the wake of a hazard. We describe potential gaps and barriers in implementing these components and propose some high-level ideas to address them.
Affiliation(s)
- Jimmy Phuong
  - University of Washington, School of Medicine, Research Information Technologies, Seattle, Washington, USA
  - University of Washington, Harborview Injury Prevention and Research Center, Seattle, Washington, USA
- Naomi O Riches
  - University of Utah School of Medicine, Obstetrics and Gynecology Research Network, Salt Lake City, Utah, USA
- Luca Calzoni
  - National Institute on Minority Health and Health Disparities (NIMHD), National Institutes of Health, Bethesda, Maryland, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Gora Datta
  - Department of Civil & Environmental Engineering, University of California at Berkeley, Berkeley, California, USA
- Deborah Duran
  - National Institute on Minority Health and Health Disparities (NIMHD), National Institutes of Health, Bethesda, Maryland, USA
- Asiyah Yu Lin
  - National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, Maryland, USA
- Ramesh P Singh
  - School of Life and Earth Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, USA
- Anthony E Solomonides
  - Department of Communication Design, NorthShore University Health System, Outcomes Research Network, Research Institute, Evanston, Illinois, USA
- Noreen Y Whysel
  - New York City College of Technology, CUNY, Brooklyn, New York, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
11
Zhang HG, Dagliati A, Shakeri Hossein Abad Z, Xiong X, Bonzel CL, Xia Z, Tan BWQ, Avillach P, Brat GA, Hong C, Morris M, Visweswaran S, Patel LP, Gutiérrez-Sacristán A, Hanauer DA, Holmes JH, Samayamuthu MJ, Bourgeois FT, L'Yi S, Maidlow SE, Moal B, Murphy SN, Strasser ZH, Neuraz A, Ngiam KY, Loh NHW, Omenn GS, Prunotto A, Dalvin LA, Klann JG, Schubert P, Vidorreta FJS, Benoit V, Verdy G, Kavuluru R, Estiri H, Luo Y, Malovini A, Tibollo V, Bellazzi R, Cho K, Ho YL, Tan ALM, Tan BWL, Gehlenborg N, Lozano-Zahonero S, Jouhet V, Chiovato L, Aronow BJ, Toh EMS, Wong WGS, Pizzimenti S, Wagholikar KB, Bucalo M, Cai T, South AM, Kohane IS, Weber GM. International electronic health record-derived post-acute sequelae profiles of COVID-19 patients. NPJ Digit Med 2022; 5:81. [PMID: 35768548 PMCID: PMC9242995 DOI: 10.1038/s41746-022-00623-8] [Received: 01/11/2022] [Accepted: 05/19/2022] [Indexed: 11/10/2022] Open
Abstract
The risk profiles of post-acute sequelae of COVID-19 (PASC) have not been well characterized in multi-national settings with appropriate controls. We leveraged electronic health record (EHR) data from 277 international hospitals representing 414,602 patients with COVID-19, 2.3 million control patients without COVID-19 in the inpatient and outpatient settings, and over 221 million diagnosis codes to systematically identify new-onset conditions enriched among patients with COVID-19 during the post-acute period. Compared to inpatient controls, inpatient COVID-19 cases were at significant risk for angina pectoris (RR 1.30, 95% CI 1.09–1.55), heart failure (RR 1.22, 95% CI 1.10–1.35), cognitive dysfunctions (RR 1.18, 95% CI 1.07–1.31), and fatigue (RR 1.18, 95% CI 1.07–1.30). Relative to outpatient controls, outpatient COVID-19 cases were at risk for pulmonary embolism (RR 2.10, 95% CI 1.58–2.76), venous embolism (RR 1.34, 95% CI 1.17–1.54), atrial fibrillation (RR 1.30, 95% CI 1.13–1.50), type 2 diabetes (RR 1.26, 95% CI 1.16–1.36) and vitamin D deficiency (RR 1.19, 95% CI 1.09–1.30). Outpatient COVID-19 cases were also at risk for loss of smell and taste (RR 2.42, 95% CI 1.90–3.06), inflammatory neuropathy (RR 1.66, 95% CI 1.21–2.27), and cognitive dysfunction (RR 1.18, 95% CI 1.04–1.33). The incidence of post-acute cardiovascular and pulmonary conditions decreased across time among inpatient cases while the incidence of cardiovascular, digestive, and metabolic conditions increased among outpatient cases. Our study, based on a federated international network, systematically identified robust conditions associated with PASC compared to control groups, underscoring the multifaceted cardiovascular and neurological phenotype profiles of PASC.
Affiliation(s)
- Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Arianna Dagliati
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | | | - Xin Xiong
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bryce W Q Tan
- Department of Medicine, National University Hospital, Singapore, Singapore
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Lav P Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS, USA
| | | | - David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
| | - John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | | | | | - Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sarah E Maidlow
- Michigan Institute for Clinical and Health Research (MICHR) Informatics, University of Michigan, Ann Arbor, MI, USA
| | - Bertrand Moal
- IAM unit, Bordeaux University Hospital, Bordeaux, France
| | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
| | | | - Antoine Neuraz
- Department of Biomedical Informatics, Hôpital Necker-Enfants Malades, Assistance Publique Hôpitaux de Paris (APHP), University of Paris, Paris, France
| | - Kee Yuan Ngiam
- Department of Biomedical Informatics, WiSDM, National University Health Systems Singapore, Singapore, Singapore
| | - Ne Hooi Will Loh
- Department of Anaesthesia, National University Health Systems Singapore, Singapore, Singapore
| | - Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Andrea Prunotto
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Lauren A Dalvin
- Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA
| | - Jeffrey G Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Petra Schubert
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | | | - Vincent Benoit
- IT Department, Innovation & Data, APHP Greater Paris University Hospital, Paris, France
| | | | - Ramakanth Kavuluru
- Division of Biomedical Informatics (Department of Internal Medicine), University of Kentucky, Lexington, KY, USA
| | - Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
- Population Health and Data Science, VA Boston Healthcare System, Boston, MA, USA
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA, USA
| | - Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Byorn W L Tan
- Department of Medicine, National University Hospital, Singapore, Singapore
| | - Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sara Lozano-Zahonero
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Vianney Jouhet
- IAM unit, INSERM Bordeaux Population Health ERIAS TEAM, Bordeaux University Hospital / ERIAS - Inserm, U1219 BPH, Bordeaux, France
| | - Luca Chiovato
- Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Bruce J Aronow
- Departments of Biomedical Informatics, Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
| | - Emma M S Toh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Wei Gen Scott Wong
- Department of Medicine, National University Health Systems Singapore, Singapore, Singapore
| | - Sara Pizzimenti
- Scientific Direction, IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milan, Italy
| | | | - Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
| | | | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Andrew M South
- Department of Pediatrics-Section of Nephrology, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, USA
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
12
|
Hong C, Zhang HG, L'Yi S, Weber G, Avillach P, Tan BWQ, Gutiérrez-Sacristán A, Bonzel CL, Palmer NP, Malovini A, Tibollo V, Luo Y, Hutch MR, Liu M, Bourgeois F, Bellazzi R, Chiovato L, Sanz Vidorreta FJ, Le TT, Wang X, Yuan W, Neuraz A, Benoit V, Moal B, Morris M, Hanauer DA, Maidlow S, Wagholikar K, Murphy S, Estiri H, Makoudjou A, Tippmann P, Klann J, Follett RW, Gehlenborg N, Omenn GS, Xia Z, Dagliati A, Visweswaran S, Patel LP, Mowery DL, Schriver ER, Samayamuthu MJ, Kavuluru R, Lozano-Zahonero S, Zöller D, Tan ALM, Tan BWL, Ngiam KY, Holmes JH, Schubert P, Cho K, Ho YL, Beaulieu-Jones BK, Pedrera-Jiménez M, García-Barrio N, Serrano-Balazote P, Kohane I, South A, Brat GA, Cai T. Changes in laboratory value improvement and mortality rates over the course of the pandemic: an international retrospective cohort study of hospitalised patients infected with SARS-CoV-2. BMJ Open 2022; 12:e057725. [PMID: 35738646 PMCID: PMC9226470 DOI: 10.1136/bmjopen-2021-057725] [Received: 09/30/2021] [Accepted: 06/12/2022] [Indexed: 01/08/2023]
Abstract
OBJECTIVE To assess changes in international mortality rates and laboratory recovery rates during hospitalisation for patients hospitalised with SARS-CoV-2 between the first wave (1 March to 30 June 2020) and the second wave (1 July 2020 to 31 January 2021) of the COVID-19 pandemic. DESIGN, SETTING AND PARTICIPANTS This is a retrospective cohort study of 83 178 hospitalised patients admitted between 7 days before or 14 days after PCR-confirmed SARS-CoV-2 infection within the Consortium for Clinical Characterization of COVID-19 by Electronic Health Record, an international multihealthcare system collaborative of 288 hospitals in the USA and Europe. The laboratory recovery rates and mortality rates over time were compared between the two waves of the pandemic. PRIMARY AND SECONDARY OUTCOME MEASURES The primary outcome was all-cause mortality rate within 28 days after hospitalisation stratified by predicted low, medium and high mortality risk at baseline. The secondary outcome was the average rate of change in laboratory values during the first week of hospitalisation. RESULTS Baseline Charlson Comorbidity Index and laboratory values at admission were not significantly different between the first and second waves. The improvement in laboratory values over time was faster in the second wave compared with the first. The average C reactive protein rate of change was -4.72 mg/dL vs -4.14 mg/dL per day (p=0.05). The mortality rates within each risk category significantly decreased over time, with the most substantial decrease in the high-risk group (42.3% in March-April 2020 vs 30.8% in November 2020 to January 2021, p<0.001) and a moderate decrease in the intermediate-risk group (21.5% in March-April 2020 vs 14.3% in November 2020 to January 2021, p<0.001). 
CONCLUSIONS Admission profiles of patients hospitalised with SARS-CoV-2 infection did not differ greatly between the first and second waves of the pandemic, but there were notable differences in laboratory improvement rates during hospitalisation. Mortality risks among patients with similar risk profiles decreased over the course of the pandemic. The improvement in laboratory values and mortality risk was consistent across multiple countries.
Affiliation(s)
- Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Griffin Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Bryce W Q Tan
- Department of Medicine, National University Hospital, Singapore
| | | | - Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Nathan P Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Lombardia, Italy
| | - Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Lombardia, Italy
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Evanston, Illinois, USA
| | - Meghan R Hutch
- Department of Preventive Medicine, Northwestern University, Evanston, Illinois, USA
| | - Molei Liu
- Department of Biostatistics, Harvard University T H Chan School of Public Health, Boston, Massachusetts, USA
| | - Florence Bourgeois
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
| | - Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Luca Chiovato
- Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Lombardia, Italy
| | | | - Trang T Le
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Xuan Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - William Yuan
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Antoine Neuraz
- Department of Biomedical Informatics, Hopital Universitaire Necker-Enfants Malades, Paris, Île-de-France, France
| | - Vincent Benoit
- IT department, Innovation & Data, APHP Greater Paris University Hospital, Paris, France
| | - Bertrand Moal
- IAM unit, Bordeaux University Hospital, Bordeaux, France
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Sarah Maidlow
- MICHR Informatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Kavishwar Wagholikar
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Shawn Murphy
- Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Adeline Makoudjou
- Institute of Medical Biometry and Statistics, University of Freiburg Faculty of Medicine, Freiburg, Baden-Württemberg, Germany
| | - Patric Tippmann
- Institute of Medical Biometry and Statistics, Medical Center-University of Freiburg, Freiburg, Baden-Württemberg, Germany
| | - Jeffery Klann
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Robert W Follett
- Department of Medicine, David Geffen School of Medicine, Los Angeles, California, USA
| | - Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Gilbert S Omenn
- Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Arianna Dagliati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Lav P Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Danielle L Mowery
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Emily R Schriver
- Data Analytics Center, University of Pennsylvania Health System, Philadelphia, Pennsylvania, USA
| | | | - Ramakanth Kavuluru
- Institute for Biomedical Informatics, University of Kentucky, Lexington, Kentucky, USA
| | - Sara Lozano-Zahonero
- Institute of Medical Biometry and Statistics, University of Freiburg Faculty of Medicine, Freiburg, Baden-Württemberg, Germany
| | - Daniela Zöller
- Institute of Medical Biometry and Statistics, University of Freiburg Faculty of Medicine, Freiburg, Baden-Württemberg, Germany
| | - Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Byorn W L Tan
- Department of Medicine, National University Hospital, Singapore
| | - Kee Yuan Ngiam
- Department of Surgery, National University Hospital, Singapore
| | - John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Petra Schubert
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, USA
| | | | - Miguel Pedrera-Jiménez
- Health Informatics, Hospital Universitario 12 de Octubre, Madrid, Comunidad de Madrid, Spain
| | - Noelia García-Barrio
- Health Informatics, Hospital Universitario 12 de Octubre, Madrid, Comunidad de Madrid, Spain
| | - Pablo Serrano-Balazote
- Health Informatics, Hospital Universitario 12 de Octubre, Madrid, Comunidad de Madrid, Spain
| | - Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Andrew South
- Department of Pediatrics, Section of Nephrology, Wake Forest University, Winston Salem, North Carolina, USA
| | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - T Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
|
13
|
Bopaiah J, Garimella K, Kavuluru R. Opinions on Homeopathy for COVID-19 on Twitter. Proc ACM Web Sci Conf 2022; 2022:359-363. [PMID: 36112977 PMCID: PMC9472594 DOI: 10.1145/3501247.3531575] [Indexed: 06/15/2023]
Abstract
Homeopathy is a medical system originating in Germany more than 200 years ago. Based on prior investigations, mainstream health agencies and medical research communities indicate that there is little evidence that homeopathy can be an effective treatment for any specific health condition. However, it continues to be practiced as a popular form of alternative medicine in many countries, even during the ongoing COVID-19 pandemic. In this paper, we mine opinions on homeopathy for COVID-19 expressed in Twitter data. Our experiments are conducted with a dataset of nearly 60K tweets collected during a seven month period ending in July 2020. We first built text classifiers (linear and neural models) to mine opinions on homeopathy (positive, negative, neutral) from tweets using a dataset of 2400 hand-labeled tweets obtaining an average macro F-score of 81.5% for the positive and negative classes. We applied this model to identify opinions from the full dataset. Our results show that the number of unique positive tweets is twice that of the number of unique negative tweets; but when including retweets, there are 23% more negative tweets overall indicating that negative tweets are getting more retweets and better traction on Twitter. Using a word shift graph analysis on the Twitter bios of authors of positive and negative tweets, we observe that opinions on homeopathy appear to be correlated with political/religious ideologies of the authors (e.g., liberal vs nationalist, atheist vs Hindu). To our knowledge, this is the first study to analyze public opinions on homeopathy on any social media platform. Our results surface a tricky landscape for public health agencies as they promote evidence-based therapies and preventative measures for COVID-19.
|
14
|
Phuong J, Riches NO, Madlock‐Brown C, Duran D, Calzoni L, Espinoza JC, Datta G, Kavuluru R, Weiskopf NG, Ward‐Caviness CK, Lin AY. Social Determinants of Health Factors for Gene-Environment COVID-19 Research: Challenges and Opportunities. Adv Genet (Hoboken) 2022; 3:2100056. [PMID: 35574521 PMCID: PMC9087427 DOI: 10.1002/ggn2.202100056] [Received: 10/23/2021] [Indexed: 01/25/2023]
Abstract
The characteristics of a person's health status are often guided by how they live, grow, learn, their genetics, as well as their access to health care. Yet, all too often, studies examining the relationship between social determinants of health (behavioral, sociocultural, and physical environmental factors), the role of demographics, and health outcomes poorly represent these relationships, leading to misinterpretations, limited study reproducibility, and datasets with limited representativeness and secondary research use capacity. This is a profound hurdle in what questions can or cannot be rigorously studied about COVID-19. In practice, gene-environment interactions studies have paved the way for including these factors into research. Similarly, our understanding of social determinants of health continues to expand with diverse data collection modalities as health systems, patients, and community health engagement aim to fill the knowledge gaps toward promoting health and wellness. Here, a conceptual framework is proposed, adapted from the population health framework, socioecological model, and causal modeling in gene-environment interaction studies to integrate the core constructs from each domain with practical considerations needed for multidisciplinary science.
Affiliation(s)
- Jimmy Phuong
- Division of Biomedical and Health Informatics, University of Washington, Seattle, WA 98195, USA
- Harborview Injury Prevention Research Center, University of Washington, Seattle, WA 98104, USA
| | - Naomi O. Riches
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108‐3514, USA
| | - Charisse Madlock‐Brown
- Health Informatics and Information Management, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Deborah Duran
- National Institute on Minority Health and Health Disparities (NIMHD), National Institutes of Health, Bethesda, MD 20892‐5465, USA
| | - Luca Calzoni
- National Institute on Minority Health and Health Disparities (NIMHD), National Institutes of Health, Bethesda, MD 20892‐5465, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
| | - Juan C. Espinoza
- Department of Pediatrics, Children's Hospital Los Angeles, Los Angeles, CA 90015, USA
| | - Gora Datta
- Department of Civil and Environmental Engineering, University of California at Berkeley, Berkeley, CA 94720, USA
| | - Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY 40506, USA
| | - Nicole G. Weiskopf
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Cavin K. Ward‐Caviness
- Center for Public Health and Environmental Assessment, US Environmental Protection Agency, Chapel Hill, NC 27514, USA
| | - Asiyah Yu Lin
- National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD 20892‐2152, USA
|
15
|
Song Q, Bates B, Shao YR, Hsu FC, Liu F, Madhira V, Mitra AK, Bergquist T, Kavuluru R, Li X, Sharafeldin N, Su J, Topaloglu U. Risk and Outcome of Breakthrough COVID-19 Infections in Vaccinated Patients With Cancer: Real-World Evidence From the National COVID Cohort Collaborative. J Clin Oncol 2022; 40:1414-1427. [PMID: 35286152 PMCID: PMC9061155 DOI: 10.1200/jco.21.02419] [Received: 10/11/2021] [Revised: 02/07/2022] [Accepted: 02/18/2022] [Indexed: 12/15/2022]
Abstract
PURPOSE To provide real-world evidence on risks and outcomes of breakthrough COVID-19 infections in vaccinated patients with cancer using the largest national cohort of COVID-19 cases and controls. METHODS We used the National COVID Cohort Collaborative (N3C) to identify breakthrough infections between December 1, 2020, and May 31, 2021. We included patients partially or fully vaccinated with mRNA COVID-19 vaccines with no prior SARS-CoV-2 infection record. Risks for breakthrough infection and severe outcomes were analyzed using logistic regression. RESULTS A total of 6,860 breakthrough cases were identified within the N3C-vaccinated population, among whom 1,460 (21.3%) were patients with cancer. Solid tumors and hematologic malignancies had significantly higher risks for breakthrough infection (odds ratios [ORs] = 1.12, 95% CI, 1.01 to 1.23 and 4.64, 95% CI, 3.98 to 5.38) and severe outcomes (ORs = 1.33, 95% CI, 1.09 to 1.62 and 1.45, 95% CI, 1.08 to 1.95) compared with noncancer patients, adjusting for age, sex, race/ethnicity, smoking status, vaccine type, and vaccination date. Compared with solid tumors, hematologic malignancies were at increased risk for breakthrough infections (adjusted OR ranged from 2.07 for lymphoma to 7.25 for lymphoid leukemia). Breakthrough risk was reduced after the second vaccine dose for all cancers (OR = 0.04; 95% CI, 0.04 to 0.05), and for Moderna's mRNA-1273 compared with Pfizer's BNT162b2 vaccine (OR = 0.66; 95% CI, 0.62 to 0.70), particularly in patients with multiple myeloma (OR = 0.35; 95% CI, 0.15 to 0.72). Medications with major immunosuppressive effects and bone marrow transplantation were strongly associated with breakthrough risk among the vaccinated population. CONCLUSION Real-world evidence shows that patients with cancer, especially hematologic malignancies, are at higher risk for developing breakthrough infections and severe outcomes. 
Patients with vaccination were at markedly decreased risk for breakthrough infections. Further work is needed to assess boosters and new SARS-CoV-2 variants.
Affiliation(s)
| | | | | | - Fang-Chi Hsu
- Wake Forest School of Medicine, Winston-Salem, NC
| | - Feifan Liu
- University of Massachusetts Chan Medical School, Boston, MA
| | | | | | | | | | - Xiaochun Li
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN
| | - Noha Sharafeldin
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL
| | - Jing Su
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN
|
16
|
Fouladvand S, Talbert J, Dwoskin LP, Bush H, Meadows AL, Peterson LE, Roggenkamp SK, Kavuluru R, Chen J. Identifying Opioid Use Disorder from Longitudinal Healthcare Data using a Multi-stream Transformer. AMIA Annu Symp Proc 2022; 2021:476-485. [PMID: 35308960 PMCID: PMC8861731] [Indexed: 06/14/2023]
Abstract
Opioid Use Disorder (OUD) is a public health crisis costing the US billions of dollars annually in healthcare, lost workplace productivity, and crime. Analyzing longitudinal healthcare data is critical in addressing many real-world problems in healthcare. Leveraging the real-world longitudinal healthcare data, we propose a novel multi-stream transformer model called MUPOD for OUD identification. MUPOD is designed to simultaneously analyze multiple types of healthcare data streams, such as medications and diagnoses, by attending to segments within and across these data streams. Our model tested on the data from 392,492 patients with long-term back pain problems showed significantly better performance than the traditional models and recently developed deep learning models.
Affiliation(s)
| | - Jeffery Talbert
- Institute for Biomedical Informatics
- Department of Internal Medicine
| | | | | | | | - Lars E Peterson
- Department of Family and Community Medicine, University of Kentucky, Lexington, KY, USA
- American Board of Family Medicine, Lexington, KY, USA
| | | | - Ramakanth Kavuluru
- Institute for Biomedical Informatics
- Department of Computer Science
- Department of Internal Medicine
| | - Jin Chen
- Institute for Biomedical Informatics
- Department of Computer Science
- Department of Internal Medicine
|
17
|
Abstract
The fat acceptance (FA) movement aims to counteract weight stigma and discrimination against individuals who are overweight/obese. We developed a supervised neural network model to classify sentiment toward the FA movement in tweets and identify links between FA sentiment and various Twitter user characteristics. We collected any tweet containing either "fat acceptance" or "#fatacceptance" from 2010-2019 and obtained 48,974 unique tweets. We independently labeled 2000 of them and implemented/trained an Average stochastic gradient descent Weight-Dropped Long Short-Term Memory (AWD-LSTM) neural network that incorporates transfer learning from language modeling to automatically identify each tweet's stance toward the FA movement. Our model achieved nearly 80% average precision and recall in classifying "supporting" and "opposing" tweets. Applying this model to the complete dataset, we observed that the majority of tweets at the beginning of the last decade supported FA, but sentiment trended downward until 2016, when support was at its lowest. Overall, public sentiment is negative across Twitter. Users who tweet more about FA or use FA-related hashtags are more supportive than general users. Our findings reveal both challenges to and strengths of the modern FA movement, with implications for those who wish to reduce societal weight stigma.
Affiliation(s)
- Sadie Bograd
- Paul Laurence Dunbar High School, Lexington, KY, USA
| | - Benjamin Chen
- Paul Laurence Dunbar High School, Lexington, KY, USA
| | - Ramakanth Kavuluru
- Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
|
18
|
Soleymanpour M, Saderholm S, Kavuluru R. Therapeutic Claims in Cannabidiol (CBD) Marketing Messages on Twitter. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2021; 2021:3083-3088. [PMID: 35096472 DOI: 10.1109/bibm52615.2021.9669404] [Indexed: 12/19/2022]
Abstract
Although the U.S. FDA has approved exactly one cannabidiol (CBD) drug product (specifically to treat seizures), CBD products are proliferating rapidly through different modes of usage including food products, cosmetics, vaping pods, and supplements (typically, oils). Despite the FDA clearly warning consumers about unproven health claims made by manufacturers selling CBD products over the counter, the CBD market share was nearly 3 billion USD in 2020 and is expected to top 55 billion USD in 2028. In this context, it is important to assess the presence of health claims being made on social media, especially claims that are part of marketing messages. To this end, we collected over two million English tweets discussing CBD themes. We created a hand-labeled dataset and built machine-learned classifiers to distinguish marketing tweets from regular tweets that may be generated by consumers. The best classifier achieved 85% precision, 83% recall, and 84% F-score. Our analyses showed that pain, anxiety disorders, sleep disorders, and stress are the four main therapeutic claims made, constituting 31.67%, 27.11%, 13.77%, and 10.37% of all medical claims made on Twitter, respectively. Also, more than 93% of advertised CBD products are edibles or oil/tinctures. Our effort is the first to demonstrate the feasibility of surveillance of marketing claims for CBD products. We believe this could pave the way for more explorations into this indispensable task in the current landscape of social media driven health (mis)information and communication.
Affiliation(s)
| | - Sofia Saderholm
- Electrical and Computer Engineering, University Of Kentucky, Lexington, USA
| | | |
|
19
|
Deer RR, Rock MA, Vasilevsky N, Carmody L, Rando H, Anzalone AJ, Basson MD, Bennett TD, Bergquist T, Boudreau EA, Bramante CT, Byrd JB, Callahan TJ, Chan LE, Chu H, Chute CG, Coleman BD, Davis HE, Gagnier J, Greene CS, Hillegass WB, Kavuluru R, Kimble WD, Koraishy FM, Köhler S, Liang C, Liu F, Liu H, Madhira V, Madlock-Brown CR, Matentzoglu N, Mazzotti DR, McMurry JA, McNair DS, Moffitt RA, Monteith TS, Parker AM, Perry MA, Pfaff E, Reese JT, Saltz J, Schuff RA, Solomonides AE, Solway J, Spratt H, Stein GS, Sule AA, Topaloglu U, Vavougios GD, Wang L, Haendel MA, Robinson PN. Characterizing Long COVID: Deep Phenotype of a Complex Condition. EBioMedicine 2021; 74:103722. [PMID: 34839263 PMCID: PMC8613500 DOI: 10.1016/j.ebiom.2021.103722] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 09/10/2021] [Revised: 10/22/2021] [Accepted: 11/15/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. METHODS The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. FINDINGS We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies. INTERPRETATION Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients.
If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. FUNDING U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
Affiliation(s)
- Rachel R Deer
- University of Texas Medical Branch, Galveston, TX, USA.
| | | | - Nicole Vasilevsky
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
| | - Leigh Carmody
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Halie Rando
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Alfred J Anzalone
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE, USA
| | - Marc D Basson
- Department of Surgery, University of North Dakota School of Medicine and Health Sciences
| | - Tellen D Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Eilis A Boudreau
- Department of Neurology; Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239
| | - Carolyn T Bramante
- Departments of Internal Medicine and Pediatrics, University of Minnesota Medical School, Minneapolis, MN 55455
| | - James Brian Byrd
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109
| | - Tiffany J Callahan
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lauren E Chan
- Monarch Initiative; College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
| | - Haitao Chu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN USA
| | - Christopher G Chute
- Johns Hopkins University, Schools of Medicine, Public Health, and Nursing, Baltimore, MD, USA
| | - Ben D Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
| | | | - Joel Gagnier
- Departments of Orthopaedic Surgery & Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Casey S Greene
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - William B Hillegass
- Departments of Data Science and Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | | | - Wesley D Kimble
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, WV, USA
| | | | | | - Chen Liang
- Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
| | - Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, USA
| | - Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
| | | | - Charisse R Madlock-Brown
- Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 920 Madison Ave. Suite 518N, Memphis TN 38613
| | - Nicolas Matentzoglu
- Monarch Initiative; Semanticly Ltd; European Bioinformatics Institute (EMBL-EBI)
| | - Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center
| | - Julie A McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
| | - Douglas S McNair
- Quantitative Sciences, Global Health Div., Gates Foundation, Seattle, WA 98109, USA
| | | | | | - Ann M Parker
- Pulmonary and Critical Care Medicine, Johns Hopkins University, Schools of Medicine, Baltimore, MD, USA
| | - Mallory A Perry
- Children's Hospital of Philadelphia Research Institute, Philadelphia, PA, USA
| | | | - Justin T Reese
- Monarch Initiative; Lawrence Berkeley National Laboratory
| | - Joel Saltz
- Stony Brook University; Biomedical Informatics
| | | | - Anthony E Solomonides
- Outcomes Research Network, Research Institute, NorthShore University HealthSystem, Evanston, IL 60201, USA; Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
| | - Julian Solway
- Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
| | - Heidi Spratt
- University of Texas Medical Branch, Galveston, TX, USA
| | - Gary S Stein
- University of Vermont Larner College of Medicine, Departments of Biochemistry and Surgery, Burlington, Vermont 05405
| | | | | | - George D Vavougios
- Department of Computer Science and Telecommunications, University of Thessaly, Papasiopoulou 2 - 4, P.C.; 131 - Galaneika, Lamia, Greece; Department of Neurology, Athens Naval Hospital 70 Deinokratous Street, P.C. 115 21 Athens, Greece; Department of Respiratory Medicine, Faculty of Medicine, University of Thessaly, Biopolis, P.C. 41500 Larissa, Greece
| | - Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
| | - Melissa A Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative.
| | - Peter N Robinson
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA.
| |
|
20
|
Weber GM, Zhang HG, L'Yi S, Bonzel CL, Hong C, Avillach P, Gutiérrez-Sacristán A, Palmer NP, Tan ALM, Wang X, Yuan W, Gehlenborg N, Alloni A, Amendola DF, Bellasi A, Bellazzi R, Beraghi M, Bucalo M, Chiovato L, Cho K, Dagliati A, Estiri H, Follett RW, García Barrio N, Hanauer DA, Henderson DW, Ho YL, Holmes JH, Hutch MR, Kavuluru R, Kirchoff K, Klann JG, Krishnamurthy AK, Le TT, Liu M, Loh NHW, Lozano-Zahonero S, Luo Y, Maidlow S, Makoudjou A, Malovini A, Martins MR, Moal B, Morris M, Mowery DL, Murphy SN, Neuraz A, Ngiam KY, Okoshi MP, Omenn GS, Patel LP, Pedrera Jiménez M, Prudente RA, Samayamuthu MJ, Sanz Vidorreta FJ, Schriver ER, Schubert P, Serrano Balazote P, Tan BW, Tanni SE, Tibollo V, Visweswaran S, Wagholikar KB, Xia Z, Zöller D, Kohane IS, Cai T, South AM, Brat GA. Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res 2021; 23:e34625. [PMID: 34889759 PMCID: PMC8672293 DOI: 10.2196/34625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2021] [Accepted: 11/10/2021] [Indexed: 11/15/2022] Open
Affiliation(s)
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | | | - Nathan P Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Amelia Li Min Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Xuan Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - William Yuan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Anna Alloni
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
| | - Danilo F Amendola
- Clinical Research Unit, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Antonio Bellasi
- Division of Nephrology, Department of Medicine, Ente Ospedaliero Cantonale, Lugano, Switzerland
| | - Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Michele Beraghi
- Information Technology Department, Azienda Socio-Sanitaria Territoriale di Pavia, Pavia, Italy
| | - Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
| | - Luca Chiovato
- Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | - Arianna Dagliati
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Robert W Follett
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | | | - David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
| | - Darren W Henderson
- Department of Biomedical Informatics, University of Kentucky, Lexington, KY, United States
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | - John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Meghan R Hutch
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Ramakanth Kavuluru
- Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, United States
| | - Katie Kirchoff
- Medical University of South Carolina, Charleston, SC, United States
| | - Jeffrey G Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Ashok K Krishnamurthy
- Department of Computer Science, Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Trang T Le
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Molei Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Ne Hooi Will Loh
- Department of Anaesthesia, National University Health System, Singapore, Singapore
| | - Sara Lozano-Zahonero
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Sarah Maidlow
- Michigan Institute for Clinical & Health Research Informatics, University of Michigan, Ann Arbor, MI, United States
| | - Adeline Makoudjou
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | | | - Bertrand Moal
- Informatique et archivistique médicales unit, Bordeaux University Hospital, Bordeaux, France
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Danielle L Mowery
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Antoine Neuraz
- Department of Biomedical Informatics, Hôpital Necker-Enfants Malade, Assistance Publique Hôpitaux de Paris, University of Paris, Paris, France
| | - Kee Yuan Ngiam
- Department of Biomedical Informatics, Institute for Digital Medicine, National University Health System, Singapore, Singapore
| | - Marina P Okoshi
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and Public Health, University of Michigan, Ann Arbor, MI, United States
| | - Lav P Patel
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
| | | | - Robson A Prudente
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | | | - Fernando J Sanz Vidorreta
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Emily R Schriver
- Data Analytics Center, University of Pennsylvania Health System, Philadelphia, PA, United States
| | - Petra Schubert
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | | | - Byorn Wl Tan
- Department of Medicine, National University Health System, Singapore, Singapore
| | - Suzana E Tanni
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, United States
| | - Daniela Zöller
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | -
- see Authors' Contributions,
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, NC, United States
| | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| |
|
21
|
Weidner K, Lowman J, Fleischer A, Kosik K, Goodbread P, Chen B, Kavuluru R. Twitter, Telepractice, and the COVID-19 Pandemic: A Social Media Content Analysis. Am J Speech Lang Pathol 2021; 30:2561-2571. [PMID: 34499843 PMCID: PMC9132031 DOI: 10.1044/2021_ajslp-21-00034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/26/2021] [Revised: 04/04/2021] [Accepted: 06/21/2021] [Indexed: 05/31/2023]
Abstract
Purpose Telepractice was extensively utilized during the COVID-19 pandemic. Little is known about issues experienced during the wide-scale rollout of a service delivery model that was novel to many. Social media research is a way to unobtrusively analyze public communication, including during a health crisis. We investigated the characteristics of tweets about telepractice through the lens of an established health technology implementation framework. Results can help guide efforts to support and sustain telehealth beyond the pandemic context. Method We retrieved a historical Twitter data set containing tweets about telepractice from the early months of the pandemic. Tweets were analyzed using a concurrent mixed-methods content analysis design informed by the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) framework. Results Approximately 2,200 Twitter posts were retrieved, and 820 original tweets were analyzed qualitatively. Volume of tweets about telepractice increased in the early months of the pandemic. The largest group of Twitter users tweeting about telepractice was a group of clinical professionals. Tweet content reflected many, but not all, domains of the NASSS framework. Conclusions Twitter posting about telepractice increased during the pandemic. Although many tweets represented topics expected in technology implementation, some represented phenomena potentially unique to speech-language pathology. Certain technology implementation topics, notably sustainability, were not found in the data. Implications for future telepractice implementation and further research are discussed.
|
22
|
Weber GM, Zhang HG, L'Yi S, Bonzel CL, Hong C, Avillach P, Gutiérrez-Sacristán A, Palmer NP, Tan ALM, Wang X, Yuan W, Gehlenborg N, Alloni A, Amendola DF, Bellasi A, Bellazzi R, Beraghi M, Bucalo M, Chiovato L, Cho K, Dagliati A, Estiri H, Follett RW, García Barrio N, Hanauer DA, Henderson DW, Ho YL, Holmes JH, Hutch MR, Kavuluru R, Kirchoff K, Klann JG, Krishnamurthy AK, Le TT, Liu M, Loh NHW, Lozano-Zahonero S, Luo Y, Maidlow S, Makoudjou A, Malovini A, Martins MR, Moal B, Morris M, Mowery DL, Murphy SN, Neuraz A, Ngiam KY, Okoshi MP, Omenn GS, Patel LP, Pedrera Jiménez M, Prudente RA, Samayamuthu MJ, Sanz Vidorreta FJ, Schriver ER, Schubert P, Serrano Balazote P, Tan BW, Tanni SE, Tibollo V, Visweswaran S, Wagholikar KB, Xia Z, Zöller D, Kohane IS, Cai T, South AM, Brat GA. International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res 2021; 23:e31400. [PMID: 34533459 PMCID: PMC8510151 DOI: 10.2196/31400] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/23/2021] [Revised: 09/02/2021] [Accepted: 09/02/2021] [Indexed: 02/06/2023] Open
Abstract
Background Many countries have experienced 2 predominant waves of COVID-19–related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. Objective In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. Methods Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. Results Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). 
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. Conclusions Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.
Affiliation(s)
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Harrison G Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Clara-Lea Bonzel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | | | - Nathan P Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Amelia Li Min Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Xuan Wang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - William Yuan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Anna Alloni
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
| | - Danilo F Amendola
- Clinical Research Unit, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Antonio Bellasi
- Division of Nephrology, Department of Medicine, Ente Ospedaliero Cantonale, Lugano, Switzerland
| | - Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Michele Beraghi
- Information Technology Department, Azienda Socio-Sanitaria Territoriale di Pavia, Pavia, Italy
| | - Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
| | - Luca Chiovato
- Unit of Internal Medicine and Endocrinology, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Kelly Cho
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | - Arianna Dagliati
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Hossein Estiri
- Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Robert W Follett
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | | | - David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
| | - Darren W Henderson
- Department of Biomedical Informatics, University of Kentucky, Lexington, KY, United States
| | - Yuk-Lam Ho
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | - John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States; Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Meghan R Hutch
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Ramakanth Kavuluru
- Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, United States
| | - Katie Kirchoff
- Medical University of South Carolina, Charleston, SC, United States
| | - Jeffrey G Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Ashok K Krishnamurthy
- Department of Computer Science, Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Trang T Le
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Molei Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Ne Hooi Will Loh
- Department of Anaesthesia, National University Health System, Singapore, Singapore
| | - Sara Lozano-Zahonero
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Sarah Maidlow
- Michigan Institute for Clinical & Health Research Informatics, University of Michigan, Ann Arbor, MI, United States
| | - Adeline Makoudjou
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | - Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | | | - Bertrand Moal
- Informatique et archivistique médicales unit, Bordeaux University Hospital, Bordeaux, France
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Danielle L Mowery
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
| | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Antoine Neuraz
- Department of Biomedical Informatics, Hôpital Necker-Enfants Malades, Assistance Publique Hôpitaux de Paris, University of Paris, Paris, France
| | - Kee Yuan Ngiam
- Department of Biomedical Informatics, Institute for Digital Medicine, National University Health System, Singapore, Singapore
| | - Marina P Okoshi
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and Public Health, University of Michigan, Ann Arbor, MI, United States
| | - Lav P Patel
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
| | | | - Robson A Prudente
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | | | - Fernando J Sanz Vidorreta
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Emily R Schriver
- Data Analytics Center, University of Pennsylvania Health System, Philadelphia, PA, United States
| | - Petra Schubert
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, United States
| | | | - Byorn WL Tan
- Department of Medicine, National University Health System, Singapore, Singapore
| | - Suzana E Tanni
- Internal Medicine Department, Botucatu Medical School, São Paulo State University, Botucatu, Brazil
| | - Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri SpA SB IRCCS, Pavia, Italy
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, United States
| | - Daniela Zöller
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
| | -
- see Authors' Contributions,
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
| | - Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, NC, United States
| | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
|
23
|
Kavuluru R, Noh J, Rose SW. Twitter discourse on nicotine as potential prophylactic or therapeutic for COVID-19. Int J Drug Policy 2021; 99:103470. [PMID: 34607223 PMCID: PMC8450069 DOI: 10.1016/j.drugpo.2021.103470] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 09/04/2021] [Accepted: 09/10/2021] [Indexed: 12/20/2022]
Abstract
Background An unproven “nicotine hypothesis” that indicates nicotine's therapeutic potential for COVID-19 has been proposed in recent literature. This study is about Twitter posts that misinterpret this hypothesis to make baseless claims about benefits of smoking and vaping in the context of COVID-19. We quantify the presence of such misinformation and characterize the tweeters who post such messages. Methods Twitter premium API was used to download tweets (n = 17,533) that match terms indicating (a) nicotine or vaping themes, (b) a prophylactic or therapeutic effect, and (c) COVID-19 (January-July 2020) as a conjunctive query. A constraint on the length of the span of text containing the terms in the tweets allowed us to focus on those that convey the therapeutic intent. We hand-annotated these filtered tweets and built a classifier that identifies tweets that extrapolate the nicotine hypothesis to smoking/vaping with a positive predictive value of 85%. We analyzed the frequently used terms in author bios, top Web links, and hashtags of such tweets. Results 21% of our filtered COVID-19 tweets indicate a vaping or smoking-based prevention/treatment narrative. Qualitative analyses show a variety of ways therapeutic claims are being made and tweeter bios reveal pre-existing notions of positive stances toward vaping. Conclusion The social media landscape is a double-edged sword in tobacco communication. Although it increases information reach, consumers can also be subject to confirmation bias when exposed to inadvertent or deliberate framing of scientific discourse that may border on misinformation. This calls for circumspection and additional planning in countering such narratives as the COVID-19 pandemic continues to ravage our world. Our results also serve as a cautionary tale in how social media can be leveraged to spread misleading information about tobacco products in the wake of pandemics.
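The conjunctive query with a span-length constraint described in the Methods can be illustrated with a minimal Python sketch. The term lists, the `max_span` threshold, and the function names are hypothetical stand-ins chosen for illustration; the abstract does not give the study's actual query terms or span limit.

```python
import re

# Hypothetical term lists; the study's real query terms are not listed in the abstract.
NICOTINE_TERMS = {"nicotine", "vaping", "vape", "smoking"}
EFFECT_TERMS = {"prevent", "prevents", "protect", "protects", "cure", "cures", "treat", "treats"}
COVID_TERMS = {"covid", "covid19", "coronavirus"}


def min_span(tokens, term_sets):
    """Length in tokens of the shortest window that contains at least one
    term from every set, or None if some set is never matched."""
    best = None
    last_seen = [None] * len(term_sets)  # most recent match position per set
    for i, tok in enumerate(tokens):
        for k, terms in enumerate(term_sets):
            if tok in terms:
                last_seen[k] = i
        if all(p is not None for p in last_seen):
            width = i - min(last_seen) + 1
            if best is None or width < best:
                best = width
    return best


def matches(tweet, max_span=12):
    """True if all three themes co-occur within max_span tokens (assumed threshold)."""
    tokens = re.findall(r"[a-z0-9]+", tweet.lower())
    span = min_span(tokens, [NICOTINE_TERMS, EFFECT_TERMS, COVID_TERMS])
    return span is not None and span <= max_span


print(matches("Nicotine protects against COVID-19, study says"))  # True
print(matches("vaping is fun"))                                   # False
```

A tweet that mentions all three themes but far apart is rejected, which is the point of the span constraint: closely co-occurring terms are more likely to express a therapeutic claim.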
Affiliation(s)
- Ramakanth Kavuluru
- Associate Professor, Division of Biomedical Informatics, Internal Medicine, 230E MDS Bldg, 725 Rose St, Lexington, KY, 40506, USA.
| | - Jiho Noh
- Doctoral Student, Computer Science Department, Lexington, KY, USA
| | - Shyanika W Rose
- Assistant Professor, Center for Health Equity Transformation and Department of Behavioral Science, College of Medicine, Lexington, KY, USA
|
25
|
Noh J, Kavuluru R. Joint Learning for Biomedical NER and Entity Normalization: Encoding Schemes, Counterfactual Examples, and Zero-Shot Evaluation. ACM BCB 2021; 2021. [PMID: 34505115 DOI: 10.1145/3459930.3469533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Named entity recognition (NER) and normalization (EN) form an indispensable first step to many biomedical natural language processing applications. In biomedical information science, recognizing entities (e.g., genes, diseases, or drugs) and normalizing them to concepts in standard terminologies or thesauri (e.g., Entrez, ICD-10, or RxNorm) is crucial for identifying more informative relations among them that drive disease etiology, progression, and treatment. In this effort we pursue two high level strategies to improve biomedical ER and EN. The first is to decouple standard entity encoding tags (e.g., "B-Drug" for the beginning of a drug) into type tags (e.g., "Drug") and positional tags (e.g., "B"). A second strategy is to use additional counterfactual training examples to handle the issue of models learning spurious correlations between surrounding context and normalized concepts in training data. We conduct elaborate experiments using the MedMentions dataset, the largest dataset of its kind for ER and EN in biomedicine. We find that our first strategy performs better in entity normalization when compared with the standard coding scheme. The second data augmentation strategy uniformly improves performance in span detection, typing, and normalization. The gains from counterfactual examples are more prominent when evaluating in zero-shot settings, for concepts that have never been encountered during training.
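The first strategy, decoupling combined tags such as "B-Drug" into a positional tag and a type tag, can be sketched as follows. This is only an illustration of the tag split itself; how the paper's model represents the type of outside ("O") tokens is not stated in the abstract, so reusing "O" for both sequences is an assumption, and the function names are hypothetical.

```python
def decouple(tags):
    """Split combined tags like 'B-Drug' into parallel positional and type sequences.
    Assumption: outside tokens carry 'O' in both sequences."""
    positions, types = [], []
    for tag in tags:
        if tag == "O":
            positions.append("O")
            types.append("O")
        else:
            pos, _, etype = tag.partition("-")
            positions.append(pos)
            types.append(etype)
    return positions, types


def recouple(positions, types):
    """Inverse of decouple: rebuild the standard combined tags."""
    return [p if p == "O" else f"{p}-{t}" for p, t in zip(positions, types)]


tags = ["B-Drug", "I-Drug", "O", "B-Disease"]
positions, types = decouple(tags)
print(positions)  # ['B', 'I', 'O', 'B']
print(types)      # ['Drug', 'Drug', 'O', 'Disease']
```

With the split, a model predicts from a small positional label set and a separate type label set instead of one label per position-type combination.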
Affiliation(s)
- Jiho Noh
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
| | - Ramakanth Kavuluru
- Division of Biomedical Informatics (Internal Medicine), University of Kentucky, Lexington, Kentucky, USA
|
26
|
Liang G, Greenwell C, Zhang Y, Xing X, Wang X, Kavuluru R, Jacobs N. Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging. IEEE J Biomed Health Inform 2021; 26:1640-1649. [PMID: 34495856 PMCID: PMC9242687 DOI: 10.1109/jbhi.2021.3110805] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
A key challenge in training neural networks for a given medical imaging task is often the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports, which are often readily available in medical records, contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. This approach can be applied to any task for which text-image pairs are readily available. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67% to 98%.
|
27
|
Rios A, Durbin EB, Hands I, Kavuluru R. Assigning ICD-O-3 Codes to Pathology Reports using Neural Multi-Task Training with Hierarchical Regularization. ACM BCB 2021; 2021:32. [PMID: 34541582 PMCID: PMC8445227 DOI: 10.1145/3459930.3469541] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
Abstract
Tracking population-level cancer information is essential for researchers, clinicians, policymakers, and the public. Unfortunately, much of the information is stored as unstructured data in pathology reports. Thus, to process the information, we require either automated extraction techniques or manual curation. Moreover, many of the cancer-related concepts appear infrequently in real-world training datasets. Automated extraction is difficult because of the limited data. This study introduces a novel technique that incorporates structured expert knowledge to improve histology and topography code classification models. Using pathology reports collected from the Kentucky Cancer Registry, we introduce a novel multi-task training approach with hierarchical regularization that incorporates structured information about the International Classification of Diseases for Oncology, 3rd Edition classes to improve predictive performance. Overall, we find that our method improves both micro and macro F1. For macro F1, we achieve up to a 6% absolute improvement for topography codes and up to 4% absolute improvement for histology codes.
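Because the abstract reports both micro and macro F1, and stresses that many codes are rare, a small Python sketch of the two averages may help. It uses the standard definitions for single-label classification; the toy labels (topography codes C50 and C34) and predictions are invented for illustration and are not data from the study.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Micro F1 pools counts over all classes; macro F1 averages per-class F1."""
    labels = sorted(set(gold) | set(pred))
    counts = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        counts[label] = (tp, fp, fn)
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    micro = f1(*(sum(col) for col in zip(*counts.values())))
    return micro, macro


# Invented toy data: a frequent code (C50) and a rare one (C34).
gold = ["C50"] * 8 + ["C34"] * 2
pred = ["C50"] * 8 + ["C50", "C34"]
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 2), round(macro, 2))  # 0.9 0.8
```

One error on the rare class barely moves micro F1 but pulls macro F1 down noticeably, which is why macro F1 is the more sensitive measure for infrequent codes.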
Affiliation(s)
- Anthony Rios
- Dept. of Information Systems & Cyber Security, Cyber Center for Security & Analytics, University of Texas at San Antonio, San Antonio, Texas, USA
| | - Eric B Durbin
- Division of Biomedical Informatics (Internal Medicine), Kentucky Cancer Registry, University of Kentucky, Lexington, Kentucky, USA
| | - Isaac Hands
- Kentucky Cancer Registry, Lexington, Kentucky, USA
| | - Ramakanth Kavuluru
- Division of Biomedical Informatics (Internal Medicine), University of Kentucky, Lexington, Kentucky, USA
|
28
|
Abstract
BACKGROUND Recent natural language processing (NLP) research is dominated by neural network methods that employ word embeddings as basic building blocks. Pre-training with neural methods that capture local and global distributional properties (e.g., skip-gram, GloVe) using free text corpora is often used to embed both words and concepts. Pre-trained embeddings are typically leveraged in downstream tasks using various neural architectures that are designed to optimize task-specific objectives that might further tune such embeddings. OBJECTIVE Despite advances in contextualized language model based embeddings, static word embeddings still form an essential starting point in BioNLP research and applications. They are useful in low resource settings and in lexical semantics studies. Our main goal is to build improved biomedical word embeddings and make them publicly available for downstream applications. METHODS We jointly learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information manifesting in co-occurring Medical Subject Heading (MeSH) concepts in biomedical citations. This fine-tuning is accomplished with the transformer-based BERT architecture in the two-sentence input mode with a classification objective that captures MeSH pair co-occurrence. We conduct evaluations of these tuned static embeddings using multiple datasets for word relatedness developed by previous efforts. RESULTS Both in qualitative and quantitative evaluations we demonstrate that our methods produce improved biomedical embeddings in comparison with other static embedding efforts. Without selectively culling concepts and terms (as was pursued by previous efforts), we believe we offer the most exhaustive evaluation of biomedical embeddings to date with clear performance improvements across the board.
CONCLUSION We repurposed a transformer architecture (typically used to generate dynamic embeddings) to improve static biomedical word embeddings using concept correlations. We provide our code and embeddings for public use for downstream applications and research endeavors: https://github.com/bionlproc/BERT-CRel-Embeddings.
Affiliation(s)
- Jiho Noh
- Department of Computer Science, University of Kentucky, United States of America.
| | - Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, United States of America; Department of Computer Science, University of Kentucky, United States of America.
|
29
|
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, Chute CG. Clinical Characterization and Prediction of Clinical Severity of SARS-CoV-2 Infection Among US Adults Using Data From the US National COVID Cohort Collaborative. JAMA Netw Open 2021; 4:e2116901. [PMID: 34255046 PMCID: PMC8278272 DOI: 10.1001/jamanetworkopen.2021.16901] [Citation(s) in RCA: 153] [Impact Index Per Article: 51.0] [Received: 01/26/2021] [Accepted: 05/03/2021] [Indexed: 12/15/2022] Open
Abstract
Importance The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy. Objectives To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. Design, Setting, and Participants In a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). Main Outcomes and Measures Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. Results The cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). 
Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity. Conclusions and Relevance This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.
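To read the per-year odds ratio for age (OR, 1.03), it helps to note that odds ratios in a logistic model multiply across units. The sketch below works through that arithmetic with the point estimate only. It assumes the log-odds are linear in age, ignores the confidence interval, and the function names and the baseline probability passed in the usage line are hypothetical illustrations, not figures drawn from the model.

```python
def odds_ratio_over(or_per_unit, units):
    """Odds ratio implied over several units when the per-unit effect compounds."""
    return or_per_unit ** units


def shift_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability and convert back to a probability."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Ten additional years of age at OR 1.03 per year.
ten_year_or = odds_ratio_over(1.03, 10)
print(round(ten_year_or, 2))  # 1.34

# Hypothetical baseline probability of 0.20, for illustration only.
print(shift_probability(0.20, ten_year_or) > 0.20)  # True
```

Under these assumptions a 10-year age difference corresponds to roughly 34% higher odds of a more severe course, which is comparable in size to the reported odds ratios for obesity (1.36) and Asian race (1.33).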
Affiliation(s)
- Tellen D. Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
| | - Richard A. Moffitt
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
| | | | | | - Adit Anand
- Stony Brook University, Stony Brook, New York
| | | | | | | | - James Brian Byrd
- Department of Internal Medicine, The University of Michigan at Ann Arbor, Ann Arbor
| | - Alina Denham
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York
| | - Peter E. DeWitt
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
| | - Davera Gabriel
- Institute for Clinical and Translational Research, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Brian T. Garibaldi
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | | | | | - Elaine L. Hill
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York
| | - Stephanie S. Hong
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | | | - Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington
| | - Kristin Kostka
- Real World Solutions, IQVIA, Cambridge, Massachusetts
- Observational Health Data Sciences and Informatics, New York, New York
| | - Harold P. Lehmann
- Division of Health Science Informatics, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Eli Levitt
- Department of Orthopaedic Surgery, University of Alabama at Birmingham, Birmingham
| | | | | | - Julie A. McMurry
- Translational and Integrative Sciences Center, Oregon State University, Corvallis
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - John Muschelli
- Department of Biostatistics, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Andrew J. Neumann
- Translational and Integrative Sciences Center, Oregon State University, Corvallis
| | | | - Emily R. Pfaff
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill
| | - Zhenglong Qian
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
| | | | - Seth Russell
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora
| | - Heidi Spratt
- Department of Preventive Medicine and Public Health, University of Texas Medical Branch, Galveston
| | - Anita Walden
- Sage Bionetworks, Seattle, Washington
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland
| | - Andrew E. Williams
- Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts
| | | | - Yun Jae Yoo
- Stony Brook University, Stony Brook, New York
| | - Xiaohan Tanner Zhang
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Richard L. Zhu
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Christopher P. Austin
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland
| | - Joel H. Saltz
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York
| | - Ken R. Gersing
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland
| | - Melissa A. Haendel
- TriNetX, Cambridge, Massachusetts
- Center for Health AI, University of Colorado, Aurora
| | - Christopher G. Chute
- Department of Health Policy and Management, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Department of Nursing, Johns Hopkins University School of Medicine, Baltimore, Maryland
|
30
|
Tran T, Kavuluru R, Kilicoglu H. Attention-Gated Graph Convolutions for Extracting Drug Interaction Information from Drug Labels. ACM Trans Comput Healthc 2021; 2:10. [PMID: 34541578 PMCID: PMC8445229 DOI: 10.1145/3423209] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/01/2019] [Accepted: 09/01/2020] [Indexed: 01/02/2023]
Abstract
Preventable adverse events as a result of medical errors present a growing concern in the healthcare system. As drug-drug interactions (DDIs) may lead to preventable adverse events, being able to extract DDIs from drug labels into a machine-processable form is an important step toward effective dissemination of drug safety information. Herein, we tackle the problem of jointly extracting mentions of drugs and their interactions, including interaction outcome, from drug labels. Our deep learning approach entails composing various intermediate representations, including graph-based context derived using graph convolutions (GCs) with a novel attention-based gating mechanism (holistically called GCA), which are combined in meaningful ways to predict on all subtasks jointly. Our model is trained and evaluated on the 2018 TAC DDI corpus. Our GCA model in conjunction with transfer learning performs at 39.20% F1 and 26.09% F1 on entity recognition (ER) and relation extraction (RE), respectively, on the first official test set and at 45.30% F1 and 27.87% F1 on ER and RE, respectively, on the second official test set. These updated results lead to improvements over our prior best by up to 6 absolute F1 points. After controlling for available training data, the proposed model exhibits state-of-the-art performance for this task.
Affiliation(s)
- Tung Tran
- University of Kentucky, United States
|
31
|
Tran T, Ickes MJ, Hester JW, Kavuluru R. Identifying current Juul users among emerging adults through Twitter feeds. Int J Med Inform 2021; 146:104350. [PMID: 33341556 PMCID: PMC7855996 DOI: 10.1016/j.ijmedinf.2020.104350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2020] [Revised: 11/12/2020] [Accepted: 11/20/2020] [Indexed: 11/27/2022]
Abstract
INTRODUCTION Juul is the most popular electronic cigarette on the market. Amid concerns around uptake of e-cigarettes by never smokers, can we detect whether someone uses Juul based on their social media activities? This is the central premise of the effort reported in this paper. Several recent social media-related studies on Juul use tend to focus on the characterization of Juul-related messages on social media. In this study, we assess the potential in using machine learning methods to automatically identify Juul users (past 30-day usage) based on their Twitter data. METHODS We obtained a collection of 588 instances, for training and testing, of Juul use patterns (along with associated Twitter handles) via survey responses of college students. With this data, we built and tested supervised machine learning models based on linear and deep learning algorithms with textual, social network (friends and followers), and other hand-crafted features. RESULTS The linear model with textual and follower network features performed best with a precision-recall trade-off such that precision (PPV) is 57% at 24% recall (sensitivity). Hence, at least every other college-attending Twitter user flagged by our model is expected to be a Juul user. Additionally, our results indicate that social network features tend to have a large impact (positive) on classification performance. CONCLUSION There are enough latent signals from social feeds for supervised modeling of Juul use, even with limited training data, implying that such models are highly beneficial to very focused intervention campaigns. This initial success indicates potential for more involved automated surveillance of Juul use based on social media data, including Juul usage patterns, nicotine dependence, and risk awareness.
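The reported operating point (57% precision at 24% recall) can be turned into concrete counts with a short Python sketch. The population of 100 true Juul users is a hypothetical round number chosen for illustration, and the function names are not from the paper.

```python
def flagged_counts(n_true_users, precision, recall):
    """Expected true positives, total flagged, and false positives
    for a classifier at a given precision and recall."""
    true_pos = recall * n_true_users
    flagged = true_pos / precision
    false_pos = flagged - true_pos
    return true_pos, flagged, false_pos


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical population of 100 actual Juul users.
tp, flagged, fp = flagged_counts(100, precision=0.57, recall=0.24)
print(round(tp), round(flagged, 1), round(fp, 1))  # 24 42.1 18.1
print(round(f1_score(0.57, 0.24), 2))              # 0.34
```

In this hypothetical population the model would flag about 42 accounts, 24 of them real users, which matches the abstract's reading that more than half of flagged accounts are expected to be Juul users while most users go undetected.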
Affiliation(s)
- Tung Tran
- Department of Computer Science, University of Kentucky, Lexington, USA
| | - Melinda J Ickes
- Department of Kinesiology and Health Promotion, University of Kentucky, Lexington, USA
| | - Jakob W Hester
- Department of Kinesiology and Health Promotion, University of Kentucky, Lexington, USA
| | - Ramakanth Kavuluru
- Department of Computer Science, University of Kentucky, Lexington, USA; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, USA.
|
32
|
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, Chute CG. The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction. medRxiv 2021. [PMID: 33469592 PMCID: PMC7814838 DOI: 10.1101/2021.01.12.21249511] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/15/2022]
Abstract
Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy. Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients that served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. 
Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validates observations from smaller studies, and provides comprehensive insight into COVID-19 characterization in U.S. patients. Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease.
|
33
|
Noh J, Kavuluru R. Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization. Proc Conf Empir Methods Nat Lang Process 2020; 2020:3389-3399. [PMID: 34541588 PMCID: PMC8444997 DOI: 10.18653/v1/2020.findings-emnlp.304] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
Abstract
Information retrieval (IR) for precision medicine (PM) often involves looking for multiple pieces of evidence that characterize a patient case. This typically includes at least the name of a condition and a genetic variation that applies to the patient. Other factors such as demographic attributes, comorbidities, and social determinants may also be pertinent. As such, the retrieval problem is often formulated as ad hoc search but with multiple facets (e.g., disease, mutation) that may need to be incorporated. In this paper, we present a document reranking approach that combines neural query-document matching and text summarization toward such retrieval scenarios. Our architecture builds on the basic BERT model with three specific components for reranking: (a) document-query matching, (b) keyword extraction, and (c) facet-conditioned abstractive summarization. The outcomes of (b) and (c) are used to essentially transform a candidate document into a concise summary that can be compared with the query at hand to compute a relevance score. Component (a) directly generates a matching score of a candidate document for a query. The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets. Evaluations using NIST's TREC-PM track datasets (2017-2019) show that our model achieves state-of-the-art performance. To foster reproducibility, our code is made available here: https://github.com/bionlproc/text-summ-for-doc-retrieval.
Affiliation(s)
- Jiho Noh
  - Department of Computer Science, University of Kentucky, Kentucky, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, University of Kentucky, Kentucky, USA
|
34
|
Ickes M, Hester JW, Wiggins AT, Rayens MK, Hahn EJ, Kavuluru R. Prevalence and reasons for Juul use among college students. J Am Coll Health 2020; 68:455-459. [PMID: 30913003 PMCID: PMC6763357 DOI: 10.1080/07448481.2019.1577867] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 09/10/2018] [Revised: 12/17/2018] [Accepted: 01/29/2019] [Indexed: 06/01/2023]
Abstract
Objective: Examine Juul use patterns, sociodemographic and personal factors associated with Juul use, and reasons for Juul initiation and current use, among college students. Participants: Convenience sample of 371 undergraduates at a large university in the southeast; recruited April 2018. Methods: Cross-sectional design using an online survey. Logistic regression identified the personal risk factors for current use. Results: Over 80% of participants recognized Juul; 36% reported ever use and 21% past 30-day use. Significant risk factors for current Juul use were: male, White/non-Hispanic, lower undergraduate, and current cigarette smoker. Current Juul users chose ease of use and lack of a bad smell as reasons for use. Ever Juul users most commonly endorsed curiosity and use by friends as reasons for trying Juul. Conclusions: Given the propensity for nicotine addiction among youth and young adults, rates of Juul use are alarming and warrant immediate intervention.
Affiliation(s)
- Melinda Ickes
  - Kinesiology and Health Promotion, Tobacco Policy, BREATHE, University of Kentucky, Lexington, KY, USA
- Jakob W. Hester
  - Health Promotion, College of Education, University of Kentucky, Lexington, KY, USA
- Amanda T. Wiggins
  - Data Management & Outcomes, BREATHE, College of Nursing, University of Kentucky, Lexington, KY, USA
- Mary Kay Rayens
  - BREATHE, College of Nursing, University of Kentucky, Lexington, KY, USA
- Ellen J. Hahn
  - BREATHE, College of Nursing, University of Kentucky, Lexington, KY, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY, USA
|
35
|
Bakal G, Kilicoglu H, Kavuluru R. Non-Negative Matrix Factorization for Drug Repositioning: Experiments with the repoDB Dataset. AMIA Annu Symp Proc 2020; 2019:238-247. [PMID: 32308816 PMCID: PMC7153111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Computational methods for drug repositioning are gaining mainstream attention with the availability of experimental gene expression datasets and manually curated relational information in knowledge bases. When building repurposing tools, a fundamental limitation is the lack of gold standard datasets that contain realistic true negative examples of drug-disease pairs that were shown to be non-indications. To address this gap, the repoDB dataset was created in 2017 as a first-of-its-kind realistic resource to benchmark drug repositioning methods: its positive examples are drawn from FDA approved indications and negative examples are derived from failed clinical trials. In this paper, we present the first effort for repositioning that directly tests against repoDB instances. By using hand-curated drug-disease indications from the UMLS Metathesaurus and automatically extracted relations from the SemMedDB database, we employ non-negative matrix factorization (NMF) methods to recover repoDB positive indications. Among recoverable approved indications, our NMF methods achieve 96% recall with 80% precision, providing further evidence that hand-curated knowledge and matrix completion methods can be exploited for hypothesis generation.
|
36
|
Tran T, Kavuluru R. Social media surveillance for perceived therapeutic effects of cannabidiol (CBD) products. Int J Drug Policy 2020; 77:102688. [PMID: 32092666 DOI: 10.1016/j.drugpo.2020.102688] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/20/2019] [Revised: 01/06/2020] [Accepted: 01/24/2020] [Indexed: 01/03/2023]
Abstract
BACKGROUND CBD products have risen in popularity given CBD's therapeutic potential and lack of legal oversight, despite lacking conclusive scientific evidence for widespread over-the-counter usage for many of its perceived benefits. While medical evidence is being generated, social media surveillance offers a fast and inexpensive alternative to traditional surveys in ascertaining perceived therapeutic purposes and modes of consumption for CBD products. METHODS We collected all comments from the CBD subreddit posted between January 1 and April 30, 2019 as well as comments submitted to the FDA regarding regulation of cannabis-derived products and analyzed them using a rule-based language processing method. A relative ranking of popular therapeutic uses and product groups for CBD is obtained based on frequency of pattern matches including precise queries that entail identifying mentions of the condition, a CBD product, and some "trigger" phrase indicating therapeutic use. We validated the social media-based findings using a similar analysis on comments to the U.S. Food and Drug Administration's (FDA) 2019 request-for-comments on cannabis-derived products. RESULTS CBD is mostly discussed as a remedy for anxiety disorders and pain and this is consistent across both comment sources. Of comments posted to the CBD subreddit during the monitored time span, 6.19% mentioned anxiety at least once with at least 6.02% of these comments specifically mentioning CBD as a treatment for anxiety (i.e., 0.37% of total comments). The most popular CBD product group is oil and tinctures. CONCLUSION Social media surveillance of CBD usage has the potential to surface new therapeutic use-cases as they are posted. Contemporary social media data indicate, for example, that stress and nausea are frequently mentioned as therapeutic use cases for CBD without corresponding evidence in the research literature that affirms or denies these uses. 
However, the abundance of anecdotal claims warrants serious scientific exploration moving forward. Meanwhile, as FDA ponders regulation, our effort demonstrates that social data offers a convenient affordance to surveil for CBD usage patterns in a way that is fast and inexpensive and can inform conventional electronic surveys.
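The 0.37% figure in the RESULTS follows from multiplying the two reported percentages. A minimal Python check, using only the values quoted in the abstract (the variable names are illustrative and not taken from the paper):

```python
# Shares reported in the abstract, expressed as fractions.
anxiety_mention_share = 0.0619   # comments mentioning anxiety at least once
treatment_share_within = 0.0602  # of those, comments naming CBD as an anxiety treatment

# Share of all subreddit comments that name CBD as an anxiety treatment.
overall_share = anxiety_mention_share * treatment_share_within
overall_percent = round(overall_share * 100, 2)

print(f"{overall_percent}% of total comments")
```

Because the abstract states "at least 6.02%", the product is best read as a lower bound on the overall share.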
Affiliation(s)
- Tung Tran
  - Department of Computer Science, University of Kentucky, USA.
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, Department of Computer Science, University of Kentucky, USA.
|
37
|
Walsh CG, Chaudhry B, Dua P, Goodman KW, Kaplan B, Kavuluru R, Solomonides A, Subbian V. Stigma, biomarkers, and algorithmic bias: recommendations for precision behavioral health with artificial intelligence. JAMIA Open 2020; 3:9-15. [PMID: 32607482 PMCID: PMC7309258 DOI: 10.1093/jamiaopen/ooz054] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 03/08/2019] [Revised: 07/29/2019] [Accepted: 10/30/2019] [Indexed: 12/22/2022]
Abstract
Effective implementation of artificial intelligence in behavioral healthcare delivery depends on overcoming challenges that are pronounced in this domain. Self and social stigma contribute to under-reported symptoms, and under-coding worsens ascertainment. Health disparities contribute to algorithmic bias. Lack of reliable biological and clinical markers hinders model development, and model explainability challenges impede trust among users. In this perspective, we describe these challenges and discuss design and implementation recommendations to overcome them in intelligent systems for behavioral and mental health.
Affiliation(s)
- Colin G Walsh
  - Biomedical Informatics, Medicine and Psychiatry, Vanderbilt University Medical Center, 2525 West End, Suite 1475, Nashville, TN, USA
- Beenish Chaudhry
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
- Prerna Dua
  - Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, Louisiana, USA
- Kenneth W Goodman
  - Institute for Bioethics and Health Policy, University of Miami, Miller School of Medicine, Miami, Florida, USA
- Bonnie Kaplan
  - Yale Center for Medical Informatics, Yale Bioethics Center, Yale Information Society, Yale Solomon Center for Health Law & Policy, Yale University, New Haven, Connecticut, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
- Anthony Solomonides
  - Outcomes Research and Biomedical Informatics, NorthShore University HealthSystem, Research Institute, Evanston, Illinois, USA
- Vignesh Subbian
  - Department of Biomedical Engineering, Department of Systems and Industrial Engineering, The University of Arizona, Tucson, Arizona, USA
|
38
|
Sarker A, Belousov M, Friedrichs J, Hakala K, Kiritchenko S, Mehryary F, Han S, Tran T, Rios A, Kavuluru R, de Bruijn B, Ginter F, Mahata D, Mohammad SM, Nenadic G, Gonzalez-Hernandez G. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task. J Am Med Inform Assoc 2019; 25:1274-1283. [PMID: 30272184 PMCID: PMC6188524 DOI: 10.1093/jamia/ocy114] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 05/08/2018] [Accepted: 08/02/2018] [Indexed: 12/19/2022]
Abstract
Objective We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
Affiliation(s)
- Abeed Sarker
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Maksim Belousov
  - School of Computer Science, University of Manchester, Manchester, UK
- Kai Hakala
  - Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland
  - The University of Turku Graduate School, University of Turku, Turku, Finland
- Svetlana Kiritchenko
  - Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada
- Farrokh Mehryary
  - Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland
  - The University of Turku Graduate School, University of Turku, Turku, Finland
- Sifei Han
  - Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Tung Tran
  - Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Anthony Rios
  - Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Ramakanth Kavuluru
  - Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
  - Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
- Berry de Bruijn
  - Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada
- Filip Ginter
  - Turku NLP Group, Department of Future Technologies, University of Turku, Turku, Finland
- Saif M Mohammad
  - Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada
- Goran Nenadic
  - School of Computer Science, University of Manchester, Manchester, UK
- Graciela Gonzalez-Hernandez
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
39
|
Ward PJ, Rock PJ, Slavova S, Young AM, Bunn TL, Kavuluru R. Enhancing timeliness of drug overdose mortality surveillance: A machine learning approach. PLoS One 2019; 14:e0223318. [PMID: 31618226 PMCID: PMC6795484 DOI: 10.1371/journal.pone.0223318] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/04/2019] [Accepted: 09/18/2019] [Indexed: 11/26/2022]
Abstract
Background Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance. Methods Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created as well as features indicating the part-of-speech of each word. These features were then used to train machine learning classifiers on 2017 data. The resulting models were tested on 2018 Kentucky data and compared to a simple rule-based classification approach. Documented code for this method is available for reuse and extensions: https://github.com/pjward5656/dcnlp. Results The top scoring machine learning model achieved 0.96 positive predictive value (PPV) and 0.98 sensitivity for an F-score of 0.97 in identification of fatal drug overdoses on test data. This machine learning model achieved significantly higher performance for sensitivity (p<0.001) than the rule-based approach. Additional feature engineering may improve the model’s prediction. This model can be deployed on death certificates as soon as the free-text is available, eliminating the time needed to code the death certificates. Conclusion Machine learning using natural language processing is a relatively new approach in the context of surveillance of health conditions. This method presents an accessible application of machine learning that improves the timeliness of drug overdose mortality surveillance. 
As such, it can be employed to inform public health responses to the drug overdose epidemic in near-real time as opposed to several weeks following events.
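The feature construction and the reported F-score can be illustrated with a short Python sketch. This is not the authors' implementation (their documented code is at the repository linked in the abstract): the whitespace tokenizer, the function names, and the sample text are assumptions made for illustration, and the part-of-speech features are omitted.

```python
from collections import Counter


def ngram_features(text, max_n=3):
    """Count word, bigram, and trigram features in one free-text field."""
    tokens = text.lower().split()  # simple whitespace tokenizer (assumption)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def f_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Invented example text, not taken from any death certificate.
features = ngram_features("acute fentanyl intoxication")
print(sorted(features))

# PPV and sensitivity as reported in the abstract.
print(round(f_score(0.96, 0.98), 2))
```

With the reported PPV of 0.96 and sensitivity of 0.98, the harmonic mean rounds to the 0.97 F-score stated in the Results.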
Affiliation(s)
- Patrick J. Ward
  - Kentucky Injury Prevention and Research Center, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
  - Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
- Peter J. Rock
  - Kentucky Injury Prevention and Research Center, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
  - Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
- Svetla Slavova
  - Kentucky Injury Prevention and Research Center, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
  - Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
- April M. Young
  - Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
  - Center on Drug and Alcohol Research, College of Medicine, University of Kentucky, Lexington, Kentucky, United States of America
- Terry L. Bunn
  - Kentucky Injury Prevention and Research Center, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
  - Department of Preventive Medicine and Environmental Health, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America
- Ramakanth Kavuluru
  - Department of Computer Science, College of Engineering, University of Kentucky, Lexington, Kentucky, United States of America
  - Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, Kentucky, United States of America
|
40
|
Rios A, Durbin EB, Hands I, Arnold SM, Shah D, Schwartz SM, Goulart BHL, Kavuluru R. Cross-registry neural domain adaptation to extract mutational test results from pathology reports. J Biomed Inform 2019; 97:103267. [PMID: 31401235 PMCID: PMC6736690 DOI: 10.1016/j.jbi.2019.103267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/23/2019] [Revised: 07/30/2019] [Accepted: 08/05/2019] [Indexed: 10/26/2022]
Abstract
OBJECTIVE We study the performance of machine learning (ML) methods, including neural networks (NNs), to extract mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use-case of extracting Epidermal Growth Factor Receptor mutation results in non-small cell lung cancers. We explore the generalization of NNs across different registries where our goals are twofold: (1) to assess how well models trained on a registry's data port to test data from a different registry and (2) to assess whether and to what extent such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what is available (labeled vs unlabeled data) at the target registry site. MATERIALS AND METHODS We collected data from two registries: the Kentucky Cancer Registry (KCR) and the Fred Hutchinson Cancer Research Center (FH) Cancer Surveillance System. We combine NNs with adversarial domain adaptation to improve cross-registry performance. We compare to other classifiers in the standard supervised classification, unsupervised domain adaptation, and supervised domain adaptation scenarios. RESULTS The performance of ML methods varied between registries. To extract positive results, the basic convolutional neural network (CNN) had an F1 of 71.5% on the KCR dataset and 95.7% on the FH dataset. For the KCR dataset, the CNN F1 results were low when trained on FH data (Positive F1: 23%). Using our proposed adversarial CNN, without any labeled data, we match the F1 of the models trained directly on each target registry's data. The adversarial CNN F1 improved when trained on FH and applied to KCR dataset (Positive F1: 70.8%). We found similar performance improvements when we trained on KCR and tested on FH reports (Positive F1: 45% to 96%). 
CONCLUSION Adversarial domain adaptation improves the performance of NNs applied to pathology reports. In the unsupervised domain adaptation setting, we match the performance of models that are trained directly on target registry's data by using source registry's labeled data and unlabeled examples from the target registry.
Affiliation(s)
- Anthony Rios
  - Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA
- Eric B Durbin
  - Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Kentucky Cancer Registry, Lexington, KY, USA
- Isaac Hands
  - Kentucky Cancer Registry, Lexington, KY, USA
- Susanne M Arnold
  - Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- Darshil Shah
  - Ironwood Cancer and Research Centers, Avondale, AZ, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Computer Science Department, University of Kentucky, USA.
|
41
|
Tran T, Kavuluru R. Distant supervision for treatment relation extraction by leveraging MeSH subheadings. Artif Intell Med 2019; 98:18-26. [PMID: 31521249 PMCID: PMC6748648 DOI: 10.1016/j.artmed.2019.06.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/16/2018] [Revised: 06/04/2019] [Accepted: 06/05/2019] [Indexed: 11/26/2022]
Abstract
The growing body of knowledge in biomedicine is too vast for human consumption. Hence there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for relation extraction of binary treatment relationships such that high quality positive/negative training examples are generated from PubMed abstracts by leveraging associated MeSH subheadings. The quality of generated examples is assessed based on the quality of supervised models they induce; that is, the mean performance of trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and is closer to human crowd annotations in terms of annotation quality. For treatment relations, our generated training data performs at 81.38%, compared to traditional distant supervision at 64.33% and crowd-sourced annotations at 90.57% on the model-wide PR-AUC metric. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets. Augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. We lastly demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise.
Affiliation(s)
- Tung Tran
  - Department of Computer Science, University of Kentucky, Lexington, KY, United States.
- Ramakanth Kavuluru
  - Department of Computer Science, University of Kentucky, Lexington, KY, United States; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States.
|
42
|
Goulart BHL, Silgard ET, Baik CS, Bansal A, Sun Q, Durbin EB, Hands I, Shah D, Arnold SM, Ramsey SD, Kavuluru R, Schwartz SM. Validity of Natural Language Processing for Ascertainment of EGFR and ALK Test Results in SEER Cases of Stage IV Non-Small-Cell Lung Cancer. JCO Clin Cancer Inform 2019; 3:1-15. [PMID: 31058542 PMCID: PMC6874053 DOI: 10.1200/cci.18.00098] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 01/29/2019] [Indexed: 01/03/2023]
Abstract
PURPOSE SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non-small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR). METHODS We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets. RESULTS NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients, with F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients. CONCLUSION NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.
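The validity metrics named in the METHODS all derive from the counts obtained by comparing NLP output with the gold standard. A minimal Python sketch follows; the function name and the counts are hypothetical and chosen only for illustration, not drawn from the study.

```python
def validity_metrics(tp, fp, fn, tn):
    """Compute validity metrics from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    and true negatives of the NLP output against manual review.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f,
    }


# Hypothetical counts for 1,000 reports; not taken from the paper.
metrics = validity_metrics(tp=90, fp=10, fn=5, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the classifier, whereas the predictive values also depend on how often the test result appears in the sample, which is one reason all four may be reported together.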
Affiliation(s)
- Christina S. Baik
  - Fred Hutchinson Cancer Research Center, Seattle, WA
  - University of Washington, Seattle, WA
- Qin Sun
  - Fred Hutchinson Cancer Research Center, Seattle, WA
- Scott D. Ramsey
  - Fred Hutchinson Cancer Research Center, Seattle, WA
  - University of Washington, Seattle, WA
- Stephen M. Schwartz
  - Fred Hutchinson Cancer Research Center, Seattle, WA
  - University of Washington, Seattle, WA
|
43
|
Rios A, Kavuluru R. Neural transfer learning for assigning diagnosis codes to EMRs. Artif Intell Med 2019; 96:116-122. [PMID: 31164204 DOI: 10.1016/j.artmed.2019.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/19/2018] [Revised: 12/20/2018] [Accepted: 04/10/2019] [Indexed: 11/25/2022]
Abstract
OBJECTIVE Electronic medical records (EMRs) are manually annotated by healthcare professionals and specialized medical coders with a standardized set of alphanumeric diagnosis and procedure codes, specifically from the International Classification of Diseases (ICD). Annotating EMRs with ICD codes is important for medical billing and downstream epidemiological studies. However, manually annotating EMRs is both time-consuming and error prone. In this paper, we explore the use of convolutional neural networks (CNNs) for automatic ICD coding. Because many codes occur infrequently, CNN performance is inhibited. Therefore, we propose supplementing EMR data with PubMed indexed biomedical research abstracts through neural transfer learning. MATERIALS AND METHODS Transfer learning is the process of "transferring" knowledge acquired from one task (the source task) to a different (target) task. For the source task, we train a CNN to predict medical subject headings (MeSH) using 1.6 million PubMed indexed biomedical abstracts. For the target task, we train a CNN on 71,463 real-world EMRs collected from the University of Kentucky (UKY) medical center to predict ICD diagnosis codes. We introduce a simple, yet effective, transfer learning methodology which avoids forgetting knowledge gained from the source task. RESULTS Compared to our prior work using EMRs from the UKY medical center, we improve both the micro and macro F-scores by more than 8%. Likewise, compared to other transfer learning methods, our approach results in nearly 2% improvement in macro F-score. CONCLUSION We show that transfer learning can improve CNN performance for EMR coding in the presence of data sparsity issues. Furthermore, we find that our proposed transfer learning approach outperforms other methods with respect to macro F-score. Finally, we analyze how transfer learning impacts codes with respect to code frequency. 
We find that we achieve greater improvement on infrequent codes compared to improvements in most frequent codes.
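The abstract reports both micro and macro F-scores, which can diverge sharply when some codes are rare. The sketch below shows the standard definitions for multi-label predictions: macro averaging gives every code equal weight, while micro averaging pools counts so frequent codes dominate. It is an illustrative sketch only, not the authors' evaluation code; the function names and the toy labels `frequent_code` and `rare_code` are hypothetical.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label gold and predicted label sets."""
    labels = sorted(set().union(*gold, *pred))
    if not labels:
        return 0.0, 0.0
    # Per-label [tp, fp, fn] counts.
    counts = {label: [0, 0, 0] for label in labels}
    for gold_set, pred_set in zip(gold, pred):
        for label in pred_set & gold_set:
            counts[label][0] += 1
        for label in pred_set - gold_set:
            counts[label][1] += 1
        for label in gold_set - pred_set:
            counts[label][2] += 1
    # Macro: average the per-label F1 values, so each label counts equally.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over all labels, then compute a single F1.
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return micro, macro


# Toy data (hypothetical labels): the frequent code is always right,
# the rare code is missed once and wrongly predicted once.
gold = [{"frequent_code"}, {"frequent_code"},
        {"frequent_code", "rare_code"}, {"frequent_code"}]
pred = [{"frequent_code"}, {"frequent_code"},
        {"frequent_code"}, {"frequent_code", "rare_code"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.2f}, macro F1 = {macro:.2f}")
```

In this toy case the micro score stays high while the macro score drops, which is why macro F-score is the more sensitive measure of performance on infrequent codes.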
Affiliation(s)
- Anthony Rios
- Department of Computer Science, University of Kentucky, Lexington, KY, United States
- Ramakanth Kavuluru
- Department of Computer Science, University of Kentucky, Lexington, KY, United States; Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, Lexington, KY, United States.
|
44
|
Kavuluru R, Han S, Hahn EJ. On the popularity of the USB flash drive-shaped electronic cigarette Juul. Tob Control 2019; 28:110-112. [PMID: 29654121 PMCID: PMC6186192 DOI: 10.1136/tobaccocontrol-2018-054259] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 01/16/2018] [Revised: 03/28/2018] [Accepted: 03/29/2018] [Indexed: 11/04/2022]
Affiliation(s)
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Sifei Han
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA
- Ellen J Hahn
- College of Nursing, University of Kentucky, Lexington, Kentucky, USA
|
45
|
Islamaj Dogan R, Kim S, Chatr-Aryamontri A, Wei CH, Comeau DC, Antunes R, Matos S, Chen Q, Elangovan A, Panyam NC, Verspoor K, Liu H, Wang Y, Liu Z, Altinel B, Hüsünbeyi ZM, Özgür A, Fergadis A, Wang CK, Dai HJ, Tran T, Kavuluru R, Luo L, Steppi A, Zhang J, Qu J, Lu Z. Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database (Oxford) 2019; 2019:5303240. [PMID: 30689846 PMCID: PMC6348314 DOI: 10.1093/database/bay147] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/02/2018] [Accepted: 12/19/2018] [Indexed: 12/16/2022]
Abstract
The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging on individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of BioCreative VI Precision Medicine Track was to extract this particular type of information and was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods. 
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPI affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
Affiliation(s)
- Rezarta Islamaj Dogan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Sun Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Rui Antunes
- Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Sérgio Matos
- Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Qingyu Chen
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
- Aparna Elangovan
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
- Nagesh C Panyam
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
- Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
- Hongfang Liu
- Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
- Yanshan Wang
- Department of Health Science Research, Mayo Clinic, Rochester, MN, USA
- Zhuang Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Berna Altinel
- Department of Computer Engineering, Marmara University, Istanbul, Turkey
- Aris Fergadis
- School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece
- Chen-Kai Wang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
- Hong-Jie Dai
- Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- Tung Tran
- Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- Ling Luo
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Albert Steppi
- Department of Statistics, Florida State University, Florida, USA
- Jinfeng Zhang
- Department of Statistics, Florida State University, Florida, USA
- Jinchan Qu
- Department of Statistics, Florida State University, Florida, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
46
|
Abstract
Document retrieval (DR) forms an important component in end-to-end question-answering (QA) systems where particular answers are sought for well-formed questions. DR in the QA scenario is also useful by itself even without a more involved natural language processing component to extract exact answers from the retrieved documents. This latter step may simply be done by humans, as in traditional search engines, provided the retrieved documents contain the answer. In this paper, we take advantage of datasets made available through the BioASQ end-to-end QA shared task series and build an effective biomedical DR system that relies on relevant answer snippets in the BioASQ training datasets. At the core of our approach is a question-answer sentence matching neural network that learns a measure of relevance of a sentence to an input question in the form of a matching score. In addition to this matching score feature, we also exploit two auxiliary features for scoring document relevance: the name of the journal in which a document is published and the presence/absence of semantic relations (subject-predicate-object triples) in a candidate answer sentence connecting entities mentioned in the question. We rerank our baseline sequential dependence model scores using these three additional features weighted via adaptive random search and other learning-to-rank methods. Our full system placed 2nd in the final batch of Phase A (DR) of task B (QA) in BioASQ 2018. Our ablation experiments highlight the significance of the neural matching network component in the full system.
Affiliation(s)
- Jiho Noh
- Department of Computer Science, University of Kentucky, Lexington KY
- Ramakanth Kavuluru
- Div. of Biomedical Informatics (Internal Medicine), University of Kentucky, Lexington KY
|
47
|
Peng Y, Rios A, Kavuluru R, Lu Z. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database (Oxford) 2018; 2018:5055578. [PMID: 30020437 PMCID: PMC6051439 DOI: 10.1093/database/bay073] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/05/2018] [Accepted: 06/15/2018] [Indexed: 11/14/2022]
Abstract
Mining relations between chemicals and proteins from the biomedical literature is an increasingly important task. The CHEMPROT track at BioCreative VI aims to promote the development and evaluation of systems that can automatically detect the chemical–protein relations in running text (PubMed abstracts). This work describes our CHEMPROT track entry, which is an ensemble of three systems, including a support vector machine, a convolutional neural network, and a recurrent neural network. Their output is combined using majority voting or stacking for final predictions. Our CHEMPROT system obtained 0.7266 in precision and 0.5735 in recall for an F-score of 0.6410 during the challenge, demonstrating the effectiveness of machine learning-based approaches for automatic relation extraction from biomedical literature and achieving the highest performance in the task during the 2017 challenge. Database URL: http://www.biocreative.org/tasks/biocreative-vi/track-5/
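Two ideas in this abstract are easy to make concrete: combining three systems by majority voting, and the relationship between the reported precision, recall, and F-score. The sketch below is illustrative only and is not the authors' implementation; the function names, the relation labels, and the tie-breaking rule (prefer the earliest system's prediction) are assumptions, and the stacking variant is not shown.

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Return the label predicted by the most systems.

    Ties are broken in favour of the earliest system in the list; this
    tie-breaking rule is an assumption made for the sketch.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three systems for one candidate chemical-protein pair.
svm_label, cnn_label, rnn_label = "relation_a", "no_relation", "relation_a"
ensemble_label = majority_vote([svm_label, cnn_label, rnn_label])
print(ensemble_label)

# Consistency check on the figures reported in the abstract.
reported_f = f_score(0.7266, 0.5735)
print(f"{reported_f:.4f}")
```

The harmonic mean of the reported precision (0.7266) and recall (0.5735) rounds to the reported F-score of 0.6410.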
Affiliation(s)
- Yifan Peng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Anthony Rios
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA; Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Ramakanth Kavuluru
- Department of Computer Science, University of Kentucky, Lexington, KY, USA; Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
48
|
Rios A, Kavuluru R. Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces. Proc Conf Empir Methods Nat Lang Process 2018; 2018:3132-3142. [PMID: 30775726 PMCID: PMC6375489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Large multi-label datasets contain labels that occur thousands of times (frequent group), those that occur only a few times (few-shot group), and labels that never appear in the training dataset (zero-shot group). Multi-label few- and zero-shot label prediction is mostly unexplored on datasets with large label spaces, especially for text classification. In this paper, we perform a fine-grained evaluation to understand how state-of-the-art methods perform on infrequent labels. Furthermore, we develop few- and zero-shot methods for multi-label text classification when there is a known structure over the label space, and evaluate them on two publicly available medical text datasets: MIMIC II and MIMIC III. For few-shot labels we achieve improvements of 6.2% and 4.8% in R@10 for MIMIC II and MIMIC III, respectively, over prior efforts; the corresponding R@10 improvements for zero-shot labels are 17.3% and 19%.
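The improvements above are stated in terms of R@10, that is, recall among the ten highest-scoring labels. A minimal sketch of one common per-document definition of recall at k follows. It is an assumption that this matches the paper's exact averaging scheme (for example, how the few-shot and zero-shot groups are aggregated), and the function names, labels, and scores are hypothetical.

```python
def recall_at_k(scores: dict[str, float], gold: set[str], k: int) -> float:
    """Fraction of the gold labels that appear among the k top-scoring labels."""
    if not gold:
        raise ValueError("recall is undefined when there are no gold labels")
    ranked = sorted(scores, key=lambda label: scores[label], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(examples: list[tuple[dict[str, float], set[str]]], k: int) -> float:
    """Average recall at k over (scores, gold labels) pairs, one pair per document."""
    if not examples:
        raise ValueError("need at least one example")
    return sum(recall_at_k(scores, gold, k) for scores, gold in examples) / len(examples)


# Toy example with k=2 instead of 10 to keep it readable (made-up scores).
toy_scores = {"label_a": 0.9, "label_b": 0.1, "label_c": 0.8, "label_d": 0.3}
toy_gold = {"label_a", "label_d"}
toy_recall = recall_at_k(toy_scores, toy_gold, k=2)
print(toy_recall)  # label_a is in the top 2, label_d is not
```

Restricting the gold set to few-shot or zero-shot labels before calling `recall_at_k` is one way to obtain group-specific figures of the kind reported here.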
Affiliation(s)
- Anthony Rios
- Department of Computer Science, University of Kentucky, Lexington, KY
- Ramakanth Kavuluru
- Division of Biomedical Informatics, University of Kentucky, Lexington, KY
|
49
|
Rios A, Kavuluru R, Lu Z. Generalizing biomedical relation classification with neural adversarial domain adaptation. Bioinformatics 2018; 34:2973-2981. [PMID: 29590309 PMCID: PMC6129312 DOI: 10.1093/bioinformatics/bty190] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/27/2017] [Revised: 03/15/2018] [Accepted: 03/25/2018] [Indexed: 11/13/2022]
Abstract
Motivation Creating large datasets for biomedical relation classification can be prohibitively expensive. While some datasets have been curated to extract protein-protein and drug-drug interactions (PPIs and DDIs) from text, we are also interested in other interactions including gene-disease and chemical-protein connections. Also, many biomedical researchers have begun to explore ternary relationships. Even when annotated data are available, many datasets used for relation classification are inherently biased. For example, issues such as sample selection bias typically prevent models from generalizing in the wild. To address the problem of cross-corpora generalization, we present a novel adversarial learning algorithm for unsupervised domain adaptation tasks where no labeled data are available in the target domain. Instead, our method takes advantage of unlabeled data to improve biased classifiers through learning domain-invariant features via an adversarial process. Finally, our method is built upon recent advances in neural network (NN) methods. Results We experiment by extracting PPIs and DDIs from text. In our experiments, we show domain invariant features can be learned in NNs such that classifiers trained for one interaction type (protein-protein) can be re-purposed to others (drug-drug). We also show that our method can adapt to different source and target pairs of PPI datasets. Compared to prior convolutional and recurrent NN-based relation classification methods without domain adaptation, we achieve improvements as high as 30% in F1-score. Likewise, we show improvements over state-of-the-art adversarial methods. Availability and implementation Experimental code is available at https://github.com/bionlproc/adversarial-relation-classification. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anthony Rios
- National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, MD, USA
- Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Ramakanth Kavuluru
- Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Division of Biomedical Informatics, Department of Internal Medicine, Lexington, KY, USA
- Zhiyong Lu
- National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, MD, USA
|
50
|
Bakal G, Talari P, Kakani EV, Kavuluru R. Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations. J Biomed Inform 2018; 82:189-199. [PMID: 29763706 PMCID: PMC6070294 DOI: 10.1016/j.jbi.2018.05.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 08/09/2017] [Revised: 01/31/2018] [Accepted: 05/09/2018] [Indexed: 01/27/2023]
Abstract
BACKGROUND Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. OBJECTIVE To build high-accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. METHODS We used 7000 hand-curated treats relations and 2918 hand-curated causes relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80-20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. RESULTS Our models predict treats and causes relations with high F-scores of 99% and 90%, respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. Working with two physician co-authors, we also identified some new plausible relations among the false positives that our models scored highly. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. CONCLUSIONS We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction.
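The evaluation protocol in the METHODS section (100 train-test splits at 80-20%, with a 1:10 positive:negative ratio in the test set) can be sketched as follows. This is a hedged illustration, not the authors' code: the abstract does not say how negative examples were generated or allocated, so sending the leftover negatives to training is an assumption, as are the function name, the seeds, and the toy example sizes.

```python
import random


def make_split(positives, negatives, seed, train_fraction=0.8, test_neg_per_pos=10):
    """Build one train-test split with a fixed positive:negative ratio in the test set.

    Positives are split by train_fraction. The test set then receives
    test_neg_per_pos negatives for each test positive, and the remaining
    negatives go to training (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    cut = int(len(pos) * train_fraction)
    train_pos, test_pos = pos[:cut], pos[cut:]
    n_test_neg = test_neg_per_pos * len(test_pos)
    if n_test_neg > len(neg):
        raise ValueError("not enough negatives for the requested test ratio")
    test_neg, train_neg = neg[:n_test_neg], neg[n_test_neg:]
    return {"train": (train_pos, train_neg), "test": (test_pos, test_neg)}


# Toy data (hypothetical identifiers): 50 positives and 500 negatives.
positives = [f"pos_{i}" for i in range(50)]
negatives = [f"neg_{i}" for i in range(500)]

# One split per seed; performance measures would be averaged over these.
splits = [make_split(positives, negatives, seed) for seed in range(100)]
test_pos, test_neg = splits[0]["test"]
print(len(test_pos), len(test_neg))
```

Fixing the ratio only in the test set keeps the reported precision closer to what a deployed model would see, since true relations are rare among all candidate entity pairs.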
Affiliation(s)
- Gokhan Bakal
- Department of Computer Science, University of Kentucky, United States.
- Preetham Talari
- Division of Hospital Medicine, Department of Internal Medicine, University of Kentucky, United States.
- Elijah V Kakani
- Division of Hospital Medicine, Department of Internal Medicine, University of Kentucky, United States.
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, United States; Department of Computer Science, University of Kentucky, United States.
|