1
Bachman JA, Gyori BM, Sorger PK. Automated assembly of molecular mechanisms at scale from text mining and curated databases. Mol Syst Biol 2023; 19:e11325. [PMID: 36938926 PMCID: PMC10167483 DOI: 10.15252/msb.202211325] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/31/2022] [Revised: 02/24/2023] [Accepted: 02/27/2023] [Indexed: 03/21/2023] Open
Abstract
The analysis of omic data depends on machine-readable information about protein interactions, modifications, and activities as found in protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. These resources typically depend heavily on human curation. Natural language processing systems that read the primary literature have the potential to substantially extend knowledge resources while reducing the burden on human curators. However, machine-reading systems are limited by high error rates and commonly generate fragmentary and redundant information. Here, we describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA identifies full and partial overlaps in information extracted from published papers and pathway databases, uses predictive models to improve the reliability of machine reading, and thereby assembles individual pieces of information into non-redundant and broadly usable mechanistic knowledge. Using INDRA to create high-quality corpora of causal knowledge we show it is possible to extend protein-protein interaction databases and explain co-dependencies in the Cancer Dependency Map.
Affiliation(s)
- John A Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Peter K Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
2
Bucur CI, Kuhn T, Ceolin D, van Ossenbruggen J. Nanopublication-based semantic publishing and reviewing: a field study with formalization papers. PeerJ Comput Sci 2023; 9:e1159. [PMID: 37346675 PMCID: PMC10280262 DOI: 10.7717/peerj-cs.1159] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/04/2022] [Accepted: 10/25/2022] [Indexed: 06/23/2023]
Abstract
With the rapidly increasing amount of scientific literature, it is getting continuously more difficult for researchers in different disciplines to keep up-to-date with the recent findings in their field of study. Processing scientific articles in an automated fashion has been proposed as a solution to this problem, but the accuracy of such processing remains very poor for extraction tasks beyond the most basic ones (like locating and identifying entities and simple classification based on predefined categories). Few approaches have tried to change how we publish scientific results in the first place, such as by making articles machine-interpretable by expressing them with formal semantics from the start. In the work presented here, we propose a first step in this direction by setting out to demonstrate that we can formally publish high-level scientific claims in formal logic, and publish the results in a special issue of an existing journal. We use the concept and technology of nanopublications for this endeavor, and represent not just the submissions and final papers in this RDF-based format, but also the whole process in between, including reviews, responses, and decisions. We do this by performing a field study with what we call formalization papers, which contribute a novel formalization of a previously published claim. We received 15 submissions from 18 authors, who then went through the whole publication process leading to the publication of their contributions in the special issue. Our evaluation shows the technical and practical feasibility of our approach. The participating authors mostly showed high levels of interest and confidence, and mostly experienced the process as not very difficult, despite the technical nature of the current user interfaces. 
We believe that these results indicate that it is possible to publish scientific results from different fields with machine-interpretable semantics from the start, which in turn opens countless possibilities to radically improve in the future the effectiveness and efficiency of the scientific endeavor as a whole.
Affiliation(s)
- Cristina-Iulia Bucur
- Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Tobias Kuhn
- Computer Science Department, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Davide Ceolin
- Human-Centered Data Analytics Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
3
Chen Q, Allot A, Leaman R, Islamaj R, Du J, Fang L, Wang K, Xu S, Zhang Y, Bagherzadeh P, Bergler S, Bhatnagar A, Bhavsar N, Chang YC, Lin SJ, Tang W, Zhang H, Tavchioski I, Pollak S, Tian S, Zhang J, Otmakhova Y, Yepes AJ, Dong H, Wu H, Dufour R, Labrak Y, Chatterjee N, Tandon K, Laleye FAA, Rakotoson L, Chersoni E, Gu J, Friedrich A, Pujari SC, Chizhikova M, Sivadasan N, VG S, Lu Z. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford) 2022; 2022:baac069. [PMID: 36043400 PMCID: PMC9428574 DOI: 10.1093/database/baac069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/20/2022] [Revised: 08/02/2022] [Accepted: 08/13/2022] [Indexed: 05/03/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. The related findings such as vaccine and drug development have been reported in biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/
Affiliation(s)
- Qingyu Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Alexis Allot
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Robert Leaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Rezarta Islamaj
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Jingcheng Du
- School of Biomedical Informatics, UT Health, Houston, TX 77030, USA
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Shuo Xu
- College of Economics and Management, Beijing University of Technology, Beijing, QC, China
- Yuefu Zhang
- College of Economics and Management, Beijing University of Technology, Beijing, QC, China
- Yung-Chun Chang
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Sheng-Jie Lin
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Wentai Tang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Hongtong Zhang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Ilija Tavchioski
- Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Jožef Stefan Institute, Ljubljana, Slovenia
- Shubo Tian
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yulia Otmakhova
- School of Computing and Information Systems, University of Melbourne, Melbourne, AU-VIC, Australia
- Hang Dong
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
- Honghan Wu
- Institute of Health Informatics, University College London, London, UK
- Niladri Chatterjee
- Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
- Kushagri Tandon
- Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
- Emmanuele Chersoni
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Jinghang Gu
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Subhash Chandra Pujari
- Institute of Computer Science, Heidelberg University, Heidelberg, Germany
- Bosch Center for Artificial Intelligence, Renningen, Germany
- Mariia Chizhikova
- SINAI Group, Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Jaén, Spain
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
4
Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, Hu F, Lopes EN, Martens M, Pham N, Shin W, Slenter DN, Waagmeester A, Willighagen EL, Winckers LA, Evelo CT, Pico AR. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol 2021; 17:e1009226. [PMID: 34411100 PMCID: PMC8375987 DOI: 10.1371/journal.pcbi.1009226] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022] Open
Affiliation(s)
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
- Martina Kutmon
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Susan L. Coort
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Daniela Digles
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Lauren J. Dupuis
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Friederike Ehrhart
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Finterly Hu
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Elisson N. Lopes
- Instituto de Ciencias Biologicas, Departamento de Bioquimica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Marvin Martens
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Nhung Pham
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Woosub Shin
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Denise N. Slenter
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Egon L. Willighagen
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Laurent A. Winckers
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Chris T. Evelo
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Alexander R. Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
5
Ruiz Castro PA, Yepiskoposyan H, Gubian S, Calvino-Martin F, Kogel U, Renggli K, Peitsch MC, Hoeng J, Talikka M. Systems biology approach highlights mechanistic differences between Crohn's disease and ulcerative colitis. Sci Rep 2021; 11:11519. [PMID: 34075172 PMCID: PMC8169754 DOI: 10.1038/s41598-021-91124-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/10/2021] [Accepted: 05/21/2021] [Indexed: 12/11/2022] Open
Abstract
The molecular mechanisms of IBD have been the subject of intensive exploration. We, therefore, assembled the available information into a suite of causal biological network models, which offer comprehensive visualization of the processes underlying IBD. Scientific text was curated by using Biological Expression Language (BEL) and compiled with OpenBEL 3.0.0. Network properties were analysed by Cytoscape. Network perturbation amplitudes were computed to score the network models with transcriptomic data from public data repositories. The IBD network model suite consists of three independent models that represent signalling pathways that contribute to IBD. In the “intestinal permeability” model, programmed cell death factors were downregulated in CD and upregulated in UC. In the “inflammation” model, PPARG, IL6, and IFN-associated pathways were prominent regulatory factors in both diseases. In the “wound healing” model, factors promoting wound healing were upregulated in CD and downregulated in UC. Scoring of publicly available transcriptomic datasets onto these network models demonstrated that the IBD models capture the perturbation in each dataset accurately. The IBD network model suite can provide better mechanistic insights of the transcriptional changes in IBD and constitutes a valuable tool in personalized medicine to further understand individual drug responses in IBD.
Affiliation(s)
- Pedro A Ruiz Castro
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Hasmik Yepiskoposyan
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Sylvain Gubian
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Florian Calvino-Martin
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Ulrike Kogel
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Kasper Renggli
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Manuel C Peitsch
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Julia Hoeng
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
- Marja Talikka
- Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland
6
Stefanovski L, Meier JM, Pai RK, Triebkorn P, Lett T, Martin L, Bülau K, Hofmann-Apitius M, Solodkin A, McIntosh AR, Ritter P. Bridging Scales in Alzheimer's Disease: Biological Framework for Brain Simulation With The Virtual Brain. Front Neuroinform 2021; 15:630172. [PMID: 33867964 PMCID: PMC8047422 DOI: 10.3389/fninf.2021.630172] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/16/2020] [Accepted: 03/08/2021] [Indexed: 12/18/2022] Open
Abstract
Despite the acceleration of knowledge and data accumulation in neuroscience over the last years, the highly prevalent neurodegenerative disease of AD remains a growing problem. Alzheimer's Disease (AD) is the most common cause of dementia and represents the most prevalent neurodegenerative disease. For AD, disease-modifying treatments are presently lacking, and the understanding of disease mechanisms continues to be incomplete. In the present review, we discuss candidate contributing factors leading to AD, and evaluate novel computational brain simulation methods to further disentangle their potential roles. We first present an overview of existing computational models for AD that aim to provide a mechanistic understanding of the disease. Next, we outline the potential to link molecular aspects of neurodegeneration in AD with large-scale brain network modeling using The Virtual Brain (www.thevirtualbrain.org), an open-source, multiscale, whole-brain simulation neuroinformatics platform. Finally, we discuss how this methodological approach may contribute to the understanding, improved diagnostics, and treatment optimization of AD.
Affiliation(s)
- Leon Stefanovski
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Jil Mona Meier
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Roopa Kalsank Pai
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Paul Triebkorn
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Institut de Neurosciences des Systèmes, Aix Marseille Université, Marseille, France
- Tristram Lett
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Leon Martin
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Konstantin Bülau
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Sankt Augustin, Germany
- Ana Solodkin
- Behavioral and Brain Sciences, University of Texas at Dallas, Dallas, TX, United States
- Petra Ritter
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Neurology with Experimental Neurology, Brain Simulation Section, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Einstein Center for Neuroscience Berlin, Berlin, Germany
- Einstein Center Digital Future, Berlin, Germany
7
Zucker J, Paneri K, Mohammad-Taheri S, Bhargava S, Kolambkar P, Bakker C, Teuton J, Hoyt CT, Oxford K, Ness R, Vitek O. Leveraging Structured Biological Knowledge for Counterfactual Inference: A Case Study of Viral Pathogenesis. IEEE TRANSACTIONS ON BIG DATA 2021; 7:25-37. [PMID: 37981991 PMCID: PMC8769018 DOI: 10.1109/tbdata.2021.3050680] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/17/2020] [Revised: 11/11/2020] [Accepted: 12/14/2020] [Indexed: 11/21/2023]
Abstract
Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This article proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.
Affiliation(s)
- Jeremy Zucker
- Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Craig Bakker
- Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Jeremy Teuton
- Pacific Northwest National Laboratory, Richland, WA 99354, USA
8
Biziukova NY, Tarasova OA, Rudik AV, Filimonov DA, Poroikov VV. Automatic Recognition of Chemical Entity Mentions in Texts of Scientific Publications. AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS 2021. [DOI: 10.3103/s0005105520060023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
9
Razjouyan J, Freytag J, Dindo L, Kiefer L, Odom E, Halaszynski J, Silva JW, Naik AD. Measuring Adoption of Patient Priorities-Aligned Care Using Natural Language Processing of Electronic Health Records: Development and Validation of the Model. JMIR Med Inform 2021; 9:e18756. [PMID: 33605893 PMCID: PMC7935648 DOI: 10.2196/18756] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2020] [Revised: 11/16/2020] [Accepted: 12/17/2020] [Indexed: 12/04/2022] Open
Abstract
Background: Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with priorities of older adults who have multiple chronic conditions. Following identification of patient priorities, this information is documented in the patient’s electronic health record (EHR). Objective: Our goal is to develop and validate a natural language processing (NLP) model that reliably documents when clinicians identify patient priorities (ie, values, outcome goals, and care preferences) within the EHR as a measure of PPC adoption. Methods: This is a retrospective analysis of unstructured National Veteran Health Administration EHR free-text notes using an NLP model. The data were sourced from 778 patient notes of 658 patients from encounters with 144 social workers in the primary care setting. Each patient’s free-text clinical note was reviewed by 2 independent reviewers for the presence of PPC language such as priorities, values, and goals. We developed an NLP model that utilized statistical machine learning approaches. The performance of the NLP model in training and validation with 10-fold cross-validation is reported via accuracy, recall, and precision in comparison to the chart review. Results: Of 778 notes, 589 (75.7%) were identified as containing PPC language (kappa=0.82, P<.001). The NLP model in the training stage had an accuracy of 0.98 (95% CI 0.98-0.99), a recall of 0.98 (95% CI 0.98-0.99), and precision of 0.98 (95% CI 0.97-1.00). The NLP model in the validation stage had an accuracy of 0.92 (95% CI 0.90-0.94), recall of 0.84 (95% CI 0.79-0.89), and precision of 0.84 (95% CI 0.77-0.91). In contrast, an approach using simple search terms for PPC only had a precision of 0.757. Conclusions: An automated NLP model can reliably measure with high precision, recall, and accuracy when clinicians document patient priorities as a key step in the adoption of PPC.
Affiliation(s)
- Javad Razjouyan
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Big Data Scientist Training Enhancement Program (BD-STEP), VA Office of Research and Development, Washington, DC, United States
- Jennifer Freytag
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Lilian Dindo
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Lea Kiefer
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Edward Odom
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Jaime Halaszynski
- Social Work Service, Butler VA Health Care System, Butler, PA, United States
- VA National Social Work Program Office, Care Management and Social Work, Patient Care Services, Department of Veterans Affairs, Washington, DC, United States
- VA Tennessee Valley Healthcare System, Nashville, TN, United States
- Jennifer W Silva
- VA National Social Work Program Office, Care Management and Social Work, Patient Care Services, Department of Veterans Affairs, Washington, DC, United States
- VA Tennessee Valley Healthcare System, Nashville, TN, United States
- Aanand D Naik
- VA Health Services Research and Development Service, Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey VA Medical Center, Houston, TX, United States
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Big Data Scientist Training Enhancement Program (BD-STEP), VA Office of Research and Development, Washington, DC, United States
- VA Quality Scholars Coordinating Center, IQuESt, Michael E DeBakey VA Medical Center, Houston, TX, United States
10
Shao Y, Li H, Gu J, Qian L, Zhou G. Extraction of causal relations based on SBEL and BERT model. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6133143. [PMID: 33570092 PMCID: PMC7904051 DOI: 10.1093/database/baab005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2020] [Revised: 01/19/2021] [Accepted: 01/26/2021] [Indexed: 11/15/2022]
Abstract
Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the community of biomedical text mining due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction and employ BERT (Bidirectional Encoder Representation from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form—SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model to both extract entity relations and detect entity functions, aiming to improve the performance of two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves the state-of-the-art performance in BEL statement extraction with F1 scores of 54.8% in Stage 2 evaluation and of 30.1% in Stage 1 evaluation, respectively. Database URL: https://github.com/grapeff/SBEL_datasets
Affiliation(s)
- Yifan Shao
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu Province, China, 215006
- Haoru Li
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu Province, China, 215006
- Jinghang Gu
- Department of Chinese & Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China, 999077
- Longhua Qian
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu Province, China, 215006
- Guodong Zhou
- School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu Province, China, 215006