1
Madan S, Kühnel L, Fröhlich H, Hofmann-Apitius M, Fluck J. Dataset of miRNA-disease relations extracted from textual data using transformer-based neural networks. Database (Oxford) 2024; 2024:baae066. [PMID: 39104284 DOI: 10.1093/database/baae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 06/23/2024] [Accepted: 07/10/2024] [Indexed: 08/07/2024]
Abstract
MicroRNAs (miRNAs) play important roles in post-transcriptional processes and regulate major cellular functions. Abnormal regulation of miRNA expression has been linked to numerous human diseases, such as respiratory diseases, cancer, and neurodegenerative diseases. The latest miRNA-disease associations are predominantly found in unstructured biomedical literature. Retrieving these associations manually can be cumbersome and time-consuming because the number of publications is continuously expanding. We propose a deep learning-based text mining approach that extracts normalized miRNA-disease associations from biomedical literature. To train the deep learning models, we build a new training corpus that is extended by distant supervision utilizing multiple external databases. A quantitative evaluation shows that the workflow achieves an area under the receiver operating characteristic curve of 98% on a holdout test set for the detection of miRNA-disease associations. We demonstrate the applicability of the approach by extracting new miRNA-disease associations from biomedical literature (PubMed and PubMed Central). Quantitative analysis and evaluation on three different neurodegenerative diseases show that our approach can effectively extract miRNA-disease associations not yet available in public databases. Database URL: https://zenodo.org/records/10523046.
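The abstract above reports an area under the receiver operating characteristic curve (AUROC) on a holdout set. As a minimal sketch of what that metric measures, the snippet below computes AUROC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `auroc` and the labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six candidate miRNA-disease sentences
# (1 = sentence states an association, 0 = it does not).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
result = auroc(labels, scores)
print(result)
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly. The quadratic loop is chosen for readability; a rank-based computation would be preferable for large test sets.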
Affiliation(s)
- Sumit Madan
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757 Sankt Augustin, Germany
- Lisa Kühnel
- Knowledge Management, German National Library of Medicine (ZB MED)-Information Centre for Life Sciences, Friedrich-Hirzebruch-Allee 4, Bonn 53115, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, Postfach 10 01 31, Bielefeld, Nordrhein-Westfalen 33501, Germany
- Holger Fröhlich
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757 Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Friedrich-Hirzebruch-Allee 6, Bonn 53113, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757 Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Friedrich-Hirzebruch-Allee 6, Bonn 53113, Germany
- Juliane Fluck
- Knowledge Management, German National Library of Medicine (ZB MED)-Information Centre for Life Sciences, Friedrich-Hirzebruch-Allee 4, Bonn 53115, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, Postfach 10 01 31, Bielefeld, Nordrhein-Westfalen 33501, Germany
- Information management, Institute of Geodesy and Geoinformation, University of Bonn, Katzenburgweg 1a, Bonn 53115, Germany
2
Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Fröhlich H. Transformer models in biomedicine. BMC Med Inform Decis Mak 2024; 24:214. [PMID: 39075407 PMCID: PMC11287876 DOI: 10.1186/s12911-024-02600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 07/08/2024] [Indexed: 07/31/2024] Open
Abstract
Deep neural networks (DNNs) have fundamentally revolutionized the field of artificial intelligence (AI). The transformer model is a type of DNN that was originally used for natural language processing tasks and has since gained increasing attention for processing various kinds of sequential data, including biological sequences and structured electronic health records. Along with this development, transformer-based models such as BioBERT, MedBERT, and MassGenie have been trained and deployed by researchers to answer various scientific questions originating in the biomedical domain. In this paper, we review the development and application of transformer models for analyzing various biomedical datasets, such as biomedical textual data, protein sequences, structured longitudinal medical data, and biomedical images as well as graphs. We also look at explainable AI strategies that help to comprehend the predictions of transformer-based models. Finally, we discuss the limitations and challenges of current models and point out emerging research directions.
Affiliation(s)
- Sumit Madan
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany.
- Institute of Computer Science, University of Bonn, Bonn, 53115, Germany.
- Manuel Lentzen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Johannes Brandt
- School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Daniel Rueckert
- School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- School of Computation, Information and Technology, Technical University Munich, Munich, Germany
- Department of Computing, Imperial College London, London, UK
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Holger Fröhlich
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany.
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany.
3
González-Pérez Y, Montero Delgado A, Martinez Sesmero JM. [Translated article] Introducing artificial intelligence to hospital pharmacy departments. FARMACIA HOSPITALARIA 2024; 48 Suppl 1:TS35-TS44. [PMID: 39097375 DOI: 10.1016/j.farma.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 01/03/2024] [Accepted: 02/14/2024] [Indexed: 08/05/2024] Open
Abstract
Artificial intelligence is a broad concept that includes the study of the ability of computers to perform tasks that would normally require the intervention of human intelligence. By exploiting large volumes of healthcare data, artificial intelligence algorithms can identify patterns and predict outcomes, which can help healthcare organizations and their professionals make better decisions and achieve better results. Machine learning, deep learning, neural networks, and natural language processing are among the most important methods, allowing systems to learn and improve from data without the need for explicit programming. Artificial intelligence has been introduced in biomedicine, accelerating processes, improving accuracy and efficiency, and improving patient care. By using artificial intelligence algorithms and machine learning, hospital pharmacists can analyze a large volume of patient data, including medical records, laboratory results, and medication profiles, aiding them in identifying potential drug-drug interactions, assessing the safety and efficacy of medicines, and making informed recommendations. Artificial intelligence integration will improve the quality of pharmaceutical care, optimize processes, promote research, deploy open innovation, and facilitate education. Hospital pharmacists who master artificial intelligence will play a crucial role in this transformation.
Affiliation(s)
- Yared González-Pérez
- Servicio de Farmacia, Hospital Universitario de Canarias, San Cristóbal de La Laguna, Spain.
- Alfredo Montero Delgado
- Servicio de Farmacia, Hospital Nuestra Señora de la Candelaria, Santa Cruz de Tenerife, Spain
4
González-Pérez Y, Montero Delgado A, Martinez Sesmero JM. Approaching artificial intelligence to Hospital Pharmacy. FARMACIA HOSPITALARIA 2024; 48 Suppl 1:S35-S44. [PMID: 39097366 DOI: 10.1016/j.farma.2024.02.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 01/03/2024] [Accepted: 02/14/2024] [Indexed: 08/05/2024] Open
Abstract
Artificial intelligence (AI) is a broad concept that includes the study of the ability of computers to perform tasks that would normally require the intervention of human intelligence. By exploiting large volumes of healthcare data, artificial intelligence algorithms can identify patterns and predict outcomes, which can help healthcare organizations and their professionals make better decisions and achieve better results. Machine learning, deep learning, neural networks or natural language processing are among the most important methods, allowing systems to learn and improve from data without the need for explicit programming. AI has been introduced in biomedicine, accelerating processes, improving safety and efficiency, and improving patient care. By using AI algorithms and Machine Learning, hospital pharmacists can analyze a large volume of patient data, including medical records, laboratory results, and medication profiles, aiding them in identifying potential drug-drug interactions, assessing the safety and efficacy of medicines, and making informed recommendations. AI integration will improve the quality of pharmaceutical care, optimize processes, promote research, deploy open innovation, and facilitate education. Hospital pharmacists who master AI will play a crucial role in this transformation.
Affiliation(s)
- Yared González-Pérez
- Servicio de Farmacia, Hospital Universitario de Canarias, San Cristóbal de La Laguna, Spain.
- Alfredo Montero Delgado
- Servicio de Farmacia, Hospital Nuestra Señora de la Candelaria, Santa Cruz de Tenerife, Spain
5
Gill JK, Chetty M, Lim S, Hallinan J. Large language model based framework for automated extraction of genetic interactions from unstructured data. PLoS One 2024; 19:e0303231. [PMID: 38771886 PMCID: PMC11108146 DOI: 10.1371/journal.pone.0303231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 04/23/2024] [Indexed: 05/23/2024] Open
Abstract
Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.
Affiliation(s)
- Jaskaran Kaur Gill
- Health Innovation and Transformation Centre, Federation University, Ballarat, Victoria, Australia
- Madhu Chetty
- Health Innovation and Transformation Centre, Federation University, Ballarat, Victoria, Australia
- Suryani Lim
- Health Innovation and Transformation Centre, Federation University, Ballarat, Victoria, Australia
- Jennifer Hallinan
- Health Innovation and Transformation Centre, Federation University, Ballarat, Victoria, Australia
- BioThink, Brisbane, Queensland, Australia
6
Masoumi S, Amirkhani H, Sadeghian N, Shahraz S. Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research. Syst Rev 2024; 13:107. [PMID: 38622611 PMCID: PMC11020656 DOI: 10.1186/s13643-024-02470-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/28/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND Abstract review is a time- and labor-intensive step in systematic and scoping literature reviews in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model. METHODS Scanning the PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 abstracts, with a final selection of 12,817 English abstracts published between 2000 and 2021. We devised a manual classification of medical fields with three variables: the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers to classify the abstracts. To evaluate the performance of the trained models, we report a micro F1-score and accuracy. RESULTS The trained models' micro F1-scores for classifying abstracts into the three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF. The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (72.01 articles (95% CI: 56.80-78.30) yearly increase), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%). CONCLUSIONS BioBERT showed an acceptable performance in the abstract review. If future research confirms the high performance of this language model, it could reliably replace manual abstract reviews.
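The abstract above evaluates its classifiers with a micro F1-score and accuracy. The following is a minimal sketch of how a micro-averaged F1 is computed by pooling true positives, false positives, and false negatives over all classes. The function name `micro_f1` and the class labels and predictions are hypothetical and are not taken from the study.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical text-source labels for five abstracts.
y_true = ["ehr", "ehr", "omics", "social", "ehr"]
y_pred = ["ehr", "omics", "omics", "social", "ehr"]

score = micro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, accuracy)
```

When every abstract receives exactly one label, as in this sketch, each misclassification adds one false positive and one false negative, so the micro F1 coincides with accuracy. The two numbers diverge only in multi-label settings or when some classes are excluded from the average.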
Affiliation(s)
- Safoora Masoumi
- Pediatric Infectious Diseases Research Center, Mazandaran University of Medical Sciences, Sari, Iran.
- Hossein Amirkhani
- Computer and Information Technology Department, University of Qom, Qom, Iran
- Najmeh Sadeghian
- Student Research Committee, Mazandaran University of Medical Sciences, Sari, Iran
- Saeid Shahraz
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
7
Zirkle J, Han X, Racz R, Samieegohar M, Chaturbedi A, Mann J, Chakravartula S, Li Z. Deep learning-enabled natural language processing to identify directional pharmacokinetic drug-drug interactions. BMC Bioinformatics 2023; 24:413. [PMID: 37914988 PMCID: PMC10619324 DOI: 10.1186/s12859-023-05520-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 10/04/2023] [Indexed: 11/03/2023] Open
Abstract
BACKGROUND During drug development, it is essential to gather information about the change of clinical exposure of a drug (object) due to the pharmacokinetic (PK) drug-drug interactions (DDIs) with another drug (precipitant). While many natural language processing (NLP) methods for DDI have been published, most were designed to evaluate if (and what kind of) DDI relationships exist in the text, without identifying the direction of DDI (object vs. precipitant drug). Here we present a method for the automatic identification of the directionality of a PK DDI from literature or drug labels. METHODS We reannotated the Text Analysis Conference (TAC) DDI track 2019 corpus for identifying the direction of a PK DDI and evaluated the performance of a fine-tuned BioBERT model on this task by following the training and validation steps prespecified by TAC. RESULTS This initial attempt showed the model achieved an F-score of 0.82 in identifying sentences as containing PK DDI and an F-score of 0.97 in identifying object versus precipitant drugs in those sentences. DISCUSSION AND CONCLUSION Despite a growing list of NLP methods for DDI extraction, most of them use a common set of corpora to perform general purpose tasks (e.g., classifying a sentence into one of several fixed DDI categories). There is a lack of coordination between the drug development and biomedical informatics method development community to develop corpora and methods to perform specific tasks (e.g., extract clinical exposure changes due to PK DDI). We hope that our effort can encourage such a coordination so that more "fit for purpose" NLP methods could be developed and used to facilitate the drug development process.
Affiliation(s)
- Joel Zirkle
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Xiaomei Han
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Rebecca Racz
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Mohammadreza Samieegohar
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Anik Chaturbedi
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- John Mann
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Shilpa Chakravartula
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
- Zhihua Li
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA.
8
Lv Q, Zhou J, Yang Z, He H, Chen CYC. 3D graph neural network with few-shot learning for predicting drug-drug interactions in scaffold-based cold start scenario. Neural Netw 2023; 165:94-105. [PMID: 37276813 DOI: 10.1016/j.neunet.2023.05.039] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/07/2023] [Revised: 05/15/2023] [Accepted: 05/19/2023] [Indexed: 06/07/2023]
Abstract
Understanding drug-drug interactions (DDI) of new drugs is critical for minimizing unexpected adverse drug reactions. The modeling of new drugs is called a cold start scenario. In this scenario, only limited structural or physicochemical information about the new drug is available. The 3D conformation of drug molecules usually plays a more crucial role in chemical properties than the 2D structure. A 3D graph network with few-shot learning is a promising solution. However, the 3D heterogeneity of drug molecules and the discretization of atomic distributions lead to spatial confusion in few-shot learning. Here, we propose a 3D graph neural network with few-shot learning, Meta3D-DDI, to predict DDI events in the cold start scenario. The 3DGNN ensures rotation and translation invariance by calculating atomic pairwise distances, and incorporates 3D structure and distance information in the information aggregation stage. The continuous filter interaction module can continuously simulate the filter to obtain the interaction between the target atom and other atoms. Meta3D-DDI further develops a few-shot learning (FSL) strategy based on bilevel optimization to transfer meta-knowledge for DDI prediction tasks from existing drugs to new drugs. In addition, the existing cold start setting may cause the scaffold structure information in the training set to leak into the test set. We design a scaffold-based cold start scenario to ensure that the drug scaffolds in the training set and test set do not overlap. Extensive experiments demonstrate that our architecture achieves state-of-the-art (SOTA) performance for DDI prediction under the scaffold-based cold start scenario on two real-world datasets. The visual experiment shows that Meta3D-DDI significantly improves the learning for DDI prediction of new drugs. We also demonstrate how Meta3D-DDI can reduce the amount of data required to make meaningful DDI predictions.
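The abstract above states that rotation and translation invariance is obtained by working with atomic pairwise distances. The snippet below is a small numerical check of that general property, not the Meta3D-DDI implementation. The three atom coordinates, the rotation angle, the shift vector, and the helper names are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def rotate_z_then_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Hypothetical coordinates (in angstroms) of three atoms in one conformer.
atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.6, 1.2, 0.3)]
moved = rotate_z_then_translate(atoms, angle=0.7, shift=(2.0, -1.0, 5.0))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
max_diff = max(
    abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

The raw coordinates change under the rigid motion while the distance matrix agrees up to floating-point error, which is why a network that only consumes distances gives the same output for any orientation or position of the molecule.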
Affiliation(s)
- Qiujie Lv
- School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Jun Zhou
- School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Ziduo Yang
- School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Haohuai He
- School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Calvin Yu-Chian Chen
- School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China; Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan.
9
Zhou X, Fu Q, Chen J, Liu L, Wang Y, Lu Y, Wu H. Extracting biomedical relation from cross-sentence text using syntactic dependency graph attention network. J Biomed Inform 2023; 144:104445. [PMID: 37467835 DOI: 10.1016/j.jbi.2023.104445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 06/06/2023] [Accepted: 07/15/2023] [Indexed: 07/21/2023]
Abstract
In biomedical literature, cross-sentence texts can usually express rich knowledge, and extracting the interaction relation between entities from cross-sentence texts is of great significance to biomedical research. However, compared with a single sentence, cross-sentence text has a longer sequence length, so research on cross-sentence information extraction should focus more on learning the structural information of context dependencies. It is still a challenge to handle global dependencies and structural information of long sequences effectively, and graph-oriented modeling methods have received increasing attention recently. In this paper, we propose a new graph attention network guided by syntactic dependency relationships (SR-GAT) for extracting biomedical relations from cross-sentence text. It allows each node to pay attention to other nodes in its neighborhood, regardless of the sequence length. The attention weight between nodes is given by a syntactic relation graph probability network (SR-GPR), which encodes the syntactic dependency between nodes and guides the graph attention mechanism to learn information about the dependency structure. The learned feature representation retains information about the node-to-node syntactic dependency and can further discover global dependencies effectively. Experimental results on a publicly available biomedical dataset demonstrate that our method achieves state-of-the-art performance while requiring significantly fewer computational resources. Specifically, in the "drug-mutation" relation extraction task, our method achieves an accuracy of 93.78% for binary classification and 92.14% for multi-classification. In the "drug-gene-mutation" relation extraction task, our method achieves an accuracy of 93.22% for binary classification and 92.28% for multi-classification. Across all relation extraction tasks, our method improves accuracy by an average of 0.49% compared to the existing best model. Furthermore, our method achieved an accuracy of 69.5% in text classification, surpassing most existing models, demonstrating its robustness in generalization across different domains without additional fine-tuning.
Affiliation(s)
- Xueyang Zhou
- Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
- Qiming Fu
- Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China.
- Jianping Chen
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China; Architecture and Urban Planning, Suzhou University of Science and Technology, Suzhou 215009, China; Chongqing Industrial Big Data Innovation Center Co., Ltd., Chongqing 4007071, China.
- Lanhui Liu
- Chongqing Industrial Big Data Innovation Center Co., Ltd., Chongqing 4007071, China
- Yunzhe Wang
- Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
- You Lu
- Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
- Hongjie Wu
- Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
10
Oh JH, Tannenbaum A, Deasy JO. Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods. Front Genet 2023; 14:1161047. [PMID: 37529777 PMCID: PMC10390074 DOI: 10.3389/fgene.2023.1161047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 06/29/2023] [Indexed: 08/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can potentially lead to life-threatening liver failure. Previously published work in the scientific literature on DILI has provided valuable insights for the understanding of hepatotoxicity as well as drug development. However, the manual search of scientific literature in PubMed is laborious and time-consuming. Natural language processing (NLP) techniques along with artificial intelligence/machine learning approaches may allow for automatic processing in identifying DILI-related literature, but useful methods are yet to be demonstrated. To address this issue, we have developed an integrated NLP/machine learning classification model to identify DILI-related literature using only paper titles and abstracts. For prediction modeling, we used 14,203 publications provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, employing word vectorization techniques in NLP in conjunction with machine learning methods. Classification modeling was performed using 2/3 of the data for training and the remainder for test in internal validation. The best performance was achieved using a linear support vector machine (SVM) model on the combined vectors derived from term frequency-inverse document frequency (TF-IDF) and Word2Vec, resulting in an accuracy of 95.0% and an F1-score of 95.0%. The final SVM model constructed from all 14,203 publications was tested on independent datasets, resulting in accuracies of 92.5%, 96.3%, and 98.3%, and F1-scores of 93.5%, 86.1%, and 75.6% for three test sets (T1-T3). Furthermore, the SVM model was tested on four external validation sets (V1-V4), resulting in accuracies of 92.0%, 96.2%, 98.3%, and 93.1%, and F1-scores of 92.4%, 82.9%, 75.0%, and 93.3%.
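The abstract above builds its features partly from term frequency-inverse document frequency (TF-IDF) vectors computed on titles and abstracts. The snippet below is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by the logarithm of N over document frequency). The exact weighting used in the study is not stated in the abstract, so this formula is an assumption, and the three example titles are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) using tf = count / length and idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            [(counts[t] / len(tokens)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical paper titles; only the first one concerns liver injury itself.
docs = [
    "drug induced liver injury case report",
    "liver transplant outcome study",
    "drug dosing in liver disease",
]
vocab, vectors = tfidf(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term}: {weight:.3f}")
```

A term that occurs in every document, such as "liver" here, receives zero weight, while a term unique to one document, such as "injury", receives the largest weight for its frequency. Vectors of this kind could then be concatenated with averaged word embeddings and passed to a linear classifier, which is the general shape of the pipeline the abstract describes.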
Affiliation(s)
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Allen Tannenbaum
- Department of Computer Science, Stony Brook University, Stony Brook, NY, United States
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
| | - Joseph O. Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| |
Collapse
|
11
|
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing, however, its development has been limited by well-annotated datasets and interpretability. To solve this, researchers have considered combining domain knowledge (such as biomedical knowledge graph) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and following evidence-based medicine. This paper comprehensively reviews more than 150 recent literature studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. We eventually discuss various challenges and future directions.
Affiliation(s)
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Wenjuan Liu
- Aerospace Center Hospital, 100049 Beijing, China
- Haijun Niu
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Zhenchang Wang
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China.
|
12
|
Molina M, Jiménez C, Montenegro C. Improving Drug-Drug Interaction Extraction with Gaussian Noise. Pharmaceutics 2023; 15:1823. [PMID: 37514010 PMCID: PMC10385013 DOI: 10.3390/pharmaceutics15071823] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 04/28/2023] [Accepted: 06/12/2023] [Indexed: 07/30/2023] Open
Abstract
Drug-Drug Interactions (DDIs) produce essential and valuable insights for healthcare professionals, since they provide data on the impact of concurrent administration of medications to patients during therapy. In that sense, some relevant works, related to the DDIExtraction2013 Challenge, are available in the current technical literature. This study aims to improve previous results, using two models, where a Gaussian noise layer is added to achieve better DDI relationship extraction. (1) A Piecewise Convolutional Neural Network (PW-CNN) model is used to capture relationships among pharmacological entities described in biomedical databases. Additionally, the model incorporates multichannel words to enrich a person's vocabulary and reduce unfamiliar words. (2) The model uses the pre-trained BERT language model to classify relationships, while also integrating data from the target entities. After identifying the target entities, the model transfers the relevant information through the pre-trained architecture and integrates the encoded data for both entities. The results of the experiment show an improved performance, with respect to previous models.
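The abstract does not say where the Gaussian noise layer sits or how it is parameterized. As a rough illustration of the general idea only, the sketch below adds zero-mean Gaussian noise to an embedding vector during training and leaves it unchanged at inference; the function name, the `stddev` value, and the toy vector are all assumptions, not details from the paper.

```python
import random


def gaussian_noise(vector, stddev, training, rng=random):
    """Return a copy of `vector` with zero-mean Gaussian noise added.

    Noise is applied only when `training` is True, which is the usual
    convention for noise-based regularization layers.
    """
    if not training or stddev == 0:
        return list(vector)
    return [x + rng.gauss(0.0, stddev) for x in vector]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
embedding = [0.5, -1.2, 0.3]    # hypothetical token embedding

noisy = gaussian_noise(embedding, stddev=0.1, training=True, rng=rng)
clean = gaussian_noise(embedding, stddev=0.1, training=False, rng=rng)

print(noisy)   # perturbed copy, used while training
print(clean)   # identical to the input, used at inference
```

Perturbing the inputs this way acts as a regularizer: the classifier sees slightly different representations of the same sentence on each pass and is pushed to rely less on exact values.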
Affiliation(s)
- Marco Molina
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
- Cristina Jiménez
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
- Carlos Montenegro
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
|
13
|
Deng H, Li Q, Liu Y, Zhu J. MTMG: A multi-task model with multi-granularity information for drug-drug interaction extraction. Heliyon 2023; 9:e16819. [PMID: 37484258 PMCID: PMC10360954 DOI: 10.1016/j.heliyon.2023.e16819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 07/25/2023] Open
Abstract
Drug-drug interactions (DDIs) extraction includes identifying drug entities and interactions between drug pairs from the biomedical corpus. The discovery of potential DDIs aids in our understanding of the mechanisms underlying adverse reactions or combination therapy to improve patient safety. The manual extraction of DDIs is very time-consuming and expensive; therefore, computer-aided extraction of DDIs is vital. Many neural network-based methods have been proposed and achieved good efficiency in the extraction of DDIs over the years. However, most studies improved the performance of DDIs extraction with various external drug features while directly using golden drug entities, leading to error propagation and low universality in practical application. In this paper, we propose a new multi-task framework called MTMG, which changes DDIs extraction from a sentence-level classification task to a sequence labeling task named Drug-Specified Token Classification (DSTC). The proposed approach, MTMG, jointly trains DSTC with drug named entity recognition (DNER) and two sentence-level auxiliary tasks we designed. We aim to improve the performance of the entire DDIs extraction pipeline by better using the correlation between entities and relationships and, to the extent possible, using the information of varying granularity implied in the dataset. Experimental results show that MTMG can both improve the accuracy of DNER and DDIs extraction and outperforms state-of-the-art technique.
|
14
|
Kim S, Yoon J, Kwon O. Biomedical Relation Extraction Using Dependency Graph and Decoder-Enhanced Transformer Model. Bioengineering (Basel) 2023; 10:bioengineering10050586. [PMID: 37237656 DOI: 10.3390/bioengineering10050586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 05/06/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
The identification of drug-drug and chemical-protein interactions is essential for understanding unpredictable changes in the pharmacological effects of drugs and mechanisms of diseases and developing therapeutic drugs. In this study, we extract drug-related interactions from the DDI (Drug-Drug Interaction) Extraction-2013 Shared Task dataset and the BioCreative ChemProt (Chemical-Protein) dataset using various transfer transformers. We propose BERTGAT that uses a graph attention network (GAT) to take into account the local structure of sentences and embedding features of nodes under the self-attention scheme and investigate whether incorporating syntactic structure can help relation extraction. In addition, we suggest T5slim_dec, which adapts the autoregressive generation task of the T5 (text-to-text transfer transformer) to the relation classification problem by removing the self-attention layer in the decoder block. Furthermore, we evaluated the potential of biomedical relation extraction of GPT-3 (Generative Pre-trained Transformer) using GPT-3 variant models. As a result, T5slim_dec, which is a model with a tailored decoder designed for classification problems within the T5 architecture, demonstrated very promising performances for both tasks. We achieved an accuracy of 91.15% in the DDI dataset and an accuracy of 94.29% for the CPR (Chemical-Protein Relation) class group in ChemProt dataset. However, BERTGAT did not show a significant performance improvement in the aspect of relation extraction. We demonstrated that transformer-based approaches focused only on relationships between words are implicitly eligible to understand language well without additional knowledge such as structural information.
Affiliation(s)
- Seonho Kim
- Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea
- Juntae Yoon
- VAIV Company, Seoul 04107, Republic of Korea
- Ohyoung Kwon
- Department of Future Technology, Korea University of Technology and Education, Cheonan-si 31253, Republic of Korea
|
15
|
EMSI-BERT: Asymmetrical Entity-Mask Strategy and Symbol-Insert Structure for Drug–Drug Interaction Extraction Based on BERT. Symmetry (Basel) 2023. [DOI: 10.3390/sym15020398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
Abstract
Drug-drug interaction (DDI) extraction has seen growing usage of deep models, but their effectiveness has been restrained by limited domain-labeled data, a weak representation of co-occurring entities, and poor adaptation of downstream tasks. This paper proposes a novel EMSI-BERT method for drug–drug interaction extraction based on an asymmetrical Entity-Mask strategy and a Symbol-Insert structure. Firstly, the EMSI-BERT method utilizes the asymmetrical Entity-Mask strategy to address the weak representation of co-occurring entity information using the drug entity dictionary in the pre-training BERT task. Secondly, the EMSI-BERT method incorporates four symbols to distinguish different entity combinations of the same input sequence and utilizes the Symbol-Insert structure to address the weak adaptation of downstream tasks in the fine-tuning stage of DDI classification. The experimental results showed that EMSI-BERT for DDI extraction achieved a 0.82 F1-score on DDI-Extraction 2013, and it improved the performances of the multi-classification task of DDI extraction and the two-classification task of DDI detection. Compared with baseline Basic-BERT, the proposed pre-training BERT with the asymmetrical Entity-Mask strategy could obtain better effects in downstream tasks and effectively limit “Other” samples’ effects. The model visualization results illustrated that EMSI-BERT could extract semantic information at different levels and granularities in a continuous space.
|
16
|
Raza S, Schwartz B. Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach. BMC Med Inform Decis Mak 2023; 23:20. [PMID: 36703154 PMCID: PMC9879259 DOI: 10.1186/s12911-023-02117-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 01/20/2023] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. OBJECTIVE This study aims to use natural language processing (NLP) to extract the key information (clinical factors, social determinants of health) from published cases in the literature. METHODS The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic-named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports. RESULTS The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in terms of accuracy (by 1-8% better). A thorough examination reveals the disease's presence and symptoms prevalence in patients. CONCLUSIONS A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.
Affiliation(s)
- Shaina Raza
- grid.415400.4, 0000 0001 1505 2354, Public Health Ontario (PHO), Toronto, ON, Canada; grid.17063.33, 0000 0001 2157 2938, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Brian Schwartz
- grid.415400.4, 0000 0001 1505 2354, Public Health Ontario (PHO), Toronto, ON, Canada; grid.17063.33, 0000 0001 2157 2938, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
|
17
|
Yang J, Ding Y, Long S, Poon J, Han SC. DDI-MuG: Multi-aspect graphs for drug-drug interaction extraction. Front Digit Health 2023; 5:1154133. [PMID: 37168529 PMCID: PMC10164961 DOI: 10.3389/fdgth.2023.1154133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 04/03/2023] [Indexed: 05/13/2023] Open
Abstract
Introduction: Drug-drug interaction (DDI) may lead to adverse reactions in patients, thus it is important to extract such knowledge from biomedical texts. However, previously proposed approaches typically focus on capturing sentence-aspect information while ignoring valuable knowledge concerning the whole corpus. In this paper, we propose a Multi-aspect Graph-based DDI extraction model, named DDI-MuG. Methods: We first employ a bio-specific pre-trained language model to obtain the token contextualized representations. Then we use two graphs to get syntactic information from the input instance and word co-occurrence information within the entire corpus, respectively. Finally, we combine the representations of drug entities and verb tokens for the final classification. Results: To validate the effectiveness of the proposed model, we perform extensive experiments on two widely used DDI extraction datasets, DDIExtraction-2013 and TAC 2018. It is encouraging to see that our model outperforms all twelve state-of-the-art models. Discussion: In contrast to the majority of earlier models that rely on the black-box approach, our model enables visualization of crucial words and their interrelationships by utilizing edge information from two graphs. To the best of our knowledge, this is the first model that applies multi-aspect graphs to the DDI extraction task, and we hope it can establish a foundation for more robust multi-aspect works in the future.
Affiliation(s)
- Jie Yang
- School of Computer Science, The University of Sydney, Sydney, NSW, Australia
- Yihao Ding
- School of Computer Science, The University of Sydney, Sydney, NSW, Australia
- Siqu Long
- School of Computer Science, The University of Sydney, Sydney, NSW, Australia
- Josiah Poon
- School of Computer Science, The University of Sydney, Sydney, NSW, Australia
- Soyeon Caren Han
- School of Computer Science, The University of Sydney, Sydney, NSW, Australia
- Department of Computer Science, University of Western Australia, Perth, WA, Australia
- Correspondence: Soyeon Caren Han
|
18
|
Shtar G, Greenstein-Messica A, Mazuz E, Rokach L, Shapira B. Predicting drug characteristics using biomedical text embedding. BMC Bioinformatics 2022; 23:526. [PMID: 36476573 PMCID: PMC9730627 DOI: 10.1186/s12859-022-05083-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Drug-drug interactions (DDIs) are preventable causes of medical injuries and often result in doctor and emergency room visits. Previous research demonstrates the effectiveness of using matrix completion approaches based on known drug interactions to predict unknown Drug-drug interactions. However, in the case of a new drug, where there is limited or no knowledge regarding the drug's existing interactions, such an approach is unsuitable, and other drug's preferences can be used to accurately predict new Drug-drug interactions. METHODS We propose adjacency biomedical text embedding (ABTE) to address this limitation by using a hybrid approach which combines known drugs' interactions and the drug's biomedical text embeddings to predict the DDIs of both new and well known drugs. RESULTS Our evaluation demonstrates the superiority of this approach compared to recently published DDI prediction models and matrix factorization-based approaches. Furthermore, we compared the use of different text embedding methods in ABTE, and found that the concept embedding approach, which involves biomedical information in the embedding process, provides the highest performance for this task. Additionally, we demonstrate the effectiveness of leveraging biomedical text embedding for additional drugs' biomedical prediction task by presenting text embedding's contribution to a multi-modal pregnancy drug safety classification. CONCLUSION Text and concept embeddings created by analyzing a domain-specific large-scale biomedical corpora can be used for predicting drug-related properties such as Drug-drug interactions and drug safety prediction. Prediction models based on the embeddings resulted in comparable results to hand-crafted features, however text embeddings do not require manual categorization or data collection and rely solely on the published literature.
Affiliation(s)
- Guy Shtar
- grid.7489.2, 0000 0004 1937 0511, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Asnat Greenstein-Messica
- grid.7489.2, 0000 0004 1937 0511, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Eyal Mazuz
- grid.7489.2, 0000 0004 1937 0511, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Lior Rokach
- grid.7489.2, 0000 0004 1937 0511, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Bracha Shapira
- grid.7489.2, 0000 0004 1937 0511, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
19
|
Ren ZH, You ZH, Yu CQ, Li LP, Guan YJ, Guo LX, Pan J. A biomedical knowledge graph-based method for drug-drug interactions prediction through combining local and global features with deep neural networks. Brief Bioinform 2022; 23:6692550. [PMID: 36070624 DOI: 10.1093/bib/bbac363] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/31/2022] [Revised: 07/23/2022] [Accepted: 08/02/2022] [Indexed: 11/12/2022] Open
Abstract
Drug-drug interactions (DDIs) prediction is a challenging task in drug development and clinical application. Due to the extremely large complete set of all possible DDIs, computer-aided DDIs prediction methods are getting lots of attention in the pharmaceutical industry and academia. However, most existing computational methods only use single perspective information and few of them conduct the task based on the biomedical knowledge graph (BKG), which can provide more detailed and comprehensive drug lateral side information flow. To this end, a deep learning framework, namely DeepLGF, is proposed to fully exploit BKG fusing local-global information to improve the performance of DDIs prediction. More specifically, DeepLGF first obtains chemical local information on drug sequence semantics through a natural language processing algorithm. Then a model of BFGNN based on graph neural network is proposed to extract biological local information on drug through learning embedding vector from different biological functional spaces. The global feature information is extracted from the BKG by our knowledge graph embedding method. In DeepLGF, for fusing local-global features well, we designed four aggregating methods to explore the most suitable ones. Finally, the advanced fusing feature vectors are fed into deep neural network to train and predict. To evaluate the prediction performance of DeepLGF, we tested our method in three prediction tasks and compared it with state-of-the-art models. In addition, case studies of three cancer-related and COVID-19-related drugs further demonstrated DeepLGF's superior ability for potential DDIs prediction. The webserver of the DeepLGF predictor is freely available at http://120.77.11.78/DeepLGF/.
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi'an 710100, China; School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi'an 710100, China
- Li-Ping Li
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi 830052, China
- Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi'an 710100, China
- Lu-Xiang Guo
- School of Information Engineering, Xijing University, Xi'an 710100, China
- Jie Pan
- School of Information Engineering, Xijing University, Xi'an 710100, China
|
20
|
Chen J, Sun X, Jin X, Sutcliffe R. Extracting drug-drug interactions from no-blinding texts using key semantic sentences and GHM loss. J Biomed Inform 2022; 135:104192. [PMID: 36064114 DOI: 10.1016/j.jbi.2022.104192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 08/28/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022]
Abstract
The extraction of drug-drug interactions (DDIs) is an important task in the field of biomedical research, which can reduce unexpected health risks during patient treatment. Previous work indicates that methods using external drug information have a much higher performance than those methods not using it. However, the use of external drug information is time-consuming and resource-costly. In this work, we propose a novel method for extracting DDIs which does not use external drug information, but still achieves comparable performance. First, we no longer convert the drug name to standard tokens such as DRUG0, the method commonly used in previous research. Instead, full drug names with drug entity marking are input to BioBERT, allowing us to enhance the selected drug entity pair. Second, we adopt the Key Semantic Sentence approach to emphasize the words closely related to the DDI relation of the selected drug pair. After the above steps, the misclassification of similar instances which are created from the same sentence but corresponding to different pairs of drug entities can be significantly reduced. Then, we employ the Gradient Harmonizing Mechanism (GHM) loss to reduce the weight of mislabeled instances and easy-to-classify instances, both of which can lead to poor performance in DDI extraction. Overall, we demonstrate in this work that it is better not to use drug blinding with BioBERT, and show that GHM performs better than Cross-Entropy loss if the proportion of label noise is less than 30%. The proposed model achieves state-of-the-art results with an F1-score of 84.13% on the DDIExtraction 2013 corpus (a standard English DDI corpus), which fills the performance gap (4%) between methods that rely on and do not rely on external drug information.
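The contrast the abstract draws between drug blinding (replacing names with tokens such as DRUG0) and keeping full names with entity marking can be sketched as plain token manipulation. This is only an illustration: the toy sentence, the token-index spans, and the marker strings `<e1>`, `</e1>`, `<e2>`, `</e2>` are assumptions, not the authors' actual preprocessing.

```python
def blind_drugs(tokens, drug_spans):
    """Drug blinding: replace each drug mention with DRUG0, DRUG1, ..."""
    starts = {start: (idx, end) for idx, (start, end) in enumerate(drug_spans)}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            idx, end = starts[i]
            out.append(f"DRUG{idx}")
            i = end                      # skip the rest of the mention
        else:
            out.append(tokens[i])
            i += 1
    return out


def mark_pair(tokens, span_a, span_b):
    """Entity marking: keep full names, wrap only the selected pair."""
    out = []
    for i, tok in enumerate(tokens):
        if i == span_a[0]:
            out.append("<e1>")
        if i == span_b[0]:
            out.append("<e2>")
        out.append(tok)
        if i == span_a[1] - 1:
            out.append("</e1>")
        if i == span_b[1] - 1:
            out.append("</e2>")
    return out


# Toy sentence; spans are (start, end) token indices, end exclusive.
tokens = "Aspirin may increase the effect of warfarin and heparin .".split()
spans = [(0, 1), (6, 7), (8, 9)]

print(" ".join(blind_drugs(tokens, spans)))
print(" ".join(mark_pair(tokens, spans[0], spans[1])))
```

With blinding, every instance built from the same sentence looks almost identical apart from which placeholders are selected; with marking, the model still sees the real drug names and the markers single out the pair being classified.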
Affiliation(s)
- Jiacheng Chen
- School of Information Science and Technology, Northwest University, Xi'an, 710127, China
- Xia Sun
- School of Information Science and Technology, Northwest University, Xi'an, 710127, China.
- Xin Jin
- School of Information Science and Technology, Northwest University, Xi'an, 710127, China
- Richard Sutcliffe
- School of Information Science and Technology, Northwest University, Xi'an, 710127, China; School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK.
|
21
|
Nabożny A, Balcerzak B, Morzy M, Wierzbicki A, Savov P, Warpechowski K. Improving medical experts' efficiency of misinformation detection: an exploratory study. WORLD WIDE WEB 2022; 26:773-798. [PMID: 35975112 PMCID: PMC9371952 DOI: 10.1007/s11280-022-01084-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 05/03/2022] [Accepted: 07/04/2022] [Indexed: 06/15/2023]
Abstract
Fighting medical disinformation in the era of the pandemic is an increasingly important problem. Today, automatic systems for assessing the credibility of medical information do not offer sufficient precision, so human supervision and the involvement of medical expert annotators are required. Our work aims to optimize the utilization of medical experts' time. We also equip them with tools for semi-automatic initial verification of the credibility of the annotated content. We introduce a general framework for filtering medical statements that do not require manual evaluation by medical experts, thus focusing annotation efforts on non-credible medical statements. Our framework is based on the construction of filtering classifiers adapted to narrow thematic categories. This allows medical experts to fact-check and identify over two times more non-credible medical statements in a given time interval without applying any changes to the annotation flow. We verify our results across a broad spectrum of medical topic areas. We perform quantitative, as well as exploratory analysis on our output data. We also point out how those filtering classifiers can be modified to provide experts with different types of feedback without any loss of performance.
Affiliation(s)
- Mikołaj Morzy
- Polish-Japanese Academy of Information Technology, Warsaw, Poland
- Poznań University of Technology, Poznań, Poland
- Adam Wierzbicki
- Polish-Japanese Academy of Information Technology, Warsaw, Poland
- Pavel Savov
- Polish-Japanese Academy of Information Technology, Warsaw, Poland
|
22
|
Call for papers: Semantics-enabled biomedical literature analytics. J Biomed Inform 2022; 132:104134. [PMID: 35850379 DOI: 10.1016/j.jbi.2022.104134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 07/04/2022] [Indexed: 11/20/2022]
|
23
|
Gated tree-structured RecurNN for detecting biomedical event trigger. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
24
|
Ren ZH, Yu CQ, Li LP, You ZH, Pan J, Guan YJ, Guo LX. BioChemDDI: Predicting Drug-Drug Interactions by Fusing Biochemical and Structural Information through a Self-Attention Mechanism. BIOLOGY 2022; 11:biology11050758. [PMID: 35625486 PMCID: PMC9138786 DOI: 10.3390/biology11050758] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/12/2022] [Revised: 05/12/2022] [Accepted: 05/13/2022] [Indexed: 01/13/2023]
Abstract
Simple Summary: Throughout history, combining drugs has been a common method in the fight against complex diseases. However, potential drug–drug interactions could give rise to unknown toxicity issues, which requires the urgent proposal of efficient methods to identify potential interactions. We use computer technology and machine learning techniques to propose a novel computational framework to calculate scores of drug–drug interaction probability for simplifying the screening process. Additionally, we built an online prescreening tool for biological researchers to further verify possible interactions in the fields of biomedicine and pharmacology. Overall, our study can provide new insights and approaches for rapidly identifying potential drug–drug interactions.
Abstract: During the development of drug and clinical applications, due to the co-administration of different drugs that have a high risk of interfering with each other’s mechanisms of action, correctly identifying potential drug–drug interactions (DDIs) is important to avoid a reduction in drug therapeutic activities and serious injuries to the organism. Therefore, to explore potential DDIs, we develop a computational method of integrating multi-level information. Firstly, the information of chemical sequence is fully captured by the Natural Language Processing (NLP) algorithm, and multiple biological function similarity information is fused by Similarity Network Fusion (SNF). Secondly, we extract deep network structure information through Hierarchical Representation Learning for Networks (HARP). Then, a highly representative comprehensive feature descriptor is constructed through the self-attention module that efficiently integrates biochemical and network features. Finally, a deep neural network (DNN) is employed to generate the prediction results. Contrasted with the previous supervision model, BioChemDDI innovatively introduced graph collapse for extracting a network structure and utilized the biochemical information during the pre-training process. The prediction results of the benchmark dataset indicate that BioChemDDI outperforms other existing models. Moreover, the case studies related to three cancer diseases, including breast cancer, hepatocellular carcinoma and malignancies, were analyzed using BioChemDDI. As a result, 24, 18 and 20 out of the top 30 predicted cancer-related drugs were confirmed by the databases. These experimental results demonstrate that BioChemDDI is a useful model to predict DDIs and can provide reliable candidates for biological experiments. The web server of BioChemDDI predictor is freely available to conduct further studies.
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an 710123, China; (Z.-H.R.); (Y.-J.G.); (L.-X.G.); (J.P.)
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an 710123, China; (Z.-H.R.); (Y.-J.G.); (L.-X.G.); (J.P.)
- Correspondence: (C.-Q.Y.); (L.-P.L.); Tel.: +86-189-9118-5758 (C.-Q.Y.); +86-173-9276-3836 (L.-P.L.)
- Li-Ping Li
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi 830052, China
- Correspondence: (C.-Q.Y.); (L.-P.L.); Tel.: +86-189-9118-5758 (C.-Q.Y.); +86-173-9276-3836 (L.-P.L.)
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China;
- Jie Pan
- School of Information Engineering, Xijing University, Xi’an 710123, China; (Z.-H.R.); (Y.-J.G.); (L.-X.G.); (J.P.)
- Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an 710123, China; (Z.-H.R.); (Y.-J.G.); (L.-X.G.); (J.P.)
- Lu-Xiang Guo
- School of Information Engineering, Xijing University, Xi’an 710123, China; (Z.-H.R.); (Y.-J.G.); (L.-X.G.); (J.P.)
|
25
He H, Chen G, Yu-Chian Chen C. 3DGT-DDI: 3D graph and text based neural network for drug-drug interaction prediction. Brief Bioinform 2022; 23:6576451. [PMID: 35511112 DOI: 10.1093/bib/bbac134] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/10/2022] [Revised: 03/16/2022] [Accepted: 03/21/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Drug-drug interactions (DDIs) occur during the combination of drugs. Identifying potential DDI helps us to study the mechanism behind the combination medication or adverse reactions so as to avoid the side effects. Although many artificial intelligence methods predict and mine potential DDI, they ignore the 3D structure information of drug molecules and do not fully consider the contribution of molecular substructure in DDI. RESULTS We proposed a new deep learning architecture, 3DGT-DDI, a model composed of a 3D graph neural network and pre-trained text attention mechanism. We used 3D molecular graph structure and position information to enhance the prediction ability of the model for DDI, which enabled us to deeply explore the effect of drug substructure on DDI relationship. The results showed that 3DGT-DDI outperforms other state-of-the-art baselines. It achieved an 84.48% macro F1 score in the DDIExtraction 2013 shared task dataset. Also, our 3D graph model proves its performance and explainability through weight visualization on the DrugBank dataset. 3DGT-DDI can help us better understand and identify potential DDI, thereby helping to avoid the side effects of drug mixing. AVAILABILITY The source code and data are available at https://github.com/hehh77/3DGT-DDI.
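The headline number above is a macro F1 score, which averages per-class F1 so that rare interaction types count as much as common ones. The following is a minimal sketch of the metric in plain Python; the label names and the toy predictions are hypothetical and are not taken from the paper or from its evaluation script.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for illustration only.
labels = ["mechanism", "effect", "advise", "int"]
y_true = ["effect", "effect", "mechanism", "advise", "int", "mechanism"]
y_pred = ["effect", "mechanism", "mechanism", "advise", "int", "int"]
score = macro_f1(y_true, y_pred, labels)
print(f"macro F1 = {score:.4f}")
```

Because every class contributes equally, a model that does well only on the most frequent class is penalised more heavily than it would be under micro-averaging.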
Affiliation(s)
- Haohuai He
  - School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Guanxing Chen
  - School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Calvin Yu-Chian Chen
  - School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
26
Vo TH, Nguyen NTK, Kha QH, Le NQK. On the road to explainable AI in drug-drug interactions prediction: a systematic review. Comput Struct Biotechnol J 2022; 20:2112-2123. [PMID: 35832629 PMCID: PMC9092071 DOI: 10.1016/j.csbj.2022.04.021] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 03/12/2022] [Revised: 04/15/2022] [Accepted: 04/15/2022] [Indexed: 12/26/2022]
Abstract
A systematic review on applications of explainable AI in drug-drug interaction prediction. Review is conducted on a comprehensive set of 94 papers from five prestigious databases. Discussions on the promises and challenges of explainable AI algorithms for drug-drug interaction prediction.
Over the past decade, polypharmacy has been common in the treatment of multiple diseases. However, unwanted drug-drug interactions (DDIs) that might cause unexpected adverse drug events (ADEs) in multiple-regimen therapy remain a significant issue. Since artificial intelligence (AI) is ubiquitous today, many AI prediction models have been developed to predict DDIs to support clinicians in pharmacotherapy-related decisions. However, even though DDI prediction models have great potential for assisting physicians in polypharmacy decisions, there are still concerns regarding the reliability of AI models due to their black-box nature. Building AI models with explainable mechanisms can augment their transparency to address the above issue. Explainable AI (XAI) promotes safety and clarity by showing how decisions are made in AI models, especially in critical tasks like DDI prediction. This review provides a comprehensive overview of AI-based DDI prediction, including the publicly available data sources for AI-DDI studies, the methods used in data manipulation and feature preprocessing, the modeling methods, and the XAI mechanisms that promote trust in AI. Limitations and future directions of XAI in DDI prediction are also discussed.
Affiliation(s)
- Thanh Hoa Vo
  - Master Program in Clinical Genomics and Proteomics, College of Pharmacy, Taipei Medical University, Taipei 110, Taiwan
- Ngan Thi Kim Nguyen
  - School of Nutrition and Health Sciences, College of Nutrition, Taipei Medical University, Taipei 11031, Taiwan
- Quang Hien Kha
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
  - Corresponding author at: Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan.
27
Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med 2022; 28:31-38. [PMID: 35058619 DOI: 10.1038/s41591-021-01614-0] [Citation(s) in RCA: 519] [Impact Index Per Article: 259.5] [Received: 07/23/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023]
Abstract
Artificial intelligence (AI) is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share key developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human-AI collaboration. Finally, we consider serious technical and ethical challenges in issues spanning from data scarcity to racial bias. As these challenges are addressed, AI's potential may be realized, making healthcare more accurate, efficient and accessible for patients worldwide.
Affiliation(s)
- Pranav Rajpurkar
  - Department of Biomedical Informatics, Harvard University, Cambridge, MA, USA
- Emma Chen
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Oishi Banerjee
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Eric J Topol
  - Scripps Translational Science Institute, San Diego, CA, USA.
28
Huang L, Lin J, Li X, Song L, Zheng Z, Wong KC. EGFI: drug-drug interaction extraction and generation with fusion of enriched entity and sentence information. Brief Bioinform 2021; 23:6425806. [PMID: 34791012 DOI: 10.1093/bib/bbab451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Revised: 09/06/2021] [Accepted: 09/30/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION The rapidly growing literature accumulates diverse yet comprehensive biomedical knowledge, such as drug interactions, that remains hidden until it is mined. However, it is difficult to extract this heterogeneous knowledge in order to retrieve or even discover the latest and novel knowledge in an efficient manner. To address this problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI encompasses the language model BioBERT, which has been comprehensively pretrained on biomedical corpora. In particular, we propose the multihead self-attention mechanism and packed BiGRU to fuse multiple semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pretrained language model, BioGPT-2, where the generated sentences are selected based on filtering rules. RESULTS We evaluated the classification part on the 'DDIs 2013' dataset and the 'DTIs' dataset, achieving F1 scores of 0.842 and 0.720, respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified them against the existing ground truth to confirm the filtered sentences. The generated sentences that are not recorded in DrugBank and the DDIs 2013 dataset demonstrated the potential of EGFI to identify novel drug relationships. AVAILABILITY Source code is publicly available at https://github.com/Layne-Huang/EGFI.
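The abstract describes keeping generated sentences that a classifier rates highly and then checking them against known resources. The following is a minimal sketch of that kind of filter, not the EGFI implementation: the `Candidate` type, the `select_novel` function, the 0.9 threshold and the placeholder drug names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    drug_a: str
    drug_b: str
    sentence: str
    score: float  # hypothetical classifier confidence in [0, 1]


def pair_key(a: str, b: str) -> frozenset:
    """Order-insensitive, case-insensitive key for a drug pair."""
    return frozenset((a.lower(), b.lower()))


def select_novel(candidates, known_pairs, min_score=0.9):
    """Keep confident candidates whose drug pair is not already recorded."""
    known = {pair_key(a, b) for a, b in known_pairs}
    return [
        c for c in candidates
        if c.score >= min_score and pair_key(c.drug_a, c.drug_b) not in known
    ]


# Placeholder data; these are not real drugs or real model outputs.
candidates = [
    Candidate("drug_A", "drug_B", "drug_A increases the effect of drug_B.", 0.97),
    Candidate("drug_C", "drug_D", "drug_C reduces the clearance of drug_D.", 0.95),
    Candidate("drug_E", "drug_F", "drug_E drug_F interaction unclear.", 0.40),
]
known_pairs = [("drug_B", "drug_A")]  # stands in for an existing database
selected = select_novel(candidates, known_pairs)
print([(c.drug_a, c.drug_b) for c in selected])
```

Using an unordered pair as the key means a candidate is treated as known regardless of which drug is mentioned first.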
Affiliation(s)
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jiecong Lin
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, China
- Linqi Song
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
29
Majewska O, Collins C, Baker S, Björne J, Brown SW, Korhonen A, Palmer M. BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine. J Biomed Semantics 2021; 12:12. [PMID: 34266499 PMCID: PMC8280585 DOI: 10.1186/s13326-021-00247-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/28/2021] [Accepted: 07/01/2021] [Indexed: 11/10/2022]
Abstract
Background Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The cost and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. Results We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. Conclusion This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.
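The idea of pulling same-class verbs together in an embedding space can be illustrated with a small retrofitting loop. The abstract does not say which retrofitting method was used, so the following is a sketch under assumptions: it uses a common iterative update in which each vector becomes a weighted average of its original value and its class neighbours, and the verbs, vectors and class grouping are invented toy values rather than BioVerbNet data.

```python
import math


def retrofit(embeddings, classes, iterations=10, alpha=1.0):
    """Pull vectors of words sharing a class toward each other.

    Each update sets a word's vector to a weighted average of its original
    vector (weight alpha) and its neighbours' current vectors (weights that
    sum to 1). Words without neighbours are left unchanged.
    """
    neighbours = {word: set() for word in embeddings}
    for members in classes:
        present = [w for w in members if w in embeddings]
        for w in present:
            neighbours[w].update(v for v in present if v != w)

    fitted = {word: list(vec) for word, vec in embeddings.items()}
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            fitted[word] = [
                (alpha * embeddings[word][d]
                 + beta * sum(fitted[v][d] for v in nbrs)) / (alpha + 1.0)
                for d in range(len(embeddings[word]))
            ]
    return fitted


def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy two-dimensional vectors and a hypothetical class grouping.
original = {"bind": [1.0, 0.0], "attach": [0.0, 1.0], "cleave": [5.0, 5.0]}
classes = [["bind", "attach"]]
fitted = retrofit(original, classes)
print(dist(original["bind"], original["attach"]),
      dist(fitted["bind"], fitted["attach"]))
```

Keeping the original vector in every update anchors each word to its pretrained position, so the class knowledge adjusts the space rather than replacing it.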
Affiliation(s)
- Olga Majewska
  - Language Technology Laboratory, MMLL, University of Cambridge, 9 West Road, Cambridge, CB39DB, UK.
- Charlotte Collins
  - Language Technology Laboratory, MMLL, University of Cambridge, 9 West Road, Cambridge, CB39DB, UK
- Simon Baker
  - Language Technology Laboratory, MMLL, University of Cambridge, 9 West Road, Cambridge, CB39DB, UK
- Jari Björne
  - Department of Future Technologies, University of Turku, Vesilinnantie 5, Turku, 20500, Finland
- Susan Windisch Brown
  - Department of Linguistics, University of Colorado Boulder, 295 UCB, Boulder, 80309-0295, Colorado, USA
- Anna Korhonen
  - Language Technology Laboratory, MMLL, University of Cambridge, 9 West Road, Cambridge, CB39DB, UK
- Martha Palmer
  - Department of Linguistics, University of Colorado Boulder, 295 UCB, Boulder, 80309-0295, Colorado, USA