1
Yuan J, Zhang F, Qiu Y, Lin H, Zhang Y. Document-level biomedical relation extraction via hierarchical tree graph and relation segmentation module. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae418. [PMID: 38917409 DOI: 10.1093/bioinformatics/btae418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 05/27/2024] [Accepted: 06/24/2024] [Indexed: 06/27/2024]
Abstract
MOTIVATION Document-level biomedical relation extraction (Bio-DocRE) involves extracting relation instances from biomedical texts that span multiple sentences and mention various entity types, such as genes, diseases, chemicals and variants. Currently, this task is usually addressed with graph- or transformer-based models. However, most work maps entity features directly to relation predictions, ignoring the value of entity-pair information as an intermediate state for relation prediction. In this article, we decouple the task into a three-stage process to capture sufficient information for improving relation prediction. RESULTS We propose HTGRS, a framework for Bio-DocRE that constructs a hierarchical tree graph (HTG) integrating key information sources in the document to perform entity-based relation reasoning. In addition, inspired by semantic segmentation, we conceptualize the task as a table-filling problem and develop a relation segmentation (RS) module to enhance entity-pair-based relation reasoning. Extensive experiments on three datasets show that the proposed framework outperforms state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION Our source code is available at https://github.com/passengeryjy/HTGRS.
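Framing relation extraction as table filling means the prediction target is an n × n grid with the document's entities on both axes and one relation label per ordered entity pair. The sketch below only builds that target grid from gold triples; it is a minimal illustration under assumptions, not HTGRS code. The function name, the "NA" filler label, the single-label-per-pair simplification and the example entities are all hypothetical.

```python
from typing import List, Tuple


def build_relation_table(
    entities: List[str],
    triples: List[Tuple[str, str, str]],
    no_relation: str = "NA",  # hypothetical filler label for unrelated pairs
) -> List[List[str]]:
    """Fill an n x n table where cell (i, j) holds the relation from entity i to entity j.

    Assumption: one label per ordered pair; real Bio-DocRE data can be multi-label.
    """
    index = {name: i for i, name in enumerate(entities)}
    n = len(entities)
    # Every ordered pair starts as "no relation"; the diagonal is never a valid pair.
    table = [[no_relation] * n for _ in range(n)]
    for head, tail, label in triples:
        table[index[head]][index[tail]] = label
    return table


if __name__ == "__main__":
    ents = ["BRCA1", "breast cancer", "tamoxifen"]  # made-up example entities
    rels = [("BRCA1", "breast cancer", "Association"),
            ("tamoxifen", "breast cancer", "Treatment")]
    for row in build_relation_table(ents, rels):
        print(row)
```

A segmentation-style module would then predict every cell of this grid jointly, much as a semantic segmentation network labels every pixel of an image; that model is not shown here.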
Affiliation(s)
- Jianyuan Yuan
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Fengyu Zhang
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Yimeng Qiu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
- Yijia Zhang
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2
Shi Y, He M, Chen J, Han F, Cai Y. SubGE-DDI: A new prediction model for drug-drug interaction established through biomedical texts and drug-pairs knowledge subgraph enhancement. PLoS Comput Biol 2024; 20:e1011989. [PMID: 38626249 PMCID: PMC11051621 DOI: 10.1371/journal.pcbi.1011989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 04/26/2024] [Accepted: 03/11/2024] [Indexed: 04/18/2024]
Abstract
Biomedical texts provide important data for investigating drug-drug interactions (DDIs) in pharmacovigilance. Although researchers have attempted to mine DDIs from biomedical texts and predict unknown DDIs, the lack of accurate manual annotations significantly limits the performance of machine learning algorithms. In this study, a new DDI prediction framework, the Subgraph Enhance model for DDI (SubGE-DDI), was developed to improve the performance of machine learning algorithms. The model uses drug-pair knowledge subgraph information to achieve large-scale plain-text prediction without extensive annotation. It treats DDI prediction as a multi-class classification problem and predicts the specific DDI type for each drug pair (e.g. Mechanism, Effect, Advise, Interact and Negative). The drug-pair knowledge subgraph was derived from a large drug knowledge graph built from various public datasets, such as DrugBank, TwoSIDES, OffSIDES, DrugCentral, Entrez Gene, SMPDB (The Small Molecule Pathway Database), CTD (The Comparative Toxicogenomics Database) and SIDER. SubGE-DDI was evaluated on a public dataset (the SemEval-2013 Task 9 dataset) and compared with other state-of-the-art baselines. SubGE-DDI achieves an 83.91% micro F1 score and an 84.75% macro F1 score on the test set, outperforming the other state-of-the-art baselines. These findings show that the proposed drug-pair knowledge subgraph-assisted model can effectively improve the prediction of DDIs from biomedical texts.
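The abstract reports both a micro and a macro F1 score. As a reminder of how the two averages differ, here is a small sketch that computes both from gold and predicted labels. It assumes the convention common for the SemEval-2013 DDI task of leaving the negative class out of the averages; whether SubGE-DDI does exactly this is not stated in the abstract, and the labels in the example are made up. Macro F1 is computed here as the mean of per-class F1 scores; some papers instead take the F1 of macro-averaged precision and recall.

```python
from collections import Counter
from typing import Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2PR / (P + R), written with raw counts: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: Sequence[str], pred: Sequence[str],
                   classes: Sequence[str]) -> Tuple[float, float]:
    """Micro- and macro-averaged F1 over `classes`.

    Assumption: labels not listed in `classes` (e.g. "Negative") are excluded from both averages.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    # Micro: pool counts across classes, then compute a single F1.
    micro = f1_from_counts(sum(tp[c] for c in classes),
                           sum(fp[c] for c in classes),
                           sum(fn[c] for c in classes))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


if __name__ == "__main__":
    gold = ["Mechanism", "Effect", "Negative", "Advise", "Effect"]
    pred = ["Mechanism", "Negative", "Negative", "Advise", "Effect"]
    print(micro_macro_f1(gold, pred, ["Mechanism", "Effect", "Advise"]))
```

Because macro F1 gives every class equal weight, it can exceed micro F1 when rare classes are predicted well, which is one way a model can report a higher macro than micro score.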
Affiliation(s)
- Yiyang Shi
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Mingxiu He
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Junheng Chen
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Fangfang Han
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, China
  - NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, China
- Yongming Cai
  - School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, China
  - NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, China
  - Guangdong Provincial Traditional Chinese Medicine Precision Medicine Big Data Engineering Technology Research Center, Guangzhou, China
3
Lin S, Mao X, Hong L, Lin S, Wei DQ, Xiong Y. MATT-DDI: Predicting multi-type drug-drug interactions via heterogeneous attention mechanisms. Methods 2023; 220:1-10. [PMID: 37858611 DOI: 10.1016/j.ymeth.2023.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/13/2023] [Accepted: 10/17/2023] [Indexed: 10/21/2023]
Abstract
The joint use of multiple drugs can result in adverse drug-drug interactions (DDIs) and side effects that harm the body. Accurate identification of DDIs is crucial for avoiding accidental drug side effects and understanding the potential mechanisms underlying DDIs. Several computational methods have been proposed for multi-type DDI prediction, but most rely on drug similarity profiles as drug feature vectors, which may cause information leakage and overoptimistic performance when predicting interactions between new drugs. To address this issue, we propose MATT-DDI, a novel method for predicting multi-type DDIs based on the original feature vectors of drugs and multiple attention mechanisms. MATT-DDI consists of three main modules: a top-k similar drug pair selection module, a heterogeneous attention module and a multi-type DDI prediction module. First, the k drug pairs in the training dataset that are most similar to the input drug pair (IDP) are selected according to the cosine similarity between drug-pair feature vectors. Then, the vectors of the k selected drug pairs are averaged to obtain a new drug pair (NDP). Next, the IDP and NDP are fed into heterogeneous attention modules, including scaled dot-product attention and bilinear attention, to extract latent feature vectors. Finally, these latent feature vectors are fed into the classification module to predict DDI types. We evaluated MATT-DDI on three different tasks. The experimental results show that MATT-DDI performs better than or comparably to several state-of-the-art methods, and case studies support its feasibility. MATT-DDI is a robust model for predicting multi-type DDIs with excellent performance and no information leakage.
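The first two MATT-DDI steps (top-k selection by cosine similarity, then averaging into an NDP) and the standard scaled dot-product attention formula softmax(QK^T / √d)V can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the drug-pair vectors are random placeholders, queries, keys and values are used without the learned projection matrices a real attention layer has, bilinear attention is omitted, and stacking the IDP and NDP into a two-token sequence is a hypothetical way to combine them.

```python
import numpy as np


def top_k_neighbor_pair(query: np.ndarray, train_pairs: np.ndarray, k: int) -> np.ndarray:
    """Average the k training pair vectors most cosine-similar to `query` (a stand-in for the NDP)."""
    q = query / np.linalg.norm(query)
    t = train_pairs / np.linalg.norm(train_pairs, axis=1, keepdims=True)
    sims = t @ q                    # cosine similarity of every training pair to the query
    top = np.argsort(-sims)[:k]     # indices of the k most similar pairs
    return train_pairs[top].mean(axis=0)


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d)) V for 2-D inputs of shape (tokens, d).

    Hypothetical simplification: no learned W_q, W_k, W_v projections are applied.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 8))   # hypothetical training drug-pair vectors
    idp = rng.normal(size=8)            # hypothetical input drug-pair vector
    ndp = top_k_neighbor_pair(idp, train, k=5)
    seq = np.stack([idp, ndp])          # hypothetical: treat IDP and NDP as a two-token sequence
    print(scaled_dot_product_attention(seq, seq, seq).shape)  # (2, 8)
```

When the input pair itself is part of the training set, it should be excluded from the neighbor search; otherwise it would always be its own nearest neighbor.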
Affiliation(s)
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Liang Hong
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
- Shuangjun Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Nanyang 473006, China
  - Peng Cheng National Laboratory, Shenzhen 518055, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
4
Zhao Y, Yin J, Zhang L, Zhang Y, Chen X. Drug-drug interaction prediction: databases, web servers and computational models. Brief Bioinform 2023; 25:bbad445. [PMID: 38113076 PMCID: PMC10782925 DOI: 10.1093/bib/bbad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 11/14/2023] [Indexed: 12/21/2023]
Abstract
In clinical treatment, two or more drugs (i.e. a drug combination) are used simultaneously or successively, primarily to enhance therapeutic efficacy or reduce side effects. However, an inappropriate drug combination may not only fail to improve efficacy but even lead to adverse reactions. Therefore, following the basic principle of improving efficacy and/or reducing adverse reactions, drug-drug interactions (DDIs) should be studied comprehensively and thoroughly so that drug combinations can be used rationally. In this review, we first introduce the basic concepts and classification of DDIs. We then briefly describe some important publicly available databases and web servers of experimentally verified or predicted DDIs. As an effective auxiliary tool, computational models for predicting DDIs can not only reduce the cost of biological experiments but also provide guidance for combination therapy to some extent. We therefore summarize three types of prediction models proposed in recent years (traditional machine learning-based models, deep learning-based models and score function-based models) and discuss their advantages and limitations. Finally, we point out the problems that need to be solved in future DDI prediction research and provide corresponding suggestions.
Affiliation(s)
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yong Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi 214122, China
5
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024]
Abstract
Recently, attention mechanisms and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges in applying attention mechanisms and artificial intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future research directions. Given the accelerating pace of technological advancement, we believe that attention-based models will play an increasingly prominent role in future drug discovery. We anticipate that these models will bring revolutionary breakthroughs to the pharmaceutical domain and significantly accelerate drug development.
Affiliation(s)
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
  - Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
  - Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
  - Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
6
Xie W, Fan K, Zhang S, Li L. Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature. J Biomed Semantics 2023; 14:5. [PMID: 37248476 PMCID: PMC10228061 DOI: 10.1186/s13326-023-00287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 04/29/2023] [Indexed: 05/31/2023]
Abstract
BACKGROUND Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This is the first study of active learning (AL) in DDI IR analysis. DDI IR analysis of PubMed abstracts is challenged by a relatively small number of positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis, and the paper demonstrates the consistency of random negative sampling and positive sampling. RESULTS PubMed abstracts are divided into two pools: the screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. DDI IR precision is evaluated and compared at a prespecified recall of 0.95. In screened-pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves precision over uncertainty sampling alone, from 0.89 to 0.92. In unscreened-pool IR analysis, integrating random negative sampling, positive sampling and similarity sampling improves precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened and unscreened pools. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. CONCLUSIONS Integrating various sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis of the literature. Random negative sampling and positive sampling are highly effective for improving AL analysis when positive and negative samples are extremely imbalanced.
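Two pieces of this setup are easy to pin down in code: the evaluation metric (precision at a prespecified recall such as 0.95) and the uncertainty-sampling baseline, which asks for labels on the unlabeled abstracts whose SVM decision values lie closest to the boundary. The sketch below is a minimal illustration under assumptions; the function names and toy scores are hypothetical, ties in scores are ignored, and the paper's similarity, random negative and positive sampling schemes are not shown.

```python
from typing import List, Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float = 0.95) -> float:
    """Precision at the shallowest ranked cutoff whose recall reaches `target_recall`."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank abstracts from most to least likely to be DDI-relevant.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    for rank, i in enumerate(order, start=1):
        tp += labels[i]
        if tp / total_pos >= target_recall:
            return tp / rank
    # Recall is 1.0 after the last abstract, so valid targets always return above.
    return tp / len(order)


def uncertainty_sample(decision_values: Sequence[float], batch_size: int) -> List[int]:
    """Indices of unlabeled items whose SVM decision values are closest to the boundary (0)."""
    ranked = sorted(range(len(decision_values)), key=lambda i: abs(decision_values[i]))
    return ranked[:batch_size]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # toy relevance scores
    labels = [1, 0, 1, 1, 0, 1]               # toy gold labels (1 = DDI abstract)
    print(precision_at_recall(scores, labels, 0.95))
    print(uncertainty_sample([2.0, -0.1, 0.5, -1.5, 0.05], 2))
```

Fixing recall at a high value such as 0.95 reflects the retrieval goal of missing very few DDI abstracts; precision then measures how much manual screening is still needed at that recall.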
Affiliation(s)
- Weixin Xie
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Kunjie Fan
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Shijun Zhang
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Lang Li
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
7
Liu S, Zhang Y, Cui Y, Qiu Y, Deng Y, Zhang Z, Zhang W. Enhancing Drug-Drug Interaction Prediction Using Deep Attention Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:976-985. [PMID: 35511833 DOI: 10.1109/tcbb.2022.3172421] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Indexed: 05/04/2023]
Abstract
Drug-drug interactions are one of the main concerns in drug discovery. Accurate prediction of drug-drug interactions plays a key role in improving the efficiency of drug research and the safety of co-prescribing multiple drugs. Because various data sources describe drug properties and the relationships between drugs, a comprehensive approach that integrates multiple data sources can be considerably more effective at making high-accuracy predictions. In this paper, we propose a Deep Attention Neural Network based Drug-Drug Interaction prediction framework, abbreviated as DANN-DDI, to predict unobserved drug-drug interactions. First, we construct multiple drug feature networks and learn drug representations from them using a graph embedding method; then, we concatenate the learned drug embeddings and design an attention neural network to learn representations of drug-drug pairs; finally, we adopt a deep neural network to predict drug-drug interactions. The experimental results demonstrate that DANN-DDI improves prediction performance over state-of-the-art methods. Moreover, the proposed model can predict novel drug-drug interactions and drug-drug interaction-associated events.
8
EMSI-BERT: Asymmetrical Entity-Mask Strategy and Symbol-Insert Structure for Drug–Drug Interaction Extraction Based on BERT. Symmetry (Basel) 2023. [DOI: 10.3390/sym15020398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Deep models are increasingly used for drug-drug interaction (DDI) extraction, but their effectiveness has been limited by scarce domain-labeled data, weak representation of co-occurring entities and poor adaptation to downstream tasks. This paper proposes EMSI-BERT, a novel method for drug-drug interaction extraction based on an asymmetrical Entity-Mask strategy and a Symbol-Insert structure. First, EMSI-BERT applies the asymmetrical Entity-Mask strategy, using a drug entity dictionary, during BERT pre-training to address the weak representation of co-occurring entity information. Second, EMSI-BERT incorporates four symbols to distinguish different entity combinations in the same input sequence and uses the Symbol-Insert structure to address the weak adaptation to downstream tasks in the fine-tuning stage of DDI classification. The experimental results showed that EMSI-BERT achieved a 0.82 F1-score for DDI extraction on DDI-Extraction 2013 and improved performance on both the multi-class DDI extraction task and the binary DDI detection task. Compared with the Basic-BERT baseline, BERT pre-trained with the asymmetrical Entity-Mask strategy achieved better results on downstream tasks and effectively limited the effect of “Other” samples. Model visualization showed that EMSI-BERT could extract semantic information at different levels and granularities in a continuous space.
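Inserting marker symbols is a common way to tell an encoder which drug pair in a sentence is being classified: each candidate pair yields its own copy of the sentence with the two mentions wrapped in markers. The sketch below shows that generic idea with four hypothetical symbols (`<e1>`, `</e1>`, `<e2>`, `</e2>`); the four symbols EMSI-BERT actually uses and where it places them are not given in the abstract, and the asymmetrical Entity-Mask pre-training step is not shown.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of a drug mention

# Hypothetical marker symbols; the four symbols EMSI-BERT actually uses may differ.
MARKERS = ("<e1>", "</e1>", "<e2>", "</e2>")


def insert_pair_markers(tokens: Sequence[str], drug1: Span, drug2: Span,
                        markers: Tuple[str, str, str, str] = MARKERS) -> List[str]:
    """Copy `tokens`, wrapping the two target drug mentions in marker symbols."""
    (s1, e1), (s2, e2) = drug1, drug2
    if s1 >= e1 or s2 >= e2:
        raise ValueError("spans must be non-empty")
    if not (e1 <= s2 or e2 <= s1):
        raise ValueError("drug spans must not overlap")
    out: List[str] = []
    for i, tok in enumerate(tokens):
        # Opening markers go before the first token of a mention.
        if i == s1:
            out.append(markers[0])
        if i == s2:
            out.append(markers[2])
        out.append(tok)
        # Closing markers go after the last token of a mention.
        if i == e1 - 1:
            out.append(markers[1])
        if i == e2 - 1:
            out.append(markers[3])
    return out


if __name__ == "__main__":
    sent = "Aspirin may increase the effect of warfarin".split()
    print(" ".join(insert_pair_markers(sent, (0, 1), (6, 7))))
    # <e1> Aspirin </e1> may increase the effect of <e2> warfarin </e2>
```

In practice the marker strings would also be registered as special tokens in the tokenizer so they are not split into subwords.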
9
Xie J, Zhao C, Ouyang J, He H, Huang D, Liu M, Wang J, Zhang W. TP-DDI: A Two-Pathway Deep Neural Network for Drug-Drug Interaction Prediction. Interdiscip Sci 2022; 14:895-905. [PMID: 35622314 DOI: 10.1007/s12539-022-00524-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/12/2022] [Revised: 04/01/2022] [Accepted: 04/18/2022] [Indexed: 06/15/2023]
Abstract
Adverse drug-drug interactions (DDIs) can severely damage the body, so it is essential to predict DDIs accurately. DDIs are complex processes in which many factors can cause interactions. Rather than considering only one or two of these factors, we design a two-pathway drug-drug interaction framework named TP-DDI that uses multimodal data for DDI prediction. TP-DDI explores the combined effect of a topological structure-based pathway and a biomedical object similarity-based pathway to obtain multimodal drug representations. In the topology-based pathway, we focus on drug chemical structures through the self-attention mechanism, which can capture hidden critical relationships, especially between pairs of atoms at remote topological distances. In the similarity-based pathway, our model can emphasize useful biomedical objects according to channel weights. Finally, fusing the multimodal data provides a holistic view of DDIs by learning complementary features. Experiments on a real-world dataset show that TP-DDI achieves better performance than state-of-the-art models. Moreover, the model can identify the most critical substructures in newly predicted DDIs, offering a degree of interpretability.
Affiliation(s)
- Jiang Xie
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Chang Zhao
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Jiaming Ouyang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Hongjian He
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Dingkai Huang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Mengjiao Liu
  - School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
- Jiao Wang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Wenjun Zhang
  - College of Information Technology, Shanghai Jianqiao University, Shanghai, 201306, China
10
Zhu J, Liu Y, Zhang Y, Chen Z, Wu X. Multi-Attribute Discriminative Representation Learning for Prediction of Adverse Drug-Drug Interaction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:10129-10144. [PMID: 34914581 DOI: 10.1109/tpami.2021.3135841] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/14/2023]
Abstract
Adverse drug-drug interaction (ADDI) is a significant life-threatening issue and a leading cause of hospitalizations and deaths in healthcare systems. This paper proposes a unified Multi-Attribute Discriminative Representation Learning (MADRL) model for ADDI prediction. Unlike existing works that treat the features of each attribute equally without discrimination and do not consider the underlying relationships among drugs, we first develop a regularized optimization problem based on CUR matrix decomposition for the joint selection of representative drugs and discriminative features, such that the selected drugs and features approximate the original feature spaces well and the critical factors that discriminate ADDIs can be properly explored. Unlike existing models that ignore the consistent and unique properties among attributes, we then design a Generative Adversarial Network (GAN) framework to capture the inter-attribute shared and intra-attribute specific representations of adverse drug pairs, exploiting their consensus and complementary information for ADDI prediction. Meanwhile, MADRL is compatible with any kind of attribute and can explore each attribute's effect on ADDI prediction. An iterative algorithm based on the alternating direction method of multipliers is developed for optimization. Experiments on a publicly available dataset demonstrate the effectiveness of MADRL compared with eleven baselines and six of its variants.
11
Chen J, Sun X, Jin X, Sutcliffe R. Extracting drug-drug interactions from no-blinding texts using key semantic sentences and GHM loss. J Biomed Inform 2022; 135:104192. [PMID: 36064114 DOI: 10.1016/j.jbi.2022.104192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 08/28/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022]
Abstract
The extraction of drug-drug interactions (DDIs) is an important task in biomedical research that can reduce unexpected health risks during patient treatment. Previous work indicates that methods using external drug information perform much better than methods that do not. However, using external drug information is time-consuming and resource-intensive. In this work, we propose a novel DDI extraction method that does not use external drug information but still achieves comparable performance. First, we no longer convert drug names to standard tokens such as DRUG0, as is common in previous research. Instead, full drug names with drug entity markers are input to BioBERT, which allows the selected drug entity pair to be emphasized. Second, we adopt the Key Semantic Sentence approach to emphasize the words closely related to the DDI relation of the selected drug pair. These steps significantly reduce the misclassification of similar instances that are created from the same sentence but correspond to different drug entity pairs. Then, we employ the Gradient Harmonizing Mechanism (GHM) loss to reduce the weight of mislabeled and easy-to-classify instances, both of which can degrade DDI extraction performance. Overall, we show that it is better not to use drug blinding with BioBERT, and that GHM loss performs better than cross-entropy loss when the proportion of label noise is below 30%. The proposed model achieves state-of-the-art results with an F1-score of 84.13% on the DDIExtraction 2013 corpus (a standard English DDI corpus), closing the 4% performance gap between methods that rely on external drug information and those that do not.
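GHM loss, originally proposed for single-stage object detection, reweights each example by the inverse density of its gradient norm, so that crowded regions (masses of very easy examples, or clusters of very hard and possibly mislabeled ones) contribute less. For sigmoid cross-entropy the gradient norm is g_i = |p_i − y_i|; with equal-width bins over [0, 1], the weight is β_i = N / (count of g_i's bin × number of non-empty bins), and the loss is (1/N) Σ β_i · BCE_i. The sketch below is a binary, momentum-free version of that idea under these assumptions; how the paper adapts GHM to multi-class DDI extraction is not described in the abstract, and the bin count is a hypothetical default.

```python
import math
from typing import Sequence


def ghm_bce_loss(probs: Sequence[float], targets: Sequence[int], bins: int = 10) -> float:
    """Binary cross-entropy reweighted by inverse gradient density (GHM-C style sketch)."""
    n = len(probs)
    if n == 0:
        raise ValueError("need at least one example")
    eps = 1e-12
    # Gradient norm of BCE with respect to the logit is |p - y|, which lies in [0, 1].
    g = [abs(p - y) for p, y in zip(probs, targets)]
    # Assign each example to one of `bins` equal-width bins (hypothetical default: 10).
    bin_ids = [min(int(gi * bins), bins - 1) for gi in g]
    counts = [0] * bins
    for b in bin_ids:
        counts[b] += 1
    non_empty = sum(1 for c in counts if c > 0)
    # Examples in crowded bins get small weights; the weights sum to n overall.
    weights = [n / (counts[b] * non_empty) for b in bin_ids]
    total = 0.0
    for w, p, y in zip(weights, probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += w * -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / n


if __name__ == "__main__":
    # Three easy positives and one hard positive: the hard one is up-weighted.
    print(ghm_bce_loss([0.99, 0.99, 0.99, 0.25], [1, 1, 1, 1]))
```

When every example falls in the same bin, all weights equal 1 and the loss reduces to plain mean cross-entropy.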
Affiliation(s)
- Jiacheng Chen
  - School of Information Science and Technology, Northwest University, Xi'an, 710127, China
- Xia Sun
  - School of Information Science and Technology, Northwest University, Xi'an, 710127, China
- Xin Jin
  - School of Information Science and Technology, Northwest University, Xi'an, 710127, China
- Richard Sutcliffe
  - School of Information Science and Technology, Northwest University, Xi'an, 710127, China
  - School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK
12
Duan B, Peng J, Zhang Y. IMSE: interaction information attention and molecular structure based drug drug interaction extraction. BMC Bioinformatics 2022; 23:338. [PMID: 35965308 PMCID: PMC9375903 DOI: 10.1186/s12859-022-04876-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 08/03/2022] [Indexed: 11/10/2022]
Abstract
Background Extraction of drug-drug interactions from biomedical literature and other textual data is an important component of drug-safety monitoring and has attracted the attention of many researchers in healthcare. Existing work centers on relation extraction with bidirectional long short-term memory networks (BiLSTM) and BERT models, which do not attain the best feature representations. Results Our proposed DDI (drug-drug interaction) prediction model offers several advantages: (1) a newly proposed attention vector is added to better handle overlapping relations; (2) molecular structure information of drugs is integrated into the model to better express the functional group structure of drugs; (3) text features combining the t-distribution and chi-square distribution are added to make the model focus more on drug entities; and (4) it achieves similar or better prediction performance (F-scores up to 85.16%) compared with state-of-the-art DDI models on benchmark datasets. Conclusions Our model, which leverages a state-of-the-art transformer architecture in conjunction with multiple features, can improve performance on drug-drug interaction tasks in the biomedical domain. In particular, we believe our research will help identify potential adverse drug reactions.
13
Li Y, Hui L, Zou L, Li H, Xu L, Wang X, Chua S. Relation Extraction in Biomedical Texts: Development of a Multi-Head Attention Model with Syntactic Dependency Feature (Preprint). JMIR Med Inform 2022; 10:e41136. [PMID: 36264604 PMCID: PMC9634522 DOI: 10.2196/41136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 08/27/2022] [Accepted: 09/07/2022] [Indexed: 11/19/2022]
Abstract
Background With the rapid expansion of biomedical literature, biomedical information extraction has attracted increasing attention from researchers. In particular, relation extraction between 2 entities is a long-standing research topic. Objective This study aimed to perform 2 multiclass relation extraction tasks of the Biomedical Natural Language Processing Workshop 2019 Open Shared Tasks: relation extraction of Bacteria-Biotope (BB-rel) and binary relation extraction of plant seed development (SeeDev-binary). In essence, both tasks aim to extract the relation between annotated entity pairs from biomedical texts, which is a challenging problem. Methods Traditional research adopted feature- or kernel-based methods and achieved good performance. For these tasks, we propose a deep learning model based on a combination of several distributed features, such as domain-specific word embedding, part-of-speech embedding, entity-type embedding, distance embedding, and position embedding. A multi-head attention mechanism is used to extract the global semantic features of the entire sentence. In addition, we introduce a dependency-type feature and the shortest dependency path connecting the 2 candidate entities in the syntactic dependency graph to enrich the feature representation. Results Experiments show that our proposed model achieves excellent performance in biomedical relation extraction, with F1 scores of 65.56% and 38.04% on the test sets of the BB-rel and SeeDev-binary tasks, respectively. In the SeeDev-binary task in particular, our model outperforms other existing models and achieves state-of-the-art performance. Conclusions We demonstrated that the multi-head attention mechanism can learn relevant syntactic and semantic features in different representation subspaces and at different positions to extract a comprehensive feature representation.
Moreover, syntactic dependency features can improve model performance by learning the dependency relations between entities in biomedical texts.
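The shortest dependency path (SDP) feature described above can be illustrated with a small sketch. Assuming a dependency parse is already available as (head, dependent) token-index pairs, the path between two candidate entities can be found with a breadth-first search over the tree treated as an undirected graph. The function name and the hand-written parse are hypothetical and are not taken from the authors' code.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return token indices on the shortest path from source to target.

    edges: iterable of (head, dependent) token-index pairs (assumed format).
    The tree is treated as undirected so the path can climb to a shared
    head and come back down.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    previous = {source: None}  # BFS parent pointers
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)

    if target not in previous:
        return []  # the two entities are not connected
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical hand-written parse of "Bacteria were isolated from soil samples"
tokens = ["Bacteria", "were", "isolated", "from", "soil", "samples"]
edges = [(2, 0), (2, 1), (2, 5), (5, 3), (5, 4)]
path = shortest_dependency_path(edges, 0, 5)
print([tokens[i] for i in path])  # ['Bacteria', 'isolated', 'samples']
```

The tokens (and, in a fuller version, dependency types) along this path would then be embedded alongside the sentence representation; how the authors encode the path is not specified here.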
Affiliation(s)
- Yongbin Li
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Linhu Hui
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Liping Zou
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Huyang Li
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Luo Xu
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Xiaohua Wang
- School of Medical Information Engineering, Zunyi Medical University, Zunyi, China
- Stephanie Chua
- Faculty of Computer Science and Information Technology, University Malaysia Sarawak, Sarawak, Malaysia
14
Liu X, Tan J, Fan J, Tan K, Hu J, Dong S. A Syntax-enhanced model based on category keywords for biomedical relation extraction. J Biomed Inform 2022; 132:104135. [PMID: 35842217 DOI: 10.1016/j.jbi.2022.104135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 05/10/2022] [Accepted: 07/05/2022] [Indexed: 10/17/2022]
Abstract
Certain categories in multi-category biomedical relation extraction are linguistically similar to some extent. Category-related keywords and the syntactic structures of samples from these categories have notable features that are very useful for biomedical relation extraction. Pre-trained models have been widely used and have achieved great success in biomedical relation extraction, but they still cannot mine this kind of information accurately. To solve this problem, we present a syntax-enhanced model based on category keywords. First, we prune syntactic dependency trees according to category keywords obtained by the chi-square test. This reduces the noise introduced by current syntactic parsing tools while retaining useful category-related information. Next, to encode the category-related syntactic dependency trees, we present a syntactic transformer, which enhances the ability of the pre-trained model to capture syntactic structures and to distinguish among multiple categories. We evaluate our method on three biomedical datasets, on which it outperforms state-of-the-art models. Further analysis verifies the effectiveness of our method.
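The keyword-selection step can be sketched with the standard 2x2 chi-square statistic, which scores how strongly the presence of a term depends on a relation category. This is a minimal illustration under stated assumptions: the filter that keeps only positively associated terms, the function names, the `top_k` cutoff, and the toy data are all hypothetical, and the abstract does not say how many keywords are kept or how the dependency tree is then pruned.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: samples in the category that contain the term
    b: samples outside the category that contain the term
    c: samples in the category without the term
    d: samples outside the category without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def category_keywords(samples, category, top_k=3):
    """Rank terms positively associated with `category` by chi-square.

    samples: list of (tokens, label) pairs.
    """
    vocabulary = {token for tokens, _ in samples for token in tokens}
    scores = {}
    for term in vocabulary:
        a = b = c = d = 0
        for tokens, label in samples:
            present = term in tokens
            in_category = label == category
            if present and in_category:
                a += 1
            elif present:
                b += 1
            elif in_category:
                c += 1
            else:
                d += 1
        # Chi-square is symmetric, so keep only terms that occur more often
        # than expected inside the category (a*d > b*c).
        if a * d > b * c:
            scores[term] = chi_square(a, b, c, d)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


# Toy, hypothetical training sentences with illustrative labels
samples = [
    (["inhibits", "expression"], "inhibition"),
    (["inhibits", "activity"], "inhibition"),
    (["activates", "expression"], "activation"),
    (["activates", "receptor"], "activation"),
]
print(category_keywords(samples, "inhibition"))  # ['inhibits', 'activity']
```

A term such as "expression", which appears equally often in both categories, scores zero and is dropped, which is the kind of category-neutral word the pruning step is meant to discard.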
Affiliation(s)
- Xiaofeng Liu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Zhongshan Institute of Modern Industrial Technology, South China University of Technology, Zhongshan, China
- Jiajie Tan
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Jianye Fan
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Kaiwen Tan
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Jinlong Hu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Zhongshan Institute of Modern Industrial Technology, South China University of Technology, Zhongshan, China
- Shoubin Dong
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; Zhongshan Institute of Modern Industrial Technology, South China University of Technology, Zhongshan, China
15
Qiu Y, Zhang Y, Deng Y, Liu S, Zhang W. A Comprehensive Review of Computational Methods For Drug-Drug Interaction Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1968-1985. [PMID: 34003753 DOI: 10.1109/tcbb.2021.3081268] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Indexed: 06/12/2023]
Abstract
The detection of drug-drug interactions (DDIs) is a crucial task for drug safety surveillance, enabling effective and safe co-prescription of multiple drugs. Because laboratory research is often complicated, costly, and time-consuming, computational approaches to detect DDIs are urgently needed. In this paper, we conduct a comprehensive review of state-of-the-art computational methods, which fall into three categories: literature-based extraction methods, machine learning-based prediction methods, and pharmacovigilance-based data mining methods. Literature-based extraction methods detect DDIs in published literature using natural language processing techniques; machine learning-based prediction methods build prediction models from the known DDIs in databases and predict novel ones; pharmacovigilance-based data mining methods usually apply statistical techniques to various electronic data to detect DDI signals. We first present a taxonomy of DDI detection methods and outline the three categories. We then introduce the research background and data sources of each category and describe their representative approaches and evaluation metrics. Finally, we discuss the current challenges of existing methods and highlight opportunities for future research.
16
Vo TH, Nguyen NTK, Kha QH, Le NQK. On the road to explainable AI in drug-drug interactions prediction: A systematic review. Comput Struct Biotechnol J 2022; 20:2112-2123. [PMID: 35832629 PMCID: PMC9092071 DOI: 10.1016/j.csbj.2022.04.021] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/12/2022] [Revised: 04/15/2022] [Accepted: 04/15/2022] [Indexed: 12/26/2022]
Abstract
Over the past decade, polypharmacy has become common in the treatment of multiple diseases. However, unwanted drug-drug interactions (DDIs) that may cause unexpected adverse drug events (ADEs) in multi-drug regimens remain a significant issue. As artificial intelligence (AI) has become ubiquitous, many AI models have been developed to predict DDIs and support clinicians in pharmacotherapy-related decisions. However, even though DDI prediction models have great potential to assist physicians with polypharmacy decisions, concerns remain about the reliability of AI models because of their black-box nature. Building AI models with explainable mechanisms can increase their transparency and address this issue. Explainable AI (XAI) promotes safety and clarity by showing how AI models reach their decisions, especially in critical tasks such as DDI prediction. This review provides a comprehensive overview of AI-based DDI prediction, covering publicly available sources for AI-DDI studies, methods for data manipulation and feature preprocessing, XAI mechanisms that promote trust in AI for critical tasks such as DDI prediction, and modeling methods. Limitations and future directions of XAI in DDI prediction are also discussed.
Affiliation(s)
- Thanh Hoa Vo
- Master Program in Clinical Genomics and Proteomics, College of Pharmacy, Taipei Medical University, Taipei 110, Taiwan
- Ngan Thi Kim Nguyen
- School of Nutrition and Health Sciences, College of Nutrition, Taipei Medical University, Taipei 11031, Taiwan
- Quang Hien Kha
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
17
Shi Y, Quan P, Zhang T, Niu L. DREAM: Drug-Drug Interaction Extraction with Enhanced Dependency Graph and Attention Mechanism. Methods 2022; 203:152-159. [PMID: 35181524 DOI: 10.1016/j.ymeth.2022.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 12/10/2021] [Accepted: 02/06/2022] [Indexed: 11/28/2022]
Abstract
Drug-drug interactions (DDIs) describe the effects produced by combining two or more drugs. Extracting DDIs is an important semantic processing task in bioinformatics, with applications such as pharmacovigilance and clinical research. Recently, graph neural networks have been applied to dependency graphs to improve DDI extraction through better semantic representations. However, current methods concentrate mainly on first-order dependency relations and cannot properly discriminate between connected nodes. To better incorporate dependency relations and improve the representations, we propose a novel DDI extraction method, DREAM (Drug-drug Interactions extRaction with Enhanced Dependency Graph and Attention Mechanism). Specifically, the dependency graph is enhanced with potential long-range words to complete the semantic information and suit the aggregation process of graph neural networks. A graph attention mechanism is then adopted to further improve word representations by discriminating between connected nodes according to the specific task. Experiments on the DDIExtraction 2013 corpus, the benchmark corpus for this domain, demonstrate the superiority of the proposed method.
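The two ideas in this abstract, enriching a dependency graph with longer-range neighbours and weighting neighbours with graph attention, can be sketched in plain Python. This is an illustrative assumption rather than DREAM's implementation: "long-range words" are approximated here by linking every two-hop neighbour, and the attention follows the single-head graph attention network (GAT) formulation, a LeakyReLU score over concatenated projected features followed by a softmax over each node's neighbourhood plus itself. All function names are hypothetical.

```python
import math


def add_two_hop_edges(adjacency):
    """Return a copy of an undirected graph with every 2-hop neighbour linked.

    adjacency: dict mapping node -> set of neighbours (every node is a key).
    """
    enhanced = {node: set(neighbours) for node, neighbours in adjacency.items()}
    for node, neighbours in adjacency.items():
        for middle in neighbours:
            for far in adjacency[middle]:
                if far != node:
                    enhanced[node].add(far)
    return enhanced


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gat_layer(features, adjacency, weight, attention):
    """One single-head GAT-style update.

    features: dict node -> input vector; weight: out_dim x in_dim matrix;
    attention: vector of length 2 * out_dim that scores [W h_i || W h_j].
    """
    projected = {node: matvec(weight, h) for node, h in features.items()}
    out_dim = len(weight)
    updated = {}
    for i, neighbours in adjacency.items():
        candidates = sorted(neighbours | {i})  # include a self-loop
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attention, projected[i] + projected[j])))
            for j in candidates
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        updated[i] = [
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, candidates))
            for k in range(out_dim)
        ]
    return updated


# Hypothetical three-token chain 0-1-2 from a dependency parse
graph = {0: {1}, 1: {0, 2}, 2: {1}}
enhanced = add_two_hop_edges(graph)  # tokens 0 and 2 are now linked directly
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = gat_layer(features, enhanced, identity, attention=[0.5, -0.5, 1.0, 0.0])
```

In a trained model, `weight` and `attention` would be learned parameters; they are fixed by hand here only so the example runs.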
Affiliation(s)
- Yong Shi
- School of Economics and Management, University of Chinese Academy of Sciences, Beijing, 100049, China; Key Lab of Big Data Mining and Knowledge Management Chinese Academy of Sciences, Beijing, 100190 China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190 China; College of Information Science and Technology, University of Nebraska at Omaha, NE, 68182 USA
- Pei Quan
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China
- Tianlin Zhang
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China; National Centre for Text Mining, University of Manchester, Manchester, M1 7DN, United Kingdom
- Lingfeng Niu
- School of Economics and Management, University of Chinese Academy of Sciences, Beijing, 100049, China
18
An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation. J Biomed Inform 2021; 125:103968. [PMID: 34871807 DOI: 10.1016/j.jbi.2021.103968] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/29/2021] [Revised: 11/25/2021] [Accepted: 11/27/2021] [Indexed: 11/21/2022]
Abstract
Adverse drug event (ADE) relation extraction is a crucial task for drug safety surveillance that aims to discover potential relations between ADE mentions in unstructured medical texts. To date, graph convolutional networks (GCNs) have been the state-of-the-art solutions for relation extraction. However, many challenging issues remain to be addressed. Among these, GCN-based methods do not fully exploit syntactic information, especially the diversified dependency edges. Moreover, these methods fail to effectively extract complex relations that involve nested, discontinuous, and overlapping mentions. In addition, the task is primarily treated as a classification problem in which each candidate relation is handled independently, neglecting interactions between relations. To deal with these issues, we propose an attentive joint model with a transformer-based weighted GCN for extracting ADE relations, called ADERel. First, ADERel formulates ADE relation extraction as N-level sequence labelling in order to model complex relations at different levels and capture greater interaction between relations. It then uses our neural joint model to process the N-level sequences jointly. The joint model leverages contextual and structural information through a shared representation that combines Bidirectional Encoder Representations from Transformers (BERT) with our proposed weighted GCN (WGCN). The WGCN assigns a score to each dependency edge within a sentence to capture rich syntactic features and determine the most influential edges for extracting ADE relations. Finally, the system employs multi-head attention to exchange boundary knowledge across levels. We evaluate ADERel on two benchmark datasets from the TAC 2017 and n2c2 2018 shared tasks. The experimental results show that ADERel outperforms several state-of-the-art methods.
The results also demonstrate that combining a transformer model with WGCN makes the proposed system more effective at extracting various types of ADE relations. The evaluations further show that ADERel benefits from joint learning, demonstrating its effectiveness in recognizing complex relations.
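To make the N-level sequence labelling idea concrete, the sketch below spreads overlapping or nested mention spans across several label sequences so that no two spans on the same level overlap, then writes each level as BIO tags. Greedy level assignment and BIO tagging are assumptions for illustration; the abstract does not describe ADERel's exact tagging scheme, the labels are illustrative rather than the datasets' annotation schemes, and discontinuous mentions are not handled here.

```python
def assign_levels(spans):
    """Greedily place spans on levels so no two spans in a level overlap.

    spans: list of (start, end_exclusive, label). Longer spans that start at
    the same position are placed first, so outer mentions land on lower levels.
    """
    levels = []
    for start, end, label in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        for level in levels:
            if all(end <= s or start >= e for s, e, _ in level):
                level.append((start, end, label))
                break
        else:
            levels.append([(start, end, label)])
    return levels


def to_bio(levels, length):
    """Turn each level into its own BIO tag sequence of the given length."""
    sequences = []
    for level in levels:
        tags = ["O"] * length
        for start, end, label in level:
            tags[start] = "B-" + label
            for k in range(start + 1, end):
                tags[k] = "I-" + label
        sequences.append(tags)
    return sequences


tokens = ["severe", "muscle", "pain", "after", "statin"]
spans = [(0, 3, "ADE"), (0, 1, "Severity"), (4, 5, "Drug")]
for tags in to_bio(assign_levels(spans), len(tokens)):
    print(tags)
# ['B-ADE', 'I-ADE', 'I-ADE', 'O', 'B-Drug']
# ['B-Severity', 'O', 'O', 'O', 'O']
```

Each resulting sequence can then be treated as an ordinary tagging target, which is what lets a single joint model predict all levels together.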
19
Martsevich SY, Lukina YV, Drapkina OM. Basic principles of combination therapy: focus on drug-drug interaction. CARDIOVASCULAR THERAPY AND PREVENTION 2021. [DOI: 10.15829/1728-8800-2021-3031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Abstract
This article addresses drug interactions in combination regimens. Today, drug therapy is the first-line approach for patients with noncommunicable diseases, and population ageing worldwide is increasing the number of patients with severe comorbidity and polypharmacy, so the problem of drug-drug interactions is especially relevant. The article discusses the main types of drug interactions: pharmacokinetic (related to the absorption, distribution, metabolism, and excretion of drugs) and pharmacodynamic, the latter leading to synergy or antagonism of pharmacological effects. The consequences of drug interactions can be desirable or undesirable, and undesirable ones are much more common. Attention should be directed precisely to preventing such interactions. Using data from dedicated scales and lists (Beers criteria, STOPP/START criteria), the article also briefly describes various adverse drug-drug interactions. In addition, it lists a number of Internet resources for assessing the risk of drug interactions when prescribing combination therapy.
Affiliation(s)
- S. Yu. Martsevich
- National Medical Research Center for Therapy and Preventive Medicine
- Yu. V. Lukina
- National Medical Research Center for Therapy and Preventive Medicine
- O. M. Drapkina
- National Medical Research Center for Therapy and Preventive Medicine
20
Lin S, Wang Y, Zhang L, Chu Y, Liu Y, Fang Y, Jiang M, Wang Q, Zhao B, Xiong Y, Wei DQ. MDF-SA-DDI: predicting drug-drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief Bioinform 2021; 23:6406700. [PMID: 34671814 DOI: 10.1093/bib/bbab421] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/04/2021] [Revised: 09/01/2021] [Accepted: 09/14/2021] [Indexed: 11/14/2022]
Abstract
One of the main problems with the joint use of multiple drugs is that it may cause adverse drug interactions and side effects that damage the body. It is therefore important to predict potential drug interactions. However, most available prediction methods can only predict whether two drugs interact, and few can predict the interaction events between two drugs. Accurately predicting the interaction events of two drugs is more useful for studying the mechanism of their interaction. In the present study, we propose a novel method, MDF-SA-DDI, which predicts drug-drug interaction (DDI) events based on multi-source drug fusion, multi-source feature fusion, and a transformer self-attention mechanism. MDF-SA-DDI consists mainly of two parts: multi-source drug fusion and multi-source feature fusion. First, we combine two drugs in four different ways and input the combined drug feature representations into four different drug fusion networks (a Siamese network, a convolutional neural network, and two auto-encoders) to obtain latent feature vectors of the drug pairs; the two auto-encoders share the same structure and differ mainly in the number of neurons in their input layers. Then, we use transformer blocks that include a self-attention mechanism to fuse the latent features. We conducted experiments on three different tasks with two datasets. On the small dataset, the area under the precision-recall curve (AUPR) and F1 score of our method on task 1 reached 0.9737 and 0.8878, respectively, which were better than those of the state-of-the-art method. On the large dataset, the AUPR and F1 score of our method on task 1 reached 0.9773 and 0.9117, respectively. In tasks 2 and 3 on both datasets, our method also achieved performance equal to or better than that of the state-of-the-art method. More importantly, case studies on five DDI events were conducted and achieved satisfactory performance.
The source codes and data are available at https://github.com/ShenggengLin/MDF-SA-DDI.
Affiliation(s)
- Shenggeng Lin
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Lingfeng Zhang
- School of Electrical Engineering and Computer Science, University of Ottawa, Canada
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Yatong Liu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Yitian Fang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Mingming Jiang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Bowen Zhao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
21
Kpanou R, Osseni MA, Tossou P, Laviolette F, Corbeil J. On the robustness of generalization of drug-drug interaction models. BMC Bioinformatics 2021; 22:477. [PMID: 34607569 PMCID: PMC8489092 DOI: 10.1186/s12859-021-04398-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/16/2021] [Accepted: 09/10/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Deep learning methods have proven their value in many fields. One such application is predicting adverse drug-drug interactions (DDIs). The resulting models can predict, with reasonable accuracy, the phenotypes arising from drug interactions using the drugs' molecular structures. Nevertheless, this task requires improvement before such models are truly useful. Given the complexity of the predictive task, we performed an extensive benchmark of structure-based models for DDI prediction to evaluate their drawbacks and advantages. RESULTS We rigorously tested various structure-based models that predict drug interactions, using different splitting strategies to simulate different real-world scenarios. In addition to assessing how different training and testing setups affect the robustness and generalizability of the models, we explored the contribution of traditional approaches such as multitask learning and data augmentation. CONCLUSION Structure-based models tend to generalize poorly to unseen drugs, despite accurately identifying new DDIs among drugs seen during training. Indeed, they efficiently propagate information between known drugs and could be valuable for discovering new DDIs within a database. However, these models will most probably fail when exposed to unknown drugs. Multitask learning did not solve this problem in our case, but data augmentation at least mitigated it. Researchers should therefore be cautious about the bias of the random evaluation scheme, especially if their goal is to discover new DDIs.
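The gap between random and unseen-drug evaluation can be made concrete with two splitting functions. This is a minimal sketch under stated assumptions: pairs are unordered (drug_a, drug_b) tuples, and "unseen" means a test pair contains at least one held-out drug. The functions are hypothetical and do not reproduce the exact splitting strategies used in the study.

```python
import random


def random_pair_split(pairs, test_fraction, seed=0):
    """Shuffle drug pairs and cut them into train/test (the 'random' scheme)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def unseen_drug_split(pairs, test_fraction, seed=0):
    """Hold out a subset of drugs; any pair touching one goes to the test set.

    Training pairs never contain a held-out drug, which mimics predicting
    interactions for new compounds.
    """
    drugs = sorted({drug for pair in pairs for drug in pair})
    random.Random(seed).shuffle(drugs)
    n_held_out = max(1, int(len(drugs) * test_fraction))
    held_out = set(drugs[:n_held_out])
    train = [pair for pair in pairs if not held_out & set(pair)]
    test = [pair for pair in pairs if held_out & set(pair)]
    return train, test


pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
train, test = unseen_drug_split(pairs, test_fraction=0.2, seed=1)
seen = {drug for pair in train for drug in pair}
print(all(set(pair) - seen for pair in test))  # True: each test pair has an unseen drug
```

Under `random_pair_split`, most test drugs also appear somewhere in training, which is one way the random scheme can make a model look better at generalizing than it is.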
Affiliation(s)
- Rogia Kpanou
- Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- InVivo AI, Mila - 180 Corporate Lab L, 6650, 01 Rue Saint-Urbain, Montreal, CA H2S 3G9 Canada
- Mazid Abiodoun Osseni
- Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- Prudencio Tossou
- Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- InVivo AI, Mila - 180 Corporate Lab L, 6650, 01 Rue Saint-Urbain, Montreal, CA H2S 3G9 Canada
- Francois Laviolette
- Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- Jacques Corbeil
- Department of Molecular Medicine, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
22
Xie W, Wang L, Cheng Q, Wang X, Wang Y, Bi H, He B, Feng W. Integrated Random Negative Sampling and Uncertainty Sampling in Active Learning Improve Clinical Drug Safety Drug-Drug Interaction Information Retrieval. Front Pharmacol 2021; 11:582470. [PMID: 34017245 PMCID: PMC8130007 DOI: 10.3389/fphar.2020.582470] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/30/2020] [Accepted: 11/30/2020] [Indexed: 11/13/2022]
Abstract
Clinical drug-drug interactions (DDIs) are a major cause not only of medical errors but also of adverse drug events (ADEs). The published literature on DDI clinical toxicity continues to grow substantially, and high-performance DDI information retrieval (IR) text mining methods are in high demand. The effectiveness of IR and its machine learning (ML) algorithms depends on the availability of a large amount of manually reviewed and annotated training and validation data. In this study, we investigated how active learning (AL) might improve ML performance in clinical safety DDI IR analysis. We recognized that a direct application of AL would not address several primary challenges in DDI IR from the literature. For instance, the vast majority of abstracts in PubMed are negative, existing positive and negative labeled samples do not represent the general sample distributions, and biased samples may arise during uncertainty sampling in an AL algorithm. Therefore, we developed several novel sampling and ML schemes to improve AL performance in DDI IR analysis. In particular, random negative sampling was added as part of AL because it incurs no manual labeling cost. We also used two ML algorithms in an AL process to differentiate random negative samples from manually labeled negative samples, and we updated both the training and validation samples during the AL process to avoid or reduce biased sampling. Two supervised ML algorithms, support vector machine (SVM) and logistic regression (LR), were used to investigate the consistency of our proposed AL algorithm. Because the ultimate goal of clinical safety DDI IR is to retrieve all DDI toxicity-relevant abstracts, a recall rate of 0.99 was set when developing the AL methods.
When we used our newly proposed AL method with SVM, the precision in differentiating positive samples from manually labeled negative samples improved from 0.45 in the first round to 0.83 in the second round, and the precision in differentiating positive samples from random negative samples improved from 0.70 in the first round to 0.82 in the second. When our proposed AL method was used with LR, the precision improvements followed a similar trend. However, the other AL algorithms tested did not show improved precision, largely because of biased samples caused by uncertainty sampling or by differences between the training and validation data sets.
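One active-learning round that combines uncertainty sampling with random negative sampling might look like the sketch below. It assumes a classifier that outputs a positive-class probability for each unlabelled abstract; the function names, the closest-to-0.5 uncertainty criterion, and the pool sizes are assumptions, and the study's additional steps (two classifiers separating random from manually labelled negatives, updating the validation data) are not reproduced.

```python
import random


def uncertainty_sample(probabilities, k):
    """Indices of the k items whose positive-class probability is nearest 0.5."""
    order = sorted(range(len(probabilities)), key=lambda i: (abs(probabilities[i] - 0.5), i))
    return order[:k]


def random_negatives(pool_size, exclude, k, seed=0):
    """Draw k random pool items to treat as negatives without manual review.

    This relies on the premise stated in the abstract that the vast majority
    of PubMed abstracts are negative.
    """
    candidates = [i for i in range(pool_size) if i not in exclude]
    return random.Random(seed).sample(candidates, min(k, len(candidates)))


def active_learning_round(probabilities, k_uncertain, k_random, seed=0):
    """Return (items sent for manual annotation, items labelled negative at no cost)."""
    to_annotate = uncertainty_sample(probabilities, k_uncertain)
    assumed_negative = random_negatives(len(probabilities), set(to_annotate), k_random, seed)
    return to_annotate, assumed_negative


# Hypothetical classifier scores for an unlabelled pool of six abstracts
scores = [0.9, 0.52, 0.1, 0.45, 0.7, 0.05]
annotate, negatives = active_learning_round(scores, k_uncertain=2, k_random=2)
print(annotate)  # [1, 3]
```

Only the uncertain items consume annotator time; the random negatives enlarge the negative class for free, at the risk of occasionally mislabelling a rare positive.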
Affiliation(s)
- Weixin Xie
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Limei Wang
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- Qi Cheng
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Xueying Wang
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Ying Wang
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Hongyuan Bi
- The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Bo He
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
- Weixing Feng
- Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
23
Drug-Drug interaction extraction using a position and similarity fusion-based attention mechanism. J Biomed Inform 2021; 115:103707. [PMID: 33571676 DOI: 10.1016/j.jbi.2021.103707] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/28/2020] [Revised: 12/21/2020] [Accepted: 02/03/2021] [Indexed: 11/20/2022]
Abstract
Taking multiple drugs at the same time can increase or decrease each drug's effectiveness or cause side effects. These drug-drug interactions (DDIs) may increase the cost of medical care or even threaten patients' health and lives. Automatic extraction of DDIs is therefore an important research field for improving patient safety. In this work, a deep neural network model is presented for extracting DDIs from medical texts. The model uses a novel attention mechanism that better distinguishes important words from others, based on word similarities and each word's position relative to the candidate drugs. This mechanism calculates the attention weights over the outputs of a bidirectional long short-term memory (Bi-LSTM) layer in the network before the DDI type is detected. Tested on the standard DDI Extraction 2013 dataset, the proposed method achieved an F1-score of 78.30, comparable to the best results reported for state-of-the-art methods. A detailed study of the proposed method and its components is also provided.
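The abstract does not give the fusion formula, so the sketch below assumes one purely for illustration: each Bi-LSTM output is scored by its summed cosine similarity to the candidate-drug token states minus a penalty proportional to its distance from the nearest candidate drug, and the scores are softmax-normalised into attention weights. The `decay` parameter, the function names, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_attention(hidden, drug_positions, decay=0.5):
    """Attention over token states using similarity and relative position.

    hidden: per-token vectors (e.g. Bi-LSTM outputs);
    drug_positions: token indices of the candidate drugs.
    Returns (weights, context vector).
    """
    scores = []
    for t, h in enumerate(hidden):
        similarity = sum(cosine(h, hidden[p]) for p in drug_positions)
        distance = min(abs(t - p) for p in drug_positions)
        scores.append(similarity - decay * distance)  # assumed fusion rule
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(dim)]
    return weights, context


# Toy 2-dimensional states for a five-token sentence; drugs at positions 0 and 2
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [0.0, 1.0], [1.0, 0.0]]
weights, context = fused_attention(hidden, drug_positions=[0, 2])
```

Tokens 0 and 4 have identical states, so any difference in their weights comes from the position term alone; in a trained model, the balance between the two signals would be learned rather than fixed.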
24
Zheng T, Xu Z, Li Y, Zhao Y, Wang B, Yang X. A Novel Conditional Knowledge Graph Representation and Construction. ARTIF INTELL 2021. [DOI: 10.1007/978-3-030-93049-3_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
25
Lee S, Lim S, Lee T, Sung I, Kim S. Cancer subtype classification and modeling by pathway attention and propagation. Bioinformatics 2020; 36:3818-3824. [PMID: 32207514 DOI: 10.1093/bioinformatics/btaa203] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/17/2019] [Revised: 01/13/2020] [Accepted: 03/19/2020] [Indexed: 01/04/2023]
Abstract
MOTIVATION Biological pathways are important curated knowledge about biological processes. Cancer subtype classification based on pathways would therefore be very useful for understanding differences in biological mechanisms among cancer subtypes. However, pathways include only a fraction of the entire gene set (only one-third of human genes in KEGG), and pathways are fragmented. For this reason, few computational methods use pathways for cancer subtype classification. RESULTS We present an explainable deep-learning model with an attention mechanism and network propagation for cancer subtype classification. Each pathway is modeled by a graph convolutional network. Then, a multi-attention-based ensemble model combines several hundred pathways in an explainable manner. Lastly, network propagation on a pathway-gene network explains why gene expression profiles differ between subtypes. In experiments with five TCGA cancer datasets, our method achieved very good classification accuracies and additionally identified subtype-specific pathways and biological functions. AVAILABILITY AND IMPLEMENTATION The source code is available at http://biohealth.snu.ac.kr/software/GCN_MAE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
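Network propagation is commonly implemented as a random walk with restart (RWR). The abstract does not say which propagation scheme the authors use, so the following is a generic sketch under that assumption: scores start on seed genes and diffuse to their neighbours in an unweighted, undirected network in which every node has at least one neighbour, and `restart` is the probability of jumping back to the seeds at each step. The function name and the three-gene network are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected, unweighted network.

    adjacency: dict node -> list of neighbours (no isolated nodes assumed).
    seeds: dict node -> non-negative starting score (e.g. genes of interest).
    Each step computes p_next = (1 - restart) * W p + restart * p0, where W
    passes each node's score in equal shares to its neighbours.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    p0 = {node: seeds.get(node, 0.0) / total for node in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {
            node: (1 - restart) * sum(p[m] / len(adjacency[m]) for m in adjacency[node])
            + restart * p0[node]
            for node in nodes
        }
        delta = sum(abs(nxt[node] - p[node]) for node in nodes)
        p = nxt
        if delta < tol:
            break
    return p


network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = random_walk_with_restart(network, seeds={"A": 1.0})
print({gene: round(score, 4) for gene, score in scores.items()})
# {'A': 0.5833, 'B': 0.3333, 'C': 0.0833}
```

Genes close to the seeds keep more of the propagated score, which is what allows propagation results to be read as an explanation of which genes drive a subtype difference.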
Affiliation(s)
- Sangseon Lee
- Department of Computer Science and Engineering, Institute of Engineering Research
- Taeheon Lee
- Department of Computer Science and Engineering, Institute of Engineering Research
- Inyoung Sung
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Institute of Engineering Research; Bioinformatics Institute; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
26
Abstract
OBJECTIVES We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm, concentrating on two basic IE tasks, named entity recognition and relation extraction, for two selected semantic classes, diseases and drugs (or medications), and the relations between them. METHODS For the period from 2017 to early 2020, we searched for relevant publications from three major scientific communities: medicine and medical informatics, natural language processing, and neural networks and artificial intelligence. RESULTS In the past decade, the field of Natural Language Processing (NLP) has undergone a profound methodological shift from symbolic to distributed representations based on the paradigm of Deep Learning (DL). This trend is also reflected, albeit with some delay, in the medical NLP community. In the reporting period, overwhelming experimental evidence has been gathered, as illustrated in this survey for medical IE, that DL-based approaches outperform non-DL ones, often by large margins. Still, small and access-limited corpora create intrinsic problems for data-greedy DL, as do the special linguistic phenomena of medical sublanguages, which have to be overcome by adaptive learning strategies. CONCLUSIONS The paradigm shift from (feature-engineered) ML to DNNs changes the fundamental methodological rules of the game for medical NLP. This change is by no means restricted to medical IE and should also deeply influence other areas of medical informatics, whether NLP-based or not.
Affiliation(s)
- Udo Hahn
- Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany
- Michel Oleynik
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria

27
Zhu Y, Li L, Lu H, Zhou A, Qin X. Extracting drug-drug interactions from texts with BioBERT and multiple entity-aware attentions. J Biomed Inform 2020; 106:103451. [PMID: 32454243 DOI: 10.1016/j.jbi.2020.103451] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/05/2019] [Revised: 04/10/2020] [Accepted: 05/07/2020] [Indexed: 12/18/2022]
Abstract
Drug-drug interaction (DDI) extraction is one of the important tasks in biomedical relation extraction and plays an important role in pharmacovigilance. Previous neural network-based models have achieved good performance in DDI extraction. However, most previous models did not make good use of drug entity names, which can help to judge the relation between drugs. This is mainly because drug names are often very complex, so neural network models cannot understand their semantics directly. To address this issue, we propose a DDI extraction model using multiple entity-aware attentions with various kinds of entity information. We use an output-modified bidirectional transformer (BioBERT) and a bidirectional gated recurrent unit layer (BiGRU) to obtain the vector representation of sentences. The vectors of drug description documents encoded by Doc2Vec are used as drug description information, which serves as external knowledge for our model. Then we construct three different kinds of entity-aware attentions to obtain sentence representations weighted by entity information, including attentions using the drug description information. The outputs of the attention layers are concatenated and fed into a multi-layer perceptron layer. Finally, we obtain the result with a softmax classifier. The F-score is used to evaluate our model, as in most previous DDI extraction models. We evaluate our proposed model on the DDIExtraction 2013 corpus, which is the benchmark corpus of this domain, and it achieves a state-of-the-art result (80.9% in F-score).
Affiliation(s)
- Yu Zhu
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
- Lishuang Li
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
- Hongbin Lu
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
- Anqiao Zhou
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.
- Xueyang Qin
- School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.

28
Lee CY, Chen YPP. Prediction of drug adverse events using deep learning in pharmaceutical discovery. Brief Bioinform 2020; 22:1884-1901. [PMID: 32349125 DOI: 10.1093/bib/bbaa040] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/10/2019] [Revised: 02/08/2020] [Accepted: 02/25/2020] [Indexed: 01/11/2023]
Abstract
Traditional machine learning methods used to detect the side effects of drugs pose significant challenges as feature engineering processes are labor-intensive, expert-dependent, time-consuming and cost-ineffective. Moreover, these methods only focus on detecting the association between drugs and their side effects or classifying drug-drug interaction. Motivated by technological advancements and the availability of big data, we provide a review on the detection and classification of side effects using deep learning approaches. It is shown that the effective integration of heterogeneous, multidimensional drug data sources, together with the innovative deployment of deep learning approaches, helps reduce or prevent the occurrence of adverse drug reactions (ADRs). Deep learning approaches can also be exploited to find replacements for drugs which have side effects or help to diversify the utilization of drugs through drug repurposing.
Affiliation(s)
- Chun Yen Lee
- Department of Computer Science and Information Technology, La Trobe University
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University

29
Wu H, Xing Y, Ge W, Liu X, Zou J, Zhou C, Liao J. Drug-drug interaction extraction via hybrid neural networks on biomedical literature. J Biomed Inform 2020; 106:103432. [PMID: 32335223 DOI: 10.1016/j.jbi.2020.103432] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/15/2019] [Revised: 04/15/2020] [Accepted: 04/20/2020] [Indexed: 01/16/2023]
Abstract
Adverse events caused by drug-drug interaction (DDI) not only pose a serious threat to health, but also increase medical care expenditure. However, despite the emergence of many excellent text mining-based DDI classification methods, the balance between method simplicity and model performance is still unsatisfactory. In this article, we present a deep learning method, the stacked bidirectional Gated Recurrent Unit (GRU)-convolutional neural network (SGRU-CNN) model, which applies a stacked bidirectional GRU (BiGRU) network and a convolutional neural network (CNN) to lexical information and entity position information, respectively, to conduct the DDI extraction task. Furthermore, the SGRU-CNN model uses an attentive pooling layer to assign weights to each word feature, which improves performance. While remaining no worse than other algorithms on the other metrics, our model achieves a 1.54% improvement in recall on the DDI Extraction 2013 corpus. The proposed SGRU-CNN model also performs well (F1-score: 0.75) with the fewest features, indicating a good balance between avoiding redundant preprocessing and achieving high accuracy in relation extraction on biomedical literature.
Affiliation(s)
- Hong Wu
- School of science, China Pharmaceutical University, Nanjing, China
- Yan Xing
- School of science, China Pharmaceutical University, Nanjing, China
- Weihong Ge
- Department of Pharmacy, Nanjing Drum Tower Hospital, Nanjing, China; School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
- Xiaoquan Liu
- School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Jianjun Zou
- School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China; Department of Clinical Pharmacology, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Changjiang Zhou
- School of science, China Pharmaceutical University, Nanjing, China
- Jun Liao
- School of science, China Pharmaceutical University, Nanjing, China.

30
Jettakul A, Wichadakul D, Vateekul P. Relation extraction between bacteria and biotopes from biomedical texts with attention mechanisms and domain-specific contextual representations. BMC Bioinformatics 2019; 20:627. [PMID: 31795930 PMCID: PMC6889521 DOI: 10.1186/s12859-019-3217-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/28/2019] [Accepted: 11/12/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) task that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations conducted the study by applying feature-based models; others have presented deep-learning-based models such as convolutional and recurrent neural networks used with the shortest dependency paths (SDPs). Although SDPs contain valuable and concise information, some of the crucial information required to define bacterial location relationships is often neglected. Moreover, the traditional word embeddings used in previous studies may suffer from word ambiguity across linguistic contexts. RESULTS Here, we present a deep learning model for biomedical RE. The model incorporates feature combinations of SDPs and full sentences with various attention mechanisms. We also used pre-trained contextual representations based on domain-specific vocabularies. To assess the model's robustness, we introduced a mean F1 score over many models trained with different random seeds. The experiments were conducted on the standard BB corpus in BioNLP-ST'16. Our experimental results revealed that the model performed better (in terms of both maximum and average F1 scores; 60.77% and 57.63%, respectively) compared with other existing models. CONCLUSIONS We demonstrated that our proposed contributions to this task can be used to extract rich lexical, syntactic, and semantic features that effectively boost the model's performance. Moreover, we analyzed the trade-off between precision and recall to choose the proper cut-off to use in real-world applications.
Affiliation(s)
- Amarin Jettakul
- Chulalongkorn University Big Data Analytics and IoT Center (CUBIC), Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
- Duangdao Wichadakul
- Chulalongkorn University Big Data Analytics and IoT Center (CUBIC), Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
- Peerapon Vateekul
- Chulalongkorn University Big Data Analytics and IoT Center (CUBIC), Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand.

31
Zhang T, Leng J, Liu Y. Deep learning for drug–drug interaction extraction from the literature: a review. Brief Bioinform 2019; 21:1609-1627. [DOI: 10.1093/bib/bbz087] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/20/2019] [Revised: 06/20/2019] [Accepted: 06/21/2019] [Indexed: 01/07/2023]
Abstract
Drug–drug interactions (DDIs) are crucial for drug research and pharmacovigilance. These interactions may cause adverse drug effects that threaten public health and patient safety. Therefore, DDI extraction from biomedical literature has been widely studied and emphasized in modern biomedical research. Previous rule-based and machine learning approaches rely on tedious feature engineering, which is laborious, time-consuming and unsatisfactory. With the development of deep learning technologies, this problem is alleviated by learning feature representations automatically. Here, we review the recent deep learning methods that have been applied to the extraction of DDIs from biomedical literature. We describe each method briefly and systematically compare their performance on the DDI corpus. Next, we summarize the advantages and disadvantages of these deep learning models for this task. Furthermore, we discuss some challenges and future perspectives of DDI extraction via deep learning methods. This review aims to serve as a useful guide for interested researchers to further advance bioinformatics algorithms for DDI extraction from the literature.
Affiliation(s)
- Tianlin Zhang
- School of Computer Science and Technology, University of Chinese Academy of Sciences, China
- Jiaxu Leng
- School of Computer Science and Technology, University of Chinese Academy of Sciences, China
- Ying Liu
- University of Chinese Academy of Sciences, Key Lab of Big Data Mining and Knowledge Management

32
Executive summary of the 2019 ASHP Commission on Goals: Impact of artificial intelligence on healthcare and pharmacy practice. Am J Health Syst Pharm 2019; 76:2087-2092. [DOI: 10.1093/ajhp/zxz205] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]

33
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada

34
Zhou H, Lang C, Liu Z, Ning S, Lin Y, Du L. Knowledge-guided convolutional networks for chemical-disease relation extraction. BMC Bioinformatics 2019; 20:260. [PMID: 31113357 PMCID: PMC6528333 DOI: 10.1186/s12859-019-2873-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/07/2018] [Accepted: 05/02/2019] [Indexed: 01/10/2023]
Abstract
BACKGROUND Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly-structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction. How to make full use of it is worth studying. RESULTS This paper proposes a novel model called "Knowledge-guided Convolutional Networks (KCN)" to leverage prior knowledge for CDR extraction. The proposed model first learns knowledge representations including entity embeddings and relation embeddings from KBs. Then, entity embeddings are used to control the propagation of context features towards a chemical-disease pair with gated convolutions. After that, relation embeddings are employed to further capture the weighted context features by a shared attention pooling. Finally, the weighted context features containing additional knowledge information are used for CDR extraction. Experiments on the BioCreative V CDR dataset show that the proposed KCN achieves 71.28% F1-score, which outperforms most of the state-of-the-art systems. CONCLUSIONS This paper proposes a novel CDR extraction model KCN to make full use of prior knowledge. Experimental results demonstrate that KCN could effectively integrate prior knowledge and contexts for the performance improvement.
Affiliation(s)
- Huiwei Zhou
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.
- Chengkun Lang
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Zhuang Liu
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Shixian Ning
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Yingyu Lin
- School of Foreign Languages, Dalian University of Technology, Arts Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China
- Lei Du
- School of Mathematical Sciences, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China

35
Shen Y, Yuan K, Yang M, Tang B, Li Y, Du N, Lei K. KMR: knowledge-oriented medicine representation learning for drug-drug interaction and similarity computation. J Cheminform 2019; 11:22. [PMID: 30874969 PMCID: PMC6419809 DOI: 10.1186/s13321-019-0342-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/17/2018] [Accepted: 03/01/2019] [Indexed: 02/07/2023]
Abstract
Efficient representations of drugs provide important support for healthcare analytics, such as drug-drug interaction (DDI) prediction and drug-drug similarity (DDS) computation. However, incomplete annotated data and drug feature sparseness create substantial barriers for drug representation learning, making it difficult to accurately identify new drug properties prior to public release. To alleviate these deficiencies, we propose KMR, a knowledge-oriented feature-driven method which can learn drug-related knowledge with an accurate representation. We conduct a series of experiments on real-world medical datasets to demonstrate that KMR is capable of drug representation learning. KMR can help discover meaningful DDIs with an accuracy rate of 92.19%, demonstrating that techniques developed in KMR significantly improve the prediction quality for new drugs not seen during training. Experimental results also indicate that KMR can identify DDS with an accuracy rate of 88.7% by leveraging drug knowledge, outperforming existing state-of-the-art drug similarity measures.
Affiliation(s)
- Ying Shen
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
- Kaiqi Yuan
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
- Min Yang
- SIAT, Chinese Academy of Sciences, 518055 Shenzhen, People’s Republic of China
- Buzhou Tang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055 People’s Republic of China
- Nan Du
- Tencent Medical AI Lab, Palo Alto, USA
- Kai Lei
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
- PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, China

36
Lamurias A, Sousa D, Clarke LA, Couto FM. BO-LSTM: classifying relations via long short-term memory networks along biomedical ontologies. BMC Bioinformatics 2019; 20:10. [PMID: 30616557 PMCID: PMC6323831 DOI: 10.1186/s12859-018-2584-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/25/2018] [Accepted: 12/12/2018] [Indexed: 01/23/2023]
Abstract
BACKGROUND Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalize existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These resources contain supplementary information that may not yet be encoded in training data, particularly in domains with limited labeled data. RESULTS We propose a new model to detect and classify relations in text, BO-LSTM, that takes advantage of domain-specific ontologies by representing each entity as the sequence of its ancestors in the ontology. We implemented BO-LSTM as a recurrent neural network with long short-term memory units, using open biomedical ontologies, specifically Chemical Entities of Biological Interest (ChEBI), Human Phenotype, and Gene Ontology. We assessed the performance of BO-LSTM with drug-drug interactions mentioned in a publicly available corpus from an international challenge, composed of 792 drug descriptions and 233 scientific abstracts. By using the domain-specific ontology in addition to word embeddings and WordNet, BO-LSTM improved the F1-score of both the detection and classification of drug-drug interactions, particularly in a document set with a limited number of annotations. We adapted an existing DDI extraction model with our ontology-based method, obtaining a higher F1 score than the original model. Furthermore, we developed and made available a corpus of 228 abstracts annotated with relations between genes and phenotypes, and demonstrated how BO-LSTM can be applied to other types of relations.
CONCLUSIONS Our findings demonstrate that besides the high performance of current deep learning techniques, domain-specific ontologies can still be useful to mitigate the lack of labeled data.
Affiliation(s)
- Andre Lamurias
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749 016 Portugal
- University of Lisboa, Faculty of Sciences, BioISI - Biosystems & Integrative Sciences Institute, Campo Grande, C8 bdg, Lisboa, 1749 016 Portugal
- Diana Sousa
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749 016 Portugal
- Luka A. Clarke
- University of Lisboa, Faculty of Sciences, BioISI - Biosystems & Integrative Sciences Institute, Campo Grande, C8 bdg, Lisboa, 1749 016 Portugal
- Francisco M. Couto
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749 016 Portugal

37
Grizzle AJ, Horn J, Collins C, Schneider J, Malone DC, Stottlemyer B, Boyce RD. Identifying Common Methods Used by Drug Interaction Experts for Finding Evidence About Potential Drug-Drug Interactions: Web-Based Survey. J Med Internet Res 2019; 21:e11182. [PMID: 30609981 PMCID: PMC6682289 DOI: 10.2196/11182] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/05/2018] [Revised: 09/05/2018] [Accepted: 09/27/2018] [Indexed: 12/22/2022]
Abstract
Background Preventing drug interactions is an important goal to maximize patient benefit from medications. Summarizing potential drug-drug interactions (PDDIs) for clinical decision support is challenging, and there is no single repository for PDDI evidence. Additionally, inconsistencies across compendia and other sources have been well documented. Standard search strategies for complete and current evidence about PDDIs have not heretofore been developed or validated. Objective This study aimed to identify common methods for conducting PDDI literature searches used by experts who routinely evaluate such evidence. Methods We invited a convenience sample of 70 drug information experts, including compendia editors, knowledge-base vendors, and clinicians, via emails to complete a survey on identifying PDDI evidence. We created a Web-based survey that included questions regarding the (1) development and conduct of searches; (2) resources used, for example, databases, compendia, search engines, etc; (3) types of keywords used to search for the specific PDDI information; (4) study types included and excluded in searches; and (5) search terms used. Search strategy questions focused on 6 topics of the PDDI information—(1) that a PDDI exists; (2) seriousness; (3) clinical consequences; (4) management options; (5) mechanism; and (6) health outcomes. Results Twenty participants (response rate, 20/70, 29%) completed the survey. The majority (17/20, 85%) were drug information specialists, drug interaction researchers, compendia editors, or clinical pharmacists, with 60% (12/20) having >10 years’ experience. Over half (11/20, 55%) worked for clinical solutions vendors or knowledge-base vendors. Most participants developed (18/20, 90%) and conducted (19/20, 95%) search strategies without librarian assistance. PubMed (20/20, 100%) and Google Scholar (11/20, 55%) were most commonly searched for papers, followed by Google Web Search (7/20, 35%) and EMBASE (3/20, 15%). 
No respondents reported using Scopus. A variety of subscription and open-access databases were used, most commonly Lexicomp (9/20, 45%), Micromedex (8/20, 40%), Drugs@FDA (17/20, 85%), and DailyMed (13/20, 65%). Facts and Comparisons was the most commonly used compendium (8/20, 40%). Across the 6 attributes of interest, generic drug name was the most common keyword used. Respondents reported using more types of keywords when searching to identify the existence of PDDIs and determine their mechanism than when searching for the other 4 attributes (seriousness, consequences, management, and health outcomes). Regarding the types of evidence useful for evaluating a PDDI, clinical trials, case reports, and systematic reviews were considered relevant, while animal and in vitro data studies were not. Conclusions This study suggests that drug interaction experts use various keyword strategies and various database and Web resources depending on the PDDI evidence they are seeking. Greater automation and standardization across search strategies could improve one’s ability to identify PDDI evidence. Hence, future research focused on enhancing the existing search tools and designing recommended standards is needed.
Affiliation(s)
- Amy J Grizzle
- Center for Health Outcomes & PharmacoEconomic Research, College of Pharmacy, University of Arizona, Tucson, AZ, United States
- John Horn
- School of Pharmacy, University of Washington, Seattle, WA, United States
- Carol Collins
- School of Pharmacy, University of Washington, Seattle, WA, United States
- Jodi Schneider
- School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, United States
- Daniel C Malone
- Center for Health Outcomes & PharmacoEconomic Research, College of Pharmacy Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ, United States
- Britney Stottlemyer
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Richard David Boyce
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States

38
Wang H, Liu X, Tao Y, Ye W, Jin Q, Cohen WW, Xing EP. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019; 24:112-123. [PMID: 30864315 PMCID: PMC6417822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The increasing amount of scientific literature in biological and biomedical science research has created a challenge in continuous and reliable curation of the latest knowledge discovered, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from queried results, and reading selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificial intelligence reader that can automatically identify authentic articles and effectively acquire the knowledge conveyed in them. We construct a system whose current primary task is to build a genetic association database between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as Bi-directional LSTM for text mining and Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example in constructing a genetic association database. 3) We release our implementation as a generic framework for researchers in the community to conveniently construct other databases.
Affiliation(s)
- Haohan Wang
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiang Liu
- Chinese University of Hong Kong, Shenzhen, China
- Yifeng Tao
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenting Ye
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiao Jin
- Tsinghua University, Beijing, China
- William W. Cohen
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Google AI, Pittsburgh, PA, USA
- Eric P. Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Petuum Inc., Pittsburgh, PA, USA
39
He B, Guan Y, Dai R. Classifying medical relations in clinical text via convolutional neural networks. Artif Intell Med 2019; 93:43-49. [PMID: 29778673 DOI: 10.1016/j.artmed.2018.05.001] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/15/2017] [Revised: 02/27/2018] [Accepted: 05/04/2018] [Indexed: 11/15/2022]
Abstract
Deep learning research on relation classification has achieved solid performance in the general domain. This study proposes a convolutional neural network (CNN) architecture with a multi-pooling operation for medical relation classification on clinical records and explores a loss function with a category-level constraint matrix. Experiments on the 2010 i2b2/VA relation corpus demonstrate that these models, which do not depend on any external features, outperform previous single-model methods, and that our best model is competitive with the existing ensemble-based method.
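A minimal sketch of a CNN with segment-wise ("multi") max pooling for relation classification, in plain Python. It assumes that multi-pooling splits the convolutional feature map at the positions of the two concepts and max-pools each segment separately, as in piecewise CNNs; the exact segmentation, the filter sizes and the category-level constraint loss of this paper are not reproduced, and `conv1d_relu` and `multi_pool` are illustrative names.

```python
from typing import List, Sequence

Vector = List[float]


def conv1d_relu(tokens: Sequence[Vector], filters: Sequence[Sequence[Vector]],
                bias: Sequence[float]) -> List[Vector]:
    """Zero-padded ('same') 1-D convolution followed by ReLU.

    tokens:  n rows of d-dimensional token embeddings
    filters: k filters, each a window of w rows (w odd) of d weights
    returns: n rows, one feature per filter
    """
    n = len(tokens)
    out: List[Vector] = []
    for i in range(n):
        row: Vector = []
        for window, b in zip(filters, bias):
            half = len(window) // 2
            s = b
            for j, weights in enumerate(window):
                t = i + j - half
                if 0 <= t < n:  # positions outside the sentence act as zero padding
                    s += sum(w * x for w, x in zip(weights, tokens[t]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def multi_pool(feature_map: Sequence[Vector], e1: int, e2: int) -> Vector:
    """Max-pool each filter over three segments split at concept positions e1 < e2.

    Segments are [0, e1], [e1 + 1, e2] and [e2 + 1, n - 1]; an empty segment
    contributes 0.0, which is neutral because ReLU outputs are non-negative.
    Returns a vector of length 3 * k in segment-major order.
    """
    n = len(feature_map)
    if not 0 <= e1 < e2 < n:
        raise ValueError("expected 0 <= e1 < e2 < sentence length")
    k = len(feature_map[0])
    pooled: Vector = []
    for start, end in ((0, e1 + 1), (e1 + 1, e2 + 1), (e2 + 1, n)):
        segment = feature_map[start:end]
        pooled.extend(max((row[c] for row in segment), default=0.0) for c in range(k))
    return pooled
```

On a toy one-dimensional sentence `[[1.0], [2.0], [3.0], [4.0], [5.0]]` with a single width-3 all-ones filter and zero bias, the feature map is `[[3.0], [6.0], [9.0], [12.0], [9.0]]`, and `multi_pool(fmap, e1=1, e2=3)` returns `[6.0, 12.0, 9.0]`. In a full model this pooled vector would feed a softmax layer over the relation classes.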
Affiliation(s)
- Bin He
- Research Center of Language Technology, Harbin Institute of Technology, Harbin, China
- Yi Guan
- Research Center of Language Technology, Harbin Institute of Technology, Harbin, China
- Rui Dai
- Department of Mathematics, Harbin Institute of Technology, Harbin, China
40
Zhou H, Liu Z, Ning S, Yang Y, Lang C, Lin Y, Ma K. Leveraging prior knowledge for protein-protein interaction extraction with memory network. Database (Oxford) 2018; 2018:5053999. [PMID: 30010731 PMCID: PMC6047414 DOI: 10.1093/database/bay071] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/28/2018] [Accepted: 06/14/2018] [Indexed: 11/14/2022]
Abstract
Automatically extracting protein-protein interactions (PPIs) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to new state-of-the-art performance on the BioCreative VI PPI dataset. The paper also shows that multiple computational layers over an external memory are superior to long short-term memory networks with local memories. Database URL: http://www.biocreative.org/tasks/biocreative-vi/track-4/.
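The MNM abstract describes attention over an external memory built from prior knowledge, with several computational layers (hops). The sketch below shows a generic end-to-end memory-network read in plain Python, assuming that each memory slot holds a knowledge-base-derived key vector and value vector and that the same memory is reused at every hop; it is not the MNM's published formulation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def memory_hop(query: Sequence[float], keys: Sequence[Vector],
               values: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """One attention read over an external memory.

    Slot i has an addressing vector keys[i] and an output vector values[i].
    Attention p_i = softmax(query . keys[i]); read-out o = sum_i p_i * values[i].
    Returns the next-hop query (query + o) and the attention weights.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    read = [sum(p * v[j] for p, v in zip(weights, values)) for j in range(dim)]
    return [q + o for q, o in zip(query, read)], weights


def multi_hop_read(query: Sequence[float], keys: Sequence[Vector],
                   values: Sequence[Vector], hops: int = 3) -> Vector:
    """Stack several hops ('computational layers') over the same memory."""
    u = list(query)
    for _ in range(hops):
        u, _ = memory_hop(u, keys, values)
    return u
```

With a single memory slot the attention weight is 1.0, so each hop simply adds that slot's value vector to the query: `multi_hop_read([0.5, -0.5], [[1.0, 0.0]], [[2.0, 3.0]], hops=2)` gives `[4.5, 5.5]`. With several slots, the weights shift between hops because the query is updated after every read.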
Affiliation(s)
- Huiwei Zhou
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Zhuang Liu
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Shixian Ning
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Yunlong Yang
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Chengkun Lang
- School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Yingyu Lin
- School of Foreign Languages, Dalian University of Technology, Arts Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Kun Ma
- School of Life Science and Medicine, Dalian University of Technology, F03 Building, No. 2 Dagong Road, Liaodongwan District, Panjin, Liaoning, China
41
Sahu SK, Anand A. Drug-drug interaction extraction from biomedical texts using long short-term memory network. J Biomed Inform 2018; 86:15-24. [DOI: 10.1016/j.jbi.2018.08.005] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 07/29/2018] [Accepted: 08/07/2018] [Indexed: 12/15/2022]
42
Zheng W, Lin H, Liu X, Xu B. A document level neural model integrated domain knowledge for chemical-induced disease relations. BMC Bioinformatics 2018; 19:328. [PMID: 30223767 PMCID: PMC6142695 DOI: 10.1186/s12859-018-2316-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/04/2018] [Accepted: 08/14/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND The effective combination of texts and knowledge may improve the performance of natural language processing tasks. Chemical-induced disease (CID) relations may span sentence boundaries in an article, and although existing CID systems have explored the use of knowledge bases, they have not distinguished the effects of different knowledge on the identification of a specific CID relation. Moreover, neural network-based systems have only constructed sentence- or mention-level models. RESULTS In this work, we proposed an effective document-level neural model that integrates domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a specific CID candidate pair was learned by the document-level sub-network module. Furthermore, knowledge attention conditioned on the representation of the article was proposed to distinguish the influences of different knowledge on the specific CID pair, and the final representation of knowledge was then formed by aggregating the weighted knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform CID recognition. Experimental results on the chemical-disease relation corpus released by BioCreative V show that our knowledge-integrated system achieves good overall performance compared with other state-of-the-art systems. CONCLUSIONS Experimental analyses demonstrate that the attention mechanism over domain knowledge plays a significant role in distinguishing the influences of different knowledge on the judgment of a specific CID relation.
Affiliation(s)
- Wei Zheng
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China; College of Software, Dalian JiaoTong University, Dalian, China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoxia Liu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
43
Thompson P, Daikou S, Ueno K, Batista-Navarro R, Tsujii J, Ananiadou S. Annotation and detection of drug effects in text for pharmacovigilance. J Cheminform 2018; 10:37. [PMID: 30105604 PMCID: PMC6089860 DOI: 10.1186/s13321-018-0290-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/12/2018] [Accepted: 07/20/2018] [Indexed: 02/02/2023]
Abstract
Pharmacovigilance (PV) databases record the benefits and risks of different drugs, as a means to ensure their safe and effective use. Creating and maintaining such resources can be complex, since a particular medication may have divergent effects in different individuals, due to specific patient characteristics and/or interactions with other drugs being administered. Textual information from various sources can provide important evidence to curators of PV databases about the usage and effects of drug targets in different medical subjects. However, the efficient identification of relevant evidence can be challenging, due to the increasing volume of textual data. Text mining (TM) techniques can support curators by automatically detecting complex information, such as interactions between drugs, diseases and adverse effects. This semantic information supports the quick identification of documents containing information of interest (e.g., the different types of patients in which a given adverse drug reaction has been observed to occur). TM tools are typically adapted to different domains by applying machine learning methods to corpora that are manually labelled by domain experts using annotation guidelines to ensure consistency. We present a semantically annotated corpus of 597 MEDLINE abstracts, PHAEDRA, encoding rich information on drug effects and their interactions, whose quality is assured through the use of detailed annotation guidelines and the demonstration of high levels of inter-annotator agreement (e.g., 92.6% F-Score for identifying named entities and 78.4% F-Score for identifying complex events, when relaxed matching criteria are applied). To our knowledge, the corpus is unique in the domain of PV, according to the level of detail of its annotations. To illustrate the utility of the corpus, we have trained TM tools based on its rich labels to recognise drug effects in text automatically. 
The corpus and annotation guidelines are available at http://www.nactem.ac.uk/PHAEDRA/.
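The PHAEDRA agreement figures are F-scores computed under relaxed matching. As a minimal sketch of how such a figure can be computed, the Python below treats one annotator as the reference and counts a span as matched when it overlaps a span of the same type in the other annotation, scoring each span independently rather than enforcing a one-to-one alignment. This overlap criterion and the `(start, end, label)` span representation are assumptions, not necessarily the matching rules used for PHAEDRA.

```python
from typing import Callable, Sequence, Tuple

# (start, end, label): character offsets with `end` exclusive, plus an entity type
Span = Tuple[int, int, str]


def exact_match(a: Span, b: Span) -> bool:
    """Same boundaries and same label."""
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    """Same label and the two character ranges overlap by at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def agreement_f1(ann_a: Sequence[Span], ann_b: Sequence[Span],
                 match: Callable[[Span, Span], bool] = relaxed_match) -> float:
    """F-score of annotator B against annotator A (symmetric in A and B).

    Precision: share of B's spans that match some span of A.
    Recall:    share of A's spans that match some span of B.
    """
    if not ann_a and not ann_b:
        return 1.0
    if not ann_a or not ann_b:
        return 0.0
    precision = sum(any(match(b, a) for a in ann_a) for b in ann_b) / len(ann_b)
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with annotator A = `[(0, 7, "Drug"), (20, 35, "Disorder"), (40, 48, "Drug")]` and annotator B = `[(0, 7, "Drug"), (22, 35, "Disorder"), (50, 60, "Drug")]`, exact matching yields an F-score of 1/3 while relaxed matching yields 2/3, because the two Disorder spans overlap without sharing boundaries.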
Affiliation(s)
- Paul Thompson
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
- Sophia Daikou
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
- Kenju Ueno
- Artificial Intelligence Research Center, National Research and Development Agency (AIST), Tokyo Waterfront 2-3-2 Aomi, Koto-ku, Tokyo, 135-0064 Japan
- Riza Batista-Navarro
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
- Jun’ichi Tsujii
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
- Artificial Intelligence Research Center, National Research and Development Agency (AIST), Tokyo Waterfront 2-3-2 Aomi, Koto-ku, Tokyo, 135-0064 Japan
- Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
44
Chemical-induced disease relation extraction with dependency information and prior knowledge. J Biomed Inform 2018; 84:171-178. [DOI: 10.1016/j.jbi.2018.07.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/29/2018] [Revised: 07/09/2018] [Accepted: 07/11/2018] [Indexed: 11/18/2022]
45
Zheng W, Lin H, Li Z, Liu X, Li Z, Xu B, Zhang Y, Yang Z, Wang J. An effective neural model extracting document level chemical-induced disease relations from biomedical literature. J Biomed Inform 2018; 83:1-9. [DOI: 10.1016/j.jbi.2018.05.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/09/2017] [Revised: 03/14/2018] [Accepted: 05/04/2018] [Indexed: 01/06/2023]