1
Zhao W, Zhang J, Yang J, Jiang X, He T. Document-Level Chemical-Induced Disease Relation Extraction via Hierarchical Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2782-2793. [PMID: 34077368 DOI: 10.1109/tcbb.2021.3086090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Over the past decades, Chemical-induced Disease (CID) relations have attracted extensive attention in the biomedical community, reflecting wide applications in biomedical research and healthcare. However, prior efforts fail to make full use of the interaction between local and global contexts in biomedical documents, and performance needs to be improved accordingly. In this paper, we propose a novel framework for document-level CID relation extraction. More specifically, stacked Hypergraph Aggregation Neural Network (HANN) layers are introduced to model the complicated interaction between local and global contexts, based on which better contextualized representations are obtained for CID relation extraction. In addition, a CID Relation Heterogeneous Graph is constructed to capture information at different granularities and further improve the performance of CID relation classification. Experiments on a real-world dataset demonstrate the effectiveness of the proposed framework.
2
Li Z, Wang M, Peng D, Liu J, Xie Y, Dai Z, Zou X. Identification of Chemical-Disease Associations Through Integration of Molecular Fingerprint, Gene Ontology and Pathway Information. Interdiscip Sci 2022; 14:683-696. [PMID: 35391615 DOI: 10.1007/s12539-022-00511-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 06/14/2023]
Abstract
The identification of chemical-disease association types is helpful not only for discovering lead compounds and studying drug repositioning, but also for treating disease and deciphering pathomechanisms. Computational methods for identifying potential chemical-disease association types are urgently needed, since wet-lab methods are usually expensive, laborious and time-consuming. In this study, molecular fingerprint, gene ontology and pathway information are used to characterize chemicals and diseases. A novel predictor is proposed to recognize potential chemical-disease associations at the first layer, and to further distinguish whether their relationships are biomarker or therapeutic relations at the second layer. The prediction performance of the method is assessed on a benchmark dataset using ten-fold cross-validation. The prediction accuracies of the first layer and the second layer are 78.47% and 72.07%, respectively. The recognition ability for lead compounds, new drug indications, and potential and true chemical-disease association pairs has also been investigated and confirmed by constructing a variety of datasets and performing a series of experiments. It is anticipated that the method can serve as a powerful high-throughput virtual screening tool for drug research and development.
Affiliation(s)
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
  - NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, People's Republic of China
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, 510006, People's Republic of China
- Mengru Wang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Dongdong Peng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Jie Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Yun Xie
  - HuiZhou University, Huizhou, 516007, People's Republic of China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
3
Le HQ, Can DC, Collier N. Exploiting document graphs for inter sentence relation extraction. J Biomed Semantics 2022; 13:15. [PMID: 35659292 PMCID: PMC9166375 DOI: 10.1186/s13326-022-00267-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022]
Abstract
Background: Most previous relation extraction (RE) studies have focused on intra sentence relations and have ignored relations that span sentences, i.e. inter sentence relations. Such relations connect entities at the document level rather than as relational facts in a single sentence. Extracting facts that are expressed across sentences leads to some challenges and requires different approaches than those usually applied in recent intra sentence relation extraction. Despite recent results, there are still limitations to be overcome. Results: We present a novel representation for a sequence of consecutive sentences, namely document subgraph, to extract inter sentence relations. Experiments on the BioCreative V Chemical-Disease Relation corpus demonstrate the advantages and robustness of our novel system to extract both intra- and inter sentence relations in biomedical literature abstracts. The experimental results are comparable to state-of-the-art approaches and show the potential by demonstrating the effectiveness of graphs, deep learning-based model, and other processing techniques. Experiments were also carried out to verify the rationality and impact of various additional information and model components. Conclusions: Our proposed graph-based representation helps to extract ∼50% of inter sentence relations and boosts the model performance on both precision and recall compared to the baseline model. Supplementary Information: The online version contains supplementary material available at (10.1186/s13326-022-00267-3).
Affiliation(s)
- Hoang-Quynh Le
  - Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
- Duy-Cat Can
  - Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
- Nigel Collier
  - Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
4
Chen J, Hu B, Peng W, Chen Q, Tang B. Biomedical relation extraction via knowledge-enhanced reading comprehension. BMC Bioinformatics 2022; 23:20. [PMID: 34991458 PMCID: PMC8734165 DOI: 10.1186/s12859-021-04534-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/03/2020] [Accepted: 12/13/2021] [Indexed: 12/01/2022]
Abstract
Background: In biomedical research, chemical and disease relation extraction from unstructured biomedical literature is an essential task. Effective context understanding and knowledge integration are two main research problems in this task. Most work on relation extraction focuses on classification of entity mention pairs. Inspired by the effectiveness of machine reading comprehension (RC) for context understanding, solving biomedical relation extraction with the RC framework at both intra-sentential and inter-sentential levels is a new topic worth exploring. Beyond unstructured biomedical text, many structured knowledge bases (KBs) provide valuable guidance for biomedical relation extraction. Utilizing knowledge in the RC framework is also worth investigating. We propose a knowledge-enhanced reading comprehension (KRC) framework to leverage reading comprehension and prior knowledge for biomedical relation extraction. First, we generate questions for each relation, which reformulates the relation extraction task as a question answering task. Second, based on the RC framework, we integrate knowledge representation through an efficient knowledge-enhanced attention interaction mechanism to guide the biomedical relation extraction. Results: The proposed model was evaluated on the BioCreative V CDR dataset and CHR dataset. Experiments show that our model achieved a competitive document-level F1 of 71.18% and 93.3%, respectively, compared with other methods. Conclusion: Result analysis reveals that open-domain reading comprehension data and knowledge representation can help improve biomedical relation extraction in our proposed KRC framework. Our work can encourage more research on bridging reading comprehension and biomedical relation extraction and advance biomedical relation extraction.
Affiliation(s)
- Jing Chen
  - Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Baotian Hu
  - Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Weihua Peng
  - Baidu International Technology (Shenzhen) Co., Ltd, Shenzhen, China
- Qingcai Chen
  - Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China
  - Peng Cheng Laboratory, Shenzhen, China
- Buzhou Tang
  - Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, China
  - Peng Cheng Laboratory, Shenzhen, China
5
Li Z, Chen H, Qi R, Lin H, Chen H. DocR-BERT: Document-level R-BERT for Chemical-induced Disease Relation Extraction via Gaussian Probability Distribution. IEEE J Biomed Health Inform 2021; 26:1341-1352. [PMID: 34591774 DOI: 10.1109/jbhi.2021.3116769] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Chemical-induced disease (CID) relation extraction from biomedical articles plays an important role in disease treatment and drug development. Existing methods are insufficient for capturing complete document-level semantic information because they ignore the semantic information of entities in different sentences. In this work, we proposed an effective document-level relation extraction model to automatically extract intra-/inter-sentential CID relations from articles. Firstly, our model employed BERT to generate contextual semantic representations of the title, abstract and shortest dependency paths (SDPs). Secondly, to enhance the semantic representation of the whole document, cross attention with self-attention (named cross2self-attention) between abstract, title and SDPs was proposed to learn the mutual semantic information. Thirdly, to distinguish the importance of the target entity in different sentences, the Gaussian probability distribution was utilized to compute the weights of the co-occurrence sentence and its adjacent entity sentences. More complete semantic information of the target entity is collected from all entities occurring in the document via our presented document-level R-BERT (DocR-BERT). Finally, the related representations were concatenated and fed into the softmax function to extract CIDs. We evaluated the model on the CDR corpus provided by BioCreative V. The proposed model, without external resources, outperforms other state-of-the-art models (it achieves F1-scores of 53.5%, 70%, and 63.7% on inter-sentential relations, intra-sentential relations, and the overall CDR dataset, respectively). The experimental results indicate that cross2self-attention, the Gaussian probability distribution and DocR-BERT can effectively improve the CID extraction performance. Furthermore, the mutual semantic information learned by the cross2self-attention from abstract towards title can significantly influence the extraction performance of document-level biomedical relation extraction tasks.
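The Gaussian sentence weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gaussian_sentence_weights`, the choice to centre the Gaussian on the co-occurrence sentence index, the `sigma` parameter, and the normalisation to sum to one are all assumptions made for illustration.

```python
import math


def gaussian_sentence_weights(sentence_indices, center_index, sigma=1.0):
    """Weight sentences with a Gaussian centred on the co-occurrence sentence.

    Hypothetical helper: sentences closer to `center_index` receive larger
    weights; the weights are normalised so that they sum to 1.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    raw = [
        math.exp(-((i - center_index) ** 2) / (2.0 * sigma ** 2))
        for i in sentence_indices
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Sentence 3 contains both entities; sentences 2 and 4 are its neighbours.
weights = gaussian_sentence_weights([2, 3, 4], center_index=3, sigma=1.0)
```

In this sketch the co-occurrence sentence gets the largest weight and equally distant neighbours get equal weights; how the paper actually parameterises the distribution is not stated in the abstract.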
6
Document-level relation extraction via graph transformer networks and temporal convolutional networks. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.06.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
7
Lu H, Li L, Li Z, Zhao S. Extracting chemical-induced disease relation by integrating a hierarchical concentrative attention and a hybrid graph-based neural network. J Biomed Inform 2021; 121:103874. [PMID: 34298157 DOI: 10.1016/j.jbi.2021.103874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2020] [Revised: 07/09/2021] [Accepted: 07/18/2021] [Indexed: 10/20/2022]
Abstract
Extracting chemical-induced disease relations from the literature is important for biomedical research. On the one hand, it is challenging to capture the interactions among remote words, and long-distance information is not adequately exploited by existing systems for document-level relation extraction. On the other hand, some information in a document is particularly important to the target relations and should attract more attention than less relevant information. However, this issue is not well addressed in existing methods. In this paper, we present a method that integrates a hybrid graph and a hierarchical concentrative attention to overcome these problems. The hybrid graph is constructed by synthesizing the syntactic graph and the Abstract Meaning Representation graph to acquire the long-distance information for document-level relation extraction. Meanwhile, the concentrative attention is used to focus on the most important information and alleviate the disturbance brought by less relevant items in the document. The experimental results demonstrate that our model yields competitive performance on the dataset of chemical-induced disease relations.
Affiliation(s)
- Hongbin Lu
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Lishuang Li
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Zuocheng Li
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
- Shiyi Zhao
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China
8
Gajendran S, D M, Sugumaran V. Character level and word level embedding with bidirectional LSTM - Dynamic recurrent neural network for biomedical named entity recognition from literature. J Biomed Inform 2020; 112:103609. [PMID: 33122119 DOI: 10.1016/j.jbi.2020.103609] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/18/2020] [Revised: 10/14/2020] [Accepted: 10/22/2020] [Indexed: 12/22/2022]
Abstract
Named Entity Recognition is the process of identifying different entities in a given context. Biomedical Named Entity Recognition (BNER) is the task of extracting chemical names from biomedical texts to support biomedical and translational research. The aim of the system is to extract useful chemical names from biomedical literature text without a lot of handcrafted engineering features. This approach introduces a novel neural network architecture with the composition of bidirectional long short-term memory (BLSTM), dynamic recurrent neural network (RNN) and conditional random field (CRF) that uses character level and word level embedding as the only features to identify the chemical entities. Using this approach we have achieved the F1 score of 89.98 on BioCreAtIvE II GM corpus and 90.84 on NCBI corpus by outperforming the existing systems. Our system is based on the deep neural architecture that uses both character and word level embedding which captures the morphological and orthographic information eliminating the need for handcrafted engineering features. The proposed system outperforms the existing systems without a lot of handcrafted engineering features. The embedding concept along with the bidirectional LSTM network proved to be an effective method to identify most of the chemical entities.
Affiliation(s)
- Sudhakaran Gajendran
  - Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India
- Manjula D
  - Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India
- Vijayan Sugumaran
  - Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA
  - Department of Decision and Information Sciences, School of Business Administration, Oakland University, Rochester, MI, USA
9
Perera N, Dehmer M, Emmert-Streib F. Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Front Cell Dev Biol 2020; 8:673. [PMID: 32984300 PMCID: PMC7485218 DOI: 10.3389/fcell.2020.00673] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/12/2019] [Accepted: 07/02/2020] [Indexed: 12/29/2022]
Abstract
The number of scientific publications in the literature is steadily growing, containing our knowledge in the biomedical, health, and clinical sciences. Since there is currently no automatic archiving of the obtained results, much of this information remains buried in textual details not readily available for further usage or analysis. For this reason, natural language processing (NLP) and text mining methods are used for information extraction from such publications. In this paper, we review practices for Named Entity Recognition (NER) and Relation Detection (RD), allowing, e.g., to identify interactions between proteins and drugs or genes and diseases. This information can be integrated into networks to summarize large-scale details on a particular biomedical or clinical problem, which is then amenable for easy data management and further analysis. Furthermore, we survey novel deep learning methods that have recently been introduced for such tasks.
Affiliation(s)
- Nadeesha Perera
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
- Matthias Dehmer
  - Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), Hall in Tirol, Austria
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Faculty of Medicine and Health Technology, Institute of Biosciences and Medical Technology, Tampere University, Tampere, Finland
10
Li Z, Yang Z, Xiang Y, Luo L, Sun Y, Lin H. Exploiting sequence labeling framework to extract document-level relations from biomedical texts. BMC Bioinformatics 2020; 21:125. [PMID: 32216746 PMCID: PMC7099809 DOI: 10.1186/s12859-020-3457-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/30/2019] [Accepted: 03/18/2020] [Indexed: 12/02/2022]
Abstract
Background: Both intra- and inter-sentential semantic relations in biomedical texts provide valuable information for biomedical research. However, most existing methods either focus on extracting intra-sentential relations and ignore inter-sentential ones or fail to extract inter-sentential relations accurately and regard the instances containing entity relations as being independent, which neglects the interactions between relations. We propose a novel sequence labeling-based biomedical relation extraction method named Bio-Seq. In the method, sequence labeling framework is extended by multiple specified feature extractors so as to facilitate the feature extractions at different levels, especially at the inter-sentential level. Besides, the sequence labeling framework enables Bio-Seq to take advantage of the interactions between relations, and thus, further improves the precision of document-level relation extraction. Results: Our proposed method obtained an F1-score of 63.5% on BioCreative V chemical disease relation corpus, and an F1-score of 54.4% on inter-sentential relations, which was 10.5% better than the document-level classification baseline. Also, our method achieved an F1-score of 85.1% on n2c2-ADE sub-dataset. Conclusion: Sequence labeling method can be successfully used to extract document-level relations, especially for boosting the performance on inter-sentential relation extraction. Our work can facilitate the research on document-level biomedical text mining.
Affiliation(s)
- Zhiheng Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Zhihao Yang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Yang Xiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, 77030, USA
- Ling Luo
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Yuanyuan Sun
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
11
Extracting causal relations from the literature with word vector mapping. Comput Biol Med 2019; 115:103524. [PMID: 31698234 DOI: 10.1016/j.compbiomed.2019.103524] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/04/2019] [Revised: 10/09/2019] [Accepted: 10/25/2019] [Indexed: 11/23/2022]
Abstract
Causal graphs play an essential role in the determination of causalities and have been applied in many domains including biology and medicine. Traditional causal graph construction methods are usually data-driven and may not deliver the desired accuracy of a graph. Considering the vast number of publications with causality knowledge, extracting causal relations from the literature to help establish causal graphs becomes possible. Current supervised-learning-based causality extraction methods require sufficient labeled data to train a model, and rule-based causality extraction methods are limited by their predefined patterns. This paper proposes a causality extraction framework that integrates rule-based methods and unsupervised learning models to overcome these limitations. The proposed method consists of three modules: data preprocessing, syntactic pattern matching, and causality determination. In data preprocessing, abstracts are crawled based on attribute names before sentences are extracted and simplified. In syntactic pattern matching, these simplified sentences are parsed to obtain the part-of-speech tags, and triples are obtained based on these tags by matching the two designed syntactic patterns. In causality determination, four verb seed sets are initialized, and word vectors are constructed for the verbs in both the seed sets and the triples by applying an unsupervised machine learning model. Causal relations are identified by comparing the similarity between the verbs in each triple and those in each seed set to overcome the limitation of the seed sets. Causality extraction results on the attributes from the risk factors for Alzheimer's disease show that our method outperforms Bui's method and Alashri's method in terms of precision, recall, specificity, accuracy and F-score, with increases in the F-score of 8.29% and 5.37%, respectively.
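The causality determination step described in this abstract (comparing a triple's verb with seed verbs via word vectors) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: cosine similarity, the 0.9 threshold, the two seed-set labels, and the hand-made two-dimensional `TOY_VECTORS` are all hypothetical; the abstract mentions four seed sets and learned vectors but does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_seed_set(verb, vectors, seed_sets, threshold=0.9):
    """Return the label of the seed set containing the verb most similar to
    `verb`, or None if no seed verb reaches the (assumed) threshold."""
    best_label, best_score = None, threshold
    for label, seeds in seed_sets.items():
        for seed in seeds:
            score = cosine(vectors[verb], vectors[seed])
            if score >= best_score:
                best_label, best_score = label, score
    return best_label


# Hypothetical toy vectors; a real system would learn them from a corpus.
TOY_VECTORS = {
    "cause": (1.0, 0.1), "induce": (0.9, 0.2), "trigger": (0.95, 0.15),
    "prevent": (0.1, 1.0), "reduce": (0.2, 0.9), "lower": (0.15, 0.95),
    "describe": (0.7, 0.7),
}
# Hypothetical seed sets (two shown here for brevity).
SEED_SETS = {"causal": ["cause", "induce"], "preventive": ["prevent", "reduce"]}

label_trigger = best_seed_set("trigger", TOY_VECTORS, SEED_SETS)
label_lower = best_seed_set("lower", TOY_VECTORS, SEED_SETS)
label_describe = best_seed_set("describe", TOY_VECTORS, SEED_SETS)
```

Matching by similarity rather than exact membership is what lets a verb outside the seed sets, such as "trigger" in this toy data, still be assigned to a relation type.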
12
Neural network-based approaches for biomedical relation classification: A review. J Biomed Inform 2019; 99:103294. [DOI: 10.1016/j.jbi.2019.103294] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 01/30/2019] [Revised: 06/02/2019] [Accepted: 09/21/2019] [Indexed: 12/14/2022]
13
Chen T, Wu M, Li H. A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning. Database: The Journal of Biological Databases and Curation 2019; 2019:5645655. [PMID: 31800044 PMCID: PMC6892305 DOI: 10.1093/database/baz116] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/24/2019] [Revised: 07/16/2019] [Accepted: 09/02/2019] [Indexed: 01/07/2023]
Abstract
The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most of the current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. Firstly, we show the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, traditional Chinese medicine literature corpus and i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (giving a relative improvement of 22.2, 7.77, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
Affiliation(s)
- Tao Chen
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
- Mingfen Wu
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
- Hexi Li
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
14
Onye SC, Akkeleş A, Dimililer N. relSCAN - A system for extracting chemical-induced disease relation from biomedical literature. J Biomed Inform 2018; 87:79-87. [PMID: 30296491 DOI: 10.1016/j.jbi.2018.09.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/01/2018] [Revised: 09/17/2018] [Accepted: 09/30/2018] [Indexed: 11/20/2022]
Abstract
This paper proposes an effective and robust approach for Chemical-Induced Disease (CID) relation extraction from PubMed articles. The study was performed on the Chemical Disease Relation (CDR) task of BioCreative V track-3 corpus. The proposed system, named relSCAN, is an efficient CID relation extraction system with two phases to classify relation instances from the Co-occurrence and Non-Co-occurrence mention levels. We describe the case of chemical and disease mentions that occur in the same sentence as 'Co-occurrence', or as 'Non-Co-occurrence' otherwise. In the first phase, the relation instances are constructed on both mention levels. In the second phase, we employ a hybrid feature set to classify the relation instances at both of these mention levels using the combination of two Machine Learning (ML) classifiers (Support Vector Machine (SVM) and J48 Decision tree). This system is entirely corpus dependent and does not rely on information from external resources in order to boost its performance. We achieved good results, which are comparable with the other state-of-the-art CID relation extraction systems on the BioCreative V corpus. Furthermore, our system achieves the best performance on the Non-Co-occurrence mention level.
Affiliation(s)
- Stanley Chika Onye
  - Department of Applied Mathematics and Computer Science, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Arif Akkeleş
  - Department of Mathematics, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Nazife Dimililer
  - Department of Information Technology, School of Computing and Technology, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
15
Zheng W, Lin H, Liu X, Xu B. A document level neural model integrated domain knowledge for chemical-induced disease relations. BMC Bioinformatics 2018; 19:328. [PMID: 30223767 PMCID: PMC6142695 DOI: 10.1186/s12859-018-2316-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/04/2018] [Accepted: 08/14/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The effective combination of texts and knowledge may improve the performance of natural language processing tasks. For the recognition of chemical-induced disease (CID) relations, which may span sentence boundaries in an article, existing CID systems have explored the use of knowledge bases, but they have not distinguished the effects of different knowledge on the identification of a specific CID. Moreover, systems based on neural networks have only constructed sentence-level or mention-level models. RESULTS: In this work, we proposed an effective document-level neural model that integrates domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a specific CID candidate pair was learned by the document-level sub-network module. Furthermore, knowledge attention depending on the representation of the article was proposed to distinguish the influences of different knowledge on the specific CID pair, and the final representation of knowledge was then formed by aggregating the weighted knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform the CID recognition. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that our knowledge-integrated system achieves a good overall performance compared with other state-of-the-art systems. CONCLUSIONS: Experimental analyses demonstrate that the introduced attention mechanism on domain knowledge plays a significant role in distinguishing the influences of different knowledge on the judgment of a specific CID relation.
Affiliation(s)
- Wei Zheng
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
  - College of Software, Dalian JiaoTong University, Dalian, China
- Hongfei Lin
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoxia Liu
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China