1
Zhao W, Zhang J, Yang J, Jiang X, He T. Document-Level Chemical-Induced Disease Relation Extraction via Hierarchical Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2782-2793. [PMID: 34077368 DOI: 10.1109/tcbb.2021.3086090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Over the past decades, Chemical-induced Disease (CID) relations have attracted extensive attention in the biomedical community, reflecting their wide applications in biomedical research and healthcare. However, prior efforts fail to make full use of the interaction between local and global contexts in biomedical documents, leaving room to improve performance. In this paper, we propose a novel framework for document-level CID relation extraction. More specifically, stacked Hypergraph Aggregation Neural Network (HANN) layers are introduced to model the complicated interaction between local and global contexts, yielding better contextualized representations for CID relation extraction. In addition, a CID Relation Heterogeneous Graph is constructed to capture information at different granularities and further improve the performance of CID relation classification. Experiments on a real-world dataset demonstrate the effectiveness of the proposed framework.
2
Li Z, Wang M, Peng D, Liu J, Xie Y, Dai Z, Zou X. Identification of Chemical-Disease Associations Through Integration of Molecular Fingerprint, Gene Ontology and Pathway Information. Interdiscip Sci 2022; 14:683-696. [PMID: 35391615 DOI: 10.1007/s12539-022-00511-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 06/14/2023]
Abstract
The identification of chemical-disease association types helps not only to discover lead compounds and study drug repositioning, but also to treat disease and decipher pathomechanisms. Computational methods for identifying potential chemical-disease association types are urgently needed, since wet-lab methods are usually expensive, laborious and time-consuming. In this study, molecular fingerprints, gene ontology and pathway information are used to characterize chemicals and diseases. A novel predictor is proposed to recognize potential chemical-disease associations at the first layer, and to further distinguish whether their relationships are biomarker or therapeutic relations at the second layer. The prediction performance of the method is assessed on the benchmark dataset using ten-fold cross-validation. The practical prediction accuracies of the first layer and the second layer are 78.47% and 72.07%, respectively. The ability to recognize lead compounds, new drug indications, and potential and true chemical-disease association pairs has also been investigated and confirmed by constructing a variety of datasets and performing a series of experiments. It is anticipated that the method can serve as a powerful high-throughput virtual screening tool for drug research and development.
Affiliation(s)
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
  - NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, People's Republic of China
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, 510006, People's Republic of China
- Mengru Wang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Dongdong Peng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Jie Liu
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Yun Xie
  - HuiZhou University, Huizhou, 516007, People's Republic of China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
3
Zaslavsky L, Cheng T, Gindulyte A, He S, Kim S, Li Q, Thiessen P, Yu B, Bolton EE. Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem. Front Res Metr Anal 2021; 6:689059. [PMID: 34322655 PMCID: PMC8311438 DOI: 10.3389/frma.2021.689059] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/31/2021] [Accepted: 06/17/2021] [Indexed: 11/13/2022] Open
Abstract
The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
Affiliation(s)
- Leonid Zaslavsky
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Asta Gindulyte
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Siqian He
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Sunghwan Kim
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Qingliang Li
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Paul Thiessen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Bo Yu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Evan E Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
4
Mitra S, Saha S, Hasanuzzaman M. A Multi-View Deep Neural Network Model for Chemical-Disease Relation Extraction From Imbalanced Datasets. IEEE J Biomed Health Inform 2020; 24:3315-3325. [DOI: 10.1109/jbhi.2020.2983365] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
5
Yu L, Yu S. Developing an automated mechanism to identify medical articles from wikipedia for knowledge extraction. Int J Med Inform 2020; 141:104234. [PMID: 32693245 PMCID: PMC7357526 DOI: 10.1016/j.ijmedinf.2020.104234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/16/2020] [Revised: 07/01/2020] [Accepted: 07/11/2020] [Indexed: 11/25/2022]
Abstract
Wikipedia contains rich biomedical information that can support medical informatics studies and applications. Identifying the subset of medical articles of Wikipedia has many benefits, such as facilitating medical knowledge extraction, serving as a corpus for language modeling, or simply making the size of data easy to work with. However, due to the extremely low prevalence of medical articles in the entire Wikipedia, articles identified by generic text classifiers would be bloated by irrelevant pages. To control the false discovery rate while maintaining a high recall, we developed a mechanism that leverages the rich page elements and the connected nature of Wikipedia and uses a crawling classification strategy to achieve accurate classification. Structured assertional knowledge in Infoboxes and Wikidata items associated with the identified medical articles were also extracted. This automatic mechanism is aimed to run periodically to update the results and share them with the informatics community.
Affiliation(s)
- Lishan Yu
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Sheng Yu
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
  - Institute for Data Science, Tsinghua University, Beijing, China
6
Wang J, Chen X, Zhang Y, Zhang Y, Wen J, Lin H, Yang Z, Wang X. Document-Level Biomedical Relation Extraction Using Graph Convolutional Network and Multihead Attention: Algorithm Development and Validation. JMIR Med Inform 2020; 8:e17638. [PMID: 32459636 PMCID: PMC7458061 DOI: 10.2196/17638] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/30/2019] [Revised: 04/14/2020] [Accepted: 04/25/2020] [Indexed: 11/22/2022] Open
Abstract
Background: Automatically extracting relations between chemicals and diseases plays an important role in biomedical text mining. Chemical-disease relation (CDR) extraction aims at extracting complex semantic relationships between entities in documents, which contain intrasentence and intersentence relations. Most previous methods did not consider dependency syntactic information across the sentences, which are very valuable for the relations extraction task, in particular, for extracting the intersentence relations accurately.
Objective: In this paper, we propose a novel end-to-end neural network based on the graph convolutional network (GCN) and multihead attention, which makes use of the dependency syntactic information across the sentences to improve CDR extraction task.
Methods: To improve the performance of intersentence relation extraction, we constructed a document-level dependency graph to capture the dependency syntactic information across sentences. GCN is applied to capture the feature representation of the document-level dependency graph. The multihead attention mechanism is employed to learn the relatively important context features from different semantic subspaces. To enhance the input representation, the deep context representation is used in our model instead of traditional word embedding.
Results: We evaluate our method on CDR corpus. The experimental results show that our method achieves an F-measure of 63.5%, which is superior to other state-of-the-art methods. In the intrasentence level, our method achieves a precision, recall, and F-measure of 59.1%, 81.5%, and 68.5%, respectively. In the intersentence level, our method achieves a precision, recall, and F-measure of 47.8%, 52.2%, and 49.9%, respectively.
Conclusions: The GCN model can effectively exploit the across sentence dependency information to improve the performance of intersentence CDR extraction. Both the deep context representation and multihead attention are helpful in the CDR extraction task.
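As a reading aid, the F-measure figures quoted in this abstract can be reproduced from the stated precision and recall values. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say so explicitly, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract, in percent.
intrasentence = f_measure(59.1, 81.5)
intersentence = f_measure(47.8, 52.2)

print(round(intrasentence, 1))  # 68.5
print(round(intersentence, 1))  # 49.9
```

Under that assumption, both rounded values agree with the intrasentence and intersentence F-measures stated in the abstract.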
Affiliation(s)
- Jian Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoyu Chen
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yu Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yijia Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Jiabin Wen
  - Department of VIP, The Second Hospital of Dalian Medical University, Dalian, China
- Hongfei Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Zhihao Yang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xin Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, China
7
Zhou H, Yang Y, Ning S, Liu Z, Lang C, Lin Y, Huang D. Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1879-1889. [PMID: 29994540 DOI: 10.1109/tcbb.2018.2838661] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Automatically extracting the relationships between chemicals and diseases is important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid less attention to the prior knowledge in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism significantly improves CDR extraction performance while achieving results comparable to state-of-the-art systems.
8
Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via attention-based distant supervision. BMC Bioinformatics 2019; 20:403. [PMID: 31331263 PMCID: PMC6647285 DOI: 10.1186/s12859-019-2884-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/15/2018] [Accepted: 05/08/2019] [Indexed: 11/24/2022] Open
Abstract
Background: Automatically understanding chemical-disease relations (CDRs) is crucial in various areas of biomedical research and health care. Supervised machine learning provides a feasible solution to automatically extract relations between biomedical entities from scientific literature, its success, however, heavily depends on large-scale biomedical corpora manually annotated with intensive labor and tremendous investment.
Results: We present an attention-based distant supervision paradigm for the BioCreative-V CDR extraction task. Training examples at both intra- and inter-sentence levels are generated automatically from the Comparative Toxicogenomics Database (CTD) without any human intervention. An attention-based neural network and a stacked auto-encoder network are applied respectively to induce learning models and extract relations at both levels. After merging the results of both levels, the document-level CDRs can be finally extracted. It achieves the precision/recall/F1-score of 60.3%/73.8%/66.4%, outperforming the state-of-the-art supervised learning systems without using any annotated corpus.
Conclusion: Our experiments demonstrate that distant supervision is promising for extracting chemical disease relations from biomedical literature, and capturing both local and global attention features simultaneously is effective in attention-based distantly supervised learning.
Affiliation(s)
- Jinghang Gu
  - Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
  - Big Data Group, Baidu Inc., Beijing, China
- Fuqing Sun
  - Department of Gynecology Minimally Invasive Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, China
- Longhua Qian
  - Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Guodong Zhou
  - Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
9
Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019; 10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022] Open
Abstract
Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.
Affiliation(s)
- Pantelis Natsiavas
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Andigoni Malousi
  - Laboratory of Biological Chemistry, Department of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Cédric Bousquet
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
  - Public Health and Medical Information Unit, University Hospital of Saint-Etienne, Saint-Étienne, France
- Marie-Christine Jaulent
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Vassilis Koutkias
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
10
Chen T, Wu M, Li H. A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning. Database: The Journal of Biological Databases and Curation 2019; 2019:5645655. [PMID: 31800044 PMCID: PMC6892305 DOI: 10.1093/database/baz116] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/24/2019] [Revised: 07/16/2019] [Accepted: 09/02/2019] [Indexed: 01/07/2023]
Abstract
The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most of the current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. Firstly, we show the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, traditional Chinese medicine literature corpus and i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (giving a relative improvement of 22.2, 7.77, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
Affiliation(s)
- Tao Chen
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
- Mingfen Wu
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
- Hexi Li
  - Department of Computer Science and Engineering, Faculty of Intelligent Manufacturing, Wuyi University, No.22, Dongcheng village, Pengjiang district, Jiangmen City, Guangdong Province, 529020, China
11
Zhou H, Liu Z, Ning S, Yang Y, Lang C, Lin Y, Ma K. Leveraging prior knowledge for protein-protein interaction extraction with memory network. Database: The Journal of Biological Databases and Curation 2018; 2018:5053999. [PMID: 30010731 PMCID: PMC6047414 DOI: 10.1093/database/bay071] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/28/2018] [Accepted: 06/14/2018] [Indexed: 11/14/2022]
Abstract
Automatically extracting protein-protein interactions (PPIs) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to a new state-of-the-art performance on the BioCreative VI PPI dataset. The paper also shows that multiple computational layers over an external memory are superior to long short-term memory networks with the local memories.
Database URL: http://www.biocreative.org/tasks/biocreative-vi/track-4/
Affiliation(s)
- Huiwei Zhou
  - School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Zhuang Liu
  - School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Shixian Ning
  - School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Yunlong Yang
  - School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Chengkun Lang
  - School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Yingyu Lin
  - School of Foreign Languages, Dalian University of Technology, Arts Building, No. 2 Linggong Road, Ganjingzi District, Dalian, Liaoning, China
- Kun Ma
  - School of Life Science and Medicine, Dalian University of Technology, F03 Building, No. 2 Dagong Road, Liaodongwan District, Panjin, Liaoning, China
12
Onye SC, Akkeleş A, Dimililer N. relSCAN - A system for extracting chemical-induced disease relation from biomedical literature. J Biomed Inform 2018; 87:79-87. [PMID: 30296491 DOI: 10.1016/j.jbi.2018.09.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/01/2018] [Revised: 09/17/2018] [Accepted: 09/30/2018] [Indexed: 11/20/2022]
Abstract
This paper proposes an effective and robust approach for Chemical-Induced Disease (CID) relation extraction from PubMed articles. The study was performed on the Chemical Disease Relation (CDR) task of BioCreative V track-3 corpus. The proposed system, named relSCAN, is an efficient CID relation extraction system with two phases to classify relation instances from the Co-occurrence and Non-Co-occurrence mention levels. We describe the case of chemical and disease mentions that occur in the same sentence as 'Co-occurrence', or as 'Non-Co-occurrence' otherwise. In the first phase, the relation instances are constructed on both mention levels. In the second phase, we employ a hybrid feature set to classify the relation instances at both of these mention levels using the combination of two Machine Learning (ML) classifiers (Support Vector Machine (SVM) and J48 Decision tree). This system is entirely corpus dependent and does not rely on information from external resources in order to boost its performance. We achieved good results, which are comparable with the other state-of-the-art CID relation extraction systems on the BioCreative V corpus. Furthermore, our system achieves the best performance on the Non-Co-occurrence mention level.
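The Co-occurrence / Non-Co-occurrence split described in this abstract can be illustrated with a small sketch. It is not the relSCAN implementation: the `Mention` type, its fields, and `build_instances` are hypothetical names, and the only rule taken from the abstract is that a chemical and a disease mention in the same sentence form a Co-occurrence instance, and a Non-Co-occurrence instance otherwise.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    """Hypothetical mention record; field names are illustrative."""
    entity_id: str  # concept identifier of the mentioned entity
    kind: str       # "chemical" or "disease"
    sentence: int   # index of the sentence containing the mention


Pair = Tuple[Mention, Mention]


def build_instances(mentions: List[Mention]) -> Tuple[List[Pair], List[Pair]]:
    """Pair every chemical mention with every disease mention and split
    the pairs by whether both mentions fall in the same sentence."""
    chemicals = [m for m in mentions if m.kind == "chemical"]
    diseases = [m for m in mentions if m.kind == "disease"]
    co_occurrence: List[Pair] = []
    non_co_occurrence: List[Pair] = []
    for chem, dis in product(chemicals, diseases):
        if chem.sentence == dis.sentence:
            co_occurrence.append((chem, dis))
        else:
            non_co_occurrence.append((chem, dis))
    return co_occurrence, non_co_occurrence


# Toy document: one chemical in sentence 0, diseases in sentences 0 and 2.
mentions = [
    Mention("C1", "chemical", 0),
    Mention("D1", "disease", 0),
    Mention("D2", "disease", 2),
]
co, non_co = build_instances(mentions)
```

In this toy document the pair (C1, D1) lands in the Co-occurrence list and (C1, D2) in the Non-Co-occurrence list; each list would then be passed to its own classifier in the second phase.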
Affiliation(s)
- Stanley Chika Onye
  - Department of Applied Mathematics and Computer Science, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Arif Akkeleş
  - Department of Mathematics, Faculty of Arts & Sciences, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
- Nazife Dimililer
  - Department of Information Technology, School of Computing and Technology, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
13
Zheng W, Lin H, Liu X, Xu B. A document level neural model integrated domain knowledge for chemical-induced disease relations. BMC Bioinformatics 2018; 19:328. [PMID: 30223767 PMCID: PMC6142695 DOI: 10.1186/s12859-018-2316-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/04/2018] [Accepted: 08/14/2018] [Indexed: 11/10/2022] Open
Abstract
Background: The effective combination of texts and knowledge may improve performances of natural language processing tasks. For the recognition of chemical-induced disease (CID) relations which may span sentence boundaries in an article, although existing CID systems explored the utilization for knowledge bases, the effects of different knowledge on the identification of a special CID haven't been distinguished by these systems. Moreover, systems based on neural network only constructed sentence or mention level models.
Results: In this work, we proposed an effective document level neural model integrated domain knowledge to extract CID relations from biomedical articles. Basic semantic information of an article with respect to a special CID candidate pair was learned from the document level sub-network module. Furthermore, knowledge attention depending on the representation of the article was proposed to distinguish the influences of different knowledge on the special CID pair and then the final representation of knowledge was formed by aggregating weighed knowledge. Finally, the integrated representations of texts and knowledge were passed to a softmax classifier to perform the CID recognition. Experimental results on the chemical-disease relation corpus proposed by BioCreative V show that our proposed system integrated knowledge achieves a good overall performance compared with other state-of-the-art systems.
Conclusions: Experimental analyses demonstrate that the introduced attention mechanism on domain knowledge plays a significant role in distinguishing influences of different knowledge on the judgment for a special CID relation.
Affiliation(s)
- Wei Zheng
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
  - College of Software, Dalian JiaoTong University, Dalian, China
- Hongfei Lin
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Xiaoxia Liu
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
14
Chemical-induced disease relation extraction with dependency information and prior knowledge. J Biomed Inform 2018; 84:171-178. [DOI: 10.1016/j.jbi.2018.07.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/29/2018] [Revised: 07/09/2018] [Accepted: 07/11/2018] [Indexed: 11/18/2022]
15
Zheng W, Lin H, Li Z, Liu X, Li Z, Xu B, Zhang Y, Yang Z, Wang J. An effective neural model extracting document level chemical-induced disease relations from biomedical literature. J Biomed Inform 2018; 83:1-9. [DOI: 10.1016/j.jbi.2018.05.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/09/2017] [Revised: 03/14/2018] [Accepted: 05/04/2018] [Indexed: 01/06/2023]
16
Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via convolutional neural network. Database: The Journal of Biological Databases and Curation 2017; 2017:3098440. [PMID: 28415073 PMCID: PMC5467558 DOI: 10.1093/database/bax024] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 12/13/2016] [Accepted: 03/01/2017] [Indexed: 01/08/2023]
Abstract
This article describes our work on the BioCreative-V chemical–disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at inter- and intra-sentence level, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for training and testing stages, then we trained and applied the ME model and the convolutional neural network model for inter- and intra-sentence level, respectively. Finally, we merged the classification results from mention level to document level to acquire the final relations between chemical and disease concepts. The evaluation on the BioCreative-V CDR corpus shows the effectiveness of our proposed approach.
Database URL: http://www.biocreative.org/resources/corpora/biocreative-v-cdr-corpus/
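The final step in this abstract, merging mention-level classification results into document-level relations, can be sketched as follows. The abstract does not state the merging rule, so this sketch assumes the simplest one: a chemical-disease concept pair is reported for a document if at least one of its mention pairs is classified as positive. The function name and the tuple layout are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (chemical concept ID, disease concept ID, mention-level prediction)
MentionPrediction = Tuple[str, str, bool]


def merge_to_document_level(
    predictions: Iterable[MentionPrediction],
) -> Set[Tuple[str, str]]:
    """Collapse mention-level predictions to concept-level relations.

    Assumed rule (not confirmed by the abstract): a concept pair is
    positive if any of its mention pairs is predicted positive.
    """
    return {(chem, dis) for chem, dis, positive in predictions if positive}


# Toy predictions for one document; concept IDs are placeholders.
predictions = [
    ("C1", "D1", False),  # one mention pair of (C1, D1), rejected
    ("C1", "D1", True),   # another mention pair of (C1, D1), accepted
    ("C1", "D2", False),  # only mention pair of (C1, D2), rejected
]
relations = merge_to_document_level(predictions)
```

With these toy predictions, `relations` contains only `("C1", "D1")`: one positive mention pair is enough under the assumed rule, while `("C1", "D2")` is dropped because none of its mention pairs was accepted.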
Affiliation(s)
- Jinghang Gu
  - School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Fuqing Sun
  - Department of Gynecology Minimally Invasive Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, 17 Qihelou Street, Beijing, China
- Longhua Qian
  - School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China
- Guodong Zhou
  - School of Computer Science and Technology, Soochow University, 1 Shizi Street, Suzhou, China