Santiso S, Pérez A, Casillas A. Smoothing dense spaces for improved relation extraction between drugs and adverse reactions. Int J Med Inform 2019;128:39-45. [PMID: 31160010 DOI: 10.1016/j.ijmedinf.2019.05.009]
[Received: 09/14/2018] [Revised: 01/28/2019] [Accepted: 05/11/2019]
Abstract
BACKGROUND AND OBJECTIVE
This work aims to extract Adverse Drug Reactions (ADRs), i.e. harm directly caused by a drug at normal doses, from Electronic Health Records (EHRs). The lack of readily available EHRs, owing to confidentiality issues, and their lexical variability make ADR extraction challenging. Furthermore, ADRs are rare events. Representations that are robust to data sparsity are therefore needed.
METHODS
Embedding-based characterizations are able to group semantically related words. However, dense spaces suffer from data sparsity. We employed context-aware continuous representations to enhance the modelling of infrequent events through their context, and we applied simple smoothing techniques (e.g. direction cosines, truncation, Principal Component Analysis (PCA) and clustering) to increase the proximity between similar words in an attempt to cope with data sparsity.
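As an illustration only, two of the smoothing techniques named above can be sketched in a few lines. The abstract does not specify how they were implemented, so this sketch assumes that "direction cosines" means scaling each vector to unit length and that "truncation" means dropping decimal places. The word labels and vector values are hypothetical, and PCA and clustering are omitted.

```python
import math

def direction_cosines(vector):
    """Scale a vector to unit length; its components become direction cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]

def truncate(vector, decimals=1):
    """Drop everything after `decimals` decimal places (assumed meaning of truncation)."""
    factor = 10 ** decimals
    return [math.trunc(x * factor) / factor for x in vector]

def smooth(vector, decimals=1):
    """Normalise first, then truncate, so that nearby directions share coordinates."""
    return truncate(direction_cosines(vector), decimals)

# Hypothetical toy embeddings: two related words with very different magnitudes
# and one unrelated word. The values are made up for this sketch.
rash = [2.9, 3.9, 1.3]
hives = [0.27, 0.37, 0.14]
nausea = [-1.0, 0.5, 2.0]

rash_s = smooth(rash)
hives_s = smooth(hives)
nausea_s = smooth(nausea)

print(rash_s, hives_s, nausea_s)
```

In this toy case the two related vectors collapse onto the same coarse point, while the unrelated one stays apart. Whether real embeddings behave this way depends on the data and on the number of decimals kept.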
RESULTS
An F-measure of 0.639 was achieved for ADR classification, an improvement of approximately 0.300 over the results obtained with a word-based characterization.
CONCLUSION
The embedding-based representation, together with the smoothing techniques, increased the robustness of the ADR characterization. It proved particularly appropriate for coping with lexical variability and data sparsity.