Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang Y, Lin H, Yang Z, Wang J, Zhang S, Sun Y, Yang L. A hybrid model based on neural networks for biomedical relation extraction. J Biomed Inform 2018;81:83-92. [DOI: 10.1016/j.jbi.2018.03.011] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2017] [Revised: 03/15/2018] [Accepted: 03/16/2018] [Indexed: 11/27/2022]

For:	Zhang Y, Lin H, Yang Z, Wang J, Zhang S, Sun Y, Yang L. A hybrid model based on neural networks for biomedical relation extraction. J Biomed Inform 2018;81:83-92. [DOI: 10.1016/j.jbi.2018.03.011] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2017] [Revised: 03/15/2018] [Accepted: 03/16/2018] [Indexed: 11/27/2022]

Number

Cited by Other Article(s)

Zhang Y, Yang Z, Yang Y, Lin H, Wang J. Location-enhanced syntactic knowledge for biomedical relation extraction. J Biomed Inform 2024;156:104676. [PMID: 38876451 DOI: 10.1016/j.jbi.2024.104676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Revised: 06/08/2024] [Accepted: 06/10/2024] [Indexed: 06/16/2024]

Singla A, Khanna R, Kaur M, Kelm K, Zaiane O, Rosenfelt CS, Bui TA, Rezaei N, Nicholas D, Reformat MZ, Majnemer A, Ogourtsova T, Bolduc F. Developing a Chatbot to Support Individuals With Neurodevelopmental Disorders: Tutorial. J Med Internet Res 2024;26:e50182. [PMID: 38888947 PMCID: PMC11220430 DOI: 10.2196/50182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 07/27/2023] [Accepted: 04/19/2024] [Indexed: 06/20/2024] Open

Abstract

Families of individuals with neurodevelopmental disabilities or differences (NDDs) often struggle to find reliable health information on the web. NDDs encompass various conditions affecting up to 14% of children in high-income countries, and most individuals present with complex phenotypes and related conditions. It is challenging for their families to develop literacy solely by searching information on the internet. While in-person coaching can enhance care, it is only available to a minority of those with NDDs. Chatbots, or computer programs that simulate conversation, have emerged in the commercial sector as useful tools for answering questions, but their use in health care remains limited. To address this challenge, the researchers developed a chatbot named CAMI (Coaching Assistant for Medical/Health Information) that can provide information about trusted resources covering core knowledge and services relevant to families of individuals with NDDs. The chatbot was developed, in collaboration with individuals with lived experience, to provide information about trusted resources covering core knowledge and services that may be of interest. The developers used the Django framework (Django Software Foundation) for the development and used a knowledge graph to depict the key entities in NDDs and their relationships to allow the chatbot to suggest web resources that may be related to the user queries. To identify NDD domain-specific entities from user input, a combination of standard sources (the Unified Medical Language System) and other entities were used which were identified by health professionals as well as collaborators. Although most entities were identified in the text, some were not captured in the system and therefore went undetected. Nonetheless, the chatbot was able to provide resources addressing most user queries related to NDDs. The researchers found that enriching the vocabulary with synonyms and lay language terms for specific subdomains enhanced entity detection. By using a data set of numerous individuals with NDDs, the researchers developed a knowledge graph that established meaningful connections between entities, allowing the chatbot to present related symptoms, diagnoses, and resources. To the researchers' knowledge, CAMI is the first chatbot to provide resources related to NDDs. Our work highlighted the importance of engaging end users to supplement standard generic ontologies to named entities for language recognition. It also demonstrates that complex medical and health-related information can be integrated using knowledge graphs and leveraging existing large datasets. This has multiple implications: generalizability to other health domains as well as reducing the need for experts and optimizing their input while keeping health care professionals in the loop. The researchers' work also shows how health and computer science domains need to collaborate to achieve the granularity needed to make chatbots truly useful and impactful.

Collapse

Gill JK, Chetty M, Lim S, Hallinan J. Large language model based framework for automated extraction of genetic interactions from unstructured data. PLoS One 2024;19:e0303231. [PMID: 38771886 PMCID: PMC11108146 DOI: 10.1371/journal.pone.0303231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 04/23/2024] [Indexed: 05/23/2024] Open

Abstract

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.

Collapse

Huang MS, Han JC, Lin PY, You YT, Tsai RTH, Hsu WL. Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource. Brief Bioinform 2024;25:bbae132. [PMID: 38609331 PMCID: PMC11014787 DOI: 10.1093/bib/bbae132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 11/06/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2024] Open

Zou S, Liu Z, Wang K, Cao J, Liu S, Xiong W, Li S. A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024;21:1489-1507. [PMID: 38303474 DOI: 10.3934/mbe.2024064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2024]

Nachtegael C, De Stefani J, Lenaerts T. A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction. PLoS One 2023;18:e0292356. [PMID: 38100453 PMCID: PMC10723703 DOI: 10.1371/journal.pone.0292356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Accepted: 09/19/2023] [Indexed: 12/17/2023] Open

Abstract

Automatic biomedical relation extraction (bioRE) is an essential task in biomedical research in order to generate high-quality labelled data that can be used for the development of innovative predictive methods. However, building such fully labelled, high quality bioRE data sets of adequate size for the training of state-of-the-art relation extraction models is hindered by an annotation bottleneck due to limitations on time and expertise of researchers and curators. We show here how Active Learning (AL) plays an important role in resolving this issue and positively improve bioRE tasks, effectively overcoming the labelling limits inherent to a data set. Six different AL strategies are benchmarked on seven bioRE data sets, using PubMedBERT as the base model, evaluating their area under the learning curve (AULC) as well as intermediate results measurements. The results demonstrate that uncertainty-based strategies, such as Least-Confident or Margin Sampling, are statistically performing better in terms of F1-score, accuracy and precision, than other types of AL strategies. However, in terms of recall, a diversity-based strategy, called Core-set, outperforms all strategies. AL strategies are shown to reduce the annotation need (in order to reach a performance at par with training on all data), from 6% to 38%, depending on the data set; with Margin Sampling and Least-Confident Sampling strategies moreover obtaining the best AULCs compared to the Random Sampling baseline. We show through the experiments the importance of using AL methods to reduce the amount of labelling needed to construct high-quality data sets leading to optimal performance of deep learning models. The code and data sets to reproduce all the results presented in the article are available at https://github.com/oligogenic/Deep_active_learning_bioRE.

Collapse

Hu Y, Chen Y, Qin Y, Huang R. Learning entity-oriented representation for biomedical relation extraction. J Biomed Inform 2023;147:104527. [PMID: 37852347 DOI: 10.1016/j.jbi.2023.104527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 10/11/2023] [Accepted: 10/15/2023] [Indexed: 10/20/2023]

Xiao Y, Ji Z, Li J, Zhu Q. CLART: A cascaded lattice-and-radical transformer network for Chinese medical named entity recognition. Heliyon 2023;9:e20692. [PMID: 37876457 PMCID: PMC10590790 DOI: 10.1016/j.heliyon.2023.e20692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 10/01/2023] [Accepted: 10/04/2023] [Indexed: 10/26/2023] Open

Wang T, You J, Gong X, Yang S, Wang L, Chang Z. Probabilistic Bayesian Deep Learning Approach for Online Forecasting of Fed-Batch Fermentation. ACS OMEGA 2023;8:25272-25278. [PMID: 37483241 PMCID: PMC10357427 DOI: 10.1021/acsomega.3c02387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/09/2023] [Accepted: 06/22/2023] [Indexed: 07/25/2023]

Deng H, Li Q, Liu Y, Zhu J. MTMG: A multi-task model with multi-granularity information for drug-drug interaction extraction. Heliyon 2023;9:e16819. [PMID: 37484258 PMCID: PMC10360954 DOI: 10.1016/j.heliyon.2023.e16819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 07/25/2023] Open

Xie W, Fan K, Zhang S, Li L. Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature. J Biomed Semantics 2023;14:5. [PMID: 37248476 PMCID: PMC10228061 DOI: 10.1186/s13326-023-00287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Accepted: 04/29/2023] [Indexed: 05/31/2023] Open

Abstract

BACKGROUND

Drug-drug interaction (DDI) information retrieval (IR) is an important natural language process (NLP) task from the PubMed literature. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively small positive DDI samples among overwhelmingly large negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.

RESULTS

PubMed abstracts are divided into two pools. Screened pool contains all abstracts that pass the DDI keywords query in PubMed, while unscreened pool includes all the other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In screened pool IR analysis using supporting vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling, from 0.89 to 0.92 respectively. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling along, from 0.72 to 0.81 respectively. When we change the SVM to a deep learning method, all sampling schemes consistently improve DDI AL analysis in both screened pool and unscreened pool. Deep learning has significant improvement of precision over SVM, 0.96 vs. 0.92 in screened pool, and 0.90 vs. 0.81 in the unscreened pool, respectively.

CONCLUSIONS

By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved. The random negative sampling and positive sampling are highly effective methods in improving AL analysis where the positive and negative samples are extremely imbalanced.

Collapse

Shi B, Fan R, Zhang L, Huang J, Xiong N, Vasilakos A, Wan J, Zhang L. A Joint Extraction System Based on Conditional Layer Normalization for Health Monitoring. SENSORS (BASEL, SWITZERLAND) 2023;23:4812. [PMID: 37430725 DOI: 10.3390/s23104812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2023] [Revised: 05/10/2023] [Accepted: 05/11/2023] [Indexed: 07/12/2023]

Al-Sabri R, Gao J, Chen J, Oloulade BM, Lyu T. Multi-View Graph Neural Architecture Search for Biomedical Entity and Relation Extraction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:1221-1233. [PMID: 36074877 DOI: 10.1109/tcbb.2022.3205113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Chasseray Y, Barthe-Delanoë AM, Négny S, Le Lann JM. Knowledge extraction from textual data and performance evaluation in an unsupervised context. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/09/2023]

Weber L, Sänger M, Garda S, Barth F, Alt C, Leser U. Chemical-protein relation extraction with ensembles of carefully tuned pretrained language models. Database (Oxford) 2022;2022:6833204. [PMID: 36399413 PMCID: PMC9674024 DOI: 10.1093/database/baac098] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 11/19/2022]

Zhang Z, Chen ALP. Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning. BMC Bioinformatics 2022;23:458. [PMID: 36329384 PMCID: PMC9632084 DOI: 10.1186/s12859-022-04994-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022] Open

An automatic hypothesis generation for plausible linkage between xanthium and diabetes. Sci Rep 2022;12:17547. [PMID: 36266295 PMCID: PMC9585073 DOI: 10.1038/s41598-022-20752-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 09/19/2022] [Indexed: 01/13/2023] Open

Abstract

There has been a significant increase in text mining implementation for biomedical literature in recent years. Previous studies introduced the implementation of text mining and literature-based discovery to generate hypotheses of potential candidates for drug development. By conducting a hypothesis-generation step and using evidence from published journal articles or proceedings, previous studies have managed to reduce experimental time and costs. First, we applied the closed discovery approach from Swanson's ABC model to collect publications related to 36 Xanthium compounds or diabetes. Second, we extracted biomedical entities and relations using a knowledge extraction engine, the Public Knowledge Discovery Engine for Java or PKDE4J. Third, we built a knowledge graph using the obtained bio entities and relations and then generated paths with Xanthium compounds as source nodes and diabetes as the target node. Lastly, we employed graph embeddings to rank each path and evaluated the results based on domain experts' opinions and literature. Among 36 Xanthium compounds, 35 had direct paths to five diabetes-related nodes. We ranked 2,740,314 paths in total between 35 Xanthium compounds and three diabetes-related phrases: type 1 diabetes, type 2 diabetes, and diabetes mellitus. Based on the top five percentile paths, we concluded that adenosine, choline, beta-sitosterol, rhamnose, and scopoletin were potential candidates for diabetes drug development using natural products. Our framework for hypothesis generation employs a closed discovery from Swanson's ABC model that has proven very helpful in discovering biological linkages between bio entities. The PKDE4J tools we used to capture bio entities from our document collection could label entities into five categories: genes, compounds, phenotypes, biological processes, and molecular functions. Using the BioPREP model, we managed to interpret the semantic relatedness between two nodes and provided paths containing valuable hypotheses. Lastly, using a graph-embedding algorithm in our path-ranking analysis, we exploited the semantic relatedness while preserving the graph structure properties.

Collapse

Turki H, Jemielniak D, Hadj Taieb MA, Labra Gayo JE, Ben Aouicha M, Banat M, Shafee T, Prud’hommeaux E, Lubiana T, Das D, Mietchen D. Using logical constraints to validate statistical information about disease outbreaks in collaborative knowledge graphs: the case of COVID-19 epidemiology in Wikidata. PeerJ Comput Sci 2022;8:e1085. [PMID: 36262159 PMCID: PMC9575845 DOI: 10.7717/peerj-cs.1085] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2022] [Accepted: 08/15/2022] [Indexed: 06/16/2023]

Chen J, Sun X, Jin X, Sutcliffe R. Extracting drug-drug interactions from no-blinding texts using key semantic sentences and GHM loss. J Biomed Inform 2022;135:104192. [PMID: 36064114 DOI: 10.1016/j.jbi.2022.104192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 08/28/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022]

Luo L, Lai PT, Wei CH, Lu Z. A sequence labeling framework for extracting drug-protein relations from biomedical literature. Database (Oxford) 2022;2022:baac058. [PMID: 35856889 PMCID: PMC9297941 DOI: 10.1093/database/baac058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 07/14/2022] [Indexed: 06/15/2023]

Liu X, Tan J, Fan J, Tan K, Hu J, Dong S. A Syntax-enhanced model based on category keywords for biomedical relation extraction. J Biomed Inform 2022;132:104135. [PMID: 35842217 DOI: 10.1016/j.jbi.2022.104135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2021] [Revised: 05/10/2022] [Accepted: 07/05/2022] [Indexed: 10/17/2022]

Feng B, Gao J. AnthraxKP: a knowledge graph-based, Anthrax Knowledge Portal mined from biomedical literature. Database (Oxford) 2022;2022:6598946. [PMID: 35653350 PMCID: PMC9216567 DOI: 10.1093/database/baac037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 04/13/2022] [Accepted: 05/13/2022] [Indexed: 11/15/2022]

Vo TH, Nguyen NTK, Kha QH, Le NQK. On the road to explainable AI in drug-drug interactions prediction: A systematic review. Comput Struct Biotechnol J 2022;20:2112-2123. [PMID: 35832629 PMCID: PMC9092071 DOI: 10.1016/j.csbj.2022.04.021] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2022] [Revised: 04/15/2022] [Accepted: 04/15/2022] [Indexed: 12/26/2022] Open

Xiong Y, Peng H, Xiang Y, Wong KC, Chen Q, Yan J, Tang B. Leveraging Multi-source Knowledge for Chinese Clinical Named Entity Recognition via Relational Graph Convolutional Network. J Biomed Inform 2022;128:104035. [PMID: 35217186 DOI: 10.1016/j.jbi.2022.104035] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 02/04/2022] [Accepted: 02/18/2022] [Indexed: 11/29/2022]

Abstract

OBJECTIVE

External knowledge, such as lexicon of words in Chinese and domain knowledge graph (KG) of concepts, has been recently adopted to improve the performance of machine learning methods for named entity recognition (NER) as it can provide additional information beyond context. However, most existing studies only consider knowledge from one source (i.e., either lexicon or knowledge graph) in different ways and consider lexicon words or KG concepts independently with their boundaries. In this paper, we focus on leveraging multi-source knowledge in a unified manner where lexicon words or KG concepts are well combined with their boundaries for Chinese Clinical NER (CNER).

MATERIAL AND METHODS

We propose a novel method based on relational graph convolutional network (RGCN), called MKRGCN, to utilize multi-source knowledge in a unified manner for CNER. For any sentence, a relational graph based on words or concepts in each knowledge source is constructed, where lexicon words or KG concepts appearing in the sentence are linked to the containing tokens with the boundary information of the lexicon words or KG concepts. RGCN is used to model all relational graphs constructed from multi-source knowledge, and the representations of tokens from multi-source knowledge are integrated into the context representations of tokens via an attention mechanism. Based on the knowledge-enhanced representations of tokens, we deploy a conditional random field (CRF) layer for named entity label prediction. In this study, a lexicon of words and a medical knowledge graph are used as knowledge sources for Chinese CNER.

RESULTS

Our proposed method achieves the best performance on CCKS2017 and CCKS2018 in Chinese with F1-scores of 91.88% and 89.91%, respectively, significantly outperforming existing methods. The extended experiments on NCBI-Disease and BC2GM in English also prove the effectiveness of our method when only considering one knowledge source via RGCN.

CONCLUSION

The MKRGCN model can integrate knowledge from the external lexicon and knowledge graph effectively for Chinese CNER and has the potential to be applied to English NER.

Collapse

Pourreza Shahri M, Kahanda I. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes. BMC Bioinformatics 2021;22:500. [PMID: 34656098 PMCID: PMC8520253 DOI: 10.1186/s12859-021-04421-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022] Open

Abstract

Background

Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results

In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.

Conclusions

This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Collapse

Protein-protein interaction relation extraction based on multigranularity semantic fusion. J Biomed Inform 2021;123:103931. [PMID: 34628063 DOI: 10.1016/j.jbi.2021.103931] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Revised: 09/12/2021] [Accepted: 10/04/2021] [Indexed: 01/02/2023]

Huang K, Xiao C, Glass LM, Critchlow CW, Gibson G, Sun J. Machine learning applications for therapeutic tasks with genomics data. PATTERNS (NEW YORK, N.Y.) 2021;2:100328. [PMID: 34693370 PMCID: PMC8515011 DOI: 10.1016/j.patter.2021.100328] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021;11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022] Open

Kanjirangat V, Rinaldi F. Enhancing Biomedical Relation Extraction with Transformer Models using Shortest Dependency Path Features and Triplet Information. J Biomed Inform 2021;122:103893. [PMID: 34481058 DOI: 10.1016/j.jbi.2021.103893] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2020] [Revised: 08/17/2021] [Accepted: 08/22/2021] [Indexed: 10/20/2022]

Abstract

Entity relation extraction plays an important role in the biomedical, healthcare, and clinical research areas. Recently, pre-trained models based on transformer architectures and their variants have shown remarkable performances in various natural language processing tasks. Most of these variants were based on slight modifications in the architectural components, representation schemes and augmenting data using distant supervision methods. In distantly supervised methods, one of the main challenges is pruning out noisy samples. A similar situation can arise when the training samples are not directly available but need to be constructed from the given dataset. The BioCreative V Chemical Disease Relation (CDR) task provides a dataset that does not explicitly offer mention-level gold annotations and hence replicates the above scenario. Selecting the representative sentences from the given abstract or document text that could convey a potential entity relationship becomes essential. Most of the existing methods in literature propose to either consider the entire text or all the sentences which contain the entity mentions. This could be a computationally expensive and time consuming approach. This paper presents a novel approach to handle such scenarios, specifically in biomedical relation extraction. We propose utilizing the Shortest Dependency Path (SDP) features for constructing data samples by pruning out noisy information and selecting the most representative samples for model learning. We also utilize triplet information in model learning using the biomedical variant of BERT, viz., BioBERT. The problem is represented as a sentence pair classification task using the sentence and the entity-relation pair as input. We analyze the approach on both intra-sentential and inter-sentential relations in the CDR dataset. The proposed approach that utilizes the SDP and triplet features presents promising results, specifically on the inter-sentential relation extraction task. We make the code used for this work publicly available on Github.¹.

Collapse

Hong G, Kim Y, Choi Y, Song M. BioPREP: Deep learning-based predicate classification with SemMedDB. J Biomed Inform 2021;122:103888. [PMID: 34411707 DOI: 10.1016/j.jbi.2021.103888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 06/03/2021] [Accepted: 08/13/2021] [Indexed: 11/16/2022]

Yang X, Wu C, Nenadic G, Wang W, Lu K. Mining a stroke knowledge graph from literature. BMC Bioinformatics 2021;22:387. [PMID: 34325669 PMCID: PMC8319697 DOI: 10.1186/s12859-021-04292-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 01/01/2023] Open

Fei H, Zhang Y, Ren Y, Ji D. A span-graph neural model for overlapping entity relation extraction in biomedical texts. Bioinformatics 2021;37:1581-1589. [PMID: 33245108 DOI: 10.1093/bioinformatics/btaa993] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 10/25/2020] [Accepted: 11/17/2020] [Indexed: 01/14/2023] Open

Zhao D, Wang J, Lin H, Wang X, Yang Z, Zhang Y. Biomedical cross-sentence relation extraction via multihead attention and graph convolutional networks. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107230] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Azer K, Kaddi CD, Barrett JS, Bai JPF, McQuade ST, Merrill NJ, Piccoli B, Neves-Zaph S, Marchetti L, Lombardo R, Parolo S, Immanuel SRC, Baliga NS. History and Future Perspectives on the Discipline of Quantitative Systems Pharmacology Modeling and Its Applications. Front Physiol 2021;12:637999. [PMID: 33841175 PMCID: PMC8027332 DOI: 10.3389/fphys.2021.637999] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Accepted: 01/25/2021] [Indexed: 12/24/2022] Open

Lee CY, Chen YP. Descriptive prediction of drug side‐effects using a hybrid deep learning model. INT J INTELL SYST 2021. [DOI: 10.1002/int.22389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Drug-Drug interaction extraction using a position and similarity fusion-based attention mechanism. J Biomed Inform 2021;115:103707. [PMID: 33571676 DOI: 10.1016/j.jbi.2021.103707] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2020] [Revised: 12/21/2020] [Accepted: 02/03/2021] [Indexed: 11/20/2022]

Chasseray Y, Barthe-Delanoë AM, Négny S, Le Lann JM. A generic metamodel for data extraction and generic ontology population. J Inf Sci 2021. [DOI: 10.1177/0165551521989641] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Zhang L, Hu J, Xu Q, Li F, Rao G, Tao C. A semantic relationship mining method among disorders, genes, and drugs from different biomedical datasets. BMC Med Inform Decis Mak 2020;20:283. [PMID: 33317518 PMCID: PMC7734713 DOI: 10.1186/s12911-020-01274-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Accepted: 09/22/2020] [Indexed: 12/16/2022] Open

Abstract

BACKGROUND

Semantic web technology has been applied widely in the biomedical informatics field. Large numbers of biomedical datasets are available online in the resource description framework (RDF) format. Semantic relationship mining among genes, disorders, and drugs is widely used in, for example, precision medicine and drug repositioning. However, most of the existing studies focused on a single dataset. It is not easy to find the most current relationships among disorder-gene-drug relationships since the relationships are distributed in heterogeneous datasets. How to mine their semantic relationships from different biomedical datasets is an important issue.

METHODS

First, a variety of biomedical datasets were converted into RDF triple data; then, multisource biomedical datasets were integrated into a storage system using a data integration algorithm. Second, nine query patterns among genes, disorders, and drugs from different biomedical datasets were designed. Third, the gene-disorder-drug semantic relationship mining algorithm is presented. This algorithm can query the relationships among various entities from different datasets.

RESULTS AND CONCLUSIONS

We focused on mining the putative and the most current disorder-gene-drug relationships about Parkinson's disease (PD). The results demonstrate that our method has significant advantages in mining and integrating multisource heterogeneous biomedical datasets. Twenty-five new relationships among the genes, disorders, and drugs were mined from four different datasets. The query results showed that most of them came from different datasets. The precision of the method increased by 2.51% compared to that of the multisource linked open data fusion method presented in the 4th International Workshop on Semantics-Powered Data Mining and Analytics (SEPDA 2019). Moreover, the number of query results increased by 7.7%, and the number of correct queries increased by 9.5%.

Collapse

DECAB-LSTM: Deep Contextualized Attentional Bidirectional LSTM for cancer hallmark classification. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106486] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Wang J, Li M, Diao Q, Lin H, Yang Z, Zhang Y. Biomedical document triage using a hierarchical attention-based capsule network. BMC Bioinformatics 2020;21:380. [PMID: 32938366 PMCID: PMC7495737 DOI: 10.1186/s12859-020-03673-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Perera N, Dehmer M, Emmert-Streib F. Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Front Cell Dev Biol 2020;8:673. [PMID: 32984300 PMCID: PMC7485218 DOI: 10.3389/fcell.2020.00673] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Accepted: 07/02/2020] [Indexed: 12/29/2022] Open

Xie J, Jiang J, Wang Y, Guan Y, Guo X. Learning an expandable EMR-based medical knowledge network to enhance clinical diagnosis. Artif Intell Med 2020;107:101927. [PMID: 32828460 DOI: 10.1016/j.artmed.2020.101927] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2018] [Revised: 10/04/2019] [Accepted: 07/02/2020] [Indexed: 01/10/2023]

Chinese medical relation extraction based on multi-hop self-attention mechanism. INT J MACH LEARN CYB 2020. [DOI: 10.1007/s13042-020-01176-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

A novel machine learning framework for automated biomedical relation extraction from large-scale literature repositories. NAT MACH INTELL 2020. [DOI: 10.1038/s42256-020-0189-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Bio-semantic relation extraction with attention-based external knowledge reinforcement. BMC Bioinformatics 2020;21:213. [PMID: 32448122 PMCID: PMC7245897 DOI: 10.1186/s12859-020-3540-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Accepted: 05/07/2020] [Indexed: 12/13/2022] Open

Lee CY, Chen YPP. Prediction of drug adverse events using deep learning in pharmaceutical discovery. Brief Bioinform 2020;22:1884-1901. [PMID: 32349125 DOI: 10.1093/bib/bbaa040] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2019] [Revised: 02/08/2020] [Accepted: 02/25/2020] [Indexed: 01/11/2023] Open

Wu H, Xing Y, Ge W, Liu X, Zou J, Zhou C, Liao J. Drug-drug interaction extraction via hybrid neural networks on biomedical literature. J Biomed Inform 2020;106:103432. [PMID: 32335223 DOI: 10.1016/j.jbi.2020.103432] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 04/15/2020] [Accepted: 04/20/2020] [Indexed: 01/16/2023]

Döring K, Qaseem A, Becer M, Li J, Mishra P, Gao M, Kirchner P, Sauter F, Telukunta KK, Moumbock AFA, Thomas P, Günther S. Automated recognition of functional compound-protein relationships in literature. PLoS One 2020;15:e0220925. [PMID: 32126064 PMCID: PMC7053725 DOI: 10.1371/journal.pone.0220925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 01/29/2020] [Indexed: 11/18/2022] Open

Zhang Y, Lin H, Yang Z, Wang J, Sun Y. Chemical-protein interaction extraction via contextualized word representations and multihead attention. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020;2019:5498050. [PMID: 31125403 PMCID: PMC6534182 DOI: 10.1093/database/baz054] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/12/2018] [Revised: 03/16/2019] [Accepted: 04/02/2019] [Indexed: 12/17/2022]

Jettakul A, Wichadakul D, Vateekul P. Relation extraction between bacteria and biotopes from biomedical texts with attention mechanisms and domain-specific contextual representations. BMC Bioinformatics 2019;20:627. [PMID: 31795930 PMCID: PMC6889521 DOI: 10.1186/s12859-019-3217-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2019] [Accepted: 11/12/2019] [Indexed: 12/21/2022] Open