1
Bai L, Zhang ZT, Guan H, Liu W, Chen L, Yuan D, Chen P, Xue M, Yan G. Rapid and accurate quality evaluation of Angelicae Sinensis Radix based on near-infrared spectroscopy and Bayesian optimized LSTM network. Talanta 2024; 275:126098. [PMID: 38640523 DOI: 10.1016/j.talanta.2024.126098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 04/08/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024]
Abstract
Authentic traditional Chinese medicines (TCMs), including Angelicae Sinensis Radix (ASR), are representative of high-quality herbals in China. However, ASR from the authentic region is frequently adulterated or counterfeited, and rapid quality evaluation methods for identifying authentic ASR are still lacking. In this study, the color features of ASR were first characterized. The results showed that authentic ASR cannot be fully identified by color characteristics. Then near-infrared (NIR) spectroscopy combined with Bayesian optimized long short-term memory (BO-LSTM) was used to evaluate the quality of ASR, and the performance of BO-LSTM was compared with that of common classification and regression algorithms. The results revealed that, following pretreatment of the NIR spectra, the optimal NIR spectra combined with BO-LSTM not only distinguished authentic, non-authentic, and adulterated ASR with 100% accuracy, but also accurately predicted the adulteration concentration of authentic ASR (R² > 0.99). Moreover, BO-LSTM demonstrated excellent performance in classification and regression compared with common algorithms (ANN, SVM, PLSR, etc.). Overall, the proposed strategy could quickly and accurately evaluate the quality of ASR, which provides a reference for other TCMs.
Affiliation(s)
- Lei Bai
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Zhi-Tong Zhang
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Huanhuan Guan
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Wenjian Liu
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Li Chen
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Dongping Yuan
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Pan Chen
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China
- Mei Xue
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing 210023, China.
- Guojun Yan
- School of Pharmacy, Nanjing University of Chinese Medicine, Jiangsu Engineering Research Center for Development and Application of External Drugs in Traditional Chinese Medicine, Jiangsu Province Engineering Research Center of Classical Prescription, Nanjing 210023, China.
2
Singh A, Krishnamoorthy S, Ortega JE. NeighBERT: Medical Entity Linking Using Relation-Induced Dense Retrieval. Journal of Healthcare Informatics Research 2024; 8:353-369. [PMID: 38681752 PMCID: PMC11052986 DOI: 10.1007/s41666-023-00136-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 05/08/2023] [Accepted: 07/03/2023] [Indexed: 05/01/2024]
Abstract
One of the common tasks in clinical natural language processing is medical entity linking (MEL), which involves mention detection followed by linking the mention to an entity in a knowledge base. One reason that MEL has not been solved is that ambiguous text can be resolved to several named entities. This problem is exacerbated when processing the text found in electronic health records. Recent work has shown that deep learning models based on transformers outperform previous methods on linking. We introduce NeighBERT, a custom pre-training technique which extends BERT (Devlin et al [1]) by encoding how entities are related within a knowledge graph. This technique adds relational context that has traditionally been missing from the original BERT, helping resolve the ambiguity found in clinical text. In our experiments, NeighBERT improves the precision, recall, and F1-score of the state of the art by 1-3 points for named entity recognition and 10-15 points for MEL on two widely known clinical datasets. Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00136-3.
Affiliation(s)
- Ayush Singh
- inQbator AI, Evernorth Health Services, Saint Louis, MO USA
- John E. Ortega
- inQbator AI, Evernorth Health Services, Saint Louis, MO USA
3
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing; however, its development has been limited by the scarcity of well-annotated datasets and by poor interpretability. To address this, researchers have considered combining domain knowledge (such as biomedical knowledge graphs) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and following evidence-based medicine. This paper comprehensively reviews more than 150 recent studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. Finally, we discuss various challenges and future directions.
Affiliation(s)
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Wenjuan Liu
- Aerospace Center Hospital, 100049 Beijing, China
- Haijun Niu
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Zhenchang Wang
- School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China.
4
Molina M, Jiménez C, Montenegro C. Improving Drug-Drug Interaction Extraction with Gaussian Noise. Pharmaceutics 2023; 15:1823. [PMID: 37514010 PMCID: PMC10385013 DOI: 10.3390/pharmaceutics15071823] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 04/28/2023] [Accepted: 06/12/2023] [Indexed: 07/30/2023] Open
Abstract
Drug-Drug Interactions (DDIs) produce essential and valuable insights for healthcare professionals, since they provide data on the impact of concurrent administration of medications to patients during therapy. In that sense, some relevant works, related to the DDIExtraction2013 Challenge, are available in the current technical literature. This study aims to improve previous results, using two models, where a Gaussian noise layer is added to achieve better DDI relationship extraction. (1) A Piecewise Convolutional Neural Network (PW-CNN) model is used to capture relationships among pharmacological entities described in biomedical databases. Additionally, the model incorporates multichannel words to enrich a person's vocabulary and reduce unfamiliar words. (2) The model uses the pre-trained BERT language model to classify relationships, while also integrating data from the target entities. After identifying the target entities, the model transfers the relevant information through the pre-trained architecture and integrates the encoded data for both entities. The results of the experiment show an improved performance, with respect to previous models.
Affiliation(s)
- Marco Molina
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
- Cristina Jiménez
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
- Carlos Montenegro
- Department of Informatics and Computer Science, Faculty of Systems Engineering, Escuela Politécnica Nacional, Av. Ladron de Guevara E11-25, Quito 170525, Ecuador
5
Shah NM, Jang HJ, Liang Y, Maeng JH, Tzeng SC, Wu A, Basri NL, Qu X, Fan C, Li A, Katz B, Li D, Xing X, Evans BS, Wang T. Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements. Nat Genet 2023; 55:631-639. [PMID: 36973455 DOI: 10.1038/s41588-023-01349-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 09/08/2020] [Accepted: 02/23/2023] [Indexed: 03/29/2023]
Abstract
Cryptic promoters within transposable elements (TEs) can be transcriptionally reactivated in tumors to create new TE-chimeric transcripts, which can produce immunogenic antigens. We performed a comprehensive screen for these TE exaptation events in 33 TCGA tumor types, 30 GTEx adult tissues and 675 cancer cell lines, and identified 1,068 TE-exapted candidates with the potential to generate shared tumor-specific TE-chimeric antigens (TS-TEAs). Whole-lysate and HLA-pulldown mass spectrometry data confirmed that TS-TEAs are presented on the surface of cancer cells. In addition, we highlight tumor-specific membrane proteins transcribed from TE promoters that constitute aberrant epitopes on the extracellular surface of cancer cells. Altogether, we showcase the high pan-cancer prevalence of TS-TEAs and atypical membrane proteins that could potentially be therapeutically exploited and targeted.
Affiliation(s)
- Nakul M Shah
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- H Josh Jang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- Yonghao Liang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Ju Heon Maeng
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Angela Wu
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Noah L Basri
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xuan Qu
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Changxu Fan
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Amy Li
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Benjamin Katz
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Daofeng Li
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xiaoyun Xing
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA.
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
6
Ontology Learning Applications of Knowledge Base Construction for Microelectronic Systems Information. Information 2023. [DOI: 10.3390/info14030176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023] Open
Abstract
Knowledge base construction (KBC) using AI has been one of the key goals of this highly popular technology since its emergence, as it helps to comprehend everything, including relations, around us. The construction of knowledge bases can summarize a piece of text in a machine-processable and understandable way. This can prove to be valuable and assistive to knowledge engineers. In this paper, we present the application of natural language processing in the construction of knowledge bases. We demonstrate how a trained bidirectional long short-term memory or bi-LSTM neural network model can be used to construct knowledge bases in accordance with the exact ISO26262 definitions as defined in the GENIAL! Basic Ontology. We provide the system with an electronic text document from the microelectronics domain and the system attempts to create a knowledge base from the available information in textual format. This information is then expressed in the form of graphs when queried by the user. This method of information retrieval presents the user with a much more technical and comprehensive understanding of an expert piece of text. This is achieved by applying the process of named entity recognition (NER) for knowledge extraction. This paper provides a result report of the current status of our knowledge construction process and knowledge base content, as well as describes our challenges and experiences.
7
Lopes A, Carbonera J, Schmidt D, Garcia L, Rodrigues F, Abel M. Using terms and informal definitions to classify domain entities into top-level ontology concepts: An approach based on language models. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
8
Sousa D, Couto FM. Biomedical Relation Extraction with Knowledge Graph-based Recommendations. IEEE J Biomed Health Inform 2022; 26:4207-4217. [PMID: 35536818 DOI: 10.1109/jbhi.2022.3173558] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Abstract
Biomedical Relation Extraction (RE) systems identify and classify relations between biomedical entities to enhance our knowledge of biological and medical processes. Most state-of-the-art systems use deep learning approaches, mainly to target relations between entities of the same type, such as proteins or pharmacological substances. However, these systems are mostly restricted to what they directly identify in the text and ignore specialized domain knowledge bases, such as ontologies, that formalize and integrate biomedical information typically structured as directed acyclic graphs. On the other hand, Knowledge Graph (KG)-based recommendation systems have already shown the importance of integrating KGs to add features to items. Typical systems have people as users and items that can range from movies to books, which people saw or read and rated according to their satisfaction. This work proposes to integrate KGs into biomedical RE through a recommendation model to further improve their range of action. We developed a new RE system, named K-BiOnt, by integrating a baseline state-of-the-art deep biomedical RE system with an existing state-of-the-art KG-based recommendation system. Our results show that adding KG-based recommendations improves the system's ability to identify true relations that the baseline deep RE model could not extract from the text. All the software and data supporting our work will be made publicly available upon acceptance of this manuscript.
9
Deep learning for predicting respiratory rate from biosignals. Comput Biol Med 2022; 144:105338. [DOI: 10.1016/j.compbiomed.2022.105338] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/22/2021] [Revised: 01/27/2022] [Accepted: 02/10/2022] [Indexed: 12/23/2022]
10
Vo TH, Nguyen NTK, Kha QH, Le NQK. On the road to explainable AI in drug-drug interactions prediction: A systematic review. Comput Struct Biotechnol J 2022; 20:2112-2123. [PMID: 35832629 PMCID: PMC9092071 DOI: 10.1016/j.csbj.2022.04.021] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 03/12/2022] [Revised: 04/15/2022] [Accepted: 04/15/2022] [Indexed: 12/26/2022] Open
Abstract
Over the past decade, polypharmacy instances have been common in multi-diseases treatment. However, unwanted drug-drug interactions (DDIs) that might cause unexpected adverse drug events (ADEs) in multiple regimens therapy remain a significant issue. Since artificial intelligence (AI) is ubiquitous today, many AI prediction models have been developed to predict DDIs to support clinicians in pharmacotherapy-related decisions. However, even though DDI prediction models have great potential for assisting physicians in polypharmacy decisions, there are still concerns regarding the reliability of AI models due to their black-box nature. Building AI models with explainable mechanisms can augment their transparency to address the above issue. Explainable AI (XAI) promotes safety and clarity by showing how decisions are made in AI models, especially in critical tasks like DDI predictions. In this review, a comprehensive overview of AI-based DDI prediction, including the publicly available source for AI-DDIs studies, the methods used in data manipulation and feature preprocessing, the XAI mechanisms to promote trust of AI, especially for critical tasks as DDIs prediction, the modeling methods, is provided. Limitations and the future directions of XAI in DDIs are also discussed.
Affiliation(s)
- Thanh Hoa Vo
- Master Program in Clinical Genomics and Proteomics, College of Pharmacy, Taipei Medical University, Taipei 110, Taiwan
- Ngan Thi Kim Nguyen
- School of Nutrition and Health Sciences, College of Nutrition, Taipei Medical University, Taipei 11031, Taiwan
- Quang Hien Kha
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
11
Wang S, Zhang H, Liu Z, Liu Y. A Novel Deep Learning Method to Predict Lung Cancer Long-Term Survival With Biological Knowledge Incorporated Gene Expression Images and Clinical Data. Front Genet 2022; 13:800853. [PMID: 35368657 PMCID: PMC8964372 DOI: 10.3389/fgene.2022.800853] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/24/2021] [Accepted: 02/01/2022] [Indexed: 01/22/2023] Open
Abstract
Lung cancer is the leading cause of cancer deaths. Therefore, predicting the survival status of lung cancer patients is of great value. However, the existing methods mainly depend on statistical machine learning (ML) algorithms, which are not appropriate for high-dimensional genomics data. Deep learning (DL), with its strong capability for learning from high-dimensional data, can be used to predict lung cancer survival using genomics data. The Cancer Genome Atlas (TCGA) is a database that contains many kinds of genomics data for 33 cancer types. With this enormous amount of data, researchers can analyze key factors related to cancer therapy. This paper proposes a novel method to predict lung cancer long-term survival using gene expression data from TCGA. Firstly, we select the genes most relevant to the target problem using a supervised feature selection method called the mutual information selector. Secondly, we propose a method to convert gene expression data into two kinds of images with KEGG BRITE and KEGG Pathway data incorporated, so that we can make good use of the convolutional neural network (CNN) model to learn high-level features. Afterwards, we design a CNN-based DL model and add two kinds of clinical data to improve the performance, yielding a multimodal DL model. The generalized experimental results indicate that our method performs much better than the ML models and unimodal DL models. Furthermore, we conduct survival analysis and observe that our model better divides the samples into high-risk and low-risk groups.
Affiliation(s)
- Shuo Wang
- College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Hao Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Zhen Liu
- College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China; Graduate School of Engineering, Nagasaki Institute of Applied Science, Nagasaki, Japan
- Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
12
Tang W, Wang J, Lin H, Zhao D, Xu B, Zhang Y, Yang Z. A syntactic information-based classification model for medical literature: algorithm development and validation study (Preprint). JMIR Med Inform 2022; 10:e37817. [PMID: 35917162 PMCID: PMC9382554 DOI: 10.2196/37817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/01/2022] [Accepted: 06/27/2022] [Indexed: 11/24/2022] Open
Abstract
Background: The ever-increasing volume of medical literature necessitates the classification of medical literature. Medical relation extraction is a typical method of classifying a large volume of medical literature. With the development of computing power, medical relation extraction models have evolved from rule-based models to neural network models. The single neural network model discards the shallow syntactic information while discarding the traditional rules. Therefore, we propose a syntactic information–based classification model that complements and equalizes syntactic information to enhance the model. Objective: We aim to complete a syntactic information–based relation extraction model for more efficient medical literature classification. Methods: We devised 2 methods for enhancing syntactic information in the model. First, we introduced shallow syntactic information into the convolutional neural network to enhance nonlocal syntactic interactions. Second, we devised a cross-domain pruning method to equalize local and nonlocal syntactic interactions. Results: We experimented with 3 data sets related to the classification of medical literature. The F1 values were 65.5% and 91.5% on the BioCreative ViCPR (CPR) and Phenotype-Gene Relationship data sets, respectively, and the accuracy was 88.7% on the PubMed data set. Our model outperforms the current state-of-the-art baseline model in the experiments. Conclusions: Our model based on syntactic information effectively enhances medical relation extraction. Furthermore, the results of the experiments show that shallow syntactic information helps obtain nonlocal interaction in sentences and effectively reinforces syntactic features. It also provides new ideas for future research directions.
Affiliation(s)
- Wentai Tang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Di Zhao
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Bo Xu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Yijia Zhang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
13
Conceição SIR, Couto FM. Text Mining for Building Biomedical Networks Using Cancer as a Case Study. Biomolecules 2021; 11:biom11101430. [PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 12/15/2022] Open
Abstract
In the assembly of biological networks, it is important to provide reliable interactions in an effort to obtain the most accurate possible representation of real-life systems. Commonly, the data used to build a network comes from diverse high-throughput assays; however, most of the interaction data is available through scientific literature. This has become a challenge with the notable increase in scientific literature being published, as it is hard for human curators to track all recent discoveries without efficient tools to help them identify these interactions automatically. This can be overcome by using text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important tasks in text mining for biological network building is relation extraction, which identifies relations between the entities of interest. Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility of personalizing the networks by selecting the desired relations. This review focuses on different approaches to automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as state-of-the-art deep learning approaches, using cancer as a case study.
14
Sousa D, Lamurias A, Couto FM. Using Neural Networks for Relation Extraction from Biomedical Literature. Methods Mol Biol 2021; 2190:289-305. [PMID: 32804372 DOI: 10.1007/978-1-0716-0826-5_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Using different sources of information to support automated extracting of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely, using neural networks algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been proved to enhance previous state-of-the-art results.
Affiliation(s)
- Diana Sousa
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal.
- Andre Lamurias
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Francisco M Couto
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
15
Sousa D, Lamurias A, Couto FM. A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing. Database (Oxford) 2020; 2020:baaa104. [PMID: 33258966 PMCID: PMC7706181 DOI: 10.1093/database/baaa104] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/17/2020] [Revised: 09/02/2020] [Accepted: 11/12/2020] [Indexed: 12/14/2022]
Abstract
Biomedical relation extraction (RE) datasets are vital in the construction of knowledge bases and to potentiate the discovery of new interactions. There are several ways to create biomedical RE datasets, some more reliable than others, such as resorting to domain expert annotations. However, the emerging use of crowdsourcing platforms, such as Amazon Mechanical Turk (MTurk), can potentially reduce the cost of RE dataset construction, even if the same level of quality cannot be guaranteed. Researchers have little control over who engages in crowdsourcing platforms, how, and in what context. Hence, combining distant supervision with crowdsourcing can be a more reliable alternative. The crowdsourcing workers would be asked only to rectify or discard already existing annotations, which would make the process less dependent on their ability to interpret complex biomedical sentences. In this work, we use a previously created distantly supervised human phenotype-gene relations (PGR) dataset to perform crowdsourcing validation. We divided the original dataset into two annotation tasks: Task 1, 70% of the dataset annotated by one worker, and Task 2, 30% of the dataset annotated by seven workers. Also, for Task 2, we added an extra on-site rater and a domain expert to further assess the crowdsourcing validation quality. Here, we describe a detailed pipeline for RE crowdsourcing validation, creating a new release of the PGR dataset with partial domain expert revision, and assess the quality of the MTurk platform. We applied the new dataset to two state-of-the-art deep learning systems (BiOnt and BioBERT) and compared its performance with the original PGR dataset, as well as combinations of the two, achieving a 0.3494 increase in average F-measure. The code supporting our work and the new release of the PGR dataset is available at https://github.com/lasigeBioTM/PGR-crowd.
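The two-task split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the random shuffle, the function names `split_tasks` and `majority_label`, and the use of a simple majority vote to combine the seven Task 2 judgements are all assumptions made for illustration, since the abstract does not state how the votes were aggregated.

```python
import random
from collections import Counter


def split_tasks(items, task1_fraction=0.7, seed=0):
    """Shuffle the items and split them into Task 1 and Task 2 portions.

    Assumption: a seeded random split; the abstract only gives the 70/30 ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * task1_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_label(votes):
    """Return the most frequent label among the votes for one annotation.

    Assumption: majority voting; with seven binary votes there are no ties.
    """
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical relation identifiers standing in for dataset entries.
relations = [f"relation_{i}" for i in range(10)]
task1, task2 = split_tasks(relations)

# Seven hypothetical worker judgements for a single Task 2 relation.
seven_votes = ["correct"] * 4 + ["incorrect"] * 3
decision = majority_label(seven_votes)
```

With an odd number of workers and two possible labels, a majority vote always yields a decision, which is one plausible reason to assign seven workers to each item.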
Affiliation(s)
- Diana Sousa
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Andre Lamurias
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Francisco M Couto
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
16
Ruas P, Lamurias A, Couto FM. Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature. J Cheminform 2020; 12:57. [PMID: 33430995 PMCID: PMC7507273 DOI: 10.1186/s13321-020-00461-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/18/2020] [Accepted: 09/12/2020] [Indexed: 12/03/2022]
Abstract
Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse. Findings: This work proposes a Named Entity Linking framework called Relation Extraction for Entity Linking (REEL) that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph, where the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose the candidate for each entity that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-Score achieved by REEL was 85.8%, 80.9% and 90.3% in these gold standards, respectively, outperforming baseline approaches. Conclusions: We demonstrated that RE tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as there is an ontology or other knowledge base available.
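The idea of scoring ontology candidates with Personalized PageRank over a disambiguation graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the REEL implementation: the node labels are hypothetical, the restart distribution is uniform, the power-iteration routine and the helper `choose_candidates` are written for this example, and the information-content term mentioned in the abstract is omitted.

```python
import numpy as np


def personalized_pagerank(adjacency, restart, damping=0.85, iterations=200):
    """Personalized PageRank by power iteration on an adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    restart = np.asarray(restart, dtype=float)
    restart = restart / restart.sum()
    degree = adjacency.sum(axis=1, keepdims=True)
    safe_degree = np.where(degree > 0, degree, 1.0)
    # Row-normalise; nodes without edges jump according to the restart vector.
    transition = np.where(degree > 0, adjacency / safe_degree, restart)
    rank = restart.copy()
    for _ in range(iterations):
        rank = (1.0 - damping) * restart + damping * (rank @ transition)
    return rank


def choose_candidates(nodes, scores):
    """Keep the highest-scoring candidate for every mention."""
    best = {}
    for (mention, candidate), score in zip(nodes, scores):
        if mention not in best or score > best[mention][1]:
            best[mention] = (candidate, score)
    return {mention: candidate for mention, (candidate, _) in best.items()}


# Hypothetical graph: mention A has two candidates, mention B has one.
nodes = [
    ("mention_A", "candidate_A1"),
    ("mention_A", "candidate_A2"),
    ("mention_B", "candidate_B1"),
]
# One extracted relation between A and B supports the edge A1 <-> B1.
adjacency = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
scores = personalized_pagerank(adjacency, restart=[1, 1, 1])
chosen = choose_candidates(nodes, scores)
```

In this toy graph the candidate connected through an extracted relation receives more rank than the isolated one, which is the effect the abstract attributes to adding relation-based edges to sparse graphs.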
Affiliation(s)
- Pedro Ruas
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisbon, Portugal.
- Andre Lamurias
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisbon, Portugal
- Francisco M Couto
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisbon, Portugal
17
Dhombres F, Charlet J. Design and Use of Semantic Resources: Findings from the Section on Knowledge Representation and Management of the 2020 International Medical Informatics Association Yearbook. Yearb Med Inform 2020; 29:163-168. [PMID: 32823311 PMCID: PMC7442529 DOI: 10.1055/s-0040-1702010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023]
Abstract
OBJECTIVE To select, present, and summarize the best papers in the field of Knowledge Representation and Management (KRM) published in 2019. METHODS A comprehensive and standardized review of the biomedical informatics literature was performed to select the most interesting papers of KRM published in 2019, based on PubMed and ISI Web Of Knowledge queries. RESULTS Four best papers were selected among 1,189 publications retrieved, following the usual International Medical Informatics Association Yearbook reviewing process. In 2019, research areas covered by pre-selected papers were represented by the design of semantic resources (methods, visualization, curation) and the application of semantic representations for the integration/enrichment of biomedical data. Besides new ontologies and sound methodological guidance to rethink knowledge bases design, we observed large scale applications, promising results for phenotypes characterization, semantic-aware machine learning solutions for biomedical data analysis, and semantic provenance information representations for scientific reproducibility evaluation. CONCLUSION In the KRM selection for 2019, research on knowledge representation demonstrated significant contributions both in the design and in the application of semantic resources. Semantic representations serve a great variety of applications across many medical domains, with actionable results.
Affiliation(s)
- Ferdinand Dhombres
  - Sorbonne Université, Université Paris Nord, INSERM, UMR_S 1142, LIMICS, Paris, France
  - Médecine Sorbonne Université, Service de Médecine Fœtale, Hôpital Armand Trousseau, Paris, France
- Jean Charlet
  - Sorbonne Université, Université Paris Nord, INSERM, UMR_S 1142, LIMICS, Paris, France
  - AP-HP, DRCI, Paris, France
18
Wu H, Xing Y, Ge W, Liu X, Zou J, Zhou C, Liao J. Drug-drug interaction extraction via hybrid neural networks on biomedical literature. J Biomed Inform 2020; 106:103432. [PMID: 32335223 DOI: 10.1016/j.jbi.2020.103432] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/15/2019] [Revised: 04/15/2020] [Accepted: 04/20/2020] [Indexed: 01/16/2023]
Abstract
Adverse events caused by drug-drug interaction (DDI) not only pose a serious threat to health, but also increase medical care expenditure. However, despite the emergence of many excellent text mining-based DDI classification methods, a satisfactory balance between method simplicity and model performance has not yet been achieved. In this article, we present a deep learning model, the stacked bidirectional Gated Recurrent Unit (GRU)-convolutional neural network (SGRU-CNN), which applies a stacked bidirectional GRU (BiGRU) network and a convolutional neural network (CNN) to lexical information and entity position information, respectively, to perform the DDI extraction task. Furthermore, the SGRU-CNN model assigns a weight to each word feature through an attentive pooling layer to improve performance. With its other metrics not inferior to those of other algorithms, experimental results on the DDI Extraction 2013 corpus show that our model achieves a 1.54% improvement in recall. The proposed SGRU-CNN model also reaches strong performance (F1-score: 0.75) with the fewest features, indicating a good balance between avoiding redundant preprocessing and achieving higher accuracy in relation extraction from biomedical literature.
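To make the attentive pooling step more concrete, the following NumPy sketch shows one common formulation: each token's feature vector is scored against an attention vector, the scores are normalised with a softmax, and the token features are summed with those weights. The exact layer used in SGRU-CNN is not given in the abstract, so the scoring function, the name `attentive_pooling`, and the toy inputs are assumptions.

```python
import numpy as np


def attentive_pooling(hidden_states, attention_vector):
    """Pool a (tokens, features) matrix into one vector with softmax weights."""
    hidden_states = np.asarray(hidden_states, dtype=float)
    attention_vector = np.asarray(attention_vector, dtype=float)
    # One relevance score per token (assumed form: tanh projection onto a vector).
    scores = np.tanh(hidden_states) @ attention_vector
    scores = scores - scores.max()  # stabilise the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ hidden_states  # weighted sum over tokens
    return pooled, weights


# Toy sentence of three tokens with two features each.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero attention vector scores every token equally, so pooling is a plain mean.
pooled_uniform, weights_uniform = attentive_pooling(hidden, [0.0, 0.0])

# A non-zero attention vector shifts weight toward tokens with a large first feature.
pooled_focused, weights_focused = attentive_pooling(hidden, [2.0, 0.0])
```

In a trained model the attention vector would be learned together with the rest of the network; here it is fixed by hand only to show how the weights change.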
Affiliation(s)
- Hong Wu
  - School of Science, China Pharmaceutical University, Nanjing, China
- Yan Xing
  - School of Science, China Pharmaceutical University, Nanjing, China
- Weihong Ge
  - Department of Pharmacy, Nanjing Drum Tower Hospital, Nanjing, China; School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
- Xiaoquan Liu
  - School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Jianjun Zou
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China; Department of Clinical Pharmacology, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Changjiang Zhou
  - School of Science, China Pharmaceutical University, Nanjing, China
- Jun Liao
  - School of Science, China Pharmaceutical University, Nanjing, China.
19
BiOnt: Deep Learning Using Multiple Biomedical Ontologies for Relation Extraction. Lecture Notes in Computer Science 2020. [PMCID: PMC7148040 DOI: 10.1007/978-3-030-45442-5_46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Successful biomedical relation extraction can provide evidence to researchers and clinicians about possible unknown associations between biomedical entities, advancing the current knowledge we have about those entities and their inherent mechanisms. Most biomedical relation extraction systems do not resort to external sources of knowledge, such as domain-specific ontologies. However, using deep learning methods along with biomedical ontologies has recently been shown to effectively advance the biomedical relation extraction field. To perform relation extraction, our deep learning system, BiOnt, employs four types of biomedical ontologies, namely, the Gene Ontology, the Human Phenotype Ontology, the Human Disease Ontology, and the Chemical Entities of Biological Interest, regarding gene-products, phenotypes, diseases, and chemical compounds, respectively. We tested our system with three data sets that represent three different types of relations between biomedical entities. BiOnt achieved, in F-score, an improvement of 4.93 percentage points for drug-drug interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations (PGR corpus), and 2.21 percentage points for chemical-induced disease relations (BC5CDR corpus), relative to the state of the art. The code supporting this system is available at https://github.com/lasigeBioTM/BiONT.
20
Ali F, El-Sappagh S, Kwak D. Fuzzy Ontology and LSTM-Based Text Mining: A Transportation Network Monitoring System for Assisting Travel. Sensors 2019; 19:s19020234. [PMID: 30634527 PMCID: PMC6358771 DOI: 10.3390/s19020234] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 11/08/2018] [Revised: 12/31/2018] [Accepted: 01/07/2019] [Indexed: 12/31/2022]
Abstract
Intelligent Transportation Systems (ITSs) utilize a sensor network-based system to gather and interpret traffic information. In addition, mobility users utilize mobile applications to collect transport information for safe traveling. However, these types of information are not sufficient to examine all aspects of the transportation networks. Therefore, both ITSs and mobility users need a smart approach and social media data, which can help ITSs examine transport services, support traffic and control management, and help mobility users travel safely. People utilize social networks to share their thoughts and opinions regarding transportation, which are useful for ITSs and travelers. However, user-generated text on social media is short, unstructured, and covers a broad range of dynamic topics. Recent Machine Learning (ML) approaches are inefficient at extracting relevant features from unstructured data, detecting the word polarity of features, and classifying the sentiment of features correctly. In addition, ML classifiers consistently miss the semantic features of word meaning. A novel fuzzy ontology-based semantic knowledge with Word2vec model is proposed to improve the task of transportation feature extraction and text classification using the Bi-directional Long Short-Term Memory (Bi-LSTM) approach. The proposed fuzzy ontology describes semantic knowledge about entities and features and their relations in the transportation domain. The fuzzy ontology and the smart methodology are developed in Web Ontology Language and Java, respectively. By utilizing word embedding with fuzzy ontology as a representation of text, Bi-LSTM shows satisfactory improvement in both the extraction of features and the classification of the unstructured text of social media.
Affiliation(s)
- Farman Ali
  - Department of Information and Communication Engineering, Inha University, Incheon 22212, Korea.
- Shaker El-Sappagh
  - Department of Information and Communication Engineering, Inha University, Incheon 22212, Korea.
  - Department of Information Systems, Benha University, Banha 13518, Egypt.
- Daehan Kwak
  - Department of Computer Science, Kean University, Union, NJ 07083, USA.