Reference Citation Analysis

For: Sung M, Jeong M, Choi Y, Kim D, Lee J, Kang J. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics 2022;38:4837-4839. [PMID: 36053172 PMCID: PMC9563680 DOI: 10.1093/bioinformatics/btac598] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/06/2022] [Revised: 07/09/2022] [Accepted: 08/31/2022] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Yang Y, Lu Y, Zheng Z, Wu H, Lin Y, Qian F, Yan W. MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer. Comput Struct Biotechnol J 2024;23:1339-1347. [PMID: 38585647 PMCID: PMC10995799 DOI: 10.1016/j.csbj.2024.03.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/24/2024] [Indexed: 04/09/2024] Open

Abstract

Over the past decade, information for precision disease medicine has accumulated in the form of textual data. To effectively utilize this expanding medical text, we proposed a multi-task learning-based framework based on hard parameter sharing for knowledge graph construction (MKG), and then used it to automatically extract gastric cancer (GC)-related biomedical knowledge from the literature and identify GC drug candidates. In MKG, we designed three separate modules, MT-BGIPN, MT-SGTF and MT-ScBERT, for entity recognition, entity normalization, and relation classification, respectively. To address the challenges posed by the long and irregular naming of medical entities, the MT-BGIPN utilized bidirectional gated recurrent unit and interactive pointer network techniques, significantly improving entity recognition accuracy to an average F1 value of 84.5% across datasets. In MT-SGTF, we employed the term frequency-inverse document frequency and the gated attention unit. These combine both semantic and characteristic features of entities, resulting in an average Hits@1 score of 94.5% across five datasets. The MT-ScBERT integrated cross-text, entity, and context features, yielding an average F1 value of 86.9% across 11 relation classification datasets. Based on the MKG, we then developed a specific knowledge graph for GC (MKG-GC), which encompasses a total of 9129 entities and 88,482 triplets. Lastly, the MKG-GC was used to predict potential GC drugs using a pre-trained language model called BioKGE-BERT and a drug-disease discriminant model based on CNN-BiLSTM. Remarkably, nine out of the top ten predicted drugs have been previously reported as effective for gastric cancer treatment. Finally, an online platform was created for exploration and visualization of MKG-GC at https://www.yanglab-mi.org.cn/MKG-GC/.

Jonker RAA, Almeida T, Antunes R, Almeida JR, Matos S. Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes. Database (Oxford) 2024;2024:baae068. [PMID: 39083461 PMCID: PMC11290360 DOI: 10.1093/database/baae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/15/2024] [Accepted: 07/08/2024] [Indexed: 08/02/2024]

Abstract

The identification of medical concepts from clinical narratives is of great interest to the biomedical scientific community due to its importance in treatment improvements or drug development research. Biomedical named entity recognition (NER) in clinical texts is crucial for automated information extraction, facilitating patient record analysis, drug development, and medical research. Traditional approaches often focus on single-class NER tasks, yet recent advancements emphasize the necessity of addressing multi-class scenarios, particularly in complex biomedical domains. This paper proposes a strategy to integrate a multi-head conditional random field (CRF) classifier for multi-class NER in Spanish clinical documents. Our methodology overcomes overlapping entity instances of different types, a common challenge in traditional NER methodologies, by using a multi-head CRF model. This architecture enhances computational efficiency and ensures scalability for multi-class NER tasks, maintaining high performance. By combining four diverse datasets, SympTEMIST, MedProcNER, DisTEMIST, and PharmaCoNER, we expand the scope of NER to encompass five classes: symptoms, procedures, diseases, chemicals, and proteins. To the best of our knowledge, these datasets combined create the largest Spanish multi-class dataset focusing on biomedical entity recognition and linking for clinical notes, which is important to train a biomedical model in Spanish. We also provide entity linking to the multi-lingual Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) vocabulary, with the eventual goal of performing biomedical relation extraction. Through experimentation and evaluation of Spanish clinical documents, our strategy provides competitive results against single-class NER models. For NER, our system achieves a combined micro-averaged F1-score of 78.73, with clinical mentions normalized to SNOMED CT with an end-to-end F1-score of 54.51.
The code to run our system is publicly available at https://github.com/ieeta-pt/Multi-Head-CRF. Database URL: https://github.com/ieeta-pt/Multi-Head-CRF.

Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Fröhlich H. Transformer models in biomedicine. BMC Med Inform Decis Mak 2024;24:214. [PMID: 39075407 PMCID: PMC11287876 DOI: 10.1186/s12911-024-02600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 07/08/2024] [Indexed: 07/31/2024] Open

Almeida T, Jonker RAA, Antunes R, Almeida JR, Matos S. Towards discovery: an end-to-end system for uncovering novel biomedical relations. Database (Oxford) 2024;2024:baae057. [PMID: 38994795 PMCID: PMC11240158 DOI: 10.1093/database/baae057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/20/2024] [Accepted: 06/19/2024] [Indexed: 07/13/2024]

Nédellec C, Sauvion C, Bossy R, Borovikova M, Deléger L. TaeC: A manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literature. PLoS One 2024;19:e0305475. [PMID: 38870159 PMCID: PMC11175518 DOI: 10.1371/journal.pone.0305475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 05/31/2024] [Indexed: 06/15/2024] Open

Abstract

Wheat varieties show a large diversity of traits and phenotypes. Linking them to genetic variability is essential for shorter and more efficient wheat breeding programs. A growing number of plant molecular information networks provide interlinked interoperable data to support the discovery of gene-phenotype interactions. A large body of scientific literature and observational data obtained in-field and under controlled conditions document wheat breeding experiments. The cross-referencing of this complementary information is essential. Text from databases and scientific publications has been identified early on as a relevant source of information. However, the wide variety of terms used to refer to traits and phenotype values makes it difficult to find and cross-reference the textual information, e.g. simple dictionary lookup methods miss relevant terms. Corpora with manually annotated examples are thus needed to evaluate and train textual information extraction methods. While several corpora contain annotations of human and animal phenotypes, no corpus is available for plant traits. This hinders the evaluation of text mining-based crop knowledge graphs (e.g. AgroLD, KnetMiner, WheatIS-FAIDARE) and limits the ability to train machine learning methods and improve the quality of information. The Triticum aestivum trait Corpus is a new gold standard for traits and phenotypes of wheat. It consists of 528 PubMed references that are fully annotated by trait, phenotype, and species. We address the interoperability challenge of crossing sparse assay data and publications by using the Wheat Trait and Phenotype Ontology to normalize trait mentions and the species taxonomy of the National Center for Biotechnology Information to normalize species. The paper describes the construction of the corpus. 
A study of the performance of state-of-the-art language models for both named entity recognition and linking tasks trained on the corpus shows that it is suitable for training and evaluation. This corpus is currently the most comprehensive manually annotated corpus for natural language processing studies on crop phenotype information from the literature.

Wang M, Vijayaraghavan A, Beck T, Posma JM. Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition. J Proteome Res 2024;23:1915-1925. [PMID: 38733346 PMCID: PMC11165580 DOI: 10.1021/acs.jproteome.3c00367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 01/30/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]

Huang DL, Zeng Q, Xiong Y, Liu S, Pang C, Xia M, Fang T, Ma Y, Qiang C, Zhang Y, Zhang Y, Li H, Yuan Y. A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature. Interdiscip Sci 2024;16:333-344. [PMID: 38340264 PMCID: PMC11289304 DOI: 10.1007/s12539-024-00605-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/12/2024]

Di Maria A, Bellomo L, Billeci F, Cardillo A, Alaimo S, Ferragina P, Ferro A, Pulvirenti A. NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph. Bioinformatics 2024;40:btae194. [PMID: 38597890 DOI: 10.1093/bioinformatics/btae194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/11/2024]

Valero-Rello A, Baeza-Delgado C, Andreu-Moreno I, Sanjuán R. Cellular receptors for mammalian viruses. PLoS Pathog 2024;20:e1012021. [PMID: 38377111 PMCID: PMC10906839 DOI: 10.1371/journal.ppat.1012021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 03/01/2024] [Accepted: 02/02/2024] [Indexed: 02/22/2024] Open

Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y) 2024;5:100887. [PMID: 38264716 PMCID: PMC10801236 DOI: 10.1016/j.patter.2023.100887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 01/25/2024]

Skoufos G, Kakoulidis P, Tastsoglou S, Zacharopoulou E, Kotsira V, Miliotis M, Mavromati G, Grigoriadis D, Zioga M, Velli A, Koutou I, Karagkouni D, Stavropoulos S, Kardaras F, Lifousi A, Vavalou E, Ovsepian A, Skoulakis A, Tasoulis S, Georgakopoulos S, Plagianakos V, Hatzigeorgiou A. TarBase-v9.0 extends experimentally supported miRNA-gene interactions to cell-types and virally encoded miRNAs. Nucleic Acids Res 2024;52:D304-D310. [PMID: 37986224 PMCID: PMC10767993 DOI: 10.1093/nar/gkad1071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/18/2023] [Accepted: 11/02/2023] [Indexed: 11/22/2023] Open

Affiliation(s)

Giorgos Skoufos: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Panos Kakoulidis: Dept. of Informatics and Telecommunications, National and Kapodistrian Univ. of Athens, Athens, Greece; Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
Spyros Tastsoglou: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Elissavet Zacharopoulou: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Vasiliki Kotsira: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Marios Miliotis: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Galatea Mavromati: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Dimitris Grigoriadis: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Maria Zioga: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Angeliki Velli: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Ioanna Koutou: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Dimitra Karagkouni: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Steve Stavropoulos: Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Filippos S Kardaras: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Anna Lifousi: Technical University of Denmark – Department of Health Technology, Copenhagen, Denmark
Eustathia Vavalou: Department of Biology, National and Kapodistrian University of Athens, 15784 Athens, Greece
Armen Ovsepian: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Anargyros Skoulakis: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece
Sotiris K Tasoulis: Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Spiros V Georgakopoulos: Department of Mathematics, University of Thessaly, Greece
Vassilis P Plagianakos: Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Artemis G Hatzigeorgiou: DIANA-Lab, Dept. of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens 11521, Greece

Cui C, Zhong B, Fan R, Cui Q. HMDD v4.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res 2024;52:D1327-D1332. [PMID: 37650649 PMCID: PMC10767894 DOI: 10.1093/nar/gkad717] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/12/2023] [Revised: 08/07/2023] [Accepted: 08/19/2023] [Indexed: 09/01/2023] Open

Collins C, Baker S, Brown J, Zheng H, Chan A, Stenius U, Narita M, Korhonen A. Text mining for contexts and relationships in cancer genomics literature. Bioinformatics 2024;40:btae021. [PMID: 38258418 PMCID: PMC10822582 DOI: 10.1093/bioinformatics/btae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 09/27/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open

Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT. arXiv 2023:arXiv:2308.06294v2. [PMID: 37986722 PMCID: PMC10659449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]

Garda S, Weber-Genzel L, Martin R, Leser U. BELB: a biomedical entity linking benchmark. Bioinformatics 2023;39:btad698. [PMID: 37975879 PMCID: PMC10681865 DOI: 10.1093/bioinformatics/btad698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 10/30/2023] [Accepted: 11/16/2023] [Indexed: 11/19/2023] Open

Wei CH, Luo L, Islamaj R, Lai PT, Lu Z. GNorm2: an improved gene name recognition and normalization system. Bioinformatics 2023;39:btad599. [PMID: 37878810 PMCID: PMC10612401 DOI: 10.1093/bioinformatics/btad599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 09/06/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open

Domingo-Fernández D, Gadiya Y, Mubeen S, Bollerman TJ, Healy MD, Chanana S, Sadovsky RG, Healey D, Colluru V. Modern drug discovery using ethnobotany: A large-scale cross-cultural analysis of traditional medicine reveals common therapeutic uses. iScience 2023;26:107729. [PMID: 37701812 PMCID: PMC10494464 DOI: 10.1016/j.isci.2023.107729] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/05/2023] [Revised: 08/08/2023] [Accepted: 08/22/2023] [Indexed: 09/14/2023] Open

Neves M, Klippert A, Knöspel F, Rudeck J, Stolz A, Ban Z, Becker M, Diederich K, Grune B, Kahnau P, Ohnesorge N, Pucher J, Schönfelder G, Bert B, Butzke D. Automatic classification of experimental models in biomedical literature to support searching for alternative methods to animal experiments. J Biomed Semantics 2023;14:13. [PMID: 37658458 PMCID: PMC10472567 DOI: 10.1186/s13326-023-00292-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 07/29/2023] [Indexed: 09/03/2023] Open

Abstract

Current animal protection laws require replacement of animal experiments with alternative methods, whenever such methods are suitable to reach the intended scientific objective. However, searching for alternative methods in the scientific literature is a time-consuming task that requires careful screening of an enormously large number of experimental biomedical publications. The identification of potentially relevant methods, e.g. organ or cell culture models, or computer simulations, can be supported with text mining tools specifically built for this purpose. Such tools are trained (or fine-tuned) on relevant data sets labeled by human experts. We developed the GoldHamster corpus, composed of 1,600 PubMed (Medline) articles (titles and abstracts), in which we manually identified the used experimental model according to a set of eight labels, namely: "in vivo", "organs", "primary cells", "immortal cell lines", "invertebrates", "humans", "in silico" and "other" (models). We recruited 13 annotators with expertise in the biomedical domain and assigned each article to two individuals. Four additional rounds of annotation aimed at improving the quality of the annotations with disagreements in the first round. Furthermore, we conducted various machine learning experiments based on supervised learning to evaluate the corpus for our classification task. We obtained more than 7,000 document-level annotations for the above labels. After the first round of annotation, the inter-annotator agreement (kappa coefficient) varied among labels, and ranged from 0.42 (for "others") to 0.82 (for "invertebrates"), with an overall score of 0.62. All disagreements were resolved in the subsequent rounds of annotation. The best-performing machine learning experiment used the PubMedBERT pre-trained model with fine-tuning to our corpus, which gained an overall f-score of 0.83.
We obtained a corpus with high agreement for all labels, and our evaluation demonstrated that our corpus is suitable for training reliable predictive models for automatic classification of biomedical literature according to the used experimental models. Our SMAFIRA - "Smart feature-based interactive" - search tool ( https://smafira.bf3r.de ) will employ this classifier for supporting the retrieval of alternative methods to animal experiments. The corpus is available for download ( https://doi.org/10.5281/zenodo.7152295 ), as well as the source code ( https://github.com/mariananeves/goldhamster ) and the model ( https://huggingface.co/SMAFIRA/goldhamster ).

Affiliation(s)

Mariana Neves: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Antonina Klippert: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany; Current affiliation: Nuvisan ICB GmbH, Müllerstraße 178, 13353, Berlin, Germany
Fanny Knöspel: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Juliane Rudeck: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Ailine Stolz: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Zsofia Ban: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Markus Becker: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Kai Diederich: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Barbara Grune: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Pia Kahnau: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Nils Ohnesorge: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Johannes Pucher: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Gilbert Schönfelder: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany; Institute of Clinical Pharmacology and Toxicology, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
Bettina Bert: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
Daniel Butzke: German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany

Jeong M, Kang J. Consistency enhancement of model prediction on document-level named entity recognition. Bioinformatics 2023;39:btad361. [PMID: 37261870 PMCID: PMC10272703 DOI: 10.1093/bioinformatics/btad361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 04/17/2023] [Accepted: 05/31/2023] [Indexed: 06/02/2023] Open

Sun Z, Tao C. Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria. IEEE International Conference on Healthcare Informatics 2023;2023:558-564. [PMID: 38283164 PMCID: PMC10815931 DOI: 10.1109/ichi57859.2023.00100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]

Abstract

Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from the eligibility criteria (EC) more efficiently, in order to facilitate semi-automatic patient recruitment and enable data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to the clinical trial data. Additionally, accurately performing NEN on entities that are negated using a negative prefix currently lacks established techniques. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation of the pipeline's performance revealed that it can successfully normalize named entities in clinical trial ECs with optimal accuracies.
The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component of the pipeline also demonstrated impressive performance, with customized rules and a combination of exact and partial match strategies leading to an accuracy of 0.940 for normalized entities.

Li M, Yang H, Liu Y. Biomedical named entity recognition based on fusion multi-features embedding. Technol Health Care 2023;31:111-121. [PMID: 37038786 DOI: 10.3233/thc-236011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]

Unifying the identification of biomedical entities with the Bioregistry. Sci Data 2022;9:714. [DOI: 10.1038/s41597-022-01807-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 10/26/2022] [Indexed: 11/21/2022] Open

Kim H, Sung M, Yoon W, Park S, Kang J. Full-text chemical identification with improved generalizability and tagging consistency. Database (Oxford) 2022;2022:6726385. [PMID: 36170114 PMCID: PMC9518746 DOI: 10.1093/database/baac074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 07/11/2022] [Accepted: 08/22/2022] [Indexed: 11/14/2022]