Reference Citation Analysis

For:	Cook HV, Jensen LJ. A Guide to Dictionary-Based Text Mining. Methods Mol Biol 2019;1939:73-89. [PMID: 30848457 DOI: 10.1007/978-1-4939-9089-4_5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]

Cited by Other Article(s)

Wu J, Dong H, Li Z, Wang H, Li R, Patra A, Dai C, Ali W, Scordis P, Wu H. A hybrid framework with large language models for rare disease phenotyping. BMC Med Inform Decis Mak 2024;24:289. [PMID: 39375687 PMCID: PMC11460004 DOI: 10.1186/s12911-024-02698-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 09/26/2024] [Indexed: 10/09/2024]

Abstract

PURPOSE

Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports.

METHODS

We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary. SemEHR, a dictionary-based NLP tool, is employed to extract rare disease mentions from clinical notes. To refine the results and improve accuracy, we leverage various LLMs, including LLaMA3, Phi3-mini, and domain-specific models like OpenBioLLM and BioMistral. Different prompting strategies, such as zero-shot, few-shot, and knowledge-augmented generation, are explored to optimize the LLMs' performance.

RESULTS

The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. LLaMA3 and Phi3-mini achieve the highest F1 scores in rare disease identification. Few-shot prompting with 1-3 examples yields the best results, while knowledge-augmented generation shows limited improvement. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients.

CONCLUSION

The hybrid approach combining dictionary-based NLP tools with LLMs shows great promise for improving rare disease identification from unstructured clinical reports. By leveraging the strengths of both techniques, the method demonstrates superior performance and the potential to uncover hidden rare disease cases. Further research is needed to address limitations related to ontology mapping and overlapping case identification, and to integrate the approach into clinical practice for early diagnosis and improved patient outcomes.
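The two-stage idea in this abstract (dictionary lookup first, LLM validation second) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the vocabulary, the `EX:` concept IDs, and the helper names `build_pattern`, `find_mentions`, and `few_shot_prompt` are all hypothetical, a plain regular expression stands in for SemEHR, and no LLM is actually called.

```python
import re

# Hypothetical toy vocabulary: surface form -> placeholder concept ID.
# A real system would load terms from ORDO/UMLS; these IDs are made up.
VOCAB = {
    "cystic fibrosis": "EX:0001",
    "huntington disease": "EX:0002",
    "fibrosis": "EX:0003",
}


def build_pattern(vocab):
    """Compile one case-insensitive regex, longest terms first."""
    terms = sorted(vocab, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_mentions(text, vocab=VOCAB):
    """Return (start, end, matched_text, concept_id) for each dictionary hit."""
    pattern = build_pattern(vocab)
    return [
        (m.start(), m.end(), m.group(0), vocab[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def few_shot_prompt(note, mention, examples):
    """Build a yes/no validation prompt.

    `examples` is a list of (sentence, mention, answer) triples; the
    prompt wording is an assumption, not taken from the paper.
    """
    lines = [
        "Does the sentence state that the patient has the named disease? "
        "Answer yes or no.",
        "",
    ]
    for sentence, ex_mention, answer in examples:
        lines += [f"Sentence: {sentence}", f"Disease: {ex_mention}",
                  f"Answer: {answer}", ""]
    lines += [f"Sentence: {note}", f"Disease: {mention}", "Answer:"]
    return "\n".join(lines)


note = "History of Cystic Fibrosis; no Huntington disease."
mentions = find_mentions(note)
# One worked example makes this a one-shot prompt for the second mention.
prompt = few_shot_prompt(
    note,
    mentions[1][2],
    [("Mother had cystic fibrosis.", "cystic fibrosis", "no")],
)
```

The dictionary stage flags both disease names, including the negated one; deciding that "no Huntington disease" is not a positive case is exactly the kind of judgment the abstract hands to the LLM stage.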

Barakat A, Munro G, Heegaard AM. Finding new analgesics: Computational pharmacology faces drug discovery challenges. Biochem Pharmacol 2024;222:116091. [PMID: 38412924 DOI: 10.1016/j.bcp.2024.116091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/10/2024] [Accepted: 02/23/2024] [Indexed: 02/29/2024]

Raveau MP, Goñi JI, Rodríguez JF, Paiva-Mack I, Barriga F, Hermosilla MP, Fuentes-Bravo C, Eyheramendy S. Natural language processing analysis of the psychosocial stressors of mental health disorders during the pandemic. NPJ Ment Health Res 2023;2:17. [PMID: 38609516 PMCID: PMC10955824 DOI: 10.1038/s44184-023-00039-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 09/21/2023] [Indexed: 04/14/2024]

Feng Z, Shen Z, Li H, Li S. e-TSN: an interactive visual exploration platform for target-disease knowledge mapping from literature. Brief Bioinform 2022;23:bbac465. [PMID: 36347537 PMCID: PMC9677481 DOI: 10.1093/bib/bbac465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 11/10/2022]

Singh G, Papoutsoglou EA, Keijts-Lalleman F, Vencheva B, Rice M, Visser RG, Bachem CW, Finkers R. Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait. BMC Plant Biol 2021;21:198. [PMID: 33894758 PMCID: PMC8070292 DOI: 10.1186/s12870-021-02943-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 03/29/2021] [Indexed: 06/12/2023]

Abstract

BACKGROUND

Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes.

RESULTS

We trained an NLP model on a manually annotated corpus of 34 full-text potato articles to recognize relevant biological entities (genes, proteins, metabolites and traits) and the relationships between them in text. This model detected biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles focusing on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), and determined that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) two years before that connection was stated explicitly in the literature.

CONCLUSIONS

Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.
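For readers comparing this abstract's precision and recall with the F1 scores quoted elsewhere on this page, the standard definitions can be sketched as below. The function names and the example counts are hypothetical; the abstract itself does not report an F1 value, so the last line only derives the harmonic mean of the two figures it does report.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from entity-level counts.

    tp: predicted entities that match the gold annotation
    fp: predicted entities with no gold match
    fn: gold entities the model missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical counts, chosen only to show the call.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)

# F1 implied by the reported precision (97.65%) and recall (88.91%).
implied_f1 = f1_from(0.9765, 0.8891)
```

With the reported figures, `implied_f1` comes out at roughly 0.93. Note that the abstract states these were measured on the training set, so they are not an estimate of performance on unseen text.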

Ji X, Tan W, Zhang C, Zhai Y, Hsueh Y, Zhang Z, Zhang C, Lu Y, Duan B, Tan G, Na R, Deng G, Niu G. TWIRLS, a knowledge-mining technology, suggests a possible mechanism for the pathological changes in the human host after coronavirus infection via ACE2. Drug Dev Res 2020;81:1004-1018. [PMID: 32657473 PMCID: PMC7404951 DOI: 10.1002/ddr.21717] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/15/2020] [Revised: 06/05/2020] [Accepted: 06/27/2020] [Indexed: 12/12/2022]

Abstract

Faced with the current large-scale public health emergency, collecting, sorting, and analyzing biomedical information related to SARS-CoV-2 should be done as quickly as possible to gain a global perspective, which is a basic requirement for strengthening epidemic control capacity. However, for human researchers studying viruses and hosts, the vast amount of information available cannot be processed effectively and in a timely manner, particularly if our scientific understanding is also limited, which further lowers the information processing efficiency. We present TWIRLS (Topic-wise inference engine of massive biomedical literatures), a method that can deal with various scientific problems, such as liver cancer, acute myeloid leukemia, and so forth, and that can automatically acquire, organize, and classify information. Additionally, this information can be combined with independent functional data sources to build an inference system via a machine-based approach, which can provide relevant knowledge to help human researchers quickly establish subject cognition and make more effective decisions. Using TWIRLS, we automatically analyzed more than three million words in more than 14,000 literature articles in only 4 hours. We found that an important regulatory factor, angiotensin-converting enzyme 2 (ACE2), may be involved in host pathological changes on binding to the coronavirus after infection. On triggering functional changes in ACE2/AT2R, the cytokine homeostasis regulation axis becomes imbalanced via the Renin-Angiotensin System and IP-10, leading to a cytokine storm. Through a preliminary analysis of blood indices of COVID-19 patients with a history of hypertension, we found that non-ARB (angiotensin II receptor blocker) users had more symptoms of severe illness than ARB users. This suggests ARBs could potentially be used to treat acute lung injury caused by coronavirus infection.

Affiliation(s)

Xiaoyang Ji: Key Laboratory of Animal Genetics, Breeding and Reproduction of Inner Mongolia Autonomous Region, College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China; Joint Turing‐Darwin Laboratory of Phil Rivers Technology Ltd. and Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Department of Computational Biology, Phil Rivers Technology Ltd, Beijing, China
Wenting Tan: Department of Infectious Diseases, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, China
Chunming Zhang: Joint Turing‐Darwin Laboratory of Phil Rivers Technology Ltd. and Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Department of Computational Biology, Phil Rivers Technology Ltd, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; West Institute of Computing Technology, Chinese Academy of Sciences, Chongqing, China
Yubo Zhai: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yiching Hsueh: Joint Turing‐Darwin Laboratory of Phil Rivers Technology Ltd. and Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Department of Computational Biology, Phil Rivers Technology Ltd, Beijing, China
Zhonghai Zhang: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chunli Zhang: Department of Computational Biology, Phil Rivers Technology Ltd, Beijing, China
Yanqiu Lu: Department of Infectious Diseases, Chongqing Public Health Medical Center, Chongqing, China
Bo Duan: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; West Institute of Computing Technology, Chinese Academy of Sciences, Chongqing, China
Guangming Tan: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; West Institute of Computing Technology, Chinese Academy of Sciences, Chongqing, China
Renhua Na: Key Laboratory of Animal Genetics, Breeding and Reproduction of Inner Mongolia Autonomous Region, College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
Guohong Deng: Department of Infectious Diseases, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, China
Gang Niu: Joint Turing‐Darwin Laboratory of Phil Rivers Technology Ltd. and Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Department of Computational Biology, Phil Rivers Technology Ltd, Beijing, China; West Institute of Computing Technology, Chinese Academy of Sciences, Chongqing, China

Clinical Named Entity Recognition from Chinese Electronic Medical Records Based on Deep Learning Pretraining. J Healthc Eng 2020;2020:8829219. [PMID: 33299537 PMCID: PMC7707942 DOI: 10.1155/2020/8829219] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/14/2020] [Revised: 10/26/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022]

Levin JM, Oprea TI, Davidovich S, Clozel T, Overington JP, Vanhaelen Q, Cantor CR, Bischof E, Zhavoronkov A. Artificial intelligence, drug repurposing and peer review. Nat Biotechnol 2020;38:1127-1131. [DOI: 10.1038/s41587-020-0686-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 12/18/2022]