Reference Citation Analysis

For: Martínez-deMiguel C, Segura-Bedmar I, Chacón-Solano E, Guerrero-Aspizua S. The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms. J Biomed Inform 2021;125:103961. [PMID: 34879250 DOI: 10.1016/j.jbi.2021.103961] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/02/2021] [Revised: 11/08/2021] [Accepted: 11/22/2021] [Indexed: 11/26/2022]

Cited by Other Article(s)

Hu J, Fu J, Zhao W, Lou P, Feng M, Ren H, Feng S, Li Y, Fang A. Characterizing pituitary adenomas in clinical notes: Corpus construction and its application in LLMs. Health Informatics J 2024;30:14604582241291442. [PMID: 39379071 DOI: 10.1177/14604582241291442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]

Zelin C, Chung WK, Jeanne M, Zhang G, Weng C. Rare disease diagnosis using knowledge guided retrieval augmentation for ChatGPT. J Biomed Inform 2024;157:104702. [PMID: 39084480 PMCID: PMC11402564 DOI: 10.1016/j.jbi.2024.104702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/19/2024] [Accepted: 07/24/2024] [Indexed: 08/02/2024]

Shyr C, Hu Y, Bastarache L, Cheng A, Hamid R, Harris P, Xu H. Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models. J Healthc Inform Res 2024;8:438-461. [PMID: 38681753 PMCID: PMC11052982 DOI: 10.1007/s41666-023-00155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/24/2023] [Accepted: 11/13/2023] [Indexed: 05/01/2024]

Wang W, Zhao Z, Ning H. A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints. Sci Data 2024;11:482. [PMID: 38730023 PMCID: PMC11087536 DOI: 10.1038/s41597-024-03321-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/29/2024] [Indexed: 05/12/2024]

Zhang J, Xu W, Lei C, Pu Y, Zhang Y, Zhang J, Yu H, Su X, Huang Y, Gong R, Zhang L, Shi Q. Using Clinician-Patient WeChat Group Communication Data to Identify Symptom Burdens in Patients With Uterine Fibroids Under Focused Ultrasound Ablation Surgery Treatment: Qualitative Study. JMIR Form Res 2023;7:e43995. [PMID: 37656501 PMCID: PMC10504630 DOI: 10.2196/43995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 12/26/2022] [Accepted: 07/24/2023] [Indexed: 09/02/2023]

Abstract

BACKGROUND

Unlike research project-based health data collection (questionnaires and interviews), social media platforms allow patients to freely discuss their health status and obtain peer support. Previous literature has pointed out that both public and private social platforms can serve as data sources for analysis.

OBJECTIVE

This study aimed to use natural language processing (NLP) techniques to identify concerns regarding the postoperative quality of life and symptom burdens in patients with uterine fibroids after focused ultrasound ablation surgery.

METHODS

Screenshots taken from clinician-patient WeChat groups were converted into free texts using image text recognition technology and used as the research object of this study. From 408 patients diagnosed with uterine fibroids in Chongqing Haifu Hospital between 2010 and 2020, we searched for symptom burdens in over 900,000 words of WeChat group chats. We first built a corpus of symptoms by manually coding 30% of the WeChat texts and then used regular expressions in Python to crawl symptom information from the remaining texts based on this corpus. We compared the results with a manual review (gold standard) of the same records. Finally, we analyzed the relationship between the population baseline data and conceptual symptoms; quantitative and qualitative results were examined.
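The regular-expression step described above can be illustrated with a minimal sketch. It is not the authors' code: the symptom names and English patterns in `SYMPTOM_PATTERNS` are hypothetical stand-ins for the manually coded corpus (the original WeChat texts are not reproduced here), and `extract_symptoms` and `compare_with_gold` are illustrative helper names.

```python
import re

# Hypothetical symptom lexicon. In the study, the corpus was built by manually
# coding 30% of the WeChat texts; these English patterns are illustrative only.
SYMPTOM_PATTERNS = {
    "abdominal pain": r"\b(?:abdominal|stomach|belly)\s+(?:pain|ache|cramps?)\b",
    "vaginal discharge": r"\bvaginal\s+discharge\b",
    "fever": r"\b(?:fever|high\s+temperature)\b",
}

# Compile once so each message is scanned with ready-made pattern objects.
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in SYMPTOM_PATTERNS.items()
}


def extract_symptoms(text):
    """Return the set of symptom names whose pattern occurs in the text."""
    return {name for name, regex in COMPILED.items() if regex.search(text)}


def compare_with_gold(predicted, gold):
    """Split symptoms into matched, missed, and spurious relative to a manual review."""
    return {
        "matched": predicted & gold,
        "missed": gold - predicted,
        "spurious": predicted - gold,
    }


if __name__ == "__main__":
    message = "I have had belly cramps and a fever since Tuesday"
    found = extract_symptoms(message)
    # A reviewer's labels for the same message (made up for this example).
    manual = {"fever", "vaginal discharge"}
    print(found)
    print(compare_with_gold(found, manual))
```

Keeping the lexicon as a plain dictionary means new symptom expressions found during manual coding can be added without touching the matching logic.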

RESULTS

A total of 408 patients with uterine fibroids were included in the study; 190,000 words of free text were obtained after data cleaning. The mean age of the patients was 39.94 (SD 6.81) years, and their mean BMI was 22.18 (SD 2.78) kg/m². The median reporting times of the 7 major symptoms were 21, 26, 57, 2, 18, 30, and 49 days. Logistic regression models identified preoperative menstrual duration (odds ratio [OR] 1.14, 95% CI 5.86-6.37; P=.009), age of menophania (OR -1.02, 95% CI 11.96-13.47; P=.03), and the number (OR 2.34, 95% CI 1.45-1.83; P=.04) and size of fibroids (OR 0.12, 95% CI 2.43-3.51; P=.04) as significant risk factors for postoperative symptoms.

CONCLUSIONS

Unstructured free texts from social media platforms extracted by NLP technology can be used for analysis. By extracting the conceptual information about patients' health-related quality of life, we can adopt personalized treatment for patients at different stages of recovery to improve their quality of life. Python-based text mining of free-text data can accurately extract symptom burden and save considerable time compared to manual review, maximizing the utility of the extant information in population-based electronic health records for comparative effectiveness research.

Hens D, Wyers L, Claeys KG. Validation of an Artificial Intelligence driven framework to automatically detect red flag symptoms in screening for rare diseases in electronic health records: hereditary transthyretin amyloidosis polyneuropathy as a key example. J Peripher Nerv Syst 2023;28:79-85. [PMID: 36468607 DOI: 10.1111/jns.12523] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/08/2022] [Revised: 11/19/2022] [Accepted: 11/28/2022] [Indexed: 12/07/2022]

Zhang J, Xu W, Lei C, Pu Y, Zhang Y, Zhang J, Yu H, Su X, Huang Y, Gong R, Zhang L, Shi Q. Using WeChat clinician-patient group communication data to identify symptom burdens in patients with uterine fibroids under focused ultrasound ablation surgery treatment: Qualitative Study (Preprint). [DOI: 10.2196/preprints.43995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]

Abstract

BACKGROUND

Unlike research project-based health data collection (questionnaires, interviews), social media platforms allow patients to freely discuss their health status and obtain peer support. Previous literature has pointed out that both public and private social platforms can serve as data sources for analysis.

OBJECTIVE

This study aimed to use natural language processing (NLP) techniques to identify concerns regarding the postoperative quality of life and symptom burdens in patients with uterine fibroids after focused ultrasound ablation surgery.

METHODS

Screenshots taken from clinician-patient WeChat groups were converted into free text using image text recognition technology and used as the research object of this study. Regular expressions in Python were used to search for symptom burdens in over 900,000 words of WeChat group chats associated with 408 patients diagnosed with uterine fibroids at Chongqing Haifu Hospital between 2010 and 2020. We first built a corpus of symptoms by manually coding 30% of the WeChat texts, and then used regular expressions to crawl symptom information from the remaining texts based on this corpus. We compared the results with a manual review (gold standard) of the same records. We then analyzed the relationship between the population baseline data and conceptual symptoms; quantitative and qualitative results were examined.

RESULTS

A total of 190,000 words of free text from patients with uterine fibroids were obtained after data cleaning. A total of 408 patients were included in the study. The age of the patients was 39.94±6.81 years, and their BMI was 22.18±2.78 kg/m^2. The median reporting times of the seven major symptoms were 21, 26, 57, 2, 18, 30, and 49 days. Patients with dysmenorrhea were younger (mean 38.26 [SD 7.05], P=.004) and slimmer (mean 22.37 [SD 3.81], P=.04), with lower fertility and parity (P<.05), and tended to stay longer in the hospital (P<.05). Logistic regression models identified preoperative menstrual duration (OR 1.14, 95% CI 5.86-6.37; P=.009), age of menophania (OR -1.02, 95% CI 11.96-13.47; P=.03), and the number (OR 2.34, 95% CI 1.45-1.83; P=.04) and size of fibroids (OR 0.12, 95% CI 2.43-3.51; P=.04) as significant risk factors for postoperative symptoms.

CONCLUSIONS

Unstructured free texts from social media platforms extracted by NLP technology can be used for analysis. By extracting the conceptual information about patients' HRQoL, we can adopt personalized treatment for patients at different stages of recovery to improve their quality of life. Python-based text mining of free-text data can accurately extract symptom burden and save considerable time compared to manual review, maximizing the utility of the extant information in population-based electronic health records for comparative effectiveness research.

Segura-Bedmar I, Camino-Perdones D, Guerrero-Aspizua S. Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts. BMC Bioinformatics 2022;23:263. [PMID: 35794528 PMCID: PMC9258216 DOI: 10.1186/s12859-022-04810-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022]

Yates T, Lain A, Campbell J, FitzPatrick DR, Simpson TI. Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders. Database (Oxford) 2022;2022:baac038. [PMID: 35670729 PMCID: PMC9216525 DOI: 10.1093/database/baac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 03/26/2022] [Accepted: 05/25/2022] [Indexed: 11/24/2022]

Abstract

There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.
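The precision and recall figures quoted above compare automatically extracted Human Phenotype Ontology terms against a reference annotation. As a rough sketch of how such figures can be computed (not the authors' evaluation code), the function below pools counts across papers, that is, it micro-averages. The name `micro_precision_recall`, the paper identifiers, and the term sets are assumptions made for illustration.

```python
def micro_precision_recall(predicted, gold):
    """Micro-averaged precision and recall over per-paper term sets.

    predicted and gold both map a paper identifier to a set of term IDs.
    """
    true_pos = false_pos = false_neg = 0
    for paper_id in predicted.keys() | gold.keys():
        pred_terms = predicted.get(paper_id, set())
        gold_terms = gold.get(paper_id, set())
        true_pos += len(pred_terms & gold_terms)   # extracted and in the reference
        false_pos += len(pred_terms - gold_terms)  # extracted but not in the reference
        false_neg += len(gold_terms - pred_terms)  # in the reference but not extracted
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up term sets for two hypothetical papers; not data from the study.
    predicted = {
        "paper1": {"HP:0001250", "HP:0001263", "HP:0000252"},
        "paper2": {"HP:0001250"},
    }
    gold = {
        "paper1": {"HP:0001250", "HP:0001263"},
        "paper2": {"HP:0001250", "HP:0004322"},
    }
    print(micro_precision_recall(predicted, gold))
```

Micro-averaging weights every extracted term equally, so papers with many terms dominate the result; averaging per-paper scores instead would weight each paper equally.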