1
Madrid-García A, Freites-Núñez D, Merino-Barbancho B, Pérez Sancristobal I, Rodríguez-Rodríguez L. Mapping two decades of research in rheumatology-specific journals: a topic modeling analysis with BERTopic. Ther Adv Musculoskelet Dis 2024; 16:1759720X241308037. [PMID: 39734395 PMCID: PMC11672599 DOI: 10.1177/1759720x241308037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 12/03/2024] [Indexed: 12/31/2024]
Abstract
Background Rheumatology has changed notably over the last decades. New drugs, including biologic agents and Janus kinase (JAK) inhibitors, have emerged. Concepts such as the window of opportunity, arthralgia suspicious for progression, and difficult-to-treat rheumatoid arthritis (RA) have appeared, and new management approaches and strategies such as treat-to-target have become popular. Statistical learning methods, gene therapy, telemedicine, and precision medicine are other advances that have gained relevance in the field. To better characterize the research landscape and advances in rheumatology, automatic and efficient approaches based on natural language processing (NLP) should be used.
Objectives The objective of this study is to use topic modeling (TM) techniques to uncover key topics and trends in rheumatology research conducted over the last 23 years.
Design Retrospective study.
Methods This study analyzed 96,004 abstracts published between 2000 and December 31, 2023, drawn from 34 specialized rheumatology journals and obtained from PubMed. BERTopic, a novel TM approach that considers semantic relationships among words and their context, was used to uncover topics. Up to 30 different models were trained. Based on the number of topics, the number of outliers, and the topic coherence score, two models were selected, and their topics were manually labeled by two rheumatologists. Word clouds and hierarchical clustering visualizations were computed. Finally, hot and cold trends were identified using linear regression models.
Results The two selected models classified the abstracts into 45 and 47 topics. The most frequent topics were RA, systemic lupus erythematosus, and osteoarthritis. Expected topics such as COVID-19 and JAK inhibitors were identified after conducting dynamic TM. Topics such as spinal surgery and bone fractures have gained relevance in recent years, whereas antiphospholipid syndrome and septic arthritis have lost momentum.
Conclusion Our study used advanced NLP techniques to analyze the rheumatology research landscape and identify key themes and emerging trends. The results highlight the dynamic and varied nature of rheumatology research, illustrating how interest in certain topics has shifted over time.
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Prof. Martin Lagos s/n, Madrid 28040, Spain
- Dalifer Freites-Núñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Madrid, Spain
- Beatriz Merino-Barbancho
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
- Inés Pérez Sancristobal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Madrid, Spain
- Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Madrid, Spain
2
Birkun AA. Title-based automated topic modelling of emergency medicine research outputs. Am J Emerg Med 2024:S0735-6757(24)00627-2. [PMID: 39592356 DOI: 10.1016/j.ajem.2024.11.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Accepted: 11/17/2024] [Indexed: 11/28/2024]
Affiliation(s)
- Alexei A Birkun
- Department of General Surgery, Anaesthesiology, Resuscitation and Emergency Medicine, Medical Institute named after S.I. Georgievsky of V.I. Vernadsky Crimean Federal University, Simferopol, Russian Federation.
3
Mapundu MT, Kabudula CW, Musenge E, Olago V, Celik T. Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing. PLoS One 2024; 19:e0308452. [PMID: 39298425 DOI: 10.1371/journal.pone.0308452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 07/24/2024] [Indexed: 09/21/2024]
Abstract
Verbal autopsy (VA) narratives play a crucial role in understanding and documenting the causes of mortality, especially in regions lacking robust medical infrastructure. In this study, we propose a comprehensive approach to extract mortality causes and identify prevalent diseases from VA narratives using advanced text mining techniques, so as to better understand the underlying health issues leading to mortality. Our methodology integrates n-gram-based language processing, Latent Dirichlet Allocation (LDA), and BERTopic, offering a multi-faceted analysis to enhance the accuracy and depth of information extraction. This is a retrospective study based on secondary data analysis. We used data from the Agincourt Health and Demographic Surveillance Site (HDSS), comprising 16,338 observations collected between 1993 and 2015. Our text mining steps comprised data acquisition, pre-processing, feature extraction, topic segmentation, and knowledge discovery. The results suggest that the HDSS population may have died from mortality causes such as vomiting, chest/stomach pain, fever, coughing, weight loss, low energy, and headache. Additionally, we found that the most prevalent diseases included human immunodeficiency virus (HIV), tuberculosis (TB), diarrhoea, cancer, neurological disorders, malaria, diabetes, high blood pressure, chronic ailments (kidney, heart, lung, liver), and maternal and accident-related deaths. This study is relevant in that it provides valuable insights into mortality causes and the most prevalent diseases using novel text mining approaches. These results can be integrated into the diagnosis pipeline to ease human annotation and interpretation. As such, this will help inform effective intervention programmes that can improve primary health care systems and chronic care delivery, thus increasing life expectancy.
Affiliation(s)
- Michael Tonderai Mapundu
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Chodziwadziwa Whiteson Kabudula
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), Johannesburg, South Africa
- Eustasius Musenge
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Victor Olago
- National Health Laboratory Service (NHLS), National Cancer Registry, Johannesburg, South Africa
- Turgay Celik
- Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa
- School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
4
Ozkara BB, Karabacak M, Margetis K, Smith W, Wintermark M, Yedavalli VS. Trends in stroke-related journals: Examination of publication patterns using topic modeling. J Stroke Cerebrovasc Dis 2024; 33:107665. [PMID: 38412931 DOI: 10.1016/j.jstrokecerebrovasdis.2024.107665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 01/15/2024] [Accepted: 02/24/2024] [Indexed: 02/29/2024]
Abstract
OBJECTIVES This study aims to demonstrate the capacity of natural language processing and topic modeling to manage and interpret the vast quantities of scholarly publications in stroke research. These tools can expedite the literature review process, reveal hidden themes, and track emerging research areas.
MATERIALS AND METHODS We reviewed and analyzed articles published in five prestigious stroke journals: Stroke, International Journal of Stroke, European Stroke Journal, Translational Stroke Research, and Journal of Stroke and Cerebrovascular Diseases. Document titles, abstracts, publication years, and citation counts were extracted from the Scopus database. BERTopic was chosen as the topic modeling technique, and current stroke research trends were identified using linear regression models. Python 3.1 was used to analyze and visualize the data.
RESULTS Of the 35,779 documents collected, 26,732 were classified into 30 categories and used for analysis. "Animal Models," "Rehabilitation," and "Reperfusion Therapy" were identified as the three most prevalent topics. Linear regression models identified "Emboli," "Medullary and Cerebellar Infarcts," and "Glucose Metabolism" as trending topics, whereas "Cerebral Venous Thrombosis," "Statins," and "Intracerebral Hemorrhage" demonstrated a weaker trend.
CONCLUSIONS The methodology can assist researchers, funders, and publishers by documenting the evolution and specialization of topics. The findings illustrate the significance of animal models, the expansion of rehabilitation research, and the centrality of reperfusion therapy. Limitations include a five-journal cap and a reliance on high-quality metadata.
Affiliation(s)
- Burak Berksu Ozkara
- Department of Neuroradiology, MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX, 77030, USA
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Avenue, New York, NY, 10029, USA
- Konstantinos Margetis
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Avenue, New York, NY, 10029, USA
- Wade Smith
- Department of Neurology, University of California San Francisco, 505 Parnassus Avenue, San Francisco, CA, 94143, USA
- Max Wintermark
- Department of Neuroradiology, MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX, 77030, USA
- Vivek Srikar Yedavalli
- Department of Radiology and Radiological Sciences, Johns Hopkins School of Medicine, 600 N Wolfe Street, Baltimore, MD, 21287, USA.
5
Karabacak M, Jain A, Jagtiani P, Hickman ZL, Dams-O'Connor K, Margetis K. Exploiting Natural Language Processing to Unveil Topics and Trends of Traumatic Brain Injury Research. Neurotrauma Rep 2024; 5:203-214. [PMID: 38463422 PMCID: PMC10924051 DOI: 10.1089/neur.2023.0102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Traumatic brain injury (TBI) has evolved from a topic of relative obscurity to one of widespread scientific and lay interest. The scope and focus of TBI research have shifted, and research trends have changed in response to public and scientific interest. This study has two primary goals: first, to identify the predominant themes in TBI research; and second, to delineate "hot" and "cold" areas of interest by evaluating the current popularity or decline of these topics. Hot topics may be dwarfed in absolute numbers by other, larger TBI research areas but are rapidly gaining interest. Conversely, cold topics may present opportunities for researchers to revisit unanswered questions. We used BERTopic, an advanced natural language processing (NLP)-based technique, to analyze TBI research articles published since 1990. This approach identified key topics by extracting sets of distinctive keywords representative of each article's core themes. Using these topics' probabilities, we trained linear regression models to detect trends over time, recognizing topics that were gaining (hot) or losing (cold) relevance. Additionally, we conducted a specific analysis of the trends observed in TBI research in the current decade (the 2020s). Our topic modeling analysis categorized 42,422 articles into 27 distinct topics. The 10 most frequent topics were "Rehabilitation," "Molecular Mechanisms of TBI," "Concussion," "Repetitive Head Impacts," "Surgical Interventions," "Biomarkers," "Intracranial Pressure," "Posttraumatic Neurodegeneration," "Chronic Traumatic Encephalopathy," and "Blast Induced TBI." Our trend analysis indicated that the hottest topics of the current decade were "Genomics," "Sex Hormones," and "Diffusion Tensor Imaging," whereas the cooling topics were "Posttraumatic Sleep," "Sensory Functions," and "Hyperosmolar Therapies."
This study highlights the dynamic nature of TBI research and underscores the shifting emphasis within the field. The findings from our analysis can aid in the identification of emerging topics of interest and areas where there is little new research reported. By utilizing NLP to effectively synthesize and analyze an extensive collection of TBI-related scholarly literature, we demonstrate the potential of machine learning techniques in understanding and guiding future research prospects. This approach sets the stage for similar analyses in other medical disciplines, offering profound insights and opportunities for further exploration.
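The rheumatology, stroke, and TBI abstracts above all label topics "hot" or "cold" by fitting linear regression models over time. A minimal sketch of that idea follows, assuming each document already has one topic label (for example from BERTopic, which labels outlier documents as topic -1) and a publication year. It computes each topic's yearly share of documents and ranks topics by the ordinary-least-squares slope of that share. This is a simplification: the TBI study regressed on topic probabilities rather than hard labels, no significance testing is done here, and the function names are hypothetical.

```python
from collections import defaultdict


def yearly_topic_shares(doc_years, doc_topics, outlier=-1):
    """Return (sorted years, {topic: [share of each year's documents]})."""
    counts = defaultdict(lambda: defaultdict(int))  # topic -> year -> n docs
    totals = defaultdict(int)                       # year -> n non-outlier docs
    for year, topic in zip(doc_years, doc_topics):
        if topic == outlier:  # assumption: outliers are excluded from the shares
            continue
        counts[topic][year] += 1
        totals[year] += 1
    years = sorted(totals)
    # Years in which a topic has no documents contribute a share of 0.
    shares = {t: [counts[t][y] / totals[y] for y in years] for t in counts}
    return years, shares


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def hot_and_cold(doc_years, doc_topics, k=3):
    """Top-k rising (positive slope) and top-k falling (negative slope) topics."""
    years, shares = yearly_topic_shares(doc_years, doc_topics)
    slopes = {t: ols_slope(years, s) for t, s in shares.items()}
    ranked = sorted(slopes, key=slopes.get, reverse=True)
    hot = [t for t in ranked[:k] if slopes[t] > 0]
    cold = [t for t in reversed(ranked[-k:]) if slopes[t] < 0]
    return hot, cold, slopes


if __name__ == "__main__":
    # Toy data: topic 1 grows over three years while topic 0 shrinks.
    years = [2000, 2000, 2001, 2001, 2002, 2002]
    topics = [0, 1, 0, 1, 1, 1]
    hot, cold, slopes = hot_and_cold(years, topics)
    print("hot:", hot, "cold:", cold, "slopes:", slopes)
```

Using each topic's share of a year's output, rather than raw counts, keeps overall growth in publication volume from making every topic look "hot".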
Affiliation(s)
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, New York, USA
- Ankita Jain
- School of Medicine, New York Medical College, Valhalla, New York, USA
- Pemla Jagtiani
- School of Medicine, SUNY Downstate Health Sciences University, New York, New York, USA
- Zachary L. Hickman
- Department of Neurosurgery, Mount Sinai Health System, New York, New York, USA
- Department of Neurosurgery, NYC Health + Hospitals/Elmhurst, New York, New York, USA
- Kristen Dams-O'Connor
- Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, New York, USA