1
Ke Y, Yang R, Liu N. Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study. J Med Internet Res 2024; 26:e48330. [PMID: 38630522 DOI: 10.2196/48330] [Received: 04/19/2023] [Revised: 08/01/2023] [Accepted: 01/14/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND Intensive care research has predominantly relied on conventional methods such as randomized controlled trials. However, the growing popularity of free, open-access databases over the past decade has opened new avenues for research and offered fresh insights. Machine learning (ML) techniques make it possible to analyze trends across a vast number of studies. OBJECTIVE This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and in studies using open-access databases (OADs). METHODS We used ML to analyze publications indexed in the Web of Science database. Articles were categorized as "OAD" or "traditional intensive care" (TIC) studies. OAD studies were those using the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), or Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection (UMAP) was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 unique topic identifiers and to group the topics into 22 topic families. RESULTS A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC studies and 1301 as OAD studies. TIC studies grew exponentially over the last 2 decades, peaking at 16,378 articles in 2021, while OAD studies have increased steadily since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies covered a broader range of topics than OAD studies, suggesting a more extensive research scope. CONCLUSIONS This study analyzed ICU research, providing insights drawn from a large number of publications.
OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis.
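The abstract splits articles into OAD and TIC studies according to whether they used one of five named databases. A minimal sketch of such a rule-based split, assuming simple regular-expression matching on titles and abstracts (the patterns and the `classify` helper are hypothetical; the study's actual matching procedure is not described here):

```python
import re

# Hypothetical matching rules: the abstract names the five databases but not how
# articles were matched to them. MIMIC is matched case-sensitively so that the
# ordinary verb "mimic" does not trigger a false positive.
OAD_PATTERNS = [
    re.compile(r"\bMIMIC(?:-(?:II|III|IV))?\b"),
    re.compile(r"\beICU\b", re.IGNORECASE),
    re.compile(r"\bAmsterdamUMCdb\b", re.IGNORECASE),
    re.compile(r"\bHiRID\b", re.IGNORECASE),
    re.compile(r"\bPa?ediatric Intensive Care database\b", re.IGNORECASE),
]


def classify(record_text: str) -> str:
    """Label a record "OAD" if it mentions any of the five databases, else "TIC"."""
    if any(p.search(record_text) for p in OAD_PATTERNS):
        return "OAD"
    return "TIC"


records = [
    "Mortality prediction using MIMIC-IV",
    "Data from the eICU Collaborative Research Database",
    "A randomized trial of prone positioning in ARDS",
    "Neural networks that mimic clinician reasoning",
]
print([classify(r) for r in records])
# ['OAD', 'OAD', 'TIC', 'TIC']
```

Keyword rules like these trade accuracy for simplicity: an all-caps title containing the verb "MIMIC" would still be misclassified, so a real pipeline would need manual checks.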
Affiliation(s)
- Yuhe Ke
- Division of Anesthesiology and Perioperative Medicine, Singapore General Hospital, Singapore, Singapore
- Rui Yang
- Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
2
Chen P, Jin Y, Ma X, Lin Y. Public perception on active aging after COVID-19: an unsupervised machine learning analysis of 44,343 posts. Front Public Health 2024; 12:1329704. [PMID: 38515596 PMCID: PMC10956692 DOI: 10.3389/fpubh.2024.1329704] [Received: 10/29/2023] [Accepted: 02/27/2024] [Indexed: 03/23/2024]
Abstract
Introduction This study analyzed public perceptions of active aging in China on mainstream social media platforms to determine whether the "14th Five Year Plan for the Development of the Aging Career and Older Adult Care System" issued by the CPC in 2022 fully addresses public needs. Methods Original posts published on Weibo between January 1, 2020, and June 30, 2022, containing the words "aging" or "old age" were extracted. A bidirectional encoder representations from transformers (BERT)-based model was used to generate themes related to these perceptions. The researchers conducted a qualitative thematic analysis and an independent review of the theme labels. Results Public perceptions revolved around four themes: (1) health prevention and protection, (2) convenient living environments, (3) cognitive health and social integration, and (4) protecting the rights and interests of older adults. Discussion Although the Plan aligns with most of these themes, it lacks clear planning for financial security and marital life.
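The extraction step described under Methods amounts to a date-window and keyword filter. The sketch below assumes a hypothetical `Post` record and uses English stand-ins for the search terms. The original Weibo posts are in Chinese, where word-boundary matching such as `\b` does not work the same way, so this illustrates the filtering logic only:

```python
import re
from dataclasses import dataclass
from datetime import date


# Hypothetical record type; the study's actual data schema is not described.
@dataclass
class Post:
    text: str
    posted: date
    is_original: bool  # False for reposts; only original posts were kept


# English stand-ins for the search terms (assumption: the Chinese queries are not given).
KEYWORDS = re.compile(r"\b(?:aging|old age)\b", re.IGNORECASE)
START, END = date(2020, 1, 1), date(2022, 6, 30)


def select_posts(posts: list[Post]) -> list[Post]:
    """Keep original posts inside the study window that mention a keyword."""
    return [
        p for p in posts
        if p.is_original and START <= p.posted <= END and KEYWORDS.search(p.text)
    ]


posts = [
    Post("Active aging starts with prevention", date(2021, 3, 2), True),
    Post("Brain imaging advances", date(2021, 3, 2), True),   # "imaging" is not "aging"
    Post("Care in old age", date(2019, 12, 31), True),        # outside the window
    Post("Repost: aging society", date(2021, 5, 1), False),   # not an original post
]
print([p.text for p in select_posts(posts)])
# ['Active aging starts with prevention']
```

The word boundaries keep "imaging" from matching "aging", which a plain substring test would not.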
Affiliation(s)
- Yan Lin
- School of Foreign Language Studies, Wenzhou Medical University, Wenzhou, Zhejiang, China
3
Abd-Alrazaq A, Nashwan AJ, Shah Z, Abujaber A, Alhuwail D, Schneider J, AlSaad R, Ali H, Alomoush W, Ahmed A, Aziz S. Machine Learning-Based Approach for Identifying Research Gaps: COVID-19 as a Case Study. JMIR Form Res 2024; 8:e49411. [PMID: 38441952 PMCID: PMC10916961 DOI: 10.2196/49411] [Received: 05/28/2023] [Revised: 11/14/2023] [Accepted: 02/06/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND Research gaps refer to unanswered questions in the existing body of knowledge, arising either from a lack of studies or from inconclusive results. Research gaps are essential starting points and motivation for scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinion, can be time-consuming, labor-intensive, and prone to bias. They may also fall short for rapidly evolving or time-sensitive subjects. Innovative, scalable approaches are therefore needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in a topic of interest. OBJECTIVE In this paper, we propose a machine learning-based approach for identifying research gaps through the analysis of scientific literature, using the COVID-19 pandemic as a case study. METHODS We analyzed COVID-19 literature for research gaps using the COVID-19 Open Research Dataset (CORD-19), which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency (c-TF-IDF) to create dense clusters that yield easily interpretable topics. The BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance). RESULTS After applying the study selection criteria, we included 33,206 abstracts in the analysis. The final list of research gaps covered 21 areas, which were grouped into 6 principal topics: "virus of COVID-19," "risk factors of COVID-19," "prevention of COVID-19," "treatment of COVID-19," "health care delivery during COVID-19," and "impact of COVID-19." The most prominent topic, observed in over half of the analyzed studies, was "impact of COVID-19."
CONCLUSIONS The proposed machine learning-based approach has the potential to identify research gaps in scientific literature. It is not intended to replace a conventional literature review within a selected topic. Instead, it can serve as a guide for formulating precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should use an up-to-date list of studies retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than abstracts alone. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.
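BERTopic's topic representation rests on class-based TF-IDF (c-TF-IDF), which scores words per cluster rather than per document. A minimal pure-Python sketch, assuming documents have already been embedded and clustered (the embedding, dimension-reduction, and clustering stages are skipped) and following the weighting described in the BERTopic documentation as we understand it. `c_tf_idf_keywords` is a hypothetical helper; the library's own implementation differs in tokenization and options:

```python
import math
from collections import Counter


def c_tf_idf_keywords(docs_by_topic: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Rank words per topic with a class-based TF-IDF score.

    All documents of a topic are treated as one class document; the weight of
    word w in class c is tf(w, c) * log(1 + A / f(w)), where tf is the
    L1-normalized frequency in c, f(w) is the frequency of w across all classes,
    and A is the average number of words per class.
    """
    # Term counts per class (naive whitespace tokenization).
    class_counts = {
        topic: Counter(w for doc in docs for w in doc.lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f(w): frequency of each word across all classes.
    overall = Counter()
    for counts in class_counts.values():
        overall.update(counts)
    # A: average number of words per class.
    avg_words = sum(overall.values()) / len(class_counts)

    keywords = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        scores = {
            w: (c / n_words) * math.log(1 + avg_words / overall[w])
            for w, c in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[topic] = ranked[:top_n]
    return keywords


clusters = {
    0: ["covid vaccine dose", "covid vaccine"],
    1: ["covid icu ventilation", "covid icu"],
}
print(c_tf_idf_keywords(clusters, top_n=2))
# {0: ['vaccine', 'dose'], 1: ['icu', 'ventilation']}
```

Here "covid" occurs as often as "vaccine" in cluster 0, but because it is spread across both clusters it receives a lower weight, which is what makes c-TF-IDF topic labels distinctive.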
Affiliation(s)
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Zubair Shah
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Ahmad Abujaber
- Nursing Department, Hamad Medical Corporation, Doha, Qatar
- Dari Alhuwail
- Information Science Department, College of Life Sciences, Kuwait University, Kuwait, Kuwait
- Health Informatics Unit, Dasman Diabetes Institute, Kuwait, Kuwait
- Jens Schneider
- Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Rawan AlSaad
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Hazrat Ali
- Faculty of Computing and Information Technology, Sohar University, Sohar, Oman
- Waleed Alomoush
- School of Information Technology, Skyline University College, Sharjah, United Arab Emirates
- Arfan Ahmed
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Sarah Aziz
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
4
Ozkara BB, Karabacak M, Margetis K, Smith W, Wintermark M, Yedavalli VS. Trends in stroke-related journals: Examination of publication patterns using topic modeling. J Stroke Cerebrovasc Dis 2024; 33:107665. [PMID: 38412931 DOI: 10.1016/j.jstrokecerebrovasdis.2024.107665] [Received: 09/06/2023] [Revised: 01/15/2024] [Accepted: 02/24/2024] [Indexed: 02/29/2024]
Abstract
OBJECTIVES This study aims to demonstrate the capacity of natural language processing and topic modeling to manage and interpret the vast quantity of scholarly publications in stroke research. These tools can expedite the literature review process, reveal hidden themes, and track emerging research areas. MATERIALS AND METHODS We reviewed and analyzed articles published in five prestigious stroke journals: Stroke, International Journal of Stroke, European Stroke Journal, Translational Stroke Research, and Journal of Stroke and Cerebrovascular Diseases. Document titles, abstracts, publication years, and citation counts were extracted from the Scopus database. BERTopic was chosen as the topic modeling technique, and linear regression models were used to identify current trends in stroke research. Python 3.1 was used to analyze and visualize the data. RESULTS Of the 35,779 documents collected, 26,732 were classified into 30 categories and used for analysis. "Animal Models," "Rehabilitation," and "Reperfusion Therapy" were the three most prevalent topics. Linear regression models identified "Emboli," "Medullary and Cerebellar Infarcts," and "Glucose Metabolism" as trending topics, whereas "Cerebral Venous Thrombosis," "Statins," and "Intracerebral Hemorrhage" showed weaker trends. CONCLUSIONS The methodology can assist researchers, funders, and publishers by documenting the evolution and specialization of topics. The findings illustrate the significance of animal models, the expansion of rehabilitation research, and the centrality of reperfusion therapy. Limitations include the restriction to five journals and reliance on high-quality metadata.
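The abstract does not specify how the regression models were set up. One plausible reading, sketched here purely as an assumption, regresses each topic's yearly share of documents on publication year and treats the slope as the trend. The helper names are hypothetical, and the counts below are made up for illustration; they are not the study's data:

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def topic_trends(counts_by_year: dict[int, dict[str, int]]) -> dict[str, float]:
    """Slope of each topic's yearly share of documents (positive = rising)."""
    years = sorted(counts_by_year)
    topics = {t for counts in counts_by_year.values() for t in counts}
    trends = {}
    for topic in topics:
        shares = [
            counts_by_year[y].get(topic, 0) / sum(counts_by_year[y].values())
            for y in years
        ]
        trends[topic] = ols_slope(years, shares)
    return trends


# Illustrative counts for two hypothetical topics over three years.
counts = {
    2019: {"topic_a": 10, "topic_b": 30},
    2020: {"topic_a": 20, "topic_b": 20},
    2021: {"topic_a": 30, "topic_b": 10},
}
trends = topic_trends(counts)
print(sorted(trends, key=trends.get, reverse=True))
# ['topic_a', 'topic_b']
```

Using yearly shares rather than raw counts keeps overall growth in publication volume from making every topic look like it is trending.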
Affiliation(s)
- Burak Berksu Ozkara
- Department of Neuroradiology, MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX, 77030, USA
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Avenue, New York, NY, 10029, USA
- Konstantinos Margetis
- Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Avenue, New York, NY, 10029, USA
- Wade Smith
- Department of Neurology, University of California San Francisco, 505 Parnassus Avenue, San Francisco, CA, 94143, USA
- Max Wintermark
- Department of Neuroradiology, MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX, 77030, USA
- Vivek Srikar Yedavalli
- Department of Radiology and Radiological Sciences, Johns Hopkins School of Medicine, 600 N Wolfe Street, Baltimore, MD, 21287, USA
5
Lovis C, Escobar M, Stukel TA, Austin PC, Jaakkimainen L. Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study. JMIR Med Inform 2022; 10:e40102. [PMID: 36534443 PMCID: PMC9808604 DOI: 10.2196/40102] [Received: 06/06/2022] [Revised: 09/01/2022] [Accepted: 09/18/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large, unstructured document collections. OBJECTIVE We aimed to comparatively evaluate several methods for estimating temporal topic models using clinical notes obtained from primary care electronic medical records in Ontario, Canada. METHODS We used a retrospective closed cohort design. The study period ran from January 1, 2011, through December 31, 2015, and was discretized into 20 quarterly periods. Patients were included if they generated at least 1 primary care clinical note in each of the 20 quarterly periods; they therefore represented a unique cohort of individuals with high-frequency use of the primary care system. The following temporal topic modeling algorithms were fitted to the clinical note corpus: nonnegative matrix factorization, latent Dirichlet allocation, the structural topic model, and the BERTopic model. RESULTS Temporal topic models consistently identified latent topical patterns in the clinical note corpus. The learned topical bases identified meaningful activities conducted by the primary health care system. Latent topics displaying near-constant temporal dynamics were consistently estimated across models (eg, pain, hypertension, diabetes, sleep, mood, anxiety, and depression). Several topics displayed predictable seasonal patterns over the study period (eg, respiratory disease and influenza immunization programs).
CONCLUSIONS Nonnegative matrix factorization, latent Dirichlet allocation, structural topic model, and BERTopic are based on different underlying statistical frameworks (eg, linear algebra and optimization, Bayesian graphical models, and neural embeddings), require tuning unique hyperparameters (optimizers, priors, etc), and have distinct computational requirements (data structures, computational hardware, etc). Despite the heterogeneity in statistical methodology, the learned latent topical summarizations and their temporal evolution over the study period were consistently estimated. Temporal topic models represent an interesting class of models for characterizing and monitoring the primary health care system.
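The closed-cohort inclusion rule (at least 1 note in each of the 20 quarters from 2011 through 2015) can be made concrete as follows. This is a sketch assuming notes arrive as `(patient_id, note_date)` pairs; the function names are hypothetical and the study's actual data pipeline is not described:

```python
from datetime import date

START_YEAR = 2011
N_QUARTERS = 20  # 2011 Q1 through 2015 Q4


def quarter_index(d: date) -> int:
    """Map a date to its study quarter: 0 = 2011 Q1, ..., 19 = 2015 Q4."""
    return (d.year - START_YEAR) * 4 + (d.month - 1) // 3


def closed_cohort(notes: list[tuple[str, date]]) -> set[str]:
    """Return patients with at least one note in every one of the 20 quarters."""
    quarters_seen: dict[str, set[int]] = {}
    for patient_id, note_date in notes:
        q = quarter_index(note_date)
        if 0 <= q < N_QUARTERS:  # ignore notes outside the study period
            quarters_seen.setdefault(patient_id, set()).add(q)
    return {pid for pid, qs in quarters_seen.items() if len(qs) == N_QUARTERS}


# Hypothetical notes: patient "p1" has a visit every quarter, "p2" misses one.
notes = [("p1", date(2011 + q // 4, 3 * (q % 4) + 1, 15)) for q in range(20)]
notes += [("p2", date(2011 + q // 4, 3 * (q % 4) + 1, 15)) for q in range(20) if q != 7]
print(closed_cohort(notes))
# {'p1'}
```

The same quarter index can then serve as the time axis when per-quarter topic proportions are tracked across the study period.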
Affiliation(s)
- Michael Escobar
- Dalla Lana School of Public Health, Division of Biostatistics, University of Toronto, Toronto, ON, Canada
- Therese A Stukel
- ICES, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Peter C Austin
- ICES, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Liisa Jaakkimainen
- Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
- ICES, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
6
Egger R, Yu J. A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Front Sociol 2022; 7:886498. [PMID: 35602001 PMCID: PMC9120935 DOI: 10.3389/fsoc.2022.886498] [Received: 02/28/2022] [Accepted: 04/19/2022] [Indexed: 05/28/2023]
Abstract
The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often poses methodological challenges in both data collection and analysis. To bridge the developing field of computational science and empirical social research, this study evaluates the performance of four topic modeling techniques: latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the strengths and weaknesses of each algorithm in a social science context. Based on observations made during the analytical procedures and on the quality of the resulting topics, this research highlights the efficacy of BERTopic and NMF for analyzing Twitter data.
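The abstract does not say which quantitative measures were used to compare the four models. As an illustration only, one commonly reported measure is topic diversity: the share of unique words among every topic's top-k words. The `topic_diversity` helper, the default k = 25, and the topic word lists below are all assumptions, not the authors' evaluation:

```python
def topic_diversity(topics: list[list[str]], top_k: int = 25) -> float:
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean the
    topics largely repeat each other.
    """
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


# Hypothetical top words from two fitted models on the same set of tweets.
model_a = [["travel", "hotel", "booking"], ["travel", "covid", "flight"]]
model_b = [["travel", "hotel", "booking"], ["covid", "mask", "lockdown"]]
print(round(topic_diversity(model_a, top_k=3), 3), topic_diversity(model_b, top_k=3))
# 0.833 1.0
```

Diversity alone does not measure whether topics are interpretable, so it is usually reported alongside a coherence score and qualitative inspection of the topics.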
Affiliation(s)
- Roman Egger
- Innovation and Management in Tourism, Salzburg University of Applied Sciences, Salzburg, Austria
- Joanne Yu
- Department of Tourism and Service Management, Modul University Vienna, Vienna, Austria