1
|
Khodaei M, Edwards SV, Beerli P. Estimating Genome-wide Phylogenies Using Probabilistic Topic Modeling. bioRxiv 2024:2023.12.20.572577. [PMID: 39605625 PMCID: PMC11601389 DOI: 10.1101/2023.12.20.572577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Methods for rapidly inferring the evolutionary history of species or populations with genome-wide data are progressing, but computational constraints still limit our abilities in this area. We developed an alignment-free method to infer genome-wide phylogenies and implemented it in the Python package TopicContml. The method uses probabilistic topic modeling (specifically, Latent Dirichlet Allocation or LDA) to extract 'topic' frequencies from k-mers, which are derived from multilocus DNA sequences. These extracted frequencies then serve as an input for the program Contml in the PHYLIP package, which is used to generate a species tree. We evaluated the performance of TopicContml on simulated datasets with gaps and three biological datasets: (1) 14 DNA sequence loci from two Australian bird species distributed across nine populations, (2) 5162 loci from 80 mammal species, and (3) raw, unaligned, non-orthologous PacBio sequences from 12 bird species. Our empirical results and simulated data suggest that our method is efficient and statistically robust. We also assessed the uncertainty of the estimated relationships among clades using a bootstrap procedure.
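As a rough illustration of the first step described in this abstract, the sketch below turns the DNA sequences of each taxon into a bag of k-mer counts, which is the kind of "document" a topic model such as LDA takes as input. It is not the TopicContml implementation: the function names, the toy sequences, and the choice to skip k-mers containing gaps or ambiguous bases are all assumptions made for illustration, and the LDA and Contml steps are not shown.

```python
from collections import Counter

def kmers(sequence, k):
    """Return all overlapping k-mers, skipping any that contain non-ACGT characters.

    Skipping gapped k-mers is an assumption; the abstract does not say how gaps are handled.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")]

def kmer_document(sequences, k):
    """Pool k-mer counts across loci into one bag-of-words 'document' per taxon."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts

# Hypothetical toy input: two loci for taxon_A (one with a gap), one locus for taxon_B.
loci = {"taxon_A": ["ACGTAC", "GG-TACG"], "taxon_B": ["ACGTTT"]}
docs = {name: kmer_document(seqs, k=3) for name, seqs in loci.items()}
```

Each resulting counter could then be passed to any LDA implementation to obtain per-taxon topic frequencies.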
|
2
|
Ortu M, Ibba G, Destefanis G, Conversano C, Tonelli R. Taxonomic insights into ethereum smart contracts by linking application categories to security vulnerabilities. Sci Rep 2024; 14:23433. [PMID: 39379443 PMCID: PMC11461646 DOI: 10.1038/s41598-024-73454-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 09/17/2024] [Indexed: 10/10/2024] Open
Abstract
The expansion of smart contracts on the Ethereum blockchain has created a diverse ecosystem of decentralized applications. This growth, however, poses challenges in classifying and securing these contracts. Existing research often separately addresses either classification or vulnerability detection, without a comprehensive analysis of how contract types are related to security risks. Our study addresses this gap by developing a taxonomy of smart contracts and examining the potential vulnerabilities associated with each category. We use the Latent Dirichlet Allocation (LDA) model to analyze a dataset of over 100,040 Ethereum smart contracts, which is notably larger than those used in previous studies. Our analysis categorizes these contracts into eleven groups, with five primary categories: Notary, Token, Game, Financial, and Blockchain interaction. This categorization sheds light on the various functions and applications of smart contracts in today's blockchain environment. In response to the growing need for better security in smart contract development, we also investigate the link between these categories and common vulnerabilities. Our results identify specific vulnerabilities associated with different contract types, providing valuable insights for developers and auditors. This relationship between contract categories and vulnerabilities is a new contribution to the field, as it has not been thoroughly explored in previous research. Our findings offer a detailed taxonomy of smart contracts and practical recommendations for enhancing security. By understanding how contract categories correlate with vulnerabilities, developers can implement more effective security measures, and auditors can better prioritize their reviews. This study advances both academic knowledge of smart contracts and practical strategies for securing decentralized applications on the Ethereum platform.
Affiliation(s)
- Marco Ortu
- Department of Business and Economics Sciences, University of Cagliari, Viale Fra Ignazio 17, Cagliari, Italy.
- Giacomo Ibba
- Department of Computer Science and Mathematics, University of Cagliari, Via Porcell 4, Cagliari, Italy
- Claudio Conversano
- Department of Business and Economics Sciences, University of Cagliari, Viale Fra Ignazio 17, Cagliari, Italy
- Roberto Tonelli
- Department of Computer Science and Mathematics, University of Cagliari, Via Porcell 4, Cagliari, Italy
|
3
|
Brown WS, Paul LK. The corpus callosum and creativity revisited. Front Hum Neurosci 2024; 18:1443970. [PMID: 39328385 PMCID: PMC11424518 DOI: 10.3389/fnhum.2024.1443970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/26/2024] [Indexed: 09/28/2024] Open
Abstract
In 1969 Joseph Bogen, a colleague of Roger Sperry and the neurosurgeon who performed commissurotomy on Sperry's "split-brain" study participants, wrote an article subtitled "The Corpus Callosum and Creativity." The article argued for the critical role of the corpus callosum and hemispheric specialization in creativity. Building on a four-stage model of creativity (learning, incubation, illumination, refinement) and Sperry's innovative studies, the Bogens posited that in the intact brain, creativity relies on two opposing functions of the corpus callosum: (a) interhemispheric inhibition to facilitate simultaneous and independent activity of uniquely-specialized processing centers during learning and incubation and (b) interhemispheric facilitation to support the increased bi-hemispheric integration and coordination which produces illumination. This article revisits the Bogens' theory considering scientific discoveries over the past 50 years. We begin by reviewing relevant findings from split-brain studies, and then briefly consider findings from studies that examine the association of creativity with callosal structure and function in neurotypical participants. Finally, we provide an in-depth discussion of creativity in persons with agenesis of the corpus callosum (ACC), the congenital absence of the corpus callosum. These three lines of inquiry strongly support the theory suggested by Bogen and Bogen in 1969 and provide further clarification regarding the critical and unique role of the corpus callosum in creative cognition.
Affiliation(s)
- Warren S. Brown
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, United States
- International Research Consortium for the Corpus Callosum and Cerebral Connectivity (IRC), Pasadena, CA, United States
- Lynn K. Paul
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, United States
- International Research Consortium for the Corpus Callosum and Cerebral Connectivity (IRC), Pasadena, CA, United States
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States
|
4
|
Davies H, Nenadic G, Alfattni G, Arguello Casteleiro M, Al Moubayed N, Farrell S, Radford AD, Noble PJM. Text mining for disease surveillance in veterinary clinical data: part two, training computers to identify features in clinical text. Front Vet Sci 2024; 11:1352726. [PMID: 39239390 PMCID: PMC11376235 DOI: 10.3389/fvets.2024.1352726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 07/17/2024] [Indexed: 09/07/2024] Open
Abstract
In part two of this mini-series, we evaluate the range of machine-learning tools now available for application to veterinary clinical text-mining. These tools will be vital to automate extraction of information from large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, where volumes of millions of records preclude reading records and the complexities of clinical notes limit usefulness of more "traditional" text-mining approaches. We discuss the application of various machine learning techniques ranging from simple models for identifying words and phrases with similar meanings to expand lexicons for keyword searching, to the use of more complex language models. Specifically, we describe the use of language models for record annotation, unsupervised approaches for identifying topics within large datasets, and discuss more recent developments in the area of generative models (such as ChatGPT). As these models become increasingly complex it is pertinent that researchers and clinicians work together to ensure that the outputs of these models are explainable in order to instill confidence in any conclusions drawn from them.
Affiliation(s)
- Heather Davies
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Goran Nenadic
- Department of Computer Science, Manchester University, Manchester, United Kingdom
- Ghada Alfattni
- Department of Computer Science, Manchester University, Manchester, United Kingdom
- Noura Al Moubayed
- Department of Computer Science, Durham University, Durham, United Kingdom
- Sean Farrell
- Department of Computer Science, Durham University, Durham, United Kingdom
- Alan D Radford
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- P-J M Noble
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
|
5
|
Brown WS, Hoard M, Birath B, Graves M, Nolty A, Paul LK. Imaginative elaboration in agenesis of the corpus callosum: topic modeling and perplexity. J Int Neuropsychol Soc 2024; 30:643-650. [PMID: 38752403 DOI: 10.1017/s1355617724000183] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/18/2024]
Abstract
OBJECTIVE Previous studies have found deficits in imaginative elaboration and social inference to be associated with agenesis of the corpus callosum (ACC; Renteria-Vasquez et al., 2022; Turk et al., 2009). In the current study, Thematic Apperception Test (TAT) responses from a neurotypical control group and a group of individuals with ACC were used to further study the capacity for imaginative elaboration and story coherence. METHOD Topic modeling was employed utilizing Latent Dirichlet Allocation to characterize the narrative responses to the pictures used in the TAT. A measure of the difference between models (perplexity) was used to compare the topics of the responses of individual participants to the common core model derived from the responses of the control group. Story coherence was tested using sentence-to-sentence Latent Semantic Analysis. RESULTS Group differences in perplexity were statistically significant overall, and for each card individually (p < .001). There were no differences between the groups in story coherence. CONCLUSIONS TAT narratives from persons with ACC were normally coherent, but more conventional (i.e., more similar to the core text) compared to those of neurotypical controls. Individuals with ACC can make conventional social inferences about socially ambiguous stimuli, but are restricted in their imaginative elaborations, resulting in less topical variability (lower perplexity values) compared to neurotypical controls.
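To make the perplexity measure in this abstract concrete, here is a minimal sketch using the standard definition, perplexity = exp(-(1/N) Σ log p(w_i)), evaluated against a fixed word distribution standing in for the "core model". The study's own pipeline (an LDA model fitted to control-group narratives) is not reproduced; the function name, the toy probabilities, and the probability floor for unseen words are assumptions for illustration only.

```python
import math

def perplexity(tokens, word_probs, floor=1e-12):
    """Perplexity of a token sequence under a word-probability model.

    Words missing from the model get a small floor probability (an assumption,
    used here only to avoid log(0)).
    """
    log_likelihood = sum(math.log(word_probs.get(t, floor)) for t in tokens)
    return math.exp(-log_likelihood / len(tokens))

# Hypothetical core model: word probabilities estimated from a reference corpus.
core_model = {"the": 0.5, "boy": 0.3, "violin": 0.2}

conventional = ["the", "the", "boy"]       # uses the most expected words
unusual = ["violin", "violin", "boy"]      # uses less expected words

conventional_ppl = perplexity(conventional, core_model)
unusual_ppl = perplexity(unusual, core_model)
```

Text that stays close to what the reference model expects yields the lower value, which matches the abstract's reading of lower perplexity as more conventional narrative.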
Affiliation(s)
- Warren S Brown
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, USA
- International Research Consortium for the Corpus Callosum and Cerebral Connectivity (IRC5), Pasadena, CA, USA
- Matthew Hoard
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, USA
- Brandon Birath
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Mark Graves
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, USA
- Anne Nolty
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, USA
- Lynn K Paul
- Travis Research Institute, Fuller School of Psychology & Marriage and Family Therapy, Pasadena, CA, USA
- International Research Consortium for the Corpus Callosum and Cerebral Connectivity (IRC5), Pasadena, CA, USA
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
|
6
|
Atenstaedt RL. Word cloud analysis of Family Practice. Does the journal fulfil its editorial policy? Fam Pract 2024; 41:382-383. [PMID: 36852766 DOI: 10.1093/fampra/cmad020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023] Open
Affiliation(s)
- Robert L Atenstaedt
- Public Health Department, Betsi Cadwaladr University Health Board & Glyndwr University, Abergele Hospital, Llanfair Road, Abergele LL22 8DP, United Kingdom
|
7
|
Xu RL, Wang S, Wang Z, Zhang Y, Xiao Y, Pathak J, Hodge D, Leng Y, Watkins SC, Ding Y, Peng Y. Analyzing Social Factors to Enhance Suicide Prevention Across Population Groups. Proceedings. IEEE International Conference on Healthcare Informatics 2024; 2024:189-199. [PMID: 39372906 PMCID: PMC11450796 DOI: 10.1109/ichi61247.2024.00032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Social factors like family background, education level, financial status, and stress can impact public health outcomes, such as suicidal ideation. However, the analysis of social factors for suicide prevention has been limited by the lack of up-to-date suicide reporting data, variations in reporting practices, and small sample sizes. In this study, we analyzed 172,629 suicide incidents from 2014 to 2020 utilizing the National Violent Death Reporting System Restricted Access Database (NVDRS-RAD). Logistic regression models were developed to examine the relationships between demographics and suicide-related circumstances. Trends over time were assessed, and Latent Dirichlet Allocation (LDA) was used to identify common suicide-related social factors. Mental health, interpersonal relationships, mental health treatment and disclosure, and school/work-related stressors were identified as the main themes of suicide-related social factors. This study also identified systemic disparities across various population groups, particularly concerning Black individuals, young people aged under 24, healthcare practitioners, and those with limited education backgrounds, which shed light on potential directions for demographic-specific suicidal interventions.
Affiliation(s)
- Richard Li Xu
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Song Wang
- Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA
- Zewei Wang
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yuhan Zhang
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yunyu Xiao
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- David Hodge
- National Center for Bioethics in Research and Health Care, Tuskegee University, Tuskegee, AL, USA
- Yan Leng
- McCombs School of Business, The University of Texas at Austin, Austin, TX, USA
- S Craig Watkins
- School of Journalism and Media, The University of Texas at Austin, Austin, TX, USA
- Ying Ding
- School of Information, The University of Texas at Austin, Austin, TX, USA
- Yifan Peng
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
|
8
|
Zhang H, Lu X, Lu B, Gullo G, Chen L. Measuring the composition of the tumor microenvironment with transcriptome analysis: past, present and future. Future Oncol 2024; 20:1207-1220. [PMID: 38362731 PMCID: PMC11318690 DOI: 10.2217/fon-2023-0658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 01/24/2024] [Indexed: 02/17/2024] Open
Abstract
Interactions between tumor cells and immune cells in the tumor microenvironment (TME) play a vital role in the mechanisms of immune evasion, by which cancer cells escape immune elimination. Thus, the characterization and quantification of different components in the TME is a hot topic in molecular biology and drug discovery. Since the development of transcriptome sequencing in bulk tissue, single cells and spatial dimensions, an increasing number of methods have emerged to deconvolute and subtype the TME. This review discusses and compares such computational strategies and downstream subtyping analyses. Integrative analyses of the transcriptome with other data, such as epigenetics and T-cell receptor sequencing, are needed to obtain comprehensive knowledge of the dynamic TME.
Affiliation(s)
- Han Zhang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- UPMC Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Binfeng Lu
- Center for Discovery & Innovation, Hackensack Meridian Health, Nutley, NJ 07110, USA
- Giuseppe Gullo
- Department of Obstetrics & Gynecology, Villa Sofia Cervello Hospital, University of Palermo, 90146, Palermo, Italy
- Lujia Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
|
9
|
Jamialahmadi H, Khalili-Tanha G, Nazari E, Rezaei-Tavirani M. Artificial intelligence and bioinformatics: a journey from traditional techniques to smart approaches. Gastroenterology and Hepatology from Bed to Bench 2024; 17:241-252. [PMID: 39308539 PMCID: PMC11413381 DOI: 10.22037/ghfbb.v17i3.2977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/11/2024] [Indexed: 09/25/2024]
Abstract
The incorporation of AI models into bioinformatics has brought about a revolutionary era in the analysis and interpretation of biological data. This mini-review offers a succinct overview of the indispensable role AI plays in the convergence of computational techniques and biological research. The search strategy followed PRISMA guidelines, encompassing databases such as PubMed, Embase, and Google Scholar to include studies published between 2018 and 2024, utilizing specific keywords. We explored the diverse applications of AI methodologies, including machine learning (ML), deep learning (DL), and natural language processing (NLP), across various domains of bioinformatics. These domains encompass genome sequencing, protein structure prediction, drug discovery, systems biology, personalized medicine, imaging, signal processing, and text mining. AI algorithms have exhibited remarkable efficacy in tackling intricate biological challenges, spanning from genome sequencing to protein structure prediction, and from drug discovery to personalized medicine. In conclusion, this study scrutinizes the evolving landscape of AI-driven tools and algorithms, emphasizing their pivotal role in expediting research, facilitating data interpretation, and catalyzing innovations in biomedical sciences.
Affiliation(s)
- Hamid Jamialahmadi
- Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- These authors equally contributed to this study as the first authors.
- Ghazaleh Khalili-Tanha
- Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- These authors equally contributed to this study as the first authors.
- Elham Nazari
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Rezaei-Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
10
|
Arueyingho OV, Al-Taie A, McCallum C. Scoping review: Machine learning interventions in the management of healthcare systems. Digit Health 2024; 10:20552076221144095. [PMID: 39444734 PMCID: PMC11497546 DOI: 10.1177/20552076221144095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 11/18/2022] [Indexed: 10/25/2024] Open
Abstract
Background Healthcare institutions focus on improving the quality of life for end-users, with key performance indicators like access to essential medicines reflecting the effectiveness of management. Effective healthcare management involves planning, organizing, and controlling institutions built on human resources, data systems, service delivery, access to medicines, finance, and leadership. According to the World Health Organization, these elements must be balanced for an optimal healthcare system. Big data generated from healthcare institutions, including health records and genomic data, is crucial for smart staffing, decision-making, risk management, and patient engagement. Properly organizing and analysing this data is essential, and machine learning, a sub-field of artificial intelligence, can optimize these processes, leading to better overall healthcare management. Objectives This review examines the major applications of machine learning in healthcare management, the algorithms frequently used in data analysis, their limitations, and the evidence-based benefits of machine learning in healthcare. Methods Following PRISMA guidelines, databases such as IEEE Xplore, ScienceDirect, ACM Digital Library, and SCOPUS were searched for eligible articles published between 2011 and 2021. Articles had to be in English, peer-reviewed, and include relevant keywords like healthcare, management, and machine learning. Results Out of 51 relevant articles, 6 met the inclusion criteria. Identified algorithms include topic modelling, dynamic clustering, neural networks, decision trees, and ensemble classifiers, applied in areas such as electronic health records, chatbots, and multi-disease prediction. Conclusion Machine learning supports healthcare management by aiding decision-making, processing big data, and providing insights for system improvements.
Affiliation(s)
- Oritsetimeyin V Arueyingho
- School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths (SCEEM), Centre for Doctoral Training in Digital Health and Care, University of Bristol, UK
- Anmar Al-Taie
- School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths (SCEEM), Centre for Doctoral Training in Digital Health and Care, University of Bristol, UK
- Claire McCallum
- Department of Clinical Pharmacy, Faculty of Pharmacy, Istinye University, Istanbul, Turkey
|
11
|
Silva RPD, Pollettini JT, Pazin Filho A. Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection. Cad Saude Publica 2023; 39:e00243722. [PMID: 38055548 DOI: 10.1590/0102-311xpt243722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 07/04/2023] [Indexed: 12/08/2023] Open
Abstract
Patients with post-COVID-19 syndrome benefit from health promotion programs. Their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly especially in pandemics. A descriptive observational study was carried out using 105,008 prior authorizations paid by a private health care provider with the application of an unsupervised natural language processing method by topic modeling to identify patients suspected of being infected by COVID-19. A total of 6 models were generated: 3 using the BERTopic algorithm and 3 Word2Vec models. The BERTopic model automatically creates disease groups. In the Word2Vec model, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic without word treatment selected more severe patients - average cost per prior authorizations paid of BRL 10,206 and total expenditure of BRL 20.3 million (5.4%) in 1,987 prior authorizations (1.9%). It had 70% accuracy compared to human analysis and 20% of cases with potential interest, all subject to analysis for inclusion in a health promotion program. It had an important loss of cases when compared to the traditional research model with structured language and identified other groups of diseases - orthopedic, mental and cancer. The BERTopic model served as an exploratory method to be used in case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the treatment of health information by machine learning.
Affiliation(s)
- Rildo Pinto da Silva
- Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brasil
- Antonio Pazin Filho
- Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brasil
|
12
|
Kim IB, Choi J, Park SC, Koike S, Kwon JS, Kim E, Choi HS, Lee JY, Lee YS. Data-mining analysis of media frame effects on social perception of schizophrenia renaming in Korea. BMC Psychiatry 2023; 23:882. [PMID: 38012639 PMCID: PMC10683161 DOI: 10.1186/s12888-023-05386-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 11/20/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND In 2011, the Korean Neuropsychiatric Association renamed schizophrenia from 'mind split disorder' ('Jungshinbunyeolbyung' in Korean) to 'attunement disorder' ('Johyeonbyung' in Korean) as a strategy to reduce social stigma toward people with schizophrenia. However, there is still no consensus on how the renaming effort has contributed to changes in the social perception of schizophrenia in Korea. METHODS We therefore explored whether media frames alter social perception, in ways that respect or disrespect schizophrenia patients, before and after the renaming. This study extensively investigated media keywords related to schizophrenia over time by applying both language and epidemiologic analyses. RESULTS Media keywords described schizophrenia patients negatively both before and after the renaming. Further, a regression analysis showed a significant correlation between the frequency of negative keywords and the hospitalization frequency of schizophrenia patients. CONCLUSIONS These findings suggest that the social perception of schizophrenia has scarcely changed and has remained negatively biased against schizophrenia patients in spite of the renaming effort. Notably, biased media frames have been demonstrated to negatively affect social perception, and even the medical use patterns of schizophrenia patients in general. In conclusion, we suggest that unbiased media frames, along with the renaming effort, may collectively help reduce the negative social perception of schizophrenia. TRIAL REGISTRATION This study was approved by the Institutional Review Board (IRB) of the Yong-In Mental Hospital (IRB No. YIMH-IRB-2019-02).
Affiliation(s)
- Il Bin Kim
- Department of Psychiatry, CHA Gangnam Medical Center, CHA University School of Medicine, Gangnam, Republic of Korea
- Joonho Choi
- Department of Psychiatry, Hanyang University Guri Hospital, Guri, Republic of Korea
- Seon-Cheol Park
- Department of Psychiatry, Hanyang University Guri Hospital, Guri, Republic of Korea
- Shinsuke Koike
- University of Tokyo Institute for Diversity and Adaptation of Human Mind (UTIDAHM), University of Tokyo, Tokyo, Japan
- Jun Soo Kwon
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Eunkyung Kim
- Department of Psychiatry, Hanyang University Guri Hospital, Guri, Republic of Korea
- Hyo Sun Choi
- Medicine Center for Mental health research, Hanyang University College, Seoul, Republic of Korea
- Ju Yeon Lee
- Medicine Center for Mental health research, Hanyang University College, Seoul, Republic of Korea
- Yu Sang Lee
- Department of Psychiatry, Yong-In Mental Hospital, 940 Jungbu-daero, Giheung-gu, Yongin, Republic of Korea.
|
13
|
Kyröläinen AJ, Gillett J, Karabin M, Sonnadara R, Kuperman V. Cognitive and social well-being in older adulthood: The CoSoWELL corpus of written life stories. Behav Res Methods 2023; 55:2885-2909. [PMID: 36002624 PMCID: PMC9400578 DOI: 10.3758/s13428-022-01926-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/23/2022] [Indexed: 11/30/2022]
Abstract
This paper presents the Cognitive and Social WELL-being (CoSoWELL) project that consists of two components. One is a large corpus of narratives written by over 1000 North American older adults (55+ years old) in five test sessions before and during the first year of the COVID-19 pandemic. The other component is a rich collection of socio-demographic data collected through a survey from the same participants. This paper introduces the first release of the corpus consisting of 1.3 million tokens and the survey data (CoSoWELL version 1.0). It also presents a series of analyses validating design decisions for creating the corpus of narratives written about personal life events that took place in the distant past, recent past (yesterday) and future, along with control narratives. We report results of computational topic modeling and linguistic analyses of the narratives in the corpus, which track the time-locked impact of the COVID-19 pandemic on the content of autobiographical memories before and during the COVID-19 pandemic. The main findings demonstrate a high validity of our analytical approach to unique narrative data and point to both the locus of topical shifts (narratives about recent past and future) and their detailed timeline. We make the CoSoWELL corpus and survey data available to researchers and discuss implications of our findings in the framework of research on aging and autobiographical memories under stress.
Affiliation(s)
- Aki-Juhani Kyröläinen
  - Department of Linguistics and Languages, McMaster University, Togo Salmon Hall 513, 1280 Main Street West, Hamilton, Ontario, Canada, 8S 4M2.
|
14
|
Yamanouchi Y, Nakamura T, Ikeda T, Usuku K. An Alternative Application of Natural Language Processing to Express a Characteristic Feature of Diseases in Japanese Medical Records. Methods Inf Med 2023; 62:110-118. [PMID: 36809794 PMCID: PMC10462427 DOI: 10.1055/a-2039-3773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2021] [Accepted: 04/13/2022] [Indexed: 02/23/2023]
Abstract
BACKGROUND Owing to the characteristics of the language, Japanese natural language processing (NLP) requires morphological analysis for word segmentation using dictionary techniques. OBJECTIVE We aimed to clarify whether this approach can be replaced by an open-end discovery-based NLP (OD-NLP), which does not use any dictionary techniques. METHODS Clinical texts from the first medical visit were collected to compare OD-NLP with word dictionary-based NLP (WD-NLP). Topics were generated in each document using a topic model and were then matched to the respective diseases defined in the International Statistical Classification of Diseases and Related Health Problems, 10th Revision. The prediction accuracy and expressivity for each disease were examined in an equivalent number of entities/words after filtration with either term frequency-inverse document frequency (TF-IDF) or dominance value (DMV). RESULTS In documents from 10,520 observed patients, 169,913 entities and 44,758 words were segmented using OD-NLP and WD-NLP, respectively. Without filtering, accuracy and recall levels were low, and there was no difference in the harmonic mean of the F-measure between the two NLPs. However, physicians reported that OD-NLP contained more meaningful words than WD-NLP. When datasets were created with an equivalent number of entities/words using TF-IDF, the F-measure of OD-NLP was higher than that of WD-NLP at lower thresholds. As the threshold increased, the number of datasets created decreased and the F-measure values increased, although the differences disappeared. Two datasets near the maximum threshold that showed differences in F-measure were examined to determine whether their topics were associated with diseases. The results showed that more diseases were found with OD-NLP at lower thresholds, indicating that the topics described characteristics of diseases. The superiority of OD-NLP was comparable when filtration was changed from TF-IDF to DMV.
CONCLUSION The current findings favor the use of OD-NLP to express characteristics of diseases from Japanese clinical texts; this approach may help in constructing document summaries and in retrieval in clinical settings.
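The threshold-based TF-IDF filtration described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact TF-IDF variant or threshold values, so the formula below (relative term frequency times the natural log of inverse document frequency), the function names, and the toy English tokens are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def filter_by_threshold(scores, threshold):
    """Keep only the terms whose tf-idf score reaches the threshold."""
    return [{term for term, s in doc.items() if s >= threshold} for doc in scores]


# Hypothetical tokenized first-visit notes (illustrative only).
notes = [
    ["fever", "cough", "visit"],
    ["fever", "rash", "visit"],
    ["chest", "pain", "visit"],
]
scores = tfidf_scores(notes)
kept = filter_by_threshold(scores, threshold=0.2)
```

In this toy corpus a term that occurs in every note ("visit") scores zero and is dropped at any positive threshold, while terms unique to one note survive. Raising the threshold shrinks the retained vocabulary, which is the trade-off the abstract examines.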
Affiliation(s)
- Yoshinori Yamanouchi
  - Department of Medical Information Science, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Taishi Nakamura
  - Department of Medical Information Science, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
- Tokunori Ikeda
  - Department of Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, Sojo University, Nishi-ku, Kumamoto, Japan
- Koichiro Usuku
  - Department of Medical Information Science, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
|
15
|
Zhang H, Lu X, Lu B, Chen L. scGEM: Unveiling the Nested Tree-Structured Gene Co-Expressing Modules in Single Cell Transcriptome Data. Cancers (Basel) 2023; 15:4277. [PMID: 37686554 PMCID: PMC10486867 DOI: 10.3390/cancers15174277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Revised: 08/22/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
BACKGROUND Single-cell transcriptome analysis has fundamentally changed biological research by allowing higher-resolution computational analysis of individual cells and subsets of cell types. However, few methods have met the need to recognize and quantify the underlying cellular programs that determine the specialization and differentiation of the cell types. METHODS In this study, we present scGEM, a nested tree-structured nonparametric Bayesian model, to reveal the gene co-expression modules (GEMs) reflecting transcriptome processes in single cells. RESULTS We show that scGEM can discover shared and specialized transcriptome signals across different cell types using peripheral blood mononuclear single cells and early brain development single cells. scGEM outperformed other methods in perplexity and topic coherence (p < 0.001) on our simulation data. Larger datasets, deeper trees and pre-trained models are shown to be positively associated with better scGEM performance. The GEMs obtained from triple-negative breast cancer single cells exhibited better correlations with lymphocyte infiltration (p = 0.009) and the cell cycle (p < 0.001) than other methods in additional validation on the bulk RNAseq dataset. CONCLUSIONS Altogether, we demonstrate that scGEM can be used to model the hidden cellular functions of single cells, thereby unveiling the specialization and generalization of transcriptomic programs across different types of cells.
Affiliation(s)
- Han Zhang
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Binfeng Lu
  - Center for Discovery and Innovation, Hackensack Meridian Health, Nutley, NJ 07110, USA
- Lujia Chen
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
|
16
|
Shankar K, Chandrasekaran R, Jeripity Venkata P, Miketinas D. Investigating the Role of Nutrition in Enhancing Immunity During the COVID-19 Pandemic: Twitter Text-Mining Analysis. J Med Internet Res 2023; 25:e47328. [PMID: 37428522 PMCID: PMC10366666 DOI: 10.2196/47328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Revised: 05/09/2023] [Accepted: 05/09/2023] [Indexed: 07/11/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic has brought to the spotlight the critical role played by a balanced and healthy diet in bolstering the human immune system. There is burgeoning interest in nutrition-related information on social media platforms like Twitter. There is a critical need to assess and understand public opinion, attitudes, and sentiments toward nutrition-related information shared on Twitter. OBJECTIVE This study uses text mining to analyze nutrition-related messages on Twitter to identify and analyze how the general public perceives various food groups and diets for improving immunity to the SARS-CoV-2 virus. METHODS We gathered 71,178 nutrition-related tweets that were posted between January 01, 2020, and September 30, 2020. The Correlated Explanation text mining algorithm was used to identify frequently discussed topics that users mentioned as contributing to immunity building against SARS-CoV-2. We assessed the relative importance of these topics and performed a sentiment analysis. We also qualitatively examined the tweets to gain a closer understanding of nutrition-related topics and food groups. RESULTS Text-mining yielded 10 topics that users discussed frequently on Twitter, viz proteins, whole grains, fruits, vegetables, dairy-related, spices and herbs, fluids, supplements, avoidable foods, and specialty diets. Supplements were the most frequently discussed topic (23,913/71,178, 33.6%) with a higher proportion (20,935/23,913, 87.75%) exhibiting a positive sentiment with a score of 0.41. Consuming fluids (17,685/71,178, 24.85%) and fruits (14,807/71,178, 20.80%) were the second and third most frequent topics with favorable, positive sentiments. Spices and herbs (8719/71,178, 12.25%) and avoidable foods (8619/71,178, 12.11%) were also frequently discussed. Negative sentiments were observed for a higher proportion of avoidable foods (7627/8619, 84.31%) with a sentiment score of -0.39. 
CONCLUSIONS This study identified 10 important food groups and associated sentiments that users discussed as a means to improve immunity. Our findings can help dieticians and nutritionists to frame appropriate interventions and diet programs.
Affiliation(s)
- Kavitha Shankar
  - Department of Nutrition and Food Sciences, Texas Woman's University Institute for Health Sciences, Houston, TX, United States
- Ranganathan Chandrasekaran
  - Department of Information and Decision Sciences, University of Illinois at Chicago, Chicago, IL, United States
- Derek Miketinas
  - Department of Nutrition and Food Sciences, Texas Woman's University Institute for Health Sciences, Houston, TX, United States
|
17
|
Vora LK, Gholap AD, Jetha K, Thakur RRS, Solanki HK, Chavda VP. Artificial Intelligence in Pharmaceutical Technology and Drug Delivery Design. Pharmaceutics 2023; 15:1916. [PMID: 37514102 PMCID: PMC10385763 DOI: 10.3390/pharmaceutics15071916] [Citation(s) in RCA: 102] [Impact Index Per Article: 51.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Revised: 06/28/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023] Open
Abstract
Artificial intelligence (AI) has emerged as a powerful tool that harnesses anthropomorphic knowledge and provides expedited solutions to complex challenges. Remarkable advancements in AI technology and machine learning present a transformative opportunity in the drug discovery, formulation, and testing of pharmaceutical dosage forms. By utilizing AI algorithms that analyze extensive biological data, including genomics and proteomics, researchers can identify disease-associated targets and predict their interactions with potential drug candidates. This enables a more efficient and targeted approach to drug discovery, thereby increasing the likelihood of successful drug approvals. Furthermore, AI can contribute to reducing development costs by optimizing research and development processes. Machine learning algorithms assist in experimental design and can predict the pharmacokinetics and toxicity of drug candidates. This capability enables the prioritization and optimization of lead compounds, reducing the need for extensive and costly animal testing. Personalized medicine approaches can be facilitated through AI algorithms that analyze real-world patient data, leading to more effective treatment outcomes and improved patient adherence. This comprehensive review explores the wide-ranging applications of AI in drug discovery, drug delivery dosage form designs, process optimization, testing, and pharmacokinetics/pharmacodynamics (PK/PD) studies. This review provides an overview of various AI-based approaches utilized in pharmaceutical technology, highlighting their benefits and drawbacks. Nevertheless, the continued investment in and exploration of AI in the pharmaceutical industry offer exciting prospects for enhancing drug development processes and patient care.
Affiliation(s)
- Lalitkumar K Vora
  - School of Pharmacy, Queen's University Belfast, 97 Lisburn Road, Belfast BT9 7BL, UK
- Amol D Gholap
  - Department of Pharmaceutics, St. John Institute of Pharmacy and Research, Palghar 401404, Maharashtra, India
- Keshava Jetha
  - Department of Pharmaceutics and Pharmaceutical Technology, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
  - Ph.D. Section, Gujarat Technological University, Ahmedabad 382424, Gujarat, India
- Hetvi K Solanki
  - Pharmacy Section, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
- Vivek P Chavda
  - Department of Pharmaceutics and Pharmaceutical Technology, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
|
18
|
Moy AJ, Withall J, Hobensack M, Yeji Lee R, Levy DR, Rossetti SC, Rosenbloom ST, Johnson K, Cato K. Eliciting Insights From Chat Logs of the 25X5 Symposium to Reduce Documentation Burden: Novel Application of Topic Modeling. J Med Internet Res 2023; 25:e45645. [PMID: 37195741 PMCID: PMC10233429 DOI: 10.2196/45645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2023] [Revised: 03/03/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023] Open
Abstract
BACKGROUND Addressing clinician documentation burden through "targeted solutions" is a growing priority for many organizations ranging from government and academia to industry. Between January and February 2021, the 25 by 5: Symposium to Reduce Documentation Burden on US Clinicians by 75% (25X5 Symposium) convened across 6 weekly 2-hour sessions among experts and stakeholders to generate actionable goals for reducing clinician documentation over the next 5 years. Throughout this web-based symposium, we passively collected attendees' contributions to a chat functionality, with their knowledge that the content would be deidentified and made publicly available. This presented a novel opportunity to synthesize and understand participants' perceptions and interests from chat messages. We performed a content analysis of 25X5 Symposium chat logs to identify themes about reducing clinician documentation burden. OBJECTIVE The objective of this study was to explore unstructured chat log content from the web-based 25X5 Symposium to elicit latent insights on clinician documentation burden among clinicians, health care leaders, and other stakeholders using topic modeling. METHODS Across the 6 sessions, we captured 1787 messages among 167 unique chat participants cumulatively; 14 were private messages not included in the analysis. We implemented a latent Dirichlet allocation (LDA) topic model on the aggregated dataset to identify clinician documentation burden topics mentioned in the chat logs. Coherence scores and manual examination informed optimal model selection. Next, 5 domain experts independently and qualitatively assigned descriptive labels to model-identified topics and classified them into higher-level categories, which were finalized through a panel consensus.
RESULTS We uncovered ten topics using the LDA model: (1) determining data and documentation needs (422/1773, 23.8%); (2) collectively reassessing documentation requirements in electronic health records (EHRs) (252/1773, 14.2%); (3) focusing documentation on patient narrative (162/1773, 9.1%); (4) documentation that adds value (147/1773, 8.3%); (5) regulatory impact on clinician burden (142/1773, 8%); (6) improved EHR user interface and design (128/1773, 7.2%); (7) addressing poor usability (122/1773, 6.9%); (8) sharing 25X5 Symposium resources (122/1773, 6.9%); (9) capturing data related to clinician practice (113/1773, 6.4%); and (10) the role of quality measures and technology in burnout (110/1773, 6.2%). Among these 10 topics, 5 high-level categories emerged: consensus building (821/1773, 46.3%), burden sources (365/1773, 20.6%), EHR design (250/1773, 14.1%), patient-centered care (162/1773, 9.1%), and symposium comments (122/1773, 6.9%). CONCLUSIONS We conducted a topic modeling analysis on 25X5 Symposium multiparticipant chat logs to explore the feasibility of this novel application and elicit additional insights on clinician documentation burden among attendees. Based on the results of our LDA analysis, consensus building, burden sources, EHR design, and patient-centered care may be important themes to consider when addressing clinician documentation burden. Our findings demonstrate the value of topic modeling in discovering topics associated with clinician documentation burden using unstructured textual content. Topic modeling may be a suitable approach to examine latent themes presented in web-based symposium chat logs.
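The abstract notes that coherence scores informed model selection but does not say which coherence measure was used. As a hedged illustration, the sketch below computes the UMass coherence of a topic's top words from document co-occurrence counts; the choice of UMass, the function name `umass_coherence`, and the toy chat messages are assumptions, not details taken from the study.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence: sum over word pairs of log((D(w_hi, w_lo) + 1) / D(w_hi)).

    `top_words` must be ordered from most to least probable in the topic;
    D(.) counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    # combinations() preserves order, so `higher` is always ranked above `lower`.
    for higher, lower in combinations(top_words, 2):
        denom = doc_count(higher)
        if denom == 0:
            continue  # word absent from the corpus; skipped in this sketch
        score += math.log((doc_count(higher, lower) + 1) / denom)
    return score


# Hypothetical tokenized chat messages (illustrative only).
messages = [
    ["ehr", "burden", "notes"],
    ["ehr", "burden"],
    ["ehr", "design"],
    ["quality", "measures"],
]
coherent = umass_coherence(["ehr", "burden"], messages)
incoherent = umass_coherence(["ehr", "quality"], messages)
```

Scores closer to zero indicate top words that tend to appear in the same messages. In practice one would average this score over all topics for each candidate number of topics and weigh it alongside manual inspection, as the abstract describes.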
Affiliation(s)
- Amanda J Moy
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Jennifer Withall
  - School of Nursing, Columbia University, New York, NY, United States
- Mollie Hobensack
  - School of Nursing, Columbia University, New York, NY, United States
- Rachel Yeji Lee
  - School of Nursing, Columbia University, New York, NY, United States
- Deborah R Levy
  - School of Medicine, Yale University, New Haven, CT, United States
  - Veteran's Affairs Connecticut Health Care System, Pain, Research, Informatics, Multi-morbidities Education Center, West Haven, CT, United States
- Sarah C Rossetti
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
  - School of Nursing, Columbia University, New York, NY, United States
- S Trent Rosenbloom
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, United States
- Kevin Johnson
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Kenrick Cato
  - School of Nursing, Columbia University, New York, NY, United States
  - Department of Emergency Medicine, Columbia University Irving Medical Center, New York, NY, United States
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, United States
|
19
|
Grubbs AE, Sinha N, Garg R, Barber EL. Use of topic modeling to assess research trends in the journal Gynecologic Oncology. Gynecol Oncol 2023; 172:41-46. [PMID: 36933402 PMCID: PMC10245278 DOI: 10.1016/j.ygyno.2023.03.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 02/26/2023] [Accepted: 03/01/2023] [Indexed: 03/18/2023]
Abstract
STUDY OBJECTIVE There is scant research identifying thematic trends within medical research. This work may provide insight into how a given field values certain topics. We assessed the feasibility of using a machine learning approach to determine the most common research themes published in Gynecologic Oncology over a thirty-year period and to subsequently evaluate how interest in these topics changed over time. METHODS We retrieved the abstracts of all original research published in Gynecologic Oncology from 1990 to 2020 using PubMed. Abstract text was processed through a natural language processing algorithm and clustered into topical themes using latent Dirichlet allocation (LDA) prior to manual labeling. Topics were investigated for temporal trends. RESULTS We retrieved 12,586 original research articles, of which 11,217 were evaluable for subsequent analysis. Twenty-three research topics were selected at the completion of topic modeling. The topics of basic science genetics, epidemiologic methods, and chemotherapy experienced the greatest increase over the time period, while postoperative outcomes, reproductive age cancer management, and cervical dysplasia experienced the greatest decline. Interest in basic science research remained relatively constant. Topics were additionally reviewed for words indicative of either surgical or medical therapy. Both surgical and medical topics saw increasing interest, with surgical topics experiencing a greater increase and representing a higher proportion of published topics. CONCLUSIONS Topic modeling, a type of unsupervised machine learning, was successfully used to identify trends in research themes. The application of this technique provided insight into how the field of gynecologic oncology values the components of its scope of practice and therefore how it may choose to allocate grant funding, disseminate research, and participate in the public discourse.
Affiliation(s)
- Allison E Grubbs
  - Northwestern University Feinberg School of Medicine, Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Chicago, IL, USA
- Nikita Sinha
  - Northwestern University Feinberg School of Medicine, Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Chicago, IL, USA
- Ravi Garg
  - Institute of Public Health and Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Emma L Barber
  - Northwestern University Feinberg School of Medicine, Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Chicago, IL, USA
  - Institute of Public Health and Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL, USA
  - Center for Health Equity Transformation, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
|
20
|
Topic modeling algorithms and applications: A survey. INFORM SYST 2023. [DOI: 10.1016/j.is.2022.102131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
21
|
Williams CYK, Li RX, Luo MY, Bance M. Exploring patient experiences and concerns in the online Cochlear implant community: A cross-sectional study and validation of automated topic modelling. Clin Otolaryngol 2023; 48:442-450. [PMID: 36645237 DOI: 10.1111/coa.14037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 12/20/2022] [Accepted: 01/07/2023] [Indexed: 01/17/2023]
Abstract
OBJECTIVE There is a paucity of research examining patient experiences of cochlear implants. We sought to use natural language processing methods to explore patient experiences and concerns in the online cochlear implant (CI) community. MATERIALS AND METHODS Cross-sectional study of posts on the online Reddit r/CochlearImplants forum from 1 March 2015 to 11 November 2021. Natural language processing using the BERTopic automated topic modelling technique was employed to cluster posts into semantically similar topics. Topic categorisation was manually validated by two independent reviewers and Cohen's kappa calculated to determine inter-rater reliability between machine vs human and human vs human categorisation. RESULTS We retrieved 987 posts from 588 unique Reddit users on the r/CochlearImplants forum. Posts were initially categorised by BERTopic into 16 different Topics, which were increased to 23 Topics following manual inspection. The most popular topics related to CI connectivity (n = 112), adults considering getting a CI (n = 107), surgery-related posts (n = 89) and day-to-day living with a CI (n = 85). Cohen's kappa among all posts was 0.62 (machine vs. human) and 0.72 (human vs. human), and among categorised posts was 0.85 (machine vs. human) and 0.84 (human vs. human). CONCLUSIONS This cross-sectional study of social media discussions among the online cochlear implant community identified common attitudes, experiences and concerns of patients living with, or seeking, a cochlear implant. Our validation of natural language processing methods to categorise topics shows that automated analysis of similar Otolaryngology-related content is a viable and accurate alternative to manual qualitative approaches.
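The inter-rater reliability statistic used in this abstract, Cohen's kappa, corrects raw agreement for the agreement expected by chance. The following is a minimal standard-library sketch; the function name `cohens_kappa` and the example labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout (convention)
    return (observed - expected) / (1.0 - expected)


# Hypothetical topic labels for six posts (machine vs. human reviewer).
machine = ["connectivity", "surgery", "surgery", "daily_life", "connectivity", "surgery"]
human = ["connectivity", "surgery", "daily_life", "daily_life", "connectivity", "surgery"]
kappa = cohens_kappa(machine, human)  # observed 5/6, expected 1/3, kappa 0.75
```

Here five of six labels agree, but a third of that agreement would be expected by chance alone, so kappa is 0.75 rather than 0.83.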
Affiliation(s)
- Christopher Y K Williams
  - School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Rosia X Li
  - School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Michael Y Luo
  - School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Manohar Bance
  - Department of Otolaryngology-Head and Neck Surgery, Addenbrooke's Hospital, Cambridge, UK
|
22
|
Danesh F, Dastani M. Text classification technique for discovering country-based publications from international COVID-19 publications. Digit Health 2023; 9:20552076231185674. [PMID: 37426592 PMCID: PMC10328158 DOI: 10.1177/20552076231185674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 06/15/2023] [Indexed: 07/11/2023] Open
Abstract
Objective The significant increase in the number of COVID-19 publications, on the one hand, and the strategic importance of this subject area for research and treatment systems in the health field, on the other hand, reveal the need for text-mining research more than ever. The main objective of the present paper is to discover country-based publications from international COVID-19 publications with text classification techniques. Methods The present paper is applied research that has been performed using text-mining techniques such as clustering and text classification. The statistical population is all COVID-19 publications from PubMed Central® (PMC), extracted from November 2019 to June 2021. Latent Dirichlet allocation (LDA) was used for clustering, and a support vector machine (SVM) with the scikit-learn library in Python was used for text classification. Text classification was applied to discover the consistency of Iranian and international topics. Results The findings showed that seven topics were extracted using the LDA algorithm for international and Iranian publications on COVID-19. Moreover, the COVID-19 publications show the largest share in the subject area of "Social and Technology in COVID-19" at the international (April 2021) and national (February 2021) levels with 50.61% and 39.44%, respectively. The highest rate of publications at international and national levels was in April 2021 and February 2021, respectively. Conclusion One of the most important results of this study was discovering a common trend and consistency of Iranian and international publications on COVID-19. Accordingly, in the topic category "Covid-19 Proteins: Vaccine and Antibody Response," Iranian publications have a common publishing and research trend with international ones.
Affiliation(s)
- Meisam Dastani
  - Statistics and Information Technology Department, Gonabad University of Medical Science, Gonabad, Iran
|
23
|
Ebrahimi F, Dehghani M, Makkizadeh F. Analysis of Persian Bioinformatics Research with Topic Modeling. BIOMED RESEARCH INTERNATIONAL 2023; 2023:3728131. [PMID: 37101687 PMCID: PMC10125747 DOI: 10.1155/2023/3728131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 02/17/2023] [Accepted: 03/18/2023] [Indexed: 04/28/2023]
Abstract
Purpose. As a scientific field, bioinformatics has in recent years drawn remarkable attention from fields such as information technology, mathematics, and the modern biological sciences. Topic models, which originated in natural language processing, have become a focus of attention with the rapid accumulation of biological datasets. This research therefore aimed to model the topic content of the bioinformatics literature published by Iranian researchers and indexed in the Scopus citation database. Methodology. This was a descriptive-exploratory study; the studied population comprised 3899 papers indexed in Scopus up to March 9, 2022. Topic modeling was performed on the abstracts and titles of the papers using a combination of LDA and TF-IDF. Findings. Topic modeling identified seven main topics: "Molecular Modeling," "Gene Expression," "Biomarker," "Coronavirus," "Immunoinformatics," "Cancer Bioinformatics," and "Systems Biology." "Systems Biology" and "Coronavirus" formed the largest and smallest clusters, respectively. Conclusion. The LDA algorithm showed acceptable performance in classifying the topics in this field. The extracted topic clusters were consistent and topically connected with one another.
Affiliation(s)
- Fezzeh Ebrahimi
  - Department of Scientometrics, Faculty of Social Sciences, Yazd University, Yazd, Iran
- Mohammad Dehghani
  - School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
|
24
|
Shan Y, Ji M, Xie W, Lam KY, Chow CY. Public Trust in Artificial Intelligence Applications in Mental Health Care: Topic Modeling Analysis. JMIR Hum Factors 2022; 9:e38799. [DOI: 10.2196/38799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2022] [Revised: 07/10/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022] Open
Abstract
Background
Mental disorders (MDs) impose heavy burdens on health care (HC) systems and affect a growing number of people worldwide. The use of mobile health (mHealth) apps empowered by artificial intelligence (AI) is increasingly being resorted to as a possible solution.
Objective
This study adopted a topic modeling (TM) approach to investigate the public trust in AI apps in mental health care (MHC) by identifying the dominant topics and themes in user reviews of the 8 most relevant mental health (MH) apps with the largest numbers of reviewers.
Methods
We searched Google Play for the top MH apps with the largest numbers of reviewers, from which we selected the most relevant apps. Subsequently, we extracted data from user reviews posted from January 1, 2020, to April 2, 2022. After cleaning the extracted data using the Python text processing tool spaCy, we ascertained the optimal number of topics, drawing on the coherence scores and used latent Dirichlet allocation (LDA) TM to generate the most salient topics and related terms. We then classified the ascertained topics into different theme categories by plotting them onto a 2D plane via multidimensional scaling using the pyLDAvis visualization tool. Finally, we analyzed these topics and themes qualitatively to better understand the status of public trust in AI apps in MHC.
Results
From the top 20 MH apps with the largest numbers of reviewers retrieved, we chose the 8 (40%) most relevant apps: (1) Wysa: Anxiety Therapy Chatbot; (2) Youper Therapy; (3) MindDoc: Your Companion; (4) TalkLife for Anxiety, Depression & Stress; (5) 7 Cups: Online Therapy for Mental Health & Anxiety; (6) BetterHelp-Therapy; (7) Sanvello; and (8) InnerHour. These apps provided 14.2% (n=559), 11.0% (n=431), 13.7% (n=538), 8.8% (n=356), 14.1% (n=554), 11.9% (n=468), 9.2% (n=362), and 16.9% (n=663) of the collected 3931 reviews, respectively. The 4 dominant topics were topic 4 (cheering people up; n=1069, 27%), topic 3 (calming people down; n=1029, 26%), topic 2 (helping figure out the inner world; n=963, 25%), and topic 1 (being an alternative or complement to a therapist; n=870, 22%). Based on topic coherence and intertopic distance, topics 3 and 4 were combined into theme 3 (dispelling negative emotions), while topics 2 and 1 remained 2 separate themes: theme 2 (helping figure out the inner world) and theme 1 (being an alternative or complement to a therapist), respectively. These themes and topics, though involving some dissenting voices, reflected an overall high status of trust in AI apps.
Conclusions
This is the first study to investigate the public trust in AI apps in MHC from the perspective of user reviews using the TM technique. The automatic text analysis and complementary manual interpretation of the collected data allowed us to discover the dominant topics hidden in a data set and categorize these topics into different themes to reveal an overall high degree of public trust. The dissenting voices from users, though only a few, can serve as indicators for health providers and app developers to jointly improve these apps, which will ultimately facilitate the treatment of prevalent MDs and alleviate the overburdened HC systems worldwide.
|
25
|
Zhen C, Wang Y, Geng J, Han L, Li J, Peng J, Wang T, Hao J, Shang X, Wei Z, Zhu P, Peng J. A review and performance evaluation of clustering frameworks for single-cell Hi-C data. Brief Bioinform 2022; 23:6712299. [PMID: 36151714 DOI: 10.1093/bib/bbac385] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/03/2022] [Revised: 07/31/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022]
Abstract
The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C (high-resolution chromosome conformation capture) technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. Recently, a few methods have been designed specifically for single-cell Hi-C clustering. In this manuscript, we perform an in-depth benchmark study of available single-cell Hi-C data clustering methods, implementing an evaluation system for multiple clustering frameworks based on both human and mouse datasets. We compare eight methods in terms of visualization and clustering performance. Performance is evaluated using four benchmark metrics: adjusted Rand index, normalized mutual information, homogeneity, and Fowlkes-Mallows index. Furthermore, we evaluate the eight methods on the task of separating cells at different stages of the cell cycle based on single-cell Hi-C data.
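Two of the four metrics named above, the adjusted Rand index and the Fowlkes-Mallows index, can both be derived from counts of sample pairs. The following is a minimal sketch using only the standard library, not the benchmark's own code; the function names and toy labels are illustrative, and at least two samples are assumed.

```python
from collections import Counter
from math import comb, sqrt

def pair_counts(labels_true, labels_pred):
    """Count pairs grouped together in both labelings, in the reference
    labeling, in the predicted labeling, and all pairs overall."""
    contingency = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return same_both, same_true, same_pred, comb(len(labels_true), 2)

def adjusted_rand_index(labels_true, labels_pred):
    both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    expected = same_true * same_pred / total   # chance-level agreement
    max_index = (same_true + same_pred) / 2
    if max_index == expected:                  # degenerate: identical trivial labelings
        return 1.0
    return (both - expected) / (max_index - expected)

def fowlkes_mallows(labels_true, labels_pred):
    both, same_true, same_pred, _ = pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0
    # Geometric mean of pairwise precision and recall.
    return both / sqrt(same_true * same_pred)

# Hypothetical cell-type labels versus cluster assignments.
cell_types = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]
print(adjusted_rand_index(cell_types, clusters))
print(fowlkes_mallows(cell_types, clusters))
```

Both scores depend only on how samples are grouped, so relabeling the clusters leaves them unchanged. Normalized mutual information and homogeneity are entropy-based and are not covered by this sketch.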
Affiliation(s)
- Caiwei Zhen
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Yuxian Wang
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Jiaquan Geng
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Lu Han
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Jingyi Li
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Jinghao Peng
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Tao Wang
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Jianye Hao
- School of Computer Software, Tianjin University, 300350, Tianjin, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Zhongyu Wei
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Peican Zhu
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, 710072, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, 710072, Xi'an, China
| |
|
26
|
Hu M, Conway M. Perspectives of the COVID-19 Pandemic on Reddit: Comparative Natural Language Processing Study of the United States, the United Kingdom, Canada, and Australia. JMIR INFODEMIOLOGY 2022; 2:e36941. [PMID: 36196144 PMCID: PMC9521381 DOI: 10.2196/36941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 08/13/2022] [Accepted: 09/15/2022] [Indexed: 11/13/2022]
Abstract
Background
Since COVID-19 was declared a pandemic by the World Health Organization on March 11, 2020, the disease has had an unprecedented impact worldwide. Social media such as Reddit can serve as a resource for enhancing situational awareness, particularly regarding monitoring public attitudes and behavior during the crisis. Insights gained can then be utilized to better understand public attitudes and behaviors during the COVID-19 crisis, and to support communication and health-promotion messaging.
Objective
The aim of this study was to compare public attitudes toward the 2020-2021 COVID-19 pandemic across four predominantly English-speaking countries (the United States, the United Kingdom, Canada, and Australia) using data derived from the social media platform Reddit.
Methods
We utilized a topic modeling natural language processing method (more specifically latent Dirichlet allocation). Topic modeling is a popular unsupervised learning technique that can be used to automatically infer topics (ie, semantically related categories) from a large corpus of text. We derived our data from six country-specific, COVID-19–related subreddits (r/CoronavirusAustralia, r/CoronavirusDownunder, r/CoronavirusCanada, r/CanadaCoronavirus, r/CoronavirusUK, and r/coronavirusus). We used topic modeling methods to investigate and compare topics of concern for each country.
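To make the idea of inferring topics from a corpus concrete, here is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is an illustration only: the abstract does not say which LDA implementation or inference method was used, and the function name, hyperparameter values, and toy posts are hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns per-document topic
    proportions, topic-word counts, and the vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the token, then resample its topic from the
                # conditional distribution given all other assignments.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab

# Hypothetical tokenized posts.
posts = [
    ["lockdown", "rules", "police"],
    ["vaccine", "trial", "doses"],
    ["lockdown", "police", "fines"],
    ["vaccine", "doses", "rollout"],
]
theta, topic_word, vocab = lda_gibbs(posts, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is one post's estimated mixture over topics. Comparing such mixtures across country-specific corpora is one way topics of concern could be contrasted.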
Results
Our consolidated Reddit data set consisted of 84,229 initiating posts and 1,094,853 associated comments collected between February and November 2020 for the United States, the United Kingdom, Canada, and Australia. The volume of posting in COVID-19–related subreddits declined consistently across all four countries during the study period (February 2020 to November 2020). During lockdown events, the volume of posts peaked. The UK and Australian subreddits contained much more evidence-based policy discussion than the US or Canadian subreddits.
Conclusions
This study provides evidence to support the contention that there are key differences between salient topics discussed across the four countries on the Reddit platform. Further, our approach indicates that Reddit data have the potential to provide insights not readily apparent in survey-based approaches.
Affiliation(s)
- Mengke Hu
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
| | - Mike Conway
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States
- School of Computing & Information Systems, University of Melbourne, Carlton, Australia
- Centre for Digital Transformation of Health, University of Melbourne, Carlton, Australia
| |
|
27
|
Singh R, Nagpal S, Pinna NK, Mande SS. Tracking mutational semantics of SARS-CoV-2 genomes. Sci Rep 2022; 12:15704. [PMID: 36127400 PMCID: PMC9487856 DOI: 10.1038/s41598-022-20000-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2022] [Accepted: 09/07/2022] [Indexed: 11/16/2022]
Abstract
Natural language processing (NLP) algorithms process linguistic data in order to discover the associated word semantics and develop models that can describe or even predict the latent meanings of the data. The applications of NLP multiply when dealing with dynamic or temporally evolving datasets (e.g., historical literature). Biological datasets of genome sequences are interesting since they are sequential as well as dynamic. Here we describe how SARS-CoV-2 genomes and mutations thereof can be processed using fundamental algorithms in NLP to reveal the characteristics and evolution of the virus. We demonstrate the applicability of NLP not only in probing temporal mutational signatures through dynamic topic modelling, but also in tracing mutation associations through semantic drift in genomic mutation records. Our approach also yields promising results in unfolding the mutational relevance to patient health status, thereby identifying putative signatures linked to known/highly speculated mutations of concern.
Affiliation(s)
- Rohan Singh
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India
| | - Sunil Nagpal
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India.
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, 110025, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
| | - Nishal K Pinna
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India
| | - Sharmila S Mande
- TCS Research, Tata Consultancy Services Ltd, Pune, 411013, India.
| |
|
28
|
Breuninger TA, Wawro N, Freuer D, Reitmeier S, Artati A, Grallert H, Adamski J, Meisinger C, Peters A, Haller D, Linseisen J. Fecal Bile Acids and Neutral Sterols Are Associated with Latent Microbial Subgroups in the Human Gut. Metabolites 2022; 12:846. [PMID: 36144250 PMCID: PMC9504437 DOI: 10.3390/metabo12090846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/28/2022]
Abstract
Bile acids, neutral sterols, and the gut microbiome are intricately intertwined and each affects human health and metabolism. However, much is still unknown about this relationship. This analysis included 1280 participants of the KORA FF4 study. Fecal metabolites (primary and secondary bile acids, plant and animal sterols) were analyzed using a metabolomics approach. Dirichlet regression models were used to evaluate associations between the metabolites and twenty microbial subgroups that were previously identified using latent Dirichlet allocation. Significant associations were identified between 12 of 17 primary and secondary bile acids and several of the microbial subgroups. Three subgroups showed largely positive significant associations with bile acids, and six subgroups showed mostly inverse associations with fecal bile acids. We identified a trend where microbial subgroups that were previously associated with "healthy" factors were here inversely associated with fecal bile acid levels. Conversely, subgroups that were previously associated with "unhealthy" factors were positively associated with fecal bile acid levels. These results indicate that further research is necessary regarding bile acids and microbiota composition, particularly in relation to metabolic health.
Affiliation(s)
- Taylor A. Breuninger
- Chair of Epidemiology, University Hospital Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
| | - Nina Wawro
- Chair of Epidemiology, University Hospital Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
- Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Institute of Epidemiology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Dennis Freuer
- Chair of Epidemiology, University Hospital Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
| | - Sandra Reitmeier
- Chair of Nutrition and Immunology, Technische Universität München, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- ZIEL—Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Anna Artati
- Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Metabolomics and Proteomics Core, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Harald Grallert
- Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Institute of Epidemiology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Jerzy Adamski
- Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597, Singapore
- Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia
| | - Christa Meisinger
- Chair of Epidemiology, University Hospital Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
| | - Annette Peters
- Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Institute of Epidemiology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Dirk Haller
- Chair of Nutrition and Immunology, Technische Universität München, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- ZIEL—Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Jakob Linseisen
- Chair of Epidemiology, University Hospital Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
- ZIEL—Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| |
|
29
|
Ko YJ, Kim S, Pan CH, Park K. Identification of Functional Microbial Modules Through Network-Based Analysis of Meta-Microbial Features Using Matrix Factorization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2851-2862. [PMID: 34329170 DOI: 10.1109/tcbb.2021.3100893] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
As the microbiome is composed of a variety of microbial interactions, it is imperative in microbiome research to identify a microbial sub-community that collectively conducts a specific function. However, current methodologies have been highly limited to analyzing conditional abundance changes of individual microorganisms without considering group-wise collective microbial features. To overcome this limitation, we developed a network-based method using nonnegative matrix factorization (NMF) to identify functional meta-microbial features (MMFs) that, as a group, better discriminate specific environmental conditions of samples using microbiome data. As proof of concept, large-scale human microbiome data collected from different body sites were used to identify body site-specific MMFs by applying NMF. The statistical test for MMFs led us to identify highly discriminative MMFs on sample classes, called synergistic MMFs (SYMMFs). Finally, we constructed a SYMMF-based microbial interaction network (SYMMF-net) by integrating all of the SYMMF information. Network analysis revealed core microbial modules closely related to critical sample properties. Similar results were also found when the method was applied to various disease-associated microbiome data. The developed method interprets high-dimensional microbiome data by identifying functional microbial modules on sample properties and intuitively representing their systematic relationships via a microbial network.
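The factorization step at the core of this approach can be sketched as follows. This is plain nonnegative matrix factorization with Lee-Seung multiplicative updates for the squared error, chosen here as an assumption since the abstract does not specify the update rule. It does not reproduce the authors' statistical tests or network construction, and the abundance matrix and function names are hypothetical.

```python
import random

def matmul(a, b):
    # Product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    # Squared Frobenius norm of V - W H.
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (samples x taxa) into W (samples x rank)
    and H (rank x taxa) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical abundance matrix: 4 samples x 4 taxa with two co-occurring groups.
v = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 6],
    [0, 0, 6, 3],
]
w, h = nmf(v, rank=2)
print(round(frobenius_error(v, w, h), 3))
```

Each row of `h` weights the taxa that vary together, which is the sense in which a factor can serve as a group-level feature rather than a single-organism abundance.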
|
30
|
Dastani M, Atarodi A. Health Information Technology During the COVID-19 Epidemic: A Review via Text Mining. Online J Public Health Inform 2022; 14:e3. [PMID: 36120163 PMCID: PMC9473330 DOI: 10.5210/ojphi.v14i1.11090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Background
Due to the prevalence of the COVID-19 epidemic in all countries of the world, applying health information technology is of great importance. Hence, this study identified the role of health information technology during the COVID-19 epidemic.
Methods
The present research is a review study employing text mining techniques. A total of 941 published documents related to the role of health information technology during the COVID-19 epidemic were extracted by keyword searching in the Web of Science database. The Python programming language was used to analyze the data and implement the text mining and topic modeling algorithms.
Results
The results indicated that the largest numbers of publications on the role of health information technology during the COVID-19 epidemic were, in order, on the following topics: "Models and smart systems," "Telemedicine," "Health care," "Health information technology," "Evidence-based medicine," and "Big data and Statistic analysis."
Conclusion
Health information technology has been extensively used during the COVID-19 epidemic. Therefore, different communities can apply these technologies, considering their conditions and facilities, to better manage the COVID-19 epidemic.
Affiliation(s)
- Meisam Dastani
- Social Determinants of Health Research Center, Gonabad University of Medical Sciences, Gonabad, Iran
| | - Alireza Atarodi
- Department of Knowledge and Information Science, Paramedical College and Social Development & Health Promotion Research Center, Gonabad University of Medical Sciences, Gonabad, Iran
| |
|
31
|
Chen JS, Baxter SL. Applications of natural language processing in ophthalmology: present and future. Front Med (Lausanne) 2022; 9:906554. [PMID: 36004369 PMCID: PMC9393550 DOI: 10.3389/fmed.2022.906554] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/28/2022] [Accepted: 05/31/2022] [Indexed: 11/13/2022]
Abstract
Advances in technology, including novel ophthalmic imaging devices and adoption of the electronic health record (EHR), have resulted in significantly increased data available for both clinical use and research in ophthalmology. While artificial intelligence (AI) algorithms have the potential to utilize these data to transform clinical care, current applications of AI in ophthalmology have focused mostly on image-based deep learning. Unstructured free-text in the EHR represents a tremendous amount of underutilized data in big data analyses and predictive AI. Natural language processing (NLP) is a type of AI involved in processing human language that can be used to develop automated algorithms using these vast quantities of available text data. The purpose of this review was to introduce ophthalmologists to NLP by (1) reviewing current applications of NLP in ophthalmology and (2) exploring potential applications of NLP. We reviewed current literature published in PubMed and Google Scholar for articles related to NLP and ophthalmology, and used ancestor search to expand our references. Overall, we found 19 published studies of NLP in ophthalmology. The majority of these publications (16) focused on extracting specific text such as visual acuity from free-text notes for the purposes of quantitative analysis. Other applications included domain embedding, predictive modeling, and topic modeling. Future ophthalmic applications of NLP may also focus on developing search engines for data within free-text notes, cleaning notes, automated question-answering, and translating ophthalmology notes for other specialties or for patients, especially with a growing interest in open notes. As medicine becomes more data-oriented, NLP offers increasing opportunities to augment our ability to harness free-text data and drive innovations in healthcare delivery and treatment of ophthalmic conditions.
Affiliation(s)
- Jimmy S. Chen
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, United States
- Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, United States
| | - Sally L. Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, United States
- Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, United States
| |
|
32
|
Squires A, Clark-Cutaia M, Henderson MD, Arneson G, Resnik P. "Should I stay or should I go?" Nurses' perspectives about working during the Covid-19 pandemic's first wave in the United States: A summative content analysis combined with topic modeling. Int J Nurs Stud 2022; 131:104256. [PMID: 35544991 PMCID: PMC9020864 DOI: 10.1016/j.ijnurstu.2022.104256] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/15/2021] [Revised: 04/04/2022] [Accepted: 04/07/2022] [Indexed: 01/13/2023]
Abstract
BACKGROUND The COVID-19 pandemic had its first peak in the United States between April and July of 2020, with incidence and prevalence rates of the virus greatest on the northeastern coast of the country. At the time of study implementation, few studies captured the perspectives of nurses working the frontlines of the pandemic in any setting, as research output in the United States focused largely on treating the disease. OBJECTIVE The purpose of this study was to capture the perspectives of nurses in the United States working the frontlines of the COVID-19 pandemic's first wave. We were specifically interested in examining the impact of the pandemic on nurses' roles, professional relationships, and the organizational cultures of their employers. DESIGN We conducted an online qualitative study with a pragmatic design to capture the perspectives of nurses working during the first wave of the United States COVID-19 pandemic. Through social networking recruitment, frontline nurses from across the country were invited to participate. Participants provided long-form, text-based responses to four questions designed to capture their experiences. A combination of Latent Dirichlet Allocation, a natural language processing technique, and traditional summative content analysis techniques was used to analyze the data. SETTING The United States during the COVID-19 pandemic's first wave between May and July of 2020. RESULTS A total of 318 nurses participated from 29 out of 50 states, with 242 fully completing all questions. Findings suggested that the place of work mattered significantly in terms of the frontline working experience. It influenced role changes, risk assumption, interprofessional teamwork experiences, and ultimately, likelihood to leave their jobs or the profession altogether. Organizational culture and its influence on pandemic response implementation were critical features of their experiences.
CONCLUSIONS Findings suggest that organizational performance during the pandemic may be reflected in nursing workforce retention as the risk for workforce attrition appears high. It was also clear from the reports that nurses appear to have assumed higher occupational risks during the pandemic when compared to other providers. The 2020 data from this study also offered a number of signals about potential threats to the stability and sustainability of the US nursing workforce that are now manifesting. The findings underscore the importance of conducting health workforce research during a crisis in order to discern the signals of future problems or for long-term crisis response. TWEETABLE ABSTRACT Healthcare leaders made the difference for nurses during the pandemic. How many nurses leave their employer in the next year will tell you who was good, who wasn't.
Affiliation(s)
- Allison Squires
- Rory Meyers College of Nursing, New York University, 433 First Avenue, 6th Floor, New York, NY 10010, United States of America (corresponding author)
| | - Maya Clark-Cutaia
- Rory Meyers College of Nursing, New York University, 433 First Avenue, 6th Floor, New York, NY 10010, United States of America
| | - Marcus D. Henderson
- School of Nursing, Johns Hopkins University, Baltimore, MD, United States of America
| | - Gavin Arneson
- Rory Meyers College of Nursing, New York University, 433 First Avenue, 6th Floor, New York, NY 10010, United States of America
| | - Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, United States of America
| |
|
33
|
Cifuentes J, Olarte F. A macro perspective of the perceptions of the education system via topic modelling analysis. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 82:1783-1820. [PMID: 35702681 PMCID: PMC9186274 DOI: 10.1007/s11042-022-13202-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 01/05/2022] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
Education quality has become an important issue and has received considerable attention around the world, especially due to its repercussions on the socioeconomic development of society. In recent years, many nations have realized the need for a highly skilled workforce to thrive in the emerging knowledge-based economy. They have consequently adopted strategies to identify lines of action to improve education quality. In response to the government's efforts to improve education quality in Colombia, this study examines the current perceptions of the education system from the perspective of key local stakeholders. To this end, we used a survey containing open-ended questions to collect information about the limitations and difficulties of the education process from several groups of participants. The collected answers were categorized into a variety of topics using a Latent Dirichlet Allocation based model. The students', teachers' and parents' answers were then analyzed separately to obtain a general landscape of the perceptions of the education system. Evaluation metrics, such as topic coherence, were quantitatively analyzed to assess the modelling performance. In addition, a methodology for hyperparameter setting and final topic labelling was presented. The results suggest that topic modelling strategies are a viable alternative to identify strategic lines of action and to obtain a macro-perspective of the perceptions of the education system.
Affiliation(s)
- Jenny Cifuentes
- ICADE, Faculty of Economics and Business Administration; Department of Quantitative Methods, Universidad Pontificia Comillas, Calle Alberto Aguilera 23, Madrid, 28015 Spain
| | - Fredy Olarte
- Electrical and Electronics Engineering Department, Universidad Nacional de Colombia, Carrera 45 26-85, Bogotá, Colombia
| |
|
34
|
Chen J, Williams M, Huang Y, Si S. Identifying Topics and Evolutionary Trends of Literature on Brain Metastases Using Latent Dirichlet Allocation. Front Mol Biosci 2022; 9:858577. [PMID: 35720132 PMCID: PMC9201447 DOI: 10.3389/fmolb.2022.858577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 04/01/2022] [Indexed: 11/13/2022]
Abstract
Research on brain metastases has continued to innovate. We aimed to show, using topic modelling, which topics this research has focused on and how they varied across periods. We used the latent Dirichlet allocation model to analyse the titles and abstracts of 50,176 articles on brain metastases retrieved from Web of Science, Embase and MEDLINE. We further stratified the articles by period to identify topic trends. Our study found that the number of studies on brain metastases published in recent decades rose at a higher rate than that of all cancer articles. Overall, the major themes focused on treatment and histopathology. Radiotherapy occupied the first and third places among the top 20 topics. Since the 2010s, increasing attention has been paid to gene mutations. Targeted therapy has been a popular topic in brain metastases research since 2020.
Affiliation(s)
- Jiarong Chen
- Clinical Experimental Center, Jiangmen Key Laboratory of Clinical Biobanks and Translational Research, Jiangmen Central Hospital, Jiangmen, China
- Department of Oncology, Jiangmen Central Hospital, Jiangmen, China
- Computational Oncology Group, Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Matt Williams
- Computational Oncology Group, Department of Surgery and Cancer, Imperial College London, London, United Kingdom
- Department of Radiotherapy, Charing Cross Hospital, Imperial College Healthcare NHS Trust, London, United Kingdom
| | - Yanming Huang
- Clinical Experimental Center, Jiangmen Key Laboratory of Clinical Biobanks and Translational Research, Jiangmen Central Hospital, Jiangmen, China
| | - Shijing Si
- Duke University, Durham, NC, United States
| |
|
35
|
Instruments and Tools to Identify Radical Textual Content. INFORMATION 2022. [DOI: 10.3390/info13040193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
The Internet and social networks are increasingly becoming a medium for extremist propaganda. On homepages, in forums or chats, extremists spread their ideologies and world views, which are often contrary to the basic liberal democratic values of the European Union. It is not uncommon that violence is used against those of different faiths, those who think differently, and members of social minorities. This paper presents a set of instruments and tools developed to help investigators better address hybrid security threats, i.e., threats that combine physical and cyber attacks. These tools have been designed and developed to support security authorities in identifying extremist propaganda on the Internet and classifying it in terms of its degree of danger. This concerns both extremist content on freely accessible Internet pages and content in closed chats. We illustrate the functionalities of the tools through an example related to radicalisation detection; the data used here are just a few tweets, propaganda emails, and darknet posts. This work was supported by the EU-funded PREVISION (Prediction and Visual Intelligence for Security Intelligence) project.
|
36
|
Bhatnagar R, Sardar S, Beheshti M, Podichetty JT. How can natural language processing help model informed drug development?: a review. JAMIA Open 2022; 5:ooac043. [PMID: 35702625 PMCID: PMC9188322 DOI: 10.1093/jamiaopen/ooac043] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/23/2022] [Revised: 04/28/2022] [Accepted: 05/26/2022] [Indexed: 01/20/2023]
Abstract
Objective: To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement. Materials and Methods: Publications found on PubMed and Google Scholar, websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for the implementation of NLP functionalities in MIDD were identified. Results: NLP has been utilized to aid various processes in the drug development lifecycle such as gene-disease mapping, biomarker discovery, patient-trial matching, adverse drug events detection, etc. These applications commonly use NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state of the art for implementing these functionalities in MIDD applications is transformer models that utilize transfer learning for enhanced performance. Various libraries in Python, R, and Java like huggingface, sparkNLP, and KoRpus as well as open-source platforms such as DisGeNet, DeepEnroll, and Transmol have enabled convenient implementation of NLP models in MIDD applications. Discussion: Challenges such as reproducibility, explainability, fairness, limited data, limited language-support, and security need to be overcome to ensure wider adoption of NLP in the MIDD landscape. There are opportunities to improve the performance of existing models and expand the use of NLP in newer areas of MIDD. Conclusions: This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.
Affiliation(s)
- Roopal Bhatnagar
- Data Science, Data Collaboration Center, Critical Path Institute, Tucson, Arizona, USA
- Sakshi Sardar
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
- Maedeh Beheshti
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
|
37
|
Pancheva A, Wheadon H, Rogers S, Otto TD. Using topic modeling to detect cellular crosstalk in scRNA-seq. PLoS Comput Biol 2022; 18:e1009975. [PMID: 35395014 PMCID: PMC9064087 DOI: 10.1371/journal.pcbi.1009975] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/06/2021] [Revised: 05/03/2022] [Accepted: 02/25/2022] [Indexed: 11/19/2022] Open
Abstract
Cell-cell interactions are vital for numerous biological processes including development, differentiation, and response to inflammation. Currently, most methods for studying interactions at the scRNA-seq level are based on curated databases of ligands and receptors. While those methods are useful, they are limited to our current biological knowledge. Recent advances in single cell protocols have allowed for physically interacting cells to be captured, and as such we have the potential to study interactions in a complementary way without relying on prior knowledge. We introduce a new method based on Latent Dirichlet Allocation (LDA) for detecting genes that change as a result of interaction. We apply our method to synthetic datasets to demonstrate its ability to detect genes that change in an interacting population compared to a reference population. Next, we apply our approach to two datasets of physically interacting cells to identify the genes that change as a result of interaction; examples include adhesion and co-stimulatory molecules, which confirm physical interaction between cells. For each dataset we produce a ranking of genes that are changing in subpopulations of the interacting cells. In addition to the genes discussed in the original publications, we highlight further candidates for interaction in the top 100 and 300 ranked genes. Lastly, we apply our method to a dataset generated by a standard droplet-based protocol not designed to capture interacting cells, and discuss its suitability for analysing interactions. We present a method that streamlines detection of interactions and does not require prior clustering and generation of synthetic reference profiles to detect changes in expression.
Affiliation(s)
- Alexandrina Pancheva
- Institute for Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
- Helen Wheadon
- Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom
- Simon Rogers
- School of Computing Science, University of Glasgow, Glasgow, United Kingdom
- Thomas D. Otto
- Institute for Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
|
38
|
Ruiz Tejada Segura ML, Abou Moussa E, Garabello E, Nakahara TS, Makhlouf M, Mathew LS, Wang L, Valle F, Huang SSY, Mainland JD, Caselle M, Osella M, Lorenz S, Reisert J, Logan DW, Malnic B, Scialdone A, Saraiva LR. A 3D transcriptomics atlas of the mouse nose sheds light on the anatomical logic of smell. Cell Rep 2022; 38:110547. [PMID: 35320714 PMCID: PMC8995392 DOI: 10.1016/j.celrep.2022.110547] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/13/2021] [Revised: 01/26/2022] [Accepted: 03/01/2022] [Indexed: 12/26/2022] Open
Abstract
The sense of smell helps us navigate the environment, but its molecular architecture and underlying logic remain understudied. The spatial location of odorant receptor genes (Olfrs) in the nose is thought to be independent of the structural diversity of the odorants they detect. Using spatial transcriptomics, we create a genome-wide 3D atlas of the mouse olfactory mucosa (OM). Topographic maps of genes differentially expressed in space reveal that both Olfrs and non-Olfrs are distributed in a continuous and overlapping fashion over at least five broad zones in the OM. The spatial locations of Olfrs correlate with the mucus solubility of the odorants they recognize, providing direct evidence for the chromatographic theory of olfaction. This resource resolves the molecular architecture of the mouse OM and will inform future studies on mechanisms underlying Olfr gene choice, axonal pathfinding, patterning of the nervous system, and basic logic for the peripheral representation of smell.
Affiliation(s)
- Mayra L Ruiz Tejada Segura
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München, Feodor-Lynen-Strasse 21, 81377 München, Germany; Institute of Functional Epigenetics, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Elisa Garabello
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München, Feodor-Lynen-Strasse 21, 81377 München, Germany; Physics Department, University of Turin and INFN, Via P. Giuria 1, 10125 Turin, Italy; Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14853, USA
- Thiago S Nakahara
- Department of Biochemistry, University of São Paulo, São Paulo, Brazil
- Li Wang
- Sidra Medicine, P.O. Box 26999, Doha, Qatar
- Filippo Valle
- Physics Department, University of Turin and INFN, Via P. Giuria 1, 10125 Turin, Italy
- Joel D Mainland
- Monell Chemical Senses Center, 3500 Market Street, Philadelphia, PA 19104, USA; Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michele Caselle
- Physics Department, University of Turin and INFN, Via P. Giuria 1, 10125 Turin, Italy
- Matteo Osella
- Physics Department, University of Turin and INFN, Via P. Giuria 1, 10125 Turin, Italy
- Stephan Lorenz
- Sidra Medicine, P.O. Box 26999, Doha, Qatar; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Johannes Reisert
- Monell Chemical Senses Center, 3500 Market Street, Philadelphia, PA 19104, USA
- Darren W Logan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bettina Malnic
- Department of Biochemistry, University of São Paulo, São Paulo, Brazil
- Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München, Feodor-Lynen-Strasse 21, 81377 München, Germany; Institute of Functional Epigenetics, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Luis R Saraiva
- Sidra Medicine, P.O. Box 26999, Doha, Qatar; Monell Chemical Senses Center, 3500 Market Street, Philadelphia, PA 19104, USA; College of Health and Life Sciences, Hamad Bin Khalifa University, P.O. Box 34110, Doha, Qatar
|
39
|
Restrepo S, ter Horst E, Zambrano JD, Gunn LH, Molina G, Salazar CA. Hierarchical Bayesian classification methods to identify topics by journal quartile with an application in biological sciences. EDUCATION FOR INFORMATION 2022. [DOI: 10.3233/efi-211546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This manuscript builds on a novel, automatic, freely-available Bayesian approach to extract information in abstracts and titles to classify research topics by quartile. This approach is demonstrated for all N = 149,129 ISI-indexed publications in biological sciences journals during 2017. A Bayesian multinomial inverse regression approach is used to extract rankings of topics without the need for a pre-defined dictionary. Bigrams are used for extraction of research topics across manuscripts, and rankings of research topics are constructed by quartile. Worldwide and local results (e.g., comparison between two peer/aspirational research institutions in Colombia) are provided, and differences are explored both at the global and local levels. Some topics persist across quartiles, while the relevance of others is quartile-specific. Challenges in sustainable development appear as more prevalent in top quartile journals across institutions, while the two Colombian institutions favour plant and microorganism research. This approach can reduce information inequities by allowing young/incipient researchers in biological sciences, especially within lower income countries or universities with limited resources, to freely assess the state of the literature and the relative likelihood of publication in higher impact journals by research topic. This can also help institutions of higher education identify missing research topics and areas of competitive advantage.
Affiliation(s)
- Laura H. Gunn
- University of North Carolina at Charlotte & Imperial College London, USA
|
40
|
Ameli N, Gibson MP, Khanna A, Howey M, Lai H. An Application of Machine Learning Techniques to Analyze Patient Information to Improve Oral Health Outcomes. FRONTIERS IN DENTAL MEDICINE 2022. [DOI: 10.3389/fdmed.2022.833191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Objective: Various health-related fields have applied machine learning (ML) techniques such as text mining, topic modeling (TM), and artificial neural networks (ANN) to automate tasks otherwise completed by humans to enhance patient care. However, research in dentistry on the integration of these techniques into the clinic arena has yet to exist. Thus, the purpose of this study was to: introduce a method of automating the review of patient chart information using ML, provide a step-by-step description of how it was conducted, and demonstrate this method's potential to identify predictive relationships between patient chart information and important oral health-related contributors. Methods: A secondary data analysis was conducted to demonstrate the approach on a set of anonymized patient charts collected from a dental clinic. Two ML applications for patient chart review were demonstrated: (1) text mining and Latent Dirichlet Allocation (LDA) were used to preprocess, model, and cluster data in a narrative format and extract common topics for further analysis, (2) Ordinal logistic regression (OLR) and ANN were used to determine predictive relationships between the extracted patient chart data topics and oral health-related contributors. All analysis was conducted in R and SPSS (IBM, SPSS, statistics 22). Results: Data from 785 patient charts were analyzed. Preprocessing of raw data (data cleaning and categorizing) identified 66 variables, of which 45 were included for analysis. Using LDA, 10 radiographic findings topics and 8 treatment planning topics were extracted from the data. OLR showed that caries risk, occlusal risk, biomechanical risk, gingival recession, periodontitis, gingivitis, assisted mouth opening, and muscle tenderness were highly predictable using the extracted radiographic and treatment planning topics and chart information. Using the statistically significant predictors obtained from OLR, ANN analysis showed that the model can correctly predict >72% of all variables except for bruxism and tooth crowding (63.1 and 68.9%, respectively). Conclusion: Our study presents a novel approach to address the need for data-enabled innovations in the field of dentistry and creates new areas of research in dental analytics. Utilizing ML methods and their application in dental practice has the potential to improve clinicians' and patients' understanding of the major factors that contribute to oral health diseases/conditions.
|
41
|
Valle F, Osella M, Caselle M. Multiomics Topic Modeling for Breast Cancer Classification. Cancers (Basel) 2022; 14:1150. [PMID: 35267458 PMCID: PMC8909787 DOI: 10.3390/cancers14051150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Accepted: 02/18/2022] [Indexed: 12/04/2022] Open
Abstract
The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or "topics" that the algorithm extracts actually correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.
Affiliation(s)
- Filippo Valle
- Physics Department, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy; (M.O.); (M.C.)
|
42
|
State of Industry 5.0—Analysis and Identification of Current Research Trends. APPLIED SYSTEM INNOVATION 2022. [DOI: 10.3390/asi5010027] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 02/06/2023]
Abstract
The term Industry 4.0, coined for the fourth industrial revolution, refers to a higher level of automation for operational productivity and efficiency by connecting virtual and physical worlds in an industry. With Industry 4.0 being unable to address and meet the increased drive for personalization, the term Industry 5.0 was coined for addressing personalized manufacturing and empowering humans in manufacturing processes. Since its onset, the term Industry 5.0 has been subject to various views of how it is defined and what constitutes the reconciliation between humans and machines. This serves as the motivation of this paper: identifying and analyzing the various themes and research trends of what Industry 5.0 is, using text mining tools and techniques. Toward this, the abstracts of 196 published papers found by searching the keyword “Industry 5.0” in the IEEE, ScienceDirect, and MDPI databases were extracted. Data cleaning and preprocessing were performed before applying the text mining techniques of key term extraction and frequency analysis. Further, topic mining (i.e., an unsupervised machine learning method) was used for exploring the data. It is observed that the terms artificial intelligence (AI), big data, supply chain, digital transformation, machine learning, and internet of things (IoT) are among the most often used and among several enablers that have been identified by researchers to drive Industry 5.0. Five major themes of Industry 5.0, addressing supply chain evaluation and optimization; enterprise innovation and digitization; smart and sustainable manufacturing; transformation driven by IoT, AI, and Big Data; and human-machine connectivity, were classified among the published literature, highlighting the research themes that can be further explored. It is observed that the theme of Industry 5.0 as a gateway towards human-machine connectivity and co-existence has been gaining more interest among the research community in recent years.
|
43
|
Abstract
Plastic pollution is one of the most significant environmental issues in the world. The rapid increase of the cumulative amount of plastic waste has caused alarm, and the public have called for actions to mitigate its impacts on the environment. Numerous governments and social activists from various non-profit organisations have set up policies, actively promoted awareness, and engaged the public in discussions on this issue. Nevertheless, social responsibility is the key to a sustainable environment, and individuals are accountable for performing their civic duty and committing to behavioural changes that can reduce the use of plastics. This paper explores a set of topic modelling techniques to assist policymakers and environment communities in understanding public opinions about the issues related to plastic pollution by analysing social media data. We report on an experiment in which a total of 274,404 tweets related to plastic pollution were collected from Twitter, and five topic modelling techniques, including (a) Latent Dirichlet Allocation (LDA), (b) Hierarchical Dirichlet Process (HDP), (c) Latent Semantic Indexing (LSI), (d) Non-Negative Matrix Factorisation (NMF), and (e) extension of LDA—Structural Topic Model (STM), were applied to the data to identify popular topics of online conversations, considering topic coherence, topic prevalence, and topic correlation. Our experimental results show that some of these topic modelling techniques are effective in detecting and identifying important topics surrounding plastic pollution, and potentially different techniques can be combined to develop an efficient system for mining important environment-related topics from social media data on a large scale.
|
44
|
Hsu DF, LaFleur MT, Orazbek I. Improving SDG Classification Precision Using Combinatorial Fusion. SENSORS (BASEL, SWITZERLAND) 2022; 22:1067. [PMID: 35161807 PMCID: PMC8838763 DOI: 10.3390/s22031067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/31/2021] [Revised: 01/27/2022] [Accepted: 01/27/2022] [Indexed: 02/05/2023]
Abstract
Combinatorial fusion algorithm (CFA) is a machine learning and artificial intelligence (ML/AI) framework for combining multiple scoring systems using the rank-score characteristic (RSC) function and cognitive diversity (CD). When measuring the relevance of a publication or document with respect to the 17 Sustainable Development Goals (SDGs) of the United Nations, a classification scheme is used. However, this classification process is a challenging task due to the overlapping goals and contextual differences of those diverse SDGs. In this paper, we use CFA to combine a topic model classifier (Model A) and a semantic link classifier (Model B) to improve the precision of the classification process. We characterize and analyze each of the individual models using the RSC function and CD between Models A and B. We evaluate the classification results from combining the models using a score combination and a rank combination, when compared to the results obtained from human experts. In summary, we demonstrate that the combination of Models A and B can improve classification precision only if these individual models perform well and are diverse.
Affiliation(s)
- D. Frank Hsu
- Laboratory of Informatics and Data Mining, Department of Computer and Information Science, Fordham University, New York, NY 10023, USA
- Marcelo T. LaFleur
- Department of Economic and Social Affairs, United Nations, New York, NY 10017, USA
- Ilyas Orazbek
- Laboratory of Informatics and Data Mining, Department of Computer and Information Science, Fordham University, New York, NY 10023, USA
|
45
|
Jeevanandam J, Agyei D, Danquah MK, Udenigwe C. Food quality monitoring through bioinformatics and big data. FUTURE FOODS 2022. [DOI: 10.1016/b978-0-323-91001-9.00036-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|
46
|
Tizzani M, Muñoz-Gómez V, De Nardi M, Paolotti D, Muñoz O, Ceschi P, Viltrop A, Capua I. Integrating digital and field surveillance as complementary efforts to manage epidemic diseases of livestock: African swine fever as a case study. PLoS One 2022; 16:e0252972. [PMID: 34972117 PMCID: PMC8719698 DOI: 10.1371/journal.pone.0252972] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Accepted: 12/13/2021] [Indexed: 11/18/2022] Open
Abstract
SARS-CoV-2 has clearly shown that efficient management of infectious diseases requires a top-down approach which must be complemented with a bottom-up response to be effective. Here we investigate a novel approach to surveillance for transboundary animal diseases using African swine fever (ASF) as a model. We collected data on information-seeking behavior at both the population level and the local level, through digital data and targeted questionnaire-based surveys of relevant stakeholders such as pig farmers and veterinary authorities, respectively. Our study shows how information-seeking behavior and resulting public attention during an epidemic can be identified through novel data streams from digital platforms such as Wikipedia. Leveraging attention in a critical moment can be key to providing the correct information at the right moment, especially to an interested cohort of people. We also bring evidence on how field surveys aimed at local workers and veterinary authorities remain a crucial tool to assess more in-depth preparedness and awareness among front-line actors. We conclude that these two tools should be used in combination to maximize the outcome of surveillance and prevention activities for selected transboundary animal diseases such as ASF.
Affiliation(s)
- Michele Tizzani
- Institute for Scientific Interchange Foundation, Torino, Italy
- Violeta Muñoz-Gómez
- SAFOSO, Liebefeld, Switzerland; Section of Epidemiology, Vetsuisse Faculty, University of Zurich, Zurich, Switzerland
- Olga Muñoz
- One Health Centre of Excellence, Gainesville, Florida, United States of America; Department of Environmental and Global Health, College of Public Health and Health Professionals, Gainesville, Florida, United States of America
- Piera Ceschi
- Department of Environmental and Global Health, College of Public Health and Health Professionals, Gainesville, Florida, United States of America
- Arvo Viltrop
- Estonian University of Life Sciences, Tartu, Estonia
- Ilaria Capua
- Department of Environmental and Global Health, College of Public Health and Health Professionals, Gainesville, Florida, United States of America
|
47
|
Tewari S, Toledo Margalef P, Kareem A, Abdul-Hussein A, White M, Wazana A, Davidge ST, Delrieux C, Connor KL. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. J Pers Med 2021; 11:jpm11111064. [PMID: 34834416 PMCID: PMC8621659 DOI: 10.3390/jpm11111064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/20/2021] [Revised: 10/01/2021] [Accepted: 10/18/2021] [Indexed: 01/03/2023] Open
Abstract
The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, which were used to develop a revised model that finally captured 2098 articles that largely fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis on this labelling yielded mostly fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later life human health.
Affiliation(s)
- Shrankhala Tewari
- Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Pablo Toledo Margalef
- CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina; (P.T.M.); (C.D.)
- Ayesha Kareem
- Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Ayah Abdul-Hussein
- Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Marina White
- Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
- Ashley Wazana
- Department of Psychiatry, McGill University, Montreal, QC H3A 0G4, Canada
- Sandra T. Davidge
- Women and Children’s Health Research Institute, University of Alberta, Edmonton, AB T6G 1C9, Canada
- Claudio Delrieux
- CONICET, National Science and Technology Council of Argentina, Buenos Aires C1425FQD, Argentina; (P.T.M.); (C.D.)
- DIEC—Electric and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca B8000, Argentina
- Kristin L. Connor
- Health Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada; (S.T.); (A.K.); (A.A.-H.); (M.W.)
|
48
|
Topic Modeling for Amharic User Generated Texts. INFORMATION 2021. [DOI: 10.3390/info12100401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Topic Modeling is a statistical process that derives the latent themes from extensive collections of text. Three approaches to topic modeling exist, namely, unsupervised, semi-supervised and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features and a combination of these two feature sets, with four supervised machine learning tools, that is, Support Vector Machine (SVM), Naive Bayesian (NB), Logistic Regression (LR), and Neural Nets (NN). We evaluate our approach using an Amharic corpus of 14,751 documents of ten topic categories. Both qualitative and quantitative analysis of results show that our proposed supervised topic detection performs best with SVM, reaching an accuracy of 88% using state-of-the-art TF-IDF word features with the application of the Synthetic Minority Over-sampling Technique (SMOTE) and with no stemming operation. The results show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
|
49
|
Yao Z, Yang J, Liu J, Keith M, Guan C. Comparing tweet sentiments in megacities using machine learning techniques: In the midst of COVID-19. CITIES (LONDON, ENGLAND) 2021; 116:103273. [PMID: 36540864 PMCID: PMC9756302 DOI: 10.1016/j.cities.2021.103273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/30/2020] [Revised: 03/08/2021] [Accepted: 05/20/2021] [Indexed: 05/07/2023]
Abstract
COVID-19 was declared a pandemic by the World Health Organization on March 11, 2020. Not only has COVID-19 struck the economy and public health, but it has also deeply influenced people's feelings. Twitter, as an active social media platform, is a great database where we can investigate people's sentiments during this pandemic. By conducting sentiment analysis on Tweets using advanced machine learning techniques, this study aims to investigate how public sentiments respond to the pandemic from March 2 to May 21, 2020 in New York City, Los Angeles, London, and another six global mega-cities. Results showed that across cities, negative and positive Tweet sentiment clustered around mid-March and early May, respectively. Furthermore, positive sentiments of Tweets from New York City and London were positively correlated with stricter quarantine measures, although this correlation was not significant in Los Angeles. Meanwhile, Tweet sentiments of all three cities did not exhibit a strong correlation with new cases and hospitalization. Last but not least, we provide a qualitative analysis of the reasons behind differences in correlations shown above, along with a discussion of the polarizing effect of public policies on Tweet sentiments. Thus, the results of this study imply that Tweet sentiment is more sensitive to quarantine orders than reported statistics of COVID-19, especially in populous megacities where public transportation is heavily relied upon, which calls for prompt and effective quarantine measures during contagious disease outbreaks.
Affiliation(s)
- Zhirui Yao
  - Arts and Science, New York University Shanghai, China
  - Key Laboratory of National Forestry and Grassland Administration on Ecological Landscaping of Challenging Urban Sites, Shanghai Engineering Research Center of Landscaping on Challenging Urban Sites, Shanghai Academy of Landscape Architecture Science and Planning, China
- Junyan Yang
  - Department of Urban Planning, Southeast University, China
- Jialin Liu
  - Key Laboratory of National Forestry and Grassland Administration on Ecological Landscaping of Challenging Urban Sites, Shanghai Engineering Research Center of Landscaping on Challenging Urban Sites, Shanghai Academy of Landscape Architecture Science and Planning, China
- Michael Keith
  - PEAK Urban Programme, University of Oxford, United Kingdom
- ChengHe Guan
  - Arts and Science, New York University Shanghai, China
  - Shanghai Key Laboratory of Urban Renewal and Spatial Optimization Technology, Tongji University, China
|
50
|
Jamil R, Ashraf I, Rustam F, Saad E, Mehmood A, Choi GS. Detecting sarcasm in multi-domain datasets using convolutional neural networks and long short term memory network model. PeerJ Comput Sci 2021; 7:e645. [PMID: 34541306 PMCID: PMC8409330 DOI: 10.7717/peerj-cs.645] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Accepted: 06/25/2021] [Indexed: 06/13/2023]
Abstract
Sarcasm emerges as a common phenomenon across social networking sites because people express their negative thoughts, hatred and opinions using positive vocabulary which makes it a challenging task to detect sarcasm. Although various studies have investigated the sarcasm detection on baseline datasets, this work is the first to detect sarcasm from a multi-domain dataset that is constructed by combining Twitter and News Headlines datasets. This study proposes a hybrid approach where the convolutional neural networks (CNN) are used for feature extraction while the long short-term memory (LSTM) is trained and tested on those features. For performance analysis, several machine learning algorithms such as random forest, support vector classifier, extra tree classifier and decision tree are used. The performance of both the proposed model and machine learning algorithms is analyzed using the term frequency-inverse document frequency, bag of words approach, and global vectors for word representations. Experimental results indicate that the proposed model surpasses the performance of the traditional machine learning algorithms with an accuracy of 91.60%. Several state-of-the-art approaches for sarcasm detection are compared with the proposed model and results suggest that the proposed model outperforms these approaches concerning the precision, recall and F1 scores. The proposed model is accurate, robust, and performs sarcasm detection on a multi-domain dataset.
Affiliation(s)
- Ramish Jamil
  - Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Imran Ashraf
  - Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
- Furqan Rustam
  - Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Eysha Saad
  - Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Arif Mehmood
  - The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Gyu Sang Choi
  - Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
|