1
Le-Khac UN, Bolton M, Boxall NJ, Wallace SMN, George Y. Living review framework for better policy design and management of hazardous waste in Australia. Sci Total Environ 2024; 924:171556. [PMID: 38458450 DOI: 10.1016/j.scitotenv.2024.171556] [Received: 11/17/2023] [Revised: 02/25/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
The significant increase in hazardous waste generation in Australia has prompted discussion about incorporating artificial intelligence into the hazardous waste management system. Recent studies have explored potential applications of artificial intelligence in various waste management processes. However, no study has examined the use of text mining in the hazardous waste management sector for the purpose of informing policymakers. This study developed a living review framework that applied supervised text classification and text mining techniques to extract knowledge from domain literature published between 2022 and 2023. The framework employed statistical classification models trained iteratively, and the best model, XGBoost, achieved an F1 score of 0.87. Using a small set of 126 manually labelled global articles, XGBoost automatically predicted the labels of 678 Australian articles with high confidence. Keyword extraction and unsupervised topic modelling with Latent Dirichlet Allocation (LDA) were then performed. Results indicated two main research themes in the Australian literature: (1) the key waste streams and (2) the resource recovery and recycling of waste. This framework could benefit policymakers, researchers, and hazardous waste management organisations by serving as a real-time guide to the current key waste streams and research themes in the literature, allowing robust knowledge to be applied to waste management and highlighting where research gaps remain.
Affiliation(s)
- Uyen N Le-Khac
  - Data Science and AI Department, Faculty of Information Technology, Monash University, Australia
- Mitzi Bolton
  - Monash Sustainable Development Institute, Monash University, Australia
- Naomi J Boxall
  - Environment, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
- Stephanie M N Wallace
  - Centre for Anthropogenic Pollution Impact and Management (CAPIM), School of BioSciences, University of Melbourne, Australia
- Yasmeen George
  - Data Science and AI Department, Faculty of Information Technology, Monash University, Australia
2
Novoa J, Fernandez-Dumont A, Mills ENC, Moreno FJ, Pazos F. Advancing the allergenicity assessment of new proteins using a text mining resource. Food Chem Toxicol 2024; 187:114638. [PMID: 38582341 DOI: 10.1016/j.fct.2024.114638] [Received: 02/07/2024] [Revised: 03/11/2024] [Accepted: 03/31/2024] [Indexed: 04/08/2024]
Abstract
With society increasingly demanding alternative protein food sources, new strategies for evaluating protein safety issues, such as allergenic potential, are needed. Large-scale and systemic studies on allergenic proteins are hindered by the limited and non-harmonized clinical information available for these substances in dedicated databases. A key piece of missing information is the symptomatology of the allergens, especially expressed in terms of standard vocabularies, which would allow connection with other biomedical resources to carry out different studies related to human health. In this work, we have generated the first resource with a comprehensive annotation of allergens' symptomatology, using a text-mining approach that extracts significant co-mentions between these entities from the scientific literature (PubMed, ∼36 million abstracts). The method identifies statistically significant co-mentions between the textual descriptions of the two types of entities in the literature as an indication of a relationship. A total of 1,180 clinical signs extracted from the Human Phenotype Ontology and the Medical Subject Heading terms of PubMed, together with other allergen-specific symptoms, were linked to 1,036 unique allergens annotated in two main allergen-related public databases via 14,009 relationships. This novel resource, publicly available through an interactive web interface, could serve as a starting point for future manually curated compilation of allergen symptomatology.
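The abstract does not say which statistical test is used to call a co-mention significant. As a rough sketch only, one common choice is a one-sided hypergeometric tail: given how many abstracts mention the allergen and how many mention the symptom, how likely is an overlap at least as large as the one observed? The function name, arguments, and counts below are illustrative assumptions, not the authors' method.

```python
from math import comb


def comention_p_value(n_total, n_a, n_b, n_both):
    """One-sided hypergeometric tail P(overlap >= n_both).

    n_total: abstracts in the corpus
    n_a:     abstracts mentioning entity A (e.g. an allergen)
    n_b:     abstracts mentioning entity B (e.g. a symptom)
    n_both:  abstracts mentioning both
    Exact integer arithmetic; suitable for small illustrative counts only.
    """
    upper = min(n_a, n_b)
    # Count the ways to draw n_b abstracts with at least n_both of them from the A set.
    tail = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical counts: 40 and 50 mentions in 1,000 abstracts, 10 shared.
p = comention_p_value(1000, 40, 50, 10)
print(f"p = {p:.2e}")
```

With these made-up counts the expected overlap under independence is 2, so an observed overlap of 10 gives a very small tail probability. At PubMed scale a real pipeline would need an approximation and a multiple-testing correction, neither of which is shown here.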
Affiliation(s)
- Jorge Novoa
  - Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), 28049, Madrid, Spain
- E N Clare Mills
  - School of Biosciences and Medicine, The University of Surrey, Guildford, GU2 7XH, UK
- F Javier Moreno
  - Instituto de Investigación en Ciencias de La Alimentación (CIAL), CSIC-UAM, CEI (UAM+CSIC), 28049, Madrid, Spain
- Florencio Pazos
  - Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), 28049, Madrid, Spain
3
Rabin AS, Weinstein JB, Seelye SM, Whittington TN, Hogan CK, Prescott HC. Development and validation of a pulmonary function test data extraction tool for the US department of veterans affairs electronic health record. BMC Res Notes 2024; 17:115. [PMID: 38654333 DOI: 10.1186/s13104-024-06770-3] [Received: 03/07/2024] [Accepted: 04/10/2024] [Indexed: 04/25/2024] Open
Abstract
OBJECTIVE: Pulmonary function test (PFT) results are recorded variably across hospitals in the Department of Veterans Affairs (VA) electronic health record (EHR), using both unstructured and semi-structured notes. We developed and validated a hospital-specific code to extract pre-bronchodilator measures of obstruction (ratio of forced expiratory volume in one second [FEV1] to forced vital capacity [FVC]) and severity of obstruction (percent predicted of FEV1).
RESULTS: Among 36 VA facilities with the most PFTs completed between 2018 and 2022 from a parent cohort of veterans receiving long-acting controller inhalers, 12 had a consistent syntactical convention or template for reporting PFT data in the EHR. Of the 42,718 PFTs identified from these 12 facilities, the hospital-specific text processing pipeline yielded 24,860 values for the FEV1:FVC ratio and 23,729 values for FEV1. A ratio of FEV1:FVC less than 0.7 was identified in 17,615 of 24,922 studies (70.7%); 8864 of 24,922 (35.6%) had a severe or very severe reduction in FEV1 (< 50% of the predicted value). Among 100 randomly selected PFT reports reviewed by two pulmonary physicians, the coding solution correctly identified the presence of obstruction in 99 out of 100 studies and the degree of obstruction in 96 out of 100 studies.
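The abstract does not describe the hospital-specific templates or the code itself. The sketch below only illustrates the general idea under an assumed, hypothetical note layout (`FEV1/FVC: 0.62`, `FEV1 % predicted: 45`): pull the two numbers out with regular expressions, then apply the two thresholds the abstract states (ratio below 0.7 for obstruction, FEV1 below 50% of predicted for severe or very severe reduction). The pattern names and the `classify_pft` helper are invented for illustration.

```python
import re

# Hypothetical template patterns; real VA note formats vary by facility.
RATIO_RE = re.compile(r"FEV1/FVC\s*[:=]\s*(\d+(?:\.\d+)?)")
PCT_RE = re.compile(r"FEV1\s*%\s*pred(?:icted)?\s*[:=]\s*(\d+(?:\.\d+)?)")


def classify_pft(note):
    """Extract FEV1/FVC and FEV1 % predicted from one note and apply thresholds."""
    ratio_match = RATIO_RE.search(note)
    pct_match = PCT_RE.search(note)
    ratio = float(ratio_match.group(1)) if ratio_match else None
    pct = float(pct_match.group(1)) if pct_match else None
    # Assumption: a ratio above 1 was written as a percentage (e.g. 62 for 0.62).
    if ratio is not None and ratio > 1:
        ratio /= 100
    return {
        "fev1_fvc": ratio,
        "fev1_pct_predicted": pct,
        # None means the value could not be extracted from this note.
        "obstruction": ratio < 0.7 if ratio is not None else None,
        "severe_reduction": pct < 50 if pct is not None else None,
    }


example_note = "Pre-BD FEV1/FVC: 0.62\nFEV1 % predicted: 45"
print(classify_pft(example_note))
```

Returning `None` for values that fail to match, instead of guessing, mirrors the abstract's observation that only some reports yield extractable values.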
Affiliation(s)
- Alexander S Rabin
  - Pulmonary Section, Veterans Affairs Ann Arbor Healthcare System, 2215 Fuller Road, 48105, Ann Arbor, MI, USA
  - Division of Pulmonary and Critical Care Medicine, University of Michigan, Ann Arbor, MI, USA
- Julien B Weinstein
  - Division of Pulmonary and Critical Care Medicine, University of Michigan, Ann Arbor, MI, USA
  - Veterans Affairs Center for Clinical Management Research, Ann Arbor, MI, USA
- Sarah M Seelye
  - Veterans Affairs Center for Clinical Management Research, Ann Arbor, MI, USA
- Cainnear K Hogan
  - Veterans Affairs Center for Clinical Management Research, Ann Arbor, MI, USA
- Hallie C Prescott
  - Pulmonary Section, Veterans Affairs Ann Arbor Healthcare System, 2215 Fuller Road, 48105, Ann Arbor, MI, USA
  - Division of Pulmonary and Critical Care Medicine, University of Michigan, Ann Arbor, MI, USA
  - Veterans Affairs Center for Clinical Management Research, Ann Arbor, MI, USA
4
Liu J, Wu H, Robertson DH, Zhang J. Text mining and portal development for gene-specific publications on Alzheimer's disease and other neurodegenerative diseases. BMC Med Inform Decis Mak 2024; 24:98. [PMID: 38632621 PMCID: PMC11025191 DOI: 10.1186/s12911-024-02501-7] [Received: 02/02/2022] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: Tremendous research efforts have been made in the Alzheimer's disease (AD) field to understand the disease etiology and progression and to discover treatments for AD. Many mechanistic hypotheses, therapeutic targets and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current on this ever-growing body of AD publications is an essential yet difficult task for AD researchers.
METHODS: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific, neurodegenerative disease (ND)-focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related gene-specific publication profiles. Six categories of AD-related information were extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keyword occurrence, and co-occurring genes. A user-friendly web portal was then developed using the Django framework to provide gene query functions and data visualizations for the generalized and summarized publication information.
RESULTS: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of the publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display the ND publication trends, mouse models used, dementia types, involved brain regions, keywords related to major AD-related biological processes, and co-occurring genes. Direct links to PubMed sites are provided for all recorded publications on the query result page of the web portal.
CONCLUSION: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users' research areas and gene targets of interest, which is especially convenient for users without informatics or text mining skills. Our study will not only keep AD researchers updated on the progress of AD research and assist them in conducting preliminary examinations efficiently, but also offer additional support for hypothesis generation and validation, which will contribute significantly to the communication, dissemination, and progress of AD research.
Affiliation(s)
- Jiannan Liu
  - Department of BioHealth Informatics, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
- Huanmei Wu
  - Department of BioHealth Informatics, Indiana University School of Informatics & Computing, Indianapolis, IN, 46202, USA
  - Health Services Administration & Policy, Temple University College of Public Health, Philadelphia, PA, 19122, USA
- Daniel H Robertson
  - Integrated Data Sciences, Indiana Biosciences Research Institute, Indianapolis, IN, 46202, USA
- Jie Zhang
  - Dept of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
5
Macanovic A, Przepiorka W. A systematic evaluation of text mining methods for short texts: Mapping individuals' internal states from online posts. Behav Res Methods 2024:10.3758/s13428-024-02381-9. [PMID: 38575776 DOI: 10.3758/s13428-024-02381-9] [Accepted: 02/20/2024] [Indexed: 04/06/2024]
Abstract
Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals' internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders' evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
Affiliation(s)
- Ana Macanovic
  - Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands
- Wojtek Przepiorka
  - Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands
6
Zhang Y, Peng J, Cheng B, Liu Y, Jiang C. MMR: A Multi-view Merge Representation model for Chemical-Disease relation extraction. Comput Biol Chem 2024; 110:108063. [PMID: 38613989 DOI: 10.1016/j.compbiolchem.2024.108063] [Received: 08/09/2023] [Revised: 03/13/2024] [Accepted: 03/25/2024] [Indexed: 04/15/2024]
Abstract
Chemical-Disease relation (CDR) extraction aims to identify the semantic relations between chemical and disease entities in the unstructured biomedical document, which provides a basis for downstream tasks such as clinical medical diagnosis and drug discovery. Compared with general domain relation extraction, it needs a more effective representation of the whole document due to the specialized nature of texts in the biomedical domain, including the biomedical entity and entity-pair representation. In this paper, we propose a novel Multi-view Merge Representation (MMR) model to thoroughly capture entity and entity-pair representation of the document. First, we utilize prior knowledge and a pre-trained transformer encoder to capture entity semantic representation. Then we employ the U-Net layer and Graph Convolution Network layer to capture global entity-pair representation. Finally, we get a specific merged representation for each entity pair to be classified. We evaluate our model on the CDR dataset published by the BioCreative-V community and achieve a state-of-the-art result.
Affiliation(s)
- Yi Zhang
  - Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Jing Peng
  - Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Baitai Cheng
  - Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Yang Liu
  - Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Chi Jiang
  - Intelligent Bioinformatics Laboratory, School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
7
Das Baksi K, Pokhrel V, Pudavar AE, Mande SS, Kuntal BK. BactInt: A domain driven transfer learning approach for extracting inter-bacterial associations from biomedical text. Comput Biol Chem 2024; 109:108012. [PMID: 38198963 DOI: 10.1016/j.compbiolchem.2023.108012] [Received: 10/04/2023] [Revised: 12/15/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND: The healthy as well as the dysbiotic state of an ecosystem like the human body is known to be influenced not only by the presence of bacterial groups in it, but also by the associations among them. Evidence reported in biomedical text serves as a reliable source for identifying and ascertaining such inter-bacterial associations. However, the complexity of the reported text as well as the ever-increasing volume of information necessitates development of methods for automated and accurate extraction of such knowledge.
METHODS: A BioBERT (biomedical domain specific language model) based information extraction model for bacterial associations is presented that utilizes learning patterns from other publicly available datasets. Additionally, a specialized sentence corpus has been developed to significantly improve the prediction accuracy of the 'transfer learned' model using a fine-tuning approach.
RESULTS: The final model was seen to outperform all other variations (non-transfer learned and non-fine-tuned models) as well as models trained on BioGPT (a domain trained Generative Pre-trained Transformer). To further demonstrate its utility, a case study was performed using bacterial association network data obtained from experimental studies.
CONCLUSION: This study attempts to demonstrate the applicability of transfer learning in a niche field of life sciences where understanding of inter-bacterial relationships is crucial to obtain meaningful insights in comprehending microbial community structures across different ecosystems. The study further discusses how such a model can be improved by fine-tuning using limited training data. The results presented and the datasets made available are expected to be a valuable addition in the field of medical informatics and bioinformatics.
Affiliation(s)
- Vatsala Pokhrel
  - TCS Research, Tata Consultancy Services Ltd, Pune 411057, India
- Bhusan K Kuntal
  - TCS Research, Tata Consultancy Services Ltd, Pune 411057, India
8
Wei Z, Zhang S. A structured sentiment analysis dataset based on public comments from various domains. Data Brief 2024; 53:110232. [PMID: 38439992 PMCID: PMC10910210 DOI: 10.1016/j.dib.2024.110232] [Received: 12/11/2023] [Revised: 02/12/2024] [Accepted: 02/16/2024] [Indexed: 03/06/2024] Open
Abstract
A structured sentiment analysis dataset, derived from social media comments, is introduced in this paper. The dataset spans 22 diverse domains and comprises over 200,000 reviews, providing a rich resource for sentiment analysis tasks in the Chinese language context. Each comment within the dataset has been manually annotated with a sentiment label, either positive, negative, or neutral, and grouped by topic. This meticulous annotation process ensures the dataset's reliability for training, validating, and testing sentiment analysis models. The construction of the dataset involved a three-step process. Initially, data was collected from the topics that garnered high attention and discussion rates, thereby reflecting the authentic opinions of users. Following data collection, preprocessing was undertaken to remove extraneous elements, while preserving emoticons that are crucial for sentiment analysis. The final step involved manual annotation by researchers, who assigned sentiment labels to each comment based on various factors. The dataset stands as a valuable contribution to the field of natural language processing, particularly for sentiment analysis tasks in the Chinese language context.
Affiliation(s)
- Zhongliang Wei
  - School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan, China
- Shunxiang Zhang
  - School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan, China
9
Paul J, Jacob J, Mahmud M, Vaka M, Krishnan SG, Arifutzzaman A, Thesiya D, Xiong T, Kadirgama K, Selvaraj J. A data mining approach to analyze the role of biomacromolecules-based nanocomposites in sustainable packaging. Int J Biol Macromol 2024; 265:130850. [PMID: 38492706 DOI: 10.1016/j.ijbiomac.2024.130850] [Received: 11/13/2023] [Revised: 03/09/2024] [Accepted: 03/11/2024] [Indexed: 03/18/2024]
Abstract
Recent decades have witnessed a surge in research interest in bio-nanocomposite-based packaging materials, but still, a lack of systematic analysis exists in this domain. Bio-based packaging materials pose a sustainable alternative to petroleum-based packaging materials. The current work employs bibliometric analysis to deliver a comprehensive outline on the role of bio nanocomposites in packaging. India, Iran, and China were revealed to be the top three nations actively engaged in this domain in total publications. Islamic Azad University in Iran and Universiti Putra Malaysia in Malaysia are among the world's best institutions in active research and publications in this field. The extensive collaboration between nations and institutions highlights the significance of a holistic approach towards bio-nanocomposite. The National Natural Science Foundation of China is the leading funding body in this field of research. Among authors, Jong whan Rhim secured the topmost citations (2234) in this domain (13 publications). Among journals, Carbohydrate Polymers secured the maximum citation count (4629) from 36 articles; the initial one was published in 2011. Bio nanocomposite is the most frequently used keyword. Researchers and policymakers focussing on sustainable packaging solutions will gain crucial insights on the current research status on packaging solutions using bio-nanocomposites from the conclusions.
Affiliation(s)
- John Paul
  - Faculty of Mechanical & Automotive Engineering Technology, University Malaysia Pahang Al-Sultan Abdullah, Malaysia
- Jeeja Jacob
  - Higher Institution Centre of Excellence, UM Power Energy Dedicated Advanced Centre (UMPEDAC), University of Malaya, Kuala Lumpur, Malaysia
- Md Mahmud
  - Phillip M. Drayer Department of Electrical and Computer Engineering, College of Engineering, Lamar University, Beaumont, TX 77710, USA
- Mahesh Vaka
  - Thermal Energy Storage department, Iberian Energy Storage Research Center (CIIAE), 10003 Caceres, Spain
- Syam G Krishnan
  - Department of Chemical Engineering, Faculty of Engineering and Information Technology, The University of Melbourne, Victoria 3010, Australia
- A Arifutzzaman
  - Tyndall National Institute, University College Cork, Lee Maltings, Cork T12 R5CP, Ireland
- Teng Xiong
  - Department of the Built Environment, College of Design and Engineering, National University of Singapore, Singapore 117566, Singapore
- K Kadirgama
  - Faculty of Mechanical & Automotive Engineering Technology, University Malaysia Pahang Al-Sultan Abdullah, Malaysia; Department of Civil Engineering, College of Engineering, Almaaqal University, Iraq
- Jeyraj Selvaraj
  - Higher Institution Centre of Excellence, UM Power Energy Dedicated Advanced Centre (UMPEDAC), University of Malaya, Kuala Lumpur, Malaysia
10
Christensen RVK, Bentsen NS. Discourse developments within the public agenda on Danish nature management 2016-2021: Animal welfare ethics as a barrier to rewilding projects. Ambio 2024; 53:637-652. [PMID: 38070061 PMCID: PMC10920536 DOI: 10.1007/s13280-023-01964-8] [Received: 04/11/2023] [Revised: 09/21/2023] [Accepted: 11/10/2023] [Indexed: 03/09/2024]
Abstract
Prompted by the increasing public focus on environmental policy and the continuous inability of States to reach environmental targets agreed upon in the context of the United Nations and the European Union, we explore the development of discourses within the Danish public agenda regarding nature management 2016-2021. This is done through a mixed-methods framework of discourse analysis and structural topic modeling based on documents from the Danish Parliament's Environmental committee 2016-2021, estimating topic prevalence, and analyzing the discourses within each topic, resulting in a qualitative overview of 21 identified topics and their associated discourses and an overview of how the different topic proportions changed over time. A shift in the public agenda was found: a change from discussions about untouched forest focused on trade-offs between timber extraction and biodiversity, to a discussion about different understandings of animal welfare in the context of large grazers in nature national parks in Denmark.
Affiliation(s)
- Niclas Scott Bentsen
  - Department of Geosciences and Natural Resource Management, University of Copenhagen, Rolighedsvej 23, 1958, Frederiksberg C, Denmark
11
Martin VP, Gauld C, Taillard J, Peter-Derex L, Lopez R, Micoulaud-Franchi JA. Sleepiness should be reinvestigated through the lens of clinical neurophysiology: A mixed expertal and big-data Natural Language Processing approach. Neurophysiol Clin 2024; 54:102937. [PMID: 38401240 DOI: 10.1016/j.neucli.2023.102937] [Received: 10/21/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 02/26/2024] Open
Abstract
Historically, the field of sleep medicine has revolved around electrophysiological tools. However, the use of these tools as a neurophysiological method of investigation seems to be underrepresented today, from both international recommendations and sleep centers, in contrast to behavioral and psychometric tools. The aim of this article is to combine a data-driven approach and neurophysiological and sleep medicine expertise to confirm or refute the hypothesis that neurophysiology has declined in favor of behavioral or self-reported dimensions in sleep medicine for the investigation of sleepiness, despite the use of electrophysiological tools. Using Natural Language Processing methods, we analyzed the abstracts of the 18,370 articles indexed by PubMed containing the terms 'sleepiness' or 'sleepy' in the title, abstract, or keywords. For this purpose, we examined these abstracts using two methods: a lexical network, enabling the identification of concepts (neurophysiological or clinical) related to sleepiness in these articles and their interconnections; furthermore, we analyzed the temporal evolution of these concepts to extract historical trends. These results confirm the hypothesis that neurophysiology has declined in favor of behavioral or self-reported dimensions in sleep medicine for the investigation of sleepiness. In order to bring sleepiness measurements closer to brain functioning and to reintroduce neurophysiology into sleep medicine, we discuss two strategies: the first is reanalyzing electrophysiological signals collected during the standard sleep electrophysiological test; the second takes advantage of the current trend towards dimensional models of sleepiness to situate clinical neurophysiology at the heart of the redefinition of sleepiness.
Affiliation(s)
- Vincent P Martin
  - Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg; Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400 Talence, France; Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France
- Christophe Gauld
  - Service Psychopathologie du Développement de l'Enfant et de l'Adolescent, Hospices Civils de Lyon & Université de Lyon 1, France; Institut des Sciences Cognitives Marc Jeannerod, UMR 5229 CNRS & Université Claude Bernard Lyon 1, France
- Jacques Taillard
  - Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France
- Laure Peter-Derex
  - Lyon Neuroscience Research Centre, INSERM U1028, CNRS UMR 5292, Lyon, France; Centre for Sleep Medicine and Respiratory Diseases, Croix-Rousse Hospital, Hospices Civils de Lyon, Lyon 1 University, Lyon, France
- Régis Lopez
  - National Reference Centre for Orphan Diseases, Narcolepsy-Rare hypersomnias, Sleep Unit, Department of Neurology, CHU de Montpellier, University of Montpellier, Montpellier, France; Institute for Neurosciences of Montpellier (INM), University of Montpellier, Inserm, Montpellier, France
- Jean-Arthur Micoulaud-Franchi
  - Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France; University Sleep Clinic, University Hospital of Bordeaux, Place Amélie Raba-Leon, 33 076 Bordeaux, France
12
Tentua MN, Suprapto, Afiahayati. NERSkill.Id: Annotated dataset of Indonesian's skill entity recognition. Data Brief 2024; 53:110192. [PMID: 38406245 PMCID: PMC10884741 DOI: 10.1016/j.dib.2024.110192] [Received: 12/24/2023] [Revised: 02/07/2024] [Accepted: 02/07/2024] [Indexed: 02/27/2024] Open
Abstract
NERSkill.Id is a manually annotated named entity recognition (NER) dataset focused on skill entities in the Indonesian language. The dataset comprises 418,868 tokens, each accompanied by corresponding tags following the BIO scheme. Notably, 15.51% of these tokens represent named entities, falling into three distinct categories: hard skill, soft skill, and technology. To construct this dataset, data were gathered from a job portal and subsequently processed using open-source libraries. Given the scarcity of annotated corpora for Indonesian, NERSkill.Id fills a significant void and offers immense value to multiple stakeholders. NLP researchers can harness the dataset's richness to advance skill entity recognition technology in the Indonesian language. Companies and recruiters can benefit by employing NERSkill.Id to enhance talent acquisition and job matching processes through accurate skill identification. Furthermore, educational institutions can leverage the dataset to adapt their courses and training programs to meet the evolving needs of the job market. This dataset can be effectively utilized for training and evaluating named entity recognition systems, empowering advancements in skill entity recognition for the Indonesian language.
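For readers unfamiliar with the BIO scheme mentioned above, the following minimal sketch shows how per-token tags are usually decoded into entity spans: `B-X` opens an entity of type `X`, `I-X` continues it, and `O` marks a token outside any entity. The tag names `TECH` and `SOFT` and the example sentence are illustrative assumptions; the dataset's actual label strings are not given in the abstract.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that was still open.
            if words:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open entity.
            words.append(token)
        else:
            # "O", or an I- tag with no matching open entity: close and reset.
            if words:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if words:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical example sentence and tags.
tokens = ["Menguasai", "Python", "dan", "komunikasi", "tim"]
tags = ["O", "B-TECH", "O", "B-SOFT", "I-SOFT"]
print(bio_to_spans(tokens, tags))
```

Dropping an `I-` tag that has no matching `B-` is one of several possible conventions; some evaluation tools instead treat it as the start of a new entity.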
Affiliation(s)
- Meilany Nonsi Tentua
  - Informatic, Sains and Technology Faculty, Universitas PGRI Yogyakarta, Indonesia
- Suprapto
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Afiahayati
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
13
Jiang L, Lan M, Menke JD, Vorland CJ, Kilicoglu H. CONSORT-TM: Text classification models for assessing the completeness of randomized controlled trial publications. medRxiv 2024:2024.03.31.24305138. [PMID: 38633775 PMCID: PMC11023672 DOI: 10.1101/2024.03.31.24305138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Objective: To develop text classification models for determining whether the checklist items in the CONSORT reporting guidelines are reported in randomized controlled trial publications.
Materials and Methods: Using a corpus annotated at the sentence level with 37 fine-grained CONSORT items, we trained several sentence classification models (PubMedBERT fine-tuning, BioGPT fine-tuning, and in-context learning with GPT-4) and compared their performance. To address the problem of a small training dataset, we used several data augmentation methods (EDA, UMLS-EDA, text generation and rephrasing with GPT-4) and assessed their impact on the fine-tuned PubMedBERT model. We also fine-tuned PubMedBERT models limited to checklist items associated with specific sections (e.g., Methods) to evaluate whether such models could improve performance compared to the single full model. We performed 5-fold cross-validation and report precision, recall, F1 score, and area under curve (AUC).
Results: The fine-tuned PubMedBERT model that takes as input the sentence and the surrounding sentence representations and uses section headers yielded the best overall performance (0.71 micro-F1, 0.64 macro-F1). Data augmentation had a limited positive effect, with UMLS-EDA yielding slightly better results than data augmentation using GPT-4. BioGPT fine-tuning and GPT-4 in-context learning exhibited suboptimal results. The Methods-specific model yielded higher performance for methodology items; other section-specific models did not have a significant impact.
Conclusion: Most CONSORT checklist items can be recognized reasonably well with the fine-tuned PubMedBERT model, but there is room for improvement. Improved models can underpin journal editorial workflows and CONSORT adherence checks and can help authors improve the reporting quality and completeness of their manuscripts.
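The gap between the micro-F1 and macro-F1 figures reported above is typical when some checklist items are rare. As a rough illustration, the sketch below computes both averages from per-class true-positive, false-positive, and false-negative counts. The class names and counts are invented for the example and do not come from the study.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) over all classes."""
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    # Micro-F1 pools the counts, so frequent classes dominate.
    micro = f1(total_tp, total_fp, total_fn)
    # Macro-F1 averages per-class scores, so every class weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts: one frequent, well-recognized item and one rare item.
example_counts = {
    "frequent_item": (90, 10, 10),
    "rare_item": (1, 3, 3),
}
micro, macro = micro_macro_f1(example_counts)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

With these toy counts the rare class pulls the macro average well below the micro average, which is the same direction as the difference reported in the abstract.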
Affiliation(s)
- Lan Jiang
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Mengfei Lan
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Joe D. Menke
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Colby J Vorland
  - Indiana University, School of Public Health, Bloomington, IN, USA
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
14
Xie X, Li D, Pei Y, Zhu W, Du X, Jiang X, Zhang L, Wang HQ. Personalized anti-tumor drug efficacy prediction based on clinical data. Heliyon 2024; 10:e27300. [PMID: 38500995 PMCID: PMC10945121 DOI: 10.1016/j.heliyon.2024.e27300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 02/27/2024] [Accepted: 02/27/2024] [Indexed: 03/20/2024] Open
Abstract
Anti-tumor drug efficacy prediction poses an unprecedented challenge to realizing personalized medicine. This paper proposes to predict personalized anti-tumor drug efficacy based on clinical data. Specifically, we encode each patient's clinical text as a numeric vector of hidden topics using a Latent Dirichlet Allocation model. Then, to classify patients into two classes, responsive or non-responsive to a drug, drug efficacy predictors are established by machine learning based on the Latent Dirichlet Allocation topic representation. To evaluate the proposed method, we collected and collated clinical records of lung and bowel cancer patients treated with platinum. Experimental results on the data sets show the effectiveness of the proposed method, suggesting the potential value of clinical data in cancer precision medicine. We hope that it will promote research on drug efficacy prediction based on clinical data.
Affiliation(s)
- Xinping Xie
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Dandan Li
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Yangyang Pei
  - School of Mathematics and Physics, Anhui Jianzhu University, Hefei, China
- Weiwei Zhu
  - Institute of Intelligent Machines/Zhongqi AI Lab., Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Xiaodong Du
  - Experimental Teaching Center, Hefei University, Hefei, China
- Xiaodong Jiang
  - Medical Oncology Department, The First Affiliated Hospital of University of Science and Technology of China, Hefei, Anhui, 230001, China
- Lei Zhang
  - Pharmacy Department, The First Affiliated Hospital of University of Science and Technology of China, Hefei, Anhui, 230001, China
- Hong-Qiang Wang
  - Institute of Intelligent Machines/Zhongqi AI Lab., Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
15
Xu G, Li X, Liu X, Han J, Shao K, Yang H, Fan F, Zhang X, Dou J. Bibliometric insights into the evolution of uranium contamination reduction research topics: Focus on microbial reduction of uranium. Sci Total Environ 2024; 917:170397. [PMID: 38307284 DOI: 10.1016/j.scitotenv.2024.170397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 01/09/2024] [Accepted: 01/21/2024] [Indexed: 02/04/2024]
Abstract
Confronting the threat of environment uranium pollution, decades of research have yielded advanced and significant findings in uranium bioremediation, resulting in the accumulation of tremendous amount of high-quality literature. In this study, we analyzed over 10,000 uranium reduction-related papers published from 1990 to the present in the Web of Science based on bibliometrics, and revealed some critical information on knowledge structure, thematic evolution and additional attention. Methods including contribution comparison, co-occurrence and temporal evolution analysis are applied. The results of the distribution and impact analysis of authors, sources, and journals indicated that the United States is a leader in this field of research and China is on the rise. The top keywords remained stable, primarily focused on chemicals (uranium, iron, plutonium, nitrat, carbon), characters (divers, surfac, speciat), and microbiology (microbial commun, cytochrome, extracellular polymeric subst). Keywords related to new strains, reduction mechanisms and product characteristics demonstrated the strongest uptrend, while some keywords related to mechanism and performance were clearly emerging in the past 5 years. Furthermore, the evolution of the thematic progression can be categorized into three stages, commencing with the discovery of the enzymatic reduction of hexavalent uranium to tetravalent uranium, developing in the groundwater remediation process at uranium-contaminated sites, and delving into the research on microbial reduction mechanisms of uranium. For future research, enhancing the understanding of mechanisms, improving uranium removal performance, and exploring practical applications can be considered. This study provides unique insights into microbial uranium reduction research, providing valuable references for related studies in this field.
Affiliation(s)
- Guangming Xu
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Xindai Li
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Xinyao Liu
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Juncheng Han
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Kexin Shao
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Haotian Yang
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China
- Fuqiang Fan
  - Advanced Institute of Natural Sciences, Beijing Normal University at Zhuhai, Zhuhai 519087, PR China.
- Xiaodong Zhang
  - Analytical and Testing Center of BNU, Beijing Normal University, Beijing 100875, PR China
- Junfeng Dou
  - Engineering Research Center of Ministry of Education on Groundwater Pollution Control and Remediation, College of Water Sciences, Beijing Normal University, Beijing 100875, PR China.
16
Karystianis G, Lukmanjaya W, Buchan I, Simpson P, Ginnivan N, Nenadic G, Butler T. An analysis of published study designs in PubMed prisoner health abstracts from 1963 to 2023: a text mining study. BMC Med Res Methodol 2024; 24:68. [PMID: 38494501 PMCID: PMC10944606 DOI: 10.1186/s12874-024-02186-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 02/20/2024] [Indexed: 03/19/2024] Open
Abstract
BACKGROUND: The challenging nature of studies with incarcerated populations and other offender groups can impede the conduct of research, particularly that involving complex study designs such as randomised control trials and clinical interventions. Providing an overview of study designs employed in this area can offer insights into this issue and how research quality may impact on health and justice outcomes.
METHODS: We used a rule-based approach to extract study designs from a sample of 34,481 PubMed abstracts related to epidemiological criminology published between 1963 and 2023. The results were compared against an accepted hierarchy of scientific evidence.
RESULTS: We evaluated our method in a random sample of 100 PubMed abstracts. An F1-Score of 92.2% was returned. Of 34,481 study abstracts, almost 40.0% (13,671) had an extracted study design. The most common study design was observational (37.3%; 5101) while experimental research in the form of trials (randomised, non-randomised) was present in 16.9% (2319). Mapped against the current hierarchy of scientific evidence, 13.7% (1874) of extracted study designs could not be categorised. Among the remaining studies, most were observational (17.2%; 2343) followed by systematic reviews (10.5%; 1432) with randomised controlled trials accounting for 8.7% (1196) of studies and meta-analysis for 1.4% (190) of studies.
CONCLUSIONS: It is possible to extract epidemiological study designs from a large-scale PubMed sample computationally. However, the number of trials, systematic reviews, and meta-analysis is relatively small - just 1 in 5 articles. Despite an increase over time in the total number of articles, study design details in the abstracts were missing. Epidemiological criminology still lacks the experimental evidence needed to address the health needs of the marginalized and isolated population that is prisoners and offenders.
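The abstract does not list the rules used, so the following is only a sketch of what a rule-based study design extractor can look like: an ordered list of regular expressions, checked from the most specific design to the most general. The patterns, labels, and example sentences are assumptions made for illustration, not the authors' rule set.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical rules, ordered so that more specific designs win.
RULES: List[Tuple[str, Pattern[str]]] = [
    ("meta-analysis", re.compile(r"\bmeta-?\s?analys[ie]s\b", re.I)),
    ("systematic review", re.compile(r"\bsystematic(?:ally)?\s+review", re.I)),
    (
        "randomised controlled trial",
        re.compile(r"\brandomi[sz]ed\s+(?:control(?:led)?\s+)?trial", re.I),
    ),
    (
        "observational",
        re.compile(r"\b(?:cohort|cross-sectional|case-control|observational)\b", re.I),
    ),
]


def extract_study_design(abstract: str) -> Optional[str]:
    """Return the first matching design label, or None if no rule fires."""
    for label, pattern in RULES:
        if pattern.search(abstract):
            return label
    return None


# Invented example sentences.
print(extract_study_design("We conducted a randomized controlled trial in two prisons."))
print(extract_study_design("A cross-sectional survey of people in custody."))
print(extract_study_design("We describe a new screening tool."))
```

Abstracts for which no rule fires return `None`, which corresponds to the records without an extracted study design in the results above.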
Affiliation(s)
- George Karystianis
  - School of Population Health, University of New South Wales, Sydney, Australia.
- Wilson Lukmanjaya
  - School of Population Health, University of New South Wales, Sydney, Australia
- Iain Buchan
  - Institute of Population Health, University of Liverpool, Liverpool, UK
- Paul Simpson
  - School of Population Health, University of New South Wales, Sydney, Australia
- Natasha Ginnivan
  - School of Population Health, University of New South Wales, Sydney, Australia
- Goran Nenadic
  - School of Computer Science, University of Manchester, Manchester, UK
- Tony Butler
  - School of Population Health, University of New South Wales, Sydney, Australia
17
Raman R, Venugopalan M, Kamal A. Evaluating human resources management literacy: A performance analysis of ChatGPT and bard. Heliyon 2024; 10:e27026. [PMID: 38486738 PMCID: PMC10937570 DOI: 10.1016/j.heliyon.2024.e27026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 02/16/2024] [Accepted: 02/22/2024] [Indexed: 03/17/2024] Open
Abstract
This study presents a comprehensive analysis comparing the literacy levels of two Generative Artificial Intelligence (GAI) tools, ChatGPT and Bard, using a dataset of 134 questions from the Human Resources (HR) domain. The generated responses are evaluated for accuracy, relevance, and clarity. We find that ChatGPT outperforms Bard in overall accuracy (84.3% vs. 82.8%). This difference in performance suggests that ChatGPT could serve as a robotic advisor in transactional HR roles. In contrast, Bard may possess additional safeguards against misuse in the HR function, making it less capable of generating responses to certain types of questions. Statistical tests reveal that although the two systems differ in their mean accuracy, relevance, and clarity of the responses, the observed differences are not always statistically significant, implying that both tools may be more complementary than competitive. The Pearson correlation coefficients further support this by showing weak to non-existent relationships in performance metrics between the two tools. Confirmation queries don't improve ChatGPT or Bard's response accuracy. The study thus contributes to emerging research on the utility of GAI tools in Human Resources Management and suggests that involving certified HR professionals in the design phase could enhance underlying language model performance.
Affiliation(s)
- Raghu Raman
  - Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Murale Venugopalan
  - Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Anju Kamal
  - Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
18
Wedyan M, Saeidi-Rizi F. Assessing the Impact of Urban Environments on Mental Health and Perception Using Deep Learning: A Review and Text Mining Analysis. J Urban Health 2024:10.1007/s11524-024-00830-6. [PMID: 38466494 DOI: 10.1007/s11524-024-00830-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/17/2024] [Indexed: 03/13/2024]
Abstract
Understanding how outdoor environments affect mental health outcomes is vital in today's fast-paced and urbanized society. Recently, advancements in data-gathering technologies and deep learning have facilitated the study of the relationship between the outdoor environment and human perception. In a systematic review, we investigate how deep learning techniques can improve understanding of the influence of outdoor environments on human perceptions and emotions, with an emphasis on mental health outcomes. We systematically reviewed 40 articles indexed in the SCOPUS and Web of Science databases and published between 2016 and 2023. The study presents and utilizes a novel topic modeling method to identify coherent keywords. By extracting the top words of each research topic and identifying the current topics, we find that current studies fall into three areas. The first topic was "Urban Perception and Environmental Factors", where the studies aimed to evaluate perceptions and mental health outcomes. Within this topic, the studies were divided based on human emotions, mood, stress, and the impact of urban features. The second topic was titled "Data Analysis and Urban Imagery in Modeling", which focused on refining deep learning techniques, data collection methods, and participants' variability to understand human perceptions more accurately. The last topic was named "Greenery and visual exposure in urban spaces", which focused on the impact of the amount of, and exposure to, green features on mental health and perceptions. Upon reviewing the papers, this study provides a guide for subsequent research on using deep learning techniques to understand how urban environments influence mental health. It also provides various suggestions that should be taken into account when planning outdoor spaces.
Affiliation(s)
- Musab Wedyan
  - School of Planning, Design and Construction, Michigan State University, East Lansing, MI, USA
- Fatemeh Saeidi-Rizi
  - School of Planning, Design and Construction, Michigan State University, East Lansing, MI, USA.
19
Park YJ, Yang GJ, Sohn CB, Park SJ. GPDminer: a tool for extracting named entities and analyzing relations in biological literature. BMC Bioinformatics 2024; 25:101. [PMID: 38448845 PMCID: PMC10916184 DOI: 10.1186/s12859-024-05710-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 02/19/2024] [Indexed: 03/08/2024] Open
Abstract
PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers.
METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, extracting and analyzing information relevant to the biomedical field. This system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction.
RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images.
CONCLUSION: GPDMiner offers a notable additional functionality among the array of text mining tools available for the biomedical field. This tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.
Affiliation(s)
- Yeon-Ji Park
  - Department of Electronics and Communications Engineering, Kwangwoon University, 20 Gwangun-ro, Seoul, 01897, Republic of Korea
- Geun-Je Yang
  - Department of Electronics and Communications Engineering, Kwangwoon University, 20 Gwangun-ro, Seoul, 01897, Republic of Korea
- Chae-Bong Sohn
  - Department of Electronics and Communications Engineering, Kwangwoon University, 20 Gwangun-ro, Seoul, 01897, Republic of Korea.
- Soo Jun Park
  - Welfare & Medical ICT Research Department, Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Daejeon, 34129, Republic of Korea.
20
Grotenhuis Z, Mosteiro PJ, Leeuwenberg AM. Modest performance of text mining to extract health outcomes may be almost sufficient for high-quality prognostic model development. Comput Biol Med 2024; 170:108014. [PMID: 38301515 DOI: 10.1016/j.compbiomed.2024.108014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 01/03/2024] [Accepted: 01/19/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND: Across medicine, prognostic models are used to estimate patient risk of certain future health outcomes (e.g., cardiovascular or mortality risk). To develop (or train) prognostic models, historic patient-level training data is needed containing both the predictive factors (i.e., features) and the relevant health outcomes (i.e., labels). Sometimes, when the health outcomes are not recorded in structured data, these are first extracted from textual notes using text mining techniques. Because there exist many studies utilizing text mining to obtain outcome data for prognostic model development, our aim is to study the impact of the text mining quality on downstream prognostic model performance.
METHODS: We conducted a simulation study charting the relationship between text mining quality and prognostic model performance using an illustrative case study about in-hospital mortality prediction in intensive care unit patients. We repeatedly developed and evaluated a prognostic model for in-hospital mortality, using outcome data extracted by multiple text mining models of varying quality.
RESULTS: Interestingly, we found in our case study that a relatively low-quality text mining model (F1 score ≈ 0.50) could already be used to train a prognostic model with quite good discrimination (area under the receiver operating characteristic curve of around 0.80). The calibration of the risks estimated by the prognostic model seemed unreliable across the majority of settings, even when text mining models were of relatively high quality (F1 ≈ 0.80).
DISCUSSION: Developing prognostic models on text-extracted outcomes using imperfect text mining models seems promising. However, it is likely that prognostic models developed using this approach may not produce well-calibrated risk estimates, and require recalibration in (possibly a smaller amount of) manually extracted outcome data.
Affiliation(s)
- Zwierd Grotenhuis
  - Department of Information and Computing Sciences, Utrecht University, The Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands
- Pablo J Mosteiro
  - Department of Information and Computing Sciences, Utrecht University, The Netherlands
- Artuur M Leeuwenberg
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands.
21
Jiao J, He P, Zha J. Factors influencing illegal dumping of hazardous waste in China. J Environ Manage 2024; 354:120366. [PMID: 38364544 DOI: 10.1016/j.jenvman.2024.120366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/22/2024] [Accepted: 02/08/2024] [Indexed: 02/18/2024]
Abstract
In recent years, illegal dumping of hazardous waste (IDHW) in China has become a recurring problem. Effective identification and exploration of the factors influencing illegal dumping are crucial for incident prevention and hazardous waste management, but its analysis has rarely been reported. Thus, this study focused on 568 cases of IDHW officially reported by the government. Through regular expressions, the categories of dumped wastes and the provinces where the incidents occurred were extracted. Furthermore, a comprehensive set of influencing factors was constructed by text mining for the case content and by the integration from the existing literature. On this basis, the unstructured and structured data were integrated using a Boolean dataset to respectively explore the association rules of influencing factors for the overall IDHW and for major waste categories, in conjunction with the extracted province information. Subsequently, a Bayesian network was constructed by utilizing the results of association rules mining and the key factors were identified through corresponding analysis. The findings of this study reveal a close connection between various influencing factors, with distinct key factors identified for different categories of hazardous waste. Among them, law-enforcement emerges as a crucial factor in most IDHW cases, while the factor of public monitoring for metallic hazardous waste and the factor of government supervision for distillation residue waste and other waste play a key role in their respective cases of illegal dumping. These findings offer a fresh research perspective for investigating the factors influencing IDHW and present helpful insights for developing effective strategies to prevent and control such incidents.
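As a rough illustration of association rule mining over a Boolean dataset, the sketch below represents each case as the set of factors that are true for it and computes the standard support, confidence, and lift of one rule. The factor names and the four cases are invented and are not drawn from the 568 reported incidents.

```python
from typing import Dict, FrozenSet, List, Set


def support(rows: List[Set[str]], items: FrozenSet[str]) -> float:
    """Share of rows in which every item of `items` is present (true)."""
    return sum(items <= row for row in rows) / len(rows)


def rule_metrics(
    rows: List[Set[str]], antecedent: FrozenSet[str], consequent: FrozenSet[str]
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_both = support(rows, antecedent | consequent)
    s_ante = support(rows, antecedent)
    s_cons = support(rows, consequent)
    confidence = s_both / s_ante if s_ante else 0.0
    lift = confidence / s_cons if s_cons else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Hypothetical cases: each set lists the Boolean factors that are true.
cases = [
    {"weak_enforcement", "high_disposal_cost", "metallic_waste"},
    {"weak_enforcement", "metallic_waste"},
    {"weak_enforcement", "high_disposal_cost"},
    {"high_disposal_cost"},
]
metrics = rule_metrics(cases, frozenset({"weak_enforcement"}), frozenset({"metallic_waste"}))
print(metrics)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than independence would predict; rules selected this way are the kind of input the abstract describes feeding into a Bayesian network.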
Affiliation(s)
- Jianling Jiao
  - School of Management, Hefei University of Technology, Hefei, Anhui, 230009, China; Philosophy and Social Sciences Laboratory of Data Science and Smart Society Governance, Ministry of Education, Hefei, Anhui, China.
- Pengwang He
  - School of Management, Hefei University of Technology, Hefei, Anhui, 230009, China.
- Jianrui Zha
  - School of Management, Hefei University of Technology, Hefei, Anhui, 230009, China; Anhui Key Laboratory of Philosophy and Social Sciences of Energy and Energy and Environment Smart Management and Green Low Carbon Development, Hefei University of Technology, Hefei, 230009, China.
22
Chen X, Zou D, Xie H, Wang FL. Technology-enhanced higher education: Text mining and bibliometrics. Heliyon 2024; 10:e25776. [PMID: 38384551 PMCID: PMC10878921 DOI: 10.1016/j.heliyon.2024.e25776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 12/01/2023] [Accepted: 02/01/2024] [Indexed: 02/23/2024] Open
Abstract
Objectives: Research on technology-enhanced higher education (TEHE) has been active and influential in educational technology. The study had three objectives: (1) to recognize the tendencies in the field and the contributing countries/regions/institutions, (2) to visualize scientific collaborations, and (3) to reveal important research topics, their developmental tendencies, correlations, and distributions across contributing countries/regions/institutions.
Methods: We collected 609 papers in relation to TEHE from 2004 to 2022 and analyzed them using text mining and bibliometric methods. Specifically, we focused on determining article trends, identifying contributing institutions/countries/regions, visualizing scientific collaborations through social network analysis, and revealing the important topics and their conceptual evolutions over time using topic models, Mann-Kendall trend test, hierarchical clustering, and Sankey visualization.
Results: Regarding the first objective, TEHE articles have grown consistently and will continue to expand. This growth was due to the contributions of Spanish universities and institutions from other countries/regions such as the USA, the UK, Australia, Germany, China, and Turkey. Regarding the second objective, the exploration of regional and institutional collaborations through social networks revealed that geographically adjacent institutions tended to foster close collaborations, particularly among those sharing similar research interests. Nevertheless, more cross-regional collaborations are needed to advance TEHE research. Regarding the third objective, the analysis of topics highlighted research hotspots and emerging themes such as Massive Online Open Courses, AI and big data in education, Gamification and engagement, Learning effectiveness and strategies, Social networks and discussion forums, COVID-19 and online learning, and Plagiarism detection and learning analytics.
Conclusions: This bibliometric study comprehensively analyzed the research landscape of TEHE research regarding contributors, collaborations, and research topics, and offers a glimpse into what the future may hold. It can be used as a guide for contributors to the field to identify the current research hotspots and emerging themes.
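For readers unfamiliar with the Mann-Kendall trend test named in the Methods, here is a minimal sketch of the statistic for a yearly series such as a topic's share of papers. It uses the variance formula that assumes no tied values and a normal approximation for the p-value, both simplifications. The example series is invented and does not reproduce the study's data.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) for a monotonic-trend test.

    Simplification: the variance formula assumes no tied values.
    """
    n = len(series)
    # S sums the signs of all pairwise differences x_j - x_i with i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical yearly topic proportions (strictly increasing).
topic_share = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.11, 0.12, 0.15]
s_stat, z_stat, p_value = mann_kendall(topic_share)
print(s_stat, round(z_stat, 3), p_value < 0.05)
```

A positive Z with a small p-value suggests an increasing trend; for series with many ties or very few points, a tie-corrected variance or exact tables would be more appropriate than this approximation.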
Affiliation(s)
- Xieling Chen
  - School of Education, Guangzhou University, Guangzhou, China
- Di Zou
  - Centre for English and Additional Languages, Lingnan University, Hong Kong SAR, China
- Haoran Xie
  - Department of Computing and Decision Sciences, Lingnan University, Hong Kong SAR, China
- Fu Lee Wang
  - School of Science and Technology, Hong Kong Metropolitan University, Hong Kong SAR, China
23
He X, Zhang H, Huang J, Zhao D, Li Y, Nie R, Liu X. [Research on fault diagnosis of patient monitor based on text mining]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2024; 41:168-176. [PMID: 38403618 PMCID: PMC10894744 DOI: 10.7507/1001-5515.202306017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
The conventional fault diagnosis of patient monitors relies heavily on manual experience, resulting in low diagnostic efficiency and ineffective utilization of fault maintenance text data. To address these issues, this paper proposes an intelligent fault diagnosis method for patient monitors based on multi-feature text representation, an improved bidirectional gated recurrent unit (BiGRU), and an attention mechanism. First, the fault text data were preprocessed, and word vectors containing multiple linguistic features were generated by a linguistically-motivated bidirectional encoder representation from Transformer. Then, the bidirectional fault features were extracted by the improved BiGRU and weighted by the attention mechanism. Finally, a weighted loss function was used to reduce the impact of class imbalance on the model. To validate the effectiveness of the proposed method, this paper used a patient monitor fault dataset for verification, and the macro F1 value reached 91.11%. The results show that the model built in this study can realize the automatic classification of fault text, and may provide decision support for the intelligent fault diagnosis of patient monitors in the future.
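The abstract does not specify how the loss is weighted, so the sketch below shows one common choice only: cross-entropy with class weights inversely proportional to class frequency. The fault class names and probabilities are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(
    probs: List[Dict[str, float]], labels: List[str], weights: Dict[str, float]
) -> float:
    """Weighted mean of -log p(true class), normalized by the total weight."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


# Hypothetical fault classes: three common "power" faults, one rare "sensor" fault.
labels = ["power", "power", "power", "sensor"]
weights = inverse_frequency_weights(labels)

# Same error size, made once on the rare class and once on a common class.
miss_rare = [{"power": 0.9, "sensor": 0.1}] * 3 + [{"power": 0.9, "sensor": 0.1}]
miss_common = [{"power": 0.1, "sensor": 0.9}] + [{"power": 0.9, "sensor": 0.1}] * 2 + [
    {"power": 0.1, "sensor": 0.9}
]
print(weighted_cross_entropy(miss_rare, labels, weights))
print(weighted_cross_entropy(miss_common, labels, weights))
```

With these weights, a confident mistake on the rare class costs more than the same mistake on a common class, which is the effect a weighted loss is meant to have under class imbalance.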
Affiliation(s)
- Xiangfei He
- School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Hehua Zhang
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- School of Biological Information, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Jing Huang
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Dechun Zhao
- School of Biological Information, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Yang Li
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Rui Nie
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Xianghua Liu
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
24
Sánchez M, Urquiza L. Improving fraud detection with semi-supervised topic modeling and keyword integration. PeerJ Comput Sci 2024; 10:e1733. [PMID: 38259882 PMCID: PMC10803081 DOI: 10.7717/peerj-cs.1733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/13/2023] [Indexed: 01/24/2024]
Abstract
Fraud detection through auditors' manual review of accounting and financial records has traditionally relied on human experience and intuition. However, replicating this task using technological tools has represented a challenge for information security researchers. Natural language processing techniques, such as topic modeling, have been explored to extract information and categorize large sets of documents. Topic modeling, such as latent Dirichlet allocation (LDA) or non-negative matrix factorization (NMF), has recently gained popularity for discovering thematic structures in text collections. However, unsupervised topic modeling may not always produce the best results for specific tasks, such as fraud detection. Therefore, in the present work, we propose to use semi-supervised topic modeling, which allows the incorporation of specific knowledge of the study domain through the use of keywords to learn latent topics related to fraud. By leveraging relevant keywords, our proposed approach aims to identify patterns related to the vertices of the fraud triangle theory, providing more consistent and interpretable results for fraud detection. The model's performance was evaluated by training with several datasets and testing it with another one that was not used in its training. The results showed efficient performance averages with a 7% increase in performance compared to previous work. Overall, the study emphasizes the importance of deepening the analysis of fraud behaviors and proposing strategies to identify them proactively.
Affiliation(s)
- Marco Sánchez
- Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Pichincha, Ecuador
- Luis Urquiza
- Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Quito, Pichincha, Ecuador
25
Sinha GR, Viswanathan M, Larrison CR. Student loan debt and mental health: a comprehensive review of scholarly literature from 1900 to 2019. J Evid Based Soc Work (2019) 2024:1-31. [PMID: 38179674 DOI: 10.1080/26408066.2023.2299019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Abstract
PURPOSE The review had two purposes. The first was to examine the nature and extent of published literature on student loans and the second was to systematically review the literature on student loans and mental health. MATERIALS AND METHODS Data from academic databases (1900-2019) were analyzed using two methods. First, topic modeling (a text-mining tool that utilized Bayesian statistics to extract hidden patterns in large volumes of texts) was used to understand the topical coverage in peer-reviewed abstracts (n = 988) on student debt. Second, using PRISMA guidelines, 46 manuscripts were systematically reviewed to synthesize literature linking student debt and mental health. RESULTS A model with 10 topics was selected for parsimony and more accurate clustered representation of the patterns. Certain topics have received less attention, including mental health and wellbeing. In the systematic review, themes derived were categorized into two life trajectories: before and during repayment. Whereas stress, anxiety, and depression dominated the literature, the review demonstrated that the consequences of student loans extend beyond mental health and negatively affect a person's wellbeing. Self-efficacy emerged as a potential solution. DISCUSSION AND CONCLUSION Across countries and samples, the results are uniform and show that student loans burden certain vulnerable groups more. Findings indicate that diversity in mental health measures has resulted in a lack of a unified theoretical framework. Better scales and consensus on commonly used terms will strengthen the literature. Some areas, such as the impact of student loans on graduate students or consumers repaying their loans, warrant attention in future research.
Affiliation(s)
- Gaurav R Sinha
- School of Social Work, University of Georgia, Athens, Georgia, USA
26
Aljrees T, Umer M, Saidani O, Almuqren L, Ishaq A, Alsubai S, Eshmawi AA, Ashraf I. Contradiction in text review and apps rating: prediction using textual features and transfer learning. PeerJ Comput Sci 2024; 10:e1722. [PMID: 38196956 PMCID: PMC10773744 DOI: 10.7717/peerj-cs.1722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 11/05/2023] [Indexed: 01/11/2024]
Abstract
Mobile app stores, such as Google Play, have become popular platforms for practically all types of software and services for mobile phone users. Users may browse and download apps via app stores, which also help developers monitor their apps by allowing users to rate and review them. App reviews may contain the user's experience, bug details, requests for additional features, or a textual rating of the app. These ratings can be frequently biased due to inadequate votes. However, there are significant discrepancies between the numerical ratings and the user reviews. This study uses a transfer learning approach to predict the numerical ratings of Google apps. It benefits from user-provided numeric ratings of apps as the training data and provides authentic ratings of mobile apps by analyzing users' reviews. A transfer learning-based model, ELMo, is proposed for this purpose, which is based on the word vector feature representation technique. The performance of the proposed model is compared with three other transfer learning and five machine learning models. The dataset is scraped from the Google Play store and covers 14 different categories of apps. First, biased and unbiased user ratings are segregated using TextBlob analysis to formulate the ground truth, and then the classifiers' prediction accuracy is evaluated. Results demonstrate that the ELMo classifier has a high potential to predict authentic numeric ratings from users' actual reviews.
Affiliation(s)
- Turki Aljrees
- College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
- Muhammad Umer
- Department of Computer Science, Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Latifah Almuqren
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Abid Ishaq
- Department of Computer Science, Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Ala’ Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of Korea
27
Li Z, Yao M, Luo Z, Wang X, Liu T, Huang Q, Su C. A chemical accident cause text mining method based on improved accident triangle. BMC Public Health 2024; 24:39. [PMID: 38166879 PMCID: PMC10762847 DOI: 10.1186/s12889-023-17510-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 12/16/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND With the rapid development of China's chemical industry, although researchers have developed many methods in the field of chemical safety, the situation of chemical safety in China is still not optimistic. How to prevent accidents has always been the focus of scholars' attention. METHODS Based on the characteristics of chemical enterprises and the Heinrich accident triangle, this paper developed the organizational-level accident triangle, which divides accidents into group-level, unit-level, and workshop-level accidents. Based on 484 accident records of a large chemical enterprise in China, the Spearman correlation coefficient was used to analyze the rationality of accident classification and the occurrence rules of accidents at different levels. In addition, this paper used TF-IDF and K-means algorithms to extract keywords and perform text clustering analysis for accidents at different levels based on accident classification. The risk factors of each accident cluster were further analyzed, and improvement measures were proposed for the sample enterprises. RESULTS The results show that reducing unit-level accidents can prevent group-level accidents. The accidents of the sample enterprises are mainly personal injury accidents, production accidents, environmental pollution accidents, and quality accidents. The leading causes of personal injury accidents are employees' unsafe behaviors, such as poor safety awareness, non-standard operation, illegal operation, untimely communication, etc. The leading causes of production accidents, environmental pollution accidents, and quality accidents include the unsafe state of materials, such as equipment damage, pipeline leakage, short-circuiting, excessive fluctuation of process parameters, etc. CONCLUSION Compared with the traditional accident classification method, the accident triangle proposed in this paper based on the organizational level dramatically reduces the differences between accidents, helps enterprises quickly identify risk factors, and prevents accidents. This method can effectively prevent accidents and provide helpful guidance for the safety management of chemical enterprises.
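The abstract above names TF-IDF as the keyword-extraction step. The following Python is a rough sketch of that step only (the K-means clustering and Spearman analysis are not shown), written under stated assumptions: whitespace tokenization, term frequency normalized by document length, and an unsmoothed `log(N / df)` inverse document frequency. The function name `tfidf_keywords` and the toy accident records are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


# Invented toy records, for illustration only.
records = [
    "pipeline leakage pipeline damage",
    "operator injury operator fall",
    "pipeline injury report",
]
keywords = tfidf_keywords(records, top_k=2)
```

Terms that appear in every record score zero under this IDF, which is why corpus-wide words drop out of the keyword lists.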
Affiliation(s)
- Zheng Li
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China.
- Min Yao
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China
- Institute of Management Science, Ningxia University, Yin'chuan, 750021, China
- Zhenmin Luo
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China
- Xinping Wang
- College of Management, Xi'an University of Science and Technology, Xi'an, 710054, China
- Tongshuang Liu
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China
- Qianrui Huang
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China
- Chang Su
- College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, 710054, China
28
Helmy A, Nassar R, Ramdan N. Depression detection for twitter users using sentiment analysis in English and Arabic tweets. Artif Intell Med 2024; 147:102716. [PMID: 38184345 DOI: 10.1016/j.artmed.2023.102716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 11/06/2023] [Accepted: 11/08/2023] [Indexed: 01/08/2024]
Abstract
Since depression often results in suicidal thoughts and leaves a person severely disabled in daily life, there is an elevated risk of premature mortality due to mental problems caused by depression. Therefore, it's crucial to identify the patient's mental illness as soon as possible. People are increasingly using social media platforms to express their opinions and share daily activities, which makes online platforms rich sources for early depression detection. The contribution of this paper is multifold. First, it presents five machine-learning models for Arabic and English depression detection using Twitter text. The best model for Arabic text achieved an f1-score of 96.6 % for binary classification into depressed and Non_dep. For English text without negation, the model achieved 92 % for binary classification and 88 % for multi-classification (depressed, indifferent, happy). For English text with negation, f1 scores of 87 % and 85 % were achieved for binary and multi-classification, respectively. Second, the work introduced a manually annotated Arabic_Dep_tweets_10,000 corpus of 10,000 Arabic tweets, which covered neutral tweets as well as a variety of depressed and happy terms. In addition, it introduced two automatically annotated English corpora: the Eng_without_negation_60.000 corpus of 60,172 English tweets and the Eng_with_negation_57.000 corpus of 57,392 English tweets. Both covered a wide range of depressed and cheerful terms; however, negation was included only in the Eng_with_negation_57.000 corpus. Finally, this paper presents a depression-detection web application which implements our optimal models to detect tweets that contain depression symptoms and predict depression trends for a person using either the English or the Arabic language.
Affiliation(s)
- AbdelMoniem Helmy
- Department of Information Systems and Technology Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt.
- Radwa Nassar
- Department of Information Systems and Technology Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt
- Nagy Ramdan
- Department of Information Systems and Technology Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt
29
Saha S, Vyas R. Computational Analysis of Gastric Canceromics Data to Identify Putative Biomarkers. Curr Top Med Chem 2024; 24:128-156. [PMID: 37861003 DOI: 10.2174/0115680266259310230924190213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 08/07/2023] [Accepted: 08/22/2023] [Indexed: 10/21/2023]
Abstract
BACKGROUND Gastric cancer develops as a malignant tumor in the mucosa of the stomach, and spreads through further layers. Early-stage diagnosis of gastric cancer is highly challenging because the patients either exhibit symptoms similar to stomach infections or show no signs at all. Biomarkers are active players in the cancer process by acting as indications of aberrant alterations due to malignancy. OBJECTIVE Though there have been significant advancements in the biomarkers and therapeutic targets, there are still insufficient data to fully eradicate the disease in its early phases. Therefore, it is crucial to identify particular biomarkers for detecting and treating stomach cancer. This review aims to provide a thorough overview of data analysis in gastric cancer. METHODS Text mining, network analysis, machine learning (ML), deep learning (DL), and structural bioinformatics approaches have been employed in this study. RESULTS We have built a huge interaction network in the current study to forecast new biomarkers for gastric cancer. The four putatively unique and potential biomarker genes have been identified via a large association network in this study. CONCLUSION The molecular basis of the illness is well understood by computational approaches, which also provide biomarkers for targeted cancer therapy. These putative biomarkers may be useful in the early detection of disease. This study also shows that in H. pylori infection in early-stage gastric cancer, the top 10 hub genes constitute an essential component of the epithelial cell signaling pathways. These genes can further contribute to the future development of effective biomarkers.
Affiliation(s)
- Sagarika Saha
- MIT School of Bioengineering Sciences & Research, MIT Art Design and Technology University, Raj Baugh Campus, Loni Kalbhor, Pune, 412201, Maharashtra, India
- Renu Vyas
- MIT School of Bioengineering Sciences & Research, MIT Art Design and Technology University, Raj Baugh Campus, Loni Kalbhor, Pune, 412201, Maharashtra, India
30
Kumar V, Shankar G, Akhter Y. Deciphering drug discovery and microbial pathogenesis research in tuberculosis during the two decades of postgenomic era using entity mining approach. Arch Microbiol 2023; 206:46. [PMID: 38153595 DOI: 10.1007/s00203-023-03776-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 11/10/2023] [Accepted: 11/27/2023] [Indexed: 12/29/2023]
Abstract
We examined literature on Mycobacterium tuberculosis (Mtb) subsequent to its genome release, spanning the years 1999-2020. We employed scientometric mapping, entity mining, visualization techniques, and the PubMed and PubTator databases. The most popular keywords, most active research groups, and growth in quantity of publications were determined. By gathering annotations from PubTator, we determined the direction of research in the areas of drug hypersensitivity, drug resistance (AMR), and drug-related side effects. Additionally, we examined the patterns in research on Mtb metabolism and various forms of tuberculosis, including skin, brain, pulmonary, extrapulmonary, and latent tuberculosis. We discovered that 2011 had the highest annual growth rate of publications, at 19.94%. The USA leads the world in publications with 18,038, followed by China with 14,441, and India with 12,158 publications. Studies on isoniazid and rifampicin resistance showed an enormous increase. Non-tuberculous mycobacteria have also been the subject of more research in an effort to better understand Mtb physiology and as model organisms. Researchers also looked at co-infections like leprosy, hepatitis, plasmodium, HIV, and other opportunistic infections. Host perspectives like immune response, hypoxia, and reactive oxygen species, as well as comorbidities like arthritis, cancer, diabetes, and kidney disease, were also looked at. Symptomatic aspects like fever, coughing, and weight loss were also investigated. Vitamin D has gained popularity as a supplement during illness recovery; however, the interest of researchers has declined of late. We delineated dominant researchers, journals, institutions, and leading nations globally, which is crucial for aligning the ongoing and evolving landscape of TB research efforts. Recognising the dominant patterns offers important information about the areas of focus for current research, allowing biomedical scientists, clinicians, and organizations to strategically coordinate their efforts with the changing priorities in the field of tuberculosis research.
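The abstract above reports an annual growth rate of publications without defining it. One common reading is the year-over-year percentage change, 100 × (N_t − N_{t−1}) / N_{t−1}; the Python sketch below follows that assumption, which may differ from the authors' calculation. The function name and the publication counts are invented for illustration and are not data from the study.

```python
def annual_growth_rates(counts_by_year):
    """Year-over-year percentage change in publication counts.

    Assumed definition: 100 * (N_t - N_{t-1}) / N_{t-1}.
    Years without an immediately preceding year, or whose preceding
    count is zero, are skipped.
    """
    years = sorted(counts_by_year)
    rates = {}
    for prev, curr in zip(years, years[1:]):
        previous_count = counts_by_year[prev]
        if curr - prev != 1 or previous_count == 0:
            continue
        rates[curr] = 100.0 * (counts_by_year[curr] - previous_count) / previous_count
    return rates


# Hypothetical counts, not taken from the paper.
example_counts = {2009: 100, 2010: 110, 2011: 132}
rates = annual_growth_rates(example_counts)
peak_year = max(rates, key=rates.get)
```

With these invented counts the peak year is the one with the largest relative jump, not necessarily the one with the most publications.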
Affiliation(s)
- Vinit Kumar
- Department of Library and Information Science, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow, 226025, Uttar Pradesh, India.
- Gauri Shankar
- Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow, 226025, Uttar Pradesh, India
- Yusuf Akhter
- Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow, 226025, Uttar Pradesh, India.
31
Zheng X, Wang X, Luo X, Tong F, Zhao D. BioEGRE: a linguistic topology enhanced method for biomedical relation extraction based on BioELECTRA and graph pointer neural network. BMC Bioinformatics 2023; 24:486. [PMID: 38114906 PMCID: PMC10731880 DOI: 10.1186/s12859-023-05601-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/04/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Automatic and accurate extraction of diverse biomedical relations from literature is a crucial component of biomedical text mining. Currently, stacking various classification networks on pre-trained language models and fine-tuning them is a common framework for solving the biomedical relation extraction (BioRE) problem end to end. However, sequence-based pre-trained language models underutilize the graphical topology of language to some extent. In addition, sequence-oriented deep neural networks have limitations in processing graphical features. RESULTS In this paper, we propose a novel method for the sentence-level BioRE task, BioEGRE (BioELECTRA and Graph pointer neural network for Relation Extraction), aimed at leveraging linguistic topological features. First, the biomedical literature is preprocessed to retain sentences involving pre-defined entity pairs. Secondly, SciSpaCy is employed to conduct dependency parsing; sentences are modeled as graphs based on the parsing results; BioELECTRA is utilized to generate token-level representations, which are modeled as attributes of nodes in the sentence graphs; a graph pointer neural network layer is employed to select the most relevant multi-hop neighbors to optimize representations; a fully-connected neural network layer is employed to generate the sentence-level representation. Finally, the Softmax function is employed to calculate the probabilities. Our proposed method is evaluated on three BioRE tasks: a multi-class task (CHEMPROT) and two binary tasks (GAD and EU-ADR). The results show that our method achieves F1-scores of 79.97% (CHEMPROT), 83.31% (GAD), and 83.51% (EU-ADR), surpassing the performance of existing state-of-the-art models. CONCLUSION The experimental results on 3 biomedical benchmark datasets demonstrate the effectiveness and generalization of BioEGRE, which indicates that linguistic topology and a graph pointer neural network layer explicitly improve performance for BioRE tasks.
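The last step described in the abstract above converts the classifier's output scores into relation-class probabilities with the Softmax function. As a small illustration of that final step only, and not of BioEGRE itself, the Python below implements a numerically stable softmax over a list of logits. The example logits are invented.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1.

    Subtracting the maximum before exponentiating leaves the result
    unchanged but avoids overflow for large scores.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Invented scores for three hypothetical relation classes.
probabilities = softmax([2.0, 0.5, -1.0])
predicted_class = probabilities.index(max(probabilities))
```

The predicted relation is the class with the highest probability, which is always the class with the highest raw score.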
Affiliation(s)
- Xiangwen Zheng
- Academy of Military Medical Sciences, Beijing, 100039, China
- Xuanze Wang
- Academy of Military Medical Sciences, Beijing, 100039, China
- Xiaowei Luo
- Academy of Military Medical Sciences, Beijing, 100039, China
- Fan Tong
- Academy of Military Medical Sciences, Beijing, 100039, China
- Dongsheng Zhao
- Academy of Military Medical Sciences, Beijing, 100039, China.
32
AbuShihab K, Obaideen K, Alameddine M, Alkurd RAF, Khraiwesh HM, Mohammad Y, Abdelrahim DN, Madkour MI, Faris ME. Reflection on Ramadan Fasting Research Related to Sustainable Development Goal 3 (Good Health and Well-Being): A Bibliometric Analysis. J Relig Health 2023:10.1007/s10943-023-01955-9. [PMID: 38110843 DOI: 10.1007/s10943-023-01955-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
There is a large body of research on Ramadan intermittent fasting (RIF) and health in Muslim communities that can offer insights to promote the achievement of Sustainable Development Goal 3 (SDG 3), which encompasses good health and well-being. Based on recent bibliometric evidence, we hypothesized that RIF research is highly relevant to SDG 3, particularly Targets 3.1, 3.2, 3.4, and 3.5. Therefore, this bibliometric study quantified RIF literature supporting SDG 3 and associated targets over the past seven decades and explored themes and trends. All types of research articles were extracted from the Scopus database from inception to March 2022. Microsoft Excel, Biblioshiny, and VOSviewer were used to qualitatively and quantitatively examine RIF research trends supporting SDG 3 and associated targets. We identified 1729 relevant articles. The number of publications has increased notably since 1986, with a dramatic increase in 2019-2020. RIF research predominantly supported Target 3.4 (reducing risk for non-communicable diseases), with research hotspots being diabetes, diabetes medications, pregnancy, physiology, metabolic diseases, and obesity and metabolism. This target was also the most commonly supported by dedicated authors and institutions publishing on RIF, whereas other SDG 3 targets were negligibly addressed in comparison. Our comprehensive bibliometric analysis of RIF literature showed growing support for SDG 3 through positive contributions to half of the SDG 3 targets, although Target 3.4 received the most attention. We also identified knowledge gaps that may shape further research directions on RIF and promote the achievement of SDG 3 in Muslim communities.
Affiliation(s)
- Katia AbuShihab
- Nutrition and Food Research Group, Research Institute of Medical and Health Sciences (RIMHS), Sharjah University, Sharjah, United Arab Emirates
- Khaled Obaideen
- Sustainable Engineering Asset Management Research Group, University of Sharjah, Sharjah, United Arab Emirates.
- Mohamad Alameddine
- Department of Health Service Administration, College of Health Sciences, University of Sharjah, Sharjah, United Arab Emirates
- Research Institute of Medical and Health Sciences (RIMHS), University of Sharjah, Sharjah, United Arab Emirates
- Refat Ahmad Fawzi Alkurd
- Faculty of Pharmacy and Medical Sciences, Department of Nutrition, University of Petra, Amman, Jordan
- Husam M Khraiwesh
- Department of Nutrition and Food Processing, Faculty of Agricultural Technology, Al-Balqa' Applied University, Salt, Jordan
- Yara Mohammad
- College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Dana N Abdelrahim
- Health Promotion Research Group, Research Institute of Medical and Health Sciences (RIMHS), Sharjah University, Sharjah, United Arab Emirates
- Mohamed I Madkour
- Research Institute of Medical and Health Sciences (RIMHS), University of Sharjah, Sharjah, United Arab Emirates
- Department of Medical Laboratory Sciences, College of Health Sciences, University of Sharjah, Sharjah, United Arab Emirates
- MoezAlIslam E Faris
- Research Institute of Medical and Health Sciences (RIMHS), University of Sharjah, Sharjah, United Arab Emirates.
- Department of Clinical Nutrition and Dietetics, College of Health Sciences, University of Sharjah, Sharjah, United Arab Emirates.
33
Atkinson-Clement C, Duflot M, Lastennet E, Patsalides L, Wasserman E, Sartoris TM, Tarrano C, Rosso C, Burbaud P, Deniau E, Czernecki V, Roze E, Hartmann A, Worbe Y. How does Tourette syndrome impact adolescents' daily living? A text mining study. Eur Child Adolesc Psychiatry 2023; 32:2623-2635. [PMID: 36460852 PMCID: PMC10682273 DOI: 10.1007/s00787-022-02116-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/20/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022]
Abstract
Tourette syndrome is a neurodevelopmental disease in which clinical manifestations are essentially present during childhood and adolescence, corresponding to one of the critical development phases. However, its consequences on the daily lives of young patients have been insufficiently investigated. Here, we aimed to investigate this using a statistical text mining approach, allowing for the analysis of a large volume of free textual data. Sixty-two adolescents with Tourette syndrome participated in an interview in which they discussed their daily life (i) in school, (ii) at home, and (iii) with strangers, (iv) the aspect of Tourette syndrome which caused the most difficulty, and (v) their thoughts regarding their future as adults. Following data pre-processing, these corpora were analyzed separately using the IRAMUTEQ software through factorial correspondence analysis to identify the most commonly recurring topics of each corpus, and their relations with clinical features. The main difficulty corpus was directly related to comorbidities of Tourette syndrome. Daily life at home was correlated with executive functioning. Difficulties at school were related to a higher severity of tics. Thoughts regarding future daily life were worst for the youngest patients and were correlated with executive functioning and a higher depression score. Taken altogether, our results highlighted that social stigma was a pervasive topic among our corpora. From a clinical standpoint, tic severity was especially related to difficulties at school, while comorbidities had a high impact on social daily living and cost for managing both tics and symptoms of comorbidities. TRIAL REGISTRATION: clinicaltrials.gov/ct2/show/NCT04179435.
Collapse
Affiliation(s)
- Cyril Atkinson-Clement
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - Precision Imaging Beacon, School of Medicine, University of Nottingham, Nottingham, UK
- Marion Duflot
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Eloise Lastennet
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Leïla Patsalides
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Emma Wasserman
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Therese-Marie Sartoris
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Clément Tarrano
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Charlotte Rosso
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - Urgences Cérébro-Vasculaires, Pitié-Salpêtrière Hospital, Paris, France
- Pierre Burbaud
  - Centre Hospitalier Universitaire de Bordeaux, Institut des Maladies Neurodégénératives, CNRS, University of Bordeaux, Bordeaux, France
- Emmanuelle Deniau
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - National Reference Center for Tourette Syndrome, Assistance Publique des Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière, 75013, Paris, France
- Virginie Czernecki
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - National Reference Center for Tourette Syndrome, Assistance Publique des Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière, 75013, Paris, France
- Emmanuel Roze
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
- Andreas Hartmann
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - National Reference Center for Tourette Syndrome, Assistance Publique des Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière, 75013, Paris, France
- Yulia Worbe
  - Sorbonne Université, Paris Brain Institute Institut du Cerveau-ICM, CNRS, Hôpital de La Pitié Salpêtrière (DMU 6), Inserm, Paris, AP-HP, France
  - National Reference Center for Tourette Syndrome, Assistance Publique des Hôpitaux de Paris, Groupe Hospitalier Pitié-Salpêtrière, 75013, Paris, France
  - Department of Neurophysiology, Saint Antoine Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France
34
Paul T, Bhardwaj P, Mondal A, Bandyopadhyay TK, Mahata N, Bhunia B. Identification of Novel Protein Targets of Prodigiosin for Breast Cancer Using Inverse Virtual Screening Methods. Appl Biochem Biotechnol 2023; 195:7236-7254. [PMID: 36988846 DOI: 10.1007/s12010-023-04426-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/15/2023] [Indexed: 03/30/2023]
Abstract
Prodigiosin (PG), chemically 4-methoxy-5-[(5-methyl-4-pentyl-2H-pyrrol-2-ylidene)methyl]-2,2'-bi-1H-pyrrole, is an apoptotic agent. Only a few protein targets of PG involved in regulating various diseases have been identified so far; finding more PG targets is therefore crucial for novel drug discovery research. A bioinformatics method was applied in this work to find additional potential PG targets. Initially, a text mining analysis was conducted to determine the relationship between PG and a variety of metabolic processes. One hundred sixteen proteins from the KEGG pathway were selected for the docking study. Inverse virtual screening was performed with Discovery Studio software 4.1 using a CHARMm-based docking tool. Twelve of the 116 proteins were retained because their CDOCKER interaction energy was larger than -40.22 kcal/mol. The best docking scores with PG were -44.25 kcal/mol, -44.99 kcal/mol, and -40.91 kcal/mol for three novel proteins: human epidermal growth factor-2 (HER-2), mitogen-activated protein kinase (MEK), and S6 kinase protein (S6K), respectively. The interactions in the S6K/PG complex are predominantly hydrophobic; however, hydrogen bond interactions can be identified in the MEK/PG and HER-2/PG complexes. The root-mean-square deviation (RMSD) and key interaction score system (KISS) were further used to validate the docking approach. The docking approach employed in this work has a low RMSD value (2.44 Å) and a high KISS score (0.5), indicating that it is significant.
Affiliation(s)
- Tania Paul
  - Department of Chemical Engineering, National Institute of Technology, Agartala, 799046, India
- Prashant Bhardwaj
  - Department of Computer Science and Engineering, National Institute of Technology, Agartala, 799046, India
- Abhijit Mondal
  - Department of Chemical Engineering, Birla Institute of Technology Mesra, Mesra, Jharkhand, 835215, India
- Nibedita Mahata
  - Department of Biotechnology, National Institute of Technology, Durgapur, India
- Biswanath Bhunia
  - Department of Bio Engineering, National Institute of Technology, Agartala, 799046, India
35
Ketcham M, Ganokratanaa T, Sridoung N. Classification of broadband network devices using text mining technique. MethodsX 2023; 11:102346. [PMID: 37674865 PMCID: PMC10477059 DOI: 10.1016/j.mex.2023.102346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] Open
Abstract
The Broadband Internet industry is highly competitive, with service providers investing heavily in network development to meet customer demands and competing on pricing. Effective cost management is crucial for profitability in this market. This work proposes a model for classifying broadband network devices based on text mining techniques applied to a device list from a leading broadband network company in Thailand. The device descriptions are used to generate a feature vector, which is then employed by a classification algorithm to categorize devices into core, access, and last mile hierarchies. Various algorithms including decision tree, naïve Bayes, Bayesian network, k-nearest neighbor, support vector machine, and deep neural network are compared, with support vector machine achieving the highest accuracy of 90.35%. The results are visualized to provide insights into network hierarchy, device replacement dates, and budget requirements, enabling support for cost management, budget planning, maintenance, and investment decision-making. The methodology outline includes:
- Obtaining a device list from a major broadband network company, extracting device descriptions through text mining, and generating a feature vector.
- Using a support vector machine for classification and comparing algorithm performances.
- Visualizing the results for actionable insights in cost management, budget planning, and investment decisions.
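The pipeline described above (device description, then feature vector, then classifier) can be illustrated with a minimal sketch. It uses a bag-of-words vector and k-nearest neighbor, one of the algorithms the abstract lists as compared, rather than the support vector machine reported as most accurate. The device descriptions, labels, and function names below are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(text):
    """Bag-of-words feature vector: token -> count."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training items."""
    query = to_vector(text)
    ranked = sorted(train, key=lambda item: cosine(to_vector(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical device descriptions; the study's real device list is not reproduced here.
TRAIN = [
    ("core router 100G backbone chassis", "core"),
    ("backbone core switch 100G line card", "core"),
    ("OLT access node GPON 16 port", "access"),
    ("access aggregation switch GPON uplink", "access"),
    ("ONU home gateway wifi customer premises", "last mile"),
    ("customer premises ONT fiber modem wifi", "last mile"),
]

print(knn_predict(TRAIN, "GPON OLT access shelf"))
```

A real device list would need far more labelled examples per hierarchy level, and the choice of k and of the similarity measure would have to be validated on held-out data.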
Affiliation(s)
- Mahasak Ketcham
  - Department of Information Technology Management, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
- Thittaporn Ganokratanaa
  - Applied Computer Science Programme, Department of Mathematics, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
- Nattapat Sridoung
  - Department of Information Technology Management, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
36
Nguyen CT, Nguyen HT, Vu TMT, Le Vu MN, Vu GT, Latkin CA, Ho CSH, Ho RCM. Mapping Studies of Alcohol Use Among People Living with HIV/AIDS During 1990-2019 (GAPRESEARCH). AIDS Behav 2023; 27:3981-3991. [PMID: 37338623 DOI: 10.1007/s10461-023-04112-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/07/2023] [Indexed: 06/21/2023]
Abstract
Alcohol drinking has long been reported to be common in people living with HIV/AIDS (PLWHA), with biological and behavioral impacts on the transmission, progression, and prevention of HIV/AIDS. A total of 7059 eligible articles and reviews published in English from 1990 to 2019 were extracted from the Web of Science (WOS). Results show an increase in publication volume, while citations peak for papers published in 2006. Content analysis reveals wide-ranging coverage of topics, the most popular being the effects of alcohol consumption on antiretroviral therapy (ART) adherence and outcomes, alcohol-related sexual behaviors, tuberculosis (TB) co-infection, and psycho-socio-cultural considerations in examining and designing measures targeting alcohol use and interventions to reduce alcohol dependence in PLWHA. This calls for more active engagement of governments in research and in designing and implementing interventions, as well as collaborations and knowledge transfer from high-income countries to developing counterparts, to effectively address alcohol use-related issues in PLWHA, moving toward the HIV/AIDS eradication target.
Affiliation(s)
- Cuong Tat Nguyen
  - Institute for Global Health Innovations, Duy Tan University, Da Nang, 550000, Vietnam
  - Faculty of Medicine, Duy Tan University, Da Nang, 550000, Vietnam
- Hien Thu Nguyen
  - Institute for Global Health Innovations, Duy Tan University, Da Nang, 550000, Vietnam
  - Faculty of Medicine, Duy Tan University, Da Nang, 550000, Vietnam
- Thuc Minh Thi Vu
  - Institute of Health Economics and Technology, Hanoi, 100000, Vietnam
- Minh Ngoc Le Vu
  - Institute of Health Economics and Technology, Hanoi, 100000, Vietnam
- Giang Thu Vu
  - Center of Excellence in Evidence-based Medicine, Nguyen Tat Thanh University, Ho Chi Minh City, 700000, Vietnam
- Carl A Latkin
  - Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
- Cyrus S H Ho
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Singapore
- Roger C M Ho
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Singapore
  - Institute for Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, 119077, Singapore
37
Rabby G, D'Souza J, Oelen A, Dvorackova L, Svátek V, Auer S. Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. J Biomed Semantics 2023; 14:18. [PMID: 38017587 PMCID: PMC10683290 DOI: 10.1186/s13326-023-00298-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 11/13/2023] [Indexed: 11/30/2023] Open
Abstract
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. This work also examines the influential scholarly document prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation method with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, bag-of-words (BOW), and embedding-based language models (BERT). The TF-IDF document representation method works better than the others. Among the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation method for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification. We did find an effect on influential scholarly document prediction with categorical data.
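As an illustration of the TF-IDF document representation mentioned in this abstract, the following is a minimal sketch in plain Python. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as ln(N/df)); the abstract does not say which variant or library the authors used, and the mini-corpus and function name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df), one common
    variant among several; libraries often add smoothing on top.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus standing in for scholarly abstracts.
DOCS = [
    "covid vaccine trial results",
    "covid transmission model",
    "vaccine safety trial",
]

for doc_weights in tf_idf(DOCS):
    # Show the two highest-weighted terms of each document.
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:2])
```

Terms that occur in fewer documents receive higher weights, and a term present in every document gets weight zero under this variant, which is why such vectors can help a classifier separate document categories.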
Affiliation(s)
- Gollam Rabby
  - L3S Research Center, Leibniz University Hannover, Hanover, Germany
  - Department of Information and Knowledge Engineering, Prague University of Economics and Business, nám. Winstona Churchilla 1938/4, 120 00, Prague, Czech Republic
- Jennifer D'Souza
  - Leibniz Information Centre for Science and Technology, Hannover, Germany
- Allard Oelen
  - Leibniz Information Centre for Science and Technology, Hannover, Germany
- Lucie Dvorackova
  - Department of Econometrics, Prague University of Economics and Business, Prague, Czech Republic
- Vojtěch Svátek
  - Department of Information and Knowledge Engineering, Prague University of Economics and Business, nám. Winstona Churchilla 1938/4, 120 00, Prague, Czech Republic
- Sören Auer
  - L3S Research Center, Leibniz University Hannover, Hanover, Germany
  - Leibniz Information Centre for Science and Technology, Hannover, Germany
38
Fuenteslópez CV, McKitrick A, Corvi J, Ginebra MP, Hakimi O. Biomaterials text mining: A hands-on comparative study of methods on polydioxanone biocompatibility. N Biotechnol 2023; 77:161-175. [PMID: 37673372 DOI: 10.1016/j.nbt.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/14/2023] [Accepted: 09/02/2023] [Indexed: 09/08/2023]
Abstract
Scientific information extraction is fundamental for research and innovation, but is currently mostly a manual, time-consuming process. Text Mining tools (TMTs) enable automated, accurate and quick information extraction from text, but there is little precedent of their use in the biomaterials field. Here, we compare the ability of various TMTs to extract useful information from biomaterials abstracts. Focusing on the biocompatibility of polydioxanone, a biodegradable polymer for which there are relatively few scientific publications, we tested several tools ranging from machine learning approaches and statistical text analysis to MeSH indexing and domain-specific semantic tools for Named Entity Recognition. We also evaluated their output alongside a manual review of systematic reviews and meta-analyses. The findings show that TMTs can be highly efficient and powerful for mapping biomaterials texts and rapidly yield up-to-date information. Here, TMTs enable one to identify dominating themes, see the evolution of specific terms and topics, and learn about key medical applications in biomaterials literature over the years. The analysis also shows that ambiguity around biomaterials nomenclature is a significant challenge in mining biomedical literature that is yet to be tackled. This research showcases the potential value of using Natural Language Processing and domain-specific tools to extract and organize biomaterials data.
Affiliation(s)
- Carla V Fuenteslópez
  - Institute of Biomedical Engineering, Botnar Research Centre, Nuffield Orthopaedic Centre, University of Oxford, Oxford OX3 7LD, UK
- Austin McKitrick
  - Institute of Social Research, University of Michigan, MI 48104, USA
- Javier Corvi
  - Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Maria-Pau Ginebra
  - Department of Materials Science and Engineering, Universitat Politècnica de Catalunya, Barcelona 08019, Spain
- Osnat Hakimi
  - Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
  - Department of Materials Science and Engineering, Universitat Politècnica de Catalunya, Barcelona 08019, Spain
  - Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona 08017, Spain
39
Niezni D, Taub-Tabib H, Harris Y, Sason H, Amrusi Y, Meron-Azagury D, Avrashami M, Launer-Wachs S, Borchardt J, Kusold M, Tiktinsky A, Hope T, Goldberg Y, Shamay Y. Extending the boundaries of cancer therapeutic complexity with literature text mining. Artif Intell Med 2023; 145:102681. [PMID: 37925210 DOI: 10.1016/j.artmed.2023.102681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 08/30/2023] [Accepted: 10/03/2023] [Indexed: 11/06/2023]
Abstract
Drug combination therapy is a main pillar of cancer therapy. As the number of possible drug candidates for combinations grows, the development of optimal high complexity combination therapies (involving 4 or more drugs per treatment) such as RCHOP-I and FOLFIRINOX becomes increasingly challenging due to combinatorial explosion. In this paper, we propose a text mining (TM) based tool and workflow for rapid generation of high complexity combination treatments (HCCT) in order to extend the boundaries of complexity in cancer treatments. Our primary objectives were: (1) Characterize the existing limitations in combination therapy; (2) Develop and introduce the Plan Builder (PB) to utilize existing literature for drug combination effectively; (3) Evaluate PB's potential in accelerating the development of HCCT plans. Our results demonstrate that researchers and experts using PB are able to create HCCT plans at much greater speed and quality compared to conventional methods. By releasing PB, we hope to enable more researchers to engage with HCCT planning and demonstrate its clinical efficacy.
Affiliation(s)
- Danna Niezni
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Yuval Harris
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Hagit Sason
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Yakir Amrusi
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Dana Meron-Azagury
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Maytal Avrashami
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- Shaked Launer-Wachs
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- M Kusold
  - Allen Institute for AI, Seattle, USA
- Tom Hope
  - Allen Institute for AI, Tel Aviv, Israel
  - The Hebrew University, Jerusalem, Israel
- Yoav Goldberg
  - Allen Institute for AI, Tel Aviv, Israel
  - Bar-Ilan University, Ramat-Gan, Israel
- Yosi Shamay
  - Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
40
Peng Z, Li M, Wang Y, Ho GTS. Combating the COVID-19 infodemic using Prompt-Based curriculum learning. Expert Syst Appl 2023; 229:120501. [PMID: 37274611 PMCID: PMC10193815 DOI: 10.1016/j.eswa.2023.120501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 05/14/2023] [Accepted: 05/15/2023] [Indexed: 06/06/2023]
Abstract
The COVID-19 pandemic has been accompanied by a proliferation of online misinformation and disinformation about the virus. Combating this 'infodemic' has been identified as one of the top priorities of the World Health Organization, because false and misleading information can lead to a range of negative consequences, including the spread of false remedies, conspiracy theories, and xenophobia. This paper aims to combat the COVID-19 infodemic on multiple fronts, including determining the credibility of information, identifying its potential harm to society, and the necessity of intervention by relevant organizations. We present a prompt-based curriculum learning method to achieve this goal. The proposed method could overcome the challenges of data sparsity and class imbalance issues. Using online social media texts as input, the proposed model can verify content from multiple perspectives by answering a series of questions concerning the text's reliability. Experiments revealed the effectiveness of prompt tuning and curriculum learning in assessing the reliability of COVID-19-related text. The proposed method outperforms typical text classification methods, including fastText and BERT. In addition, the proposed method is robust to the hyperparameter settings, making it more applicable with limited infrastructure resources.
Affiliation(s)
- Zifan Peng
  - Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingchen Li
  - Khoury College of Computer Sciences, Northeastern University, Boston, USA
- Yue Wang
  - Department of Supply Chain and Information Management, The Hang Seng University of Hong Kong, Hong Kong SAR, China
- George T S Ho
  - Department of Supply Chain and Information Management, The Hang Seng University of Hong Kong, Hong Kong SAR, China
41
Wu S, Liu M, Xiao S, Lai M, Wei L, Li D, Wang L, Yin F, Zeng X. Identification and verification of novel ferroptosis biomarkers predicts the prognosis of hepatocellular carcinoma. Genomics 2023; 115:110733. [PMID: 37866659 DOI: 10.1016/j.ygeno.2023.110733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 09/28/2023] [Accepted: 10/19/2023] [Indexed: 10/24/2023]
Abstract
BACKGROUND Big data mining and experiments are widely used to mine new prognostic markers. METHODS Candidate genes were identified from CROEMINE and FerrDb. Kaplan-Meier survival and Cox regression analyses were applied to assess the association of genes with overall survival time (OS) and disease-free survival time (DFS) in two hepatocellular carcinoma (HCC) cohorts. Real-time quantitative polymerase chain reaction (RT-qPCR) and immunohistochemistry were performed in HCC samples. RESULTS From 719 genes, 21 genes predictive of OS and 15 genes predictive of DFS, none of which had been reported before, were identified. Survival analysis showed that elevated mRNA expression of GLMP, SLC38A6, and WDR76 was associated with poor prognosis, and that a three-gene combination signature was an independent prognostic factor in HCC. RT-qPCR and immunohistochemistry confirmed the results. CONCLUSIONS We established a novel computational process, which identified the expression levels of GLMP, SLC38A6, and WDR76 as potential ferroptosis-related biomarkers indicating the prognosis of HCC.
Affiliation(s)
- Siqian Wu
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Meiliang Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Suyang Xiao
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Mingshuang Lai
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Liling Wei
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Deyuan Li
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Lijun Wang
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
- Fuqiang Yin
  - Life Sciences Institute, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
  - Key Laboratory of High-Incidence-Tumor Prevention and Treatment (Guangxi Medical University), Ministry of Education, Nanning, China
- Xiaoyun Zeng
  - Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning 530021, Guangxi, China
  - Key Laboratory of High-Incidence-Tumor Prevention and Treatment (Guangxi Medical University), Ministry of Education, Nanning, China
42
Nilashi M, Abumalloh RA, Ahmadi H, Samad S, Alrizq M, Abosaq H, Alghamdi A. The nexus between quality of customer relationship management systems and customers' satisfaction: Evidence from online customers' reviews. Heliyon 2023; 9:e21828. [PMID: 38034804 PMCID: PMC10682139 DOI: 10.1016/j.heliyon.2023.e21828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 12/02/2023] Open
Abstract
Customer Relationship Management (CRM) is a method of management that aims to establish, develop, and improve relationships with targeted customers in order to maximize corporate profitability and customer value. Many CRM systems are available on the market. These systems are developed from a combination of business requirements, customer needs, and industry best practices. The impact of CRM systems on customers' satisfaction and competitive advantage, as well as their tangible and intangible benefits, has been widely investigated in previous studies. However, few studies have assessed the quality dimensions these systems need in order to meet an organization's CRM strategy. This study aims to investigate customers' satisfaction with CRM systems through online reviews. We collected 5172 online customer reviews of 8 CRM systems from the Google Play Store platform. The satisfaction factors were extracted using Latent Dirichlet Allocation (LDA) and grouped into three dimensions: information quality, system quality, and service quality. Data segmentation was performed using Learning Vector Quantization (LVQ). In addition, feature selection was performed by the entropy-weight approach. We then used the Adaptive Neuro Fuzzy Inference System (ANFIS), a hybrid of fuzzy logic and neural networks, to assess the relationship between these dimensions and customer satisfaction. The results are discussed and research implications are provided.
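The entropy-weight approach named in this abstract can be sketched as follows. This is the textbook formulation (normalise each feature column to shares, compute its Shannon entropy scaled to [0, 1], and weight features by one minus that entropy); the abstract does not give the authors' exact variant, and the score matrix and function name below are hypothetical. The sketch assumes at least two samples, non-negative values, a positive sum in every column, and at least one column that is not constant.

```python
import math


def entropy_weights(rows):
    """Entropy-weight method: rows are samples, columns are non-negative features."""
    n, m = len(rows), len(rows[0])
    k = 1.0 / math.log(n)  # scales each column's entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        # Share of each sample in feature j; zero shares contribute 0 to the entropy.
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical scores: feature 0 is constant across samples, feature 1 varies.
SCORES = [
    [5.0, 1.0],
    [5.0, 2.0],
    [5.0, 7.0],
]

print(entropy_weights(SCORES))
```

A feature whose values are identical across samples carries no discriminating information and receives a weight near zero, while features with more uneven distributions receive larger weights; a threshold on these weights is one way such a method could be used for feature selection.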
Affiliation(s)
- Mehrbakhsh Nilashi
  - UCSI Graduate Business School, UCSI University, 56000, Cheras, Kuala Lumpur, Malaysia
  - Centre for Global Sustainability Studies (CGSS), Universiti Sains Malaysia, 11800, Penang, Malaysia
- Rabab Ali Abumalloh
  - Department of Computer Science and Engineering, Qatar University, Doha, 2713, Qatar
- Hossein Ahmadi
  - Faculty of Health, University of Plymouth, Plymouth, PL4 8AA, UK
- Sarminah Samad
  - Department of Business Administration, College of Business and Administration, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Mesfer Alrizq
  - Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
- Hamad Abosaq
  - Computer Science Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
- Abdullah Alghamdi
  - Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
43
Anandakrishnan M, Ross KE, Chen C, Shanker V, Cowart J, Wu CH. KSFinder-a knowledge graph model for link prediction of novel phosphorylated substrates of kinases. PeerJ 2023; 11:e16164. [PMID: 37818330 PMCID: PMC10561642 DOI: 10.7717/peerj.16164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023] Open
Abstract
Background: Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover less than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder. Methods: KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we performed functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
Results: KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All 17 of these predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while those of CAMKK1 substrates include lipid storage regulation and glucose homeostasis. Conclusions: KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
Affiliation(s)
- Manju Anandakrishnan
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
| | - Karen E. Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
| | - Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
| | - Vijay Shanker
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
| | - Julie Cowart
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
| | - Cathy H. Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America
| |
|
44
|
Ma S, Bai C, Chen C, Bai J, Yu M, Zhou Y. Public sense of dental implants on social media: A cross-sectional study based on text analysis of comments. J Dent 2023; 137:104671. [PMID: 37604395 DOI: 10.1016/j.jdent.2023.104671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 07/17/2023] [Accepted: 08/16/2023] [Indexed: 08/23/2023] Open
Abstract
OBJECTIVES To investigate the most discussed topics and possible new interests in dental implants among the public, as well as public sentiment toward dental implants, through topic and sentiment analysis of online comments. METHODS Comments on the top 100 most viewed dental implant-related YouTube videos were studied. The comments were analyzed by topic analysis (LDA topic model, word co-occurrence analysis) and sentiment analysis. Basic information about the videos was collected and classified. Video quality was evaluated using GQS criteria and a 9-point usefulness scoring system. Statistical analyses were performed using the Kruskal-Wallis test, Mann-Whitney U tests, and Spearman correlation analysis. RESULTS A total of 74 videos with 61,618 comments were considered eligible for this study. Most videos targeted the public and drew high view and comment counts, but they covered a single theme and were of low quality. From topic analysis, the most discussed topics in the comments were procedure, cost, feelings associated with prognosis, and expectations. Multidisciplinary approaches in implant dentistry were frequently discussed. From sentiment analysis, the public mainly expressed positive sentiment through comments. In detail, the public had positive feelings about aesthetics and health, negative feelings about pain, and neutral feelings about cost. CONCLUSION The hot topics of public concern were procedure, cost, feelings associated with prognosis, and expectations. Intriguingly, multidisciplinary approaches in implant dentistry have emerged as a new hot subtopic within the topic "procedure". Based on the sentiment analysis of the comments, the general sentiment expressed by the public toward dental implants was predominantly positive. CLINICAL SIGNIFICANCE Text mining can extract data from social media to explore public interest in dentistry. Clinicians should convey reasonable expectations and understanding about dental implants, especially on the topics of greatest public concern (procedure, cost, feelings, and expectations), and provide patients with well-grounded multidisciplinary treatment plans to meet the growing public demand.
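Word co-occurrence analysis, one of the topic-analysis methods named above, can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the tokenizer, the stopword list, and the sample comments are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(comments, stopwords=frozenset()):
    """Count how often each unordered word pair appears in the same comment."""
    counts = Counter()
    for comment in comments:
        # Use a set so a word repeated within one comment is counted once.
        tokens = set(re.findall(r"[a-z]+", comment.lower())) - stopwords
        for pair in combinations(sorted(tokens), 2):
            counts[pair] += 1
    return counts


# Hypothetical comments, invented for illustration.
comments = [
    "The implant procedure cost a lot",
    "Procedure was painless but the cost was high",
    "Happy with the implant",
]
stop = frozenset({"the", "a", "was", "but", "with"})
pairs = cooccurrence_counts(comments, stopwords=stop)
```

Pairs with high counts (here "cost" and "procedure") indicate words that tend to be discussed together, which is the kind of signal used to spot subtopics within a broader theme.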
Affiliation(s)
- Siyao Ma
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310000
| | - Chenhao Bai
- Sir Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunchun Chen
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310000
| | - Jingyao Bai
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310000
| | - Mengfei Yu
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310000.
| | - Yi Zhou
- Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Engineering Research Center of Oral Biomaterials and Devices of Zhejiang Province, Hangzhou 310000.
| |
|
45
|
Wu J, Peng Y. Understanding unmet medical needs through medical crowdfunding in China. Public Health 2023; 223:202-208. [PMID: 37672833 DOI: 10.1016/j.puhe.2023.07.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 07/08/2023] [Accepted: 07/21/2023] [Indexed: 09/08/2023]
Abstract
OBJECTIVES Online medical crowdfunding has gained popularity in recent years in China. The objective of this study was to identify unmet medical needs in the public healthcare system through analysis of Chinese medical crowdfunding data. STUDY DESIGN Text information extraction and statistical analysis based on large-scale data. METHODS From 19 June 2011 to 15 March 2020, data from 30,704 medical crowdfunding projects were collected from Tencent GongYi, which is one of the largest Chinese medical crowdfunding platforms. Text mining methods were used to extract data on the medical conditions and locations of the applicants of medical crowdfunding. In addition, 125 medical crowdfunding projects initiated by leukaemia patients in Chongqing and Nanyang were further investigated through manual data extraction, and the factors impacting the fundraising goals were explored using a generalised linear model. RESULTS The most common conditions using medical crowdfunding to raise funds were as follows: cancer (31.87%), chronic conditions (18.14%), accidental injury (7.80%) and blood system-related conditions (7.75%). Treatments for cancer and blood system-related conditions are expensive and have serious long-term impacts on the lives of patients. Results showed that the cities of Nanyang and Chongqing had the largest number of crowdfunding projects. CONCLUSIONS This study found that the medical conditions that prompted individuals to apply for crowdfunding were those with long treatment cycles, complexities and expensive medical or non-medical costs. Furthermore, discrepancies in health insurance policies between different regions and residents seeking treatments outside their insurance locations were also important factors that triggered medical crowdfunding applications. Adjusting health insurance policies accordingly may improve the efficiency of utilising health insurance resources and reduce the financial burden on patients.
Affiliation(s)
- Junhong Wu
- School of Management and Economics, University of Electronic Science and Technology of China, Chengdu, China
| | - Yi Peng
- School of Management and Economics, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
46
|
Keerthigha C, Singh S, Chan KQ, Caltabiano N. Helicopter parenting through the lens of reddit: A text mining study. Heliyon 2023; 9:e20970. [PMID: 37886774 PMCID: PMC10597765 DOI: 10.1016/j.heliyon.2023.e20970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 09/22/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] Open
Abstract
The study aimed to understand Reddit users' experience with helicopter parenting through first-hand accounts. Text mining and natural language processing techniques were employed to extract data from the subreddit r/helicopterparents. A total of 713 original posts were processed from unstructured texts to tidy formats. Latent Dirichlet Allocation (LDA), a popular topic modeling method, was used to discover hidden themes within the corpus. The data revealed common environmental contexts of helicopter parenting (i.e., school, college, work, and home) and its implication on college decisions, privacy, and social relationships. These collectively suggested the importance of autonomy-supportive parenting and mindfulness interventions as viable solutions to the problems posed by helicopter parenting. In addition, findings lent support to past research that has identified more maternal than paternal models of helicopter parenting. Further research on the implications of the COVID-19 pandemic on helicopter parenting is warranted.
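The step from unstructured posts to a tidy format can be sketched as follows. This assumes "tidy" means one token per row, keyed by post; the function name, the tokenizer, and the sample posts are hypothetical and are not taken from the study.

```python
import re
from collections import Counter


def to_tidy(posts):
    """Return one (post_id, token) row per word, lowercased."""
    rows = []
    for post_id, text in posts.items():
        for token in re.findall(r"[a-z']+", text.lower()):
            rows.append((post_id, token))
    return rows


# Hypothetical posts, invented for illustration.
posts = {
    "post_1": "My mom tracks my phone at college.",
    "post_2": "Mom calls my college professor.",
}
tidy_rows = to_tidy(posts)

# Term frequencies over the tidy rows; a topic model such as LDA would
# typically be fitted on per-post counts built from the same rows.
term_counts = Counter(token for _, token in tidy_rows)
```

Keeping one token per row makes later steps such as stopword removal, counting, and building a document-term matrix simple filters and group-bys over the same table.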
Affiliation(s)
- C. Keerthigha
- School of Social and Health Sciences, James Cook University, Singapore
| | - Smita Singh
- School of Social and Health Sciences, James Cook University, Singapore
| | - Kai Qin Chan
- School of Social and Health Sciences, James Cook University, Singapore
| | - Nerina Caltabiano
- College of Healthcare Sciences, James Cook University, Cairns, Australia
| |
|
47
|
Kim M, Cho S. Monetary policy document analysis for prediction of monetary policy board decision. Heliyon 2023; 9:e20696. [PMID: 37876460 PMCID: PMC10590846 DOI: 10.1016/j.heliyon.2023.e20696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/26/2023] Open
Abstract
In terms of market capitalization, the bond market is larger than the stock market, and it is affected by macroeconomic indicators. Despite this, it has received relatively little research attention, making it a good candidate for data mining techniques. This paper proposes a novel approach to predicting the vote results of the Korean Monetary Policy Committee regarding the base interest rate. To predict sentence sentiment, the text of prior monetary policy decisions was used as input for classification models. The sentence sentiment prediction model showed 83.7% performance when using a support vector machine. In addition, the bigrams extracted from the documents provided important descriptions of the Korean economy at the time. Finally, the document sentiment of each monetary policy decision was calculated by aggregating sentence sentiment, and the vote results were predicted from this sentiment. When the support vector machine was used to predict the Monetary Policy Committee vote results, performance improved by 29.5% over the baseline model. Statistical tests of whether document sentiment differs between unanimous and non-unanimous decisions rejected the null hypothesis at the 5% significance level.
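Aggregating sentence sentiment into a document score can be sketched as below. The abstract does not state the aggregation rule, so the arithmetic mean and the label coding (+1 positive, 0 neutral, -1 negative) used here are assumptions, as are the function name and the sample labels.

```python
def document_sentiment(sentence_labels):
    """Aggregate sentence-level sentiment labels into one score in [-1, 1]."""
    if not sentence_labels:
        raise ValueError("need at least one sentence label")
    # Assumed aggregation rule: the arithmetic mean of the labels.
    return sum(sentence_labels) / len(sentence_labels)


# Hypothetical classifier output for the sentences of one decision document.
labels = [1, 1, -1, 1]
score = document_sentiment(labels)
```

A document-level score of this kind can then serve as the input feature for a second model that predicts the vote outcome.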
Affiliation(s)
- Misuk Kim
- Department of Data Science, Sejong University, Republic of Korea
| | - Sungzoon Cho
- Department of Industrial Engineering and Big Data AI Center, Seoul National University, Republic of Korea
| |
|
48
|
Kilicoglu H, Jiang L, Hoang L, Mayo-Wilson E, Vinkers CH, Otte WM. Methodology reporting improved over time in 176,469 randomized controlled trials. J Clin Epidemiol 2023; 162:19-28. [PMID: 37562729 PMCID: PMC10829891 DOI: 10.1016/j.jclinepi.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/25/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023]
Abstract
OBJECTIVES To describe randomized controlled trial (RCT) methodology reporting over time. STUDY DESIGN AND SETTING We used a deep learning-based sentence classification model based on the Consolidated Standards of Reporting Trials (CONSORT) statement, considered minimum requirements for reporting RCTs. We included 176,469 RCT reports published between 1966 and 2018. We analyzed the reporting trends over 5-year time periods, grouping trials from 1966 to 1990 in a single stratum. We also explored the effect of journal impact factor (JIF) and medical discipline. RESULTS Population, Intervention, Comparator, Outcome (PICO) items were commonly reported during each period, and reporting increased over time (e.g., interventions: 79.1% during 1966-1990 to 87.5% during 2010-2018). Reporting of some methods information has increased, although there is room for improvement (e.g., sequence generation: 10.8-41.8%). Some items are reported infrequently (e.g., allocation concealment: 5.1-19.3%). The number of items reported and JIF are weakly correlated (Pearson's r (162,702) = 0.16, P < 0.001). The differences in the proportion of items reported between disciplines are small (<10%). CONCLUSION Our analysis provides large-scale quantitative support for the hypothesis that RCT methodology reporting has improved over time. Extending these models to all CONSORT items could facilitate compliance checking during manuscript authoring and peer review, and support metaresearch.
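The correlation reported between the number of items and JIF can be illustrated with a plain Pearson computation. The data below are invented for the example; the sketch also assumes that the bracketed value in "r (162,702)" is the degrees of freedom (n - 2), which is a common reporting convention but is not stated in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-trial values: items reported and journal impact factor.
items_reported = [3, 5, 4, 6, 8]
impact_factor = [1.0, 2.0, 2.5, 3.0, 5.0]

r = pearson_r(items_reported, impact_factor)
degrees_of_freedom = len(items_reported) - 2  # n - 2 for a correlation test
```

With a very large sample, even a weak correlation such as r = 0.16 can reach P < 0.001, so the effect size matters more than the P value when reading this result.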
Affiliation(s)
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA.
| | - Lan Jiang
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Linh Hoang
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Evan Mayo-Wilson
- Department of Epidemiology, University of North Carolina School of Global Public Health, Chapel Hill, NC, USA
| | - Christiaan H Vinkers
- Department of Psychiatry and Anatomy & Neurosciences, Amsterdam University Medical Center Location Vrije Universiteit Amsterdam, 1081 HV, Amsterdam, The Netherlands; Amsterdam Public Health, Mental Health Program and Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program, Amsterdam, The Netherlands; GGZ inGeest Mental Health Care, 1081 HJ, Amsterdam, The Netherlands
| | - Willem M Otte
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht, and Utrecht University, Utrecht, The Netherlands
| |
|
49
|
Yang L, Wu S, Li G, Yuan Y. Explore public concerns about environmental protection on Sina Weibo: evidence from text mining. Environ Sci Pollut Res Int 2023; 30:104067-104085. [PMID: 37700122 DOI: 10.1007/s11356-023-29757-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/03/2023] [Indexed: 09/14/2023]
Abstract
The increasingly serious problem of environmental pollution underscores the importance of environmental protection behavior, and public attention to environmental protection plays an important role in solving environmental problems. Therefore, to explore the environmental concerns of Chinese residents, their trends in time and space, the relationships among online retweets, and the topics of concern, this study analyzed data on Sina Weibo users and their comments on related posts using text mining. The results are as follows. Women show a stronger attitude and willingness to protect the environment than men, and the public in economically developed areas is more concerned. To further investigate the public's environmental concerns, this study also used the PageRank algorithm to study the forwarding relationships between users. The study found that celebrities and some reputable media organizations can attract attention to environmental issues. Finally, we used pyLDAvis to visualize and analyze popular environmental themes and, based on the results, propose countermeasures and suggestions to enhance public environmental awareness.
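PageRank over forwarding relationships can be sketched with a small power iteration. This is a generic textbook version and not the authors' implementation: the damping factor of 0.85, the edge direction (retweeter points to the retweeted user), the handling of dangling nodes, and the sample edges are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over directed (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical forwarding edges: (user who retweets, user being retweeted).
retweets = [
    ("user_a", "celebrity"),
    ("user_b", "celebrity"),
    ("user_c", "celebrity"),
    ("user_c", "media_outlet"),
]
scores = pagerank(retweets)
top_user = max(scores, key=scores.get)
```

Accounts that are retweeted by many users accumulate rank, which is how a forwarding network can identify the celebrity and media accounts that draw attention to a topic.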
Affiliation(s)
- Lifeng Yang
- School of Economics, Fuyang Normal University, Fuyang, 236037, China
| | - Shaotong Wu
- School of Business, Fuyang Normal University, Fuyang, 236037, China
| | - Guangxia Li
- School of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing, 100000, China.
| | - Yunyun Yuan
- School of Management and Economics, Beijing Institute of Technology, Beijing, 100000, China
| |
|
50
|
Vuori MA, Kiiskinen T, Pitkänen N, Kurki S, Laivuori H, Laitinen T, Mäntylahti S, Palotie A, FinnGen, Niiranen TJ. Use of electronic health record data mining for heart failure subtyping. BMC Res Notes 2023; 16:208. [PMID: 37697398 PMCID: PMC10496250 DOI: 10.1186/s13104-023-06469-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023] Open
Abstract
OBJECTIVE To assess whether electronic health record (EHR) data text mining can be used to improve register-based heart failure (HF) subtyping. EHR data of 43,405 individuals from two Finnish hospital biobanks were mined for unstructured text mentions of ejection fraction (EF) and validated against clinical assessment in two sets of 100 randomly selected individuals. Structured laboratory data was then incorporated for a categorization by HF subtype (HF with mildly reduced EF, HFmrEF; HF with preserved EF, HFpEF; HF with reduced EF, HFrEF; and no HF). RESULTS In 86% of the cases, the algorithm-identified EF belonged to the correct HF subtype range. Sensitivity, specificity, PPV and NPV of the algorithm were 94-100% for HFrEF, 85-100% for HFmrEF, and 96%, 67%, 53% and 98% for HFpEF. Survival analyses using the traditional diagnosis of HF were in concordance with the algorithm-based ones. Compared to healthy individuals, mortality increased from HFmrEF (hazard ratio [HR], 1.91; 95% confidence interval [CI], 1.24-2.95) to HFpEF (2.28; 1.80-2.88) to HFrEF group (2.63; 1.97-3.50) over a follow-up of 1.5 years. We conclude that quantitative EF data can be efficiently extracted from EHRs and used with laboratory data to subtype HF with reasonable accuracy, especially for HFrEF.
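Mining free text for ejection fraction mentions and mapping them to a subtype range can be sketched as follows. The regular expression, the function names, and the sample notes are hypothetical, and the EF cut-offs (40% or below, 41-49%, 50% or above) follow commonly used guideline ranges rather than anything stated in the abstract. EF alone does not establish a diagnosis; the study combined it with structured laboratory data.

```python
import re

# Hypothetical pattern: an EF keyword, up to 15 non-digit characters,
# then a one- or two-digit percentage.
EF_PATTERN = re.compile(
    r"\b(?:EF|LVEF|ejection fraction)\b\D{0,15}?(\d{1,2})\s*%",
    re.IGNORECASE,
)


def extract_ef(note):
    """Return the first ejection fraction (percent) mentioned, or None."""
    match = EF_PATTERN.search(note)
    return int(match.group(1)) if match else None


def ef_subtype_range(ef_percent):
    """Map an EF value to the HF subtype range it falls in (assumed cut-offs)."""
    if ef_percent <= 40:
        return "HFrEF range"
    if ef_percent < 50:
        return "HFmrEF range"
    return "HFpEF range"


# Hypothetical note fragments, invented for illustration.
notes = [
    "Echo: LVEF 35 %, dilated LV.",
    "Ejection fraction was 55% on follow-up.",
    "No echocardiography performed.",
]
extracted = [extract_ef(note) for note in notes]
```

Validating such a rule against clinician review of a random sample, as the study did with two sets of 100 individuals, is what yields the sensitivity, specificity, PPV, and NPV figures.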
Affiliation(s)
- Matti A Vuori
- Division of Medicine, University of Turku, Kiinamyllynkatu 10, Turku, FI-20520, Finland.
- Turku University Hospital, Kiinamyllynkatu 4-8, Box 52, Turku, FI-20521, Finland.
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland.
| | - Tuomo Kiiskinen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
| | - Niina Pitkänen
- Auria Biobank, Kiinamyllynkatu 10, PO Box 30, Turku, FI-20520, Finland
| | - Samu Kurki
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- Auria Biobank, Kiinamyllynkatu 10, PO Box 30, Turku, FI-20520, Finland
| | - Hannele Laivuori
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- Centre for Child, Adolescent, and Maternal Health Research, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Department of Obstetrics and Gynecology, Tampere University Hospital, Tampere, Finland
| | - Tarja Laitinen
- Administration Center, Tampere University Hospital and University of Tampere, P.O. Box 2000, Tampere, 33521, Finland
| | | | - Aarno Palotie
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
| | - FinnGen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
| | - Teemu J Niiranen
- Division of Medicine, University of Turku, Kiinamyllynkatu 10, Turku, FI-20520, Finland
- Turku University Hospital, Kiinamyllynkatu 4-8, Box 52, Turku, FI-20521, Finland
- Department of Public Health Solutions, Finnish Institute for Health and Welfare, PO Box 30, Helsinki, FI-00271, Finland
| |
|