51
Wu J, Peng Y. Understanding unmet medical needs through medical crowdfunding in China. Public Health 2023; 223:202-208. [PMID: 37672833 DOI: 10.1016/j.puhe.2023.07.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 07/08/2023] [Accepted: 07/21/2023] [Indexed: 09/08/2023]
Abstract
OBJECTIVES Online medical crowdfunding has gained popularity in recent years in China. The objective of this study was to identify unmet medical needs in the public healthcare system through analysis of Chinese medical crowdfunding data. STUDY DESIGN Text information extraction and statistical analysis based on large-scale data. METHODS From 19 June 2011 to 15 March 2020, data from 30,704 medical crowdfunding projects were collected from Tencent GongYi, which is one of the largest Chinese medical crowdfunding platforms. Text mining methods were used to extract data on the medical conditions and locations of the applicants of medical crowdfunding. In addition, 125 medical crowdfunding projects initiated by leukaemia patients in Chongqing and Nanyang were further investigated through manual data extraction, and the factors impacting the fundraising goals were explored using a generalised linear model. RESULTS The most common conditions using medical crowdfunding to raise funds were as follows: cancer (31.87%), chronic conditions (18.14%), accidental injury (7.80%) and blood system-related conditions (7.75%). Treatments for cancer and blood system-related conditions are expensive and have serious long-term impacts on the lives of patients. Results showed that the cities of Nanyang and Chongqing had the largest number of crowdfunding projects. CONCLUSIONS This study found that the medical conditions that prompted individuals to apply for crowdfunding were those with long treatment cycles, complexities and expensive medical or non-medical costs. Furthermore, discrepancies in health insurance policies between different regions and residents seeking treatments outside their insurance locations were also important factors that triggered medical crowdfunding applications. Adjusting health insurance policies accordingly may improve the efficiency of utilising health insurance resources and reduce the financial burden on patients.
Affiliation(s)
- Junhong Wu
  - School of Management and Economics, University of Electronic Science and Technology of China, Chengdu, China
- Yi Peng
  - School of Management and Economics, University of Electronic Science and Technology of China, Chengdu, China
52
Keerthigha C, Singh S, Chan KQ, Caltabiano N. Helicopter parenting through the lens of reddit: A text mining study. Heliyon 2023; 9:e20970. [PMID: 37886774 PMCID: PMC10597765 DOI: 10.1016/j.heliyon.2023.e20970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 09/22/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] Open
Abstract
The study aimed to understand Reddit users' experience with helicopter parenting through first-hand accounts. Text mining and natural language processing techniques were employed to extract data from the subreddit r/helicopterparents. A total of 713 original posts were converted from unstructured text to tidy formats. Latent Dirichlet Allocation (LDA), a popular topic modeling method, was used to discover hidden themes within the corpus. The data revealed common environmental contexts of helicopter parenting (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. Together, these findings suggested the importance of autonomy-supportive parenting and mindfulness interventions as viable solutions to the problems posed by helicopter parenting. In addition, the findings lent support to past research that has identified more maternal than paternal models of helicopter parenting. Further research on the implications of the COVID-19 pandemic for helicopter parenting is warranted.
Affiliation(s)
- C. Keerthigha
  - School of Social and Health Sciences, James Cook University, Singapore
- Smita Singh
  - School of Social and Health Sciences, James Cook University, Singapore
- Kai Qin Chan
  - School of Social and Health Sciences, James Cook University, Singapore
- Nerina Caltabiano
  - College of Healthcare Sciences, James Cook University, Cairns, Australia
53
Kim M, Cho S. Monetary policy document analysis for prediction of monetary policy board decision. Heliyon 2023; 9:e20696. [PMID: 37876460 PMCID: PMC10590846 DOI: 10.1016/j.heliyon.2023.e20696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/26/2023] Open
Abstract
In terms of market capitalization, the bond market is larger than the stock market, and the bond market is affected by macroeconomic indicators. Despite this, it has received relatively little research attention, making it a good candidate for data mining techniques. This paper proposes a novel approach to predicting the vote results of the Korean Monetary Policy Committee on the base interest rate. To predict sentence sentiment, the text of prior monetary policy decisions was used as input for classification models. The sentence sentiment prediction model showed 83.7% performance when using a support vector machine. In addition, the bigrams extracted from the documents provided important descriptions of the Korean economy at the time. Finally, the document sentiment of each monetary policy decision was calculated by aggregating sentence sentiments, and the vote results were predicted from this sentiment. When the support vector machine was used to predict the Monetary Policy Committee vote results, performance improved by 29.5% over the baseline model. A statistical test of whether document sentiment differs between unanimous and non-unanimous decisions rejected the null hypothesis at the 5% significance level.
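The aggregation step described above, from sentence-level sentiment to a single document-level sentiment, can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract does not give the label set or the aggregation rule, so the `hawkish`/`neutral`/`dovish` labels, their numeric scores, and the use of a plain mean are all assumptions.

```python
from statistics import mean

# Assumed label-to-score mapping; the abstract does not specify one.
LABEL_SCORE = {"hawkish": 1.0, "neutral": 0.0, "dovish": -1.0}


def document_sentiment(sentence_labels):
    """Average per-sentence scores into one document-level sentiment."""
    if not sentence_labels:
        raise ValueError("document has no classified sentences")
    return mean(LABEL_SCORE[label] for label in sentence_labels)


# Hypothetical classifier output for one policy statement.
labels = ["hawkish", "hawkish", "neutral", "dovish"]
print(document_sentiment(labels))
```

A document score of this kind could then serve as an input feature for a vote-outcome classifier, which is the role the abstract assigns to document sentiment.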
Affiliation(s)
- Misuk Kim
  - Department of Data Science, Sejong University, Republic of Korea
- Sungzoon Cho
  - Department of Industrial Engineering and Big Data AI Center, Seoul National University, Republic of Korea
54
Kilicoglu H, Jiang L, Hoang L, Mayo-Wilson E, Vinkers CH, Otte WM. Methodology reporting improved over time in 176,469 randomized controlled trials. J Clin Epidemiol 2023; 162:19-28. [PMID: 37562729 PMCID: PMC10829891 DOI: 10.1016/j.jclinepi.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/25/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023]
Abstract
OBJECTIVES To describe randomized controlled trial (RCT) methodology reporting over time. STUDY DESIGN AND SETTING We used a deep learning-based sentence classification model based on the Consolidated Standards of Reporting Trials (CONSORT) statement, considered minimum requirements for reporting RCTs. We included 176,469 RCT reports published between 1966 and 2018. We analyzed the reporting trends over 5-year time periods, grouping trials from 1966 to 1990 in a single stratum. We also explored the effect of journal impact factor (JIF) and medical discipline. RESULTS Population, Intervention, Comparator, Outcome (PICO) items were commonly reported during each period, and reporting increased over time (e.g., interventions: 79.1% during 1966-1990 to 87.5% during 2010-2018). Reporting of some methods information has increased, although there is room for improvement (e.g., sequence generation: 10.8-41.8%). Some items are reported infrequently (e.g., allocation concealment: 5.1-19.3%). The number of items reported and JIF are weakly correlated (Pearson's r (162,702) = 0.16, P < 0.001). The differences in the proportion of items reported between disciplines are small (<10%). CONCLUSION Our analysis provides large-scale quantitative support for the hypothesis that RCT methodology reporting has improved over time. Extending these models to all CONSORT items could facilitate compliance checking during manuscript authoring and peer review, and support metaresearch.
Affiliation(s)
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Lan Jiang
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Linh Hoang
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Evan Mayo-Wilson
  - Department of Epidemiology, University of North Carolina School of Global Public Health, Chapel Hill, NC, USA
- Christiaan H Vinkers
  - Department of Psychiatry and Anatomy & Neurosciences, Amsterdam University Medical Center Location Vrije Universiteit Amsterdam, 1081 HV, Amsterdam, The Netherlands; Amsterdam Public Health, Mental Health Program and Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program, Amsterdam, The Netherlands; GGZ inGeest Mental Health Care, 1081 HJ, Amsterdam, The Netherlands
- Willem M Otte
  - Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht, and Utrecht University, Utrecht, The Netherlands
55
Yang L, Wu S, Li G, Yuan Y. Explore public concerns about environmental protection on Sina Weibo: evidence from text mining. Environ Sci Pollut Res Int 2023; 30:104067-104085. [PMID: 37700122 DOI: 10.1007/s11356-023-29757-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/03/2023] [Indexed: 09/14/2023]
Abstract
The increasingly serious problem of environmental pollution underscores the importance of environmental protection behavior, and public attention to environmental protection plays an important role in solving environmental problems. To explore the environmental concerns of Chinese residents, their trends over time and space, and the forwarding (retweet) relationships among users, this study analyzed data from Sina Weibo users and their comments on related posts using text mining methods. In the analysis of concern about environmental protection, women showed a stronger attitude and willingness to protect the environment than men, and the public in economically developed areas was more concerned. To further investigate the public's environmental concerns, the study also used the PageRank algorithm to examine the forwarding relationships between users, and found that celebrities and some reputable media organizations can attract attention to environmental issues. Finally, we used pyLDAvis to visualize and analyze popular environmental themes, and we propose countermeasures and suggestions for enhancing public environmental awareness based on the results.
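To make the PageRank step concrete, the following is a minimal power-iteration sketch over a forwarding graph. It is illustrative only: the edge direction (forwarder to original poster), the damping factor of 0.85, the iteration count, and the toy user names are assumptions, since the abstract does not describe how the authors configured the algorithm.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical forwarding edges: (user who forwards, user being forwarded).
retweets = [("a", "celebrity"), ("b", "celebrity"), ("c", "celebrity"), ("c", "a")]
scores = pagerank(retweets)
print(max(scores, key=scores.get))
```

Under this edge direction, accounts whose posts are forwarded by many users accumulate rank, which is consistent with the abstract's observation about celebrities and media organizations.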
Affiliation(s)
- Lifeng Yang
  - School of Economics, Fuyang Normal University, Fuyang, 236037, China
- Shaotong Wu
  - School of Business, Fuyang Normal University, Fuyang, 236037, China
- Guangxia Li
  - School of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing, 100000, China
- Yunyun Yuan
  - School of Management and Economics, Beijing Institute of Technology, Beijing, 100000, China
56
Vuori MA, Kiiskinen T, Pitkänen N, Kurki S, Laivuori H, Laitinen T, Mäntylahti S, Palotie A, FinnGen, Niiranen TJ. Use of electronic health record data mining for heart failure subtyping. BMC Res Notes 2023; 16:208. [PMID: 37697398 PMCID: PMC10496250 DOI: 10.1186/s13104-023-06469-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023] Open
Abstract
OBJECTIVE To assess whether electronic health record (EHR) data text mining can be used to improve register-based heart failure (HF) subtyping. EHR data of 43,405 individuals from two Finnish hospital biobanks were mined for unstructured text mentions of ejection fraction (EF) and validated against clinical assessment in two sets of 100 randomly selected individuals. Structured laboratory data was then incorporated for a categorization by HF subtype (HF with mildly reduced EF, HFmrEF; HF with preserved EF, HFpEF; HF with reduced EF, HFrEF; and no HF). RESULTS In 86% of the cases, the algorithm-identified EF belonged to the correct HF subtype range. Sensitivity, specificity, PPV and NPV of the algorithm were 94-100% for HFrEF, 85-100% for HFmrEF, and 96%, 67%, 53% and 98% for HFpEF. Survival analyses using the traditional diagnosis of HF were in concordance with the algorithm-based ones. Compared to healthy individuals, mortality increased from HFmrEF (hazard ratio [HR], 1.91; 95% confidence interval [CI], 1.24-2.95) to HFpEF (2.28; 1.80-2.88) to HFrEF group (2.63; 1.97-3.50) over a follow-up of 1.5 years. We conclude that quantitative EF data can be efficiently extracted from EHRs and used with laboratory data to subtype HF with reasonable accuracy, especially for HFrEF.
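As a rough illustration of mining ejection fraction (EF) mentions from free text and mapping them to a subtype range, consider the sketch below. It is not the study's algorithm: the regular expression targets English phrasing (the study mined Finnish hospital records), the cut-offs of 40% and 50% are assumed guideline-style ranges that the abstract does not state, and the structured laboratory data the authors used to separate HFpEF from no HF is omitted.

```python
import re

# Illustrative English-language pattern (an assumption, not the study's rules).
# Captures a single value ("LVEF 35%") or a range ("EF was about 45-50 %").
EF_PATTERN = re.compile(
    r"\b(?:LV)?EF\b\D{0,15}?(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)


def extract_ef(text):
    """Return the first EF percentage mentioned in text, or None."""
    match = EF_PATTERN.search(text)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low + high) / 2.0  # midpoint of a reported range


def ef_range_label(ef):
    """Map an EF percentage to a subtype range using assumed cut-offs."""
    if ef <= 40:
        return "HFrEF"
    if ef < 50:
        return "HFmrEF"
    return "HFpEF"  # the study additionally required laboratory data here


note = "Echo today: LVEF 35%, moderate mitral regurgitation."
ef = extract_ef(note)
print(ef, ef_range_label(ef))
```

Validation against clinician review, as in the study's two samples of 100 individuals, is what would establish whether a pattern like this is accurate enough for use.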
Affiliation(s)
- Matti A Vuori
  - Division of Medicine, University of Turku, Kiinamyllynkatu 10, Turku, FI-20520, Finland
  - Turku University Hospital, Kiinamyllynkatu 4-8, Box 52, Turku, FI-20521, Finland
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- Tuomo Kiiskinen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- Niina Pitkänen
  - Auria Biobank, Kiinamyllynkatu 10, PO Box 30, Turku, FI-20520, Finland
- Samu Kurki
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
  - Auria Biobank, Kiinamyllynkatu 10, PO Box 30, Turku, FI-20520, Finland
- Hannele Laivuori
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
  - Centre for Child, Adolescent, and Maternal Health Research, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Department of Obstetrics and Gynecology, Tampere University Hospital, Tampere, Finland
- Tarja Laitinen
  - Administration Center, Tampere University Hospital and University of Tampere, P.O. Box 2000, Tampere, 33521, Finland
- Aarno Palotie
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- FinnGen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki, Finland
- Teemu J Niiranen
  - Division of Medicine, University of Turku, Kiinamyllynkatu 10, Turku, FI-20520, Finland
  - Turku University Hospital, Kiinamyllynkatu 4-8, Box 52, Turku, FI-20521, Finland
  - Department of Public Health Solutions, Finnish Institute for Health and Welfare, PO Box 30, Helsinki, FI-00271, Finland
57
Schmidt L, Sinyor M, Webb RT, Marshall C, Knipe D, Eyles EC, John A, Gunnell D, Higgins JPT. A narrative review of recent tools and innovations toward automating living systematic reviews and evidence syntheses. Z Evid Fortbild Qual Gesundhwes 2023; 181:65-75. [PMID: 37596160 DOI: 10.1016/j.zefq.2023.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 06/19/2023] [Accepted: 06/25/2023] [Indexed: 08/20/2023]
Abstract
Living reviews are an increasingly popular research paradigm. The purpose of a 'living' approach is to allow rapid collation, appraisal and synthesis of evolving evidence on an important research topic, enabling timely influence on patient care and public health policy. However, living reviews are time- and resource-intensive. The accumulation of new evidence and the possibility of developments within the review's research topic can introduce unique challenges into the living review workflow. To investigate the potential of software tools to support living systematic or rapid reviews, we present a narrative review informed by an examination of tools contained on the Systematic Review Toolbox website. We identified 11 tools with relevant functionalities and discuss the important features of these tools with respect to different steps of the living review workflow. Four tools (NestedKnowledge, SWIFT-ActiveScreener, DistillerSR, EPPI-Reviewer) covered multiple, successive steps of the review process, and the remaining tools addressed specific components of the workflow, including scoping and protocol formulation, reference retrieval, automated data extraction, write-up and dissemination of data. We identify several ways in which living reviews can be made more efficient and practical. Most of these focus on general workflow management, or automation through artificial intelligence and machine-learning, in the screening process. More sophisticated uses of automation mostly target living rapid reviews to increase the speed of production or evidence maps to broaden the scope of the map. We use a case study to highlight some of the barriers and challenges to incorporating tools into the living review workflow and processes. These include increased workload, the need for organisation, ensuring timely dissemination and challenges related to the development of bespoke automation tools to facilitate the review process. 
We describe how current end-user tools address these challenges, and which knowledge gaps remain that could be addressed by future tool development. Dedicated web presences for automatic dissemination of in-progress evidence updates, rather than solely relying on peer-reviewed journal publications, help to make the effort of a living evidence synthesis worthwhile. Despite offering basic living review functionalities, existing end-user tools could be further developed to be interoperable with other tools to support multiple workflow steps seamlessly, to address broader automatic evidence retrieval from a larger variety of sources, and to improve dissemination of evidence between review updates.
Affiliation(s)
- Lena Schmidt
  - National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle, UK; Sciome LLC, Research Triangle Park, North Carolina, USA
- Mark Sinyor
  - Department of Psychiatry, Sunnybrook Health Sciences Centre, Toronto, Canada; Department of Psychiatry, University of Toronto, Toronto, Canada
- Roger T Webb
  - Division of Psychology and Mental Health, The University of Manchester, Manchester, UK; National Institute for Health and Care Research Greater Manchester Patient Safety Translational Research Centre (NIHR GM PSTRC), Manchester, UK
- Duleeka Knipe
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Emily C Eyles
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; The National Institute of Health and Care Research Applied Research Collaboration West (NIHR ARC West), University Hospitals Bristol NHS Foundation Trust, Bristol, UK
- Ann John
  - Population Data Science, Swansea University, Swansea, UK; Public Health Wales NHS Trust, Wales, UK
- David Gunnell
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; The National Institute of Health and Care Research Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and the University of Bristol, Bristol, UK
- Julian P T Higgins
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; The National Institute of Health and Care Research Applied Research Collaboration West (NIHR ARC West), University Hospitals Bristol NHS Foundation Trust, Bristol, UK; The National Institute of Health and Care Research Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and the University of Bristol, Bristol, UK
58
Fuller K, Lupton-Smith C, Hubal R, McLaughlin JE. Automated Analysis of Preceptor Comments: A Pilot Study Using Sentiment Analysis to Identify Potential Student Issues in Experiential Education. Am J Pharm Educ 2023; 87:100005. [PMID: 37714650 DOI: 10.1016/j.ajpe.2023.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/17/2023]
Abstract
OBJECTIVE The purpose of this paper is to describe a sentiment analysis program that aids in identifying pharmacy students at risk for progression issues by automatically scoring preceptor comments as positive or negative. METHODS An R-based program to analyze advanced pharmacy practice experiences and introductory pharmacy practice experiences midpoint evaluation of preceptor comments was piloted in phase 1 by comparing the sentiment analysis algorithm results to human coding. The algorithm was refined in phase 2. In phase 3, the validation phase, the final sentiment analysis algorithm analyzed all midpoint student evaluations (n = 1560). Sentiment scores were generated for each preceptor comment, and correlations were performed between sentiment scores and the quantitative scoring provided on the assessment. RESULTS In phase 1, agreement between faculty coders and sentiment analysis was 96%, and in phase 2, agreement between the final codes and sentiment analysis was 92.4% once keywords were added to the sentiment dictionary. In phase 3, a total of 3919 comments from 1560 evaluations were analyzed, and overall, the sentiment analysis results aligned with the quantitative data. CONCLUSION This sentiment analysis algorithm was accurate in capturing positive and negative comments corresponding to pharmacy student performance. Given the accuracy of this preliminary validation for flagging preceptor comments, there are numerous implications when considering the use of sentiment analysis in pharmacy education. Using a sentiment analysis program minimizes the number of qualitative preceptor comments needing review by experiential faculty, as this program can aid in identifying students at risk of progression issues.
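Dictionary-based sentiment scoring of the kind described above can be sketched in a few lines. The study used an R-based program; Python is substituted here purely for illustration. The lexicon entries, the sum-of-weights scoring rule, and the review threshold are all hypothetical, because the abstract does not list the sentiment dictionary or the keywords added in phase 2.

```python
import re

# Hypothetical mini-lexicon; the study's dictionary is not given in the abstract.
LEXICON = {
    "excellent": 1, "professional": 1, "improved": 1,
    "late": -1, "unprepared": -1, "struggles": -1,
}


def sentiment_score(comment, lexicon=LEXICON):
    """Sum the lexicon weights of the words found in one comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def flag_for_review(comments, threshold=0):
    """Return comments scoring below the threshold (assumed review rule)."""
    return [c for c in comments if sentiment_score(c) < threshold]


comments = [
    "Excellent and professional with patients.",
    "Often late and unprepared for rounds.",
]
print(flag_for_review(comments))
```

Adding domain keywords to the lexicon, as the authors report doing in phase 2, corresponds here to extending `LEXICON` with new entries and re-checking agreement with human coding.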
Affiliation(s)
- Kathryn Fuller
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Carly Lupton-Smith
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Robert Hubal
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
59
Pu Y, Beck D, Verspoor K. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023; 145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]
Abstract
OBJECTIVE We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.
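The time-sliced evaluation setup can be illustrated with a small sketch that splits dated edges into a training graph and a set of future links to be predicted. This is a simplified reading of the abstract, not the authors' code (which is at the repository linked above): the function name, the undirected treatment of edges, and the toy node identifiers are assumptions.

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, year) edges into training edges and future test edges.

    Training edges are those first seen up to cutoff_year; test edges are
    new edges appearing within `window` years after the cutoff.
    """
    def key(u, v):
        return tuple(sorted((u, v)))  # treat edges as undirected (assumption)

    train = {key(u, v) for u, v, year in edges if year <= cutoff_year}
    test = {
        key(u, v)
        for u, v, year in edges
        if cutoff_year < year <= cutoff_year + window
    } - train  # only links that are genuinely new count as positives
    return train, test


# Toy dated edges with placeholder node names (not real findings).
edges = [
    ("g1", "p1", 2000), ("g2", "p1", 2001), ("g1", "g2", 2003),
    ("p1", "g1", 2004), ("g3", "p2", 2010),
]
_, short_term = time_slice(edges, cutoff_year=2001, window=3)
_, long_term = time_slice(edges, cutoff_year=2001, window=10)
print(len(short_term), len(long_term))
```

A longer window admits more future links as positives, which is one reason the abstract reports very different results when the prediction window length changes.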
Affiliation(s)
- Yiyuan Pu
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia
- Daniel Beck
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia
- Karin Verspoor
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia; School of Computing Technologies, RMIT University, Melbourne, Victoria, Australia
60
VanSchaik JT, Jain P, Rajapuri A, Cheriyan B, Thyvalikakath TP, Chakraborty S. Using transfer learning-based causality extraction to mine latent factors for Sjögren's syndrome from biomedical literature. Heliyon 2023; 9:e19265. [PMID: 37809371 PMCID: PMC10558331 DOI: 10.1016/j.heliyon.2023.e19265] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Revised: 08/11/2023] [Accepted: 08/15/2023] [Indexed: 10/10/2023] Open
Abstract
Understanding causality is a longstanding goal across many different domains. Articles such as those published in medical journals disseminate newly discovered knowledge that is often causal. In this paper, we use this intuition to build a model that leverages causal relations to unearth factors related to Sjögren's syndrome from biomedical literature. Sjögren's syndrome is an autoimmune disease affecting up to 3.1 million Americans. Because the illness is uncommon, its symptoms span different specialties, and it shares symptoms with other autoimmune conditions such as rheumatoid arthritis, it is difficult for clinicians to diagnose the disease in a timely manner. Due to the lack of a dedicated dataset for causal relationships built from biomedical literature, we propose a transfer learning-based approach, where the relationship extraction model is trained on a wide variety of datasets. We conduct an empirical analysis of numerous neural network architectures and data transfer strategies for causal relation extraction. By conducting experiments with various contextual embedding layers and architectural components, we show that an ELECTRA-based sentence-level relation extraction model generalizes better than other architectures across varying web-based sources and annotation strategies. We use this empirical observation to create a pipeline for identifying causal sentences from literature text, extracting the causal relationships from causal sentences, and building a causal network consisting of latent factors related to Sjögren's syndrome. We show that our approach can retrieve such factors with high precision and recall values. Comparative experiments show that this approach leads to a 25% improvement in retrieval F1-score compared to several state-of-the-art biomedical models, including BioBERT and Gram-CNN. 
We apply this model to a corpus of research articles related to Sjögren's syndrome collected from PubMed to create a causal network for Sjögren's syndrome. The proposed causal network for Sjögren's syndrome will potentially help clinicians with a holistic knowledge base for faster diagnosis.
Affiliation(s)
- Jack T. VanSchaik
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, 46202, IN, USA
- Palak Jain
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, 46202, IN, USA
- Anushri Rajapuri
  - Indiana University School of Dentistry, Indianapolis, 46202, IN, USA
- Biju Cheriyan
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, 46202, IN, USA
- Thankam P. Thyvalikakath
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, 46202, IN, USA
  - Indiana University School of Dentistry, Indianapolis, 46202, IN, USA
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, 46202, IN, USA
- Sunandan Chakraborty
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, 46202, IN, USA
61
Lyons EL, Watson D, Alodadi MS, Haugabook SJ, Tawa GJ, Hannah-Shmouni F, Porter FD, Collins JR, Ottinger EA, Mudunuri US. Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus. BMC Genomics 2023; 24:460. [PMID: 37587458 PMCID: PMC10433598 DOI: 10.1186/s12864-023-09561-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 08/08/2023] [Indexed: 08/18/2023] Open
Abstract
BACKGROUND Approximately 4-8% of the world's population suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. RESULTS This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and pathogenicity assignments were compared with impact algorithm predictions. Of the pathogenic variants found in PubMed articles, 24% were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. CONCLUSIONS Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants.
Developing text mining workflows that use natural language processing to identify diseases, genes and variants, combining them with impact prediction algorithms, and integrating details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assign correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.
Affiliation(s)
- Erica L Lyons
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Daniel Watson
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Mohammad S Alodadi
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Sharie J Haugabook
  - Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
- Gregory J Tawa
  - Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
- Fady Hannah-Shmouni
  - Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
- Forbes D Porter
  - Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
- Jack R Collins
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
- Elizabeth A Ottinger
  - Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
- Uma S Mudunuri
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
|
62
|
Cenikj G, Eftimov T, Koroušić Seljak B. FooDis: A food-disease relation mining pipeline. Artif Intell Med 2023; 142:102586. [PMID: 37316100 DOI: 10.1016/j.artmed.2023.102586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 04/07/2023] [Accepted: 05/16/2023] [Indexed: 06/16/2023]
Abstract
It is increasingly important to keep up with the new biomedical knowledge presented in scientific literature. To this end, Information Extraction pipelines can help to automatically extract meaningful relations from textual data, which then require additional checks by domain experts. In the last two decades, a lot of work has been done on extracting relations between phenotype and health concepts; however, relations with food entities, which are among the most important environmental concepts, have never been explored. In this study, we propose FooDis, a novel Information Extraction pipeline that employs state-of-the-art approaches in Natural Language Processing to mine abstracts of biomedical scientific papers and automatically suggest potential cause or treat relations between food and disease entities in different existing semantic resources. A comparison with already known relations indicates that the relations predicted by our pipeline match for 90% of the food-disease pairs that are common to our results and the NutriChem database, and for 93% of the pairs common to our results and the DietRx platform. The comparison also shows that the FooDis pipeline can suggest relations with high precision. The FooDis pipeline can be further used to dynamically discover new relations between food and diseases, which should be checked by domain experts and can then be used to populate some of the existing resources used by NutriChem and DietRx.
Affiliation(s)
- Gjorgjina Cenikj
  - Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Tome Eftimov
  - Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
|
63
|
Zhao T, Sun S, Gao Y, Rong Y, Wang H, Qi S, Li Y. Luteolin and triptolide: Potential therapeutic compounds for post-stroke depression via protein STAT. Heliyon 2023; 9:e18622. [PMID: 37600392 PMCID: PMC10432979 DOI: 10.1016/j.heliyon.2023.e18622] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Revised: 07/18/2023] [Accepted: 07/24/2023] [Indexed: 08/22/2023] Open
Abstract
Post-stroke depression (PSD) is a common neuropsychiatric complication following stroke that is closely associated with the immune system. The development of medications for PSD remains a considerable challenge because the mechanism of PSD is unclear. Multiple studies agree that gene ontology (GO) functions are effective for investigating disease mechanisms, and that DeepPurpose (DP) is extremely valuable for mining new drugs. However, GO terms and DP have not yet been applied to explore the pathogenesis and drug treatment of PSD. This study aimed to interpret the mechanism of PSD and discover important drug candidates targeting risk proteins, based on immune-related risk GO functions and informatics algorithms. According to the risk genes of PSD, we identified 335 immune-related risk GO functions and 37 compounds. Based on the construction of the GO function network, we found that STAT protein may be a pivotal protein underlying the mechanism of PSD. Additionally, we established Protein-Protein Interaction and Gene-GO function networks to facilitate the evaluation of key genes. Based on DP, a total of 37 candidate compounds targeting 7 key proteins were identified as having potential for the therapy of PSD. Furthermore, we noted that the mechanisms by which luteolin and triptolide act on STAT-related GO functions might involve three crucial pathways: hsa04010 (MAPK signaling pathway), hsa04151 (PI3K-Akt signaling pathway) and hsa04060 (Cytokine-cytokine receptor interaction). Thus, this study provides new information on the mechanism of PSD and on therapeutic strategies for it.
Affiliation(s)
- Tianyang Zhao
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Siqi Sun
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yueyue Gao
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yuting Rong
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Hanwenchen Wang
  - The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Sihua Qi
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yan Li
  - Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
|
64
|
Zhu Y, Liao H, Huang D. Using text mining and multilevel association rules to process and analyze incident reports in China. Accid Anal Prev 2023; 191:107224. [PMID: 37506406 DOI: 10.1016/j.aap.2023.107224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 01/24/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Incident investigation reports provide information on defects related to system safety and indications for improvements. Currently, the analysis of these reports relies heavily on experts' experience. The foreseeable workload and a lack of understanding about the importance of near misses have created a situation where severe accidents are rigorously investigated while minor incidents are often omitted. Consequently, incident reports have not been fully analyzed to provide sufficient solutions. The aim of this research is to propose a framework that uses text mining and multilevel association rules to efficiently structure Chinese incident reports and identify important incident patterns, providing an analysis of trends, rectification strategies, and guidance for safety management. A case study of a construction company in China was conducted using two years of incident data from 2018-2019, including accidents and near misses. To identify incident elements, a pattern extraction workflow involving TextRank and domain pertinence was devised based on the linguistic and writing styles of Chinese reports. A concept hierarchy was applied to determine the taxonomic relationships within the risk factors. Multilevel association rule mining was adopted and shown to deliver more comprehensive pattern indications. Comparative and cross-analysis of patterns in different time periods revealed the severity and temporal features of incidents as well as the effectiveness of preventive and precautionary measures. The results also highlight the importance of learning from near miss events. Decision makers can formulate countermeasures and management policies based on these results to improve safety performance.
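As a rough illustration of the multilevel association rule idea described in this abstract, the sketch below mines two-item rules once on leaf-level risk factors and once after rolling items up through a concept hierarchy. It is not the authors' implementation: the incident records, the `HIERARCHY` mapping, the item names, and the thresholds are all hypothetical, and only plain support and confidence are computed.

```python
from itertools import combinations

# Hypothetical concept hierarchy: leaf risk factor -> parent category.
HIERARCHY = {
    "no_helmet": "ppe_violation",
    "no_harness": "ppe_violation",
    "wet_floor": "site_condition",
}

# Hypothetical incident records, each a set of extracted elements.
RECORDS = [
    frozenset({"no_helmet", "fall"}),
    frozenset({"no_harness", "fall"}),
    frozenset({"wet_floor", "slip"}),
    frozenset({"no_helmet", "slip"}),
]


def generalize(record):
    """Replace each item by its parent category when one is defined."""
    return frozenset(HIERARCHY.get(item, item) for item in record)


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def mine_rules(records, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for all two-item rules."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset({a, b}), records)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset({lhs}), records)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


leaf_rules = mine_rules(RECORDS, min_support=0.5, min_confidence=0.6)
parent_rules = mine_rules(
    [generalize(r) for r in RECORDS], min_support=0.5, min_confidence=0.6
)
for lhs, rhs, s, c in parent_rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

In this toy data no leaf-level pair reaches the support threshold, whereas the rolled-up category `ppe_violation` co-occurs with `fall` often enough to yield rules, which is the kind of additional pattern a multilevel analysis can surface.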
Affiliation(s)
- Yuqian Zhu
  - School of Resources and Safety Engineering, Central South University, Changsha 410006, China
- Huimin Liao
  - School of Resources and Safety Engineering, Central South University, Changsha 410006, China
- Dengchi Huang
  - Institute of Technology, Sichuan Normal University, Chengdu 610101, China
|
65
|
Kafkas Ș, Abdelhakim M, Uludag M, Althagafi A, Alghamdi M, Hoehndorf R. Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes. BMC Bioinformatics 2023; 24:294. [PMID: 37479972 PMCID: PMC10362560 DOI: 10.1186/s12859-023-05406-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 07/10/2023] [Indexed: 07/23/2023] Open
Abstract
BACKGROUND Identifying variants associated with diseases is a challenging task in medical genetics research. Current studies that prioritize variants within individual genomes generally rely on known variants, evidence from literature and genomes, and patient symptoms and clinical signs. The functionalities of the existing tools, which rank variants based on given patient symptoms and clinical signs, are restricted to the coverage of ontologies such as the Human Phenotype Ontology (HPO). However, most clinicians do not limit themselves to HPO while describing patient symptoms/signs and their associated variants/genes. There is thus a need for an automated tool that can prioritize variants based on freely expressed patient symptoms and clinical signs. RESULTS STARVar is a Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes. STARVar uses patient symptoms and clinical signs, either linked to HPO or expressed in free text format. It returns a ranked list of variants based on a combined score from two classifiers utilizing evidence from genomics and literature. STARVar improves over related tools on a set of synthetic patients. In addition, we demonstrated its distinct contribution to the domain on another synthetic dataset covering publicly available clinical genotype-phenotype associations by using symptoms and clinical signs expressed in free text format. CONCLUSIONS STARVar stands as a unique and efficient tool that has the advantage of ranking variants with flexibly expressed patient symptoms in free-form text. Therefore, STARVar can be easily integrated into bioinformatics workflows designed to analyze disease-associated genomes. AVAILABILITY STARVar is freely available from https://github.com/bio-ontology-research-group/STARVar .
Affiliation(s)
- Șenay Kafkas
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Marwa Abdelhakim
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Mahmut Uludag
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Azza Althagafi
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, 21655 Taif, Saudi Arabia
- Malak Alghamdi
  - Medical Genetic Division, Department of Pediatrics, College of Medicine, King Saud University, 2925 Riyadh, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
|
66
|
Otsuka K, Takata T, Sasaki H, Shikano M. Horizon Scanning in Tissue Engineering Using Citation Network Analysis. Ther Innov Regul Sci 2023; 57:810-822. [PMID: 37204641 PMCID: PMC10276778 DOI: 10.1007/s43441-023-00529-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 04/28/2023] [Indexed: 05/20/2023]
Abstract
BACKGROUND Establishing a horizon scanning method is critical for identifying technologies that require new guidelines or regulations. We studied the application of bibliographic citation network analysis to horizon scanning. OBJECTIVE The possibility of applying the proposed method to interdisciplinary fields was investigated with the emphasis on tissue engineering and its example, three-dimensional bio-printing. METHODOLOGY AND RESULTS In all, 233,968 articles on tissue engineering, regenerative medicine, biofabrication, and additive manufacturing published between January 1, 1900 and November 3, 2021 were obtained from the Web of Science Core Collection. The citation network of the articles was analyzed for confirmation that the evolution of 3D bio-printing is reflected by tracking the key articles in the field. However, the results revealed that the major articles on the clinical application of 3D bio-printed products are located in clusters other than that of 3D bio-printers. We investigated the research trends in this field by analyzing the articles published between 2019 and 2021 and detected various basic technologies constituting tissue engineering, including microfluidics and scaffolds such as electrospinning and conductive polymers. The results suggested that the research trend of technologies required for product development and future clinical applications of the product are sometimes detected independently by bibliographic citation network analysis, particularly for interdisciplinary fields. CONCLUSION This method can be applied to the horizon scanning of an interdisciplinary field. However, identifying basic technologies of the targeted field and following the progress of research and the integration process of each component of technology are critical.
Affiliation(s)
- Kouhei Otsuka
  - Faculty of Pharmaceutical Sciences, Tokyo University of Science, Tokyo, Japan
- Takuya Takata
  - Faculty of Pharmaceutical Sciences, Tokyo University of Science, Tokyo, Japan
- Hajime Sasaki
  - Institute for Future Initiatives, The University of Tokyo, Tokyo, Japan
- Mayumi Shikano
  - Faculty of Pharmaceutical Sciences, Tokyo University of Science, Tokyo, Japan
|
67
|
Vora J, Navelkar R, Vijay-Shanker K, Edwards N, Martinez K, Ding X, Wang T, Su P, Ross K, Lisacek F, Hayes C, Kahsay R, Ranzinger R, Tiemeyer M, Mazumder R. The Glycan Structure Dictionary-a dictionary describing commonly used glycan structure terms. Glycobiology 2023; 33:354-357. [PMID: 36799723 PMCID: PMC10243773 DOI: 10.1093/glycob/cwad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Revised: 01/28/2023] [Accepted: 02/08/2023] [Indexed: 02/18/2023] Open
Abstract
Recent technological advances in glycobiology have resulted in a large influx of data and the publication of many papers describing discoveries in glycoscience. However, the terms used to describe glycan structural features are not standardized, which makes it difficult to harmonize data across biomolecular databases, hampers the harvesting of information across studies, and hinders text mining and curation efforts. To address this shortcoming, the Glycan Structure Dictionary has been developed as a reference dictionary to provide a standardized list of widely used glycan terms that can help in the curation and mapping of glycan structures described in publications. Currently, the dictionary has 190 glycan structure terms with 297 synonyms linked to 3,332 publications. For a term to be included in the dictionary, it must be present in at least 2 peer-reviewed publications. Synonyms, annotations, and cross-references to GlyTouCan, GlycoMotif, and other relevant databases and resources are also provided when available. The purpose of this effort is to facilitate biocuration, assist in the development of text mining tools, improve the harmonization of search and browse capabilities in glycoinformatics resources, and help to map glycan structures to function and disease. It is also expected that authors will use these terms to describe glycan structures in their manuscripts over time. A mechanism is also provided for researchers to submit terms for potential incorporation. The dictionary is available at https://wiki.glygen.org/Glycan_structure_dictionary.
Affiliation(s)
- Jeet Vora
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- Rahi Navelkar
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- K Vijay-Shanker
  - Department of Computer and Information Science, University of Delaware, Smith Hall, 18 Amstel Ave Newark, DE 19716, USA
- Nathan Edwards
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, 3900 Reservoir Rd NW #337, DC 20007, USA
- Karina Martinez
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- Xiying Ding
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- Tianyi Wang
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- Peng Su
  - Department of Computer and Information Science, University of Delaware, Smith Hall, 18 Amstel Ave Newark, DE 19716, USA
- Karen Ross
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, 3900 Reservoir Rd NW #337, DC 20007, USA
- Frederique Lisacek
  - University of Geneva and Swiss Institute of Bioinformatics, CUI - 7, route de Drize, Geneva 1211, Switzerland
- Catherine Hayes
  - University of Geneva and Swiss Institute of Bioinformatics, CUI - 7, route de Drize, Geneva 1211, Switzerland
- Robel Kahsay
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
- Rene Ranzinger
  - Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Rd, Athens, GA 30602, USA
- Michael Tiemeyer
  - Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Rd, Athens, GA 30602, USA
- Raja Mazumder
  - Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, 2300 I Street NW, Washington, DC 20037, USA
|
68
|
Jaylet T, Coustillet T, Jornod F, Margaritte-Jeannin P, Audouze K. AOP-helpFinder 2.0: Integration of an event-event searches module. Environ Int 2023; 177:108017. [PMID: 37295163 DOI: 10.1016/j.envint.2023.108017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 04/25/2023] [Accepted: 06/01/2023] [Indexed: 06/12/2023]
Abstract
To support the use of alternative methods in the regulatory assessment of chemical risks, the concept of the adverse outcome pathway (AOP) constitutes an important toxicological tool. An AOP is a structured representation of existing knowledge that links a molecular initiating event (MIE), triggered by a prototypical stressor, through a cascade of biological key events (KEs) to an adverse outcome (AO). The biological information needed to develop such AOPs is widely dispersed across various data sources. To increase the chance of capturing relevant existing data, the AOP-helpFinder tool was recently implemented to assist researchers in designing new AOPs. Here, an updated version of AOP-helpFinder proposes novel functionalities, the main one being the automatic screening of abstracts from the PubMed database to identify and extract event-event associations. In addition, a new scoring system was created to classify the identified co-occurring terms (stressor-event or event-event, the latter representing key event relationships) to help prioritization and support the weight-of-evidence approach, allowing a global assessment of the strength and reliability of the AOP. Moreover, to facilitate interpretation of the results, visualization options are also proposed. The AOP-helpFinder source code is fully accessible via GitHub, and searches can be performed via a web interface at http://aop-helpfinder-v2.u-paris-sciences.fr/.
Affiliation(s)
- Thomas Jaylet
  - Université Paris Cité, Inserm U1124, 45 rue des Saints Pères, 75006 Paris, France
- Thibaut Coustillet
  - Université Paris Cité, Inserm U1124, 45 rue des Saints Pères, 75006 Paris, France
- Florence Jornod
  - Université Paris Cité, Inserm U1124, 45 rue des Saints Pères, 75006 Paris, France
- Karine Audouze
  - Université Paris Cité, Inserm U1124, 45 rue des Saints Pères, 75006 Paris, France
|
69
|
Moussa HN, Mourhir A. DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect. Data Brief 2023; 48:109234. [PMID: 37383818 PMCID: PMC10293988 DOI: 10.1016/j.dib.2023.109234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/08/2023] [Accepted: 05/09/2023] [Indexed: 06/30/2023] Open
Abstract
DarNERcorp is a manually annotated named entity recognition (NER) dataset in the Moroccan dialect, also called Darija. The dataset consists of 65,905 tokens and their corresponding tags according to the BIO scheme. Of these tokens, 13.8% are named entities spanning four categories: person, location, organization, and miscellaneous. The data were scraped from the Moroccan Dialect section of Wikipedia and processed and annotated using open-source libraries and tools. The data are useful for the Arabic natural language processing (NLP) community because they address the lack of annotated corpora in dialectal Arabic. This dataset can be used to train and evaluate named entity recognition systems in dialectal and mixed Arabic.
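For readers unfamiliar with the BIO scheme mentioned in this abstract, the following minimal sketch shows how entity spans map to per-token tags and how a share of entity tokens (such as the 13.8% figure) can be computed. The sentence, the span offsets, and the `PER`/`LOC` label names are illustrative assumptions; they are not taken from DarNERcorp, whose exact tag inventory is not given here.

```python
def bio_tags(tokens, spans):
    """Convert entity spans into BIO tags.

    spans is a list of (start, end_exclusive, label) token offsets.
    The first token of an entity gets B-<label>, the following tokens
    get I-<label>, and every token outside an entity gets O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def entity_token_share(tags):
    """Fraction of tokens that belong to some named entity."""
    return sum(tag != "O" for tag in tags) / len(tags)


# Hypothetical English example used only to show the tagging pattern.
tokens = ["Fatima", "Zahra", "lives", "in", "Rabat", "."]
spans = [(0, 2, "PER"), (4, 5, "LOC")]

tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
print(f"entity token share: {entity_token_share(tags):.1%}")
```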
|
70
|
Artner-Nehls A, Uthes S. Slurry Tales: Newspaper Coverage of Livestock Slurry Reproduces Public Discourse on Agriculture in Germany. Environ Manage 2023; 71:1213-1227. [PMID: 36781453 PMCID: PMC10183430 DOI: 10.1007/s00267-023-01798-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 01/29/2023] [Indexed: 05/15/2023]
Abstract
The rapid transition of livestock husbandry in the 20th century involved a broad adoption of slurry-based livestock housing systems that resulted in farm economic benefits, but also in societal debate related to the environment and animal welfare. In this article, we apply the method of topic modeling to four major German newspapers to identify thematic emphases and changes in coverage around "slurry". We considered more than 2300 articles published between 1971 and 2020. Our results show that reporting encompasses economic, environmental, and social topics in which slurry is represented mostly critically ("poisonous substance"), occasionally neutrally ("scent of countryside"), or rarely positively ("input for the bioeconomy"). Three meta-themes overarch the majority of issues and reflect public discourse on agriculture: (i) the dichotomy of agricultural industrialization and family farming; (ii) contrasting actualities of factory farming and animal welfare; and (iii) the responsibility of policy for the emergence, existence and solution of livestock and slurry-related problems. A more balanced recognition of mutual values and constraints by the media could contribute to a discursive reconciliation of public and private interests.
Affiliation(s)
- Astrid Artner-Nehls
  - Leibniz Centre for Agricultural Landscape Research (ZALF), Müncheberg, Germany
- Sandra Uthes
  - Leibniz Centre for Agricultural Landscape Research (ZALF), Müncheberg, Germany
|
71
|
Chandrasekaran R, Bapat P, Venkata PJ, Moustakas E. Face time with physicians: How do patients assess providers in video-visits? Heliyon 2023; 9:e16883. [PMID: 37292342 PMCID: PMC10238118 DOI: 10.1016/j.heliyon.2023.e16883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 05/30/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023] Open
Abstract
INTRODUCTION The COVID-19 pandemic has triggered a massive acceleration in the use of virtual and video-visits. As more patients and providers engage in video-visits over varied digital platforms, it is important to understand how patients assess their providers and the video-visit experience. We also need to examine the relative importance of the factors that patients use in their assessment of video-visits in order to improve the overall healthcare experience and delivery. METHODS A data set of 5149 reviews by patients who had completed a video-visit was assembled through web scraping. Sentiment analysis was performed on the reviews, and topic modeling was used to extract the latent topics embedded in the reviews and their relative importance. RESULTS Most patient reviews (89.53%) reported a positive sentiment towards their providers in video-visits. Seven distinct topics underlying the reviews were identified: bedside manners, professional expertise, virtual experience, appointment scheduling and follow-up process, wait times, costs, and communication. Communication, bedside manners and professional expertise were the top factors patients alluded to in the positive reviews. Appointment scheduling and follow-ups, wait times, costs, virtual experience and professional expertise were important factors in the negative reviews. DISCUSSION To improve the overall experience of patients in video-visits, providers need to communicate clearly, develop excellent bedside and webside manners, attend the video-visit promptly with minimal delays, and follow up with patients after the visit.
Affiliation(s)
- Prathamesh Bapat
  - Department of Information & Decision Sciences, University of Illinois at Chicago, USA
|
72
|
Ding K, Niu Y, Choo WC. The evolution of Airbnb research: A systematic literature review using structural topic modeling. Heliyon 2023; 9:e17090. [PMID: 37484274 PMCID: PMC10361235 DOI: 10.1016/j.heliyon.2023.e17090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 06/01/2023] [Accepted: 06/07/2023] [Indexed: 07/25/2023] Open
Abstract
This study employs advanced text-mining techniques to offer an in-depth and comprehensive overview of the extensive body of research on Airbnb. By analyzing 1021 articles published in 416 journals spanning the period from 2015 to 2022, this study aims at revealing Airbnb research topics and trends. The results show that the primary focus of academic inquiry regarding Airbnb revolves around two domains: the company's operational practices and its impacts on various domains. Within the realm of Airbnb's operational practices, four distinct research topics emerge as particularly prominent and extensively explored. These encompass the dynamics of 'trust in Airbnb,' the formulation and implementation of 'house rules,' the mechanisms governing 'Airbnb pricing' strategies, and the critical examination of 'value creation in Airbnb' initiatives. Meanwhile, the most researched impacts of Airbnb are on urban tourism, rental housing markets, tourist destinations, and hotels. These spheres have received significant scholarly attention due to the profound implications and transformative effects engendered by Airbnb's disruptive presence in these areas. Moreover, the findings underscore that research pertaining to Airbnb's operational aspects has witnessed a significant increase in popularity over time, indicating a marked shift in the focal points of Airbnb research. Notably, the research topics that have experienced substantial growth include 'trust in Airbnb,' 'Airbnb pricing,' and 'impacts on tourist destinations.' Lastly, this study found that Airbnb-related research articles in hospitality and tourism journals tend to delve more deeply into industry-specific phenomena and challenges. Conversely, non-hospitality and tourism journals provide a broader coverage of topics related to Airbnb, encapsulating diverse areas of inquiry beyond the boundaries of the industry.
This literature review provides valuable insights into existing research on Airbnb and highlights several critical areas for future research.
Affiliation(s)
- Kai Ding
- School of Business Administration, Ningbo University of Finance and Economics, Ningbo, China
- Yue Niu
- Department of Applied Psychology, University of Nottingham Malaysia Campus, Semenyih, Malaysia
- Wei Chong Choo
- School of Business and Economics, Universiti Putra Malaysia, Serdang, Malaysia
- Institute for Mathematical Research, Universiti Putra Malaysia, Serdang, Malaysia
|
73
|
Yuan Z, Hu W. Urban resilience to socioeconomic disruptions during the COVID-19 pandemic: Evidence from China. Int J Disaster Risk Reduct 2023; 91:103670. [PMID: 37041883 PMCID: PMC10073087 DOI: 10.1016/j.ijdrr.2023.103670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 05/05/2023]
Abstract
The COVID-19 pandemic and the associated restrictions have raised the awareness of building pandemic-resilient cities. Prior studies often evaluated the resilience of one type of urban system while lacking a comparison across various urban subsystems. This study fills this gap by measuring and comparing the adaptive resilience to the pandemic of various urban subsystems in Chinese cities. We propose a novel outcome measurement of the pandemic's socioeconomic impacts on cities, i.e., the citizens' complaints data, and use its temporal changes to measure cities' adaptive resilience to the pandemic. We find a wide range of urban subsystems were severely shocked by the pandemic, including the urban economy, construction-and-housing sector, welfare system, and education system. Different urban subsystems exhibit divergent degrees of adaptive resilience to the pandemic. Using cluster analysis, we also identify three types of cities with different patterns of adaptive resilience: cities whose general economies were the least resilient, cities whose construction-and-housing system was the least resilient, and cities that were mostly affected by restriction measures. Our findings contribute to the understanding of the pandemic's socioeconomic costs and help identify the divergent resilience of different urban subsystems so as to develop targeted policy interventions to improve cities' resilience to the pandemic.
Affiliation(s)
- Zhihang Yuan
- Department of Public and International Affairs, City University of Hong Kong, Kowloon Tong, Hong Kong, China
- Wanyang Hu
- Department of Public and International Affairs, City University of Hong Kong, Kowloon Tong, Hong Kong, China
|
74
|
Schlicht IB, Fernandez E, Chulvi B, Rosso P. Automatic detection of health misinformation: a systematic review. J Ambient Intell Humaniz Comput 2023:1-13. [PMID: 37360776 PMCID: PMC10220340 DOI: 10.1007/s12652-023-04619-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Accepted: 04/30/2023] [Indexed: 06/28/2023]
Abstract
The spread of health misinformation has the potential to cause serious harm to public health, from leading to vaccine hesitancy to adoption of unproven disease treatments. In addition, it could have other effects on society such as an increase in hate speech towards ethnic groups or medical experts. To counteract the sheer amount of misinformation, there is a need to use automatic detection methods. In this paper we conduct a systematic review of the computer science literature exploring text mining techniques and machine learning methods to detect health misinformation. To organize the reviewed papers, we propose a taxonomy, examine publicly available datasets, and conduct a content-based analysis to investigate analogies and differences among Covid-19 datasets and datasets related to other health domains. Finally, we describe open challenges and conclude with future directions.
Affiliation(s)
- Berta Chulvi
- Universitat Politècnica de València, Valencia, Spain
- Paolo Rosso
- Universitat Politècnica de València, Valencia, Spain
|
75
|
Gurcan F. What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities. PeerJ Comput Sci 2023; 9:e1361. [PMID: 37346688 PMCID: PMC10280584 DOI: 10.7717/peerj-cs.1361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 03/31/2023] [Indexed: 06/23/2023]
Abstract
Background Because of the growing involvement of communities from various disciplines, data science is constantly evolving and gaining popularity. The growing interest in data science-based services and applications presents numerous challenges for their development. Therefore, data scientists frequently turn to various forums, particularly domain-specific Q&A websites, to solve difficulties. These websites evolve into data science knowledge repositories over time. Analysis of such repositories can provide valuable insights into the applications, topics, trends, and challenges of data science. Methods In this article, we investigated what data scientists are asking by analyzing all posts to date on DSSE, a data science-focused Q&A website. To discover main topics embedded in data science discussions, we used latent Dirichlet allocation (LDA), a probabilistic approach for topic modeling. Results As a result of this analysis, 18 main topics were identified that demonstrate the current interests and issues in data science. We then examined the topics' popularity and difficulty. In addition, we identified the most commonly used tasks, techniques, and tools in data science. As a result, "Model Training", "Machine Learning", and "Neural Networks" emerged as the most prominent topics. Also, "Data Manipulation", "Coding Errors", and "Tools" were identified as the most viewed (most popular) topics. On the other hand, the most difficult topics were identified as "Time Series", "Computer Vision", and "Recommendation Systems". Our findings have significant implications for many data science stakeholders who are striving to advance data-driven architectures, concepts, tools, and techniques.
|
76
|
Xu Z, Chan CS, Fung J, Tsang C, Zhang Q, Xu Y, Cheung F, Cheng W, Chan E, Yip PSF. Developing and validating a parser-based suicidality detection model in text-based mental health services. J Affect Disord 2023; 335:228-232. [PMID: 37150217 DOI: 10.1016/j.jad.2023.04.128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 03/19/2023] [Accepted: 04/29/2023] [Indexed: 05/09/2023]
Abstract
BACKGROUND Advances in text-mining can potentially aid online text-based mental health services in detecting suicidality. However, false positives remain a challenge. METHODS Data of a free 24/7 online text-based counseling service in Hong Kong were used to develop a novel parser-based algorithm (PBSD) to detect suicidal ideation while minimizing false alarms. Sessions containing keywords related to suicidality were extracted (N = 1267). PBSD first applies a sentence parser to work out the grammatical structure of each sentence, including subject, object, dependent and modifier. Then a set of syntax rules is applied to judge whether a flagged sentence is a true or false positive. Half of the sessions were randomly selected to train PBSD, and the remainder were used as the test set. A standard keyword-matching model was adopted as the baseline comparison. Accuracy and recall were reported to demonstrate the models' performance. RESULTS Of the 1267 sessions, 585 (46.2 %) were false alarms. The false alarms were categorized into four types: negation-induced false alarms (NIFA; 14 %), subject-induced false alarms (SIFA; 19 %), tense-induced false alarms (TIFA; 30 %), and other types of false alarms (OTFA; 37 %). PBSD significantly outperforms the baseline keyword-matching model on accuracy (0.68 vs 0.53, 28.3 %). It successfully amended 36.8 % (105/297) lexicon matching-caused false alarms. The reduction in recall was marginal (1 vs 0.96, 4 %). CONCLUSIONS The proposed model significantly improves the use of the lexicon-based method by reducing false alarms and improving the accuracy of suicidality detection. It can potentially reduce unnecessary panic and distraction caused by false alarms among frontline service-providers.
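The relative figures quoted in the results can be re-derived from the raw numbers given in the abstract. The following is a minimal Python sketch of that arithmetic only; the helper name `relative_change` is an illustrative assumption and is not part of the PBSD method.

```python
def relative_change(new: float, old: float) -> float:
    """Return the change of `new` relative to `old`, as a fraction."""
    return (new - old) / old


# Raw figures quoted in the abstract above.
total_sessions = 1267
false_alarm_sessions = 585

# Share of keyword-flagged sessions that were false alarms.
false_alarm_share = false_alarm_sessions / total_sessions

# PBSD accuracy (0.68) relative to the keyword-matching baseline (0.53).
accuracy_gain = relative_change(0.68, 0.53)

# PBSD recall (0.96) relative to the baseline recall (1.0), as a reduction.
recall_drop = -relative_change(0.96, 1.0)

print(f"false-alarm share: {false_alarm_share:.1%}")
print(f"relative accuracy gain: {accuracy_gain:.1%}")
print(f"relative recall reduction: {recall_drop:.1%}")
```

The percentages in parentheses after each "vs" comparison are therefore relative changes against the baseline, not absolute differences.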
Affiliation(s)
- Zhongzhi Xu
- School of Public Health, Sun Yat-sen University, Guangzhou, China
- Christian S Chan
- Department of Psychology, The University of Hong Kong, Hong Kong; International Christian University, Tokyo, Japan.
- Jerry Fung
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong
- Christy Tsang
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong
- Qingpeng Zhang
- School of Data Science, City University of Hong Kong, Hong Kong
- Yucan Xu
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong
- Florence Cheung
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong
- Weibin Cheng
- School of Data Science, City University of Hong Kong, Hong Kong
- Evangeline Chan
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong
- Paul S F Yip
- Hong Kong Jockey Club Centre for Suicide Research and Prevention, The University of Hong Kong, Hong Kong.
|
77
|
Ashraf M, Ahammad SZ, Chakma S. Advancements in the dominion of fate and transport of pharmaceuticals and personal care products in the environment-a bibliometric study. Environ Sci Pollut Res Int 2023; 30:64313-64341. [PMID: 37067715 PMCID: PMC10108824 DOI: 10.1007/s11356-023-26796-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The study of the fate and transport of pharmaceuticals and personal care products (PPCPs) in the environment, abbreviated FTP, has received particular attention for over two decades. PPCPs threaten ecology and human health even at low concentrations due to their synergistic effects and long-range transport. The research aims to provide an inclusive map of the scientific background of FTP research over the last 25 years, from 1996 to 2020, to identify the main characteristics, evolution, salient research themes, trends, and research hotspots in the field of interest. Bibliometric networks were synthesized and analyzed for 577 journal articles extracted from the Scopus database. Consequently, seven major themes of FTP research were identified as follows: (i) PPCPs category; (ii) hazardous effects; (iii) occurrence of PPCPs; (iv) PPCPs in organisms; (v) remediation; (vi) FTP-governing processes; and (vii) assessment in the environment. The themes gave an in-depth picture of the sources of PPCPs and their transport and fate processes in the environment: PPCPs originate from sewage treatment plants and are transported further to sediment, soils, groundwater, and oceans, which act as their major sinks. The article provided a rigorous analysis of the research landscape in the FTP study conducted during the specified years. The prominent research themes, content analysis, and research hotspots identified in the study may serve as the basis of real-time guidance to lead future research areas and a prior review for policymakers and practitioners.
Affiliation(s)
- Maliha Ashraf
- School of Interdisciplinary Research, Indian Institute of Technology, Delhi, India.
- Shaikh Ziauddin Ahammad
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology, Delhi, India
- Sumedha Chakma
- Department of Civil Engineering, Indian Institute of Technology, Delhi, India
|
78
|
Yuting P, Yinfeng J, Jingli Z. Current status of digital humanities research in Taiwan. Heliyon 2023; 9:e15851. [PMID: 37223717 PMCID: PMC10200843 DOI: 10.1016/j.heliyon.2023.e15851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 11/27/2022] [Accepted: 04/24/2023] [Indexed: 05/25/2023] Open
Abstract
Purpose To review the current research status of the theory, techniques, and practice of digital humanities in Taiwan. Methods The 8 issues of the Journal of Digital Archives and Digital Humanities published from its inception in 2018 through 2021, together with the papers of the International Conference of Digital Archives and Digital Humanities over the 5 years from 2017 to 2021, were selected as the research data, and text analysis was conducted on the 252 collected articles. Results The statistical analysis shows that practical articles are the most numerous, followed by articles on tools and techniques, with theoretical articles the fewest. Text tools and literature research are the most concentrated aspects of digital humanities research in Taiwan. Limitations The results still need to be compared with the current research status of digital humanities in Mainland China. Conclusions Digital humanities research in Taiwan focuses on the development of tools and techniques and on practical applications in literature and history, and its focus on Taiwan's native culture gives it its own research characteristics.
Affiliation(s)
- Pan Yuting
- Nankai University, Department of Information Resources Management, Business School, Tianjin 300071, PR China
- Jiang Yinfeng
- Army Medical University Library, Chongqing 400038, PR China
- Zhang Jingli
- Army Medical University Library, Chongqing 400038, PR China
|
79
|
Dos Reis AHS, de Oliveira ALM, Fritsch C, Zouch J, Ferreira P, Polese JC. Usefulness of machine learning softwares to screen titles of systematic reviews: a methodological study. Syst Rev 2023; 12:68. [PMID: 37061711 PMCID: PMC10105467 DOI: 10.1186/s13643-023-02231-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 04/05/2023] [Indexed: 04/17/2023] Open
Abstract
OBJECTIVE To investigate the usefulness and performance metrics of three freely available software tools (Rayyan®, Abstrackr® and Colandr®) for title screening in systematic reviews. STUDY DESIGN AND SETTING In this methodological study, the usefulness of software for screening titles in systematic reviews was investigated by comparing the number of titles identified by software-assisted screening with the number identified by manual screening, using a previously published systematic review. To test the performance metrics, sensitivity, specificity, false negative rate, proportion missed, workload and timing savings were calculated. A purpose-built survey was used to evaluate the raters' experiences regarding the tools' performance. RESULTS Rayyan® was the most sensitive tool, and raters correctly identified 78% of the true positives. All three tools were specific, and raters correctly identified 99% of the true negatives. They also had similar values for precision, proportion missed, workload and timing savings. Rayyan®, Abstrackr® and Colandr® had false negative rates of 21%, 39% and 34%, respectively. Rayyan® presented the best performance (35/40) according to the raters. CONCLUSION Rayyan®, Abstrackr® and Colandr® are useful tools and provided good performance metrics for systematic title screening. Rayyan® appears to be the best ranked in both the quantitative evaluation and the raters' evaluation. The most important finding of this study is that the use of software to screen titles does not remove any title that would meet the inclusion criteria for the final review, making these tools valuable resources to facilitate the screening process.
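The screening metrics named in the abstract follow from a standard confusion matrix of included and excluded titles. The sketch below, in Python, shows the usual definitions of sensitivity, specificity and false negative rate; the class name `ScreeningCounts` and the counts used are hypothetical values chosen for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Confusion-matrix counts for one screening tool (hypothetical)."""
    true_pos: int   # relevant titles correctly included
    false_pos: int  # irrelevant titles wrongly included
    true_neg: int   # irrelevant titles correctly excluded
    false_neg: int  # relevant titles wrongly excluded


def sensitivity(c: ScreeningCounts) -> float:
    """Share of relevant titles that were included."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ScreeningCounts) -> float:
    """Share of irrelevant titles that were excluded."""
    return c.true_neg / (c.true_neg + c.false_pos)


def false_negative_rate(c: ScreeningCounts) -> float:
    """Share of relevant titles that were missed (1 - sensitivity)."""
    return c.false_neg / (c.true_pos + c.false_neg)


# Made-up counts, NOT taken from the paper.
example = ScreeningCounts(true_pos=39, false_pos=10, true_neg=990, false_neg=11)
print(f"sensitivity: {sensitivity(example):.2f}")
print(f"specificity: {specificity(example):.2f}")
print(f"false negative rate: {false_negative_rate(example):.2f}")
```

Under these definitions sensitivity and false negative rate sum to one, so the reported 78% and 21% for Rayyan® differ from 100% only by rounding.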
Affiliation(s)
- Ana Helena Salles Dos Reis
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Ana Luiza Miranda de Oliveira
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Carolina Fritsch
- Faculty of Medicine and Health, School of Health Sciences, Sydney Musculoskeletal Health, The Kolling Institute, The University of Sydney, Sydney, NSW, Australia
- James Zouch
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Paulo Ferreira
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Janaine Cunha Polese
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
|
80
|
Jalali M, Zahedi M, Basiri A. Deterministic solution of algebraic equations in sentiment analysis. Multimed Tools Appl 2023; 82:1-18. [PMID: 37362725 PMCID: PMC10054214 DOI: 10.1007/s11042-023-15140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Revised: 09/06/2022] [Accepted: 03/13/2023] [Indexed: 06/28/2023]
Abstract
Text mining methods usually use statistical information to solve text and language-independent procedures. Text mining methods such as polarity detection based on stochastic patterns and rules need many samples to train. On the other hand, deterministic and non-probabilistic methods are easy to solve and faster than other methods but are not efficient on NLP data. In this article, a fast and efficient deterministic method for solving these problems is proposed. In the proposed method, we first transform text and labels into a set of equations. In the second step, Tikhonov regularization, a mathematical solution for ill-posed equations, is used as a deterministic and non-probabilistic approach that includes additional assumptions, such as smoothness of the solution, to assign a weight that can reflect the semantic information of each sentiment-bearing word. We confirmed the efficiency of the proposed method on the SemEval-2013 competition, ESWC Database and Taboada database as three different cases. We observed improvement of our method over negative polarity due to our proposed mathematical step. Moreover, we demonstrated the effectiveness of our proposed method over the most common and traditional machine learning, stochastic and fuzzy methods.
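The two steps described in the abstract (turning text and labels into a system of equations, then solving it with Tikhonov regularization) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes a bag-of-words count matrix A, one equation per labelled document, an identity regularizer, and the standard closed form (A^T A + lam I) w = A^T y. The function names, the toy documents and the value of `lam` are all hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def tikhonov_word_weights(docs, labels, lam=0.1):
    """Return one regularized weight per word from labelled documents."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    # Each document is one equation: sum(count * weight) = label.
    counts = [[doc.split().count(word) for word in vocab] for doc in docs]
    k = len(vocab)
    # Normal equations with Tikhonov term: (A^T A + lam * I) w = A^T y.
    gram = [[sum(row[i] * row[j] for row in counts) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(counts, labels)) for i in range(k)]
    return dict(zip(vocab, solve_linear_system(gram, rhs)))


# Toy data: three documents, four words, so the unregularized system is
# underdetermined (ill-posed) and the regularizer picks a stable solution.
weights = tikhonov_word_weights(
    ["good great", "bad awful", "good bad"], [1.0, -1.0, 0.0])
for word, weight in weights.items():
    print(f"{word}: {weight:+.3f}")
```

The regularization term is what makes the system solvable here: with more unknown word weights than equations, the unregularized normal equations are singular, while adding `lam` on the diagonal yields a unique, small-norm weight vector.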
Affiliation(s)
- Maryam Jalali
- Faculty of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran
- Morteza Zahedi
- Faculty of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran
- Abdolali Basiri
- Faculty of Mathematics and Computer Science, University of Damghan, Damghan, Iran
|
81
|
Iyo M, Akiyoshi H, Sekine D, Shibasaki Y, Mamiya N. An exploratory database study of factors influencing the continuation of brexpiprazole treatment (prescription) in patients with schizophrenia using information from psychiatric electronic medical records processed with natural language processing. Schizophr Res 2023; 255:122-131. [PMID: 36989669 DOI: 10.1016/j.schres.2023.03.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 01/13/2023] [Accepted: 03/03/2023] [Indexed: 03/31/2023]
Abstract
Using natural language processing (NLP) technology to analyze and organize textual information in psychiatric electronic medical records can identify undiscovered factors associated with treatment discontinuation. This study aimed to evaluate brexpiprazole treatment continuation rate and factors affecting brexpiprazole discontinuation using a database that employs the MENTAT® system with NLP technology. This retrospective observational study evaluated patients with schizophrenia who were newly initiated on brexpiprazole (April 18, 2018-May 15, 2020). The first prescriptions of brexpiprazole were followed up for 180 days. Factors associated with brexpiprazole discontinuation were assessed using structured and unstructured patient data (April 18, 2017-December 31, 2020). The analysis population comprised 515 patients; mean (standard deviation) age of patients was 48.0 (15.3) years, and 47.8 % were male. Using Kaplan-Meier analysis, the cumulative brexpiprazole continuation rate at 180 days was 29 % (estimate: 0.29; 95 % confidence interval, 0.25-0.33). Univariate Cox proportional hazards analysis identified 16 variables independently associated with brexpiprazole discontinuation. Multivariate analysis identified eight variables associated with treatment discontinuation: variables with hazard ratio <1 were the presence of physical complications, longer hospitalization duration, and maximum chlorpromazine-equivalent dose of antipsychotics of >200 to ≤400 mg/day vs ≤200 mg/day in the past year; variables with hazard ratio >1 were previous electroconvulsive therapy, availability of key contact person information, a history of crime committed/reported, increase in brexpiprazole dose to 2 mg in >28 days, and appearance/worsening of symptoms other than positive symptoms. In conclusion, we identified potential new factors that may be associated with brexpiprazole discontinuation, which may improve the treatment strategy and continuation rate in patients with schizophrenia.
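The 180-day continuation rate above is a Kaplan-Meier estimate, which multiplies the conditional probabilities of remaining on treatment at each observed discontinuation time while accounting for censored patients. A minimal Python sketch of the estimator follows; the function name `kaplan_meier` and the four observations are hypothetical and are not taken from the study's database.

```python
def kaplan_meier(observations, t):
    """Estimate the probability of still being on treatment at time t.

    observations: list of (duration_days, discontinued) pairs, where
    discontinued is False for censored patients (still on treatment or
    lost to follow-up when last observed).
    """
    event_times = sorted(
        {duration for duration, event in observations if event and duration <= t})
    survival = 1.0
    for time in event_times:
        # Patients still under observation just before this time.
        at_risk = sum(1 for duration, _ in observations if duration >= time)
        # Patients who discontinued exactly at this time.
        events = sum(
            1 for duration, event in observations if event and duration == time)
        survival *= 1.0 - events / at_risk
    return survival


# Made-up follow-up data, NOT from the paper: two discontinuations
# (days 30 and 90) and two censored patients (days 60 and 180).
toy = [(30, True), (60, False), (90, True), (180, False)]
print(f"continuation rate at 180 days: {kaplan_meier(toy, 180):.3f}")
```

In this toy example the censored patient at day 60 leaves the risk set without counting as a discontinuation, which is why the estimate differs from the naive fraction of patients who never discontinued.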
Affiliation(s)
- Masaomi Iyo
- Department of Psychiatry, Graduate School of Medicine, Chiba University, Japan
- Hisashi Akiyoshi
- Medical Affairs Department, Otsuka Pharmaceutical Co., Ltd., Japan.
- Daisuke Sekine
- Medical Affairs Department, Otsuka Pharmaceutical Co., Ltd., Japan
- Noriyuki Mamiya
- Medical Affairs Department, Otsuka Pharmaceutical Co., Ltd. (contractor), Japan
|
82
|
Hadikhah Mozhdehi M, Eftekhari Moghadam A. Textual emotion detection utilizing a transfer learning approach. J Supercomput 2023; 79:1-15. [PMID: 37359334 PMCID: PMC10032627 DOI: 10.1007/s11227-023-05168-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/05/2023] [Indexed: 06/28/2023]
Abstract
Many attempts have been made to overcome the challenges of automating textual emotion detection using different traditional deep learning models such as LSTM, GRU, and BiLSTM. However, these models need large datasets, massive computing resources, and a lot of time to train. They are also prone to forgetting and cannot perform well when applied to small datasets. In this paper, we aim to demonstrate the capability of transfer learning techniques to better capture the contextual meaning of the text and, as a result, to better detect the emotion represented in the text, even without a large amount of data and training time. To do this, we conduct an experiment utilizing a pre-trained model called EmotionalBERT, which is based on bidirectional encoder representations from transformers (BERT), and we compare its performance to RNN-based models on two benchmark datasets, with a focus on the amount of training data and how it affects the models' performance.
|
83
|
Yip WS, Zhou H, To S. A critical analysis on the triple bottom line of sustainable manufacturing: key findings and implications. Environ Sci Pollut Res Int 2023; 30:41388-41404. [PMID: 36631618 PMCID: PMC9838463 DOI: 10.1007/s11356-022-25122-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 12/29/2022] [Indexed: 06/17/2023]
Abstract
Because of the environmental consequences of manufacturing activities, the general public, industry, and academia are becoming more aware of sustainable manufacturing (SM), which incorporates environmentally friendly manufacturing processes while emphasizing overall triple bottom line (TBL) performance in manufacturing. This article employs various text mining techniques and bibliometric analysis, including cluster analysis, the Pearson coefficient and research landscape mapping, to conduct an extensive investigation of SM with a focus on the TBL, in which the research content of SM with the TBL is reviewed and discussed systematically from a wide angle and with reduced bias. In this study, three new indicators based on the ratios of the number of scientific papers between the social, environmental, and economic dimensions of SM are devised to show the weight and level of importance of each dimension in SM, covering scientific papers from 30 years. The findings from this study indicate that the influential power of SM varies across the three dimensions, with a particular emphasis on the social dimension of SM from various countries, implying that the TBL for SM is currently imbalanced. At the same time, the economic and environmental dimensions share similar research topics and academic emphasis in SM. Based on these findings, recommendations based on the sustainable development goals (SDGs) of the United Nations (UN) are made to increase the social influence of SM. This article is the first to reveal the individual status of the social dimension and the unbalanced state of the TBL in SM, providing sustainable suggestions for enhancing the effectiveness of SM and achieving a balanced TBL with regard to the SDGs.
Affiliation(s)
- Wai Sze Yip
- State Key Laboratory of Ultra-Precision Machining Technology, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR China
- HongTing Zhou
- State Key Laboratory of Ultra-Precision Machining Technology, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR China
- Suet To
- State Key Laboratory of Ultra-Precision Machining Technology, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR China
|
84
|
Tounsi A, Temimi M. A systematic review of natural language processing applications for hydrometeorological hazards assessment. Nat Hazards (Dordr) 2023; 116:2819-2870. [PMID: 36776702 PMCID: PMC9905760 DOI: 10.1007/s11069-023-05842-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 01/28/2023] [Indexed: 06/18/2023]
Abstract
Natural language processing (NLP) is a promising tool for collecting data that are usually hard to obtain during extreme weather, like community response and infrastructure performance. Patterns and trends in abundant data sources such as weather reports, news articles, and social media may provide insights into potential impacts and early warnings of impending disasters. This paper reviews the peer-reviewed studies (journals and conference proceedings) that used NLP to assess extreme weather events, focusing on heavy rainfall events. The methodology searches four databases (ScienceDirect, Web of Science, Scopus, and IEEE Xplore) for articles published in English before June 2022. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to select and refine the search. The method led to the identification of thirty-five studies. In this study, hurricanes, typhoons, and flooding were considered. NLP models were implemented in information extraction, topic modeling, clustering, and classification. The findings show that NLP remains underutilized in studying extreme weather events. The review demonstrated that NLP could potentially improve the usefulness of social media platforms, newspapers, and other data sources that could improve weather event assessment. In addition, NLP could generate new information that should complement data from ground-based sensors, reducing monitoring costs. Key outcomes of NLP use identified in the study include improved accuracy, increased public safety, improved data collection, and enhanced decision-making. On the other hand, researchers must overcome data inadequacy, inaccessibility, nonrepresentative and immature NLP approaches, and computing skill requirements to use NLP properly.
Affiliation(s)
- Achraf Tounsi
- Department of Civil, Environmental, and Ocean Engineering, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ 07030 USA
| | - Marouane Temimi
- Department of Civil, Environmental, and Ocean Engineering, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ 07030 USA
| |
|
85
|
Kinariwala S, Deshmukh S. Short text topic modelling using local and global word-context semantic correlation. Multimed Tools Appl 2023; 82:1-23. [PMID: 36747894 PMCID: PMC9891888 DOI: 10.1007/s11042-023-14352-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Revised: 02/21/2022] [Accepted: 01/02/2023] [Indexed: 06/18/2023]
Abstract
Nowadays, people use short text to express their opinions on social media platforms such as Twitter, Facebook, and YouTube, as well as on e-commerce websites such as Amazon and Flipkart to share their purchasing experiences. Every day, billions of short texts are created worldwide in tweets, tags, keywords, search queries, etc. However, short text carries inadequate contextual information and can be ambiguous, sparse, and noisy, which remains a major challenge. State-of-the-art topic modeling strategies such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are not suitable because each document contains only a limited number of words. This work proposes a new model named G_SeaNMF (Gensim_SeaNMF) to improve the word-context semantic relationship by using local and global word embedding techniques. Word embeddings learned from a large corpus provide general semantic and syntactic information about words; they can guide topic modeling for short text collections as supporting information for sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is incorporated with the word2vec model of the Gensim library to strengthen the words' semantic relationships. In this article, short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation and global word co-occurrence were explored. These are evaluated using different measures to gauge cluster coherence on real-world datasets such as Search Snippet, Biomedicine, Pascal Flickr, Tweet and TagMyNews. Empirical evaluation shows that a combination of local and global word embedding provides more appropriate words under each topic with improved outcomes.
Affiliation(s)
| | - Sachin Deshmukh
- Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra India
| |
|
86
|
Liu L, Chen J, Wang C, Wang Q. Quantitative evaluation of China's basin ecological compensation policies based on the PMC index model. Environ Sci Pollut Res Int 2023; 30:17532-17545. [PMID: 36197610 DOI: 10.1007/s11356-022-23354-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/12/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Policy evaluation is the premise of the scientific formulation and effective implementation of a basin ecological compensation policy. However, whether the formulation of the basin ecological compensation policy (BECP) is reasonable or not lacks theoretical and technical support. This study constructed a model based on the PMC index and text mining methods. The PMC index model enables decision-makers to determine the level of consistency and the strengths and weaknesses of any policy from multiple angles and makes the evaluation results more targeted and operable. By establishing an evaluation system for BECP and building a multi-input-output table, the score of each policy is calculated. Based on this, the rationality of nine ecological compensation policies in the Yangtze and Yellow River basins was then examined. The results show that the average value of the PMC index for the nine policies is 7.23, which indicates that the formulation of the basin ecological compensation policy in China is generally reasonable. The ranking of policy scores from high to low is P2 > P1 > P5 > P7 > P3 > P4 > P6 > P9 > P8. However, deficiencies exist in policy timeliness, incentive measures, and policy receptors. In addition, there is a large gap in the formulation of policies at different levels. Moreover, the level of local policies is uneven.
Affiliation(s)
- Liming Liu
- Business School, Hohai University, Nanjing, 210098, China
- Jiangsu Research Base of Yangtze Institute for Conservation and High-Quality Development, Nanjing, 210098, Jiangsu, China
| | - Junfei Chen
- Business School, Hohai University, Nanjing, 210098, China.
- Yangtze Institute for Conservation and Development, Hohai University, Nanjing, 210098, China.
- Jiangsu Research Base of Yangtze Institute for Conservation and High-Quality Development, Nanjing, 210098, Jiangsu, China.
| | - Chunbao Wang
- Business School, Hohai University, Nanjing, 210098, China
| | - Qian Wang
- Business School, Hohai University, Nanjing, 210098, China
- Jiangsu Research Base of Yangtze Institute for Conservation and High-Quality Development, Nanjing, 210098, Jiangsu, China
| |
|
87
|
Hu Y, Li X, Song Y, Huang C. Data-driven evaluation framework for the effectiveness of rural vitalization in China: an empirical case study of Hubei Province. Environ Sci Pollut Res Int 2023; 30:20235-20254. [PMID: 36251194 DOI: 10.1007/s11356-022-23393-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
Rural vitalization (RV) has attracted more and more attention in China, especially since the Rural Vitalization Strategy (RVS) was proposed to restrict rural decline in 2017. The evaluation of RV is an effective means to objectively identify the characteristics and problems of rural development, so exploring scientific and rational evaluation methods is important for sustainable rural development. Therefore, this study builds a data-driven evaluation framework from a "bottom-up" perspective, and selects Hubei Province as the object to evaluate the effectiveness of RV. The evaluation index system is formed based on the concept and connotation of RV, which contains six dimensions, namely thriving businesses (TB), pleasant living environments (PLE), social etiquette and civility (SEC), effective governance (EG), living in prosperity (LP), and organization system (OS). The empirical results indicate that there is a low level of variation of the total scores but an obvious disparity in the dimensional scores in 13 prefecture-level and 83 county-level regions. At county-level, the regional development stage has an impact on the effectiveness of RV, and regions with a higher economy or endowed with better resources perform better. The results of spatial analysis further reveal that there is regional agglomeration as well as differences in various dimensions, and regions with characteristic industries or policy support perform better. Compared with the traditional evaluation method, differentiated evaluation objectives and diversified data are considered in the evaluation process of this study. The results and discussion shown in this study could provide empirical evidence for policymakers to effectively promote RV in the future.
Affiliation(s)
- Yingen Hu
- Department of Land Management, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiang Li
- Department of Land Management, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yu Song
- Department of Land Management, Huazhong Agricultural University, Wuhan, 430070, China.
| | - Chen Huang
- Department of Land Management, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
88
|
Jimeno Yepes AJ, Verspoor K. Classifying literature mentions of biological pathogens as experimentally studied using natural language processing. J Biomed Semantics 2023; 14:1. [PMID: 36721225 PMCID: PMC9889128 DOI: 10.1186/s13326-023-00282-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Accepted: 01/17/2023] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen in an experimental context. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS We developed a pathogen mention characterisation literature data set -READBiomed-Pathogens- automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. 
We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set, which shows that the data set that we have generated allows characterising pathogens of interest. TRIAL REGISTRATION N/A.
Affiliation(s)
- Antonio Jose Jimeno Yepes
- School of Computing Technologies, RMIT University, Melbourne, Australia.
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.
| | - Karin Verspoor
- School of Computing Technologies, RMIT University, Melbourne, Australia
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
| |
|
89
|
Wang C, Wang L, Li Q, Wu W, Yuan J, Wang H, Lu X. Computational Drug Discovery in Ankylosing Spondylitis-induced Osteoporosis Based on Data Mining and bioinformatics analysis. World Neurosurg 2023:S1878-8750(23)00107-9. [PMID: 36716856 DOI: 10.1016/j.wneu.2023.01.092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 01/21/2023] [Accepted: 01/23/2023] [Indexed: 01/29/2023]
Abstract
BACKGROUND Ankylosing spondylitis (AS) and osteoporosis (OP) are both prevalent illnesses in spine surgery, with OP being a possible consequence of AS. However, the mechanism of AS-induced OP (AS-OP) remains unknown, limiting etiological research and therapy of the illness. In order to mine targetable medicine for the prevention and treatment of AS-OP, this project will analyze public datasets using bioinformatics to identify genes and biological pathways relevant to AS-OP. METHODS First, text mining was utilized to identify common genes associated with AS and OP, after which functional analysis was carried out. STRING database and Cytoscape software were used to create protein-protein interaction (PPI) networks. Finally, hub genes and potential drugs were discovered using drug-gene interaction analysis and transcriptional factors (TFs)-gene interaction analysis. RESULTS The results of text mining revealed 241 genes common to 'AS' and 'OP', from which 115 key symbols were sorted out by functional analysis. As options for treating AS-OP, PPI analysis yielded 20 genes that may be targeted by thirteen medications. CONCLUSION In conclusion, CARLUMAB, BERMEKIMAB, RILONACEPT, RILOTUMUMAB and FICLATUZUMAB were first identified as the potential drugs for the treatment of AS-OP, proving the value of text mining and pathway analysis in drug discovery.
Affiliation(s)
- Chenfeng Wang
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Liang Wang
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Qisheng Li
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Weiqing Wu
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Jincan Yuan
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Haibin Wang
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Xuhua Lu
- Department of Orthopedics, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
|
90
|
van Es B, Reteig LC, Tan SC, Schraagen M, Hemker MM, Arends SRS, Rios MAR, Haitjema S. Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods. BMC Bioinformatics 2023; 24:10. [PMID: 36624385 PMCID: PMC9830789 DOI: 10.1186/s12859-022-05130-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Accepted: 12/30/2022] [Indexed: 01/11/2023] Open
Abstract
When developing models for clinical information retrieval and decision support systems, the discrete outcomes required for training are often missing. These labels need to be extracted from free text in electronic health records. For this extraction process, one of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large-scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT and (finetuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Combining the three models naively was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.
Affiliation(s)
- Bram van Es
- Central Diagnostic Laboratory, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- MedxAI, Amsterdam, The Netherlands
| | - Leon C. Reteig
- Center for Translational Immunology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Sander C. Tan
- Department for Research & Data Technology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Marijn Schraagen
- Institute for Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
| | - Myrthe M. Hemker
- Utrecht Institute of Linguistics OTS & Department of Languages, Literature and Communication, Utrecht University, Utrecht, The Netherlands
| | - Sebastiaan R. S. Arends
- Department of Medical Informatics, University of Amsterdam, Amsterdam, The Netherlands
| | - Miguel A. R. Rios
- Centre for Translation Studies, University of Vienna, Vienna, Austria
| | - Saskia Haitjema
- Central Diagnostic Laboratory, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| |
|
91
|
Supianto AA, Nurdiansyah R, Weng CW, Zilvan V, Yuwana RS, Arisal A, Pardede HF, Lee MM, Huang CH, Ng KL. Cluster-based text mining for extracting drug candidates for the prevention of COVID-19 from the biomedical literature. J Taibah Univ Med Sci 2023; 18:787-801. [PMID: 36618881 PMCID: PMC9810500 DOI: 10.1016/j.jtumed.2022.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2022] [Revised: 10/14/2022] [Accepted: 12/12/2022] [Indexed: 01/05/2023] Open
Abstract
Objective The coronavirus disease 2019 (COVID-19) health crisis that began at the end of 2019 made researchers around the world quickly race to find effective solutions. Related literature exploded and it was inevitable that an automated approach was needed to find useful information, namely text mining, to overcome COVID-19, especially in terms of drug candidate discovery. While text mining methods for finding drug candidates mostly try to extract bioentity associations from PubMed, very few of them mine with a clustering approach. The purpose of this study was to demonstrate the effectiveness of our approach to identify drugs for the prevention of COVID-19 through literature review, cluster analysis, drug docking calculations, and clinical trial data. Methods This research was conducted in four main stages. First, the text mining stage was carried out by involving Bidirectional Encoder Representations from Transformers for Biomedical to obtain vector representation of each word in the sentence from texts. The next stage generated the disease-drug associations, which were obtained from the correlation between disease and drug. Next, the clustering stage grouped the rules through the similarity of diseases by utilizing Term Frequency-Inverse Document Frequency as its feature. Finally, the drug candidate extraction stage was processed through leveraging PubChem and DrugBank databases. We further used the drug docking package AUTODOCK VINA in PyRx software to verify the results. Results Comparative analyses showed that the percentage of findings using mining with clustering outperformed mining without clustering in all experimental settings. In addition, we suggest that the top three drugs/phytochemicals by drug docking analysis may be effective in preventing COVID-19. Conclusions The proposed method for text mining utilizing the clustering method is quite promising in the discovery of drug candidates for the prevention of COVID-19 through the biomedical literature.
Affiliation(s)
- Ahmad Afif Supianto
- Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia
| | - Rizky Nurdiansyah
- Department of Bioinformatics, Indonesia International Institute for Life Sciences, Indonesia
| | - Chia-Wei Weng
- Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
| | - Vicky Zilvan
- Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia
| | - Raden Sandra Yuwana
- Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia
| | - Andria Arisal
- Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia
| | | | - Min-Min Lee
- Department of Food Nutrition and Health Biotechnology, Asia University, Taiwan
| | - Chien-Hung Huang
- Department of Computer Science and Information Engineering, National Formosa University, Taiwan
| | - Ka-Lok Ng
- Department of Bioinformatics and Medical Engineering, Asia University, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan
- Center for Artificial Intelligence and Precision Medicine Research, Asia University, Taiwan
- Corresponding address: Department of Bioinformatics and Medical Engineering, No. 500, LiuFeng Rd., WuFeng Dist., Taichung City, 41354, Taiwan.
| |
|
92
|
Knisely BM, Pavliscsak HH. Research proposal content extraction using natural language processing and semi-supervised clustering: A demonstration and comparative analysis. Scientometrics 2023; 128:3197-3224. [PMID: 37101971 PMCID: PMC10083066 DOI: 10.1007/s11192-023-04689-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2022] [Accepted: 03/07/2023] [Indexed: 04/28/2023]
Abstract
Funding institutions often solicit text-based research proposals to evaluate potential recipients. Leveraging the information contained in these documents could help institutions understand the supply of research within their domain. In this work, an end-to-end methodology for semi-supervised document clustering is introduced to partially automate classification of research proposals based on thematic areas of interest. The methodology consists of three stages: (1) manual annotation of a document sample; (2) semi-supervised clustering of documents; (3) evaluation of cluster results using quantitative metrics and qualitative ratings (coherence, relevance, distinctiveness) by experts. The methodology is described in detail to encourage replication and is demonstrated on a real-world data set. This demonstration sought to categorize proposals submitted to the US Army Telemedicine and Advanced Technology Research Center (TATRC) related to technological innovations in military medicine. A comparative analysis of method features was performed, including unsupervised vs. semi-supervised clustering, several document vectorization techniques, and several cluster result selection strategies. Outcomes suggest that pretrained Bidirectional Encoder Representations from Transformers (BERT) embeddings were better suited for the task than older text embedding techniques. When comparing expert ratings between algorithms, semi-supervised clustering produced coherence ratings ~ 25% better on average compared to standard unsupervised clustering with negligible differences in cluster distinctiveness. Last, it was shown that a cluster result selection strategy that balances internal and external validity produced ideal results. With further refinement, this methodological framework shows promise as a useful analytical tool for institutions to unlock hidden insights from untapped archives and similar administrative document repositories. 
Supplementary Information The online version contains supplementary material available at 10.1007/s11192-023-04689-3.
Affiliation(s)
- Benjamin M. Knisely
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
| | - Holly H. Pavliscsak
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
| |
|
93
|
Wu B, Wang L, Lv SX, Zeng YR. Forecasting oil consumption with attention-based IndRNN optimized by adaptive differential evolution. APPL INTELL 2023; 53:5473-96. [PMID: 35789694 DOI: 10.1007/s10489-022-03720-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/05/2022] [Indexed: 11/02/2022]
Abstract
Accurate prediction of oil consumption plays a dominant role in oil supply chain management. However, because of the effects of the coronavirus disease 2019 (COVID-19) pandemic, oil consumption has exhibited an uncertain and volatile trend, which poses a huge challenge to accurate prediction. The rapid development of the Internet provides abundant online information (e.g., online news) that can help predict oil consumption. This study adopts a novel news-based oil consumption prediction methodology, using a convolutional neural network (CNN) to extract online news information automatically, thereby illustrating the contribution of text features to oil consumption prediction. This study also proposes a new approach called attention-based JADE-IndRNN that combines adaptive differential evolution (adaptive differential evolution with optional external archive, JADE) with an attention-based independent recurrent neural network (IndRNN) to forecast monthly oil consumption. Experimental results further indicate that the proposed news-based oil consumption prediction methodology significantly improves on traditional techniques that do not use online oil news, as the news might contain some explanations of the relevant confinement or reopening policies during the COVID-19 period.
|
94
|
Arınık N, Van Bortel W, Boudoua B, Busani L, Decoupes R, Interdonato R, Kafando R, van Kleef E, Roche M, Alam Syed M, Teisseire M. An annotated dataset for event-based surveillance of antimicrobial resistance. Data Brief 2023; 46:108870. [PMID: 36687146 PMCID: PMC9849856 DOI: 10.1016/j.dib.2022.108870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Revised: 12/15/2022] [Accepted: 12/27/2022] [Indexed: 01/02/2023] Open
Abstract
This paper presents an annotated dataset used in the MOOD Antimicrobial Resistance (AMR) hackathon, hosted in Montpellier, June 2022. The collected data concerns unstructured data from news items, scientific publications and national or international reports, collected from four event-based surveillance (EBS) Systems, i.e. ProMED, PADI-web, HealthMap and MedISys. Data was annotated by relevance for epidemic intelligence (EI) purposes with the help of AMR experts and an annotation guideline. Extracted data were intended to include relevant events on the emergence and spread of AMR such as reports on AMR trends, discovery of new drug-bug resistances, or new AMR genes in human, animal or environmental reservoirs. This dataset can be used to train or evaluate classification approaches to automatically identify written text on AMR events across the different reservoirs and sectors of One Health (i.e. human, animal, food, environmental sources, such as soil and waste water) in unstructured data (e.g. news, tweets) and classify these events by relevance for EI purposes.
Affiliation(s)
- Nejat Arınık
- INRAE, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Wim Van Bortel
- ITM, Institute of Tropical Medicine, Department of Biomedical Sciences, Antwerp, Belgium
| | - Bahdja Boudoua
- INRAE, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Luca Busani
- Center for Gender-Specific Medicine, Istituto Superiore di Sanitá Viale Regina Elena 299, 00161 Rome, Italy
| | - Rémy Decoupes
- INRAE, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Roberto Interdonato
- CIRAD, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Rodrique Kafando
- INRAE, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Esther van Kleef
- ITM, Institute of Tropical Medicine, Department of Public Health, Outbreak Research Team, Antwerp, Belgium
| | - Mathieu Roche
- CIRAD, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Mehtab Alam Syed
- CIRAD, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
| | - Maguelonne Teisseire
- INRAE, Montpellier F-34398, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France
- Corresponding author at: TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier 34090, France.
| |
|
95
|
Sulyok J, Fehérvölgyi B, Csizmadia T, Katona AI, Kosztyán ZT. Does geography matter? Implications for future tourism research in light of COVID-19. Scientometrics 2023; 128:1601-1637. [PMID: 36647425 PMCID: PMC9833032 DOI: 10.1007/s11192-022-04615-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Accepted: 11/25/2022] [Indexed: 01/13/2023]
Abstract
Due to the 2019 new coronavirus disease (COVID-19) pandemic, tourism is undergoing fundamental changes that are affecting tourism research. This situation calls for in-depth analyses of tourism research. Scholars have already published review studies on COVID-19-related research within the tourism field; however, these studies do not connect findings, such as the research focus, research methodology and target group, to form a research profile, and the geographical patterns of the findings are not identified. In this study, COVID-19-related tourism studies were collected and analyzed in depth following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method. In addition, data-driven methods, such as spatial multilayer networks, frequent patterns and content-based analyses, were applied to identify research profiles and their geographic patterns. This study pointed out the role of geographic patterns in tourism research, going beyond the research of the authors. Moreover, topics, focus destinations, applied methodologies and employed data sources have relevant geographic patterns. Four dominant research profiles were identified; they show that a shift can be observed in tourism research toward data sources and research methods. Due to COVID-19, the strengthening of the application of quantitative methods and employment of secondary data sources are needed. Supplementary Information The online version contains supplementary material available at 10.1007/s11192-022-04615-z.
Affiliation(s)
- Judit Sulyok
- Department of Tourism, Faculty of Business and Economics, Institute of Business, University of Pannonia, Egyetem str. 10, Veszprém, 8200 Hungary
- Beáta Fehérvölgyi
- Department of Tourism, Faculty of Business and Economics, Institute of Business, University of Pannonia, Egyetem str. 10, Veszprém, 8200 Hungary
- Tibor Csizmadia
- Department of Management, Faculty of Business and Economics, Institute of Management, University of Pannonia, Egyetem str. 10, Veszprém, 8200 Hungary
- Attila I. Katona
- Department of Quantitative Methods, Faculty of Business and Economics, Institute of Management, University of Pannonia, Egyetem str. 10, Veszprém, 8200 Hungary
- Zsolt T. Kosztyán
- Department of Quantitative Methods, Faculty of Business and Economics, Institute of Management, University of Pannonia, Egyetem str. 10, Veszprém, 8200 Hungary
|
96
|
Alsayat A. Customer decision-making analysis based on big social data using machine learning: a case study of hotels in Mecca. Neural Comput Appl 2023; 35:4701-22. [PMID: 36340596 DOI: 10.1007/s00521-022-07992-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2021] [Accepted: 10/21/2022] [Indexed: 02/01/2023]
Abstract
Big social data and user-generated content have emerged as important sources of timely and rich knowledge to detect customers' behavioral patterns. Revealing customer satisfaction through the use of user-generated content has been a significant issue in business, especially in the tourism and hospitality context. There have been many studies on customer satisfaction that take quantitative survey approaches. However, revealing customer satisfaction using big social data in the form of eWOM (electronic word of mouth) can be an effective way to better understand customers' demands. In this study, we aim to develop a hybrid methodology based on supervised learning, text mining, and segmentation machine learning approaches to analyze big social data on travelers' decision-making regarding hotels in Mecca, Saudi Arabia. To do so, we use support vector regression with sequential minimal optimization (SMO), latent Dirichlet allocation (LDA), and k-means approaches to develop the hybrid method. We collect data from travelers' online reviews of Mecca hotels on TripAdvisor. The data are segmented, and travelers' satisfaction is revealed for each segment based on their online reviews of hotels. The results show that the method is effective for big social data analysis and traveler segmentation in Mecca hotels. The results are discussed, and several recommendations and strategies for hotel managers are provided to enhance their service quality and improve customer satisfaction.
|
97
|
Diaz-Garcia JA, Ruiz MD, Martin-Bautista MJ. A survey on the use of association rules mining techniques in textual social media. Artif Intell Rev 2023; 56:1175-200. [PMID: 35578652 DOI: 10.1007/s10462-022-10196-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
The incursion of social media in our lives has been much accentuated in the last decade. This has led to a multiplication of data mining tools aimed at obtaining knowledge from these data sources. One of the greatest challenges in this area is to be able to obtain this knowledge without the need for training processes, which requires structured information and pre-labelled datasets. This is where unsupervised data mining techniques come in. These techniques can obtain value from these unstructured and unlabelled data, providing very interesting solutions to enhance the decision-making process. In this paper, we first address the problem of social media mining, as well as the need for unsupervised techniques, in particular association rules, for its treatment. We follow with a broad overview of the applications of association rules in the domain of social media mining, specifically, their application to the problems of mining textual entities, such as tweets. We also focus on the strengths and weaknesses of using association rules for solving different tasks in textual social media. Finally, the paper provides a perspective overview of the challenges that association rules must face in the next decade within the field of social media mining.
|
98
|
Mozafarinia M, Rajabiyazdi F, Brouillette MJ, Fellows LK, Knäuper B, Mayo NE. Effectiveness of a personalized health profile on specificity of self-management goals among people living with HIV in Canada: findings from a blinded pragmatic randomized controlled trial. Qual Life Res 2023; 32:413-424. [PMID: 36088501 PMCID: PMC9464055 DOI: 10.1007/s11136-022-03245-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/26/2022] [Indexed: 11/24/2022]
Abstract
PURPOSE To estimate, among people living with chronic HIV, to what extent providing feedback on their health outcomes will affect the number and specificity of patient-formulated self-management goals. METHODS A personalized feedback profile was produced for individuals enrolled in a Canadian HIV Brain Health Now study. Goal specificity was measured by total number of specific words (matched to a domain-specific developed lexicon) per person-words using text mining techniques. RESULTS Of 176 participants enrolled and randomly assigned to feedback and control groups, 110 responses were received. The average number of goals was similar for both groups (3.7 vs 3.9). The numbers of specific words used in the goals formulated by the feedback and control groups were 642 and 739, respectively. Specific nouns and actionable verbs were present to some extent, and "measurable" and "time-bound" words were mainly missing. Negative binomial regression showed no difference in goal specificity between groups (RR = 0.93, 95% CI 0.78-1.10). Goals set by both groups overlapped in 8 areas and had little difference in rank. CONCLUSION The personalized feedback profile did not help with the formulation of high-quality goals. Text mining has the potential to help with the difficulties of goal evaluation outside of the face-to-face setting. With more data and the use of learning models, automated answers could be generated to provide a more dynamic platform.
Affiliation(s)
- Maryam Mozafarinia
- Division of Experimental Medicine, McGill University, Montreal, Canada; Center for Outcome Research and Evaluation (CORE), McGill University Health Centre Research Institute, Montréal, Quebec, Canada
- Fateme Rajabiyazdi
- System and Computer Engineering, Faculty of Engineering and Design, Carleton University, Ottawa, Canada
- Marie-Josée Brouillette
- Department of Psychiatry, McGill University, Montreal, Canada; Center for Outcome Research & Evaluation, McGill University, Montreal, Canada
- Lesley K. Fellows
- Department of Neurology and Neurosurgery and Chronic Viral Illness Service, McGill University, Montreal, Canada
- Bärbel Knäuper
- Department of Psychology, McGill University, Montreal, Canada
- Nancy E. Mayo
- Division of Experimental Medicine, McGill University, Montreal, Canada; Department of Medicine and School of Physical and Occupational Therapy, McGill University, Montreal, Canada; Center for Outcome Research & Evaluation, McGill University, Montreal, Canada
|
99
|
Auzoux S, Ngaba B, Christina M, Heuclin B, Roche M. Experimental variables in sugarcane intercropping in Reunion Island for data matching. Data Brief 2022; 46:108869. [PMID: 36691558 PMCID: PMC9860465 DOI: 10.1016/j.dib.2022.108869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 11/17/2022] [Accepted: 12/27/2022] [Indexed: 01/01/2023] Open
Abstract
This study aimed to link experimental data dealing with complex agroecological systems. To share and link collected data with the generic AEGIS (Agro-Ecological Global Information System) database, the work described in this data paper consists of mapping researchers' variables to the AEGIS dictionary variables for different tropical crops (sugarcane, rice, sorghum or cover crops). Additionally, this data paper presents a case study based on sugarcane intercropping systems for evaluating three variable-matching measures.
Affiliation(s)
- Sandrine Auzoux
- UR AIDA (Agroecology and sustainable intensification of annual crops), University of Montpellier, CIRAD, La Réunion, France; French Agricultural Research for Development (CIRAD), France
- Billy Ngaba
- UR AIDA (Agroecology and sustainable intensification of annual crops), University of Montpellier, CIRAD, La Réunion, France; French Agricultural Research for Development (CIRAD), France
- Mathias Christina
- UR AIDA (Agroecology and sustainable intensification of annual crops), University of Montpellier, CIRAD, La Réunion, France; French Agricultural Research for Development (CIRAD), France
- Benjamin Heuclin
- UR AIDA (Agroecology and sustainable intensification of annual crops), University of Montpellier, CIRAD, La Réunion, France; French Agricultural Research for Development (CIRAD), France
- Mathieu Roche
- UMR TETIS (Land, Environment, Remote Sensing and Spatial Information), University of Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France; French Agricultural Research for Development (CIRAD), France (corresponding author)
|
100
|
Németh R. A scoping review on the use of natural language processing in research on political polarization: trends and research prospects. J Comput Soc Sci 2022; 6:289-313. [PMID: 36568020 PMCID: PMC9762668 DOI: 10.1007/s42001-022-00196-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 11/29/2022] [Indexed: 05/05/2023]
Abstract
As part of the "text-as-data" movement, Natural Language Processing (NLP) provides a computational way to examine political polarization. We conducted a methodological scoping review of studies published since 2010 (n = 154) to clarify how NLP research has conceptualized and measured political polarization, and to characterize the degree of integration of the two different research paradigms that meet in this research area. We identified biases toward the US context (59%), Twitter data (43%) and the machine learning approach (33%). Research covers different layers of the political public sphere (politicians, experts, media, or the lay public); however, very few studies involved more than one layer. Results indicate that only a few studies made use of domain knowledge and a high proportion of the studies were not interdisciplinary. Those studies that made efforts to interpret the results demonstrated that the characteristics of political texts depend not only on the political position of their authors, but also on other often-overlooked factors. Ignoring these factors may lead to overly optimistic performance measures. Also, spurious results may be obtained when causal relations are inferred from textual data. Our paper provides arguments for the integration of explanatory and predictive modeling paradigms, and for a more interdisciplinary approach to polarization research. Supplementary Information: The online version contains supplementary material available at 10.1007/s42001-022-00196-2.
Affiliation(s)
- Renáta Németh
- Research Center for Computational Social Science, Faculty of Social Sciences, ELTE Eötvös Loránd University, Budapest, Hungary
|