1
Living review framework for better policy design and management of hazardous waste in Australia. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 924:171556. [PMID: 38458450 DOI: 10.1016/j.scitotenv.2024.171556] [Received: 11/17/2023] [Revised: 02/25/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
The significant increase in hazardous waste generation in Australia has prompted discussion of incorporating artificial intelligence into the hazardous waste management system. Recent studies explored the potential applications of artificial intelligence in various processes of managing waste. However, no study has examined the use of text mining in the hazardous waste management sector for the purpose of informing policymakers. This study developed a living review framework which applied supervised text classification and text mining techniques to extract knowledge from domain literature published between 2022 and 2023. The framework employed statistical classification models trained iteratively, and the best model, XGBoost, achieved an F1 score of 0.87. Using a small set of 126 manually labelled global articles, XGBoost automatically predicted the labels of 678 Australian articles with high confidence. Then, keyword extraction and unsupervised topic modelling with Latent Dirichlet Allocation (LDA) were performed. Results indicated that there were 2 main research themes in Australian literature: (1) the key waste streams and (2) the resource recovery and recycling of waste. This framework would benefit policymakers, researchers, and hazardous waste management organisations by serving as a real-time guide to the current key waste streams and research themes in the literature, allowing robust knowledge to be applied to waste management and highlighting where research gaps remain.

2
Evolution of renewable energy laws and policies in China. Heliyon 2024; 10:e29712. [PMID: 38681606 PMCID: PMC11046223 DOI: 10.1016/j.heliyon.2024.e29712] [Received: 11/08/2023] [Revised: 04/03/2024] [Accepted: 04/14/2024] [Indexed: 05/01/2024] Open
Abstract
This study employs Latent Dirichlet Allocation (LDA) topic modelling methodology to analyze documents related to renewable energy laws and policies at the central level in China. The objective is to investigate the development and evolution of renewable energy policies in China and to gain insights into the national-level attitudes towards renewable energy development. The study consists of two phases: initially, renewable energy policy documents undergo keyword analysis using word clouds and keyword co-occurrence network analysis to elucidate the focal areas and their interconnections within the legal and policy texts. Subsequently, after determining the optimal number of topics for modelling based on topic perplexity and consistency results, the text undergoes data cleaning to isolate words with practical significance. These words are then incorporated into the LDA topic model to analyze the distribution and content of potential topics within the policies. Lastly, by linearly segmenting the time frame, changes in topic intensity over time are visually examined using heat maps. The findings indicate that energy policies have consistently prioritized "development" and emphasized the significance of "new energy" in renewable energy policies. Moreover, as renewable energy has progressed, governments and policymakers have come to acknowledge the importance of comprehensive energy planning, transitioning to clean energy sources, and regulating the electricity market. This growing awareness has led to efforts to strengthen policy and regulatory measures to foster renewable energy's sustainable development and utilization. In summary, this study highlights the effectiveness of the LDA topic model in analyzing renewable energy policies, advancing its adoption and furthering research in the field.

3
Exploring research trends and priorities of genus Melia. Sci Rep 2024; 14:6265. [PMID: 38490998 PMCID: PMC10943012 DOI: 10.1038/s41598-024-53736-3] [Received: 08/31/2023] [Accepted: 02/04/2024] [Indexed: 03/18/2024] Open
Abstract
The genus Melia is known for its secondary metabolites, and recently this genus has also been explored for its timber. There are vast differences among its species. For instance, Melia azedarach is reported to be invasive, while another species, M. dubia, has diverse utility with complex germination and regeneration characteristics. Researchers globally have been working on various aspects of this genus. In this study, using topic modelling and a science mapping approach, we attempted to understand research facets of this genus. The literature corpus of the Web of Science database was explored using a single keyword, "Melia", which yielded 1523 publications (1946-2022), and after scrutiny, metadata of 1263 publications were used in the study. Although nine individual species were cited in the publications, only three species are accepted, viz., M. dubia, M. azedarach, and M. volkensii. This implies taxonomic uncertainty, with potential confusion in assigning scientific findings to particular species. Thus, a taxonomic re-examination of this genus is warranted for a better assessment of its economic utility in many countries. More importantly, our results indicate that research interests have recently shifted from the secondary metabolite constituents towards growth, biomass, wood properties, germination, plantation, and green synthesis. The shift in research focus toward wood properties of Melia sp. can impact the wood demand-supply at a global scale owing to its fast growth and the possibility of cultivation over a wider geographical range.

4
Integrating unsupervised and supervised learning techniques to predict traumatic brain injury: A population-based study. INTELLIGENCE-BASED MEDICINE 2023; 8:100118. [PMID: 38222038 PMCID: PMC10785655 DOI: 10.1016/j.ibmed.2023.100118] [Indexed: 01/16/2024]
Abstract
This work aimed to identify pre-existing health conditions of patients with traumatic brain injury (TBI) and develop predictive models for the first TBI event and its external causes by employing a combination of unsupervised and supervised learning algorithms. We acquired up to five years of pre-injury diagnoses for 488,107 patients with TBI and 488,107 matched control patients who entered the emergency department or acute care hospitals between April 1st, 2002, and March 31st, 2020. Diagnoses were obtained from the Ontario Health Insurance Plan (OHIP) database, which contains province-wide claims data by physicians in Ontario, Canada for inpatient and outpatient services. A screening process was conducted on the OHIP diagnostic codes to limit the subsequent analysis to codes that were predictive of TBI, which concluded that 314 codes were significantly associated with TBI. The Latent Dirichlet Allocation (LDA) model was applied to the diagnostic codes and generated an optimal number of 19 topics that concur with published literature but also suggest other unexplored areas. Estimated word-topic probabilities from the LDA model helped us detect pre-morbid conditions among patients with TBI by uncovering the underlying patterns of diagnoses, while estimated document-topic probabilities were utilized in variable creation as a form of dimension reduction. We created 19 topic scores for each patient in the cohort, which were utilized along with socio-demographic factors for Random Forest binary classifier models. Test set performances evaluated using area under the receiver operating characteristic curve (AUC) were: TBI event (AUC = 0.85), external cause of injury: falls (AUC = 0.85), struck by/against (AUC = 0.83), cyclist collision (AUC = 0.76), motor vehicle collision (AUC = 0.83). Our analysis successfully demonstrated the feasibility of using machine learning to predict TBI due to various external causes and identified the most important factors that contribute to this prediction.
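For readers unfamiliar with the AUC values reported above, the following minimal sketch shows one way AUC can be computed from binary labels and classifier scores: as the fraction of positive-negative pairs in which the positive case receives the higher score. It is an illustration only, not the authors' code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 ground-truth values (1 = positive case).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive cases and three matched controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; for a cohort of the size described above, a rank-based implementation from a statistics library would be the practical choice.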

5
Mental health concerns precede quits: shifts in the work discourse during the Covid-19 pandemic and great resignation. EPJ DATA SCIENCE 2023; 12:49. [PMID: 37840553 PMCID: PMC10570174 DOI: 10.1140/epjds/s13688-023-00417-2] [Received: 02/16/2023] [Accepted: 08/25/2023] [Indexed: 10/17/2023]
Abstract
To study the causes of the 2021 Great Resignation, we use text analysis and investigate the changes in work- and quit-related posts between 2018 and 2021 on Reddit. We find that the Reddit discourse evolution resembles the dynamics of the U.S. quit and layoff rates. Furthermore, when the COVID-19 pandemic started, conversations related to working from home, switching jobs, work-related distress, and mental health increased, while discussions on commuting or moving for a job decreased. We distinguish between general work-related and specific quit-related discourse changes using a difference-in-differences method. Our main finding is that mental health and work-related distress topics disproportionately increased among quit-related posts since the onset of the pandemic, likely contributing to the quits of the Great Resignation. Along with better labor market conditions, some relief came in early-to-mid 2021, when these concerns decreased. Our study underscores the importance of having access to data from online forums, such as Reddit, to study emerging economic phenomena in real time, providing a valuable supplement to traditional labor market surveys and administrative data. Supplementary Information The online version contains supplementary material available at 10.1140/epjds/s13688-023-00417-2.
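The difference-in-differences idea mentioned above can be sketched with simple arithmetic: the change in the group of interest minus the change in a comparison group over the same period. The snippet below is a minimal illustration, not the authors' specification (which would normally be a regression with controls); the function name `diff_in_diff` and all topic-share numbers are hypothetical.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Returns (change in treated group) - (change in control group).
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical shares of posts assigned to a "mental health" topic.
quit_posts_pre, quit_posts_post = 0.10, 0.18   # quit-related posts (treated)
work_posts_pre, work_posts_post = 0.08, 0.11   # general work posts (control)

did = diff_in_diff(quit_posts_pre, quit_posts_post,
                   work_posts_pre, work_posts_post)
print(f"Difference-in-differences estimate: {did:.3f}")
```

A positive estimate here would mean the topic grew more among quit-related posts than among work-related posts in general, which is the kind of disproportionate increase the abstract describes.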

6
Face time with physicians: How do patients assess providers in video-visits? Heliyon 2023; 9:e16883. [PMID: 37292342 PMCID: PMC10238118 DOI: 10.1016/j.heliyon.2023.e16883] [Received: 05/04/2023] [Revised: 05/30/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023] Open
Abstract
Introduction The COVID-19 pandemic has triggered a massive acceleration in the use of virtual and video-visits. As more patients and providers engage in video-visits over varied digital platforms, it is important to understand how patients assess their providers and the video-visit experiences. We also need to examine the relative importance of the factors that patients use in their assessment of video-visits in order to improve the overall healthcare experience and delivery. Methods A data set of 5149 reviews by patients who had completed a video-visit was assembled through web scraping. Sentiment analysis was performed on the reviews, and topic modeling was used to extract latent topics embedded in the reviews and their relative importance. Results Most patient reviews (89.53%) reported a positive sentiment towards their providers in video-visits. Seven distinct topics underlying the reviews were identified: bedside manners, professional expertise, virtual experience, appointment scheduling and follow-up process, wait times, costs, and communication. Communication, bedside manners and professional expertise were the top factors patients alluded to in the positive reviews. Appointment scheduling and follow-ups, wait times, costs, virtual experience and professional expertise were important factors in the negative reviews. Discussion To improve the overall experience of patients in video-visits, providers need to engage in clear communication, develop excellent bedside and webside manners, attend the video-visit promptly with minimal delays, and follow up with patients after the visit.

7
Understanding public discourse surrounding the impact of bitcoin on the environment in social media. GEOJOURNAL 2023; 88:1-25. [PMID: 38625109 PMCID: PMC10040309 DOI: 10.1007/s10708-023-10856-z] [Accepted: 02/23/2023] [Indexed: 04/17/2024]
Abstract
Increasing public concerns about the environment have led to many studies that have explored current issues and approaches towards its protection. Much less studied, however, is the topic of public opinion surrounding the impact that cryptocurrencies are having on the environment. The cryptocurrency market, in particular bitcoin, currently rivals other well-known assets such as precious metals and exchange-traded funds in market value, and it is growing. This work examines public opinion expressed about the environmental impacts of bitcoin derived from Twitter feeds. Three primary research questions were addressed in this work related to topics of public interest, their location, and people and places involved. Our findings show that factions of the public are interested in protecting the environment, with the topics that resonate relating mainly to energy. This discourse was also taking place at a few similar locations with a mix of different people and places of interest.

8
Short text topic modelling using local and global word-context semantic correlation. MULTIMEDIA TOOLS AND APPLICATIONS 2023; 82:1-23. [PMID: 36747894 PMCID: PMC9891888 DOI: 10.1007/s11042-023-14352-x] [Received: 10/21/2021] [Revised: 02/21/2022] [Accepted: 01/02/2023] [Indexed: 06/18/2023]
Abstract
Nowadays, people use short text to express their opinions on social media platforms such as Twitter, Facebook, and YouTube, as well as on e-commerce websites such as Amazon and Flipkart to share their purchasing experiences. Every day, billions of short texts are created worldwide in tweets, tags, keywords, search queries, etc. However, this short text possesses inadequate contextual information and can be ambiguous, sparse, and noisy, which remains a major challenge. State-of-the-art topic modeling strategies such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are not suitable because short text contains a limited number of words in a single document. This work proposes a new model named G_SeaNMF (Gensim_SeaNMF) to improve the word-context semantic relationship by using local and global word embedding techniques. Word embeddings learned from a large corpus provide general semantic and syntactic information about words; they can guide topic modeling for short text collections as supporting information for sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is combined with the word2vec model of the Gensim library to strengthen the words' semantic relationships. In this article, short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation and global word co-occurrence were explored. These are evaluated using different measures to gauge cluster coherence on real-world datasets such as Search Snippet, Biomedicine, Pascal Flickr, Tweet and TagMyNews. Empirical evaluation shows that a combination of local and global word embedding provides more appropriate words under each topic with improved outcomes.

9
Critical reflections on three popular computational linguistic approaches to examine Twitter discourses. PeerJ Comput Sci 2023; 9:e1211. [PMID: 37346687 PMCID: PMC10280252 DOI: 10.7717/peerj-cs.1211] [Received: 05/31/2022] [Accepted: 12/19/2022] [Indexed: 06/23/2023]
Abstract
Although computational linguistic methods, such as topic modelling, sentiment analysis and emotion detection, can provide social media researchers with insights into online public discourses, it is not self-evident how these methods should be used, and there is a lack of transparent instructions on how to apply them in a critical way. There is a growing body of work focusing on the strengths and shortcomings of these methods. Through applying best practices for using these methods within the literature, we focus on setting expectations, presenting trajectories, examining with context and critically reflecting on the diachronic Twitter discourse of two case studies: the longitudinal discourse of the NHS Covid-19 digital contact-tracing app and the snapshot discourse of the Ofqual A Level grade calculation algorithm, both related to the UK. We identified difficulties in interpretation and potential application in all three of the approaches. Other shortcomings, such as the detection of negation and sarcasm, were also found. We discuss the need for further transparency of these methods for diachronic social media researchers, including the potential for combining these approaches with qualitative ones, such as corpus linguistics and critical discourse analysis, in a more formal framework.

10
Public perception on 'healthy ageing' in the past decade: An unsupervised machine learning of 63,809 Twitter posts. Heliyon 2023; 9:e13118. [PMID: 36747557 PMCID: PMC9898637 DOI: 10.1016/j.heliyon.2023.e13118] [Received: 09/27/2022] [Revised: 01/17/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023] Open
Abstract
The World Health Organization (WHO) started the initiative on healthy ageing from 2016 to 2020, which has now continued into the United Nations (UN) Decade of Healthy Ageing 2021-2030. Research into healthy ageing and healthy ageing communities has emphasized that the concept of healthy ageing encompasses a plurality of views and has multiple dimensions. Anchored in a transdisciplinary approach, the present report thus aimed to investigate public perceptions of healthy ageing via a deep analysis of social media posts on Twitter. Original tweets containing the terms "Healthy Ageing" OR "healthy aging" OR "healthyageing" OR "healthyaging", and posted in English between 1 January 2012 and 30 June 2022, were extracted. Bidirectional Encoder Representations from Transformers (BERT) Named Entity Recognition was applied to select for individual users. Topic modelling, specifically BERTopic, was used to generate interpretable topics and descriptions pertaining to the concept of healthy ageing. Subsequently, manual thematic analysis was performed by the study investigators, with independent reviews of the topic labels and themes. A total of 63,809 unique tweets were analyzed and clustered semantically into 16 topics. The public perception of healthy ageing could be broadly grouped into three themes: (1) healthy diet and lifestyle, (2) maintaining normal bodily functions and (3) preventive care. While most perceptions dovetail with WHO's definition, there are some points regarding skin appearance, beauty and aging that should be closely considered in the design of initiatives in the UN Decade of Healthy Ageing and beyond.

11
Unsupervised title and abstract screening for systematic review: a retrospective case-study using topic modelling methodology. Syst Rev 2023; 12:1. [PMID: 36597132 PMCID: PMC9811792 DOI: 10.1186/s13643-022-02163-4] [Received: 01/27/2022] [Accepted: 12/21/2022] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND The importance of systematic reviews in collating and summarising available research output on a particular topic cannot be over-emphasized. However, initial screening of retrieved literature is significantly time and labour intensive. Attempts at automating parts of the systematic review process have been made with varying degrees of success, partly due to being domain-specific or requiring vendor-specific software or manually labelled training data. Our primary objective was to develop statistical methodology for performing automated title and abstract screening for systematic reviews. Secondary objectives included (1) to retrospectively apply the automated screening methodology to previously manually screened systematic reviews and (2) to characterize the performance of the automated screening methodology scoring algorithm in a simulation study. METHODS We implemented a Latent Dirichlet Allocation-based topic model to derive representative topics from the retrieved documents' titles and abstracts. The second step involves defining a score threshold for classifying the documents as relevant for full-text review or not. The score is derived based on a set of search keywords (often the database retrieval search terms). Two systematic review studies were retrospectively used to illustrate the methodology. RESULTS In one case study (helminth dataset), [Formula: see text] sensitivity compared to manual title and abstract screening was achieved. This is against a false positive rate of [Formula: see text]. For the second case study (Wilson disease dataset), a sensitivity of [Formula: see text] and specificity of [Formula: see text] were achieved. CONCLUSIONS Unsupervised title and abstract screening has the potential to reduce the workload involved in conducting systematic reviews. While sensitivity of the methodology on the tested data is low, approximately [Formula: see text] specificity was achieved. Users ought to keep in mind that potentially low sensitivity might occur. One approach to mitigate this might be to incorporate additional targeted search keywords, such as the indexing databases' terms, into the search term corpora. Moreover, automated screening can be used as an additional screener alongside the manual screeners.
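To make the thresholding and evaluation step described above concrete, the sketch below classifies documents as relevant when a relevance score meets a threshold and then compares those decisions with manual screening to obtain sensitivity and specificity. It is a simplified illustration under stated assumptions, not the authors' scoring algorithm: how the score is derived from the topic model and keywords is not reproduced here, and the function name `screening_metrics`, the scores, the labels and the threshold are all hypothetical.

```python
import math


def screening_metrics(scores, manual_labels, threshold):
    """Compare threshold-based screening with manual screening decisions.

    scores: per-document relevance scores (assumed: higher = more relevant).
    manual_labels: True if manual screeners marked the document as relevant.
    threshold: documents with score >= threshold are flagged for full-text review.
    Returns (sensitivity, specificity); a metric is NaN if its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for score, relevant in zip(scores, manual_labels):
        flagged = score >= threshold
        if flagged and relevant:
            tp += 1
        elif flagged and not relevant:
            fp += 1
        elif not flagged and relevant:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical scores and manual screening decisions for six documents.
scores = [0.8, 0.6, 0.3, 0.7, 0.2, 0.1]
manual_labels = [True, True, True, False, False, False]
sensitivity, specificity = screening_metrics(scores, manual_labels, threshold=0.5)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Lowering the threshold flags more documents, which typically raises sensitivity at the cost of specificity; this is the trade-off behind the low-sensitivity caution in the conclusions.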

12
The first year of the Covid-19 pandemic through the lens of r/Coronavirus subreddit: an exploratory study. HEALTH AND TECHNOLOGY 2023; 13:301-326. [PMID: 36846739 PMCID: PMC9942624 DOI: 10.1007/s12553-023-00734-6] [Received: 01/17/2022] [Accepted: 01/20/2023] [Indexed: 02/25/2023]
Abstract
Data This study looks at the content on Reddit's COVID-19 community, r/Coronavirus, to capture and understand the main themes and discussions around the global pandemic, and their evolution over the first year of the pandemic. It studies 356,690 submissions (posts) and 9,413,331 comments associated with the submissions, covering the period between 20th January 2020 and 31st January 2021. Methodology On each of these datasets we carried out analysis based on lexical sentiment and topics generated from unsupervised topic modelling. The study found that negative sentiments made up a higher ratio in submissions, while negative and positive sentiments appeared in the same ratio in the comments. Terms associated more positively or negatively were identified. Upon assessment of the upvotes and downvotes, this study also uncovered contentious topics, particularly "fake" or misleading news. Results Through topic modelling, 9 distinct topics were identified from submissions while 20 were identified from comments. Overall, this study provides a clear overview of the dominant topics and popular sentiments pertaining to the pandemic during the first year. Conclusion Our methodology provides an invaluable tool for governments and health decision makers and authorities to obtain a deeper understanding of the dominant public concerns and attitudes, which is vital for understanding, designing and implementing interventions for a global pandemic.

13
Trends and gaps in biodiversity and ecosystem services research: A text mining approach. AMBIO 2023; 52:81-94. [PMID: 36057041 PMCID: PMC9666618 DOI: 10.1007/s13280-022-01776-2] [Received: 10/12/2021] [Revised: 02/01/2022] [Accepted: 08/01/2022] [Indexed: 06/15/2023]
Abstract
Understanding the relationship between biodiversity conservation and ecosystem services concepts is essential for evidence-based policy development. We used text mining augmented by topic modelling to analyse abstracts of 15 310 peer-reviewed papers (from 2000 to 2020). We identified nine major topics; "Research & Policy", "Urban and Spatial Planning", "Economics & Conservation", "Diversity & Plants", "Species & Climate change", "Agriculture", "Conservation and Distribution", "Carbon & Soil & Forestry", "Hydro-& Microbiology". The topic "Research & Policy" performed highly, considering number of publications and citation rate, while in the case of other topics, the "best" performances varied, depending on the indicator applied. Topics with human, policy or economic dimensions had higher performances than the ones with 'pure' biodiversity and science. Agriculture dominated over forestry and fishery sectors, while some elements of biodiversity and ecosystem services were under-represented. Text mining is a powerful tool to identify relations between research supply and policy demand.

14
Public sentiment on the global outbreak of monkeypox: an unsupervised machine learning analysis of 352,182 twitter posts. Public Health 2022; 213:1-4. [PMID: 36308872 PMCID: PMC9597903 DOI: 10.1016/j.puhe.2022.09.008] [Received: 09/02/2022] [Accepted: 09/13/2022] [Indexed: 11/05/2022]
Abstract
OBJECTIVES This study aimed to study the public's sentiments on the current monkeypox outbreaks via an unsupervised machine learning analysis of social media posts. STUDY DESIGN This was an exploratory analysis of tweet sentiments. METHODS We extracted original tweets containing the terms 'monkeypox', 'monkey pox' or 'monkey_pox' and posted in the English language from 6 May 2022 (first case detected in the United Kingdom) to 23 July 2022 (when the World Health Organization declared Monkeypox to be a global health emergency). Retweets and duplicate tweets were excluded from the study. Bidirectional Encoder Representations from Transformers (BERT) Named Entity Recognition was applied, followed by topic modelling (specifically BERTopic) and manual thematic analysis by the study team, with independent reviews of the topic labels and themes. RESULTS Based on topic modelling and thematic analysis of a total of 352,182 Twitter posts, we derived five topics clustered into three major themes related to the public discourse on the ongoing outbreaks. These include concerns of safety, stigmatisation of minority communities, and a general lack of faith in public institutions. The public sentiments underscore growing (and existing) partisanship, personal health worries in relation to the evolving situation, as well as concerns about the media's portrayal of lesbian, gay, bisexual, transgender and queer and minority communities, which might further stigmatise these groups. CONCLUSIONS Monkeypox is an emerging infectious disease of public concern. Our study has highlighted important societal issues, including misinformation, political mistrust and anti-gay stigma, that should be sensitively considered when designing public health policies to contain the ongoing outbreaks.

15
Identifying learners' topical interests from social media content to enrich their course preferences in MOOCs using topic modeling and NLP techniques. EDUCATION AND INFORMATION TECHNOLOGIES 2022; 28:5567-5584. [PMID: 36373041 PMCID: PMC9638446 DOI: 10.1007/s10639-022-11373-1] [Received: 08/04/2022] [Accepted: 09/22/2022] [Indexed: 05/24/2023]
Abstract
Interests play an essential role in the process of learning; enriching learners' interests will therefore lead to an enhanced experience in MOOCs. Learners interact freely and spontaneously on social media through different forms of user-generated content, which contain hidden information that reveals their real interests and preferences. In this paper, we aim to identify and extract topical interests from the text content shared by learners on social media to enrich their course preferences in MOOCs. We apply an NLP pipeline and topic modeling techniques to the textual features using three well-known topic models: Latent Dirichlet Allocation, Latent Semantic Analysis, and BERTopic. The results of our experimentation have shown that BERTopic performed better on the scraped dataset.

16
An introduction to text analytics for educators. CURRENTS IN PHARMACY TEACHING & LEARNING 2022; 14:1319-1325. [PMID: 36280557 PMCID: PMC9904956 DOI: 10.1016/j.cptl.2022.09.005] [Received: 09/09/2021] [Revised: 07/23/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
OUR SITUATION Educators often find themselves in possession of large amounts of text-based materials, such as student reflections, narrative feedback, and assignments. While these materials can provide critical insight into topics of interest, they also require a substantial amount of time to read, interpret, and use. The purpose of this article is to describe and provide recommendations for text analytics. METHODOLOGICAL LITERATURE REVIEW An overview of text analytics is provided, including a brief history, common types of contemporary techniques, and the basic phases of text analytics. Several examples of common text analytics techniques are used to illustrate this approach. OUR RECOMMENDATIONS AND THEIR APPLICATIONS Practical recommendations are provided to support the use of text analytics in pharmacy education. These recommendations include: (1) clarify the purpose of the text analytics; (2) ensure the research questions are relevant and grounded in the literature; (3) develop a processing strategy and create a dictionary; (4) explore various tools for analysis and visualization; (5) establish tolerance for error; (6) train, calibrate, and validate the analytic strategy; and (7) collaborate and equip yourself. POTENTIAL IMPACT Text analytics provide a systematic approach to generating information from text-based materials. Several benefits to this approach are apparent, such as improving the efficiency of analyzing text and elucidating new knowledge. Despite recent developments in text analytics techniques, limitations to this approach remain. Efforts to improve usability and accessibility of text analytics remain ongoing, and pharmacy educators should position their work within the context of these limitations.
|
17
|
Temporal trends and spatial distribution of research topics in anthropogenic marine debris study: Topic modelling using latent Dirichlet allocation. MARINE POLLUTION BULLETIN 2022; 182:113917. [PMID: 35908484 DOI: 10.1016/j.marpolbul.2022.113917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
The release of anthropogenic marine debris (AMD) is one of the major environmental challenges of our time. In this study, a topic model called latent Dirichlet allocation (LDA) was used to infer the research topics on AMD and provide an overall picture of the research area. The LDA results showed that AMD research topics are mostly applied and belong to interdisciplinary or transdisciplinary research areas. Furthermore, the analysis of temporal trends showed that topics related to issues such as plastic pollution exhibit an upward trend, whereas those dealing with the spatiotemporal dynamics and distribution patterns of marine debris show a downward trend. The analysis of topic distribution across countries showed that research is scarce in landlocked countries. The findings of this study can serve as a map of the AMD research area for the various stakeholders concerned with marine debris issues.
|
18
|
Psycholinguistic changes in the communication of adolescent users in a suicidal ideation online community during the COVID-19 pandemic. Eur Child Adolesc Psychiatry 2022; 32:975-985. [PMID: 36018514 PMCID: PMC9415261 DOI: 10.1007/s00787-022-02067-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Accepted: 08/05/2022] [Indexed: 11/03/2022]
Abstract
Since the outbreak of the COVID-19 pandemic, increases in suicidal ideation and suicide attempts in adolescents have been registered. Many adolescents experiencing suicidal ideation turn to online communities for social support. In this retrospective observational study, we investigated the communication (language style, contents and user activity) in 7975 unique posts and 51,119 comments by N = 2862 active adolescent users in a large suicidal ideation support community (SISC) on the social media website reddit.com in the onset period of the COVID-19 pandemic. We found significant relative changes in language style markers for hopelessness such as negative emotion words (+10.00%) and positive emotion words (-3.45%) as well as for social disengagement such as social references (-8.63%) and 2nd person pronouns (-33.97%) since the outbreak of the pandemic. Using topic modeling with Latent Dirichlet Allocation (LDA), we identified significant changes in content for the topics Hopelessness (+23.98%), Suicide Methods (+17.11%), Social Support (-14.91%), and Reaching Out to users (-28.97%). Changes in user activity point to an increased expression of mental health issues and decreased engagement with other users. The results indicate a potential shift in communication patterns with more adolescent users expressing their suicidal ideation rather than relating with or supporting other users during the COVID-19 pandemic.
|
19
|
Latent DIRICHLET allocation (LDA) based information modelling on BLOCKCHAIN technology: a review of trends and research patterns used in integration. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 81:36805-36831. [PMID: 36035323 PMCID: PMC9391652 DOI: 10.1007/s11042-022-13500-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2021] [Revised: 09/16/2021] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
The past decade is known as the era of integrations, in which multiple technologies were integrated and new research trends emerged. The security of data and information in the digital world has been a challenge to everyone; Blockchain technology has attracted many researchers in these scenarios. This paper focuses on finding the current trends in Blockchain technology to help researchers select an area for future research. The data related to Blockchain technologies were collected from IEEE, Springer, ACM, and other digital databases. The resulting corpus is then used for topic modelling with Latent Dirichlet Allocation. The outcomes of the Latent Dirichlet Allocation model are analyzed based on the key terms and key documents extracted for each topic. All the topic solutions have been identified from the bag of words. The extracted topics are thereafter semantically mapped. Thus, based on the analysis of more than 900 papers, the most recent research trends are discussed in this paper, ultimately focusing on the areas that need more attention from the research community. A metadata analysis has also been carried out, evaluating research growth by year and by publication source. More than 15 research directions are elaborated in this paper, which can guide researchers to pursue research in specific trends and to find research gaps in the various technologies associated with Blockchain technology.
|
20
|
A macro perspective of the perceptions of the education system via topic modelling analysis. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 82:1783-1820. [PMID: 35702681 PMCID: PMC9186274 DOI: 10.1007/s11042-022-13202-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 01/05/2022] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
Education quality has become an important issue and has received considerable attention around the world, especially due to its relevant repercussions on the socio-economical development of society. In recent years, many nations have realized the need for a highly skilled workforce to thrive in the emerging knowledge-based economy. They have consequently adopted strategies to identify the lines of action to improve the education quality. In response to the government's efforts to improve the education quality in Colombia, this study examines the current perceptions of the education system from the perspective of key local stakeholders. Therefore, we used a survey that contained open-ended questions to collect information about the limitations and difficulties of the education process for several groups of participants. The collected answers were categorized into a variety of topics using a Latent Dirichlet Allocation based model. Consequently, the students', teachers' and parents' answers were analyzed separately to obtain a general landscape of the perceptions of the education system. Evaluation metrics, such as topic coherence, were quantitatively analyzed to assess the modelling performance. In addition, a methodology for the hyper-parameters setting and the final topic labelling was presented. The results suggest that topic modelling strategies are a viable alternative to identify strategic lines of action and to obtain a macro-perspective of the perceptions of the education system.
|
21
|
Mapping research on healthcare operations and supply chain management: a topic modelling-based literature review. ANNALS OF OPERATIONS RESEARCH 2022; 315:29-55. [PMID: 35382453 PMCID: PMC8972768 DOI: 10.1007/s10479-022-04596-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 02/14/2022] [Indexed: 06/14/2023]
Abstract
The literature on healthcare operations and supply chain management has seen unprecedented growth over the past two decades. This paper seeks to advance the body of knowledge on this topic by utilising a topic modelling-based literature review to identify the core topics, examine their dynamic changes, and identify opportunities for further research in the area. Based on an analysis of 571 articles published until 25 January 2022, we identify numerous popular topics of research in the area, including patient waiting time, COVID-19 pandemic, Industry 4.0 technologies, sustainability, risk and resilience, climate change, circular economy, humanitarian logistics, behavioural operations, service-ecosystem, and knowledge management. We reviewed the current literature around each topic and offer insights into which aspects of each topic have been studied, what the recent developments are, and where opportunities lie for more impactful future research. In doing so, this review helps advance contemporary scholarship on healthcare operations and supply chain management and offers resonant insights for researchers, research students, journal editors, and policymakers in the field.
|
22
|
Mapping the landscape of Consumer Food Waste. Appetite 2021; 168:105702. [PMID: 34555494 DOI: 10.1016/j.appet.2021.105702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Since 2015 there has been a surge of academic publications and citations focused on consumer food waste. To introduce a special issue of Appetite focused on the drivers of consumer food waste we perform a transdisciplinary and historical review of the literature through a co-citation network analysis and topic modelling approach. We show that the rapid increase in publications is largely attributable to an urgency caused by the Sustainable Development Goals and climate change. Topic modelling reveals that the dramatic quantitative increase of publications has also produced a variety of evolving themes, and that a metaphorical Cambrian Explosion is occurring after decades of academic inactivity. Network analysis results show that consumer food waste features in thousands of articles and hundreds of journals, but that the citation practices of academics are becoming highly concentrated, as 20% of journals attract over 80% of citations. Finally, by examining the burstiness and transdisciplinary structure of citation networks we show that though the field has historically been dominated by empirical articles, it is now starting to show signs of maturity as a flurry of review papers help to consolidate knowledge.
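The concentration figure quoted in this abstract (a fifth of journals attracting most citations) is a top-share statistic. The following is a minimal sketch of how such a share could be computed, not the authors' method; the function name `top_share` and the citation counts are hypothetical and chosen only for illustration.

```python
def top_share(citations_per_journal, top_fraction=0.2):
    """Return the share of all citations captured by the most-cited journals.

    `top_fraction` is the fraction of journals (ranked by citations) to keep.
    """
    if not citations_per_journal:
        raise ValueError("need at least one journal")
    ranked = sorted(citations_per_journal, reverse=True)
    # Keep at least one journal so tiny samples still give a defined share.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical citation counts for ten journals (illustrative only).
counts = [500, 320, 40, 35, 30, 25, 20, 15, 10, 5]
share = top_share(counts)
print(f"Top 20% of journals hold {share:.0%} of citations")
```

With evenly distributed citations the same function returns a share equal to `top_fraction`, so values well above it indicate concentration.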
|
23
|
Twitter data analysis to assess the interest of citizens on the impact of marine plastic pollution. MARINE POLLUTION BULLETIN 2021; 170:112620. [PMID: 34218034 DOI: 10.1016/j.marpolbul.2021.112620] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/15/2021] [Revised: 06/02/2021] [Accepted: 06/03/2021] [Indexed: 06/13/2023]
Abstract
Few studies have mined social media platforms to assess environmental concerns. In this study, Twitter was scraped to obtain a dataset of ~140,000 tweets related specifically to marine plastic pollution. The goal is to understand what kinds of user profiles are tweeting, and how and when they do so. In addition, topic modelling and graph theory techniques allowed us to identify the main concerns on this topic: i) impact on wildlife, ii) microplastics/water pollution, iii) estimates/reports, iv) legislation/protection, and v) recycling/cleaning initiatives. Results reveal that organizations involved in research and marine environmental awareness have little influence, so some guidelines are outlined that could help them adjust their communication plans. This is relevant for engaging society through reliable information, changing habits and reinforcing sustainable behaviour. A visualization tool has been created to analyze the results over time.
|
24
|
Identifying Covid-19 misinformation tweets and learning their spatio-temporal topic dynamics using Nonnegative Coupled Matrix Tensor Factorization. SOCIAL NETWORK ANALYSIS AND MINING 2021; 11:57. [PMID: 34149960 PMCID: PMC8204930 DOI: 10.1007/s13278-021-00767-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/18/2020] [Revised: 05/26/2021] [Accepted: 06/04/2021] [Indexed: 11/23/2022]
Abstract
Social media platforms like Twitter have become an easy portal for billions of people to connect and exchange their thoughts. Unfortunately, people commonly use these platforms to share misinformation which can influence other people adversely. The spread of misinformation is unavoidable in an extraordinary situation like Covid-19, and the consequences can be dreadful. This paper proposes a two-step ranking-based misinformation detection (RMiD) technique. Firstly, a novel ranking-based approach leveraging the scalable information retrieval infrastructure is applied to detect misinformation from a huge collection of unlabelled tweets based on a related but very small labelled misinformation data set. Secondly, the identified misinformation tweets are represented as a coupled matrix tensor model and Nonnegative Coupled Matrix Tensor Factorization is applied to learn their spatio-temporal topic dynamics. The experimental analysis shows that RMiD is capable of detecting misinformation with better coverage and less noise in comparison with existing techniques. Moreover, the coupled matrix tensor representation has improved the quality of topics discovered from unlabelled data by up to 4% by leveraging the semantic similarity of terms in labelled data. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s13278-021-00767-7.
|
25
|
Straws, seals, and supermarkets: Topics in the newspaper coverage of marine plastic pollution. MARINE POLLUTION BULLETIN 2021; 166:112211. [PMID: 33711608 DOI: 10.1016/j.marpolbul.2021.112211] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/26/2021] [Indexed: 06/12/2023]
Abstract
Media attention to marine plastic pollution is increasing, yet it is unclear which topics are being discussed. This paper analyses all 2019 news articles referencing marine plastics in the four leading UK online newspapers. Examining 943 articles in a structural topic model, this is the first analysis to depict what is being reported and how this varied according to political alignment (right vs. left-wing), type (broadsheet vs. tabloid), and publication date. We identified 36 topics, suggesting a large variety in the coverage, with plastic pollution ranging from the primary focus to only mentioned in passing. Greater emphasis was on explaining current issues of marine plastics, with limited reference to actionable reduction measures or producer responsibility. Many topics' prevalence varied across the media outlets. We discuss how this coverage varies across media outlets, and how it relates to a broader context (i.e. potential links to behaviour and current policy efforts).
|
26
|
Mind the gap: Developments in autonomous driving research and the sustainability challenge. JOURNAL OF CLEANER PRODUCTION 2020; 275:124087. [PMID: 32934442 PMCID: PMC7484706 DOI: 10.1016/j.jclepro.2020.124087] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/19/2020] [Revised: 08/25/2020] [Accepted: 08/31/2020] [Indexed: 06/11/2023]
Abstract
Scientific knowledge on autonomous-driving technology is expanding at a faster-than-ever pace. As a result, the likelihood of incurring information overload is particularly notable for researchers, who can struggle to overcome the gap between information processing requirements and information processing capacity. We address this issue by adopting a multi-granulation approach to latent knowledge discovery and synthesis in large-scale research domains. The proposed methodology combines citation-based community detection methods and topic modelling techniques to give a concise but comprehensive overview of how the autonomous vehicle (AV) research field is conceptually structured. Thirteen core thematic areas are extracted and presented by mining the large data-rich environments resulting from 50 years of AV research. The analysis demonstrates that this research field is strongly oriented towards examining the technological developments needed to enable the widespread rollout of AVs, whereas it largely overlooks the wide-ranging sustainability implications of this sociotechnical transition. On account of these findings, we call for a broader engagement of AV researchers with the sustainability concept and we invite them to increase their commitment to conducting systematic investigations into the sustainability of AV deployment. Sustainability research is urgently required to produce an evidence-based understanding of what new sociotechnical arrangements are needed to ensure that the systemic technological change introduced by AV-based transport systems can fulfill societal functions while meeting the urgent need for more sustainable transport solutions.
|
27
|
Grounded reality meets machine learning: A deep-narrative analysis framework for energy policy research. ENERGY RESEARCH & SOCIAL SCIENCE 2020; 69:101704. [PMID: 33145178 PMCID: PMC7563684 DOI: 10.1016/j.erss.2020.101704] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/01/2020] [Revised: 07/12/2020] [Accepted: 07/15/2020] [Indexed: 06/11/2023]
Abstract
Text-based data sources like narratives and stories have become increasingly popular as critical insight generators in energy research and social science. However, their implications in policy application usually remain superficial and fail to fully exploit the state-of-the-art resources which the digital era holds for text analysis. This paper illustrates the potential of deep-narrative analysis in energy policy research using text analysis tools from the cutting-edge domain of computational social sciences, notably topic modelling. We argue that a nested application of topic modelling and grounded theory in narrative analysis promises advances in areas where manual-coding driven narrative analysis has traditionally struggled with directionality biases, scaling, systematisation and repeatability. The nested application of the topic model and the grounded theory goes beyond the frequentist approach of narrative analysis and introduces insight generation capabilities based on the probability distribution of words and topics in a text corpus. In this manner, our proposed methodology deconstructs the corpus and enables the analyst to answer research questions based on the foundational element of the text data structure. We verify theoretical compatibility through a meta-analysis of a state-of-the-art bibliographic database on energy policy, narratives and computational social science. Furthermore, we establish a proof-of-concept using a narrative-based case study on energy externalities in slum rehabilitation housing in Mumbai, India. We find that the nested application helps fill the gap in the literature on the need for multidisciplinary methodologies that can systematically include qualitative evidence in policymaking.
|
28
|
Beyond the topics: how deep learning can improve the discriminability of probabilistic topic modelling. PeerJ Comput Sci 2020; 6:e252. [PMID: 33816904 PMCID: PMC7924555 DOI: 10.7717/peerj-cs.252] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/03/2018] [Accepted: 12/23/2019] [Indexed: 11/20/2022]
Abstract
The article presents a discriminative approach to complement the unsupervised probabilistic nature of topic modelling. The framework transforms the probabilities of the topics per document into class-dependent deep learning models that extract highly discriminatory features suitable for classification. The framework is then used for sentiment analysis with minimum feature engineering. The approach transforms the sentiment analysis problem from the word/document domain to the topics domain, making it more robust to noise and incorporating complex contextual information that is not represented otherwise. A stacked denoising autoencoder (SDA) is then used to model the complex relationship among the topics per sentiment with minimum assumptions. To achieve this, a distinct topic model and SDA are built per sentiment polarity, with an additional decision layer for classification. The framework is tested on a comprehensive collection of benchmark datasets that vary in sample size, class bias and classification task. A significant improvement over the state of the art is achieved without the need for sentiment lexica or over-engineered features. A further analysis is carried out to explain the observed improvement in accuracy.
|
29
|
Mapping research in assisted reproduction worldwide. Reprod Biomed Online 2019; 40:71-81. [PMID: 31862416 DOI: 10.1016/j.rbmo.2019.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/19/2019] [Revised: 10/17/2019] [Accepted: 10/18/2019] [Indexed: 10/25/2022]
Abstract
RESEARCH QUESTION What are the current research trends in human assisted reproduction around the world? DESIGN An analysis of 26,000+ scientific publications (articles, letters and reviews) produced worldwide between 2005 and 2016. The corpus of publications indexed in PubMed was obtained by combining the Medical Subject Heading (MeSH) terms: 'Reproductive techniques', 'Reproductive medicine', 'Reproductive health', 'Fertility', 'Infertility' and 'Germ cells'. An analysis was then carried out using text mining algorithms to obtain the main topics of interest. RESULTS A total of 44 main topics were identified, which were then further grouped into 11 categories: 'Laboratory techniques', 'Male factor', 'Quality of ART, ethics and law', 'Female factor', 'Public health and infectious diseases', 'Basic research and genetics', 'Pregnancy complications and risks', 'General - infertility & ART', 'Psychosocial aspects', 'Cancer' and 'Research methodology'. The USA was the leading country in terms of number of publications, followed by the UK, China and France. Research content in high-income countries is fairly homogeneous across categories and it is dominated by 'Laboratory techniques' in Western-Southern Europe, and by 'Quality of ART, ethics and law' in North America, Australia and New Zealand. 'Laboratory techniques' is also the most abundant category on a yearly basis. CONCLUSIONS This study identifies the current hot topics on human assisted reproduction worldwide and their temporal trends for 2005-2016. This provides an innovative picture of the current research that could help explore the areas where further research is needed.
|
30
|
Tracking biomedicalization in the media: Public discourses on health and medicine in the UK and Italy, 1984-2017. Soc Sci Med 2019; 243:112621. [PMID: 31677575 DOI: 10.1016/j.socscimed.2019.112621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/05/2019] [Revised: 10/13/2019] [Accepted: 10/17/2019] [Indexed: 10/25/2022]
Abstract
This article examines historical trends in the reporting of health, illness and medicine in UK and Italian newspapers from 1984 to 2017. It focuses on the increasing "biomedicalization" of health reporting and the framing of health and medicine as a matter of technoscientific interventions. Methodologically, we relied on two large datasets consisting of all the health- and medicine-related articles published in the online archives of The Guardian (UK) and la Repubblica (Italy). These articles underwent a quantitative analysis, based on topic modelling techniques, to identify and analyse relevant topics in the datasets. Moreover, we developed some synthetic indices to support the analysis of how medical and health news are "biomedicalized" in media coverage. Theoretically, we emphasise that media represent a constitutive environment in shaping biomedicalization processes. Our analyses show that across the period under scrutiny, biomedicalization is a relevant, even if sometimes ambivalent, frame in the media sphere, placing growing centrality on three dimensions: i) health and well-being as a matter of individual commitment to self-monitoring and self-surveillance; ii) biomedicine as a large technoscientific enterprise emerging from the entanglement between research fields and their technological embodiments; iii) the multiverse reforms of welfare systems in facing the trade-off between universal health coverage and the need to render the national healthcare system more sustainable and compatible with non-expansionary monetary policies and austerity approaches in managing state government budgets.
|
31
|
Abstract
Background Social media plays an increasingly important role in health and healthcare research owing to the fast development of internet communication and information exchange. This paper conducts a bibliometric analysis to discover the thematic change and evolution of the field of utilizing social media for healthcare research. Methods Based on 4361 publications from both Web of Science and PubMed during the years 2008–2017, the analysis utilizes methods including topic modelling and science mapping analysis. Results Utilizing social media for healthcare research has attracted increasing attention from scientific communities. Journal of Medical Internet Research is the most prolific journal, with the USA dominating the research. Overall, major research themes such as YouTube analysis and Sex event are revealed. Themes in each time period and how they evolve over time are also detected. Conclusions This systematic mapping of the research themes and research areas helps identify research interests and how they evolve over time, and provides insight into future research directions. Electronic supplementary material The online version of this article (10.1186/s12911-019-0757-4) contains supplementary material, which is available to authorized users.
|
32
|
A Robust User Sentiment Biterm Topic Mixture Model Based on User Aggregation Strategy to Avoid Data Sparsity for Short Text. J Med Syst 2019; 43:93. [PMID: 30834466 DOI: 10.1007/s10916-019-1225-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/08/2019] [Accepted: 02/21/2019] [Indexed: 11/27/2022]
Abstract
Sentiment analysis is the process of computationally identifying the opinions a writer expresses in a short text or piece of feedback about a particular topic, product, or service. A short review from a user can help a business understand the user's attitude and thereby predict customer behaviour, which substantially improves quality-of-service parameters. The proposed Robust User Sentiment Biterm Topic Mixture (RUSBTM) model discovers user preferences and their sentiment orientation for effective topic modelling, using biterms (word pairs) from the short texts about a particular venue. Since short reviews suffer from data sparsity, a user aggregation strategy is adopted to form pseudo-documents, and the word-pair set is created for the whole corpus. RUSBTM learns topics by generating word co-occurrence patterns, thereby inferring topics with rich corpus-level information. By analysing the sentiments of the paired words and their corresponding topics in the review corpus of a particular venue, predictions can be made that portray the user's interests, preferences and expectations of that venue. The RUSBTM model proved to be more robust, and the extracted topics are more coherent and informative. The method also uses accurate sentiment polarity techniques to capture the sentiment orientation, and the model outperforms other state-of-the-art methods.
|
33
|
Content based medical image retrieval using topic and location model. J Biomed Inform 2019; 91:103112. [PMID: 30738189 DOI: 10.1016/j.jbi.2019.103112] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/03/2018] [Revised: 12/23/2018] [Accepted: 01/25/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND AND OBJECTIVE Retrieval of medical images from an anatomically diverse dataset is a challenging task. The objective of the present study is to analyse an automated medical image retrieval system that incorporates topic and location probabilities to enhance performance. MATERIALS AND METHODS In this paper, we present an automated medical image retrieval system using a Topic and Location Model. The topic information is generated using the Guided Latent Dirichlet Allocation (GuidedLDA) method. A novel Location Model is proposed to incorporate the spatial information of visual words. We also introduce a new metric called position weighted Precision (wPrecision) to measure the rank order of the retrieved images. RESULTS Experiments on two large medical image datasets, IRMA 2009 and the Multimodal dataset, revealed that the proposed method outperforms existing medical image retrieval systems in terms of Precision and Mean Average Precision. The proposed method achieved better Mean Average Precision (86.74%) than recent medical image retrieval systems on the Multimodal dataset with 7200 images. The proposed system achieved better Precision (97.5%) for the top ten images than recent medical image retrieval systems on the IRMA 2009 dataset with 14,410 images. CONCLUSION Supplementing the Topic Model with spatial details of visual words enhances the retrieval efficiency of medical images from large repositories. Such automated medical image retrieval systems can be used to assist physicians in retrieving medical images with better precision than state-of-the-art retrieval systems.
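The two standard metrics named in this abstract, Precision for the top ten images and Mean Average Precision, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and the relevance flags are hypothetical, average precision is normalised by the number of relevant items actually retrieved (one common variant), and the paper's own wPrecision metric is not reproduced because the abstract does not define it.

```python
def precision_at_k(relevant_flags, k):
    """Fraction of the first k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant_flags[:k]) / k


def average_precision(relevant_flags):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Average of average_precision over several queries."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(average_precision(run) for run in runs) / len(runs)


# Hypothetical ranked result list: 1 = relevant image, 0 = not relevant.
ranking = [1, 0, 1, 1, 0]
print(precision_at_k(ranking, 3))
print(average_precision(ranking))
```

Precision at k ignores the order within the top k, whereas average precision rewards rankings that place relevant images earlier, which is why retrieval studies often report both.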
|
34
|
A comparative quantitative study of utilizing artificial intelligence on electronic health records in the USA and China during 2008-2017. BMC Med Inform Decis Mak 2018; 18:117. [PMID: 30526643 PMCID: PMC6284279 DOI: 10.1186/s12911-018-0692-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/17/2022]
Abstract
BACKGROUND The application of artificial intelligence techniques to processing electronic health records data plays an increasingly significant role in advancing clinical decision support. This study conducts a quantitative comparison of research on utilizing artificial intelligence on electronic health records in the USA and China to discover their research similarities and differences. METHODS Publications from both Web of Science and PubMed are retrieved to explore the research status and academic performance of the two countries quantitatively. Bibliometrics, geographic visualization, collaboration degree calculation, social network analysis, latent Dirichlet allocation, and affinity propagation clustering are applied to analyze research quantity, collaboration relations, and hot research topics. RESULTS There are 1031 publications from the USA and 173 publications from China during the 2008-2017 period. The annual numbers of publications from the USA and China increase polynomially. JAMIA with 135 publications and JBI with 13 publications are the most prolific journals for the USA and China, respectively. Harvard University with 101 publications and Zhejiang University with 12 publications are the most prolific affiliations for the USA and China, respectively. Massachusetts is the most prolific region for the USA with 211 publications, while for China, Taiwan ranks first with 47 publications. China has relatively higher institutional and international collaboration. Nine main research areas are identified for the USA, compared with 7 for China. CONCLUSIONS There is a steadily growing presence and increasing visibility of utilizing artificial intelligence on electronic health records in the USA and China over the years. The results of the study demonstrate the research similarities and differences, as well as the strengths and weaknesses, of the two countries.
|
35
|
An illustrated approach to Soft Textual Cartography. APPLIED NETWORK SCIENCE 2018; 3:27. [PMID: 30839805 PMCID: PMC6214316 DOI: 10.1007/s41109-018-0087-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2018] [Accepted: 07/18/2018] [Indexed: 06/09/2023]
Abstract
We propose and illustrate an approach of Soft Textual Cartography consisting in the clustering of regions by taking into account both their spatial relationships and their textual description within a corpus. We reduce large geo-referenced textual content into topics and merge them with their spatial configuration to reveal spatial patterns. The strategy consists in constructing a complex weighted network, reflecting the geographical layout, and whose nodes are further characterised by their thematic dissimilarity, extracted from topic modelling. A soft k-means procedure, taking into account both aspects through expectation maximisation on Gaussian mixture models and label propagation, converges towards a soft membership, to be further compared with expert knowledge on regions. Application to the Wikipedia pages of Swiss municipalities demonstrates the potential of the approach, revealing textual autocorrelation and associations with official classifications. The synergy of the spatial and textual aspects appears promising in topic interpretation and geographical information retrieval, and able to incorporate expert knowledge through the choice of the initial membership.
|
36
|
Analysing the opinions of UK veterinarians on practice-based research using corpus linguistic and mathematical methods. Prev Vet Med 2017; 150:60-69. [PMID: 29406085 DOI: 10.1016/j.prevetmed.2017.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/06/2017] [Revised: 11/20/2017] [Accepted: 11/22/2017] [Indexed: 11/18/2022]
Abstract
The use of corpus linguistic techniques and other related mathematical analyses has rarely, if ever, been applied to qualitative data collected from the veterinary field. The aim of this study was to explore the use of a combination of corpus linguistic analyses and mathematical methods to investigate a free-text questionnaire dataset collected from 3796 UK veterinarians on evidence-based veterinary medicine, specifically, attitudes towards practice-based research (PBR) and improving the veterinary knowledge base. The corpus methods of key word, concordance and collocate analyses were used to identify patterns of meanings within the free text responses. Key words were determined by comparing the questionnaire data with a wordlist from the British National Corpus (representing general English text) using cross-tabs and log-likelihood comparisons to identify words that occur significantly more frequently in the questionnaire data. Concordance and collocation analyses were used to account for the contextual patterns in which such key words occurred, involving qualitative analysis and Mutual Information Analysis (MI3). Additionally, a mathematical topic modelling approach was used as a comparative analysis; words within the free text responses were grouped into topics based on their weight or importance within each response to find starting points for analysis of textual patterns. Results generated from using both qualitative and quantitative techniques identified that the perceived advantages of taking part in PBR centred on the themes of improving knowledge of both individuals and of the veterinary profession as a whole (illustrated by patterns around the words learning, improving, contributing). Time constraints (lack of time, time issues, time commitments) were the main concern of respondents in relation to taking part in PBR. Opinions of what vets could do to improve the veterinary knowledge base focussed on the collecting and sharing of information (record, report), particularly recording and discussing clinical cases (interesting cases), and undertaking relevant continuing professional development activities. The approach employed here demonstrated how corpus linguistics and mathematical methods can help to both identify and contextualise relevant linguistic patterns in the questionnaire responses. The results of the study inform those seeking to coordinate PBR initiatives about the motivators of veterinarians to participate in such initiatives and what concerns need to be addressed. The approach used in this study demonstrates a novel way of analysing textual data in veterinary research.
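The log-likelihood key word comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the widely used two-corpus log-likelihood (G2) keyness statistic; the abstract does not state the exact formula or software used, and the function name and all counts below are hypothetical.

```python
import math

def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood (G2) for one word.

    freq_*: occurrences of the word in each corpus
    size_*: total number of tokens in each corpus
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: a word seen 100 times in 10,000 questionnaire tokens
# and 50 times in 10,000 reference-corpus tokens.
score = log_likelihood_keyness(100, 10_000, 50, 10_000)
print(f"G2 = {score:.2f}")
```

A G2 value above roughly 3.84 corresponds to p < 0.05 at one degree of freedom, which is the usual threshold for treating a word as a candidate key word.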
|
37
|
What's all the talk about? Topic modelling in a mental health Internet support group. BMC Psychiatry 2016; 16:367. [PMID: 27793131 PMCID: PMC5084325 DOI: 10.1186/s12888-016-1073-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/18/2016] [Accepted: 10/17/2016] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND The majority of content in an Internet Support Group (ISG) is contributed by 1% of the users ('super users'). Computational methods, such as topic modelling, can provide a large-scale quantitative objective description of this content. Such methods may provide a new perspective on the nature of engagement on ISGs including the role of super users and their possible effect on other users. METHODS A topic model was computed for all posts (N = 131,004) in the ISG BlueBoard using Latent Dirichlet Allocation. A model containing 25 topics was selected on the basis of intelligibility as determined by diagnostic metrics and qualitative investigation. This model yielded 21 substantive topics for further analysis. Two chi-square tests were conducted separately for each topic to ascertain: (i) if the odds of super users' and other users' posting differed for each topic; and (ii) if for super users the odds of posting differed depending on whether the response was to a super user or to another user. RESULTS The 21 substantive topics covered a range of issues related to mental health and peer-support. There were significantly higher odds that super users wrote content on 13 topics, with the greatest effects being for Parenting Role (OR [95%CI] = 7.97 [7.85-8.10]), Co-created Fiction (4.22 [4.17-4.27]), Mental Illness (3.13 [3.11-3.16]) and Positive Change (2.82 [2.79-2.84]). There were significantly lower odds for super users on 7 topics, with the greatest effects being for the topics Depression (OR = 0.27 [0.27-0.28]), Medication (0.36 [0.36-0.37]), Therapy (0.55 [0.54-0.55]) and Anxiety (0.55 [0.55-0.55]). However, super users were significantly more likely to write content on 5 out of these 7 topics when responding to other users than when responding to fellow super users. CONCLUSIONS The findings suggest that super users serve the role of emotionally supportive companions with a focus on topics broadly resembling the consumer/carer model of recovery. Other users engage in topics with a greater focus on experiential knowledge, disclosure and informational support, a pattern resembling the clinical symptom-focussed approach to recovery. However, super users modify their content in response to other users in a manner consistent with being 'active help providers'.
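The per-topic chi-square tests and odds ratios reported in the abstract above can be illustrated with a minimal sketch on a 2x2 table. The Wald (log-scale) confidence interval is an assumption, since the abstract does not say how the intervals were computed, and the function names and counts are hypothetical rather than taken from the study.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: super-user posts on the topic    b: super-user posts on other topics
    c: other-user posts on the topic    d: other-user posts on other topics
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); assumes all four cells are non-zero.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction), 1 degree of freedom."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts, not taken from the study.
odds_ratio, low, high = odds_ratio_with_ci(20, 80, 10, 90)
chi2 = chi_square_2x2(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f} [{low:.2f}-{high:.2f}], chi-square = {chi2:.2f}")
```

An odds ratio above 1 would mean super users are more likely than other users to post on the topic; an interval that excludes 1 corresponds to a significant difference at the chosen level.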
|
38
|
Complex temporal topic evolution modelling using the Kullback-Leibler divergence and the Bhattacharyya distance. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2016; 2016:16. [PMID: 27746813 PMCID: PMC5042987 DOI: 10.1186/s13637-016-0050-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/06/2016] [Accepted: 09/12/2016] [Indexed: 11/10/2022]
Abstract
The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within them. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation, and topic convergence and merging, in addition to the more widely recognized emergence, disappearance, and gradual evolution. The proposed framework is evaluated on a public medical literature corpus.
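The two similarity measures named in this entry's title can be illustrated with a minimal sketch that compares two topics, each represented as a probability distribution over the same vocabulary. How the study turns these values into edges of its temporal similarity graph is not described in the abstract, so the smoothing constant, function names, and example distributions below are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero in q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient; infinite when the supports are disjoint."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient) if coefficient > 0 else math.inf

# Hypothetical word distributions of one topic in two consecutive epochs.
topic_epoch_1 = [0.5, 0.5]
topic_epoch_2 = [0.9, 0.1]
print(kl_divergence(topic_epoch_1, topic_epoch_2))
print(bhattacharyya_distance(topic_epoch_1, topic_epoch_2))
```

The Kullback-Leibler divergence is asymmetric, so swapping the arguments generally changes the result, whereas the Bhattacharyya distance is symmetric.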
|
39
|
Topic detection using paragraph vectors to support active learning in systematic reviews. J Biomed Inform 2016; 62:59-65. [PMID: 27293211 PMCID: PMC4981645 DOI: 10.1016/j.jbi.2016.06.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 01/26/2016] [Revised: 04/04/2016] [Accepted: 06/05/2016] [Indexed: 12/02/2022]
Abstract
We propose a topic detection method based on paragraph vectors. The method is integrated with an active learner to accelerate citation screening. The method outperforms LDA when applied to clinical and public health reviews.
Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space, and cluster the documents into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). Results obtained demonstrate that our method is able to achieve a high sensitivity of eligible studies and a significantly reduced manual annotation cost when compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/.
|