1. Susnjak T. Applying BERT and ChatGPT for Sentiment Analysis of Lyme Disease in Scientific Literature. Methods Mol Biol 2024; 2742:173-183. [PMID: 38165624; DOI: 10.1007/978-1-0716-3561-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]
Abstract
This chapter presents a practical guide to conducting sentiment analysis with Natural Language Processing (NLP) techniques on tick-borne disease text. The aim is to demonstrate how the presence of bias in the discourse surrounding chronic manifestations of the disease can be evaluated. Using a dataset of 5643 abstracts on chronic Lyme disease collected from scientific journals, the chapter demonstrates in Python the steps for conducting sentiment analysis with pretrained language models, and the process of validating the preliminary results using both interpretable machine learning tools and a novel methodology that leverages emerging state-of-the-art large language models such as ChatGPT. This serves as a useful resource for researchers and practitioners interested in applying NLP techniques to sentiment analysis in the medical domain.
Affiliation(s)
- Teo Susnjak
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand.

2. Garg K, Fajardo-Yamamoto LM, Rojas-Castro FC, Susnjak T, Gilbert L. Building a Binary Classification Machine-Learning Model: A Guide to Predicting Participation in a Lyme Disease Program at a Medical Institute. Methods Mol Biol 2024; 2742:185-237. [PMID: 38165625; DOI: 10.1007/978-1-0716-3561-2_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]
Abstract
The field of data analysis, preparation, and machine learning is rapidly expanding, offering numerous libraries and resources for exploration. Researchers gain knowledge through various channels, but few resources provide a comprehensive framework for building machine-learning models. To fill this gap, we present a step-by-step framework for constructing a robust Random Forest classification model. Using the trained model, we predict whether individuals visiting Sanoviv Medical Institute between 2020 and 2023 participated in the Lyme disease program, based on their age, symptoms, blood count, and chemistry results. While not exhaustive, the methods in each step provide a valuable starting point for researchers, promoting an understanding of the fundamental approach to model creation. The framework encourages researchers to explore beyond the outlined techniques, fostering innovation and experimentation.
Affiliation(s)
- Teo Susnjak
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

3. Susnjak T, Maddigan P. Forecasting patient flows with pandemic induced concept drift using explainable machine learning. EPJ Data Sci 2023; 12:11. [PMID: 37122585; PMCID: PMC10119825; DOI: 10.1140/epjds/s13688-023-00387-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Accepted: 04/06/2023] [Indexed: 05/03/2023]
Abstract
Accurately forecasting patient arrivals at Urgent Care Clinics (UCCs) and Emergency Departments (EDs) is important for effective resourcing and patient care. However, correctly estimating patient flows is not straightforward, since it depends on many drivers. The predictability of patient arrivals has recently been further complicated by COVID-19 pandemic conditions and the resulting lockdowns. This study investigates how a suite of novel quasi-real-time variables, such as Google search terms, pedestrian traffic, the prevailing incidence levels of influenza and the COVID-19 Alert Level indicators, can both generally improve the forecasting models of patient flows and effectively adapt the models to the unfolding disruptions of pandemic conditions. This research also uniquely contributes to the body of work in this domain by employing tools from the eXplainable AI field to investigate the internal mechanics of the models more deeply than has previously been done. The Voting ensemble-based method, which combines machine learning and statistical techniques, was the most reliable in our experiments. Our study showed that the prevailing COVID-19 Alert Level feature, Google search terms and pedestrian traffic were together effective at producing generalisable forecasts. The implications of this study are that proxy variables can effectively augment standard autoregressive features to ensure accurate forecasting of patient flows. The experiments showed that the proposed features are potentially effective model inputs for preserving forecast accuracies in the event of future pandemic outbreaks.
Affiliation(s)
- Teo Susnjak
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Paula Maddigan
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

4. Bunker R, Susnjak T. The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review. J Artif Intell Res 2022. [DOI: 10.1613/jair.1.13509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022] [Open Access]
Abstract
Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features make between-study comparisons difficult, it is possible, as we discuss, to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies; however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning.

5. Susnjak T, Ramaswami GS, Mathrani A. Learning analytics dashboard: a tool for providing actionable insights to learners. Int J Educ Technol High Educ 2022; 19:12. [PMID: 35194560; PMCID: PMC8853217; DOI: 10.1186/s41239-021-00313-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/23/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
This study investigates current approaches to learning analytics (LA) dashboarding while highlighting challenges faced by education providers in their operationalization. We analyze recent dashboards for their ability to provide actionable insights that prompt learners to make informed adjustments to their learning habits. Our study finds that most LA dashboards merely employ surface-level descriptive analytics, while only a few go further and use predictive analytics. In response to the gaps identified in recently published dashboards, we propose a state-of-the-art dashboard that not only leverages descriptive analytics components, but also integrates machine learning in a way that enables both predictive and prescriptive analytics. We demonstrate how emerging analytics tools can be used to enable learners to adequately interpret the predictive model's behavior and, more specifically, to understand how a predictive model arrives at a given prediction. We highlight how these capabilities build trust and satisfy emerging regulatory requirements surrounding predictive analytics. Additionally, we show how data-driven prescriptive analytics can be deployed within dashboards to provide concrete advice to learners, thereby increasing the likelihood of triggering behavioral changes. Our proposed dashboard is the first of its kind in terms of the breadth of analytics it integrates, and is currently deployed for trials at a higher education institution.
Affiliation(s)
- Teo Susnjak
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Anuradha Mathrani
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

6. Wanniarachchi VU, Scogings C, Susnjak T, Mathrani A. Fat stigma and body objectification: A text analysis approach using social media content. Digit Health 2022; 8:20552076221117404. [PMID: 35990109; PMCID: PMC9386857; DOI: 10.1177/20552076221117404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 07/15/2022] [Indexed: 11/21/2022] [Open Access]
Abstract
This study investigates how female and male genders are positioned in fat-stigmatising discourses conducted over social media. A weight-based linguistic corpus, extracted from three popular social media (SM) outlets (Twitter, YouTube and Reddit), was examined for fat-stigmatising content. A mixed-method analysis comprising sentiment analysis, word co-occurrences and qualitative analysis assisted our investigation of the corpus for body objectification themes and gender-based differences. Objectification theory provided the underlying framework for examining the experiential consequences of being fat across both genders. Five objectifying themes emerged from the analysis: attractiveness, physical appearance, lifestyle choices, health, and psychological well-being. A deeper investigation into further facets of the social interaction data revealed overall positive and negative attitudes towards obesity, which informed existing notions of gendered body objectification and weight/fat stigmatisation. Our findings provide a holistic outlook on weight/fat-stigmatising content posted online, which can further inform policymakers in planning suitable props to facilitate more inclusive SM spaces. This study showcases how lexical analytics can be conducted by combining a variety of data mining methods to draw out insightful subject-related themes that add to the existing knowledge base; it therefore has both practical and theoretical implications.
Affiliation(s)
- Chris Scogings
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Teo Susnjak
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Anuradha Mathrani
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

7. Mathrani A, Susnjak T, Ramaswami G, Barczak A. Perspectives on the challenges of generalizability, transparency and ethics in predictive learning analytics. Computers and Education Open 2021. [DOI: 10.1016/j.caeo.2021.100060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022] [Open Access]

8. Ramaswami G, Susnjak T, Mathrani A, Lim J, Garcia P. Using educational data mining techniques to increase the prediction accuracy of student academic performance. ILS 2019. [DOI: 10.1108/ils-03-2019-0017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/17/2022]
Abstract
Purpose
This paper aims to evaluate educational data mining methods to increase the predictive accuracy of student academic performance for a university course setting. Student engagement data collected in real time and over self-paced activities assisted this investigation.
Design/methodology/approach
Classification data mining techniques have been adapted to predict students’ academic performance. Four algorithms, Naïve Bayes, Logistic Regression, k-Nearest Neighbour and Random Forest, were used to generate predictive models. Process mining features have also been integrated to determine their effectiveness in improving the accuracy of predictions.
Findings
The results show that when general features derived from student activities are combined with process mining features, there is some improvement in the accuracy of the predictions. Of the four algorithms, the study finds Random Forest to be more accurate than the other three in a statistically significant way. The best-performing classifier model is then validated by predicting students’ final-year academic performance for the subsequent year.
Research limitations/implications
The present study was limited to datasets gathered over one semester and for one course. The outcomes would be more promising if the dataset comprised more courses. Moreover, the addition of demographic information could have provided further representations of students’ performance. Future work will address some of these limitations.
Originality/value
The model developed from this research can provide value to institutions in making process- and data-driven predictions on students’ academic performances.

9. Suriadi S, Susnjak T, Ponder-Sutton AM, Watters PA, Schumacher C. Using Data-Driven and Process Mining Techniques for Identifying and Characterizing Problem Gamblers in New Zealand. CSIMQ 2016. [DOI: 10.7250/csimq.2016-9.03] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] [Open Access]

10. Reyes NH, Barczak AL, Susnjak T, Sinčák P, Vaščák J. Real-Time Fuzzy Logic-Based Hybrid Robot Path-Planning Strategies for a Dynamic Environment. Robotics 2013. [DOI: 10.4018/978-1-4666-4607-0.ch076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] [Open Access]
Abstract
This chapter explores the intricacies of developing a hybrid system for real-time autonomous robot navigation, with target pursuit and obstacle avoidance behaviour, in a dynamic environment. Three complete systems are described: a cascade of four fuzzy systems, a hybrid fuzzy A* system, and a hybrid fuzzy A* system with a Voronoi diagram. A highly reconfigurable integration architecture is presented that allows for harmonious interplay between the different component algorithms, with the option of engaging or disengaging each of them from the system. The use of both global and local information about the environment is examined, as is an additional optimal global path-planning layer. Moreover, the chapter illustrates how a fuzzy system design approach can take advantage of symmetry in the input space to cut down the number of rules and membership functions without sacrificing control precision. The efficiency of all the algorithms is demonstrated by employing them in a simulation of a real-world system: the robot soccer game. Results indicate that the hybrid system can generate smooth, near-shortest paths, as well as near-shortest-safest paths, when all component algorithms are activated. A systematic approach to calibrating the system is also provided.
Affiliation(s)
- Ján Vaščák
- Technical University of Košice, Slovakia