126
Customer decision-making analysis based on big social data using machine learning: a case study of hotels in Mecca. Neural Comput Appl 2023; 35:4701-4722. [PMID: 36340596 PMCID: PMC9616417 DOI: 10.1007/s00521-022-07992-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2021] [Accepted: 10/21/2022] [Indexed: 02/01/2023]
Abstract
Big social data and user-generated content have emerged as important sources of timely and rich knowledge to detect customers' behavioral patterns. Revealing customer satisfaction through the use of user-generated content has been a significant issue in business, especially in the tourism and hospitality context. There have been many studies on customer satisfaction that take quantitative survey approaches. However, revealing customer satisfaction using big social data in the form of eWOM (electronic word of mouth) can be an effective way to better understand customers' demands. In this study, we aim to develop a hybrid methodology based on supervised learning, text mining, and segmentation machine learning approaches to analyze big social data on travelers' decision-making regarding hotels in Mecca, Saudi Arabia. To do so, we use support vector regression with sequential minimal optimization (SMO), latent Dirichlet allocation (LDA), and k-means approaches to develop the hybrid method. We collect data from travelers' online reviews of Mecca hotels on TripAdvisor. The data are segmented, and travelers' satisfaction is revealed for each segment based on their online reviews of hotels. The results show that the method is effective for big social data analysis and traveler segmentation in Mecca hotels. The results are discussed, and several recommendations and strategies for hotel managers are provided to enhance their service quality and improve customer satisfaction.
127
A survey on the use of association rules mining techniques in textual social media. Artif Intell Rev 2023; 56:1175-1200. [PMID: 35578652 PMCID: PMC9096767 DOI: 10.1007/s10462-022-10196-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
The presence of social media in our lives has grown markedly over the last decade. This has led to a proliferation of data mining tools aimed at extracting knowledge from these data sources. One of the greatest challenges in this area is obtaining this knowledge without training processes, which require structured information and pre-labelled datasets. This is where unsupervised data mining techniques come in. These techniques can extract value from unstructured, unlabelled data, providing useful solutions to enhance the decision-making process. In this paper, we first address the problem of social media mining and the need for unsupervised techniques, in particular association rules, to tackle it. We then give a broad overview of the applications of association rules in social media mining, specifically to the problem of mining textual entities such as tweets. We also discuss the strengths and weaknesses of using association rules for different tasks in textual social media. Finally, the paper gives a prospective overview of the challenges that association rules will face in the next decade within the field of social media mining.
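As a minimal illustration of the association-rule measures such surveys build on, the sketch below treats each tweet as a set of tokens and computes the standard support and confidence of single-item rules {a} → {b} by brute force. The toy tweets, thresholds, and function names are hypothetical; a practical miner (e.g., Apriori or FP-Growth) prunes the search instead of enumerating every pair.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions (token sets) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = supp(A and C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules {a} -> {b} over all token pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        # Skip pairs that co-occur too rarely to be interesting.
        if support({a, b}, transactions) < min_support:
            continue
        # Check both rule directions for the frequent pair.
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules

# Hypothetical tweets, already tokenized and lower-cased.
tweets = [
    {"vaccine", "covid", "dose"},
    {"vaccine", "covid"},
    {"covid", "lockdown"},
    {"vaccine", "dose"},
]
print(mine_pair_rules(tweets, min_support=0.5, min_confidence=0.6))
```

No training labels are involved: the rules come purely from co-occurrence counts, which is what makes the technique attractive for unlabelled social media text.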
128
Mozafarinia M, Rajabiyazdi F, Brouillette MJ, Fellows LK, Knäuper B, Mayo NE. Effectiveness of a personalized health profile on specificity of self-management goals among people living with HIV in Canada: findings from a blinded pragmatic randomized controlled trial. Qual Life Res 2023; 32:413-424. [PMID: 36088501 PMCID: PMC9464055 DOI: 10.1007/s11136-022-03245-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/26/2022] [Indexed: 11/24/2022]
Abstract
PURPOSE To estimate, among people living with chronic HIV, the extent to which providing feedback on their health outcomes affects the number and specificity of patient-formulated self-management goals. METHODS A personalized feedback profile was produced for individuals enrolled in the Canadian HIV Brain Health Now study. Goal specificity was measured with text mining techniques as the total number of specific words (matched to a purpose-built, domain-specific lexicon) relative to each person's total word count. RESULTS Of 176 participants enrolled and randomly assigned to the feedback and control groups, 110 responses were received. The average number of goals was similar in both groups (3.7 vs 3.9). The goals formulated by the feedback and control groups contained 642 and 739 specific words, respectively. Specific nouns and actionable verbs were present to some extent, whereas "measurable" and "time-bound" words were mostly missing. Negative binomial regression showed no difference in goal specificity between groups (RR = 0.93, 95% CI 0.78-1.10). The goals set by the two groups overlapped in 8 areas, with little difference in rank. CONCLUSION The personalized feedback profile did not help participants formulate high-quality goals. Text mining could help address the difficulty of evaluating goals outside the face-to-face setting. With more data and the use of learning models, automated answers could be generated to provide a more dynamic platform.
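The specificity measure can be sketched as a lexicon lookup. In this minimal Python example the lexicon, the tokenizer, and the sample goals are all hypothetical stand-ins; the study's own domain-specific lexicon is not reproduced here.

```python
import re

# Hypothetical lexicon of "specific" words; NOT the study's actual lexicon.
SPECIFIC_LEXICON = {"walk", "minutes", "daily", "weekly", "sleep", "hours"}

def tokenize(text):
    """Lower-cased alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())

def goal_specificity(goals):
    """Return (specific words, total words, specific-word rate) for one person."""
    tokens = [tok for goal in goals for tok in tokenize(goal)]
    specific = sum(tok in SPECIFIC_LEXICON for tok in tokens)
    rate = specific / len(tokens) if tokens else 0.0
    return specific, len(tokens), rate

# Hypothetical goals written by one participant.
goals = ["Walk 30 minutes daily", "Sleep 8 hours every night", "Be happier"]
print(goal_specificity(goals))
```

Per-person counts like these, with the total word count as the exposure, are the kind of input a negative binomial regression of specific words by study group would take; fitting that model is outside this sketch.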
129
Auzoux S, Ngaba B, Christina M, Heuclin B, Roche M. Experimental variables in sugarcane intercropping in Reunion Island for data matching. Data Brief 2022; 46:108869. [PMID: 36691558 PMCID: PMC9860465 DOI: 10.1016/j.dib.2022.108869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 11/17/2022] [Accepted: 12/27/2022] [Indexed: 01/01/2023]
Abstract
This study aimed to link experimental data dealing with complex agroecological systems. To share collected data and link it with the generic AEGIS (Agro-Ecological Global Information System) database, the work described in this data paper consists of mapping researchers' variables to AEGIS dictionary variables for different tropical crops (sugarcane, rice, sorghum, or cover crops). Additionally, this data paper presents a case study based on sugarcane intercropping systems for evaluating three variable-matching measures.
130
Németh R. A scoping review on the use of natural language processing in research on political polarization: trends and research prospects. J Comput Soc Sci 2022; 6:289-313. [PMID: 36568020 PMCID: PMC9762668 DOI: 10.1007/s42001-022-00196-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 11/29/2022] [Indexed: 05/05/2023]
Abstract
As part of the "text-as-data" movement, Natural Language Processing (NLP) provides a computational way to examine political polarization. We conducted a methodological scoping review of studies published since 2010 (n = 154) to clarify how NLP research has conceptualized and measured political polarization, and to characterize the degree of integration of the two different research paradigms that meet in this research area. We identified biases toward the US context (59%), Twitter data (43%) and the machine learning approach (33%). Research covers different layers of the political public sphere (politicians, experts, media, or the lay public); however, very few studies involved more than one layer. Results indicate that only a few studies made use of domain knowledge and that a high proportion of the studies were not interdisciplinary. Studies that made efforts to interpret their results demonstrated that the characteristics of political texts depend not only on the political position of their authors but also on other, often-overlooked factors. Ignoring these factors may lead to overly optimistic performance measures. Also, spurious results may be obtained when causal relations are inferred from textual data. Our paper provides arguments for the integration of explanatory and predictive modeling paradigms, and for a more interdisciplinary approach to polarization research. Supplementary Information The online version contains supplementary material available at 10.1007/s42001-022-00196-2.
131
Shtar G, Greenstein-Messica A, Mazuz E, Rokach L, Shapira B. Predicting drug characteristics using biomedical text embedding. BMC Bioinformatics 2022; 23:526. [PMID: 36476573 PMCID: PMC9730627 DOI: 10.1186/s12859-022-05083-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND Drug-drug interactions (DDIs) are preventable causes of medical injuries and often result in doctor and emergency room visits. Previous research has demonstrated the effectiveness of matrix completion approaches based on known drug interactions for predicting unknown DDIs. However, for a new drug, about whose existing interactions there is limited or no knowledge, such an approach is unsuitable, and other drugs' preferences can be used to accurately predict new DDIs. METHODS To address this limitation, we propose adjacency biomedical text embedding (ABTE), a hybrid approach that combines known drug interactions with the drug's biomedical text embeddings to predict the DDIs of both new and well-known drugs. RESULTS Our evaluation demonstrates the superiority of this approach over recently published DDI prediction models and matrix factorization-based approaches. Furthermore, we compared different text embedding methods in ABTE and found that the concept embedding approach, which incorporates biomedical information into the embedding process, provides the highest performance for this task. Additionally, we demonstrate the value of biomedical text embeddings for a further drug-related prediction task by showing their contribution to a multi-modal pregnancy drug safety classification. CONCLUSION Text and concept embeddings created by analyzing large-scale, domain-specific biomedical corpora can be used to predict drug-related properties such as DDIs and drug safety. Prediction models based on the embeddings achieved results comparable to those based on hand-crafted features; however, text embeddings require no manual categorization or data collection and rely solely on the published literature.
132
Yan H, Ma M, Wu Y, Fan H, Dong C. Overview and analysis of the text mining applications in the construction industry. Heliyon 2022; 8:e12088. [PMID: 36506381 PMCID: PMC9730136 DOI: 10.1016/j.heliyon.2022.e12088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 09/27/2022] [Accepted: 11/25/2022] [Indexed: 12/12/2022]
Abstract
Data generation in the construction industry has increased dramatically. Most of the data in the architecture, engineering and construction (AEC) domain are unstructured textual documents. Text mining (TM) has been introduced to the construction industry to extract underlying knowledge from unstructured data. However, few articles have comprehensively reviewed applications of TM in the AEC domain. Thus, this study adopts a qualitative-quantitative method to conduct a state-of-the-art survey of articles on applications of TM in the construction industry published between 2000 and 2021. VOSviewer software was used to provide an overview of TM applications in terms of publication trends, active countries and regions, productive authors, and keyword co-occurrence. Eight prime application fields of TM are discussed and analyzed in detail, and five key challenges and three future directions are proposed. This review can help the research community grasp the state of the art of TM applications in the construction industry and identify directions for further research.
133
Sharma C, Sakhuja S, Nijjer S. Recent trends of green human resource management: Text mining and network analysis. Environ Sci Pollut Res Int 2022; 29:84916-84935. [PMID: 35790632 PMCID: PMC9255839 DOI: 10.1007/s11356-022-21471-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2021] [Accepted: 06/10/2022] [Indexed: 06/15/2023]
Abstract
Issues of the environmental crisis are being addressed by researchers, governments, and organizations alike. Green human resource management (GHRM) is one such field receiving considerable research attention, since it aims at greening firms and making them eco-friendly. This research reviews 317 GHRM articles from the Scopus database published from 2008 to 2021. The study applies text mining, latent semantic analysis (LSA), and network analysis to explore research trends in GHRM and to establish the relationship between the quantitative and qualitative GHRM literature. The study was carried out using the KNIME and VOSviewer tools. Using k-means clustering, the research identifies five recent research trends in GHRM. Future researchers can build on these trends to address environmental issues, make the environment eco-friendly, and motivate firms to implement GHRM in their practices.
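The clustering step can be illustrated with a tool-independent sketch of Lloyd's k-means. It assumes each article has already been reduced to a low-dimensional vector (for instance, LSA coordinates); the 2-D points below are hypothetical, and seeding with the first k points is a simplification chosen for determinism, not the study's KNIME configuration.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns (labels, centroids).

    Seeding with the first k points keeps this sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical 2-D article vectors (e.g., after LSA).
docs = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.85, 0.95), (0.12, 0.18)]
labels, centroids = kmeans(docs, k=2)
print(labels)
```

Because the result depends on the initial centroids, practical runs usually try several seeds (or k-means++ seeding) and keep the solution with the lowest within-cluster sum of squares.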
134
José-García A, Sneyd A, Melro A, Ollagnier A, Tarling G, Zhang H, Stevenson M, Everson R, Arthur R. C3-IoC: A Career Guidance System for Assessing Student Skills using Machine Learning and Network Visualisation. Int J Artif Intell Educ 2022; 33:1-28. [PMID: 36474618 PMCID: PMC9715283 DOI: 10.1007/s40593-022-00317-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/09/2022] [Indexed: 12/05/2022]
Abstract
Artificial Intelligence in Education (AIED) has witnessed significant growth over the last twenty-five years, providing a wide range of technologies to support academic, institutional, and administrative services. More recently, AIED applications have been developed to prepare students for the workforce, providing career guidance services for higher education. However, this remains challenging, especially concerning the rapidly changing labour market in the IT sector. In this paper, we introduce an AI-based solution named C3-IoC (https://c3-ioc.co.uk), which intends to help students explore career paths in IT according to their level of education, skills and prior experience. The C3-IoC presents a novel similarity metric method for relating existing job roles to a range of technical and non-technical skills. This also allows the visualisation of a job role network, placing the student within communities of job roles. Using a unique knowledge base, user skill profiling, job role matching, and visualisation modules, the C3-IoC supports students in self-evaluating their skills and understanding how they relate to emerging IT jobs. Supplementary Information The online version contains supplementary material available at 10.1007/s40593-022-00317-y.
135
Rabby G, Berka P. Multi-class classification of COVID-19 documents using machine learning algorithms. J Intell Inf Syst 2022; 60:571-591. [PMID: 36465147 PMCID: PMC9707112 DOI: 10.1007/s10844-022-00768-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 11/16/2022] [Accepted: 11/17/2022] [Indexed: 11/30/2022]
Abstract
Document classification is a crucial task for most biomedical research paper corpora. Especially in the wake of the global pandemic, researchers across a variety of fields need to identify relevant scientific papers accurately and quickly amid a flood of biomedical publications. Classification can also help learners or researchers assign a research paper to an appropriate category and find relevant papers within a very short time. A biomedical document classifier needs to be designed differently from a "general" text classifier, because it need not depend only on the text itself (i.e., on titles and abstracts) but can also use other information, such as entities extracted with medical taxonomies or bibliometric data. The main objective of this research was to determine which types of information (features) and which representation methods influence the biomedical document classification task. To this end, we ran several experiments with conventional text classification methods using different kinds of features extracted from titles, abstracts, and bibliometric data. These procedures included data cleaning, feature engineering, and multi-class classification. Eleven variants of the input data tables were created and analyzed using ten machine learning algorithms. We also evaluated the data efficiency and interpretability of these models, as essential properties of any biomedical research paper classification system, specifically for handling the COVID-19 health crisis. Our major findings are that TF-IDF representations outperform the entity extraction methods and that the abstract alone provides sufficient information for correct classification. Among the machine learning algorithms used, Random Forest and a neural network (BERT) achieved the best performance across the various forms of document representation.
Our results lead to a concrete guideline for practitioners on biomedical document classification.
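The abstract does not state which TF-IDF variant was used. The sketch below uses one common formulation, length-normalized term frequency times ln(N/df), on hypothetical toy abstracts; libraries differ (scikit-learn's TfidfVectorizer, for instance, smooths the idf by default), so these weights will not match any particular toolkit exactly. Documents are assumed to be non-empty.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Length-normalized term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical toy abstracts.
abstracts = [
    "covid vaccine trial results",
    "covid transmission model",
    "vaccine safety review",
]
vectors = tfidf(abstracts)
for vec in vectors:
    # Show the two highest-weighted terms per document.
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:2])
```

Terms shared by many documents ("covid", "vaccine") are down-weighted relative to distinctive ones ("trial", "model"), which is why such sparse vectors can separate document classes without any entity extraction.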
136
Zheng X, Du H, Luo X, Tong F, Song W, Zhao D. BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework. BMC Bioinformatics 2022; 23:501. [PMID: 36418937 PMCID: PMC9682683 DOI: 10.1186/s12859-022-05051-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 11/10/2022] [Indexed: 11/24/2022]
Abstract
BACKGROUND Automatic and accurate recognition of biomedical named entities in the literature is an important task in biomedical text mining and the foundation for extracting biomedical knowledge from unstructured texts into structured formats. Implementing biomedical named entity recognition (BioNER) with a sequence labeling framework and deep neural networks is currently a common approach. However, this approach often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore an urgent problem. RESULTS In this paper, we propose a novel BioNER model, named BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation introduces more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies, and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation that considers both contextual and syntactic features. Finally, a softmax function is used to compute the label probabilities and obtain the results. We conducted experiments on 8 benchmark datasets; our proposed model outperforms existing state-of-the-art BioNER methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively.
CONCLUSION The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model, and indicate that formulating the BioNER task into a node classification problem and combining syntactic features into the graph attention networks can significantly improve model performance.
137
Comparisons of deep learning and machine learning while using text mining methods to identify suicide attempts of patients with mood disorders. J Affect Disord 2022; 317:107-113. [PMID: 36029873 DOI: 10.1016/j.jad.2022.08.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 08/05/2022] [Accepted: 08/20/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND Suicide attempts are among the most severe consequences for patients with mood disorders. This study aimed to apply deep learning and machine learning with text mining to identify patients with suicide attempts and to compare the methods' effectiveness. METHODS A total of 13,100 patients with mood disorders were selected. Two traditional text mining methods, logistic regression and support vector machine (SVM), and one deep learning model (convolutional neural network, CNN) were used to perform an overall analysis and a gender-specific subgroup analysis to identify suicide attempts. The classification effectiveness of these models was evaluated by accuracy, F1-value, precision, recall, and the area under the receiver operating characteristic (ROC) curve. RESULTS CNN outperformed the other two models on all indicators except recall, which was slightly lower than that of SVM in the male subgroup analysis. The accuracy of CNN was 98.4%, 98.2%, and 98.5% in the overall analysis and in the male and female subgroup analyses, respectively. McNemar's test showed that the predictions of the CNN and SVM models differed significantly from those of the logistic regression model in the overall analysis and in the female subgroup analysis (P < 0.050). LIMITATIONS A fixed number of features were selected based on document frequency to train the models, and this was a single-site study. CONCLUSIONS The CNN model was a better way to detect suicide attempts in patients with mood disorders prior to hospital admission, saving time and resources in recognizing high-risk patients and preventing suicide.
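For readers less familiar with the evaluation, here is a generic sketch, not the study's code, of the label-based metrics named above and of McNemar's test on paired predictions (with continuity correction; the p-value uses the fact that a chi-square variable with 1 degree of freedom exceeds x with probability erfc(√(x/2))). The labels and predictions are hypothetical. ROC AUC is omitted because it needs predicted scores rather than hard labels.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square and p-value for two classifiers."""
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(a == t and p != t for t, a, p in triples)  # only model A correct
    c = sum(a != t and p == t for t, a, p in triples)  # only model B correct
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical gold labels (1 = positive class) and two models' predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 1, 1, 1, 1]
print(binary_metrics(y_true, model_a))
print(mcnemar(y_true, model_a, model_b))
```

McNemar's test looks only at the cases where exactly one model is correct, which is why it suits comparing two classifiers evaluated on the same patients.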
138
Wu L, Chen S, Guo L, Shpyleva S, Harris K, Fahmi T, Flanigan T, Tong W, Xu J, Ren Z. Development of benchmark datasets for text mining and sentiment analysis to accelerate regulatory literature review. Regul Toxicol Pharmacol 2022; 137:105287. [PMID: 36372266 DOI: 10.1016/j.yrtph.2022.105287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 10/18/2022] [Accepted: 11/06/2022] [Indexed: 11/13/2022]
Abstract
In the field of regulatory science, literature review is an essential step that is usually conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most of its output is not converted into a machine-readable format. The limited availability of data has largely constrained the development of artificial intelligence (AI) systems to facilitate literature review in the regulatory process. In the past decade, AI has revolutionized text mining, as many deep learning approaches have been developed to search, annotate, and classify relevant documents. With these advances in AI algorithms, the lack of high-quality data, rather than the algorithms themselves, has recently become the bottleneck of AI system development. Herein, we constructed two large benchmark datasets, the Chlorine Efficacy dataset (CHE) and the Chlorine Safety dataset (CHS), under a regulatory scenario that sought to assess the antiseptic efficacy and toxicity of chlorine. For each dataset, ∼10,000 scientific articles were initially collected and manually reviewed, and their relevance to the review task was labeled. To ensure high data quality, each paper was labeled by consensus among multiple experienced reviewers. The overall relevance rate was 27.21% (2,663 of 9,788) for CHE and 7.50% (761 of 10,153) for CHS. Furthermore, the relevant articles were categorized into five subgroups based on the focus of their content. Next, we developed an attention-based classification language model using these two datasets. The proposed model yielded an area under the curve (AUC) of 0.857 for CHE and 0.908 for CHS. This performance was significantly better than that of the permutation test (p < 10E-9), demonstrating that the labeling processes were valid.
To conclude, our datasets can be used as benchmarks to develop AI systems that can further facilitate the literature review process in regulatory science.
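The reported relevance rates follow directly from the label counts; the short check below recomputes them from the figures given in the abstract (the function name is illustrative only).

```python
def relevance_rate(relevant, total):
    """Percentage of screened articles that reviewers labeled relevant."""
    return 100 * relevant / total

# Label counts as reported in the abstract: (relevant, screened).
counts = {"CHE": (2663, 9788), "CHS": (761, 10153)}
for name, (relevant, total) in counts.items():
    print(f"{name}: {relevance_rate(relevant, total):.2f}%")
# CHE: 27.21%
# CHS: 7.50%
```

The skew in CHS is worth noting: a classifier that labeled every CHS article irrelevant would already be about 92.5% accurate, one reason a threshold-free measure such as AUC is a more informative summary for these datasets.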
139
Macanovic A. Text mining for social science - The state and the future of computational text analysis in sociology. Soc Sci Res 2022; 108:102784. [PMID: 36334929 DOI: 10.1016/j.ssresearch.2022.102784] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/10/2022] [Revised: 08/05/2022] [Accepted: 08/10/2022] [Indexed: 06/16/2023]
Abstract
The emergence of big data and computational tools has introduced new possibilities for using large-scale textual sources in sociological research. Recent work in the sociology of culture and science, and in economic sociology, has shown how computational text analysis can be used in theory building and testing. This review starts with an introduction to the history of computer-assisted text analysis in sociology and then discusses five families of computational methods used in contemporary research. Using exemplary studies, it shows how dictionary methods, semantic and network analysis tools, language models, and unsupervised and supervised machine learning can assist sociologists with different analytical tasks. After presenting recent methodological developments, the review summarizes several important implications of using large datasets and computational methods to infer complex meaning in texts. Finally, it calls on researchers from different methodological traditions to adopt text mining tools while remaining mindful of lessons learned from working with conventional data and methods.
140
Kühnel L, Fluck J. We are not ready yet: limitations of state-of-the-art disease named entity recognizers. J Biomed Semantics 2022; 13:26. [PMID: 36303237 DOI: 10.1186/s13326-022-00280-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Accepted: 10/12/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Intense research has been done in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results, partly exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on test data. This raises the question of how well trained models can predict on completely new data, i.e., generalize. RESULTS Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well for a given training set and its corresponding test set but generalize poorly when applied to new data. CONCLUSIONS We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.
141
Wu B, Wang L, Zeng YR. Interpretable tourism demand forecasting with temporal fusion transformers amid COVID-19. Appl Intell 2022; 53:14493-14514. [PMID: 36320610 PMCID: PMC9607734 DOI: 10.1007/s10489-022-04254-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/09/2022] [Indexed: 11/03/2022]
Abstract
An innovative interpretable tourism demand forecasting model, ADE-TFT, was proposed to address the insufficient interpretability of existing tourism demand forecasting. The model optimizes the parameters of the Temporal Fusion Transformer (TFT) using an adaptive differential evolution algorithm (ADE). TFT is a recent attention-based deep learning model that excels in prediction by combining high-performance forecasting with time-dynamic interpretable analysis. The TFT model can produce explicable predictions of tourism demand, including attention analysis of time steps and a ranking of the relevance of input factors. This study also contributes to the tourism literature by using historical tourism volume, monthly new confirmed cases at travel destinations, and big data from travel forums and search engines to increase the precision of tourist volume forecasts during the COVID-19 pandemic. Travelers' sentiment and the many topics they discussed during off-season and peak travel periods were examined using a convolutional neural network model. In addition, a novel technique for choosing keywords from Google Trends was proposed: the Latent Dirichlet Allocation topic model was used to categorize the major travel-related topics of forum postings, after which the most relevant search terms for each topic were determined. According to the findings, combining quantitative and emotion-based features makes it possible to estimate tourism demand during the COVID-19 pandemic.
142
Rezaeenour J, Ahmadi M, Jelodar H, Shahrooei R. Systematic review of content analysis algorithms based on deep neural networks. Multimed Tools Appl 2022; 82:17879-17903. [PMID: 36313481 PMCID: PMC9589819 DOI: 10.1007/s11042-022-14043-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 07/12/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
Today, data are produced rapidly by social media, the internet, and other sources, and they occupy a large amount of space in systems, resulting in enormous data warehouses; progress in information technology has significantly increased the speed and ease of data flow. Text mining is one of the most important methods for extracting a useful model by extracting and adapting knowledge from data sets, and many studies have applied deep learning to text processing and text mining problems. Text mining seeks to extract useful information from unstructured textual data and is widely used today. This paper also discusses deep learning and machine learning techniques for classification and text mining, along with their types. Various kinds of neural networks, namely ANN, RNN, CNN, and LSTM, are studied to select the best technique. In this study, we conducted a Systematic Literature Review to extract and associate the algorithms and features that have been used in this area. Based on our search criteria, we retrieved 130 relevant studies published between 1997 and 2021 from electronic databases, and we selected 43 of them for further analysis using the inclusion and exclusion criteria described in Section 3.2. According to this study, hybrid LSTM is the most widely used deep learning algorithm in these studies, and among machine learning methods, SVM achieved high accuracy.
143
Ozyurt O. Empirical research of emerging trends and patterns across the flipped classroom studies using topic modeling. EDUCATION AND INFORMATION TECHNOLOGIES 2022; 28:4335-4362. [PMID: 36267482 PMCID: PMC9568954 DOI: 10.1007/s10639-022-11396-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 10/05/2022] [Indexed: 06/16/2023]
Abstract
This study presents the topic-modeling-based bibliometric characteristics of articles on the flipped classroom. The corpus consists of 2959 articles indexed in the Scopus database through the end of 2021. In addition to describing the bibliometric characteristics of the field, the study, which used a topic-modeling-based bibliometric analysis method, revealed research interests and trends. The number of publications has increased since 2015, and nearly one-third of the studies originate in the United States. The topic analysis grouped the articles in this field under 16 topics. By number of publications, the three largest topics were "Performance and perception", "Nursing education" and "Effectiveness and motivation", respectively. The results are expected to give researchers in this field a general perspective and to help them monitor issues that may become prominent in the future.
144
Mostafa MM. A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research. QUALITY & QUANTITY 2022; 57:1-31. [PMID: 36249708 PMCID: PMC9549032 DOI: 10.1007/s11135-022-01548-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/26/2022] [Indexed: 12/02/2022]
Abstract
International Management is a vast and multidisciplinary research domain that is heavily influenced by several other disciplines, such as Economics, Organizational Theory and Strategic Management. Based on 28,973 research articles, this study aims to analyze the knowledge structure of the international management domain from 1920 to 2019. Using computational text-based topic modeling analysis, we trace the evolution of international management knowledge by examining the major academic topics/latent themes discussed in the field. The study also diachronically visualizes the variations in topic prevalence over time. Our methodology is akin to "inductive mapping" as it is neither biased by our position nor is it guided by assumptions related to the topics we expect to find. Results indicate the existence of a wide variety of important research foci in the domain of international management. These include, among others, strategic alliance formation, international entry modes, corporate social responsibility, cross-cultural consumer behavior, technological innovation and entrepreneurship. Results also show that some topics such as "financial risk and return on investment" and "corporate social responsibility" show a declining time trend, indicating that academic research focusing on such topics was more likely to be published early on and less so recently. On the other hand, other topics such as "Emerging (East) Asian nations" and "global mergers and acquisitions" show an increasing trend, indicating that more papers were published recently. Taken together, although our findings might reflect the breadth and depth of research in international management, they might also suggest that the bounds of this field are not well defined.
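The abstract labels topics as showing a "declining" or "increasing" time trend. A structural topic model estimates prevalence directly from covariates, and the abstract does not say how the trend direction was tested. As a simple stand-in (not the study's procedure), the sketch below takes the sign of the ordinary least-squares slope of a topic's prevalence over time. The decade values are hypothetical.

```python
from typing import Sequence


def ols_slope(years: Sequence[float], prevalence: Sequence[float]) -> float:
    """Least-squares slope of topic prevalence regressed on year."""
    n = len(years)
    if n != len(prevalence) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(prevalence) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, prevalence))
    return sxy / sxx


def trend_label(years: Sequence[float], prevalence: Sequence[float],
                tol: float = 1e-9) -> str:
    """Classify a prevalence series by the sign of its OLS slope."""
    slope = ols_slope(years, prevalence)
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "declining"
    return "flat"


# Hypothetical decade-level prevalence (share of documents) for two topics.
decades = [1980, 1990, 2000, 2010]
topic_a = [0.08, 0.06, 0.04, 0.03]  # a topic fading over time
topic_b = [0.01, 0.02, 0.04, 0.07]  # a topic gaining attention
print(trend_label(decades, topic_a))  # declining
print(trend_label(decades, topic_b))  # increasing
```

A sign test on a fitted slope ignores uncertainty. A fuller analysis would report a confidence interval for the time effect before calling a trend real.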
145
Zhang L, Cai J, Xiao J, Ye Z. Identification of core genes and pathways between geriatric multimorbidity and renal insufficiency: potential therapeutic agents discovered using bioinformatics analysis. BMC Med Genomics 2022; 15:212. [PMID: 36209090 PMCID: PMC9548100 DOI: 10.1186/s12920-022-01370-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 09/21/2022] [Indexed: 12/03/2022]
Abstract
Background Older people are prone to multiple chronic diseases, which can directly or indirectly affect renal function. Through bioinformatics analysis, this study aimed to identify key genes and pathways associated with renal insufficiency in patients with geriatric multimorbidity and to explore potential drugs against renal insufficiency. Methods The text mining tool Pubmed2Ensembl was used to identify genes associated with the keywords "Geriatric", "Multimorbidity" and "Renal insufficiency". The GeneCodis program was used to identify enriched Gene Ontology (GO) biological process terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Protein–protein interaction (PPI) networks were constructed using STRING and visualized in Cytoscape. Module analysis was performed using the CytoHubba and Molecular Complex Detection (MCODE) plugins. GO and KEGG analysis of gene modules was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Genes clustered in salient modules were selected as core genes. The functions and pathways of the core genes were then visualized using ClueGO and CluePedia. Finally, the drug-gene interaction database was used to explore drug-gene interactions of the core genes to identify drug candidates for renal insufficiency in patients with geriatric multimorbidity. Results Through text mining, 351 genes associated with "Geriatric", "Multimorbidity" and "Renal insufficiency" were identified. A PPI network consisting of 216 nodes and 1087 edges was constructed, and CytoHubba was used to rank the genes. Five gene modules were obtained by MCODE analysis. The 26 genes clustered in module 1 were selected as core candidate genes primarily associated with renal insufficiency in patients with geriatric multimorbidity. The HIF-1, PI3K-Akt, MAPK, Rap1, and FoxO signaling pathways were enriched. We found that 21 of the 26 selected genes could be targeted by 34 existing drugs.
Conclusion This study indicated that CST3, SERPINA1, FN1, PF4, IGF1, KNG1, IL6, VEGFA, ALB, TIMP1, TGFB1, HGF, SERPINE1, APOA1, APOB, FGF23, EGF, APOE, VWF, TF, CP, GAS6, APP, IGFBP3, P4HB, and SPP1 were key genes potentially involved in renal insufficiency in patients with geriatric multimorbidity. In addition, 34 drugs were identified as potential agents for the treatment and management of renal insufficiency.
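CytoHubba offers several topological scoring methods, and the abstract does not say which one was used to rank the genes. The simplest is node degree, which is the number of distinct interaction partners. The sketch below ranks the nodes of an undirected PPI edge list by degree using only the standard library. The edge list is a hypothetical toy network with placeholder node names, not the study's 216-node, 1087-edge STRING network.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_degree(edges: Iterable[Tuple[str, str]],
                   top_n: int) -> List[Tuple[str, int]]:
    """Return the top_n nodes of an undirected graph ranked by degree.

    Duplicate edges (in either direction) and self-loops are ignored so each
    interaction counts once. Ties are broken alphabetically for a stable order.
    """
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree: Counter = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy interaction list (placeholder names, not STRING data).
toy_edges = [
    ("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("HUB", "P4"),
    ("P1", "P2"), ("P3", "P4"),
    ("P2", "HUB"),  # duplicates HUB-P2 in the other direction
    ("P5", "P5"),   # self-loop, ignored
]
print(rank_by_degree(toy_edges, top_n=3))  # [('HUB', 4), ('P1', 2), ('P2', 2)]
```

Degree favors highly connected hubs. Methods such as MCC, which CytoHubba also provides, weight local clique structure instead, so the resulting rankings can differ.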
146
Dhar S, Bose I. Victim crisis communication strategy on digital media: A study of the COVID-19 pandemic. DECISION SUPPORT SYSTEMS 2022; 161:113830. [PMID: 35754943 PMCID: PMC9212564 DOI: 10.1016/j.dss.2022.113830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2021] [Revised: 02/11/2022] [Accepted: 06/12/2022] [Indexed: 06/07/2023]
Abstract
The COVID-19 pandemic and the lockdown had a devastating impact on organizations across the globe. In this crisis, organizations belonged to the victim cluster, with low crisis responsibility. Nevertheless, organizations needed to strategize their crisis responses and communicate with stakeholders to reduce the threat to reputational capital and manage stakeholder reactions during the pandemic. In this paper, we studied organizational Twitter communication during the COVID-19 crisis through the lens of situational crisis communication theory (SCCT). We analyzed 325,627 tweets collected from the Twitter pages of 464 organizations on the Fortune 500 list. The Twitter data reflected organizational COVID-19 crisis response strategies and demonstrated organizational use of Twitter for crisis communication. We applied lexicon-based emotion mining to identify and measure emotions, and topic mining to measure crisis response topic scores, from this large multi-organization dataset. We performed path analysis to test our research model derived from SCCT. The analysis showed that instructing and adjusting information can minimize threats to organizational reputation in a victim crisis and manage stakeholder reactions. Positive emotions showed a stronger association with behavioral outcomes. Emotion-neutral tweets generated more favorable stakeholder reactions. The paper contributes to the literature on situational crisis communication for a victim crisis. The multi-organization data address sensitive inter-organization dependencies and improve the understanding of crisis communication. The paper also gives practitioners insight into the effect of COVID-19 crisis response strategies on stakeholder emotions and behavior.
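Lexicon-based emotion mining assigns emotions to a text by matching its tokens against a word-emotion lexicon. The abstract does not name the lexicon or the scoring rule. The sketch below therefore uses a tiny hypothetical lexicon and scores each emotion as the share of tokens associated with it, one common normalization among several. The example tweet is invented.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical miniature lexicon; a real analysis would use a published
# word-emotion lexicon, which this sketch does not reproduce.
LEXICON: Dict[str, Set[str]] = {
    "safe": {"trust", "joy"},
    "thank": {"joy"},
    "support": {"trust"},
    "crisis": {"fear"},
    "loss": {"sadness"},
}


def emotion_scores(text: str) -> Dict[str, float]:
    """Share of tokens associated with each emotion in LEXICON.

    Returns an empty dict when no token matches (an emotion-neutral text).
    No stemming is applied, so "thanks" does not match "thank".
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts: Counter = Counter()
    for tok in tokens:
        for emotion in LEXICON.get(tok, ()):
            counts[emotion] += 1
    return {emo: n / len(tokens) for emo, n in counts.items()}


tweet = "Stay safe. We thank our teams who support customers in this crisis"
print(emotion_scores(tweet))  # trust 2/12, joy 2/12, fear 1/12 (key order may vary)
```

Per-tweet scores like these could then be aggregated per organization and used as variables in a path model. How the study aggregated them is not stated in the abstract.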
147
Sutoyo R, Achmad S, Chowanda A, Andangsari EW, Isa SM. PRDECT-ID: Indonesian product reviews dataset for emotions classification tasks. Data Brief 2022; 44:108554. [PMID: 36091473 PMCID: PMC9459421 DOI: 10.1016/j.dib.2022.108554] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Revised: 08/16/2022] [Accepted: 08/19/2022] [Indexed: 11/22/2022]
Abstract
Recognizing emotions is vital in communication, because emotions convey additional meaning in the communication process. Nowadays, people can express their emotions on many platforms, one of which is the product review. Online product reviews are an important element affecting customers' buying decisions, so it is essential to recognize the emotions they express. Emotion recognition from product reviews can be performed automatically using machine learning or deep learning algorithms, and a dataset can be considered the fuel for training the recognizer. However, only limited datasets exist for recognizing emotions in product reviews, particularly in local languages. This research contributes a dataset of 5400 product reviews in Indonesian. It was carefully curated from 29 product categories, annotated with five emotions, and verified by an expert in clinical psychology. The dataset supports the development of automatic emotion classification for product reviews.
148
Marengo D, Hoeboer CM, Veldkamp BP, Olff M. Text mining to improve screening for trauma-related symptoms in a global sample. Psychiatry Res 2022; 316:114753. [PMID: 35940089 DOI: 10.1016/j.psychres.2022.114753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 07/22/2022] [Accepted: 07/27/2022] [Indexed: 11/16/2022]
Abstract
Previous studies showed that textual information could be used to screen respondents for posttraumatic stress disorder (PTSD). In this study, we explored the feasibility of using language features extracted from short text descriptions respondents provided of stressful events to predict trauma-related symptoms assessed using the Global Psychotrauma Screen. Texts were analyzed with both closed- and open-vocabulary methods to extract language features representing the occurrence of words, phrases, or specific topics in the description of stressful events. We also evaluated whether combining language features with self-report information, including respondents' demographics, event characteristics, and risk factors for trauma-related disorders, would improve the prediction performance. Data were collected using an online survey on a cross-national sample of 5048 respondents. Results showed that language data achieved the highest predictive power when both closed- and open-vocabulary features were included as predictors. Combining language data and self-report information resulted in a significant increase in performance and in a model which achieved good accuracy as a screener for probable PTSD diagnosis (.7 < AUC ≤ .8), with similar results regardless of the length of the text description of the event. Overall, results indicated that short texts add to the detection of trauma-related symptoms and probable PTSD diagnosis.
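The screener's accuracy is reported as an AUC band (.7 < AUC ≤ .8). The ROC AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it by direct pairwise comparison, which is adequate for small illustrations but quadratic in sample size. The risk scores and labels are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 = positive (e.g. probable PTSD), 0 = negative.
    A tie between a positive and a negative score contributes 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and screening outcomes.
risk = [0.9, 0.65, 0.25, 0.6, 0.3, 0.2, 0.7, 0.1]
ptsd = [1,   1,    1,    0,   0,   0,   0,   0]
value = auc(risk, ptsd)
print(round(value, 3))        # 0.733 (11 of 15 positive-negative pairs ordered correctly)
print(0.7 < value <= 0.8)     # True
```

For large samples, a rank-based computation (sort once, sum positive ranks) gives the same value in O(n log n).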
149
McLaughlin JE, Lyons K, Lupton-Smith C, Fuller K. An introduction to text analytics for educators. CURRENTS IN PHARMACY TEACHING & LEARNING 2022; 14:1319-1325. [PMID: 36280557 PMCID: PMC9904956 DOI: 10.1016/j.cptl.2022.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 07/23/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
OUR SITUATION Educators often find themselves in possession of large amounts of text-based materials, such as student reflections, narrative feedback, and assignments. While these materials can provide critical insight into topics of interest, they also require a substantial amount of time to read, interpret, and use. The purpose of this article is to describe text analytics and provide recommendations for its use. METHODOLOGICAL LITERATURE REVIEW An overview of text analytics is provided, including a brief history, common types of contemporary techniques, and the basic phases of text analytics. Several examples of common text analytics techniques are used to illustrate this approach. OUR RECOMMENDATIONS AND THEIR APPLICATIONS Practical recommendations are provided to support the use of text analytics in pharmacy education. These recommendations include: (1) clarify the purpose of the text analytics; (2) ensure the research questions are relevant and grounded in the literature; (3) develop a processing strategy and create a dictionary; (4) explore various tools for analysis and visualization; (5) establish tolerance for error; (6) train, calibrate, and validate the analytic strategy; and (7) collaborate and equip yourself. POTENTIAL IMPACT Text analytics provides a systematic approach to generating information from text-based materials. The approach has several apparent benefits, such as improving the efficiency of analyzing text and elucidating new knowledge. Despite recent developments in text analytics techniques, limitations to this approach remain. Efforts to improve the usability and accessibility of text analytics are ongoing, and pharmacy educators should position their work within the context of these limitations.
150
Turenne N. Net activism and whistleblowing on YouTube: a text mining analysis. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 82:9201-9221. [PMID: 36193288 PMCID: PMC9520105 DOI: 10.1007/s11042-022-13777-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Revised: 08/04/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
Social media is increasingly dominant in the everyday lives of people around the world. YouTube content is a resource that may be useful, in computational social science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video transcriptions concerning net activism and whistleblowing. We automatically performed linguistic feature extraction to capture a representation of each video using its title, description and transcription (downloaded metadata). The next step was to clean the dataset using automatic clustering with linguistic representation to identify unmatched videos and noisy keywords. Using these keywords to exclude videos, we finally obtained a dataset that was reduced by 95%, i.e., it contained 35,730 video transcriptions. Then, we again automatically clustered the videos using a lexical representation and split the dataset into subsets, leading to hundreds of clusters that we interpreted manually to identify a hierarchy of topics of interest concerning whistleblowing. We used the dataset to learn a lexical representation for a specific topic and to detect unknown whistleblowing videos for this topic; the accuracy of this detection was 57.4%. We also used the dataset to identify interesting contextual linguistic markers around the names of whistleblowers. From a given list of names, we automatically extracted all 5-gram word sequences from the dataset and identified interesting markers in the left and right contexts of each name by manual interpretation. The results of our study are the following: a dataset (raw and cleaned collections) concerning whistleblowing, a hierarchy of topics about whistleblowing, the automatic prediction of whistleblowing, and the semi-automatic semantic analysis of markers around whistleblower names. This text mining analysis can be exploited for digital sociology and e-democracy studies.
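The abstract describes extracting all 5-gram word sequences that contain a given name and then inspecting the words to its left and right. The sketch below shows one way to do this for a single-token name. The tokenizer, the context width of two words, and the transcript are all hypothetical choices made for illustration; multi-word names would need a phrase-level match.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams_containing(tokens: List[str], name: str,
                      n: int = 5) -> List[Tuple[str, ...]]:
    """All contiguous n-grams (default 5) that include the single-token name."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
            if name in tokens[i:i + n]]


def side_contexts(tokens: List[str], name: str,
                  width: int = 2) -> Tuple[Counter, Counter]:
    """Count words within `width` positions left and right of each name occurrence."""
    left: Counter = Counter()
    right: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == name:
            left.update(tokens[max(0, i - width):i])
            right.update(tokens[i + 1:i + 1 + width])
    return left, right


# Hypothetical transcript with a placeholder whistleblower name "doe".
transcript = ("whistleblower doe revealed surveillance programs and "
              "critics called doe a traitor while supporters called doe a hero")
toks = tokenize(transcript)
grams = ngrams_containing(toks, "doe")
left, right = side_contexts(toks, "doe")
print(len(grams))                  # 10
print(left["called"], right["a"])  # 2 2
```

The counts only surface candidate markers (here "called ... a"). In the study, deciding which markers were interesting was a manual step.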