1
Le-Khac UN, Bolton M, Boxall NJ, Wallace SMN, George Y. Living review framework for better policy design and management of hazardous waste in Australia. The Science of the Total Environment 2024; 924:171556. [PMID: 38458450 DOI: 10.1016/j.scitotenv.2024.171556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 02/25/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
The significant increase in hazardous waste generation in Australia has prompted discussion of incorporating artificial intelligence into the hazardous waste management system. Recent studies have explored the potential applications of artificial intelligence in various waste management processes. However, no study has examined the use of text mining in the hazardous waste management sector for the purpose of informing policymakers. This study developed a living review framework that applied supervised text classification and text mining techniques to extract knowledge from domain literature data from 2022 to 2023. The framework employed statistical classification models trained iteratively, and the best model, XGBoost, achieved an F1 score of 0.87. Using a small set of 126 manually labelled global articles, XGBoost automatically predicted the labels of 678 Australian articles with high confidence. Then, keyword extraction and unsupervised topic modelling with Latent Dirichlet Allocation (LDA) were performed. Results indicated that there were two main research themes in the Australian literature: (1) the key waste streams and (2) the resource recovery and recycling of waste. This framework would benefit policymakers, researchers, and hazardous waste management organisations by serving as a real-time guide to the current key waste streams and research themes in the literature, allowing robust knowledge to be applied to waste management and highlighting where research gaps remain.
Affiliation(s)
- Uyen N Le-Khac
  - Data Science and AI Department, Faculty of Information Technology, Monash University, Australia
- Mitzi Bolton
  - Monash Sustainable Development Institute, Monash University, Australia
- Naomi J Boxall
  - Environment, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
- Stephanie M N Wallace
  - Centre for Anthropogenic Pollution Impact and Management (CAPIM), School of BioSciences, University of Melbourne, Australia
- Yasmeen George
  - Data Science and AI Department, Faculty of Information Technology, Monash University, Australia
2
Zhang D, Li J, Xie Y, Wulamu A. Research on performance variations of classifiers with the influence of pre-processing methods for Chinese short text classification. PLoS One 2023; 18:e0292582. [PMID: 37824464 PMCID: PMC10569603 DOI: 10.1371/journal.pone.0292582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/24/2023] [Indexed: 10/14/2023] Open
Abstract
Text preprocessing is an important component of Chinese text classification. At present, however, most studies on this topic focus on the influence of preprocessing methods on a few text classification algorithms using English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then explored the influence of the preprocessing methods on the final classifications under various conditions such as classification evaluation, combination style, and classifier selection. Finally, we conducted a battery of additional experiments and found that most of the classifiers improved in performance after proper preprocessing was applied. Our general conclusion is that the systematic use of preprocessing methods can have a positive impact on the classification of Chinese short text, given a classification evaluation such as macro-F1, a combination of preprocessing methods such as word segmentation with Chinese-specific stop word and symbol removal, and a classifier selection such as machine learning and deep learning models. We find that the best macro-F1 scores for categorizing text in the two datasets are 92.13% and 91.99%, which represent improvements of 0.3% and 2%, respectively, over the compared baselines.
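The macro-F1 metric reported in this abstract is the unweighted mean of the per-class F1 scores. A minimal sketch of that computation follows; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two classes (made-up data)
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because every class contributes equally to the mean, macro-F1 is sensitive to performance on rare classes, which is one common reason to prefer it over accuracy on imbalanced short-text datasets.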
Affiliation(s)
- Dezheng Zhang
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian, Beijing, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, University of Science and Technology Beijing, Haidian, Beijing, China
- Jing Li
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian, Beijing, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, University of Science and Technology Beijing, Haidian, Beijing, China
- Yonghong Xie
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian, Beijing, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, University of Science and Technology Beijing, Haidian, Beijing, China
- Aziguli Wulamu
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian, Beijing, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, University of Science and Technology Beijing, Haidian, Beijing, China
3
Guo Y, Zhou D, Ruan X, Cao J. Variational gated autoencoder-based feature extraction model for inferring disease-miRNA associations based on multiview features. Neural Netw 2023; 165:491-505. [PMID: 37336034 DOI: 10.1016/j.neunet.2023.05.052] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/04/2022] [Revised: 05/19/2023] [Accepted: 05/28/2023] [Indexed: 06/21/2023]
Abstract
MicroRNAs (miRNAs) play critical roles in diverse biological processes of diseases. Inferring potential disease-miRNA associations via computational algorithms enables us to better understand the development and diagnosis of complex human diseases. This work presents a variational gated autoencoder-based feature extraction model to extract complex contextual features for inferring potential disease-miRNA associations. Specifically, our model fuses three different miRNA similarities into a comprehensive miRNA network and combines two different disease similarities into a comprehensive disease network. Then, a novel graph autoencoder is designed to extract multilevel representations based on variational gate mechanisms from the heterogeneous networks of miRNAs and diseases. Finally, a gate-based association predictor is devised to combine multiscale representations of miRNAs and diseases via a novel contrastive cross-entropy function, and then infer disease-miRNA associations. Experimental results indicate that our proposed model achieves remarkable association prediction performance, demonstrating the efficacy of the variational gate mechanism and contrastive cross-entropy loss for inferring disease-miRNA associations.
Affiliation(s)
- Yanbu Guo
  - College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China.
- Dongming Zhou
  - School of Information Science and Engineering, Yunnan University, Kunming 650500, China.
- Xiaoli Ruan
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.
- Jinde Cao
  - School of Mathematics, Southeast University, Nanjing 211189, China; Yonsei Frontier Lab, Yonsei University, Seoul 03722, South Korea.
4
Liang W, Chen X, Huang S, Xiong G, Yan K, Zhou X. Federal learning edge network based sentiment analysis combating global COVID-19. Computer Communications 2023; 204:33-42. [PMID: 36970130 PMCID: PMC10030440 DOI: 10.1016/j.comcom.2023.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 01/01/2023] [Accepted: 03/07/2023] [Indexed: 06/18/2023]
Abstract
As one of the important research topics in the field of natural language processing, sentiment analysis aims to analyze web data related to COVID-19, e.g., supporting Chinese government agencies combating COVID-19. There are popular sentiment analysis models based on deep learning techniques, but their performance is limited by the size and distribution of the dataset. In this study, we propose a model based on a federal learning framework with BERT and a multi-scale convolutional neural network (Fed_BERT_MSCNN), which contains a Bidirectional Encoder Representations from Transformers (BERT) module and a multi-scale convolution layer. The federal learning framework contains a central server and local deep learning machines that train on local datasets. Parameter communications were processed through edge networks. The weighted average of each participant's model parameters was communicated in the edge network for final utilization. The proposed federal network not only solves the problem of insufficient data, but also ensures the data privacy of the social platform during the training process and improves the communication efficiency. In the experiment, we used datasets from six social platforms, and used accuracy and F1-score as evaluation criteria to conduct comparative studies. The performance of the proposed Fed_BERT_MSCNN model was generally superior to that of the existing models in the literature.
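The parameter aggregation step mentioned in this abstract (a weighted average of each participant's model parameters) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the weights are chosen, so weighting by local dataset size is assumed here, and the function name `weighted_average` and the flat parameter lists are hypothetical.

```python
def weighted_average(client_params, client_weights):
    """Average flat parameter lists, one list per participant.

    client_weights is assumed to hold each participant's local dataset
    size; the abstract does not specify the actual weighting scheme.
    """
    total = sum(client_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_params = len(client_params[0])
    if any(len(params) != n_params for params in client_params):
        raise ValueError("all participants must share the same parameter shape")
    return [
        sum(w * params[i] for params, w in zip(client_params, client_weights)) / total
        for i in range(n_params)
    ]


# Two hypothetical participants with 100 and 300 local samples
global_params = weighted_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

Only the parameters and weights travel to the server in this sketch; the local training data never leaves each participant, which is the privacy property the abstract emphasises.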
Affiliation(s)
- Wei Liang
  - Business School, Central South University, Changsha, 410083, China
  - Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Changsha, 410205, China
- Xiaohong Chen
  - Business School, Central South University, Changsha, 410083, China
  - Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Changsha, 410205, China
- Suzhen Huang
  - Big Data Institute, Central South University, Changsha, 410083, China
- Guanghao Xiong
  - College of Information Engineering, China Jiliang University, Hangzhou, 310018, China
- Ke Yan
  - Department of the Built Environment, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, Singapore 117566, Singapore
- Xiaokang Zhou
  - Faculty of Data Science, Shiga University, Hikone, 5228522, Japan
  - RIKEN Center for Advanced Intelligence Project, RIKEN, Tokyo, 1030027, Japan
5
Ai W, Wang Z, Shao H, Meng T, Li K. A multi-semantic passing framework for semi-supervised long text classification. Appl Intell 2023. [DOI: 10.1007/s10489-023-04556-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
6
A multi-view method of scientific paper classification via heterogeneous graph embeddings. Scientometrics 2022. [DOI: 10.1007/s11192-022-04419-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
7
Jia S, Jiang S, Zhang S, Xu M, Jia X. Graph-in-Graph Convolutional Network for Hyperspectral Image Classification. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:1157-1171. [PMID: 35724277 DOI: 10.1109/tnnls.2022.3182715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With the development of hyperspectral sensors, accessible hyperspectral images (HSIs) are increasing, and pixel-oriented classification has attracted much attention. Recently, graph convolutional networks (GCNs) have been proposed to process graph-structured data in non-Euclidean domains and have been employed in HSI classification. However, most GCN-based methods struggle to fully exploit information about ground objects because of feature aggregation. To solve this issue, in this article, we propose a graph-in-graph (GiG) model and a related GiG convolutional network (GiGCN) for HSI classification from a superpixel viewpoint. The GiG representation covers information inside and outside superpixels, corresponding respectively to the local and global characteristics of ground objects. Concretely, after segmenting the HSI into disjoint superpixels, each one is converted to an internal graph. Meanwhile, an external graph is constructed according to the spatial adjacency relationships among superpixels. Significantly, each node in the external graph embeds a corresponding internal graph, forming the so-called GiG structure. Then, GiGCN, composed of internal and external graph convolution (EGC), is designed to extract hierarchical features and integrate them into multiple scales, improving the discriminability of GiGCN. Ensemble learning is incorporated to further boost the robustness of GiGCN. It is worth noting that we are the first to propose the GiG framework from the superpixel viewpoint and the GiGCN scheme for HSI classification. Experimental results on four benchmark datasets demonstrate that our proposed method is effective and feasible for HSI classification with limited labeled samples. For study replication, the code developed for this study is available at https://github.com/ShuGuoJ/GiGCN.git.
8
Dai K, Li X, Huang X, Ye Y. SentATN: learning sentence transferable embeddings for cross-domain sentiment classification. Appl Intell 2022. [DOI: 10.1007/s10489-022-03434-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
9
Abstract
As a vital technology for natural language understanding, sentence representation reasoning mainly focuses on sentence representation methods and reasoning models. Although performance has improved, some problems remain, such as incomplete sentence semantic expression, insufficient depth of the reasoning model, and lack of interpretability of the reasoning process. To address the reasoning model’s lack of depth and interpretability, this paper designs a deep fusion matching network, which mainly includes a coding layer, a matching layer, a dependency convolution layer, an information aggregation layer, and an inference prediction layer. The matching layer is improved on the basis of a deep matching network: a heuristic matching algorithm replaces the bidirectional long short-term memory neural network to simplify the interactive fusion, which improves the reasoning depth and reduces the complexity of the model. The dependency convolution layer uses a tree-type convolution network to extract sentence structure information along the sentence dependency tree, which improves the interpretability of the reasoning process. Finally, the performance of the model is verified on several datasets. The results show that the reasoning performance of the model is better than that of the shallow reasoning model, and the accuracy on the SNLI test set reaches 89.0%. At the same time, the semantic correlation analysis results show that the dependency convolution layer helps improve the interpretability of the reasoning process.
10
A Study of Text Vectorization Method Combining Topic Model and Transfer Learning. Processes (Basel) 2022. [DOI: 10.3390/pr10020350] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023] Open
Abstract
With the development of Internet cloud technology, the scale of data is expanding. Traditional processing methods find it difficult to deal with the problem of information extraction from big data. Therefore, it is necessary to use machine-learning-assisted intelligent processing to extract information from data in order to solve the optimization problem in complex systems. There are many forms of data storage. Among them, text data is an important data type that directly reflects semantic information. Text vectorization is an important concept in natural language processing tasks. Because text data cannot be used directly for model parameter training, it is necessary to vectorize the original text data into numerical form before feature extraction can be carried out. The traditional text digitization method is often realized by constructing a bag of words, but the vector generated by this method cannot reflect the semantic relationship between words, and it also easily causes the problems of data sparsity and dimension explosion. Therefore, this paper proposes a text vectorization method combining a topic model and transfer learning. First, the topic model is used to model the text data and extract its keywords, in order to capture the main information of the text data. Then, with the help of the pretrained bidirectional encoder representations from transformers (BERT) model, transfer learning is carried out to generate vectors, which are applied to the calculation of similarity between texts. In a comparative experiment, this method is compared with the traditional vectorization method. The experimental results show that the vector generated by the topic-modeling- and transfer-learning-based text vectorization (TTTV) proposed in this paper obtains better results when calculating the similarity between texts with the same topic, which means that it can more accurately judge whether the contents of two given texts belong to the same topic.
11
Bert-Enhanced Text Graph Neural Network for Classification. Entropy 2021; 23:e23111536. [PMID: 34828233 PMCID: PMC8624482 DOI: 10.3390/e23111536] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Revised: 11/14/2021] [Accepted: 11/17/2021] [Indexed: 11/25/2022]
Abstract
Text classification is a fundamental research direction that aims to assign tags to text units. Recently, graph neural networks (GNNs) have exhibited some excellent properties in textual information processing. Furthermore, pre-trained language models have also achieved promising results in many tasks. However, many text processing methods cannot model a single text unit’s structure or ignore the semantic features. To solve these problems and comprehensively utilize the text’s structural and semantic information, we propose a Bert-Enhanced text Graph Neural Network model (BEGNN). For each text, we construct a text graph separately according to the co-occurrence relationship of words and use a GNN to extract text features. Moreover, we employ Bert to extract semantic features. The former part takes into account the structural information, and the latter focuses on modeling the semantic information. Finally, we let these two features of different granularity interact and aggregate them to obtain a more effective representation. Experiments on standard datasets demonstrate the effectiveness of BEGNN.
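Building a per-text graph from word co-occurrence, as this abstract describes, is commonly done with a sliding window over the token sequence. The sketch below illustrates that general idea only; the window size, the undirected count-weighted edges, and the function name `cooccurrence_graph` are assumptions and are not details confirmed by the abstract.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=3):
    """Return undirected edge counts for words that share a sliding window.

    `window` is the number of consecutive tokens treated as co-occurring
    (an assumed hyperparameter; the paper's setting is not given here).
    """
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        # Pair the current word with the tokens that follow it in the window
        for other in tokens[i + 1 : i + window]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return dict(edges)


# Toy text: each adjacent pair becomes an edge when window=2
graph = cooccurrence_graph(["graph", "neural", "network", "graph"], window=2)
```

The resulting edge dictionary could then be converted into an adjacency matrix for a GNN, with node features supplied by word embeddings.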
12
Gaye B, Zhang D, Wulamu A. Sentiment classification for employees reviews using regression vector- stochastic gradient descent classifier (RV-SGDC). PeerJ Comput Sci 2021; 7:e712. [PMID: 34712795 PMCID: PMC8507482 DOI: 10.7717/peerj-cs.712] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/17/2021] [Accepted: 08/22/2021] [Indexed: 06/13/2023]
Abstract
Employee satisfaction is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by shaping their policies according to employees' demands, which helps to create a good environment for the collective. For this reason, it is beneficial for organizations to conduct and analyze staff satisfaction surveys, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis is an approach that can assist in this regard, as it categorizes the sentiments of reviews as positive or negative. In this study, we perform experiments for the world's big six companies and classify their employees' reviews based on their sentiments. For this, we propose an approach using lexicon-based and machine learning-based techniques. First, we extracted the sentiments of employees from text reviews and labeled the dataset as positive or negative using TextBlob. Then we proposed a hybrid voting model named Regression Vector-Stochastic Gradient Descent Classifier (RV-SGDC) for sentiment classification. RV-SGDC is a combination of logistic regression, support vector machines, and stochastic gradient descent, combined under a majority voting criterion. We also used other machine learning models for performance comparison with RV-SGDC. Further, three feature extraction techniques, term frequency-inverse document frequency (TF-IDF), bag of words, and global vectors, are used to train the learning models. We evaluated the performance of all models in terms of accuracy, precision, recall, and F1 score. The results revealed that RV-SGDC outperforms the other models, with an accuracy of 0.97 using TF-IDF features, owing to its hybrid architecture.
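The majority voting step that combines the three base classifiers can be sketched in a few lines. This illustrates hard voting in general and is not the authors' implementation: the function name `majority_vote` and the toy predictions are hypothetical, and the base models are represented only by their already-computed label predictions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several models by hard majority vote.

    predictions_per_model holds one list of labels per model, all of the
    same length. With an odd number of binary voters no ties can occur.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Made-up outputs standing in for logistic regression, SVM, and SGD models
lr_preds = ["pos", "neg", "pos"]
svm_preds = ["neg", "neg", "pos"]
sgd_preds = ["pos", "neg", "neg"]
final = majority_vote([lr_preds, svm_preds, sgd_preds])
```

In practice the per-model predictions would come from classifiers trained on TF-IDF features; the voting logic itself is independent of how those predictions are produced.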
Affiliation(s)
- Babacar Gaye
  - School of Computer and Communication Engineering, University of Science and Technology, Beijing, China
- Dezheng Zhang
  - School of Computer and Communication Engineering, University of Science and Technology, Beijing, China
- Aziguli Wulamu
  - School of Computer and Communication Engineering, University of Science and Technology, Beijing, China