1. Ostonov A, Moshkov M. Comparative Analysis of Deterministic and Nondeterministic Decision Trees for Decision Tables from Closed Classes. ENTROPY (BASEL, SWITZERLAND) 2024; 26:519. [PMID: 38920528 PMCID: PMC11202716 DOI: 10.3390/e26060519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 06/14/2024] [Accepted: 06/15/2024] [Indexed: 06/27/2024]
Abstract
In this paper, we consider classes of decision tables with many-valued decisions that are closed under the operations of removing columns, changing decisions, permuting columns, and duplicating columns. We study relationships among three parameters of these tables: the complexity of a decision table (if we consider the depth of the decision trees, then the complexity of a decision table is the number of columns in it), the minimum complexity of a deterministic decision tree, and the minimum complexity of a nondeterministic decision tree. We consider a rough classification of the functions characterizing these relationships and enumerate all seven possible types of relationships.
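To make the two tree parameters concrete, here is a brute-force Python sketch for tiny tables. It assumes the usual formulation in which a subtable is "degenerate" when all its rows share a common decision, and in which the minimum depth of a nondeterministic decision tree equals the maximum, over rows, of the length of a shortest true decision rule for that row. The function names are hypothetical, the search is exponential, and this is not the authors' method.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical representation: each row (tuple of attribute values) maps to its
# set of acceptable decisions. Rows are assumed pairwise distinct.
Table = dict[tuple[int, ...], frozenset]


def is_degenerate(rows: frozenset, table: Table) -> bool:
    """A subtable is degenerate if it is empty or all its rows share a decision."""
    if not rows:
        return True
    return bool(frozenset.intersection(*(table[r] for r in rows)))


def min_det_depth(table: Table) -> int:
    """Minimum depth of a deterministic decision tree, by exhaustive recursion."""
    n = len(next(iter(table)))

    @lru_cache(maxsize=None)
    def h(rows: frozenset) -> int:
        if is_degenerate(rows, table):
            return 0
        best = None
        for i in range(n):
            values = {r[i] for r in rows}
            if len(values) < 2:  # querying a constant attribute gains nothing
                continue
            depth = 1 + max(h(frozenset(r for r in rows if r[i] == v)) for v in values)
            best = depth if best is None else min(best, depth)
        return best

    return h(frozenset(table))


def min_nondet_depth(table: Table) -> int:
    """Max over rows of the length of the shortest true decision rule for that row."""
    rows = frozenset(table)
    n = len(next(iter(table)))
    worst = 0
    for r in rows:
        for k in range(n + 1):
            # A rule fixes the attributes in `attrs` to r's values; it is true
            # if the rows it matches form a degenerate subtable.
            if any(
                is_degenerate(frozenset(s for s in rows if all(s[i] == r[i] for i in attrs)), table)
                for attrs in combinations(range(n), k)
            ):
                worst = max(worst, k)
                break
    return worst


table = {
    (1, 0, 0): frozenset({"a"}),
    (0, 1, 0): frozenset({"b"}),
    (0, 0, 1): frozenset({"c"}),
}
print(min_det_depth(table), min_nondet_depth(table))  # 2 1
```

In this toy table the complexity (number of columns) is 3, the deterministic depth is 2, and the nondeterministic depth is 1, so each row is certified by a single attribute even though a deterministic tree needs two queries.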
Affiliation(s)
- Azimkhon Ostonov
- Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
2. Yoo JW, Park J, Park H. Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types. Int J Inj Contr Saf Promot 2024; 31:203-215. [PMID: 38164519 DOI: 10.1080/17457300.2023.2300424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Construction workers face a high risk of various occupational accidents, many of which can be fatal. This study aims to develop a prediction model for nine prevalent types of construction accidents, using construction tasks, activities, and tools/materials as input features, through the application of machine learning-based multi-class classification algorithms. A total of 152,867 construction accident summary reports, composed of both structured data (construction task, construction activity, accident type) and unstructured data (tools/materials), were used for the study. The study employed several data processing techniques, including keyword extraction through text mining, Boruta feature selection, and SMOTE data resampling, to enhance model accuracy. Three performance metrics (multi-class area under the receiver operating characteristic curve (MAUC), multi-class Matthews correlation coefficient (MMCC), and geometric mean (G-mean)) were used to compare the predictive performance of four machine learning algorithms: decision tree, random forest, naïve Bayes, and XGBoost. Of the four algorithms, XGBoost showed the highest performance in predicting accident type (MAUC: 0.8603, MMCC: 0.3523, G-mean: 0.5009). Furthermore, a Shapley additive explanations (SHAP) analysis was conducted to visualize feature importance. The findings of this study contribute to improving construction safety by presenting a prediction model for accident types derived from real-world big data.
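The abstract does not define MMCC or G-mean. As a hedged sketch, the Python below uses two common definitions: G-mean as the geometric mean of per-class recalls, and MMCC as Gorodkin's multi-class generalization of the Matthews correlation coefficient. The paper may use different variants, and MAUC is not reproduced here; the labels are toy data.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (classes taken from y_true)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / n for c, n in support.items()]
    return math.prod(recalls) ** (1 / len(recalls))


def multiclass_mcc(y_true, y_pred):
    """Gorodkin's multi-class Matthews correlation coefficient (R_K)."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k, p_k = Counter(y_true), Counter(y_pred)        # true / predicted class counts
    num = c * s - sum(p_k[k] * t_k[k] for k in set(t_k) | set(p_k))
    den = math.sqrt((s * s - sum(v * v for v in p_k.values()))
                    * (s * s - sum(v * v for v in t_k.values())))
    return num / den if den else 0.0


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(g_mean(y_true, y_pred), 3), round(multiclass_mcc(y_true, y_pred), 3))  # 0.794 0.783
```

Because G-mean multiplies recalls, a single class with zero recall drives it to zero, which is why it is often reported alongside resampling methods such as SMOTE on imbalanced data.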
Affiliation(s)
- Joon Woo Yoo
- Department of Industrial Engineering, Yonsei University, Seoul, South Korea
- Junsung Park
- Department of Industrial Engineering, Yonsei University, Seoul, South Korea
- Heejun Park
- Department of Industrial Engineering, Yonsei University, Seoul, South Korea
3. Li R, Gao L, Wu G, Dong J. Multiple marine algae identification based on three-dimensional fluorescence spectroscopy and multi-label convolutional neural network. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 311:123938. [PMID: 38330754 DOI: 10.1016/j.saa.2024.123938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/14/2023] [Accepted: 01/20/2024] [Indexed: 02/10/2024]
Abstract
Accurate identification of algal populations plays a pivotal role in monitoring seawater quality. Fluorescence-based techniques are effective tools for quickly identifying different algae. However, multiple coexisting algae and their similar photosynthetic pigments can constrain the efficacy of fluorescence methods. This study introduces a multi-label classification model that combines a specific excitation-emission matrix convolutional neural network (EEM-CNN) with three-dimensional (3D) fluorescence spectroscopy to detect single and mixed algal samples. Spectral data can be input directly into the model without being transformed into images. Rectangular convolutional kernels and double convolutional layers are applied to enhance the extraction of balanced and comprehensive spectral features for accurate classification. A dataset comprising 3D fluorescence spectra from eight distinct algae species representing six different algal classes was obtained, preprocessed, and augmented to create input data for the classification model. The classification model was trained and validated using 4448 sets of training samples and 60 sets of test samples, achieving an accuracy of 0.883 and an F1 score of 0.925. This model exhibited the highest recognition accuracy on both single and mixed algae samples, outperforming comparative methods such as ML-kNN and N-PLS-DA. Furthermore, the classification results were extended to three different algae species and mixed samples of Skeletonema costatum to assess the impact of spectral similarity on multi-label classification performance. The developed classification models demonstrated robust performance across samples with varying concentrations and growth stages, highlighting the potential of CNNs for the precise identification of marine algae.
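Two ideas in this abstract can be sketched in a few lines of Python: a rectangular kernel sliding over an excitation-emission matrix, and a multi-label output in which each species gets its own independent decision so that a mixed sample can switch on several labels. The matrix, kernels, logits, and species names below are hypothetical toy values, and the sigmoid-plus-threshold head is an assumption, not the EEM-CNN architecture itself.

```python
import math


def conv2d_valid(eem, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolution layer computes.

    eem: matrix as a list of rows (e.g. excitation) by columns (e.g. emission).
    kernel: a rectangular kh x kw kernel; a 1 x 3 kernel slides along columns only.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(eem) - kh + 1, len(eem[0]) - kw + 1
    return [
        [sum(eem[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def multilabel_predict(logits, names, threshold=0.5):
    """Independent sigmoid per label: a mixed sample can switch on several labels."""
    return [name for z, name in zip(logits, names) if 1 / (1 + math.exp(-z)) >= threshold]


eem = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
print(conv2d_valid(eem, [[1, 1, 1]]))  # [[6, 9], [18, 21], [30, 33]]
print(conv2d_valid(eem, [[1], [1]]))   # [[6, 8, 10, 12], [14, 16, 18, 20]]
print(multilabel_predict([2.0, -1.0, 0.3], ["alga_1", "alga_2", "alga_3"]))  # ['alga_1', 'alga_3']
```

A 1 x k kernel mixes only neighbouring emission wavelengths at a fixed excitation, while a k x 1 kernel does the opposite; stacking both is one way to read the "balanced" feature extraction the abstract mentions.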
Affiliation(s)
- Ruizhuo Li
- Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Science, Xi'an 710119, China; College of Photoelectricity, University of Chinese Academy of Science, Beijing 100049, China
- Limin Gao
- Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Science, Xi'an 710119, China
- Guojun Wu
- Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Science, Xi'an 710119, China; Laoshan Laboratory, Qingdao 266237, Shandong, China
- Jing Dong
- Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Science, Xi'an 710119, China; College of Photoelectricity, University of Chinese Academy of Science, Beijing 100049, China
4. Senoussi M, Artieres T, Villoutreix P. Partial label learning for automated classification of single-cell transcriptomic profiles. PLoS Comput Biol 2024; 20:e1012006. [PMID: 38578796 PMCID: PMC11023635 DOI: 10.1371/journal.pcbi.1012006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 04/17/2024] [Accepted: 03/18/2024] [Indexed: 04/07/2024]
Abstract
Single-cell RNA sequencing (scRNASeq) data plays a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationship for individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, due to the difficulty of obtaining data annotated at single-cell resolution, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that we can obtain, for each data point, a set of candidate labels containing the correct one, which is a less demanding setting than requiring a fully supervised training dataset. We study, and extend where needed, state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to make predictions with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods trained with partially annotated data reach the same performance as when trained with fully supervised data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods.
Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles; notably, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods.
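A minimal Python sketch of the partial-label setting: each training point carries a set of candidate labels that is assumed to contain the true one. The sketch uses a simple kNN baseline in which every neighbour spreads one vote uniformly over its candidates; it illustrates the idea only and is not the authors' extended kNN. The cell-type names and coordinates are hypothetical.

```python
import math
from collections import defaultdict


def pl_knn_predict(train, query, k=3):
    """Predict a label for `query` from partially labelled neighbours.

    train: list of (feature_vector, candidate_label_set) pairs; the true label is
    assumed to be somewhere in each candidate set.
    Each of the k nearest neighbours spreads one vote uniformly over its candidates.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for _, candidates in neighbours:
        for label in candidates:
            scores[label] += 1.0 / len(candidates)
    return max(scores, key=scores.get)


train = [
    ((0.0, 0.0), {"neuron", "glia"}),
    ((0.1, 0.0), {"neuron"}),
    ((0.0, 0.2), {"neuron", "progenitor"}),
    ((5.0, 5.0), {"glia"}),
]
print(pl_knn_predict(train, (0.05, 0.05), k=3))  # neuron
```

No single neighbour needs an exact label: "neuron" wins because it appears in every nearby candidate set, which is the intuition behind learning from ambiguous annotations.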
Affiliation(s)
- Malek Senoussi
- Aix Marseille Univ, Université de Toulon, CNRS, LIS, Turing Centre for Living Systems, Marseille, France
- Thierry Artieres
- Aix Marseille Univ, Université de Toulon, CNRS, LIS, Turing Centre for Living Systems, Marseille, France
- Ecole Centrale de Marseille, Marseille, France
- Paul Villoutreix
- Aix Marseille Univ, Université de Toulon, CNRS, LIS, Turing Centre for Living Systems, Marseille, France
- Aix-Marseille Université, MMG, Inserm U1251, Turing Centre for Living systems, Marseille, France
5. Zhai T, Wang H. Online Passive-Aggressive Multilabel Classification Algorithms. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:10116-10129. [PMID: 35436199 DOI: 10.1109/tnnls.2022.3164906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Most existing multilabel classification methods are batch learning methods, which may incur expensive retraining costs when new data arrive. To overcome the drawbacks of batch learning, we develop a family of online multilabel classification algorithms that can update the model instantly and efficiently and make timely online predictions when new data arrive. All of our algorithms use a closed-form update, obtained by solving a constrained optimization problem in each round of online learning. Label correlation is explicitly modeled in our optimization problem. The label thresholding function, an important component of our online classifier, can also be learned online. Our algorithms can easily be generalized to nonlinear prediction using Mercer kernels. Worst-case loss bounds are provided for our algorithms; the bounds are relative to the cumulative loss suffered by the best fixed predictive model that can be attained in hindsight. Finally, we corroborate the merits of our algorithms for both linear and nonlinear prediction on nine open multilabel benchmark datasets.
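For orientation, the Python below shows the classic closed-form passive-aggressive (PA-I) update that this family of algorithms builds on, applied independently to each label (binary relevance). This is a deliberate simplification and an assumption: it ignores the label correlation and learned thresholds that the paper models, and the class name is hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


class BinaryRelevancePA:
    """One PA-I learner per label; label j is predicted when w_j . x > 0.

    Simplification: labels are learned independently, with a fixed threshold of 0.
    """

    def __init__(self, n_features, n_labels, C=1.0):
        self.w = [[0.0] * n_features for _ in range(n_labels)]
        self.C = C  # aggressiveness cap of PA-I

    def predict(self, x):
        return [int(dot(w, x) > 0) for w in self.w]

    def update(self, x, y):
        """Single online round; y is a 0/1 membership vector with one entry per label."""
        sq_norm = dot(x, x)
        if sq_norm == 0:
            return
        for w, yj in zip(self.w, y):
            sign = 1.0 if yj else -1.0
            loss = max(0.0, 1.0 - sign * dot(w, x))  # hinge loss
            tau = min(self.C, loss / sq_norm)         # closed-form PA-I step size
            for i, xi in enumerate(x):
                w[i] += tau * sign * xi


model = BinaryRelevancePA(n_features=2, n_labels=2)
for x, y in [([1.0, 0.0], [1, 0]), ([0.0, 1.0], [0, 1])]:
    model.update(x, y)
print(model.predict([1.0, 0.0]), model.predict([0.0, 1.0]))  # [1, 0] [0, 1]
```

Each round costs one pass over the weights and needs no retraining on past data, which is the property the abstract contrasts with batch learning.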
6. Zalar P, Graf Hriberšek D, Gostinčar C, Breskvar M, Džeroski S, Matul M, Novak Babič M, Čremožnik Zupančič J, Kujović A, Gunde-Cimerman N, Kavkler K. Xerophilic fungi contaminating historically valuable easel paintings from Slovenia. Front Microbiol 2023; 14:1258670. [PMID: 38029120 PMCID: PMC10653331 DOI: 10.3389/fmicb.2023.1258670] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/14/2023] [Accepted: 10/06/2023] [Indexed: 12/01/2023]
Abstract
Historically valuable canvas paintings are often exposed to conditions enabling microbial deterioration. Painting materials, mainly of organic origin, in combination with high humidity and other environmental conditions, favor microbial metabolism and growth. These preconditions are often present during exhibitions or storage in old buildings, such as churches and castles, and also in museum storage depositories. The accumulated dust serves as an inoculum for both indoor and outdoor fungi. We present results on cultivable fungi isolated from 24 canvas paintings, mainly exhibited in Slovenian sacral buildings, dating from the 16th to 21st centuries. Fungi were isolated from the front and back of damaged and undamaged surfaces of the paintings using culture media with high and low water activity. A total of 465 isolates were identified using current taxonomic DNA markers and assigned to 37 genera and 98 species. The most abundant genus was Aspergillus, represented by 32 species, of which 9 xerophilic species are reported from contaminated paintings for the first time. In addition to the most abundant xerophilic A. vitricola, A. destruens, A. tardicrescens, and A. magnivesiculatus, the most frequent species were xerophilic Wallemia muriae and W. canadensis, xerotolerant Penicillium chrysogenum, P. brevicompactum, and P. corylophilum, and xerotolerant Cladosporium species. When machine learning methods were used to predict the relationship between fungal contamination, damage to the painting, and the type of material present, proteins were identified as one of the most important factors, and cracked paint was identified as a hotspot for fungal growth. Aspergillus species colonize paintings regardless of the materials present, while Wallemia spp. can be associated with animal fat. Culture media with low water activity are recommended for such inventories to isolate, and obtain an overview of, the fungi that are actively contaminating paintings stored indoors at low relative humidity.
Affiliation(s)
- Polona Zalar
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Daša Graf Hriberšek
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Cene Gostinčar
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Martin Breskvar
- Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
- Sašo Džeroski
- Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
- Mojca Matul
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Monika Novak Babič
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Jerneja Čremožnik Zupančič
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Amela Kujović
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Nina Gunde-Cimerman
- Chair of Molecular Genetics and Biology of Microorganisms, Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Katja Kavkler
- Institute for the Protection of Cultural Heritage of Slovenia, Ljubljana, Slovenia
7. Blockeel H, Devos L, Frénay B, Nanfack G, Nijssen S. Decision trees: from efficient prediction to responsible AI. Front Artif Intell 2023; 6:1124553. [PMID: 37565044 PMCID: PMC10411911 DOI: 10.3389/frai.2023.1124553] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Accepted: 07/10/2023] [Indexed: 08/12/2023]
Abstract
This article provides a bird's-eye view of the role of decision trees in machine learning and data science over roughly four decades. It sketches the evolution of decision tree research over the years, describes the broader context in which this research is situated, and summarizes the strengths and weaknesses of decision trees in this context. The main goal of the article is to clarify the broad relevance, both practical and theoretical, that decision trees still have today for machine learning and artificial intelligence.
Affiliation(s)
- Hendrik Blockeel
- Department of Computer Science, KU Leuven, Leuven, Belgium
- Institute for Artificial Intelligence (Leuven.AI), KU Leuven, Leuven, Belgium
- Laurens Devos
- Department of Computer Science, KU Leuven, Leuven, Belgium
- Institute for Artificial Intelligence (Leuven.AI), KU Leuven, Leuven, Belgium
- Benoît Frénay
- Faculty of Computer Science, Université de Namur, Namur, Belgium
- Géraldin Nanfack
- Faculty of Computer Science, Université de Namur, Namur, Belgium
8. Azad M, Moshkov M. Applications of Depth Minimization of Decision Trees Containing Hypotheses for Multiple-Value Decision Tables. ENTROPY (BASEL, SWITZERLAND) 2023; 25:e25040547. [PMID: 37190335 PMCID: PMC10137443 DOI: 10.3390/e25040547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/28/2023] [Accepted: 03/07/2023] [Indexed: 05/17/2023]
Abstract
In this research, we consider decision trees that incorporate standard queries, each based on a single feature, as well as hypotheses consisting of the values of all features. These decision trees are used to represent knowledge and are comparable to those investigated in exact learning, in which membership queries and equivalence queries are used. As applications, we look into constructing decision trees for two cases: sorting a sequence that contains equal elements, and multiple-value decision tables derived from datasets in the UCI Machine Learning Repository. We compare the efficiency of several types of decision trees with hypotheses that are optimal with respect to depth for these applications. We also investigate the efficiency of decision trees built by dynamic programming and by an entropy-based greedy method. We found that the greedy algorithm produces results very similar to those of the dynamic programming algorithms. Therefore, since the dynamic programming algorithms take a long time, the greedy algorithms can readily be applied instead.
Affiliation(s)
- Mohammad Azad
- College of Computer and Information Sciences, Jouf University, Sakaka 72441, Saudi Arabia
- Mikhail Moshkov
- Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
9. Huang W, Xiao T, Liu Q, Huang Z, Ma J, Chen E. HMNet: a hierarchical multi-modal network for educational video concept prediction. INT J MACH LEARN CYB 2023. [DOI: 10.1007/s13042-023-01809-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
10. Hoshino Y, Utsumi Y, Matsuda Y, Tanaka Y, Nakata K. IPC prediction of patent documents using neural network with attention for hierarchical structure. PLoS One 2023; 18:e0282361. [PMID: 36862711 PMCID: PMC9980776 DOI: 10.1371/journal.pone.0282361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 02/13/2023] [Indexed: 03/03/2023]
Abstract
International patent classifications (IPCs) are assigned to patent documents; however, since classifications are assigned manually by patent examiners, it takes considerable time and effort to select the appropriate IPCs from about 70,000 IPCs. Hence, some research has been conducted on patent classification with machine learning. However, patent documents are very voluminous, and learning with all the claims (the part describing the content of the patent) as input would exhaust the available memory, even with a very small batch size. Therefore, most existing methods learn by excluding some information, for example by using only the first claim as input. In this study, we propose a model that considers the contents of all claims by extracting important information for input. In addition, we focus on the hierarchical structure of the IPC and propose a new decoder architecture that takes it into account. Finally, we conducted an experiment using actual patent data to verify the accuracy of the prediction. The results showed a significant improvement in accuracy compared to existing methods, and the practical applicability of the method was also discussed.
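The hierarchical structure that a hierarchy-aware decoder can exploit is visible in the IPC symbols themselves: section, class, subclass, main group, and subgroup. The Python sketch below splits a full symbol into that chain of levels, which is the kind of coarse-to-fine target sequence such a decoder could predict. It is an illustration under the standard IPC symbol layout, not the authors' decoder, and the regular expression only accepts the common "A61K 31/496" form.

```python
import re

# Full IPC symbol, e.g. "A61K 31/496": section letter A-H, two-digit class,
# subclass letter, main group (1-4 digits) and subgroup (at least 2 digits).
IPC_SYMBOL = re.compile(r"([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})")


def ipc_levels(symbol):
    """Return the chain section -> class -> subclass -> main group -> subgroup."""
    m = IPC_SYMBOL.fullmatch(symbol.strip())
    if m is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = m.groups()
    sub = section + cls + subclass
    levels = [section, section + cls, sub, f"{sub} {group}/00", f"{sub} {group}/{subgroup}"]
    if subgroup == "00":  # the symbol is itself a main group
        levels.pop()
    return levels


print(ipc_levels("A61K 31/496"))  # ['A', 'A61', 'A61K', 'A61K 31/00', 'A61K 31/496']
```

Predicting the levels in order lets each finer choice be conditioned on a much smaller set of children than the roughly 70,000 flat labels.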
Affiliation(s)
- Yuki Hoshino
- Tokyo Institute of Technology, Meguro City, Tokyo, Japan
11. Shafi J, Nawab RMA, Rayson P. Semantic Tagging for the Urdu Language: Annotated Corpus and Multi-Target Classification Methods. ACM T ASIAN LOW-RESO 2023. [DOI: 10.1145/3582496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Extracting and analysing meaning-related information from natural language data has attracted the attention of researchers in various fields, such as natural language processing, corpus linguistics, information retrieval, and data science. An important aspect of such automatic information extraction and analysis is the annotation of language data using semantic tagging tools. Different semantic tagging tools have been designed to carry out various levels of semantic analysis, for instance, named entity recognition and disambiguation, sentiment analysis, word sense disambiguation, content analysis, and semantic role labelling. Common to all of these tasks, in the supervised setting, is the requirement for a manually semantically annotated corpus, which acts as a knowledge base from which to train and test potential word and phrase-level sense annotations. Many benchmark corpora have been developed for various semantic tagging tasks, but most are for English and other European languages. There is a dearth of semantically annotated corpora for the Urdu language, which is widely spoken and used around the world. To fill this gap, this study presents a large benchmark corpus and methods for the semantic tagging task for the Urdu language. The proposed corpus contains 8,000 tokens in the following domains or genres: news, social media, Wikipedia, and historical text (each domain having 2K tokens). The corpus has been manually annotated with 21 major semantic fields and 232 sub-fields with the USAS (UCREL Semantic Analysis System) semantic taxonomy which provides a comprehensive set of semantic fields for coarse-grained annotation. Each word in our proposed corpus has been annotated with at least one and up to nine semantic field tags to provide a detailed semantic analysis of the language data, which allowed us to treat the problem of semantic tagging as a supervised multi-target classification task. 
To demonstrate how our proposed corpus can be used for the development and evaluation of Urdu semantic tagging methods, we extracted local, topical, and semantic features from the proposed corpus and applied seven different supervised multi-target classifiers to them. Results show an accuracy of 94% on our proposed corpus, which is freely available for download.
Affiliation(s)
- Jawad Shafi
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, and InfoLab21, Lancaster University, Lancaster, U.K.
- Paul Rayson
- School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, U.K.
12. Jin Y, Lu H, Zhu W, Huo W. Deep learning based classification of multi-label chest X-ray images via dual-weighted metric loss. Comput Biol Med 2023; 157:106683. [PMID: 36905869 DOI: 10.1016/j.compbiomed.2023.106683] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/02/2022] [Revised: 10/17/2022] [Accepted: 11/06/2022] [Indexed: 02/17/2023]
Abstract
Thoracic disease, like many other diseases, can lead to complications. Existing multi-label medical image learning problems typically include rich pathological information, such as images, attributes, and labels, which are crucial for supplementary clinical diagnosis. However, most contemporary efforts focus exclusively on regression from input to binary labels, ignoring the relationship between visual features and the semantic vectors of labels. In addition, the amount of data is imbalanced across diseases, which frequently causes intelligent diagnostic systems to make erroneous disease predictions. Therefore, we aim to improve the accuracy of multi-label classification of chest X-ray images. ChestX-ray14 images were used as the multi-label dataset for the experiments in this study. By fine-tuning the ConvNeXt network, we obtained visual vectors, which we combined with semantic vectors encoded by BioBERT to map the two different forms of features into a common metric space, making the semantic vectors the prototype of each class in that space. The metric relationship between images and labels is then considered at the image level and at the disease-category level, respectively, and a new dual-weighted metric loss function is proposed. Finally, the average AUC achieved in the experiments reached 0.826, and our model outperformed the comparison models.
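The idea of using label semantic vectors as class prototypes in a shared metric space can be sketched as scoring an image embedding against each prototype and keeping every label that is close enough. In the Python below, the 3-d vectors are toy stand-ins for fine-tuned ConvNeXt and BioBERT embeddings, the cosine-threshold rule is an assumption for illustration, and the paper's dual-weighted metric loss is not reproduced. The label names are three of the ChestX-ray14 findings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict_labels(image_vec, prototypes, threshold=0.5):
    """Return every label whose prototype is close enough to the image embedding."""
    return [label for label, proto in prototypes.items() if cosine(image_vec, proto) >= threshold]


# Toy 3-d stand-ins for image (ConvNeXt) and label (BioBERT) embeddings.
prototypes = {"Effusion": [1.0, 0.0, 0.0], "Nodule": [0.0, 1.0, 0.0], "Edema": [0.0, 0.0, 1.0]}
print(predict_labels([0.9, 0.8, 0.1], prototypes))  # ['Effusion', 'Nodule']
```

Because each label is tested against its own prototype, one image can be assigned several findings at once, which is what makes the metric-space formulation multi-label.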
Affiliation(s)
- Yufei Jin
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Huijuan Lu
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Wenjie Zhu
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Wanli Huo
- College of Information Engineering, China Jiliang University, Hangzhou, China.
13. Romero M, Nakano FK, Finke J, Rocha C, Vens C. Leveraging class hierarchy for detecting missing annotations on hierarchical multi-label classification. Comput Biol Med 2023; 152:106423. [PMID: 36529023 DOI: 10.1016/j.compbiomed.2022.106423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 11/09/2022] [Accepted: 12/11/2022] [Indexed: 12/15/2022]
Abstract
With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have succeeded in annotating genes with functions, they often assume that genes are completely annotated and fail to account for the fact that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations among functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. Through several experiments on a variety of rice (Oryza sativa Japonica), we show that the proposed method accurately detects missing annotations and yields superior results compared to state-of-the-art methods from the literature.
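One reason a hierarchy helps detect missing annotations is the consistency constraint of hierarchical multi-label classification: a gene annotated with a function is implicitly annotated with all of that function's ancestors. The Python below is a minimal sketch of that constraint only, assuming a tree-shaped hierarchy; it does not reproduce the paper's path-based probability method, and the function terms are hypothetical.

```python
def propagate_to_ancestors(annotations, parent):
    """Close a set of terms under the ancestor relation of a tree-shaped hierarchy.

    parent maps each term to its parent term; the root maps to None.
    """
    closed = set()
    for term in annotations:
        # Walk up to the root, or stop at a term whose ancestors are already included.
        while term is not None and term not in closed:
            closed.add(term)
            term = parent[term]
    return closed


def missing_ancestor_annotations(annotations, parent):
    """Terms implied by the hierarchy but absent from the recorded annotations."""
    return propagate_to_ancestors(annotations, parent) - set(annotations)


parent = {"root": None, "metabolism": "root", "lipid_metabolism": "metabolism", "transport": "root"}
gene_terms = {"lipid_metabolism", "transport"}
print(sorted(missing_ancestor_annotations(gene_terms, parent)))  # ['metabolism', 'root']
```

Gaps of this kind are the easy case; the harder problem the paper targets is estimating which unannotated leaf-level functions are likely to be missing, where path information through the hierarchy supplies the evidence.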
Affiliation(s)
- Miguel Romero
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Felipe Kenji Nakano
- Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium.
- Jorge Finke
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Camilo Rocha
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Celine Vens
- Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium.
14. Simonič M, Majcen Hrovat M, Džeroski S, Ude A, Nemec B. Determining Exception Context in Assembly Operations from Multimodal Data. SENSORS (BASEL, SWITZERLAND) 2022; 22:7962. [PMID: 36298313 PMCID: PMC9610822 DOI: 10.3390/s22207962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2022] [Revised: 10/03/2022] [Accepted: 10/14/2022] [Indexed: 06/16/2023]
Abstract
Robot assembly tasks can fail due to unpredictable errors and can then only continue with the manual intervention of a human operator. Recently, we proposed an exception strategy learning framework based on statistical learning and context determination, which can successfully resolve such situations. This paper deals with context determination from multimodal data, the key component of our framework. We propose a novel approach to generate unified low-dimensional context descriptions based on image and force-torque data. For this purpose, we combine a state-of-the-art neural network model for image segmentation with contact point estimation from force-torque measurements. An ensemble of decision trees is used to combine features from the two modalities. To validate the proposed approach, we collected datasets of deliberately induced insertion failures both for the classic peg-in-hole insertion task and for an industrially relevant car starter assembly task. We demonstrate that the proposed approach generates reliable low-dimensional descriptors, suitable as the queries needed in statistical learning.
Affiliation(s)
- Mihael Simonič
- Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Faculty of Electrical Engineering, University of Ljubljana, Tržaška Cesta 25, 1000 Ljubljana, Slovenia
- Matevž Majcen Hrovat
- Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Sašo Džeroski
- Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Aleš Ude
- Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Faculty of Electrical Engineering, University of Ljubljana, Tržaška Cesta 25, 1000 Ljubljana, Slovenia
- Bojan Nemec
- Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova Cesta 39, 1000 Ljubljana, Slovenia
15
Improving Node Classification through Convolutional Networks Built on Enhanced Message-Passing Graph. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3999144. [PMID: 36188690 PMCID: PMC9525199 DOI: 10.1155/2022/3999144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 08/30/2022] [Indexed: 11/30/2022]
Abstract
Enhancing message propagation is critical for solving the problem of node classification in sparse graphs with few labels. The recently popularized Graph Convolutional Network (GCN) cannot propagate messages effectively to distant nodes because of over-smoothing. In addition, a GCN with numerous trainable parameters suffers from overfitting when labeled nodes are scarce. This article addresses the problem by building a GCN on an Enhanced Message-Passing Graph (EMPG). The key idea is that node classification can benefit from variants of the input graph that propagate messages more efficiently, based on the assumption that the structure of each variant is reasonable when more unlabeled nodes are labeled properly. Specifically, the proposed method first maps the nodes to a latent space through a graph embedding that captures the structural information of the input graph. Taking the node attributes into account as well, the method constructs the EMPG by adding connections between nodes that are close together in the latent space. With the help of the added connections, the EMPG allows a node to propagate its message to the right nodes at long distances, so the GCN built on the EMPG need not stack multiple layers; as a result, over-smoothing is avoided. However, dense connections may cause message propagation saturation and lead to overfitting. Treating the EMPG as an accumulation of potential variants of the original graph, the proposed method uses dropout to extract a group of variants from the EMPG and then builds multichannel GCNs on them. The multichannel features learned from the different dropout EMPGs are aggregated to compute the final prediction jointly. The proposed method is flexible, as a broad range of GCNs can be incorporated easily; it is also efficient and robust. Experimental results demonstrate that the proposed method improves node classification.
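The core EMPG step described above, adding edges between nodes that lie close together in a latent embedding space and then propagating features over the enlarged graph, can be illustrated with a minimal sketch. It rests on several assumptions that are ours, not the authors': the embeddings are already given, "close" means the k nearest neighbours by Euclidean distance, and propagation is one step of the symmetrically normalised adjacency with self-loops that a standard GCN layer applies before its learned weights. The function names `augment_with_knn` and `propagate` are hypothetical, and the dropout-based multichannel GCNs are omitted.

```python
import math
from typing import List, Sequence, Set


def augment_with_knn(adj: List[Set[int]], emb: Sequence[Sequence[float]],
                     k: int) -> List[Set[int]]:
    """Return a copy of the adjacency sets with each node also linked to its
    k nearest neighbours in the embedding space (edges kept undirected)."""
    n = len(emb)
    new_adj = [set(neigh) for neigh in adj]
    for i in range(n):
        # Rank all other nodes by Euclidean distance in the latent space.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(emb[i], emb[j]))
        for j in others[:k]:
            new_adj[i].add(j)
            new_adj[j].add(i)
    return new_adj


def propagate(adj: List[Set[int]], feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """One step of D^-1/2 (A + I) D^-1/2 X, the normalised propagation a
    standard GCN layer applies before its learned weight matrix."""
    n = len(adj)
    deg = [len(adj[i]) + 1 for i in range(n)]  # +1 for the self-loop
    out = []
    for i in range(n):
        row = [0.0] * len(feats[i])
        for j in adj[i] | {i}:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for d, x in enumerate(feats[j]):
                row[d] += w * x
        out.append(row)
    return out


# Path graph 0-1-2-3 in which nodes 0 and 3 have nearly identical embeddings.
adj = [{1}, {0, 2}, {1, 3}, {2}]
emb = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.1, 0.0)]
aug = augment_with_knn(adj, emb, k=1)
feats = [[1.0], [0.0], [0.0], [0.0]]
print(propagate(aug, feats))  # node 3 now receives node 0's signal in one hop
```

On the original path graph, node 0's signal needs three layers to reach node 3. After augmentation it arrives in a single propagation step, which is the intuition behind not stacking many GCN layers.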
16
Hierarchical classification of pollinating flying insects under changing environments. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
17
Zhang H, Yang Y, Wang X, Gao H, Hu Q. MLI: A multi-level inference mechanism for user attributes in social networks. ACM T INFORM SYST 2022. [DOI: 10.1145/3545797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
In a social network, each user has self-descriptive attributes, called user attributes, which are semantically hierarchical. Attribute inference has become an essential way for social platforms to classify users and make targeted recommendations. Most existing approaches focus on the flat inference problem and neglect the semantic hierarchy of user attributes, which causes serious inconsistency in multi-level tasks. In this article, we propose a multi-level model, MLI, in which an information propagation component collects attribute information by mining the global graph structure and an attribute correction component realizes mutual correction between attributes at different levels. Further, we put forward the concept of a generalized semantic tree, a way of representing the hierarchical structure of user attributes whose nodes, unlike those of a regular tree, may have multiple parent nodes. Both regular and generalized semantic trees are commonly used in practice, and both can be handled by our model. Moreover, by starting the inference from sub-networks with sufficient attribute information, we design a “Ripple” algorithm to improve the efficiency and effectiveness of our model. For evaluation, we conduct extensive experiments on DBLP datasets. The experimental results show that MLI outperforms state-of-the-art methods.
Affiliation(s)
- Hang Zhang
- State Key Laboratory of Communication Content Cognition, People’s Daily Online, Beijing 100733, China
- College of Intelligence and Computing, Tianjin University, China
- Yajun Yang
- State Key Laboratory of Communication Content Cognition, People’s Daily Online, Beijing 100733, China
- College of Intelligence and Computing, Tianjin University, China
- Xin Wang
- College of Intelligence and Computing, Tianjin University, China
- Hong Gao
- Faculty of Computing, Harbin Institute of Technology, China
- Qinghua Hu
- College of Intelligence and Computing, Tianjin University, China
18
Petković M, Džeroski S, Kocev D. Feature ranking for semi-supervised learning. Mach Learn 2022. [DOI: 10.1007/s10994-022-06181-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The data used for analysis are becoming increasingly complex along several directions: high dimensionality, number of examples and availability of labels for the examples. This poses a variety of challenges for the existing machine learning methods, related to analyzing datasets with a large number of examples that are described in a high-dimensional space, where not all examples have labels provided. For example, when investigating the toxicity of chemical compounds, there are many compounds available that can be described with information-rich high-dimensional representations, but not all of the compounds have information on their toxicity. To address these challenges, we propose methods for semi-supervised learning (SSL) of feature rankings. The feature rankings are learned in the context of classification and regression, as well as in the context of structured output prediction (multi-label classification, MLC, hierarchical multi-label classification, HMLC and multi-target regression, MTR) tasks. This is the first work that treats the task of feature ranking uniformly across various tasks of semi-supervised structured output prediction. To the best of our knowledge, it is also the first work on SSL of feature rankings for the tasks of HMLC and MTR. More specifically, we propose two approaches, based on predictive clustering tree ensembles and on the Relief family of algorithms, and evaluate their performance across 38 benchmark datasets. The extensive evaluation reveals that rankings based on Random Forest ensembles perform the best for classification tasks (incl. MLC and HMLC tasks) and are the fastest for all tasks, while ensembles based on extremely randomized trees work best for the regression tasks. Semi-supervised feature rankings outperform their supervised counterparts across the majority of datasets for all of the different tasks, showing the benefit of using unlabeled in addition to labeled data.
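The Relief family named in this abstract scores a feature by how well it separates each example from its nearest neighbour of another class, compared with its nearest neighbour of the same class. The sketch below shows only the original supervised Relief for binary labels. It assumes numeric features scaled to [0, 1] and uses every example once as the reference instance instead of random sampling. It does not include the semi-supervised or structured-output extensions the paper proposes, and the function name `relief` is hypothetical.

```python
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Original (supervised, binary-class) Relief feature weights.
    Assumes numeric features already scaled to [0, 1]; every example is
    used once as the reference instance instead of random sampling."""
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for a in range(d):
            # Reward features that differ on the nearest miss and
            # penalise features that differ on the nearest hit.
            w[a] += (abs(X[i][a] - X[miss][a]) - abs(X[i][a] - X[hit][a])) / n
    return w


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief(X, y)
ranking = sorted(range(len(weights)), key=lambda a: -weights[a])
print(weights, ranking)
```

Sorting features by weight gives the feature ranking. In this toy data the informative feature receives a positive weight and the noisy one a negative weight.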
19
Hierarchical classification for account code suggestion. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
20
Species-Level Microfossil Prediction for Globotruncana genus Using Machine Learning Models. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-022-06822-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
21
He S, Lu Y, Li M. Probabilistic risk analysis for coal mine gas overrun based on FAHP and BN: a case study. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:28458-28468. [PMID: 34993806 DOI: 10.1007/s11356-021-18474-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2021] [Accepted: 12/29/2021] [Indexed: 05/14/2023]
Abstract
To analyze the risk of gas overrun in coal mines and improve risk analysis, a novel risk analysis method based on the fuzzy analytic hierarchy process (FAHP) and a Bayesian network (BN) was proposed. The risk analysis framework consisted of causal reasoning, logical reasoning, and sensitivity analysis. The gas overrun risk analysis was conducted with the Laohutai Coal Mine in China as the research object. Specifically, based on prior knowledge and sample data, the probability of gas overrun was 3.2%, which makes it a low-probability event. However, the probability of the gas concentration exceeding 1% was 12%, so a potential danger remained. Logical reasoning identified wind speed and air leakage as the direct causes of gas overrun. Sensitivity analysis indicated that wind speed, human error, and ground stress were the key factors in gas overrun. The case study showed that this FAHP-BN-based risk analysis method can provide real-time, dynamic decision support for gas overrun control and treatment in coal mines to ensure safe and efficient mining.
Affiliation(s)
- Shan He
- School of Resource, Environment and Safety Engineering, Hunan University of Science and Technology, Taoyuan Road, Xiangtan, 411201, Hunan Province, China
- Yi Lu
- School of Resource, Environment and Safety Engineering, Hunan University of Science and Technology, Taoyuan Road, Xiangtan, 411201, Hunan Province, China
- Min Li
- School of Resource, Environment and Safety Engineering, Hunan University of Science and Technology, Taoyuan Road, Xiangtan, 411201, Hunan Province, China
22
Tree-based dynamic classifier chains. Mach Learn 2022. [DOI: 10.1007/s10994-022-06162-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Classifier chains are an effective technique for modeling label dependencies in multi-label classification. However, the method requires a fixed, static order of the labels. While in theory any order is sufficient, in practice this order has a substantial impact on the quality of the final prediction. Dynamic classifier chains denote the idea that, for each instance to classify, the order in which the labels are predicted is chosen dynamically. The complexity of a naïve implementation of such an approach is prohibitive, because it would require training a sequence of classifiers for every possible permutation of the labels. To tackle this problem efficiently, we propose a new approach based on random decision trees which can dynamically select the label ordering for each prediction. We show empirically that a dynamic selection of the next label improves over the use of a static ordering under an otherwise unchanged random decision tree model. In addition, we demonstrate an alternative approach based on extreme gradient boosted trees, which allows for a more target-oriented training of dynamic classifier chains. Our results show that this variant outperforms random decision trees and other tree-based multi-label classification methods. More importantly, the dynamic selection strategy considerably speeds up training and prediction.
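The difference between static and dynamic chains is easiest to see in the prediction loop. The sketch below is a generic illustration of the dynamic idea, not the paper's algorithm. At each step, every remaining label's model scores the instance together with the labels fixed so far, and the label whose prediction lies furthest from the 0.5 boundary is fixed next. The per-label models are hypothetical stand-in functions returning P(label = 1). The random decision trees and gradient boosted trees the authors actually use are not reproduced.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-label model: (features, labels fixed so far) -> P(label = 1).
LabelModel = Callable[[Sequence[float], Dict[str, int]], float]


def predict_dynamic_chain(x: Sequence[float], models: Dict[str, LabelModel]
                          ) -> Tuple[Dict[str, int], List[str]]:
    """Fix labels one at a time, always choosing the remaining label whose
    model is most confident given the labels fixed so far."""
    fixed: Dict[str, int] = {}
    order: List[str] = []
    remaining = set(models)
    while remaining:
        probs = {lab: models[lab](x, fixed) for lab in remaining}
        # Confidence = distance of P(label = 1) from the 0.5 decision boundary;
        # sorting makes ties break alphabetically.
        best = max(sorted(remaining), key=lambda lab: abs(probs[lab] - 0.5))
        fixed[best] = int(probs[best] >= 0.5)
        order.append(best)
        remaining.remove(best)
    return fixed, order


def sports_model(x, fixed):
    return 0.95 if x[0] > 0.5 else 0.05


def football_model(x, fixed):
    # Ambiguous on its own, easy once "sports" has been fixed.
    if "sports" not in fixed:
        return 0.55
    return 0.9 if fixed["sports"] == 1 else 0.1


def politics_model(x, fixed):
    return 0.3


models = {"football": football_model, "politics": politics_model, "sports": sports_model}
labels, order = predict_dynamic_chain([0.8], models)
print(order, labels)
```

Here "football" is listed first but is predicted second, after the confident "sports" label has been fixed and can inform it. A static chain would have to commit to one order for every instance.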
23
Zou Y, Chou CA. A combinatorial optimization approach for multi-label associative classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.108088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
24
TCMPR: TCM Prescription Recommendation Based on Subnetwork Term Mapping and Deep Learning. BIOMED RESEARCH INTERNATIONAL 2022; 2022:4845726. [PMID: 35224094 PMCID: PMC8872682 DOI: 10.1155/2022/4845726] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/18/2021] [Accepted: 01/03/2022] [Indexed: 11/17/2022]
Abstract
Traditional Chinese medicine (TCM) has played an indispensable role in clinical diagnosis and treatment. Based on a patient’s symptom phenotypes, computation-based prescription recommendation methods can recommend personalized TCM prescriptions using machine learning and artificial intelligence technologies. However, owing to the complexity and individuality of patients’ clinical phenotypes, current prescription recommendation methods do not perform well. Meanwhile, it is very difficult to effectively represent symptom terms that are not recorded in an existing knowledge base. In this study, we proposed a subnetwork-based symptom term mapping method (SSTM) and constructed an SSTM-based TCM prescription recommendation method (termed TCMPR). SSTM extracts the subnetwork structure between symptoms from a knowledge network to effectively represent the embedding features of clinical symptom terms (especially unrecorded terms). The experimental results showed that our method performs better than state-of-the-art methods. In addition, comprehensive experiments on TCMPR with different hyperparameters (i.e., feature embedding, feature dimension, subnetwork filter threshold, and feature fusion) demonstrate that our method performs well on TCM prescription recommendation and could potentially promote clinical diagnosis and treatment in TCM precision medicine.
25
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
26
Belli E, Vantini S. Measure inducing classification and regression trees for functional data. Stat Anal Data Min 2021. [DOI: 10.1002/sam.11569] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Affiliation(s)
- Edoardo Belli
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Simone Vantini
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
27
Hamidi F, Gilani N, Belaghi RA, Sarbakhsh P, Edgünlü T, Santaguida P. Exploration of Potential miRNA Biomarkers and Prediction for Ovarian Cancer Using Artificial Intelligence. Front Genet 2021; 12:724785. [PMID: 34899827 PMCID: PMC8656459 DOI: 10.3389/fgene.2021.724785] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/14/2021] [Accepted: 10/07/2021] [Indexed: 12/20/2022]
Abstract
Ovarian cancer is the second most dangerous gynecologic cancer, with a high mortality rate. Classifying high-dimensional, small-sample gene expression data is a challenging task. The discovery of miRNAs, small non-coding RNAs 18–25 nucleotides in length that regulate gene expression, has revealed a new level of gene regulation, and miRNAs have been reported to play an important role in cancer. Using LASSO and Elastic Net as embedded feature selection methods, the present study identified 10 miRNAs that were differentially regulated in ovarian cancer serum samples compared with non-cancer samples in the publicly available dataset GSE106817: hsa-miR-5100, hsa-miR-6800-5p, hsa-miR-1233-5p, hsa-miR-4532, hsa-miR-4783-3p, hsa-miR-4787-3p, hsa-miR-1228-5p, hsa-miR-1290, hsa-miR-3184-5p, and hsa-miR-320b. Further, we implemented state-of-the-art machine learning classifiers, such as logistic regression, random forest, artificial neural network, XGBoost, and decision trees, to build clinical prediction models. Next, the diagnostic performance of these models with the identified miRNAs was evaluated on the internal (GSE106817) and external (GSE113486) validation datasets by ROC analysis. The results showed that the first four prediction models consistently yielded an AUC of 100%. Our findings provide significant evidence that the serum miRNA profile represents a promising diagnostic biomarker for ovarian cancer.
Affiliation(s)
- Farzaneh Hamidi
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Neda Gilani
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Arabi Belaghi
- Department of Statistics, Faculty of Mathematical Science, University of Tabriz, Tabriz, Iran
- Department of Mathematics, Applied Mathematics and Statistics, Uppsala University, Uppsala, Sweden
- Parvin Sarbakhsh
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Tuba Edgünlü
- Department of Medical Biology, Faculty of Medicine, Muğla Sıtkı Koçman University, Muğla, Turkey
- Pasqualina Santaguida
- Department of Health Research and Methods, McMaster University, Hamilton, ON, Canada
28
Inácio SV, Gomes JF, Falcão AX, Martins dos Santos B, Soares FA, Nery Loiola SH, Rosa SL, Nagase Suzuki CT, Bresciani KDS. Automated Diagnostics: Advances in the Diagnosis of Intestinal Parasitic Infections in Humans and Animals. Front Vet Sci 2021; 8:715406. [PMID: 34888371 PMCID: PMC8650151 DOI: 10.3389/fvets.2021.715406] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 10/19/2021] [Indexed: 11/25/2022]
Abstract
The increasingly close proximity between people and animals is of great concern for public health, given the risk of exposure to infectious diseases transmitted through animals, which carry more than 60 zoonotic agents. These diseases, which are included in the list of Neglected Tropical Diseases, cause losses in countries with tropical and subtropical climates as well as in regions with temperate climates. Indeed, they affect more than a billion people around the world, a large proportion of whom are infected by one or more parasitic helminths, causing annual losses of billions of dollars. Several studies are being conducted in search of differentiated, more sensitive diagnostics with fewer errors. These studies, which involve the automated examination of intestinal parasites, still face challenges that must be overcome to ensure the proper identification of parasites. These challenges include the need for a protocol that eliminates most of the debris in samples, satisfactory staining of parasite structures, and a robust image database. Our objective here is therefore to offer a critical description of the techniques currently used for the automated diagnosis of intestinal parasites in fecal samples, as well as of advances in these techniques.
Affiliation(s)
- Sandra Valéria Inácio
- São Paulo State University (Unesp), School of Veterinary Medicine, Araçatuba, Brazil
- Jancarlo Ferreira Gomes
- School of Medical Sciences, University of Campinas (UNICAMP), Campinas, Brazil
- Institute of Computing (IC), University of Campinas (UNICAMP), Campinas, Brazil
29
Basgalupp M, Cerri R, Schietgat L, Triguero I, Vens C. Beyond global and local multi-target learning. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.08.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
30
VinegarScan: A Computer Tool Based on Ultraviolet Spectroscopy for a Rapid Authentication of Wine Vinegars. CHEMOSENSORS 2021. [DOI: 10.3390/chemosensors9110296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Ultraviolet-visible (UV-vis) spectroscopy has been used successfully in the last few years to characterize and classify wine vinegars according to their quality, particularly those with a protected designation of origin (PDO). Given these promising results, together with the technique's simplicity, low cost, speed, and portability and its ability to create robust hierarchical classification models, the objective of this work was to develop a computer tool, named VinegarScan, that uses UV-vis spectra to perform quality control and authentication of wine vinegar quickly and in a user-friendly way. The software is based on an open-source GUI written in C++ and applies several data mining algorithms (e.g., decision trees and other classification algorithms) to UV-vis spectra. It achieved satisfactory prediction results with the available analytical UV-vis data. In future, the VinegarScan tool is intended to be combined with a portable UV-vis device that control bodies of the wine vinegar industry could use to achieve a clear differentiation from competitors and to avoid fraud.
31
Hierarchical multilabel classification by exploiting label correlations. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01371-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
32
Aljedani N, Alotaibi R, Taileb M. HMATC: Hierarchical multi-label Arabic text classification model using machine learning. EGYPTIAN INFORMATICS JOURNAL 2021. [DOI: 10.1016/j.eij.2020.08.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
33
Narayan A, Reyes FA, Ren M, Haoyong Y. Real-Time Hierarchical Classification of Time Series Data for Locomotion Mode Detection. IEEE J Biomed Health Inform 2021; 26:1749-1760. [PMID: 34410932 DOI: 10.1109/jbhi.2021.3106110] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
OBJECTIVE Accurate real-time estimation of motion intent is critical for rendering useful assistance with wearable robotic prosthetic and exoskeleton devices during user-initiated motions. We aim to evaluate hierarchical classification as a strategy for real-time locomotion mode recognition for the control of wearable robotic prosthetics and exoskeletons during user-initiated motions. METHODS We collect motion data from 8 subjects using a set of 7 inertial sensors for 16 lower limb locomotion modes of different specificities. A CNN-based hierarchical classifier is trained to classify the modes into a specified label hierarchy. We measure the accuracy, stability, behaviour during mode transitions, and suitability for real-time inference of the classifier. RESULTS The method achieves stable classification of locomotion modes using 1280 ms of time history data. It achieves an average classification accuracy of 94.34% and an average AU(PRC) of 0.773, comparable to similar classifiers. The method produces more informative classifications at transitions between modes. Less specific classes are classified earlier than more specific classes in the hierarchy. The inference step of the classifier can be executed in less than 2 ms on embedded hardware, indicating suitability for real-time operation. CONCLUSION Hierarchical classification can achieve accurate detection of locomotion modes and can break up mode transitions into multiple transitions between modes of different specificity. SIGNIFICANCE Multi-specific hierarchical classification of locomotion modes could lead to smoother, finer-grained control adaptation of wearable robots during locomotion mode transitions.
34
Vyšata O, Ťupa O, Procházka A, Doležal R, Cejnar P, Bhorkar AM, Dostál O, Vališ M. Classification of Ataxic Gait. SENSORS 2021; 21:s21165576. [PMID: 34451018 PMCID: PMC8402252 DOI: 10.3390/s21165576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Revised: 08/08/2021] [Accepted: 08/12/2021] [Indexed: 12/17/2022]
Abstract
Gait disorders accompany a number of neurological and musculoskeletal disorders that significantly reduce quality of life. Motion sensors enable high-quality modelling of gait stereotypes. However, they produce large volumes of data, the evaluation of which is a challenge. In this publication, we compare different data reduction methods and the classification of the reduced data for use in clinical practice. The best accuracy in distinguishing healthy individuals from patients with ataxic gait, using 418 segments of straight gait pattern extracted from the records of 43 participants (23 ataxic, 20 healthy), is 98%, achieved by a random forest classifier on data preprocessed by t-distributed stochastic neighbour embedding.
Affiliation(s)
- Oldřich Vyšata
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 03 Hradec Králové, Czech Republic
- Correspondence:
- Ondřej Ťupa
- Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, 166 28 Praha 6, Czech Republic
- Aleš Procházka
- Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, 166 28 Praha 6, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, 160 00 Prague 6, Czech Republic
- Rafael Doležal
- Department of Chemistry, Faculty of Science, University of Hradec Králové, 500 03 Hradec Králové, Czech Republic
- Pavel Cejnar
- Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, 166 28 Praha 6, Czech Republic
- Aprajita Milind Bhorkar
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 03 Hradec Králové, Czech Republic
- Ondřej Dostál
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 03 Hradec Králové, Czech Republic
- Martin Vališ
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 03 Hradec Králové, Czech Republic
35
Handling imbalance in hierarchical classification problems using local classifiers approaches. Data Min Knowl Discov 2021. [DOI: 10.1007/s10618-021-00762-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
36
Long-term Cognitive Network-based architecture for multi-label classification. Neural Netw 2021; 140:39-48. [PMID: 33744712 DOI: 10.1016/j.neunet.2021.03.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/01/2020] [Revised: 02/27/2021] [Accepted: 03/01/2021] [Indexed: 11/21/2022]
Abstract
This paper presents a neural system for multi-label classification problems that might involve sparse features. The architecture of this model involves three sequential blocks with well-defined functions. The first block consists of a multilayered feed-forward structure that extracts hidden features, thus reducing the problem dimensionality. This block is useful when dealing with sparse problems. The second block consists of a Long-term Cognitive Network-based model that operates on the features extracted by the first block. The activation rule of this recurrent neural network is modified to prevent the input signal from vanishing during the recurrent inference process. The modified activation rule combines the neurons' state in the previous abstract layer (iteration) with the initial state. Moreover, we add a bias component to shift the transfer functions as needed to obtain good approximations. Finally, the third block consists of an output layer that adapts the second block's outputs to the label space. To train this network, we propose a backpropagation learning algorithm that uses a squared hinge loss function to maximize the margins between labels. The results show that our model outperforms state-of-the-art algorithms on most datasets.
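The squared hinge loss named in this abstract is commonly written as the sum over labels j of max(0, 1 - t_j s_j)^2, for targets t_j in {-1, +1} and raw output scores s_j. The sketch below computes this loss for a single example, together with its gradient with respect to the scores, which is the quantity backpropagation would pass back from the output layer. It is a generic textbook definition rather than the authors' training code. Averaging over labels, instead of summing, is our assumption.

```python
from typing import List, Sequence, Tuple


def squared_hinge(scores: Sequence[float], targets: Sequence[int]
                  ) -> Tuple[float, List[float]]:
    """Squared hinge loss averaged over labels, plus its gradient with
    respect to each raw score. Targets must be -1 or +1."""
    k = len(scores)
    loss = 0.0
    grad: List[float] = []
    for s, t in zip(scores, targets):
        margin = max(0.0, 1.0 - t * s)  # zero once the label is beyond the margin
        loss += margin ** 2
        # d/ds max(0, 1 - t*s)^2 = -2 * t * max(0, 1 - t*s)
        grad.append(-2.0 * t * margin / k)
    return loss / k, grad


# Three labels, all truly present: one well past the margin, two inside it.
loss, grad = squared_hinge([2.0, 0.5, -0.5], [1, 1, 1])
print(loss, grad)
```

Labels already classified beyond the margin contribute neither loss nor gradient. Labels inside the margin are pushed outward in proportion to how far they fall short, which is how the loss "maximizes the margins between labels".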
37
Nikoloski S, Kocev D, Levatić J, Wall DP, Džeroski S. Exploiting partially-labeled data in learning predictive clustering trees for multi-target regression: A case study of water quality assessment in Ireland. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2020.101161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
38
Abstract
Naïve Bayes, one of the common methods for constructing classifiers, has become one of the most popular classification methods because of its solid theoretical basis, strong prior knowledge learning characteristics, unique knowledge expression forms, and high classification accuracy. This classification method exhibits a symmetry phenomenon in the process of data classification. Although the naïve Bayes classifier performs well on single-label classification problems, whether it remains valid for multilabel classification problems is worth studying. In this paper, taking the naïve Bayes classifier as the basic research object and addressing the shortcomings of the naïve Bayes classification algorithm's conditional independence assumption and label class selection strategy, we use the characteristics of weighted naïve Bayes to build a better label classifier algorithm framework, and we propose a weighted naïve Bayes multilabel classification algorithm that introduces cultural algorithms to search for and determine the optimal weights. Experimental results show that the proposed algorithm is superior to other algorithms in classification performance.
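Attribute-weighted naïve Bayes, which this abstract builds on, relaxes the conditional independence assumption by scaling each attribute's log-likelihood with a weight. The score of class c becomes log P(c) + sum_i w_i log P(x_i | c). The sketch below fits one binary classifier per label on categorical data with Laplace smoothing. Treating the multilabel problem this way is a binary-relevance choice made for illustration, not necessarily the paper's label selection strategy. The weights are simply passed in; the cultural-algorithm search the authors use to find them is not shown. All function and label names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Counter, Dict[Tuple[int, int], Counter], List[int], int]


def fit_nb(X: Sequence[Sequence[str]], y: Sequence[int]) -> Model:
    """Count statistics for a binary (0/1) naive Bayes on categorical attributes."""
    class_counts = Counter(y)
    value_counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    n_values = [len({row[i] for row in X}) for i in range(len(X[0]))]
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
    return class_counts, value_counts, n_values, len(y)


def predict_weighted_nb(model: Model, x: Sequence[str], weights: Sequence[float]) -> int:
    """argmax_c log P(c) + sum_i w_i * log P(x_i | c), Laplace-smoothed."""
    class_counts, value_counts, n_values, n = model
    best_c, best_score = 0, -math.inf
    for c in (0, 1):
        score = math.log((class_counts[c] + 1) / (n + 2))
        for i, v in enumerate(x):
            p = (value_counts[(i, c)][v] + 1) / (class_counts[c] + n_values[i])
            score += weights[i] * math.log(p)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


def predict_multilabel(models: Dict[str, Model], x: Sequence[str],
                       weights: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Binary relevance: one weighted naive Bayes per label, each with its own weights."""
    return {lab: predict_weighted_nb(m, x, weights[lab]) for lab, m in models.items()}


X = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"], ["b", "x"]]
y = [1, 1, 0, 0, 0]
models = {"label_A": fit_nb(X, y)}
print(predict_multilabel(models, ["a", "x"], {"label_A": [0.0, 0.0]}))  # prior only
print(predict_multilabel(models, ["a", "x"], {"label_A": [1.0, 0.5]}))
```

With all weights at zero, the classifier falls back to the class prior. Raising the weight of the informative first attribute flips the decision, which is the lever a weight-search procedure would tune.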
39
Petković M, Popovski G, Seljak BK, Kocev D, Eftimov T. DietHub: Dietary habits analysis through understanding the content of recipes. Trends Food Sci Technol 2021. [DOI: 10.1016/j.tifs.2020.10.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
|
40
|
|
41
|
Abstract
In many application settings, labeling data examples is a costly endeavor, while unlabeled examples are abundant and cheap to produce. Labeling examples can be particularly problematic in an online setting, where arbitrarily many examples can arrive at high frequencies. It is also problematic when we need to predict complex values (e.g., multiple real values), a task that has started receiving considerable attention, but mostly in the batch setting. In this paper, we propose a method for online semi-supervised multi-target regression. It is based on incremental trees for multi-target regression and the predictive clustering framework. Furthermore, it utilizes unlabeled examples to improve its predictive performance as compared to using just the labeled examples. We compare the proposed iSOUP-PCT method with supervised tree methods, which do not use unlabeled examples, and with an oracle method, which uses unlabeled examples as though they were labeled. Additionally, we compare the proposed method to the available state-of-the-art methods. The method achieves good predictive performance at the cost of increased consumption of computational resources as compared to its supervised variant. The proposed method also outperforms the state-of-the-art methods when very few labeled examples are available, while achieving comparable performance when labeled examples are more common.
|
42
|
|
43
|
Deep hiearchical multi-label classification applied to chest X-ray abnormality taxonomies. Med Image Anal 2020; 66:101811. [PMID: 32937229 DOI: 10.1016/j.media.2020.101811] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/11/2019] [Revised: 06/30/2020] [Accepted: 07/24/2020] [Indexed: 11/23/2022]
Abstract
Chest X-rays (CXRs) are a crucial and extraordinarily common diagnostic tool, which has driven extensive research into computer-aided diagnosis (CAD) solutions. However, both high classification accuracy and meaningful model predictions that respect and incorporate clinical taxonomies are crucial for CAD usability. To this end, we present a deep hierarchical multi-label classification (HMLC) approach for CXR CAD. Unlike other hierarchical systems, we show that first training the network to model conditional probability directly and then refining it with unconditional probabilities is key to boosting performance. In addition, we formulate a numerically stable cross-entropy loss function for unconditional probabilities that provides concrete performance improvements. We also demonstrate that HMLC can be an effective means to manage missing or incomplete labels. To the best of our knowledge, we are the first to apply HMLC to medical imaging CAD. We extensively evaluate our approach on detecting abnormality labels from the CXR arm of the Prostate, Lung, Colorectal and Ovarian (PLCO) dataset, which comprises over 198,000 manually annotated CXRs. When using complete labels, we report a mean area under the curve (AUC) of 0.887, the highest yet reported for this dataset. These results are supported by ancillary experiments on the PadChest dataset, where we also report significant improvements of 1.2% and 4.1% in AUC and average precision, respectively, over strong "flat" classifiers. Finally, we demonstrate that our HMLC approach can much better handle incompletely labelled data. These performance improvements, combined with the inherent usefulness of taxonomic predictions, indicate that our approach represents a useful step forward for CXR CAD.
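The abstract's distinction between conditional and unconditional probabilities can be made concrete with a small sketch. It assumes each taxonomy node has a logit for P(node | parent) and that a node's unconditional probability is the product of the conditional probabilities along its path from the root. The product is accumulated in log space, and log(1 - p) is evaluated with the standard log1mexp trick so that p close to 1 does not produce log(0). The function names are hypothetical, and this generic formulation is not necessarily the loss the authors derived.

```python
import math


def softplus(x):
    # log(1 + e^x) computed without overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sigmoid(z):
    # log(sigmoid(z)) = -softplus(-z)
    return -softplus(-z)


def log1mexp(lp):
    # log(1 - e^lp) for lp < 0, choosing the numerically accurate branch
    lp = min(lp, -1e-300)  # guard: at lp == 0 exactly, log(1 - p) would be -inf
    if lp > -math.log(2.0):
        return math.log(-math.expm1(lp))
    return math.log1p(-math.exp(lp))


def unconditional_bce(path_logits, y):
    """Binary cross-entropy on an unconditional node probability (hypothetical helper).

    path_logits: logits of P(node | parent) for every node on the path from
    the top of the taxonomy down to the target node. Their sigmoid product is
    the unconditional probability p, handled here entirely in log space.
    """
    log_p = sum(log_sigmoid(z) for z in path_logits)
    return -(y * log_p + (1 - y) * log1mexp(log_p))
```

For moderate logits this agrees with the naive formula; for confident predictions such as logits of 50 along the path, the naive 1 - p rounds to 0 in floating point while this version still returns a finite loss.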
|
44
|
Fan Y, Lu X, Liu Y, Zhao J. Angle-Based Hierarchical Classification Using Exact Label Embedding. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1801450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Yiwei Fan
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Xiaoling Lu
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Yufeng Liu
- Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, NC
- Junlong Zhao
- School of Statistics, Beijing Normal University, Beijing, China
|
45
|
Abstract
Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels.
In this paper, we develop a suite of algorithms, called , which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees.
We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees.
By combining the effect of shallow trees and generalized label representation, achieves the best of both worlds: fast training comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail labels. On a benchmark Amazon-3M dataset with 3 million labels, outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for is available at https://github.com/xmc-aalto/bonsai.
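To make the output-space label representation and multi-way partitioning more concrete, here is a Python sketch under several assumptions: each label is represented by its row of co-occurrence counts (column j of Y^T Y, where Y is the binary instance-by-label matrix), and labels are split into at most k groups with spherical k-means, applied recursively to produce a shallow k-ary tree. The partitioning algorithm, the function names, and the parameters `k` and `max_leaf` are illustrative choices not taken from the paper, and the per-node classifiers that a full XMC method would train are omitted.

```python
import math
import random


def cooccurrence_vectors(Y):
    """Represent each label by its co-occurrence counts with every label.

    Y is a list of binary label rows (one per training instance).
    Row j of the result equals column j of Y^T Y.
    """
    L = len(Y[0])
    C = [[0.0] * L for _ in range(L)]
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for a in active:
            for b in active:
                C[a][b] += 1.0
    return C


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # leave all-zero vectors as zeros
    return [x / n for x in v]


def spherical_kmeans(vectors, ids, k, iters=10, rng=None):
    """Split the label ids into at most k non-empty groups by cosine similarity."""
    rng = rng or random.Random(0)
    pts = {i: _normalize(vectors[i]) for i in ids}
    centers = [pts[i] for i in rng.sample(ids, k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i in ids:
            sims = [sum(a * b for a, b in zip(pts[i], c)) for c in centers]
            groups[max(range(k), key=sims.__getitem__)].append(i)
        # new center = normalized mean of members; empty groups keep their old center
        centers = [_normalize([sum(col) for col in zip(*(pts[i] for i in g))]) if g else c
                   for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def build_shallow_tree(vectors, ids, k=4, max_leaf=2, rng=None):
    """Recursively partition labels into a k-ary tree of nested lists (hypothetical)."""
    rng = rng or random.Random(0)
    if len(ids) <= max_leaf:
        return ids                    # leaf: a small set of label ids
    parts = spherical_kmeans(vectors, ids, min(k, len(ids)), rng=rng)
    if len(parts) == 1:               # no split found; stop to avoid looping
        return ids
    return [build_shallow_tree(vectors, p, k, max_leaf, rng) for p in parts]
```

A larger branching factor k gives a shallower tree, which is the trade-off the abstract highlights; the recursion always terminates because every accepted split produces strictly smaller groups.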
|
46
|
Ensembles of extremely randomized predictive clustering trees for predicting structured outputs. Mach Learn 2020. [DOI: 10.1007/s10994-020-05894-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/23/2022]
|
47
|
Kim HC, Park JH, Kim DW, Lee J. Multilabel naïve Bayes classification considering label dependence. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.06.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/27/2022]
|
48
|
|
49
|
Ghodratnama S, Abrishami Moghaddam H. Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00887-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
|
50
|
Strodthoff N, Wagner P, Wenzel M, Samek W. UDSMProt: universal deep sequence models for protein classification. Bioinformatics 2020; 36:2401-2409. [PMID: 31913448 PMCID: PMC7178389 DOI: 10.1093/bioinformatics/btaa003] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 09/02/2019] [Revised: 12/13/2019] [Accepted: 01/02/2020] [Indexed: 01/03/2023]
Abstract
MOTIVATION Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific-scoring matrices from expensive database searches. We argue that this level of performance can be reached or even be surpassed by learning a task-agnostic representation once, using self-supervised language modeling, and transferring it to specific tasks by a simple fine-tuning step. RESULTS We put forward a universal deep sequence model that is pre-trained on unlabeled protein sequences from Swiss-Prot and fine-tuned on protein classification tasks. We apply it to three prototypical tasks, namely enzyme class prediction, gene ontology prediction and remote homology and fold detection. The proposed method performs on par with state-of-the-art algorithms that were tailored to these specific tasks or, for two out of three tasks, even outperforms them. These results stress the possibility of inferring protein properties from the sequence alone and, on more general grounds, the prospects of modern natural language processing methods in omics. Moreover, we illustrate the prospects for explainable machine learning methods in this field by selected case studies. AVAILABILITY AND IMPLEMENTATION Source code is available under https://github.com/nstrodt/UDSMProt. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nils Strodthoff
- Department of Video Coding & Analytics, Fraunhofer Heinrich Hertz Institute, Berlin 10587, Germany
- Patrick Wagner
- Department of Video Coding & Analytics, Fraunhofer Heinrich Hertz Institute, Berlin 10587, Germany
- Markus Wenzel
- Department of Video Coding & Analytics, Fraunhofer Heinrich Hertz Institute, Berlin 10587, Germany
- Wojciech Samek
- Department of Video Coding & Analytics, Fraunhofer Heinrich Hertz Institute, Berlin 10587, Germany
|