1
Puccetti T, Nardi S, Cinquilli C, Zoppi T, Ceccarelli A. ROSPaCe: Intrusion Detection Dataset for a ROS2-Based Cyber-Physical System and IoT Networks. Sci Data 2024; 11:481. [PMID: 38729994 PMCID: PMC11087584 DOI: 10.1038/s41597-024-03311-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/24/2024] [Indexed: 05/12/2024] Open
Abstract
Most of the intrusion detection datasets to research machine learning-based intrusion detection systems (IDSs) are devoted to cyber-only systems, and they typically collect data from one architectural layer. Often the attacks are generated in dedicated attack sessions, without reproducing the realistic alternation and overlap of normal and attack actions. We present a dataset for intrusion detection by performing penetration testing on an embedded cyber-physical system built over Robot Operating System 2 (ROS2). Features are monitored from three architectural layers: the Linux operating system, the network, and the ROS2 services. The dataset is structured as a time series and describes the expected behavior of the system and its response to ROS2-specific attacks: it repeatedly alternates periods of attack-free operation with periods when a specific attack is being performed. This allows measuring the time to detect an attacker and the number of malicious activities performed before detection. Also, it allows training an intrusion detector to minimize both, by taking advantage of the numerous alternating periods of normal and attack operations.
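The time-to-detect measurement that the abstract describes can be illustrated with a minimal sketch. It assumes a per-time-step ground-truth label (1 = attack in progress) and a per-time-step detector alert (1 = alarm raised). The function name `detection_delays` and the toy sequences are hypothetical and are not taken from the ROSPaCe dataset or its tooling.

```python
from typing import List, Optional


def detection_delays(labels: List[int], alerts: List[int]) -> List[Optional[int]]:
    """Return one entry per contiguous attack period found in `labels`.

    Each entry is the number of time steps between the start of the attack
    period and the first alert raised inside it, or None if the detector
    raised no alert during that period (a missed attack).
    """
    delays: List[Optional[int]] = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == 1:
            start = i
            # Advance to the end of this contiguous attack period.
            while i < n and labels[i] == 1:
                i += 1
            first = next((t - start for t in range(start, i) if alerts[t] == 1), None)
            delays.append(first)
        else:
            i += 1
    return delays


# Toy time series alternating normal operation and two attack periods.
labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
alerts = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(detection_delays(labels, alerts))
```

In this toy run the first attack is flagged one step after it starts and the second attack is missed entirely, which is the kind of per-period outcome a dataset with many alternating periods makes measurable.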
Grants
- B53D23012930006 Ministero dell'Istruzione, dell'Università e della Ricerca (Ministry of Education, University and Research)
- PE00000014 Ministero dell'Istruzione, dell'Università e della Ricerca (Ministry of Education, University and Research)
- Ministero dell'Istruzione, dell'Università e della Ricerca (Ministry of Education, University and Research)
Affiliation(s)
- Tommaso Puccetti
  - Department of Mathematics and Informatics, University of Florence, Viale Morgagni 67/a, 50134, Firenze, FI, Italy
- Simone Nardi
  - Mermec Engineering, Via Livornese 1019, 56122, Pisa, Italy
- Cosimo Cinquilli
  - Department of Mathematics and Informatics, University of Florence, Viale Morgagni 67/a, 50134, Firenze, FI, Italy
- Tommaso Zoppi
  - Department of Engineering and Information Science, University of Trento, Via Sommarive 9, 38124, Trento, Italy
- Andrea Ceccarelli
  - Department of Mathematics and Informatics, University of Florence, Viale Morgagni 67/a, 50134, Firenze, FI, Italy
2
Wardana AA, Kołaczek G, Warzyński A, Sukarno P. Ensemble averaging deep neural network for botnet detection in heterogeneous Internet of Things devices. Sci Rep 2024; 14:3878. [PMID: 38365928 PMCID: PMC10873349 DOI: 10.1038/s41598-024-54438-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 02/13/2024] [Indexed: 02/18/2024] Open
Abstract
The botnet attack is one of the coordinated attack types that can infect Internet of Things (IoT) devices and cause them to malfunction. Botnets can steal sensitive information from IoT devices and control them to launch another attack, such as a Distributed Denial-of-Service (DDoS) attack or email spam. This attack is commonly detected using a network-based Intrusion Detection System (NIDS) that monitors the network device's activity. However, IoT networks are dynamic, and IoT environments contain many types of devices with different configurations and vendors. Therefore, this research proposes an Intrusion Detection System (IDS) built by ensembling models trained on traffic from heterogeneous IoT devices. A Deep Neural Network (DNN) is trained on the traffic of each heterogeneous IoT device, and each trained model is then used to predict the traffic. The prediction results from the models are averaged using the ensemble averaging method to determine the final result. This research used the N-BaIoT dataset to validate the proposed IDS model. Based on experimental results, ensemble averaging DNN can detect botnet attacks in heterogeneous IoT devices with an average accuracy of 97.21, precision of 91.41, recall of 87.31, and F1-score of 88.48.
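The averaging step described in this abstract can be sketched as follows. This is a minimal illustration only: it assumes each per-device model outputs a class-probability vector for every sample, and the function name `ensemble_average`, the two-class setup (0 = benign, 1 = botnet), and the probabilities are hypothetical rather than taken from the paper or the N-BaIoT dataset.

```python
from typing import List, Sequence


def ensemble_average(per_model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models and pick the top class.

    per_model_probs[m][i][c] is the probability that model m assigns
    class c to sample i. All models must score the same samples.
    """
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    predictions: List[int] = []
    for i in range(n_samples):
        # Unweighted mean of the models' probabilities for each class.
        mean = [
            sum(per_model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


# Hypothetical outputs of three models, each trained on a different device.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.2, 0.8]]
model_c = [[0.3, 0.7], [0.55, 0.45]]
preds = ensemble_average([model_a, model_b, model_c])
print(preds)
```

Note that `model_c` disagrees with the other two on both samples, yet the averaged probabilities still follow the majority. Limiting the influence of any single device-specific model in this way is the usual motivation for averaging.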
3
Potharlanka JL, M NB. Feature importance feedback with Deep Q process in ensemble-based metaheuristic feature selection algorithms. Sci Rep 2024; 14:2923. [PMID: 38316958 PMCID: PMC10844500 DOI: 10.1038/s41598-024-53141-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 01/29/2024] [Indexed: 02/07/2024] Open
Abstract
Feature selection is an indispensable aspect of modern machine learning, especially for high-dimensional datasets where overfitting and computational inefficiencies are common concerns. Traditional methods often employ either filter, wrapper, or embedded approaches, which have limitations in terms of robustness, computational load, or capability to capture complex interactions among features. Despite the utility of metaheuristic algorithms like Particle Swarm Optimization (PSO), Firefly Algorithm (FA), and Whale Optimization (WOA) in feature selection, there still exists a gap in efficiently incorporating feature importance feedback into these processes. This paper presents a novel approach that integrates the strengths of PSO, FA, and WOA algorithms into an ensemble model and further enhances its performance by incorporating a Deep Q-Learning framework for relevance feedbacks. The Deep Q-Learning module intelligently updates feature importance based on model performance, thereby fine-tuning the selection process iteratively. Our ensemble model demonstrates substantial gains in effectiveness over traditional and individual metaheuristic approaches. Specifically, the proposed model achieved a 9.5% higher precision, an 8.5% higher accuracy, an 8.3% higher recall, a 4.9% higher AUC, and a 5.9% higher specificity across multiple software bug prediction datasets and samples. By resolving some of the key issues in existing feature selection methods and achieving superior performance metrics, this work paves the way for more robust and efficient machine learning models in various applications, from healthcare to natural language processing scenarios. This research provides an innovative framework for feature selection that promises not only superior performance but also offers a flexible architecture that can be adapted for a variety of machine learning challenges.
Affiliation(s)
- Jhansi Lakshmi Potharlanka
  - Department of Computer Science and Engineering, Vignan's Foundation for Science Technology and Research, Guntur, 522213, India
- Nirupama Bhat M
  - Department of Computer Science and Engineering, Vignan's Foundation for Science Technology and Research, Guntur, 522213, India
4
Chai G, Li S, Yang Y, Zhou G, Wang Y. CTSF: An Intrusion Detection Framework for Industrial Internet Based on Enhanced Feature Extraction and Decision Optimization Approach. Sensors (Basel, Switzerland) 2023; 23:8793. [PMID: 37960495 PMCID: PMC10647644 DOI: 10.3390/s23218793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 10/20/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023]
Abstract
The traditional Transformer model primarily employs a self-attention mechanism to capture global feature relationships, potentially overlooking local relationships within sequences and thus affecting the modeling capability of local features. The Support Vector Machine (SVM), in turn, often requires the joint use of feature selection algorithms or model optimization methods to achieve maximum classification accuracy. Addressing the issues in both models, this paper introduces a novel network framework, CTSF, specifically designed for Industrial Internet intrusion detection. CTSF effectively addresses the limitations of traditional Transformers in extracting local features while compensating for the weaknesses of SVM. The framework comprises a pre-training component and a decision-making component. The pre-training section consists of both CNN and an enhanced Transformer, designed to capture both local and global features from input data while reducing data feature dimensions. The improved Transformer simultaneously decreases certain training parameters within CTSF, making it more suitable for the Industrial Internet environment. The classification section is composed of SVM, which receives initial classification data from the pre-training phase and determines the optimal decision boundary. The proposed framework is evaluated on an imbalanced subset of the X-IIOTID dataset, which represents Industrial Internet data. Experimental results demonstrate that with SVM using both "linear" and "rbf" kernel functions, CTSF achieves an overall accuracy of 0.98875 and effectively discriminates minor classes, showcasing the superiority of this framework.
Affiliation(s)
- Shiming Li
  - College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China; (G.C.); (Y.Y.); (G.Z.)
- Yuhe Wang
  - College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China; (G.C.); (Y.Y.); (G.Z.)
5
Lee H, Lee Y, Jo M, Nam S, Jo J, Lee C. Enhancing Diagnosis of Rotating Elements in Roll-to-Roll Manufacturing Systems through Feature Selection Approach Considering Overlapping Data Density and Distance Analysis. Sensors (Basel, Switzerland) 2023; 23:7857. [PMID: 37765913 PMCID: PMC10534779 DOI: 10.3390/s23187857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Roll-to-roll manufacturing systems have been widely adopted for their cost-effectiveness, eco-friendliness, and mass-production capabilities, utilizing thin and flexible substrates. However, in these systems, defects in the rotating components such as the rollers and bearings can result in severe defects in the functional layers. Therefore, the development of an intelligent diagnostic model is crucial for effectively identifying these rotating component defects. In this study, a quantitative feature-selection method, feature partial density, was proposed to develop high-efficiency diagnostic models. The feature combinations extracted from the measured signals were evaluated based on the partial density, which is the density of the remaining data excluding the highest class in overlapping regions and the Mahalanobis distance by class to assess the classification performance of the models. The validity of the proposed algorithm was verified through the construction of ranked model groups and comparison with existing feature-selection methods. The high-ranking group selected by the algorithm outperformed the other groups in terms of training time, accuracy, and positive predictive value. Moreover, the top feature combination demonstrated superior performance across all indicators compared to existing methods.
Affiliation(s)
- Haemi Lee
  - Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Yoonjae Lee
  - Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Minho Jo
  - Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Sanghoon Nam
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jeongdai Jo
  - Department of Printed Electronics, Korea Institute of Machinery and Materials, 156, Gajeongbuk-ro, Yuseong-gu, Daejeon 34103, Republic of Korea
- Changwoo Lee
  - Department of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
6
Ma Z, Sun ZL, Liu M. CRBP-HFEF: Prediction of RBP-Binding Sites on circRNAs Based on Hierarchical Feature Expansion and Fusion. Interdiscip Sci 2023:10.1007/s12539-023-00572-0. [PMID: 37233959 DOI: 10.1007/s12539-023-00572-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 04/20/2023] [Accepted: 04/21/2023] [Indexed: 05/27/2023]
Abstract
Circular RNAs (circRNAs) participate in the regulation of biological processes by binding to specific proteins and thus influence transcriptional processes. In recent years, circRNAs have become an emerging hotspot in RNA research. Owing to their powerful learning ability, various deep learning frameworks have been used to predict the binding sites of RNA-binding proteins (RBPs) on circRNAs. These methods usually perform only single-level feature extraction of sequence information. However, the feature acquisition may be inadequate for single-level extraction. Generally, the features of deep and shallow layers of a neural network can complement each other and are both important for binding site prediction tasks. Based on this concept, we propose a method that combines deep and shallow features, namely CRBP-HFEF. Specifically, features are first extracted and expanded for different levels of the network. Then, the expanded deep and shallow features are fused and fed into the classification network, which finally determines whether they are binding sites. Compared to several existing methods, the experimental results on multiple datasets show that the proposed method achieves significant improvements in a number of metrics (with an average AUC of 0.9855). Moreover, extensive ablation experiments are also performed to verify the effectiveness of the hierarchical feature expansion strategy.
Affiliation(s)
- Zheng Ma
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, and School of Electrical Engineering and Automation Anhui University, Hefei, 230601, Anhui, China
- Zhan-Li Sun
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, and School of Electrical Engineering and Automation Anhui University, Hefei, 230601, Anhui, China
- Mengya Liu
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
7
A Survey on Feature Selection Techniques Based on Filtering Methods for Cyber Attack Detection. Information 2023. [DOI: 10.3390/info14030191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023] Open
Abstract
Cyber attack detection technology plays a vital role today, since cyber attacks have been causing great harm and loss to organizations and individuals. Feature selection is a necessary step for many cyber-attack detection systems, because it can reduce training costs, improve detection performance, and make the detection system lightweight. Many techniques related to feature selection for cyber attack detection have been proposed, and each technique has advantages and disadvantages. Determining which technology should be selected is a challenging problem for many researchers and system developers, and although there have been several survey papers on feature selection techniques in the field of cyber security, most of them try to be all-encompassing and are too general, making it difficult for readers to grasp the concrete and comprehensive image of the methods. In this paper, we survey the filter-based feature selection technique in detail and comprehensively for the first time. The filter-based technique is one popular kind of feature selection technique and is widely used in both research and application. In addition to general descriptions of this kind of method, we also explain in detail search algorithms and relevance measures, which are two necessary technical elements commonly used in the filter-based technique.
8
Imanbayev A, Tynymbayev S, Odarchenko R, Gnatyuk S, Berdibayev R, Baikenov A, Kaniyeva N. Research of Machine Learning Algorithms for the Development of Intrusion Detection Systems in 5G Mobile Networks and Beyond. Sensors (Basel, Switzerland) 2022; 22:9957. [PMID: 36560333 PMCID: PMC9782871 DOI: 10.3390/s22249957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/13/2022] [Accepted: 12/14/2022] [Indexed: 06/17/2023]
Abstract
The introduction of fifth-generation mobile networks is underway all over the world, which raises concerns about the security of these networks against hacking. Over the past few years, researchers from around the world have raised this issue intensively as new technologies seek to integrate into many areas of business and human infrastructure. This paper proposes to implement an IDS (Intrusion Detection System) machine learning approach into the 5G core architecture to serve as part of the security architecture. This paper gives a brief overview of intrusion detection datasets and compares machine learning and deep learning algorithms for intrusion detection. The models are built on the basis of two network datasets, CICIDS2017 and CSE-CIC-IDS-2018. After testing, the ML and DL models are compared to find the best fit with a high level of accuracy. Gradient Boost emerged as the top method when we compared the best results based on metrics, displaying 99.3% for a secure dataset and 96.4% for attacks on the test set.
Affiliation(s)
- Azamat Imanbayev
  - Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty 050000, Kazakhstan
- Sakhybay Tynymbayev
  - Information Security Laboratory, Almaty University of Power Engineering and Telecommunications, Almaty 050013, Kazakhstan
- Roman Odarchenko
  - Department of Telecommunication and Radioelectronic Systems, National Aviation University, 03058 Kyiv, Ukraine; (R.O.)
- Sergiy Gnatyuk
  - Department of Telecommunication and Radioelectronic Systems, National Aviation University, 03058 Kyiv, Ukraine; (R.O.)
- Rat Berdibayev
  - Information Security Laboratory, Almaty University of Power Engineering and Telecommunications, Almaty 050013, Kazakhstan
- Alimzhan Baikenov
  - Information Security Laboratory, Almaty University of Power Engineering and Telecommunications, Almaty 050013, Kazakhstan
- Nargiz Kaniyeva
  - School of Information Technology and Engineering, Kazakh-British Technical University, Almaty 050000, Kazakhstan
9
Meta-Heuristic Optimization Algorithm-Based Hierarchical Intrusion Detection System. Computers 2022. [DOI: 10.3390/computers11120170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Numerous network cyberattacks have been launched due to inherent weaknesses. Network intrusion detection is a crucial foundation of the cybersecurity field. Intrusion detection systems (IDSs) are a type of machine learning (ML) software proposed for making decisions without explicit programming and with little human intervention. Although ML-based IDS advancements have surpassed earlier methods, they still struggle to identify attack types with high detection rates (DR) and low false alarm rates (FAR). This paper proposes a meta-heuristic optimization algorithm-based hierarchical IDS to identify several types of attack and to secure the computing environment. The proposed approach comprises three stages: The first stage includes data preprocessing, feature selection, and the splitting of the dataset into multiple binary balanced datasets. In the second stage, two novel meta-heuristic optimization algorithms are introduced to optimize the hyperparameters of the extreme learning machine during the construction of multiple binary models to detect different attack types. These are combined in the last stage using an aggregated anomaly detection engine in a hierarchical structure on account of the model’s accuracy. We propose a software machine learning IDS that enables multi-class classification. It achieved scores of 98.93, 99.63, 99.19, 99.78, and 0.01, with 0.51 for average accuracy, DR, and FAR in the UNSW-NB15 and CICIDS2017 datasets, respectively.
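The detection rate (DR) and false alarm rate (FAR) named in this abstract are commonly computed from confusion-matrix counts, as the sketch below shows. It assumes the standard binary definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN), because the abstract does not state the exact formulas the authors used. The function name `dr_and_far` and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def dr_and_far(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute (detection rate, false alarm rate) for binary labels.

    Label 1 means attack and label 0 means normal traffic.
    DR  = TP / (TP + FN): share of attacks that were flagged.
    FAR = FP / (FP + TN): share of normal samples flagged by mistake.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


# Toy example: four attacks and four normal samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
dr, far = dr_and_far(y_true, y_pred)
print(f"DR={dr:.2f} FAR={far:.2f}")
```

For a multi-class IDS such as the hierarchical one described here, these rates would typically be computed per attack type in a one-vs-rest fashion. Whether the paper does so is not stated in the abstract.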
10
An Effective Ensemble Automatic Feature Selection Method for Network Intrusion Detection. Information 2022. [DOI: 10.3390/info13070314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
The mass of redundant and irrelevant data in network traffic brings serious challenges to intrusion detection, and feature selection can effectively remove meaningless information from the data. Most current filtered and embedded feature selection methods use a fixed threshold or ratio to determine the number of features in a subset, which requires a priori knowledge. In contrast, wrapped feature selection methods are computationally complex and time-consuming; meanwhile, individual feature selection methods have a bias in evaluating features. This work designs an ensemble-based automatic feature selection method called EAFS. Firstly, we calculate the feature importance or ranks based on individual methods, then add features to subsets sequentially by importance and evaluate subset performance comprehensively by designing an NSOM to obtain the subset with the largest NSOM value. When searching for a subset, the subset with higher accuracy is retained to lower the computational complexity by calculating the accuracy when the full set of features is used. Finally, the obtained subsets are ensembled. Experimental results on three large-scale public datasets show that the method helps classification and outperforms other recent methods in terms of performance.
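The sequential search that the abstract describes (add features in order of importance, score each growing subset, keep the best-scoring one) can be sketched as below. The abstract does not define NSOM, so the `score` callable here is a hypothetical stand-in supplied by the caller. The function name `best_prefix_subset` and the toy scoring rule are likewise assumptions, not the EAFS implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def best_prefix_subset(
    ranked_features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow a subset one feature at a time in ranked order and return
    the prefix with the highest score, together with that score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    subset: List[str] = []
    for feature in ranked_features:
        subset.append(feature)
        current = score(subset)
        if current > best_score:
            best_subset, best_score = list(subset), current
    return best_subset, best_score


# Hypothetical stand-in for the subset metric: summed usefulness of the
# chosen features minus a fixed cost of 0.1 per feature.
usefulness: Dict[str, float] = {"a": 0.5, "b": 0.3, "c": 0.05, "d": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(usefulness[f] for f in subset) - 0.1 * len(subset)


chosen, value = best_prefix_subset(["a", "b", "c", "d"], toy_score)
print(chosen, round(value, 2))
```

Because the subset size falls out of the score rather than a fixed threshold or ratio, no a priori choice of feature count is needed, which is the property the abstract emphasizes. In an ensemble setting this search would be repeated once per ranking method before the resulting subsets are combined.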
11
Abstract
The increasing popularity of the Internet of Things (IoT) has significantly impacted our daily lives in the past few years. On one hand, it brings convenience, simplicity, and efficiency for us; on the other hand, the devices are susceptible to various cyber-attacks due to the lack of solid security mechanisms and hardware security support. In this paper, we present IMIDS, an intelligent intrusion detection system (IDS) to protect IoT devices. IMIDS’s core is a lightweight convolutional neural network model to classify multiple cyber threats. To mitigate the training data shortage issue, we also propose an attack data generator powered by a conditional generative adversarial network. In the experiment, we demonstrate that IMIDS could detect nine cyber-attack types (e.g., backdoors, shellcode, worms) with an average F-measure of 97.22% and outperforms its competitors. Furthermore, IMIDS’s detection performance is notably improved after being further trained by the data generated by our attack data generator. These results demonstrate that IMIDS can be a practical IDS for the IoT scenario.