1
Ji IH, Lee JH, Kang MJ, Park WJ, Jeon SH, Seo JT. Artificial Intelligence-Based Anomaly Detection Technology over Encrypted Traffic: A Systematic Literature Review. Sensors (Basel, Switzerland) 2024; 24:898. [PMID: 38339615 PMCID: PMC10857182 DOI: 10.3390/s24030898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/31/2023] [Accepted: 01/26/2024] [Indexed: 02/12/2024]
Abstract
As cyber-attacks increase in unencrypted communication environments such as the traditional Internet, protected communication channels based on cryptographic protocols, such as transport layer security (TLS), have been introduced to the Internet. Accordingly, attackers have been carrying out cyber-attacks by hiding themselves in protected communication channels. However, the nature of channels protected by cryptographic protocols makes it difficult to distinguish between normal and malicious network traffic behaviors. This means that traditional anomaly detection models that rely on features extracted from packets by deep packet inspection (DPI) have been neutralized. Recently, studies on anomaly detection using artificial intelligence (AI) and statistical characteristics of traffic have been proposed as an alternative. In this review, we provide a systematic review of AI-based anomaly detection techniques over encrypted traffic. We set several research questions on the review topic and collected research according to eligibility criteria. Through the screening process and quality assessment, 30 research articles with high suitability were selected from the collected literature for inclusion in the review. We reviewed the selected research in terms of dataset, feature extraction, feature selection, preprocessing, anomaly detection algorithm, and performance indicators. The literature review confirmed that a variety of techniques are used for AI-based anomaly detection over encrypted traffic. Some of these techniques are similar to those used for AI-based anomaly detection over unencrypted traffic, while others differ.
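As a rough illustration of what "statistical characteristics of traffic" can mean when the payload is encrypted, the sketch below derives a few flow-level features from packet metadata only (timestamps, sizes, direction). The function name `flow_features`, the tuple layout, and the chosen features are assumptions made for this example; they are not taken from the reviewed studies.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise one flow from packet metadata only (no payload inspection).

    packets: list of (timestamp_seconds, size_bytes, direction) tuples, where
    direction is +1 for client-to-server and -1 for server-to-client.
    This layout is a hypothetical convention chosen for the example.
    """
    if not packets:
        raise ValueError("flow has no packets")
    packets = sorted(packets, key=lambda p: p[0])
    sizes = [size for _, size, _ in packets]
    times = [t for t, _, _ in packets]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(packets),
        "duration": times[-1] - times[0],
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "up_bytes": sum(s for _, s, d in packets if d > 0),
        "down_bytes": sum(s for _, s, d in packets if d < 0),
    }


# Made-up packet metadata for a single short flow.
example = [(0.00, 517, +1), (0.05, 1400, -1), (0.06, 1400, -1), (0.30, 120, +1)]
features = flow_features(example)
```

A feature dictionary like this could then be fed to whichever anomaly detection model is under study; which features are actually useful depends on the dataset and is one of the questions the review surveys.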
Affiliation(s)
- Il Hwan Ji
- Department of Information Security, Gachon University, Seongnam-si 1342, Republic of Korea
- Ju Hyeon Lee
- Department of Information Security, Gachon University, Seongnam-si 1342, Republic of Korea
- Min Ji Kang
- Department of Computer Engineering (Smart Security), Gachon University, Seongnam-si 1342, Republic of Korea
- Woo Jin Park
- Department of Software, Gachon University, Seongnam-si 1342, Republic of Korea
- Seung Ho Jeon
- Department of Computer Engineering (Smart Security), Gachon University, Seongnam-si 1342, Republic of Korea
- Jung Taek Seo
- Department of Computer Engineering, Gachon University, Seongnam-si 1342, Republic of Korea
2
Medina-Arco JG, Magán-Carrión R, Rodríguez-Gómez RA, García-Teodoro P. Methodology for the Detection of Contaminated Training Datasets for Machine Learning-Based Network Intrusion-Detection Systems. Sensors (Basel, Switzerland) 2024; 24:479. [PMID: 38257574 PMCID: PMC10819357 DOI: 10.3390/s24020479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/24/2024]
Abstract
With the significant increase in cyber-attacks and attempts to gain unauthorised access to systems and information, Network Intrusion-Detection Systems (NIDSs) have become essential detection tools. Anomaly-based systems use machine learning techniques to distinguish between normal and anomalous traffic. They do this by using training datasets that have been previously gathered and labelled, allowing them to learn to detect anomalies in future data. However, such datasets can be accidentally or deliberately contaminated, compromising the performance of NIDSs. This was the case with the UGR'16 dataset, in which, during the labelling process, botnet-type attacks were not identified in the subset intended for training. This paper addresses the mislabelling problem of real network traffic datasets by introducing a novel methodology that (i) allows analysing the quality of a network traffic dataset by identifying possible hidden or unidentified anomalies and (ii) selects the ideal subset of data to optimise the performance of the anomaly detection model even in the presence of hidden attacks erroneously labelled as normal network traffic. To this end, a two-step process that makes incremental use of the training dataset is proposed. Experiments conducted on the contaminated UGR'16 dataset in conjunction with the state-of-the-art NIDS, Kitsune, demonstrate the feasibility of the approach for revealing observations of hidden botnet-based attacks in this dataset.
Affiliation(s)
- Joaquín Gaspar Medina-Arco
- Network Engineering & Security Group (NESG), University of Granada, 18012 Granada, Spain
3
Hossain MA, Islam MS. A novel hybrid feature selection and ensemble-based machine learning approach for botnet detection. Sci Rep 2023; 13:21207. [PMID: 38040793 PMCID: PMC10692109 DOI: 10.1038/s41598-023-48230-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023] Open
Abstract
In the age of sophisticated cyber threats, botnet detection remains a crucial yet complex security challenge. Existing detection systems are continually outmaneuvered by the relentless advancement of botnet strategies, necessitating a more dynamic and proactive approach. Our research introduces a ground-breaking solution to the persistent botnet problem through a strategic amalgamation of Hybrid Feature Selection methods (Categorical Analysis, Mutual Information, and Principal Component Analysis) and a robust ensemble of machine learning techniques. We uniquely combine these feature selection tools to refine the input space, enhancing the detection capabilities of the ensemble learners. Extra Trees, as the ensemble technique of choice, exhibits exemplary performance, culminating in a near-perfect 99.99% accuracy rate in botnet classification across varied datasets. Our model not only surpasses previous benchmarks but also demonstrates exceptional adaptability to new botnet phenomena, ensuring persistent accuracy in a landscape of evolving threats. Detailed comparative analyses manifest our model's superiority, consistently achieving over 99% True Positive Rates and an unprecedented False Positive Rate close to 0.00%, thereby setting a new precedent for reliability in botnet detection. This research signifies a transformative step in cybersecurity, offering unprecedented precision and resilience against botnet infiltrations, and providing an indispensable blueprint for the development of next-generation security frameworks.
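To make the mutual-information part of such a feature selection step concrete, here is a minimal sketch that ranks discrete features by their mutual information with the class label. It covers only that one step; the categorical analysis, the PCA stage, and the Extra Trees classifier mentioned in the abstract are not shown. The helper names, feature names, and toy data are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_features(columns, labels):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 1 = botnet flow, 0 = benign flow (made-up values).
labels = [1, 1, 0, 0]
columns = {
    "port_parity": [0, 1, 0, 1],             # independent of the label
    "proto": ["irc", "irc", "http", "http"],  # fully determines the label
}
ranking = rank_features(columns, labels)
```

In this toy case `proto` scores one bit and `port_parity` scores zero, so a selection step keeping the top-ranked features would retain `proto`. Continuous features would need discretisation or a different estimator first.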
Affiliation(s)
- Md Alamgir Hossain
- Institute of Information and Communication Technology (IICT), Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh
- Md Saiful Islam
- Institute of Information and Communication Technology (IICT), Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh
4
Alam S, Alam Y, Cui S, Akujuobi C. Data-Driven Network Analysis for Anomaly Traffic Detection. Sensors (Basel, Switzerland) 2023; 23:8174. [PMID: 37837004 PMCID: PMC10574999 DOI: 10.3390/s23198174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/06/2023] [Accepted: 09/15/2023] [Indexed: 10/15/2023]
Abstract
Cybersecurity is a critical issue in today's internet world. Classical security systems, such as firewalls based on signature detection, cannot detect today's sophisticated zero-day attacks. Machine learning (ML) based solutions are more attractive because of their ability to distinguish anomalous traffic from benign traffic, but to develop an ML-based anomaly detection system, we need meaningful or realistic network datasets to train the detection engine. There are many public network datasets for ML applications. Still, they have limitations, such as the data creation process and the lack of diverse attack scenarios or background traffic. To create a good detection engine, we need a realistic dataset with various attack scenarios and various types of background traffic, such as HTTPS, streaming, and SMTP traffic. In this work, we have developed realistic network datasets that cover various attack scenarios and diverse background (benign) traffic. Furthermore, considering the importance of distributed denial of service (DDoS) attacks, we have used the created datasets to compare the anomaly detection performance of some classical supervised algorithms with that of our previously developed unsupervised ML algorithm, which is based on a convolutional neural network (CNN) and pseudo auto-encoder (AE) architecture. The results show that the performance of the CNN-Pseudo-AE is comparable to that of many classical supervised algorithms. Hence, the CNN-Pseudo-AE algorithm is promising for practical deployment.
Affiliation(s)
- Shumon Alam
- Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA
- Yasin Alam
- Department of Physics, University of Texas, Austin, TX 78712, USA
- Suxia Cui
- Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA
- Cajetan Akujuobi
- Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA
5
Djenna A, Barka E, Benchikh A, Khadir K. Unmasking Cybercrime with Artificial-Intelligence-Driven Cybersecurity Analytics. Sensors (Basel, Switzerland) 2023; 23:6302. [PMID: 37514596 PMCID: PMC10383531 DOI: 10.3390/s23146302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/13/2023] [Accepted: 06/15/2023] [Indexed: 07/30/2023]
Abstract
Cybercriminals are becoming increasingly intelligent and aggressive, making them more adept at covering their tracks, and the global epidemic of cybercrime necessitates significant efforts to enhance cybersecurity in a realistic way. The COVID-19 pandemic has accelerated the cybercrime threat landscape. Cybercrime has a significant impact on the gross domestic product (GDP) of every targeted country. It encompasses a broad spectrum of offenses committed online, including hacking; sensitive information theft; phishing; online fraud; modern malware distribution; cyberbullying; cyber espionage; and notably, cyberattacks orchestrated by botnets. This study provides a new collaborative deep learning approach based on unsupervised long short-term memory (LSTM) and supervised convolutional neural network (CNN) models for the early identification and detection of botnet attacks. The proposed work is evaluated using the CTU-13 and IoT-23 datasets. The experimental results demonstrate that the proposed method achieves superior performance, obtaining a very satisfactory success rate (over 98.7%) and a false positive rate of 0.04%. The study facilitates and improves the understanding of cyber threat intelligence, identifies emerging forms of botnet attacks, and enhances forensic investigation procedures.
Affiliation(s)
- Amir Djenna
- College of New Technologies of Information and Communication, University of Constantine 2, Constantine 25000, Algeria
- Ezedin Barka
- College of Information Technology, United Arab Emirates University, Al Ain P.O. Box 17555, United Arab Emirates
- Achouak Benchikh
- College of New Technologies of Information and Communication, University of Constantine 2, Constantine 25000, Algeria
- Karima Khadir
- College of New Technologies of Information and Communication, University of Constantine 2, Constantine 25000, Algeria
6
Natkaniec M, Bednarz M. Wireless Local Area Networks Threat Detection Using 1D-CNN. Sensors (Basel, Switzerland) 2023; 23:5507. [PMID: 37420675 DOI: 10.3390/s23125507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 06/01/2023] [Accepted: 06/09/2023] [Indexed: 07/09/2023]
Abstract
Wireless Local Area Networks (WLANs) have revolutionized modern communication by providing a user-friendly and cost-efficient solution for Internet access and network resources. However, the increasing popularity of WLANs has also led to a rise in security threats, including jamming, flooding attacks, unfair radio channel access, user disconnection from access points, and injection attacks, among others. In this paper, we propose a machine learning algorithm to detect Layer 2 threats in WLANs through network traffic analysis. Our approach uses a deep neural network to identify malicious activity patterns. We detail the dataset used, including data preparation steps, such as preprocessing and division. We demonstrate the effectiveness of our solution through a series of experiments and show that it outperforms other methods in terms of precision. The proposed algorithm can be successfully applied in Wireless Intrusion Detection Systems (WIDS) to enhance the security of WLANs and protect against potential attacks.
Affiliation(s)
- Marek Natkaniec
- Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland
- Marcin Bednarz
- Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland
7
Real-time botnet detection on large network bandwidths using machine learning. Sci Rep 2023; 13:4282. [PMID: 36922641 PMCID: PMC10017669 DOI: 10.1038/s41598-023-31260-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 03/08/2023] [Indexed: 03/18/2023] Open
Abstract
Botnets are one of the most harmful cyberthreats: they can perform many types of cyberattacks and cause losses in the billions to the global economy. Nowadays, vast amounts of network traffic are generated every second, hence manual analysis is impossible. To be effective, automatic botnet detection should be done as fast as possible, but carrying this out is difficult on large bandwidths. To handle this problem, we propose an approach that is capable of carrying out an ultra-fast network analysis (i.e. on windows of one second), without a significant loss in the F1-score. We compared our model with three other proposals from the literature, and achieved the best performance: an F1 score of 0.926 with a processing time of 0.007 ms per sample. We also assessed the robustness of our model on saturated networks and on large bandwidths. In particular, our model is capable of working on networks saturated to the point of 10% packet loss, and we estimated the number of CPU cores needed to analyze traffic on three bandwidth sizes. Our results suggest that using commercial-grade cores of 2.4 GHz, our approach would only need four cores for bandwidths of 100 Mbps and 1 Gbps, and 19 cores on 10 Gbps networks.
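The relationship between per-sample processing time and the number of cores needed can be sketched with simple arithmetic. The sketch below assumes that samples are processed independently and that work divides evenly across cores; the function name and the sample rates passed to it are hypothetical, and the abstract does not state how many samples per second each bandwidth produces, so the outputs are not meant to reproduce the paper's core counts.

```python
import math
from fractions import Fraction


def cores_needed(samples_per_second, per_sample_ms):
    """Lower bound on cores required to keep up with the incoming sample rate.

    Assumes perfectly parallel, independent samples and ignores any
    capture, aggregation, or scheduling overhead.
    """
    # Exact rational arithmetic avoids float rounding at the ceiling boundary.
    per_sample_s = Fraction(str(per_sample_ms)) / 1000
    return math.ceil(samples_per_second * per_sample_s)


# 0.007 ms per sample is the figure reported in the abstract.
per_core_rate = 1 / (0.007 / 1000)        # about 142,857 samples/s on one core
light_load = cores_needed(100, 0.007)     # hypothetical low sample rate
heavy_load = cores_needed(500_000, 0.007)  # hypothetical high sample rate
```

With these made-up loads, the low rate fits on a single core, while 500,000 samples per second corresponds to 3.5 core-seconds of work per second and therefore rounds up to four cores.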
8
Adv-Bot: Realistic Adversarial Botnet Attacks against Network Intrusion Detection Systems. Comput Secur 2023. [DOI: 10.1016/j.cose.2023.103176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
9
Wu B, Zou F, Zhang C, Yu T, Li Y. Multi-field relation mining for malicious HTTP traffic detection based on attention and cross network. Journal of Information Security and Applications 2023. [DOI: 10.1016/j.jisa.2022.103411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
10
Pinto A, Herrera LC, Donoso Y, Gutierrez JA. Survey on Intrusion Detection Systems Based on Machine Learning Techniques for the Protection of Critical Infrastructure. Sensors (Basel, Switzerland) 2023; 23:2415. [PMID: 36904618 PMCID: PMC10007329 DOI: 10.3390/s23052415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 02/16/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract
Industrial control systems (ICSs), supervisory control and data acquisition (SCADA) systems, and distributed control systems (DCSs) are fundamental components of critical infrastructure (CI). CI supports the operation of transportation and health systems, electric and thermal plants, and water treatment facilities, among others. These infrastructures are no longer isolated, and their connection to fourth industrial revolution technologies has expanded the attack surface. Thus, their protection has become a priority for national security. Cyber-attacks have become more sophisticated and criminals are able to bypass conventional security systems; therefore, attack detection has become a challenging area. Defensive technologies such as intrusion detection systems (IDSs) are a fundamental part of security systems to protect CI. IDSs have incorporated machine learning (ML) techniques that can deal with broader kinds of threats. Nevertheless, the detection of zero-day attacks and the availability of technological resources to implement proposed solutions in the real world remain concerns for CI operators. This survey aims to provide a compilation of the state of the art of IDSs that have used ML algorithms to protect CI. It also analyzes the security datasets used to train ML models. Finally, it presents some of the most relevant pieces of research on these topics that have been developed in the last five years.
Affiliation(s)
- Andrea Pinto
- Systems and Computer Engineering Department, School of Engineering, University of the Andes, Bogotá 111711, Colombia
- Yezid Donoso
- Systems and Computer Engineering Department, School of Engineering, University of the Andes, Bogotá 111711, Colombia
- Jairo A. Gutierrez
- Networking and Security Research Centre, Department of Computer Science and Software Engineering, School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
11
D’hooge L, Verkerken M, Wauters T, De Turck F, Volckaert B. Investigating Generalized Performance of Data-Constrained Supervised Machine Learning Models on Novel, Related Samples in Intrusion Detection. Sensors (Basel, Switzerland) 2023; 23:1846. [PMID: 36850444 PMCID: PMC9960990 DOI: 10.3390/s23041846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Recently proposed methods in intrusion detection are iterating on machine learning methods as a potential solution. These novel methods are validated on one or more datasets from a sparse collection of academic intrusion detection datasets. Their recognition as improvements to the state-of-the-art is largely dependent on whether they can demonstrate a reliable increase in classification metrics compared to similar works validated on the same datasets. Whether these increases are meaningful outside of the training/testing datasets is rarely asked and never investigated. This work aims to demonstrate that strong general performance does not typically follow from strong classification on the current intrusion detection datasets. Binary classification models from a range of algorithmic families are trained on the attack classes of CSE-CIC-IDS2018, a state-of-the-art intrusion detection dataset. After establishing baselines for each class at various points of data access, the same trained models are tasked with classifying samples from the corresponding attack classes in CIC-IDS2017, CIC-DoS2017 and CIC-DDoS2019. Contrary to what the baseline results would suggest, the models have rarely learned a generally applicable representation of their attack class. Stability and predictability of generalized model performance are central issues for all methods on all attack classes. Focusing only on the three best-in-class models in terms of interdataset generalization reveals that for network-centric attack classes (brute force, denial of service and distributed denial of service), general representations can be learned with flat losses in classification performance (precision and recall) below 5%. Other attack classes vary in generalized performance from stark losses in recall (-35%) with intact precision (98+%) for botnets to total degradation of precision and moderate recall loss for Web attack and infiltration models.
The core conclusion of this article is a warning to researchers in the field. Expecting results of proposed methods on the test sets of state-of-the-art intrusion detection datasets to translate to generalized performance is likely a serious overestimation. Four proposals to reduce this overestimation are set out as future work directions.
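The kind of comparison described above, a model scored on its own test set and then on a related dataset, can be expressed as a change in precision and recall. The sketch below is a minimal illustration under assumed inputs: the function names are hypothetical and the confusion counts are made up to resemble the botnet case (intact precision, large recall loss); they are not figures from the article.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def generalization_gap(baseline, transfer):
    """Change in precision and recall, in percentage points.

    baseline and transfer are (tp, fp, fn) tuples for the same trained model,
    evaluated on its own test set and on a different dataset respectively.
    """
    base_p, base_r = precision_recall(*baseline)
    new_p, new_r = precision_recall(*transfer)
    return (new_p - base_p) * 100, (new_r - base_r) * 100


# Made-up counts: strong baseline, then many missed attacks on the new dataset.
baseline_counts = (990, 10, 10)
transfer_counts = (640, 10, 360)
precision_gap, recall_gap = generalization_gap(baseline_counts, transfer_counts)
```

Here precision barely moves while recall drops by about 35 percentage points, which is the pattern a single in-dataset test score would hide.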
12
Xing Y, Shu H, Kang F. PeerRemove: An Adaptive Node Removal Strategy for P2P Botnet Based on Deep Reinforcement Learning. Comput Secur 2023. [DOI: 10.1016/j.cose.2023.103129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
13
Levshun D, Kotenko I. A survey on artificial intelligence techniques for security event correlation: models, challenges, and opportunities. Artif Intell Rev 2023. [DOI: 10.1007/s10462-022-10381-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
14
Mvula PK, Branco P, Jourdan GV, Viktor HL. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning. Discover Data 2023; 1:4. [PMID: 37038388 PMCID: PMC10079755 DOI: 10.1007/s44248-023-00003-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2023] [Accepted: 03/21/2023] [Indexed: 04/12/2023]
Abstract
In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully designed to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and sets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.
Affiliation(s)
- Paul K. Mvula
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
- Paula Branco
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
- Guy-Vincent Jourdan
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
- Herna L. Viktor
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
15
Botnet dataset with simultaneous attack activity. Data Brief 2022; 45:108628. [DOI: 10.1016/j.dib.2022.108628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 09/11/2022] [Accepted: 09/19/2022] [Indexed: 11/21/2022] Open
16
A novel hierarchical attention-based triplet network with unsupervised domain adaptation for network intrusion detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-04076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
17
Almashhadani AO, Carlin D, Kaiiali M, Sezer S. MFMCNS: A Multi-Feature and Multi-Classifier Network-based System for Ransomworm Detection. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102860] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
18
Niu Z, Xue J, Qu D, Wang Y, Zheng J, Zhu H. A novel approach based on adaptive online analysis of encrypted traffic for identifying Malware in IIoT. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.04.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
19
20
Chernikova A, Oprea A. FENCE: Feasible Evasion Attacks on Neural Networks in Constrained Environments. ACM Transactions on Privacy and Security 2022. [DOI: 10.1145/3544746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
As advances in Deep Neural Networks (DNNs) demonstrate unprecedented levels of performance in many critical applications, their vulnerability to attacks is still an open question. We consider evasion attacks at testing time against Deep Learning in constrained environments, in which dependencies between features need to be satisfied. These situations may arise naturally in tabular data or may be the result of feature engineering in specific application domains, such as threat detection in cyber security. We propose a general iterative gradient-based framework called FENCE for crafting evasion attacks that take into consideration the specifics of constrained domains and application requirements. We apply it against Feed-Forward Neural Networks trained for two cyber security applications: network traffic botnet classification and malicious domain classification, to generate feasible adversarial examples. We extensively evaluate the success rate and performance of our attacks, compare their improvement over several baselines, and analyze factors that impact the attack success rate, including the optimization objective and the data imbalance. We show that with minimal effort (e.g., generating 12 additional network connections), an attacker can change the model’s prediction from the Malicious class to Benign and evade the classifier. We show that models trained on datasets with higher imbalance are more vulnerable to our FENCE attacks. Finally, we demonstrate the potential of performing adversarial training in constrained domains to increase the model resilience against these evasion attacks.
21
22
Human-guided auto-labeling for network traffic data: The GELM approach. Neural Netw 2022; 152:510-526. [PMID: 35660547 DOI: 10.1016/j.neunet.2022.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/08/2022] [Accepted: 05/10/2022] [Indexed: 11/22/2022]
Abstract
Data labeling is crucial in various areas, including network security, and a prerequisite for applying statistical-based classification and supervised learning techniques. Therefore, developing labeling methods that ensure good performance is important. We propose a human-guided auto-labeling algorithm involving the self-supervised learning concept, with the purpose of labeling data quickly, accurately, and consistently. It consists of three processes: auto-labeling, validation, and update. In the auto-labeling process, a labeling scheme based on weighted features is proposed, while the generalized extreme learning machine (GELM), which enables fast training, is applied to validate the assigned labels. In the update process, two different approaches to labeling new data are considered in order to investigate labeling speed and accuracy. We experiment to verify the suitability and accuracy of the algorithm for network traffic, applying the algorithm to five traffic datasets, some including distributed denial of service (DDoS), DoS, BruteForce, and PortScan attacks. Numerical results show that the algorithm labels unlabeled datasets quickly, accurately, and consistently, and that the GELM's learning speed enables labeling data in real time. They also show that the performance obtained with auto-labels and with conventional labels is nearly identical on datasets containing only DDoS attacks, which implies the algorithm is quite suitable for such datasets. However, the performance differences between the two kinds of labels are not negligible on datasets that include various attacks. Several possible reasons require further investigation, including the selected features and the reliability of conventional labels. Even with this limitation of the current study, the algorithm will provide a criterion for real-time data labeling, a need that arises in many areas.
23
Ding Q, Li J. AnoGLA: An efficient scheme to improve network anomaly detection. Journal of Information Security and Applications 2022. [DOI: 10.1016/j.jisa.2022.103149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
24
Wang Z, Shao L, Cheng K, Liu Y, Jiang J, Nie Y, Li X, Kuang X. ICDF: Intrusion collaborative detection framework based on confidence. Int J Intell Syst 2022. [DOI: 10.1002/int.22877] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Zhi Wang
- College of Cyber Science, Nankai University, Tianjin, China
- Leshi Shao
- College of Cyber Science, Nankai University, Tianjin, China
- Kai Cheng
- College of Cyber Science, Nankai University, Tianjin, China
- Yuanzhao Liu
- College of Cyber Science, Nankai University, Tianjin, China
- Jianan Jiang
- Institute of Artificial and Intelligence, Guangzhou University, Guangzhou, China
- Yuanping Nie
- National Key Laboratory of Science and Technology on Information System Security, Beijing, China
- Xiang Li
- National Key Laboratory of Science and Technology on Information System Security, Beijing, China
- Xiaohui Kuang
- National Key Laboratory of Science and Technology on Information System Security, Beijing, China
25
|
Tu T, Qin J, Zhang H, Chen M, Xu T, Huang Y. A comprehensive study of Mozi botnet. INT J INTELL SYST 2022. [DOI: 10.1002/int.22866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Teng‐Fei Tu
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Jia‐Wei Qin
  - Cyber Security Department, National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China
- Hua Zhang
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Miao Chen
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Tong Xu
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Yue Huang
  - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
|
26
|
Schwengber BH, Vergutz A, Prates NG, Nogueira M. Learning From Network Data Changes for Unsupervised Botnet Detection. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3109076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
27
|
Priyadarshini R, Barik RK. A deep learning based intelligent framework to mitigate DDoS attack in fog environment. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2019.04.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
28
|
Mills R, Marnerides AK, Broadbent M, Race N. Practical Intrusion Detection of Emerging Threats. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3091517] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
29
|
A Novel Framework for Generating Personalized Network Datasets for NIDS Based on Traffic Aggregation. SENSORS 2022; 22:s22051847. [PMID: 35270994 PMCID: PMC8914796 DOI: 10.3390/s22051847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 01/27/2022] [Accepted: 02/06/2022] [Indexed: 12/02/2022]
Abstract
In this paper, we addressed the problem of dataset scarcity for the task of network intrusion detection. Our main contribution was to develop a framework that provides a complete process for generating network traffic datasets based on the aggregation of real network traces. In addition, we proposed a set of tools for attribute extraction and labeling of traffic sessions. A new dataset with botnet network traffic was generated by the framework to assess our proposed method with machine learning algorithms suitable for unbalanced data. The performance of the classifiers was evaluated in terms of macro-averages of F1-score (0.97) and the Matthews Correlation Coefficient (0.94), showing a good overall performance average.
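For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a macro-averaged F1-score and the Matthews Correlation Coefficient (MCC) can be computed from a binary confusion matrix. The function names and the counts are hypothetical and are not taken from the paper; the sketch only illustrates why these metrics are preferred over plain accuracy on unbalanced data.

```python
import math


def f1_for_class(tp, fp, fn):
    """F1 score of one class from its true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1_and_mcc(tp, fp, fn, tn):
    """Macro-averaged F1 and MCC for a binary confusion matrix (positive class = botnet)."""
    f1_pos = f1_for_class(tp, fp, fn)
    # For the negative class the roles swap: TN are its hits,
    # FN are its false alarms and FP are its misses.
    f1_neg = f1_for_class(tn, fn, fp)
    macro_f1 = (f1_pos + f1_neg) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return macro_f1, mcc


# Hypothetical, heavily unbalanced counts (assumed for illustration only):
# 50 botnet flows among 10,000 flows in total.
macro_f1, mcc = macro_f1_and_mcc(tp=40, fp=5, fn=10, tn=9945)
```

With these assumed counts, accuracy would be 99.85% even though one in five botnet flows is missed; both macro-F1 and MCC come out noticeably lower, which is the behaviour that makes them useful for unbalanced traffic datasets.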
|
30
|
Li Y, Zhu M, Luo X, Yin L, Fu Y. A privacy-preserving botnet detection approach in largescale cooperative IoT environment. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06934-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
31
|
Machine learning for encrypted malicious traffic detection: Approaches, datasets and comparative study. Comput Secur 2022. [DOI: 10.1016/j.cose.2021.102542] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
32
|
Ibitoye O, Shafiq M, Matrawy A. Differentially Private Self-normalizing Neural Networks for Adversarial Robustness in Federated Learning. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102631] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
33
|
Hu X, Gu C, Chen Y, Wei F. CBD: A Deep-Learning-Based Scheme for Encrypted Traffic Classification with a General Pre-Training Method. SENSORS 2021; 21:s21248231. [PMID: 34960324 PMCID: PMC8705865 DOI: 10.3390/s21248231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 12/07/2021] [Accepted: 12/07/2021] [Indexed: 11/29/2022]
Abstract
With the rapid growth of encrypted traffic and its increasing share of all network traffic, encrypted traffic classification has become an increasingly important part of traffic analysis. At present, the classification of encrypted traffic has been studied thoroughly in closed environments, but these classification models often work only with labeled data and are difficult to apply in real environments. To solve these problems, we propose a transferable model called CBD with generalization abilities for encrypted traffic classification in real environments. The overall structure of CBD can be generally described as a combination of a one-dimensional CNN and the Transformer encoder. The model can be pre-trained with unlabeled data to learn the basic characteristics of encrypted traffic data, and then be transferred to other datasets to classify encrypted traffic at the packet level and the flow level. The performance of the proposed model was evaluated on a public dataset. The results showed that the CBD model performed better than the baseline methods, and that the pre-training method can improve the classification ability of the model.
Affiliation(s)
- Xinyi Hu (corresponding author)
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  - Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China
- Chunxiang Gu
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  - Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China
- Yihang Chen
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
- Fushan Wei
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  - Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China
|
34
|
AEGR: a simple approach to gradient reversal in autoencoders for network anomaly detection. Soft comput 2021. [DOI: 10.1007/s00500-021-06110-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
35
|
Abstract
The enormous growth of services and data transmitted over the internet, the bloodstream of modern civilization, has caused a remarkable increase in cyber attack threats. This fact has forced the development of methods of preventing attacks. Among them, an important and constantly growing role is that of machine learning (ML) approaches. Convolutional neural networks (CNN) belong to the hottest ML techniques that have gained popularity, thanks to the rapid growth of computing power available. Thus, it is no wonder that these techniques have started to also be applied in the network traffic classification domain. This has resulted in a constant increase in the number of scientific papers describing various approaches to CNN-based traffic analysis. This paper is a survey of them, prepared with particular emphasis on a crucial but often disregarded aspect of this topic—the data transformation schemes. Their importance is a consequence of the fact that network traffic data and machine learning data have totally different structures. The former is a time series of values—consecutive bytes of the datastream. The latter, in turn, are one-, two- or even three-dimensional data samples of fixed lengths/sizes. In this paper, we introduce a taxonomy of data transformation schemes. Next, we use this categorization to describe various CNN-based analytical approaches found in the literature.
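The structural mismatch described above (a variable-length byte stream versus fixed-size one- or two-dimensional samples) can be illustrated with a minimal sketch. The helper names, the sample length of 16 bytes, and the 4x4 grid are assumptions chosen for illustration; they do not reproduce any particular scheme from the survey's taxonomy.

```python
def to_fixed_length(payload: bytes, length: int) -> list[float]:
    """Truncate or zero-pad a byte string to `length` values scaled into [0, 1]."""
    clipped = payload[:length]
    padded = clipped + bytes(length - len(clipped))  # bytes(n) yields n zero bytes
    return [b / 255 for b in padded]


def to_grid(vector: list[float], side: int) -> list[list[float]]:
    """Reshape a flat vector of side*side values into a square two-dimensional sample."""
    if len(vector) != side * side:
        raise ValueError("vector length must equal side * side")
    return [vector[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical first bytes of a flow (assumed for illustration only).
flow_bytes = bytes([0x16, 0x03, 0x01, 0x02, 0x00])
sample_1d = to_fixed_length(flow_bytes, 16)  # one-dimensional sample of 16 values
sample_2d = to_grid(sample_1d, 4)            # the same data as a 4x4 "image"
```

Truncation and zero-padding are only one possible choice; they discard everything past the cut-off, which is one reason the choice of transformation scheme matters for CNN-based classifiers.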
|
36
|
Al-mashhadi S, Anbar M, Hasbullah I, Alamiedy TA. Hybrid rule-based botnet detection approach using machine learning for analysing DNS traffic. PeerJ Comput Sci 2021; 7:e640. [PMID: 34458571 PMCID: PMC8372004 DOI: 10.7717/peerj-cs.640] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 06/22/2021] [Indexed: 05/27/2023]
Abstract
Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the 'Domain Name System' (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet's DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such anomalies are considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets based on DNS traffic analysis, the problem still exists and remains challenging for several reasons, such as not considering significant features and rules that contribute to the detection of DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet lifecycle to extract significant enriched features. These features are further analysed using two machine learning algorithms. The union of the outputs of the two algorithms forms a novel hybrid rule detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach has a 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
Affiliation(s)
- Saif Al-mashhadi
  - National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia
  - Electrical Engineering, University of Baghdad, Baghdad, Baghdad, Iraq
- Mohammed Anbar
  - National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia
- Iznan Hasbullah
  - National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia
- Taief Alaa Alamiedy
  - National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia
  - ECE Department, Faculty of Engineering, University of Kufa, Kufa, Najaf, Iraq
|
37
|
Attack Categorisation for IoT Applications in Critical Infrastructures, a Survey. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11167228] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
With the ever-advancing expansion of the Internet of Things (IoT) into our everyday lives, the number of attack possibilities increases. Furthermore, with the incorporation of the IoT into Critical Infrastructure (CI) hardware and applications, the protection of not only the systems but the citizens themselves has become paramount. To do so, specialists must be able to gain a foothold in the ongoing cyber attack war-zone. By organising the various attacks against their systems, these specialists can not only gain a quick overview of what they might expect but also gain insight into the specifications of the attacks based on the categorisation method used. This paper presents a glimpse into the area of IoT Critical Infrastructure security as well as an overview and analysis of attack categorisation methodologies in the context of wireless IoT-based Critical Infrastructure applications. We believe this can be a guide to aid further researchers in their choice of suitable categorisation approaches. Indeed, adopting an appropriate categorisation leads to quicker attack detection, identification, and recovery. It is, thus, paramount to have a clear vision of the threat landscape of a specific system.
|
38
|
Joshi C, Ranjan RK, Bharti V. A Fuzzy Logic based feature engineering approach for Botnet detection using ANN. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2021. [DOI: 10.1016/j.jksuci.2021.06.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
39
|
Mihailescu ME, Mihai D, Carabas M, Komisarek M, Pawlicki M, Hołubowicz W, Kozik R. The Proposition and Evaluation of the RoEduNet-SIMARGL2021 Network Intrusion Detection Dataset. SENSORS 2021; 21:s21134319. [PMID: 34202616 PMCID: PMC8272217 DOI: 10.3390/s21134319] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 06/17/2021] [Accepted: 06/18/2021] [Indexed: 11/19/2022]
Abstract
Cybersecurity is an arms race, with both defenders and adversaries attempting to outsmart one another, coming up with new attacks, new ways to defend against those attacks, and again with new ways to circumvent those defences. This situation creates a constant need for novel, realistic cybersecurity datasets. This paper presents the effects of using machine-learning-based intrusion detection methods on network traffic coming from a real-life architecture. The main contribution of this work is a dataset coming from a real-world, academic network. Real-life traffic was collected and, after performing a series of attacks, a dataset was assembled. The dataset contains 44 network features and an unbalanced distribution of classes. In this work, the capability of the dataset for formulating machine-learning-based models was experimentally evaluated. To investigate the stability of the obtained models, cross-validation was performed, and an array of detection metrics was reported. The gathered dataset is part of an effort to bring security against novel cyberthreats and was completed in the SIMARGL project.
Affiliation(s)
- Maria-Elena Mihailescu
  - Department of Computer Science and Engineering, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania
- Darius Mihai
  - Department of Computer Science and Engineering, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania
- Mihai Carabas
  - RoEduNet, Strada Mendeleev 21-25, 010362 Bucharest, Romania
- Mikołaj Komisarek
  - ITTI Sp. z o.o., ul. Rubież 46, 61-612 Poznań, Poland
  - Institute of Telecommunications and Computer Science, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland
- Marek Pawlicki (corresponding author)
  - ITTI Sp. z o.o., ul. Rubież 46, 61-612 Poznań, Poland
- Witold Hołubowicz
  - Institute of Telecommunications and Computer Science, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland
- Rafał Kozik
  - ITTI Sp. z o.o., ul. Rubież 46, 61-612 Poznań, Poland
|
40
|
An effective NIDS framework based on a comprehensive survey of feature optimization and classification techniques. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06093-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
41
|
Venturi A, Apruzzese G, Andreolini M, Colajanni M, Marchetti M. DReLAB - Deep REinforcement Learning Adversarial Botnet: A benchmark dataset for adversarial attacks against botnet Intrusion Detection Systems. Data Brief 2021; 34:106631. [PMID: 33365367 PMCID: PMC7749366 DOI: 10.1016/j.dib.2020.106631] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2020] [Revised: 12/02/2020] [Accepted: 12/03/2020] [Indexed: 11/28/2022] Open
Abstract
We present the first dataset that aims to serve as a benchmark to validate the resilience of botnet detectors against adversarial attacks. This dataset includes realistic adversarial samples that are generated by leveraging two widely used Deep Reinforcement Learning (DRL) techniques. These adversarial samples are shown to evade state-of-the-art detectors based on Machine- and Deep-Learning algorithms. The initial corpus of malicious samples consists of network flows belonging to different botnet families present in three public datasets containing real enterprise network traffic. We use these datasets to devise detectors capable of achieving state-of-the-art performance. We then train two DRL agents, based on Double Deep Q-Network and Deep Sarsa, to generate realistic adversarial samples: the goal is to achieve misclassifications by performing small modifications to the initial malicious samples. These alterations involve the features that can be more realistically altered by an expert attacker, and do not compromise the underlying malicious logic of the original samples. Our dataset represents an important contribution to the cybersecurity research community as it is the first to include thousands of automatically generated adversarial samples that are able to thwart state-of-the-art classifiers with a high evasion rate. The adversarial samples are grouped by malware variant and provided in CSV format. Researchers can validate their defensive proposals by testing their detectors against the adversarial samples of the proposed dataset. Moreover, the analysis of these samples can pave the way to a deeper comprehension of adversarial attacks and to some degree of explainability of machine learning defensive algorithms. They can also support the definition of novel effective defensive techniques.
Affiliation(s)
- Andrea Venturi
  - Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Italy
- Giovanni Apruzzese
  - Hilti Chair of Data and Application Security, University of Liechtenstein, Vaduz, Liechtenstein
- Mauro Andreolini
  - Department of Physics, Computer Science and Mathematics, University of Modena and Reggio Emilia, Italy
- Michele Colajanni
  - Department of Informatics, Science and Engineering, University of Bologna, Italy
- Mirco Marchetti
  - Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Italy
|
42
|
|
43
|
Cordero CG, Vasilomanolakis E, Wainakh A, Mühlhäuser M, Nadjm-Tehrani S. On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection. ACM TRANSACTIONS ON PRIVACY AND SECURITY 2021. [DOI: 10.1145/3424155] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Most research in the field of network intrusion detection heavily relies on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with similar characteristics as the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics to enable an accurate replication of scientific results. Intrusion Detection Dataset Toolkit (ID2T) facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks created by ID2T blend with the background traffic by mimicking the background traffic’s properties.
This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T’s functionality provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community to expand its arsenal of attacks and capabilities.
|
44
|
Kenyon A, Deka L, Elizondo D. Are public intrusion datasets fit for purpose characterising the state of the art in intrusion event datasets. Comput Secur 2020. [DOI: 10.1016/j.cose.2020.102022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
45
|
Apruzzese G, Andreolini M, Marchetti M, Venturi A, Colajanni M. Deep Reinforcement Adversarial Learning Against Botnet Evasion Attacks. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2020. [DOI: 10.1109/tnsm.2020.3031843] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
46
|
Zhao J, Liu X, Yan Q, Li B, Shao M, Peng H. Multi-attributed heterogeneous graph convolutional network for bot detection. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.113] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
47
|
Schlör D, Ring M, Hotho A. iNALU: Improved Neural Arithmetic Logic Unit. Front Artif Intell 2020; 3:71. [PMID: 33733188 PMCID: PMC7861275 DOI: 10.3389/frai.2020.00071] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Accepted: 08/05/2020] [Indexed: 11/20/2022] Open
Abstract
Neural networks have to capture mathematical relationships in order to learn various tasks. They approximate these relations implicitly and therefore often do not generalize well. The recently proposed Neural Arithmetic Logic Unit (NALU) is a novel neural architecture which is able to explicitly represent the mathematical relationships by the units of the network to learn operations such as summation, subtraction or multiplication. Although NALUs have been shown to perform well on various downstream tasks, an in-depth analysis reveals practical shortcomings by design, such as the inability to multiply or divide negative input values and training stability issues for deeper networks. We address these issues and propose an improved model architecture. We evaluate our model empirically in various settings, from learning basic arithmetic operations to more complex functions. Our experiments indicate that our model solves the stability issues and outperforms the original NALU model in terms of arithmetic precision and convergence.
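The sign limitation mentioned above follows from how the original NALU computes products: it works in log space on the absolute values of the inputs. The sketch below shows only that multiplicative path with fixed, hand-picked weights; the function name and the epsilon value are assumptions, and the learned gating and weight parameterisation of the full unit are deliberately omitted.

```python
import math


def nalu_multiplicative_path(inputs, weights, eps=1e-10):
    """Log-space product exp(sum_i w_i * log(|x_i| + eps)), as in the original NALU design.

    Taking |x_i| before the logarithm discards the sign of every input,
    so the result can never be negative.
    """
    return math.exp(sum(w * math.log(abs(x) + eps) for x, w in zip(inputs, weights)))


# Weights (1, 1) select multiplication, weights (1, -1) select division.
product = nalu_multiplicative_path([-2.0, 3.0], [1.0, 1.0])    # close to 6, not -6
quotient = nalu_multiplicative_path([-2.0, 3.0], [1.0, -1.0])  # close to 2/3, not -2/3
```

Because the sign is lost before any weight is applied, no choice of weights can recover it, which is why this limitation is a matter of design rather than of training.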
Affiliation(s)
- Daniel Schlör
  - Data Science Chair, Institute of Computer Science, University of Wuerzburg, Würzburg, Germany
- Markus Ring
  - Department of Electrical Engineering and Computer Science, University of Applied Sciences and Arts Coburg, Coburg, Germany
- Andreas Hotho
  - Data Science Chair, Institute of Computer Science, University of Wuerzburg, Würzburg, Germany
|
48
|
The effects of feature selection on the classification of encrypted botnet. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES 2020. [DOI: 10.1007/s11416-020-00367-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
49
|
Iannacone MD, Bridges RA. Quantifiable & comparable evaluations of cyber defensive capabilities: A survey & novel, unified approach. Comput Secur 2020. [DOI: 10.1016/j.cose.2020.101907] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
50
|
Blaise A, Bouet M, Conan V, Secci S. Botnet Fingerprinting: A Frequency Distributions Scheme for Lightweight Bot Detection. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2020. [DOI: 10.1109/tnsm.2020.2996502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|