Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Prof. Shwethashree G C, Tanusha R, Harshitha P B, Dhyan Ganesh S, NS Dheeraj Gowda
DOI Link: https://doi.org/10.22214/ijraset.2024.60590
In the dynamic realm of networking, Software Defined Networking (SDN) has emerged as a transformative force, offering centralized control and programmability that let network administrators manage and secure network infrastructure efficiently. However, given the ever-present threat of distributed denial-of-service (DDoS) attacks, robust detection mechanisms are imperative. This research proposes a machine learning-based approach to enhance DDoS attack detection and classification within SDN environments. Using the SelectKBest algorithm for feature selection and evaluating classifiers such as Decision Tree, K-means++, and XGBoost, with Random Forest found to be the most effective, the project aims to improve detection accuracy and efficiency. Through comprehensive experimentation and comparative analysis, the efficacy of the proposed methodology in identifying DDoS threats is demonstrated, contributing to ongoing efforts to fortify cybersecurity defenses against sophisticated adversarial tactics.
I. INTRODUCTION
Software Defined Networking (SDN) is a network architecture that decouples the forwarding plane from the control plane [4]. This separation of the control plane and the data plane distinguishes SDN from traditional networks and improves network management, scalability, and flexibility. Because the network is programmable, its behavior can also be modified dynamically to increase network security. In a traditional network architecture, network devices such as routers and switches are managed and controlled by the network administrator according to each device vendor's specifications. The Open Networking Foundation (ONF) develops the SDN architecture, in which network administrators provision and manage network services from a centralized SDN controller. The SDN architecture consists of three layers: the application layer, the control layer, and the infrastructure (data) layer.
With separate control and data planes, SDN can implement a centralized control function: the switch receives all packets from the hosts, and the controller determines the packet delivery path [4]. The controller is the heart of the SDN, since most SDN functions depend on it. Because control is centralized, the controller becomes a target for attack. A DDoS attack can exploit this by attacking the server through a registered host. SDN security must therefore be considered, especially for IoT, server, and cloud deployments. A DDoS attack can be launched in a variety of ways, and its main effect is to reduce the availability of services, which can lead to financial losses and a variety of other problems.
The project at hand is dedicated to implementing a machine learning-based DDoS attack detection and classification system within an SDN environment. Key objectives include refining preprocessing techniques, training machine learning models to distinguish between normal and malicious network activities, and implementing a user-friendly interface for network administrators to monitor and respond to potential DDoS attacks in SDN environments. Through this research endeavor, the project aims to advance DDoS detection techniques in SDN environments, ultimately fortifying network security and ensuring the uninterrupted delivery of network services.
II. RELATED WORK
Previous DDoS attack detection methods fall mainly into two categories:
A. Entropy-based DDoS Attack Detection
Kia [1] developed an entropy-based algorithm capable of detecting DDoS attacks and identifying attacking paths. By analyzing entropy variations in destination IP addresses and flow initiation rates, the method quickly detects attack traffic and triggers mitigation processes to protect the network. Another study [2] introduced StateSec, a stateful SDN approach that uses in-switch processing to detect and mitigate DDoS attacks. StateSec monitors packet matching features and employs an entropy-based algorithm for efficient and precise detection, outperforming traditional methods such as sFlow.
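The idea behind entropy-based detection can be illustrated with a small sketch. It is not the algorithm of [1] or [2]; the traffic windows, the helper name `shannon_entropy`, and the threshold value are all assumptions chosen for illustration. When traffic concentrates on a single destination, the entropy of the destination IP distribution drops sharply.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical windows of destination IPs observed by the controller.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"] * 25
attack_window = ["10.0.0.1"] * 97 + ["10.0.0.2", "10.0.0.3", "10.0.0.4"]

THRESHOLD = 1.0  # illustrative value only; a real deployment would tune this

for name, window in [("normal", normal_window), ("attack", attack_window)]:
    h = shannon_entropy(window)
    flag = "suspicious" if h < THRESHOLD else "ok"
    print(f"{name}: entropy={h:.3f} bits -> {flag}")
```

Here the evenly spread window has an entropy of 2 bits (four equally likely destinations), while the window dominated by one victim address falls well below the threshold.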
B. Machine Learning-based DDoS Attack Detection
The authors of [3] enhanced the Support Vector Machine (SVM) algorithm with Advanced Support Vector Machine (ASVM) techniques. This multi-class classification method categorizes DDoS attacks into three classes using volumetric and asymmetric features. Experimental results demonstrate a detection accuracy of approximately 97% with reduced training and testing times.
The paper [4], authored by Yudha Purwanto and colleagues, proposes a method for detecting Distributed Denial-of-Service (DDoS) attacks in SDN using machine learning with an ensemble algorithm. Recognizing the vulnerability of SDN to DDoS attacks due to its centralized control, the study focuses on enhancing security through high detection accuracy and efficiency. The research consists of two main stages: clustering and classification, followed by detection validation. By combining K-means++ and Random Forest in an ensemble, the study achieves significant improvements in the accuracy and efficiency of DDoS detection within SDN environments. Experimental validation on the InSDN dataset using the Mininet emulator demonstrates the effectiveness of the approach, with 99% accuracy, precision, recall, and F1-measure, along with low processing time.
Another research project [5], conducted by Dr. J. Thangakumar, Dr. M. Sambath, and S. Santhosh, focuses on enhancing the detection of Distributed Denial-of-Service (DDoS) attacks using machine learning techniques. The study uses the CICDDoS2019 dataset and several machine learning algorithms, including Random Forest, XGBoost, and a modified version of XGBoost, to develop a robust detection model. By comparing the performance of these algorithms, the research aims to identify the most effective approach for accurately detecting DDoS attacks. Through extensive experimentation and analysis, the study shows that the modified XGBoost classifier achieves the highest accuracy, 97%.
Aye Thandar Kyaw and colleagues [8] proposed a DDoS attack detection system for SDN networks using machine learning algorithms, specifically comparing the performance of linear and polynomial Support Vector Machine (SVM) classifiers. The proposed system performs flow data collection, feature extraction, and attack classification, using the polynomial SVM algorithm to differentiate between normal and attack traffic. Through experimental evaluation, the paper demonstrates that the polynomial SVM classifier achieves higher accuracy and a lower false alarm rate than the linear SVM classifier, providing an effective approach for mitigating DDoS attacks in SDN networks.
Nisharani Meti [9] compared three machine learning classifiers, Naïve Bayes, Support Vector Machine (SVM), and Neural Network (NN), for distinguishing legitimate from illegitimate connections. The paper implemented the proposed mechanism using Mininet and the Ryu SDN controller on different topologies. According to the experimental results, SVM was a better classifier than the other two machine learning algorithms.
The results of these recent studies, however, have limitations. First, because specific classification models are statically chosen, they have a poor detection rate. Second, some of the earlier methods incurred high computational costs during the model-training phase. Third, because handling large amounts of data can be difficult, many earlier methods relied on relatively small datasets.
III. ALGORITHMS USED
A. Stratified K Fold
Stratified K-Fold is a data splitting method for cross-validation. It shuffles the data and splits it into folds (groups) for training and testing a model, ensuring that each fold maintains the same class distribution as the entire dataset.
Its benefits are that it provides a more reliable estimate of model performance on imbalanced datasets and reduces the bias towards the majority class. It helps identify models that might struggle with the minority class, and it gives a more robust estimate of model performance than a single train-test split.
Usually K-Fold Cross-Validation splits the data into k groups (folds) of (almost) equal size. In each iteration (fold), it uses one fold for testing (validation set) and the remaining k-1 folds for training. Finally it evaluates the model's performance on the testing set and repeats the process for all k folds.
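To make the stratification step concrete, the following is a hand-rolled sketch in plain Python. It is not scikit-learn's implementation; the function name `stratified_k_fold` and the 80/20 label mix are assumptions made for illustration. Each class is shuffled separately and dealt round-robin into the folds, so every fold keeps the overall class ratio.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Hypothetical imbalanced labels: 80 normal flows (0) and 20 attack flows (1).
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    mix = Counter(labels[i] for i in test_idx)
    print(f"fold {fold_no}: train={len(train_idx)} test={len(test_idx)} test mix={dict(mix)}")
```

With these labels, every test fold contains 16 normal and 4 attack samples, matching the 80:20 ratio of the full set.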
B. Random Forest Classifier
Random Forest is a supervised machine learning algorithm built on decision trees that can be applied to classification, regression, and other tasks. Random forests are especially useful for handling large and complicated datasets, coping with high-dimensional feature spaces, and offering insight into the importance of individual features. The algorithm is widely used in many domains because it reduces overfitting while maintaining high predictive accuracy. The Random Forest classifier builds a set of decision trees (DT), each trained on a randomly chosen portion of the training set. To determine the final prediction, it tallies the votes from each decision tree.
There are two stages to Random Forest (RF): the creation of the random forest, and prediction with the resulting random forest classifier. The pseudocode for Random Forest creation is displayed first.
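The two stages can be sketched in Python as follows. This is an illustrative approximation rather than the authors' pseudocode: it assumes scikit-learn's `DecisionTreeClassifier` as the base learner, uses synthetic data with binary labels 0 and 1, and the helper names `build_forest` and `forest_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def build_forest(X, y, n_trees=25, seed=0):
    """Stage 1: fit each tree on a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",  # consider a random feature subset at each split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        forest.append(tree.fit(X[rows], y[rows]))
    return forest


def forest_predict(forest, X):
    """Stage 2: collect every tree's vote and return the majority class."""
    votes = np.stack([tree.predict(X) for tree in forest]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


# Synthetic stand-in data (binary labels 0/1); not the paper's dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = build_forest(X, y)
predictions = forest_predict(forest, X)
print("training accuracy:", (predictions == y).mean())
```

In practice, `sklearn.ensemble.RandomForestClassifier` wraps both stages; the manual version is shown only to expose the bootstrap sampling and the vote tally.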
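When Stratified K-Fold cross-validation is combined with Random Forest, the following steps are typically followed: the data is split into k stratified folds; in each iteration a Random Forest is trained on k-1 folds and evaluated on the remaining fold; and the scores from all k folds are averaged.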
Combining Stratified K-Fold cross-validation with Random Forest helps in obtaining more reliable estimates of the model's performance by reducing the impact of variability due to different train-test splits. It also allows for better utilization of the available data by using each data point for both training and validation across different folds. This approach is particularly beneficial when dealing with datasets with limited samples or imbalanced class distributions.
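A minimal sketch of this combination is given below, assuming scikit-learn. The synthetic imbalanced dataset, the choice of five folds, the F1 scoring metric, and the hyperparameters are illustrative assumptions, not the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data (about 90% class 0); not the paper's dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)

# Shuffle once, then split into 5 folds that each preserve the class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# One F1 score per fold: train on 4 folds, validate on the held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print("per-fold F1:", scores.round(3))
print("mean F1: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

Reporting the spread across folds alongside the mean gives a sense of how sensitive the model is to the particular train-test split.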
C. SelectKBest Feature Selection
One of the most popular techniques for selecting features is SelectKBest. It is a kind of filter-based feature selection technique used in machine learning. Filter-based feature selection techniques select features without relying on a particular machine learning algorithm. Rather, the features are ranked and scored using statistical methods.
SelectKBest scores and ranks the features according to how they relate to the output variable using statistical tests such as the mutual information score, ANOVA F-test, and chi-squared test. The K features that have the best scores are then chosen to be a part of the final feature subset.
Working with large datasets requires the ability to quickly reduce the feature set to a manageable size, which is where SelectKBest excels. SelectKBest takes two parameters: k, the number of features to keep, and the score function, which is used to assess the importance of each feature.
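The usage pattern can be sketched with scikit-learn as follows. The synthetic data, the choice of k = 10, and the min-max rescaling step are assumptions for illustration; the rescaling is included because scikit-learn's `chi2` score function only accepts non-negative feature values.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data with 20 features; not the paper's dataset.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# chi2 requires non-negative inputs, so rescale every feature to [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)

selector = SelectKBest(score_func=chi2, k=10)
X_selected = selector.fit_transform(X_scaled, y)

kept = selector.get_support(indices=True)  # column indices of the chosen features
print("kept feature indices:", kept)
print("shape before/after:", X_scaled.shape, X_selected.shape)
```

Other score functions such as `f_classif` (ANOVA F-test) or `mutual_info_classif` can be passed as `score_func` in the same way.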
D. Chi-square Testing
The chi-square test is a statistical test used to determine whether two categorical variables are significantly associated with one another. Since it is non-parametric, it makes no assumptions about the distribution of the data. The test is based on comparing observed and expected frequencies in a contingency table. For feature selection, the chi-square test examines the relationship between each feature and the target. It assesses whether an association observed between two categorical variables in a sample reflects a true association in the population.
The chi-squared distribution is a member of the continuous probability distribution family. It is defined as the distribution of the sum of the squares of k independent standard normal random variables. The chi-square test statistic is computed as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
χ^2 is the chi-square test statistic.
O is the observed frequency of a category.
E is the expected frequency of a category under the null hypothesis of independence.
The chi-square test is used for categorical features in a dataset. We calculate the chi-square statistic between each feature and the target and select the required number of features with the best chi-square scores. Features that show a significant dependence on the target variable are considered important for prediction and can be chosen for further analysis.
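As a worked example of the statistic, the sketch below computes it for a small contingency table in plain Python. The counts and the helper name `chi_square` are hypothetical; the expected frequency of each cell is taken as (row total × column total) / grand total, which is what independence between the feature and the target would imply.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the feature and the target were independent.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical counts: rows = protocol (TCP, UDP), columns = (normal, attack).
table = [
    [40, 10],
    [10, 40],
]
print(chi_square(table))  # prints 36.0
```

Every expected count here is 25, so each cell contributes (15)^2 / 25 = 9 and the statistic is 36. A feature whose table matched the expected counts exactly would score 0 and would be a poor candidate for selection.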
IV. PROPOSED METHODOLOGY
A. Dataset Selection and Preprocessing
2. Data Preprocessing
The dataset undergoes thorough preprocessing to ensure data quality and compatibility with machine learning algorithms. This includes handling the missing values, encoding categorical variables, and scaling the numerical features. Preprocessing steps are essential to prepare the dataset for effective model training and evaluation.
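These preprocessing steps can be sketched with pandas and scikit-learn as shown below. The tiny data frame, its column names, and the specific choices (median imputation, min-max scaling, one-hot encoding) are assumptions for illustration and are not necessarily the exact steps applied in this study.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical flow records; column names loosely follow KDD-style features.
flows = pd.DataFrame({
    "duration": [0.0, 12.0, np.nan, 3.0],
    "src_bytes": [181.0, 239.0, 0.0, np.nan],
    "service": ["http", "ftp", "ftp", "http"],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing numeric values
    ("scale", MinMaxScaler()),                     # rescale each column to [0, 1]
])

preprocess = ColumnTransformer([
    ("num", numeric, ["duration", "src_bytes"]),
    # One output column per category; unseen categories at predict time are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["service"]),
])

features = preprocess.fit_transform(flows)
print(features.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

Wrapping the steps in a `ColumnTransformer` means the same fitted transformations can later be applied to unseen traffic without recomputing statistics on it.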
B. Initial Model Training and Evaluation
C. Feature Selection
TABLE I
FEATURES EXTRACTED FOR THE PROPOSED SYSTEM
No. | Feature | No. | Feature
1 | Duration | 6 | Hot
2 | Service | 7 | Count
3 | Src Bytes | 8 | Service Count
4 | Dst Bytes | 9 | Dst Host Count
5 | Wrong Fragment | 10 | Dst Host Service Count
VII. ACKNOWLEDGMENT
We would like to thank a few people who helped us in carrying out this project. We sincerely thank Dr. C Nataraju, Principal, JSSSTU, Mysuru, and Dr. Srinath S, HOD, Department of Computer Science and Engineering, JSSSTU, Mysuru, who encouraged us in this venture.
It is our duty to thank our project supervisor, Prof. Shwethashree G C, for her encouragement and effective guidance throughout the project. We also thank our panel members, who corrected us at every step during Continuous Internal Evaluation.
In our study, we delve into the critical domain of DDoS attacks, recognized as among the most severe threats to network infrastructure. By exploring testing, analysis, and machine learning model development, we aim to enhance DDoS attack detection capabilities. Utilizing the KDD and CICDDoS datasets, we apply a consistent methodology, leveraging Random Forest for classification and Stratified K-Fold for data splitting. Our results showcase exceptional accuracy rates of 99.95% on the KDD Dataset and 99.92% on the CICDDoS dataset. This consistency underscores the robustness and effectiveness of our approach across diverse datasets, offering promising prospects for real-world cybersecurity applications. Overall, our study contributes significantly to advancing DDoS attack detection methodologies, paving the way for improved cyber threat mitigation strategies and enhancing network infrastructure resilience against evolving threats.
[1] M. Kia, "Early detection and mitigation of DDoS attacks in software defined networks," M.Sc. Thesis, Ryerson University, Toronto, Ontario, Canada, 2015.
[2] J. Boite, P. A. Nardin, F. Rebecchi, M. Bouet, V. Conan, "StateSec: Stateful monitoring for DDoS protection in software defined networks," IEEE Conference on Network Softwarization, Bologna, Italy, 2017.
[3] M. Myint Oo, K. Sinchai, and K. Thossaporn, "Advanced support vector machine based detection for distributed denial of service attack on software defined network," Journal of Computer Networks and Communications, Volume 2019.
[4] Diash Firdaus, Rendu Munadi, Yudha Purwanto, "DDOS Attack Detection in Software Defined Network using Ensemble K-means++ and Random Forest," 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI).
[5] S. Santhosh, Dr. M. Sambath, Dr. J. Thangakumar, "Detection Of DDOS Attack using Machine Learning Models," 2023 International Conference on Networking and Communications (ICNWC).
[6] Rashmikiran Pandey, Mrinal Pandey, Alexey Nazarov, "Enhanced DDoS Detection using Machine Learning," 2023 6th International Conference on Information Systems and Computer Networks (ISCON), GLA University, Mathura, India, Mar 3-4, 2023.
[7] C. M. Nalayini, Jeevaa Katiravan, "A New IDS for Detecting DDoS Attacks in Wireless Networks using Spotted Hyena Optimization and Fuzzy Temporal CNN," Journal of Internet Technology, Vol. 24, No. 1, January 2023.
[8] Aye Thandar Kyaw, May Zin Oo, Chit Su Khin, "Machine-Learning Based DDOS Attack Classifier in Software Defined Network," 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON).
[9] N. Meti, D. G. Narayan, V. P. Baligar, "Detection of distributed denial of service attacks using machine learning algorithms in SDN," IEEE, 2017.
Copyright © 2024 Prof. Shwethashree G C, Tanusha R, Harshitha P B, Dhyan Ganesh S, NS Dheeraj Gowda. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET60590
Publish Date : 2024-04-19
ISSN : 2321-9653
Publisher Name : IJRASET