Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Mansi Samir Pathare, Divakar Mallah, Ashma Bhagad, Prof. Md. Ameen
DOI Link: https://doi.org/10.22214/ijraset.2024.60450
Over time, the amount of data exchanged over the Internet has increased significantly as the use of technology has grown. The massive volume of data being transferred raises the need for data security, which is where intrusion detection systems (IDS) come into play: they aid in detecting threats to network security. An intrusion detection system is a device or program that monitors and evaluates data in order to find any instance of network or system intrusion. Attackers use a variety of methods to gain access to a network. The proposed vulnerability detection system is implemented using machine learning algorithms in order to classify attacks, describe them whenever they occur, and determine which machine learning algorithm is most suitable for identifying each attack.
I. INTRODUCTION
A vulnerability detection system is a device or software program designed to watch for malicious activity or violations on a system or network. Such a system combines outputs from various sources and uses alarm-filtering techniques to separate malicious activity from false alarms. While monitoring networks for potentially malicious activity, intrusion detection systems are also prone to false alarms. Therefore, organizations must fine-tune these products when deploying them for the first time: an intrusion detection system must be configured correctly in order to distinguish between malicious activity and regular network traffic. The main idea behind the Industrial Internet of Things (IIoT) is the use of Internet of Things technology in industrial control systems. Industrial control systems, or ICSs, have long been used to monitor industrial machinery and processes, and they are a vital component of critical infrastructures. In addition to logging all events that occur in the industrial systems, they also perform real-time data collection and analysis, device interaction, and monitoring. The use of IoT technology in these systems makes the optimization and automation of industrial processes possible, which improves network intelligence and security.
A. Vulnerability analysis Architecture:
We gathered the data for the intrusion detection system from the KDD dataset and applied machine learning algorithms such as decision trees, regression, random forests, and KNN. The workflow consists of the following steps.
1. Details of the Data Set: The process of gathering data includes choosing high-quality data to be analyzed. To implement machine learning in this case, we used the KDD vulnerability dataset obtained from uci.edu. A data analyst is responsible for finding approaches and resources for gathering thorough and pertinent data, analyzing it using statistical methods, and evaluating the findings.
2. Data Visualization: A large amount of information is simpler to comprehend and evaluate when it is presented graphically.
3. Dataset Splitting: The dataset is divided into a training dataset and a testing dataset. Typically 70% to 80% of the data is used for training and the remaining 20% to 30% for testing. More generally, a dataset used for machine learning is divided into three subsets: training, test, and validation sets. Training set: A data scientist uses the training set to train a model and define its ideal parameters; the model learns from it. Test set: A test set is needed to assess the trained model's capacity for generalization, that is, its ability to identify patterns in new, unseen data after being trained on the training data. To detect overfitting, which is the inability to generalize, it is imperative to use distinct subsets for training and testing.
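The split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `train_test_split`, the 30% test fraction, the fixed seed, and the toy `records` list are assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for labelled connection records: (features, label).
records = [({"duration": i}, "normal" if i % 4 else "attack") for i in range(100)]

train_rows, test_rows = train_test_split(records, test_fraction=0.3)
print(len(train_rows), len(test_rows))  # 70 30
```

Shuffling before cutting avoids a split that follows the order in which records were collected. Every record ends up in exactly one of the two subsets.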
4. Model Training: Model training can start once a data scientist has preprocessed the gathered data and divided it into train and test sets. The algorithm must be "fed" training data during this procedure. When you use predictive analysis to get an answer, an algorithm will process data and produce a model that can identify a target value (attribute) in fresh data. Model development is the goal of model training.
5. Model Testing and Evaluation: This step aims to create the simplest model that can generate a target value quickly and accurately enough. A data scientist can accomplish this through model tuning, the process of adjusting model parameters to maximize the algorithm's performance. The test data not only contains specific attack types that are not present in the training data, but also follows a different probability distribution from the training data. This makes the task more realistic. Some intrusion experts contend that the majority of new attacks are really variations of well-known ones, and that new variants can often be identified by their "signature". The training data contains 24 attack types in total, and an additional 14 types are present solely in the test data.
B. Implementation of Machine learning algorithm:
Various machine learning algorithms can be useful for vulnerability detection and analysis. Four of them are considered here for intrusion detection: Logistic Regression, Decision Tree, Random Forest, and KNN. For each algorithm, the predicted values are compared with the expected values to obtain the accuracy and the error.
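The comparison of predicted and expected values can be sketched as below. The function names, the label strings "normal" and "attack", and the five-record example are hypothetical and chosen only for illustration. The false-alarm and missed-attack counts are included because false alarms are a central concern for an IDS.

```python
def accuracy(expected, predicted):
    """Fraction of records whose predicted label equals the expected label."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)

def alarm_counts(expected, predicted, attack_label="attack"):
    """Return (false_alarms, missed_attacks) for a binary normal/attack task."""
    false_alarms = sum(1 for e, p in zip(expected, predicted)
                       if e != attack_label and p == attack_label)
    missed = sum(1 for e, p in zip(expected, predicted)
                 if e == attack_label and p != attack_label)
    return false_alarms, missed

expected = ["normal", "attack", "attack", "normal", "normal"]
predicted = ["normal", "attack", "normal", "attack", "normal"]

acc = accuracy(expected, predicted)
error = 1.0 - acc                       # error rate is the complement of accuracy
false_alarms, missed = alarm_counts(expected, predicted)
print(acc, false_alarms, missed)        # 0.6 1 1
```

Reporting false alarms and missed attacks separately shows which kind of mistake a model makes, which a single accuracy figure hides.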
1. Logistic Regression Algorithm
Logistic regression is a simple yet powerful predictive modelling technique used in machine learning. It is typically used for binary classification problems, where it predicts the probability of a binary outcome using a logit function. By one rough estimate, around 60% of classification problems can be solved using this algorithm. It is a generalized linear model: a linear combination of the input variables is passed through the logistic function to yield the probability of an outcome. In simple terms, it predicts scores on one variable based on scores from other variables, with the predicted variable called the criterion variable.
a. Sigmoid Function: A sigmoid function is a mathematical function whose graph has a distinctive "S"-shaped curve, known as a sigmoid curve. It maps any real number in its domain to a value between 0 and 1. Logistic regression uses the sigmoid function to convert the output of its linear model into a probability, which is then used for classification.
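As a minimal sketch, the standard logistic sigmoid is sigma(z) = 1 / (1 + e^(-z)). The code below evaluates it in a numerically stable way and applies it to a linear score, as logistic regression does. The weights, bias, and feature values are hypothetical numbers chosen for illustration, not parameters fitted to the KDD data.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def predict_probability(weights, bias, features):
    """Probability of the positive class for one record under a linear model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical model with two features.
weights, bias = [0.8, -0.5], 0.1
p = predict_probability(weights, bias, [2.0, 1.0])
label = "attack" if p >= 0.5 else "normal"   # 0.5 is an assumed decision threshold
print(sigmoid(0.0), label)                   # 0.5 attack
```

The linear score here is 0.1 + 1.6 - 0.5 = 1.2, which is positive, so the probability exceeds 0.5 and the record is labelled as an attack under the assumed threshold.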
2. Decision Tree Algorithm
The decision tree is a supervised learning algorithm that is commonly used for classification problems. It works with both continuous and categorical input and output variables. In this technique, the sample is divided into two or more homogeneous sets (or sub-populations) using the most significant splitter, or differentiator, among the input variables. In a decision tree, an internal node represents a test on an attribute, a branch represents the outcome of that test, and a leaf represents the decision reached after evaluating the attributes.
The main goal of using decision trees is to build a training model that, by learning decision rules deduced from previous data (training data), can be used to predict a target variable's class or value. Compared to other classification algorithms, the Decision Tree algorithm is very simple to understand. The Decision Tree algorithm uses tree representation to attempt to solve the problem. Every leaf node in the tree corresponds to a class label, and every internal node to an attribute.
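One way to find the "most significant splitter" at a node is sketched below. The text does not name a splitting criterion, so this example assumes Gini impurity, a common choice, and handles numeric features only. The function names and the six toy records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A record goes left when row[feature_index] <= threshold, otherwise right.
    Returns None if no split separates the records.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue                     # split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical records: (failed_logins, connection_count).
rows = [(0, 5), (1, 7), (0, 6), (4, 5), (5, 8), (6, 6)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(best_split(rows, labels))  # (0, 1, 0.0)
```

In this toy data the first feature separates the two classes completely at threshold 1, so both child nodes are pure and would become leaves. A full tree builder would apply `best_split` recursively to each impure child.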
3. Random Forest Algorithm
The Random Forest algorithm works as follows. Given n cases in the training dataset, a sample of n cases is drawn at random with replacement for each tree, and the tree is built from that sample. Given k input variables, a number m is chosen such that m < k. At every node, m of the k variables are selected at random, and the node is split using the best split found among these m variables. The value of m is kept constant as the forest grows. Every tree is grown to its full extent, without any pruning.
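The two sources of randomness described above, sampling cases with replacement and choosing m of k variables at each node, can be sketched as follows. The function names and the values n = 10, k = 10, and m = 3 are assumptions for illustration. Tree construction itself is omitted.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(k, m, rng):
    """Choose m distinct variable indices out of k, with m < k."""
    if not 0 < m < k:
        raise ValueError("m must satisfy 0 < m < k")
    return sorted(rng.sample(range(k), m))

rng = random.Random(0)                 # seeded so the sketch is reproducible
cases = list(range(10))                # hypothetical case identifiers, n = 10

sample = bootstrap_sample(cases, rng)  # training data for one tree
candidates = random_feature_subset(k=10, m=3, rng=rng)  # variables tried at one node
print(len(sample), len(candidates))    # 10 3
```

Because sampling is done with replacement, some cases usually appear more than once in a tree's sample and others not at all, so each tree sees a slightly different dataset. A fresh variable subset is drawn at every node, while m stays the same throughout.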
4. K Nearest Neighbor (KNN) Algorithm
The KNN algorithm calculates the distance between a query instance and a set of instances in the dataset using a distance function d(x, y), where x = {x1, …, xN} and y = {y1, …, yN} are instances made up of N features.
The similarity measure depends on the type of data. The Euclidean distance can be applied to real-valued data. For other data types, such as binary or categorical data, the Hamming distance can be used. Instance-based algorithms use data instances (or rows) to model the problem and produce predictive decisions. Since every training observation is kept as part of the model, the KNN algorithm represents an extreme version of instance-based techniques.
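A minimal sketch of the distance functions and of a KNN prediction is given below. It assumes the usual majority vote among the k nearest training instances, which the text does not spell out. The function names, k = 3, and the toy training set are hypothetical.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance d(x, y) for real-valued feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hamming(x, y):
    """Hamming distance: number of positions where categorical values differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def knn_predict(train, query, k=3, distance=euclidean):
    """Label the query by majority vote among its k nearest training instances.

    train is a list of (features, label) pairs; all of it is kept as the model.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training instances.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "attack"), ((5.2, 4.9), "attack"),
]
print(knn_predict(train, (1.1, 1.0)))                   # normal
print(knn_predict(train, (5.1, 5.0)))                   # attack
print(hamming(("tcp", "http", "SF"), ("tcp", "ftp", "SF")))  # 1
```

There is no training step: prediction scans the stored instances each time, which is what makes KNN an instance-based method. Passing `distance=hamming` switches the same routine to categorical data.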
II. LITERATURE SURVEY
III. DESIGN
A. UML
A general-purpose modelling language is called Unified Modelling Language (UML). UML's primary goal is to establish a common framework for visualizing a system's design process. It resembles blueprints from other engineering specialties quite a bit. UML diagrams are used to show a system's structure and behaviour. UML aids in the modelling, design, and analysis processes for system architects, software engineers, and businesspeople. Unified Modelling Language was standardized by the Object Management Group (OMG) in 1997.
1) Use Case Diagram: Using actors and use cases, a use case diagram captures the requirements and functionality of the system. Use cases serve as a model for the functions, duties, and services that a system must provide. They show high-level features and the way a user will interact with the system. Use cases are among the fundamental concepts of Unified Modelling Language modelling.
2) Class Diagram: Using design elements like classes, packages, and objects, class diagrams represent the contents and structure of a class. Three perspectives are described in a class diagram when designing a system: conceptual, specification, and implementation. Three elements make up a class: name, attributes, and operations. Class diagrams also show relationships like inheritance, associations, and containment. The most prevalent type of relationship in a class diagram is the association relationship.
V. ACKNOWLEDGMENT
We thank everyone who supported us in completing our project. We thank our guide, Dr. Md. Ameen, for guiding us throughout. We also express our gratitude to Prof. K.N. Attarde, Head of the Department, for his motivation. Finally, we thank our families for continuously motivating and supporting us in our research work.
For IIoT devices, cyber-security is essential. Focusing on the industrial side of IoT technology is important since there is still a significant gap in providing adequate security for these systems. Security in IT systems has been supported by the widespread use of big data analytics and machine learning solutions. However, the prevalent cyber-risks of IIoT systems differ from those of traditional IT systems because their priorities are fundamentally different. Therefore, security for IIoT requires extra care. Through our discussions and experimental evaluation, we have shown how effective machine learning is for enhancing the security of these systems. In this paper, we first examined the security vulnerabilities of the four most widely used IIoT protocols. We then evaluated the risks associated with the most significant and common vulnerabilities of IIoT systems and the ways in which machine learning-based remedies could be effective in addressing them. Next, to highlight the areas where security is still required, we reviewed the literature on machine learning-based anomaly detection techniques. The primary goal of an intrusion detection system is to identify and prevent attacks and malicious behavior within a network while minimizing false alarms. Utilizing machine learning algorithms enhances the accuracy and reliability of the IDS output. This work also assesses the effectiveness of different machine learning algorithms in detecting attacks. The growing reliance on technology has resulted in vast amounts of data that must be securely processed and stored, which emphasizes the importance of security for users.
Copyright © 2024 Mansi Samir Pathare, Divakar Mallah, Ashma Bhagad, Prof. Md. Ameen . This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET60450
Publish Date : 2024-04-16
ISSN : 2321-9653
Publisher Name : IJRASET