A Review on Spam Detection Techniques for IOT Devices

Authors: Nurussabah Mohammad Fahim, Prof. Dr. Ahmad Sajjad Khan

DOI Link: https://doi.org/10.22214/ijraset.2021.39317

Abstract

The number of Internet of Things (IoT) devices in smart homes is rapidly increasing, generating massive volumes of data that is mostly transmitted over wireless communication channels. The amount of data released by these gadgets increased as well. Aside from the increasing volume, the IoT gadget generates a significant amount of data. Having varied data quality characterised by its speed in terms of time and position in various different modalities dependence. However, different IoT devices are vulnerable to different dangers, such as cyber-attacks and changing network conditions. Data leakage, connections, and so on. The properties of IoT nodes, on the other hand, make the current solutions obsolete. inadequate to cover the entire security range of IoT networks Machine learning can help in this situation. Algorithms can be useful in detecting irregularities in data, which improves the security of IoT systems. The methods are aimed at the data anomalies that exist in the smart Internet in general.

Introduction

I. INTRODUCTION

In smart homes, the number of Internet of Things (IoT) devices is rapidly increasing, generating vast volumes of data that is largely transported over wireless communication channels. However, numerous IoT devices are exposed to a variety of dangers, including cyber-attacks, unstable network connectivity, data leakage, and so on. Statistical analysis and machine.

learning may help spot anomalies in data, which improves the security of the smart home IoT system, which is what this study is all about. With the use of several parameters such as feature importance, root mean square error, hyper-parameter tweaking, and others, this article explores the reliability of IoT devices delivering domestic appliance readings. The system assigned a spamicity score to each IoT device based on the feature importance and the root mean square error value based score of the machine learning models. The Internet of Things (IoT) is defined as a network of interconnected and distributed embedded systems that communicate via wired or wireless communication technologies. The Internet of Things (IoT) has experienced massive expansion and quick development, resulting in the inclusion of IoT devices in smart homes and smart cities. It's also defined as a network of physical items or things with limited compute, storage, and communication capabilities that are also integrated with electronics (such as sensors and actuators), software, and network connectivity that allow them to gather, process, and share data. IoT objects include smart household devices such as a smart bulb, smart adapter, smart metre, smart refrigerator, smart oven, AC, temperature sensor, smoke detector, IP camera, and more sophisticated devices such as frequency identification (RFID) devices, heartbeat detectors, accelerometers, parking zone sensors, and a variety of other sensors in automobiles, among others.

The Internet of Things offers a wide range of applications and services, including critical infrastructure, agricultural, military, household appliances, and personal health care. As the number of IoT devices grows, so does the number of anomalies caused by these devices. Interruptions, spoofing attacks, Dos attacks, jamming, eavesdropping, spam, and malware are among security challenges that IoT applications must address. The most caution should be exercised with web-based devices, as they account for the majority of IoT devices. It is typical in the workplace for IoT devices to be used to effectively implement security and protection features.

Wearable devices that collect and transfer a user's health data to a connected smartphone, for example, should avoid data leaking to protect privacy. According to market research, 25-30% of working workers connect their personal IoT devices to their company's network. The growing popularity of IoT attracts both the target audience, i.e. users, and the attackers.

However, as machine learning (ML) becomes more prevalent in various attack scenarios, IoT devices must adopt a defensive approach and important parameters in security protocols to strike a balance between security, privacy. The main purpose of this paper is to present a thorough and complete assessment of current research on detecting review spam using various machine learning approaches, as well as to develop methodology for further exploration.

II. RELATED WORKS

As the number of low-cost Internet-of-Things (IoT) devices has increased considerably in recent years, spammers have found them to be great targets. Default passwords are included with some network cameras when they are released to the market. Many IoT devices have old or poorly configured operating systems. These practises make it easy for IoT devices to be hacked. Some of these infected IoT devices could be used to send spam via email. As a result, mail server administrators must figure out how to deal with unwanted connections from client IoT devices. Even if a mail server uses a whitelist or blacklist to only allow E-mails relayed from a few known SMTP servers, such a list-based solution appears to be ineffective for worldwide SMTP clients when considering the flexibility and cost of list maintenance.[4]. In the analysis of spam detection in IoT devices, a variety of machine learning and soft computing algorithms have been used (Eg.Smart homes). Efforts have been made to solve security and privacy issues in IoT networks. [2]Choi J, Jeong, HKim et,al. have investigated the best and most efficient Spam Detection in IoT Devices in this article. Extreme Gradient Boosting, Decision Trees, Gradient Boosted Regression, Bagged Model, Bayesian Generalized Linear Model, and Generalized Linear Model with Stepwise Feature Selection methods are all employed.The author introduced a new technique for Spam Detection in Smart Homes in this study. Extreme Gradient Boosting, Decision Trees, Random Forest, and Gradient Boosted regression models are all examples of extreme gradient boosting. [4]. IoT systems, which include devices, services, and networks, are vulnerable to network, physical, and application threats that are similar to privacy leakage.

A. DoS (Distributed Denial of Service) Attacks

To prevent IoT devices from gaining access to the target database, attackers can flood it with unsolicited requests. a wide range of services These fraudulent requests generated by a network of IoT devices are known as though they were bots This form of attack has the potential to drain all of the service provider's resources. It has the ability to thwart genuine transactions.The network resource should be made unavailable to users.

B. Attacks on Radio-frequency Identification (RFID)

These threats are especially common in IoT devices' physical layer. This onslaught results in the loss of the game the device's integrity Attackers attempt to alter data at the node storage level or while it is in transit within the network transmission make use of spamming methods One of the most popular ways is ad fraud.

C. Near-Field Communication (NFC)

Electronic payment frauds are the focus of these attacks. Unencrypted traffic, eavesdropping, and tag alteration are all plausible assaults. Conditional privacy protection is the solution to this dilemma. As a result, the attacker is unable to create an identical profile using the user's public key. A trustworthy service manager generates random public keys for this model.

Machine Learning is the study of computer algorithms that improve themselves over time by performing various jobs. Computer science is a subfield of machine learning. To increase network security, many machine learning approaches such as supervised learning, unsupervised learning, and reinforcement learning have been widely used. The existing machine learning technique for detecting the above-mentioned attacks

III. MACHINE LEARNING APPROACHES THAT ARE SUPERVISED

For labelling the network, models such as support vector machines (SVMs), random forest, Naive Bayes, K-nearest neighbour (KNN), and neural networks (NNs) are utilised.

A. Machine Learning Techniques That Are Not Supervised

In the absence of labels, these strategies surpass their counterparts' techniques. It operates on the basis of the formation of clusters Multivariate correlation analysis is used in IoT devices to identify Dos attacks.

B. Machine Learning Algorithms That Use Reinforcement

These models allow an IoT system to choose security protocols and key settings by trial and error against a set of security protocols and key parameters. various assaults Q-learning has been utilised to increase authentication performance and will aid in virus detection as well.

C. Learning That Is Only Partially Supervised

Semi-supervised learning is a machine learning technique that sits between between unsupervised learning (no labelled training data) and supervised learning (fully labelled training data). Although some of the training examples lack training labels, many machine-learning researchers have discovered that unlabeled data, when combined with a small amount of labelled data, can significantly enhance learning accuracy. The training labels in weakly supervised learning are noisy, limited, or imprecise; nonetheless, they are frequently less expensive to induce, resulting in larger effective training sets. Machine learning approaches aid in the development of protocols for lightweight access control, which save a significant amount of energy and improve the life of IoT devices.

For example, the developed outer detection strategy uses K-NNs to address the problem of unregulated outer detection in WSNs. The literature review explains how machine learning may be used to improve network security.

SVM: Support vector machines, often known as support vector networks, are a collection of supervised learning algorithms for classification and regression. It is, however, mostly used in categorization difficulties. We represent each data item as a degree in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a certain coordinate, using the SVM method. Then, by locating the hyper-plane that separates the two classes, we may classify them. As a result, we may state that SVM's main goal is to discover a hyperplane in an N-dimensional space that clearly classifies data points. Both linear and non-linear data can be classified using SVM. It employs a method called the kernel trick to rework your data such that it supports these transformations and also determines an ideal boundary between the available outputs in order to categorize non-linear data.

IV. RANDOM FORESTS (RF)

They are a bag containing n Decision Trees (DT) with a unique set of hyper-parameters that have been trained on various subsets of data. Random Forests are also known as an in machine learning. Using an ensemble or a bagging approach is a good option. Because of its simplicity and effectiveness, random forest is one of the most widely used algorithms. stability. A random forest is more stable and reliable than a single tree. Random Forest is a supervised machine learning technique that can be used to classify and predict data. But talking about how it's used for classification because it's more intuitive and simple to understand. It's a collection of decision trees that aid in the reduction of decision tree variance It strikes a good balance between high variation and low volatility. By sampling with each tree fitted and a sample of characteristics at each split, substantial bias is achieved.

V. BOOSTING OF EXTREME GRADIENTS (XGBOOST)

Extreme Gradient Boosting (EGB) is a popular supervised machine learning model with distributed and out-of-core computing, efficiency, and parallelization capabilities. Parallelization takes place for numerous nodes in a single tree, not across trees. It's a scalable and efficient gradient boosting system. A good linear model solver and a tree learning technique are included in the package. Regression, grouping, and ranking are some of the objective functions it offers. It's based on numeric vectors. It's 10 times faster than the fastest gradient boosting methods currently available. Gradient boosting is an approach for finding the most effective tree model by using more precise approximations. It employs a number of ingenious techniques to make it particularly competitive with structured data. Each training round, a bad learner is created, and its predictions are matched with the correct outcome

In the coming years, the amount of data released by these gadgets will multiply many times. Aside from increased volume, IoT devices generate a great amount of data in a variety of modalities with varying data quality characterised by its speed in terms of time and position dependency. Machine learning (ML) algorithms can help ensure security and authorisation based on biotechnology, as well as anomaly detection to improve the usability and security of IoT devices, in such a scenario. Learning algorithms, on the other hand, are frequently used by attackers to exploit vulnerabilities in smart IoT-based devices..

Model no.

Model

Package

Tuning parameter

Model

Support Vector Classifier

sklearn

None

Model2

Random forest

sklearn

None

Model3

eXtreme Gradient

Boosting

Xgboost

Nrounds

Lambda

alpha

Conclusion

ML models are used to identify the spam boundaries of IoT devices in the proposed system. Using highlight designing process, the IoT dataset used in the experiments is pre-prepared. The spamicity score is used in this research to determine the reliability of IoT devices in the smart home organisation. Different ML models were utilised to assess the time-arrangement information produced by keen metres through extensive tests and analysis. By assigning a spam score to IoT gadgets in a smart home, different commitment levels of IoT gadgets were resolved with the help of group strategies for ML. The results demonstrate that the spamicity score of the devices aids in refining the conditions that must be met for IoT devices in the smart home to function properly.

References

[1] Fatima Hussain,Rasheed Hussain,Syed Ali HassanHossain. Machine Learning in IoT Security: Current Solutions and Future Challenges [2] Choi, J.; Jeoung, H.; Kim, J.; Ko, Y.; Jung, W.; Kim, H.; Kim, J. Detecting and identifying faulty IoT devices in smart homes with context extraction. In Proceedings of The 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, Luxembourg, 25– 28 June 2018; pp. 610–621. [3] Tang, S.; Gu, Z.; Yang, Q.; Fu, S. Smart Home IoT Anomaly Detection based on Ensemble Model Learning from Heterogeneous Data. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 4185–4190. [4] Makkar A.; Garg S.; Kumar, N.; Hossain, M.S.; Ghoneim, A.; Alrashoud, M. An Efficient Spam Detection Technique for IoT Devices using Machine Learning. IEEE Trans. Ind. Inform. 2020. [5] Ameema Zainab, Shady S. Refaat and Othmane Bouhali;Ensemble-Based Spam Detection in Smart Home IoT Devices Time Series Data Using Machine Learning Techniques [6] Nitin Jindal and Bing Liu. \"Opinion Spam and Analysis.\" Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, California, USA [7] Loredana Firte Camelia Lemnaru Rodica Potolea Spam Detection Filter using KNN Algorithm and Resampling 2010 IEEE [8] Peng Wan, Minoru Uehara Spam Detection Using Sobel Operators and OCR 2012 26th International Conference on Advanced Information Networking and Applications Workshops [9] Jakub Piskorski, Marcin Sydow, Dawid Weiss Exploring Linguistic Features for Web Spam Detection:A Preliminary Study ACM 200x [10] SNEHAL DIXIT & A.J.AGRAWAL REVIEW SPAM DETECTION International Journal of Computational Linguistics and Natural Language Processing Vol 2 Issue 6 June 2013 ISSN 2279 – 0756 [11] RAYMOND Y. K. LAU, S. Y. LIAO, RON CHIWAI KWOK, KAIQUAN XU, YUNQING XIA, YUEFENG LI Text Mining and Probabilistic Language Modeling for Online Review Spam Detection ACM Trans. Manag. Inform. Syst. 2, 4, Article 25 (December 2011)

Copyright

Copyright © 2022 Nurussabah Mohammad Fahim, Prof. Dr. Ahmad Sajjad Khan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET39317

Publish Date : 2021-12-08

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here