Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Payal Budhe, Dipalee Rane
DOI Link: https://doi.org/10.22214/ijraset.2023.54329
Life has reached a stage where we cannot live without internet-enabled technology. New devices and services are continuously being invented as new technologies evolve to improve our day-to-day lifestyle. At the same time, this opens up many security vulnerabilities. Cybercrime may strike any device or service at any time, with severe consequences. Internet use today has a greater impact on young people than ever before. They view the internet and mobile phone networks as the two major communication frameworks that are crucial to everyday life and the formation of identity. However, these technologies are often used improperly. Many internet users are the targets of bullying, which leaves the "target" completely perplexed. Cyberbullying is increasing drastically. The issue is saddening because the system that enables communication and information flow is evolving into a risky "site" to visit. Cyberbullying affects people all across the world, not just in one nation. The United States has begun to enact laws focused on cyberbullying. Other countries have adopted laws against bullying that apply to both traditional bullying and cyberbullying. The internet gives users the option to browse anonymously and to create profiles with secret identities. Our proposed structure can significantly increase the detection capacity of existing methods in actual social network scenarios while effectively making up for their drawbacks.
I. INTRODUCTION
With more than four billion Internet users globally, the online world has had an enormous impact on society and has become a necessary component of daily life. The current society is entirely dependent on technology, and owing to the internet, young people are now enjoying modern ways of life. One of the major problems resulting from this rapid technological improvement, which also has many drawbacks, is cyberbullying. The internet has grown into a versatile tool that has significantly improved our day-to-day activities. Cyberbullying is only one of many unwanted behaviors that have found their way onto the internet.
A. Cyberbullying
Cyberbullying, also referred to as cyber harassment or online bullying, is when someone is threatened, bullied, harassed, or scared using specific internet tools. It is bullying committed via a digital tool, channel, or platform. Posing as someone else or breaking into someone's account or profile isn't always part of cyberbullying, but there are many different ways that cyberbullying can happen. One form is distributing false information about another person online, including through text messages sent by SMS, online chat rooms, game forums, and social networking sites. It can take place on a variety of digital devices, including tablets, smartphones, and laptops. More generally, sending, uploading, or sharing offensive, harmful, or inappropriate content using digital tools is referred to as cyberbullying. Cyberbullying has become a widespread issue because nearly everyone uses social networking sites today, and it is easy to take advantage of this access. It includes embarrassing, blackmailing, disparaging, manipulating, or harassing behaviors. Such hostile behavior can readily cause serious harm to a person.
B. Cyberbullying Types
According to the literature, there are 10 types of cyberbullying [34].
C. Countermeasures By Social Media
Social networking sites like Facebook and Twitter promote a safe online environment by letting users report bullying. Their countermeasures include specifying the intended audience, blocking specific users, and recognizing and banning people who behave badly. Although these techniques are incredibly important, they are reactive in nature and only apply after the victim has already been harmed. By the time someone reports the offensive post and the authority takes the required action, many users may have already read it and experienced the harmful effects mentioned previously. We therefore need an automated system that can rapidly and accurately identify cyberbullying behavior.
D. Feature Types Used In Cyberbullying Prediction
Content Based Features

| Paper | BoW | SG | PF | CB | SF | PR |
|---|---|---|---|---|---|---|
| 1 | ✓ | ✓ | × | × | × | ✓ |
| 2 | ✓ | × | ✓ | ✓ | × | ✓ |
| 3 | × | × | ✓ | × | × | ✓ |
| 4 | ✓ | × | ✓ | × | × | × |
| 5 | ✓ | × | × | × | ✓ | × |
| 6 | ✓ | × | ✓ | × | × | × |
| 7 | ✓ | × | ✓ | × | ✓ | × |
| 8 | ✓ | × | × | × | × | × |
| 9 | ✓ | × | ✓ | × | × | × |
| 10 | ✓ | × | ✓ | ✓ | ✓ | ✓ |
| 11 | ✓ | × | ✓ | ✓ | × | × |
| 12 | ✓ | × | × | × | ✓ | × |
| 13 | ✓ | × | × | ✓ | × | × |
| 14 | ✓ | × | ✓ | × | ✓ | × |

BoW - bag of words, SG - skip gram, PF - profanity features, SF - sentiment features, PR - pronouns
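As a rough illustration of two of the content-based feature types in the table above (PF and PR), the sketch below counts profanity terms and second-person pronouns in a message. The word lists and the function name `content_features` are hypothetical placeholders chosen for this example; the surveyed papers use their own lexicons and feature definitions.

```python
import re

# Hypothetical mini-lexicons for illustration only; real systems use curated lists.
PROFANITY = {"idiot", "stupid", "loser"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}

def content_features(text):
    """Return simple profanity (PF) and pronoun (PR) counts for one message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "profanity_count": sum(t in PROFANITY for t in tokens),
        "second_person_count": sum(t in SECOND_PERSON for t in tokens),
        "token_count": len(tokens),
    }

features = content_features("You are such a loser, and your posts are stupid!")
print(features)
```

Counts like these can be appended to a bag-of-words vector so that a classifier sees both the vocabulary and the targeted, abusive tone of a message.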
Table 2. Summary of Profile Based feature types used in cyberbullying
Profile Based Features

| Paper | DF | FCF | TSF | LOCF |
|---|---|---|---|---|
| 1 | × | × | × | × |
| 2 | × | × | × | × |
| 3 | ✓ | × | × | × |
| 4 | × | × | × | × |
| 5 | × | × | × | × |
| 6 | × | ✓ | × | × |
| 7 | × | × | × | × |
| 8 | × | × | × | × |
| 9 | × | × | × | × |
| 10 | ✓ | ✓ | × | × |
| 11 | × | × | × | × |
| 12 | × | × | × | × |
| 13 | × | × | ✓ | ✓ |
| 14 | × | × | × | × |

DF - demographic features, FCF - friends or follower count features, TSF - timestamp features, LOCF - location of post feature
II. RELATED WORK
Several studies have addressed cyberbullying detection.
The article in [23] introduces a new BullyNet architecture for locating bullies on the Twitter social network. To construct a social network (SN) based on bullying tendencies, the researchers conducted in-depth research on mining SNs to better understand the interactions between users on social media. They found that by building conversations based on context as well as content, they could successfully pinpoint the feelings and actions that cause bullying. In their experimental evaluation of the proposed centrality metrics for recognizing bullies in the SN, they were able to identify bullies in a variety of scenarios with about 80% accuracy and 81% precision.
In [18], the researchers proposed a cyberbullying detection architecture to address the issue. They discussed the data architecture for hate speech on Twitter and personal attacks on Wikipedia. Because tweets containing hate speech typically contained cursing, which made them simple to identify, natural language processing techniques were successful on this type of speech, with accuracy rates of over 90% using basic machine learning algorithms. For this reason, BoW and TF-IDF models produced better results than word embedding models. Although the three feature selection approaches performed similarly, it was challenging to identify personal attacks using the same model because the comments lacked learnable sentiment.
In [22], Haider et al. discuss a study on the identification of multilingual cyberbullying. They found that the majority of work in this field is done in English, so they tried to identify cyberbullying in Arabic using machine learning techniques. Their dataset consisted of 32K tweets, 1800 of which were bullying-related. To identify cyberbullying, they used the Support Vector Machine (SVM) and Naïve Bayes methods, which achieved F1 scores of 92% and 90%, respectively.
In [20], the researchers developed two ensemble-based voting algorithms to classify sentences as offensive or not. Their proposed model outperformed every ML algorithm and ensemble technique that was used independently. They achieved the greatest accuracy on the dataset extracted from Twitter. In future work, the performance of their model will be evaluated on a variety of diverse datasets, as well as some private datasets. Finally, there are many other types of cyberbullying, including harassment, flaming, denigration, impersonation, racism, sexism, etc.
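The idea of ensemble voting can be sketched with plain majority (hard) voting, shown below. This is only an assumption about the general technique: the source does not specify which voting schemes [20] used, and the three rule-based functions are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter

# Hypothetical rule-based stand-ins for trained classifiers.
def has_insult(text):
    return "offensive" if any(w in text.lower() for w in ("idiot", "loser")) else "not_offensive"

def has_threat(text):
    return "offensive" if "hurt you" in text.lower() else "not_offensive"

def is_shouting(text):
    return "offensive" if text.isupper() else "not_offensive"

CLASSIFIERS = [has_insult, has_threat, is_shouting]

def majority_vote(text):
    """Return the label predicted by most of the base classifiers."""
    votes = [clf(text) for clf in CLASSIFIERS]
    return Counter(votes).most_common(1)[0][0]

label_a = majority_vote("YOU ARE A LOSER")          # insult and shouting rules fire
label_b = majority_vote("See you at lunch, loser")  # only the insult rule fires
print(label_a, label_b)
```

Using an odd number of base classifiers avoids ties in a two-class setting.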
In [16], cyberbullying detection was addressed with a sequential hypothesis testing methodology. More specifically, the objective is to decide when to stop extracting and evaluating features from a message and make a decision. Each message is classified into one of two classes (i.e., cyberbullying or normal). To achieve this, an optimization function was formulated in terms of the average cost of the classification technique and the cost of features, and the optimal solution was found.
III. PROPOSED SYSTEM
The detection of cyberbullying involves the following steps:
a. Dataset
Gathering data sets from different online networks is the initial stage in the detection of cyberbullying. User comments, posts, pictures, and videos on social networking and media sites typically form data sets for cyberbullying. Using the Twitter API makes it simple to access tweets on Twitter. Along with pre-made datasets from websites like kaggle.com, data from websites like YouTube, Facebook, Myspace, Instagram, etc. are also used for the detection of cyberbullying.
b. Pre-processing
Data pre-processing is the next stage, which modifies the data set so that it only contains relevant data. It includes the removal of white spaces, stop words, and special characters prior to tokenization and lemmatization. At this stage, a variety of other methods can also be used to organize the data collection.
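A minimal sketch of the cleaning step is given below, assuming lowercase ASCII text is an acceptable target. The function name `clean_text` and the decision to drop URLs are choices made for this example, not steps stated in the paper; stop word removal is treated separately in a later step.

```python
import re

def clean_text(text):
    """Lowercase a message and strip URLs, special characters, and extra spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs (assumed to be noise)
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop special characters
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated white space

cleaned = clean_text("  OMG!!! You are   SO dumb :(  see http://example.com  ")
print(cleaned)
```

Note that this character filter would also discard non-English text, so a multilingual pipeline would need a different rule.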
c. Tokenization
Tokenization is the process of breaking down a piece of text into small units called tokens. A token may be a word, part of a word, or just characters like punctuation. Tokenization can be broadly classified into three types: word, character, and subword (n-gram characters) tokenization. Word tokenization is the most commonly used tokenization algorithm.
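The three tokenization types can be sketched with the standard library alone. The regular expression and the function names are illustrative assumptions; production tokenizers handle contractions, emoji, and hashtags more carefully.

```python
import re

def word_tokens(text):
    """Split into words, keeping each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokens(text):
    """Split into individual characters."""
    return list(text)

def char_ngrams(word, n=3):
    """Split one word into overlapping character n-grams (a simple subword scheme)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(word_tokens("Stop it, now!"))
print(char_tokens("bully"))
print(char_ngrams("bully", 3))
```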
d. Stemming
After splitting sentences into words (i.e., tokenization), we want to reduce the words to their base or root form. This is exactly what stemming does: it condenses related word forms into a common "stem" or "root".
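The toy suffix stripper below illustrates the idea. It is deliberately crude and is not the Porter algorithm or any other published stemmer; the suffix list and the minimum stem length of three characters are assumptions made for this sketch.

```python
# Suffixes are tried in this order; the first match is removed.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [toy_stem(w) for w in ["bullying", "bullied", "bullies", "bully", "threats"]]
print(stems)
```

The output shows a typical weakness of rule-based stemming: "bullied" and "bullies" are cut to "bulli", which is not a dictionary word and does not match the stem of "bullying".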
e. Lemmatization
Lemmatization is the process of grouping a word's several inflected forms into a single unit for analysis. It is similar to stemming, but lemmatization takes the context of a word into account. As a result, it links words with related meanings together.
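One way to picture the role of context is a dictionary lookup keyed by both the word and its part of speech, as in the sketch below. The tiny `LEMMAS` table and the `toy_lemmatize` function are hypothetical; real lemmatizers rely on large lexical resources and a part-of-speech tagger.

```python
# Hypothetical lookup table keyed by (word, part of speech).
LEMMAS = {
    ("bullied", "verb"): "bully",
    ("bullies", "verb"): "bully",
    ("bullies", "noun"): "bully",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}

def toy_lemmatize(word, pos):
    """Return the dictionary form of a word, or the lowercased word if unknown."""
    return LEMMAS.get((word.lower(), pos), word.lower())

print(toy_lemmatize("meeting", "verb"))
print(toy_lemmatize("meeting", "noun"))
print(toy_lemmatize("Better", "adj"))
```

Unlike a suffix-stripping stemmer, the lookup always returns a real word and can map the same surface form to different lemmas depending on how it is used.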
f. Stopword Removal
Stop words are the most frequent words in a language that carry little meaning on their own, and natural language processing typically ignores them. Stop words in English include "a," "and," "the," and "of." Stop words are frequently eliminated from texts before they are processed for analysis. This is done to simplify the content and exclude unnecessary information.
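A minimal sketch of stop word filtering follows. The `STOP_WORDS` set is a small illustrative sample, not a complete English stop word list.

```python
# Small illustrative sample; real stop word lists contain many more entries.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "are", "to"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "victim", "of", "a", "threat", "and", "the", "bully"])
print(filtered)
```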
g. Feature Extraction
A dimensionality reduction technique called feature extraction divides a large amount of raw data into smaller, easier-to-process groups. These huge data sets share the characteristic of having many variables that demand a lot of computational power to process. The term "feature extraction" refers to techniques for choosing and/or combining variables into features, which significantly reduces the amount of data that needs to be processed while effectively and fully characterizing the initial data set. Text is transformed into a matrix (or vector) of features using feature extraction algorithms. Among the most widely used techniques for feature extraction are: Bag-of-Words and TF-IDF.
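The two techniques named above can be sketched in a few lines. This example assumes raw term counts for Bag-of-Words and the plain definition tf × log(N / df) for TF-IDF; libraries often apply smoothed or normalized variants, so their numbers will differ. The three tokenized documents are made up for illustration.

```python
import math
from collections import Counter

# Made-up, already tokenized documents.
docs = [
    ["you", "are", "a", "loser"],
    ["you", "are", "a", "friend"],
    ["loser", "loser", "go", "away"],
]

# Fixed, sorted vocabulary so that every vector uses the same column order.
vocab = sorted({t for d in docs for t in d})

def bag_of_words(doc):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]

def tf_idf(doc):
    """Weight each term frequency by log(N / document frequency)."""
    counts = Counter(doc)
    vector = []
    for t in vocab:
        tf = counts[t] / len(doc)
        df = sum(1 for d in docs if t in d)
        vector.append(tf * math.log(len(docs) / df))
    return vector

print(vocab)
print(bag_of_words(docs[2]))
print(tf_idf(docs[1]))
```

In the second document, "friend" receives a higher TF-IDF weight than "you" because it appears in only one of the three documents.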
h. Text Embedding or Word Embedding
It is an approach for representing words and documents. Word Embedding or Word Vector is a numeric vector input that represents a word in a lower-dimensional space. It allows words with similar meaning to have a similar representation. They can also approximate meaning. A word vector with 50 values can represent 50 unique features.
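The claim that similar words receive similar vectors can be illustrated with cosine similarity. The three-dimensional vectors below are invented for this sketch; real embeddings are learned from large corpora and usually have tens or hundreds of dimensions.

```python
import math

# Invented 3-dimensional vectors for illustration only.
EMBEDDINGS = {
    "idiot":  [0.9, 0.1, 0.0],
    "moron":  [0.8, 0.2, 0.1],
    "friend": [0.0, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_close = cosine(EMBEDDINGS["idiot"], EMBEDDINGS["moron"])
sim_far = cosine(EMBEDDINGS["idiot"], EMBEDDINGS["friend"])
print(round(sim_close, 3), round(sim_far, 3))
```

With these toy vectors the two insults score close to 1, while the insult and "friend" score close to 0.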
i. SGD (Stochastic Gradient Descent) Classifier
Gradient Descent is a generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea is to tweak parameters iteratively in order to minimize a cost function. An important parameter of Gradient Descent (GD) is the size of the steps, determined by the learning rate hyperparameter. If the learning rate is too small, the algorithm will have to go through many iterations to converge, which will take a long time; if it is too high, it may overshoot the optimal value. The word 'stochastic' refers to a system or process linked with random probability. Hence, in Stochastic Gradient Descent, a few samples are selected randomly for each iteration instead of the whole data set. Classification is the last step in the detection of cyberbullying: the data is divided into positive and negative instances of cyberbullying, i.e., content that definitely contains cyberbullying versus content that does not. Classification algorithms require a training collection of labelled examples in order to predict the label of new input data. A variety of algorithms and techniques can be used for data classification.
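The sketch below shows one possible form of such a classifier: logistic regression trained with one randomly ordered sample per update. The choice of the logistic loss, the learning rate of 0.2, the 2000 epochs, and the two toy features are all assumptions made for this example; the source does not state the loss function or hyperparameters used.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(samples, labels, learning_rate=0.2, epochs=2000, seed=0):
    """Fit logistic-regression weights with per-sample (stochastic) updates."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of the log loss w.r.t. the linear score
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (cyberbullying) or 0 (normal) for one feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Toy features per message: [profanity_count, second_person_count].
X = [[2, 1], [3, 2], [1, 1], [0, 1], [0, 0], [0, 2]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_sgd(X, y)
print([predict(weights, bias, x) for x in X])
```

Lowering `learning_rate` makes each update smaller and convergence slower, while a much larger value can make the weights jump past the minimum, which mirrors the trade-off described above.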
Cyberbullying has become more common and has begun to cause severe social problems as young people use social media more frequently and the websites that host social media platforms become more widely used. A mechanism for automatically identifying cyberbullying must be created in order to stop its harmful effects. Given the significance of identifying cyberbullying, this study investigated how to recognize social media posts associated with cyberbullying. It reviewed several studies that used different algorithms to identify hostile activity on social networking sites, and listed the discriminative features used to identify cyberbullying on online social networking sites. With an accuracy of 92.73% and an F-measure of 94.32%, the stochastic gradient descent classifier gave the best result. Because of the development of networking and information technology, online interactions now draw responses that are wonderful, awful, hateful, and everything in between. These responses are routinely mishandled and have left innocent people with lifelong emotional pain, which frequently leads to hopelessness and suicide. Victims were often unable to publicly ask for assistance from agencies or family members.
[1] Chavan, V.S. and S. Shylaja, "Machine learning approach for detection of cyber-aggressive comments by peers on social media network", in Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on. 2015. IEEE.
[2] Chen, Y., et al., "Detecting Offensive Language in Social Media to Protect Adolescent Online Safety", in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). 2012. IEEE.
[3] Dadvar, M., et al., "Improved cyberbullying detection using gender information". 2012.
[4] Dinakar, K., R. Reichart, and H. Lieberman, "Modeling the detection of Textual Cyberbullying". 2011.
[5] Van Hee, C., et al., "Detection and fine-grained classification of cyberbullying events", in International Conference Recent Advances in Natural Language Processing (RANLP). 2015.
[6] Hosseinmardi, H., et al., "Detection of cyberbullying incidents on the Instagram social network", arXiv preprint arXiv:1503.03909, 2015.
[7] Kontostathis, A., et al., "Detecting cyberbullying: query terms and techniques", in Proceedings of the 5th Annual ACM Web Science Conference. 2013. ACM.
[8] Sanchez, H. and S. Kumar, "Twitter bullying detection". UCSC ISM245 Data Mining course report, 2011.
[9] Zhao, R., A. Zhou, and K. Mao, "Automatic detection of cyberbullying on social networks based on bullying features", in Proceedings of the 17th International Conference on Distributed Computing and Networking. 2016. ACM.
[10] Squicciarini, A., et al., "Identification and characterization of cyberbullying dynamics in an online social network", in Proceedings of the Advances in Social Networks Analysis and Mining, ACM-2015.
[11] Reynolds, K., A. Kontostathis, and L. Edwards, "Using machine learning to detect cyberbullying", in Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on. 2011. IEEE.
[12] Yin, D., et al., "Detection of harassment on web 2.0", Proceedings of the Content Analysis in the WEB, 2009.
[13] Xu, J.-M., et al., "Learning from bullying traces in social media", in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2012. Association for Computational Linguistics.
[14] Galán-García, P., et al., "Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying", in International Joint Conference SOCO'13-CISIS'13-ICEUTE'13. 2014. Springer.
[15] Chris Emmery, Ben Verhoeven, Guy De Pauw, Gilles Jacobs, Cynthia Van Hee, Els Lefever, Bart Desmet, Véronique Hoste, Walter Daelemans, "Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity", Springer 2020.
[16] Saloni Mahesh Kargutkar, Prof. Vidya Chitre, "A Study of Cyberbullying Detection Using Machine Learning Techniques", IEEE Xplore 2020.
[17] Mohammed Ali Al-garadi, Mohammad Rashid Hussain, Nawsher Khan, Ghulam Murtaza, Henry Friday Nweke, Ihsan Ali, Ghulam Mujtaba, Haruna Chiroma, Hasan Ali Khattak and Abdullah Gani, "Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open Challenges", IEEE 2019.
[18] Monirah A. Al-Ajlan, Mourad Ykhlef, "Optimized Twitter Cyberbullying Detection based on Deep Learning", 978-1-5386-4110-1, IEEE-2018.
[19] N. M. Zainudin, K. H. Zainal, N. A. Hasbullah, N. A. Wahab, and S. Ramli, "A review on cyberbullying in Malaysia from digital forensic"
[20] Vandana Nanda Kumar, Binsu C. Kovoor, Sreeja M.U., "Cyber-Bullying Revelation in Twitter Data using Naïve-Bayes Classifier Algorithm", International Journal of Advanced Research in Computer Science. Volume 9, No. Jan-Feb 2018.
[21] Semiu Salawu, Yulan He, and Joanna Lumsden, "Approaches to Automated Detection of Cyberbullying: A Survey", IEEE Transaction 2017.
[22] Rohit Pawar, Rajeev R. Raje, "Multilingual Cyber bullying Detection System", IEEE 2019.
[23] Aparna Sankaran Srinath, Hannah Johnson, Gaby G. Dagher, and Min Long, "BullyNet: Unmasking Cyberbullies on Social Networks", IEEE 2021.
[24] Farhan Bashir Shaikh, Mobashar Rehman, and Aamir aamin, "Cyberbullying: A Systematic Literature Review to Identify the Factors Impelling University Students Towards Cyberbullying", IEEE 2020.
[25] Bandeh Ali Talpur, Declan O'Sullivan, "Cyberbullying severity detection: A machine learning approach", Plos one 2020.
[26] Rekha Sugandhi, Anurag Pande, Siddhant Chawla, Abhishek Agrawal, Husen Bhagat, "Methods for Detection of Cyberbullying: A Survey", 2015 15th International Conference on ISDA.
[27] https://www.bing.com/image/search?q=cyberbullying+detection+diagram & form=HRDSC2&first=1&tsc=ImageHoverTitle
[28] Cyril Onwubiko and Karim Ouazzane, "SOTER: A Playbook for Cybersecurity Incident Management", IEEE 2022.
[29] Norita Ahmad, Phillip A. Laplante, Joanna F. Defranco, and Mohamad Kassab, "A Cybersecurity Educated Community", IEEE 2022.
[30] Piyush Vyas, Martin Reisslein, Bhaskar Prasad Rimal, Gitika Vyas, Ganga Prasad Basyal, and Prathamesh Muzumdar, "Automated Classification of Societal Sentiments on Twitter with Machine Learning", IEEE 2022.
[31] Shuwen Wang, Xingquan Zhu, Weiping Ding, and Amir Alipour Yengejeh, "Cyberbullying and Cyberviolence Detection: A Triangular User-Activity-Content View", IEEE 2022.
[32] Belal Abdullah Hezam Murshed, Jemal Abawajy, Suresha Allappa, Mufeed Ahmed Naji Saif, and Hasib Daowd Esmail Al-Ariki, "DEA-RNN: A Hybrid Deep Learning Approach for Cyberbullying Detection in Twitter Social Media Platform", IEEE 2022.
[33] Zhongyuan Jiang, Xianyu Chen, Jianfeng Ma and Philip S. Yu, "Rumor Decay: Rumor Dissemination Interruption for Target Recipients in Social Networks", IEEE 2022.
[34] 10 Forms of Cyberbullying | Kids Safety (kaspersky.com).
Copyright © 2023 Payal Budhe, Dipalee Rane. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET54329
Publish Date : 2023-06-22
ISSN : 2321-9653
Publisher Name : IJRASET