Deep Pattern Analysis for Cyberbullying Detection using Probabilistic Analytical Learning

Authors: K. Subha, D. Perarivalan, S. Santhosh, S. Sanjay

DOI Link: https://doi.org/10.22214/ijraset.2024.61503

Abstract

Cyberbullying has emerged as a significant issue in today's social media landscape, with a range of detrimental effects. The combination of photo sharing and text comments has increased the severity of cyberbullying incidents. Automated detection tools are crucial for ensuring the health and security of these platforms. However, traditional approaches that analyze text and images separately may fail to identify all instances of cyberbullying, especially when seemingly innocent content conveys a bullying message only once text and image are posted together. This research proposes a novel system that extracts combined features from text and images to identify diverse cases of cyberbullying. The system can extract profiles based on behavior and uncover latent ties between users and groups with similar behaviors. Our approach draws on methods from log mining, business process analysis, complex networks, and graph theory to achieve this. This paper outlines the entire process, from log file analysis to the construction of the user graph, with a particular focus on the step of discovering user behavioral patterns.

I. INTRODUCTION

As cyberbullying becomes increasingly prevalent and severe on social media platforms, there's a pressing need for more effective detection and mitigation strategies. Traditional methods often focus solely on analyzing text or image content, overlooking the complexities of combined text-image cyberbullying instances. This research proposes an innovative framework that integrates features from both text and images, drawing on methodologies from various disciplines such as log mining, business process analysis, complex networks, and graph theory. However, addressing this challenge presents several difficulties, including the use of informal language and emojis, different languages, the absence of a comprehensive benchmark dataset, and the requirement for real-time detection in streaming data.
A critical aspect of our approach lies in the identification of user behavioral patterns, which serves as a foundational element for understanding the dynamics of cyberbullying within online communities. By systematically analyzing user interactions, content sharing behaviors, and engagement patterns, our methodology aims to unveil latent connections between users and user groups exhibiting similar behavioral traits. This multifaceted analysis enables us to discern subtle indicators of cyberbullying instances and identify potential clusters of harmful behavior.
The proposed framework encompasses a series of interconnected steps, beginning with data collection and preprocessing, followed by feature extraction from both textual and visual content. Leveraging advanced techniques in natural language processing (NLP) and computer vision, we extract linguistic attributes, visual elements, sentiment analysis, and contextual cues from user-generated content. Subsequently, we employ methods from log mining and business process analysis to construct behavioral models that capture the underlying patterns of user engagement and communication dynamics.
Furthermore, by applying principles from complex networks and graph theory, we analyze the structural properties of the social media ecosystem, including user connections, community structures, and influence networks. This sophisticated analysis allows us to identify cohesive user groups and detect potential clusters associated with cyberbullying behavior. Finally, we utilize graph-based representations to create user profiles based on behavioral attributes, interaction histories, and content preferences. Through the integration of these methodologies, our approach offers a holistic and data-driven framework for combating cyberbullying within social media environments. By uncovering user behavioral patterns and identifying cyberbullying clusters, we aim to foster a safer and more inclusive online community, where individuals can engage in positive digital interactions free from the harmful effects of cyberbullying.

II. RELATED WORK

In the past decade, numerous researchers have addressed cyberbullying detection using a variety of methodologies. Early studies, such as those in [4] and [5], relied on conventional natural language processing (NLP) techniques such as N-grams and TF-IDF to extract features from text data.

These features were then used to train classifiers like Support Vector Machines (SVM) or Naive Bayes. Numerous notable articles discussing these approaches have been documented in surveys such as [2].
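The classical pipeline described above can be illustrated with a minimal sketch: word unigrams and bigrams are counted and fed to a multinomial Naive Bayes classifier with add-one smoothing. The training sentences, labels, and the `NaiveBayes` class are hypothetical and written only for illustration; they are not taken from the cited studies, and TF-IDF weighting is omitted for brevity.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = set().union(*self.feat_counts.values())
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for c, n_c in self.class_counts.items():
            denom = sum(self.feat_counts[c].values()) + len(self.vocab)
            score = math.log(n_c / total)  # log prior
            for f in ngrams(text):
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((self.feat_counts[c][f] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)

# Toy, made-up training data (an assumption for this example).
train_texts = ["you are so stupid", "nobody likes you loser",
               "have a great day", "thanks for your help"]
train_labels = ["bullying", "bullying", "neutral", "neutral"]
model = NaiveBayes().fit(train_texts, train_labels)

print(model.predict("you stupid loser"))
print(model.predict("have a great day"))
```

A Support Vector Machine could be substituted for the classifier without changing the feature extraction step.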

Subsequently, the emergence of deep learning methodologies marked a significant shift, with neural network models gaining prominence in language modeling. Works such as [6], [7], and [8] demonstrate the development of various iterations of Long Short-Term Memory (LSTM) and convolutional neural network (CNN) models to address cyberbullying detection challenges. These approaches frequently rely on extensive corpora, which facilitate the mapping of words into high-dimensional vectors where semantically similar words cluster together. Additionally, some methods, as exemplified in [8], integrate user metadata such as the number of followers and social network connections into their detection algorithms. Researchers often train a unified classifier comprising both a text path and a metadata path to leverage these combined features.

In recent years, numerous competitions and challenges centered around cyberbullying detection have spurred innovation in the field. Notably, articles authored by teams participating in challenges like SemEval2019 [11] have contributed significantly to the literature. A discernible trend observed in recent studies, exemplified by [12] and [13], is the adoption of Transformer-based architectures like BERT [14]. Remarkably, among the top 10 teams in the SemEval2019 offensive language detection task, seven utilized BERT-based architectures [11]. BERT's Transformer layers facilitate substantial parallelization, resulting in enhanced computational efficiency [15]. Furthermore, BERT's pre-trained models offer powerful language representation capabilities that can be fine-tuned with relative ease to achieve state-of-the-art performance.

III. PROPOSED METHOD

A. Cyberbullying Analysis and Classification

The proposed system employs advanced algorithms for cyberbullying analysis and classification. By analyzing the content of reviews and tweets posted on social media platforms, the system identifies instances of cyberbullying and categorizes them based on severity and type.

B. Feature-Based Classification

Utilizing feature-based classification techniques, the system extracts relevant features from textual content and user metadata. These features provide valuable insights into user behavior and interaction patterns, aiding in the identification of cyberbullying instances.

C. Handling Negations

To enhance the accuracy of cyberbullying detection, the system incorporates mechanisms to handle negations within text. By considering the context of negated statements, the system mitigates the risk of misclassification and improves the overall effectiveness of the detection process.
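A common way to handle negations in feature-based text classification is to mark every token that follows a negator, up to the next punctuation mark, with a `NOT_` prefix, so that "stupid" and "not stupid" become different features. The sketch below shows this technique under stated assumptions: the negator list and the function name `mark_negation` are illustrative, and the paper does not specify which mechanism the system actually uses.

```python
import re

NEGATORS = {"not", "no", "never", "cannot"}  # illustrative, not exhaustive
CLAUSE_END = re.compile(r"[.,;:!?]")

def mark_negation(text):
    """Prefix tokens inside a negation scope with NOT_.

    The scope opens at a negator (or a token ending in "n't") and closes
    at the next punctuation mark. Punctuation is dropped from the output.
    """
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if CLAUSE_END.fullmatch(tok):
            negated = False  # punctuation ends the negation scope
            continue
        if tok in NEGATORS or tok.endswith("n't"):
            negated = True
            out.append(tok)
            continue
        out.append("NOT_" + tok if negated else tok)
    return out

print(mark_negation("You are not stupid, you are great."))
print(mark_negation("I don't hate you"))
```

The marked tokens can then be passed to any downstream feature extractor in place of the raw tokens.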

D. Opinion Summarization

Opinion summarization techniques are employed to distill key insights from user-generated content. By summarizing opinions expressed in reviews and tweets, the system facilitates a more comprehensive understanding of user sentiments and enables targeted interventions to address cyberbullying.

E. Sentiment Analysis at Multiple Levels

The proposed system conducts sentiment analysis at multiple levels of granularity. This multi-tiered approach allows for a nuanced understanding of user sentiments and facilitates more accurate cyberbullying detection.
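The levels of analysis are not enumerated here. As an assumption for illustration, the sketch below scores text at the sentence level and at the document level with a tiny hypothetical lexicon. It shows why more than one level can matter: a post whose overall score is only mildly negative may still contain a strongly negative sentence.

```python
import re

# Tiny illustrative lexicon (hypothetical; not from the paper).
LEXICON = {"great": 1, "love": 1, "nice": 1,
           "stupid": -1, "hate": -1, "ugly": -1}

def sentence_score(sentence):
    """Sum lexicon polarities of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def analyze(document):
    """Return per-sentence scores and their document-level sum."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]
    per_sentence = [(s, sentence_score(s)) for s in sentences]
    doc_score = sum(score for _, score in per_sentence)
    return {"sentences": per_sentence, "document": doc_score}

result = analyze("Nice photo! You look stupid and ugly.")
print(result["document"])
print(min(result["sentences"], key=lambda pair: pair[1]))
```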

F. Event Derivation from Timestamped Tables

Events are derived from timestamped tables, with each table representing an activity and its associated timestamped values signifying events related to that activity. This approach enables a granular analysis of user behavior over time, facilitating the identification of temporal patterns and trends associated with cyberbullying.
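The derivation of events can be sketched as follows, assuming each activity is stored as a table of rows that carry a user identifier and a timestamp. The table names, field names, and values are made up for this example. Every row becomes one event, the events are merged into a single time-ordered log, and the log is then split into per-user activity traces.

```python
from datetime import datetime

# Hypothetical activity tables: one table per activity.
tables = {
    "post_photo": [{"user": "u1", "ts": "2024-01-05 10:00:00"}],
    "comment": [{"user": "u2", "ts": "2024-01-05 10:02:30"},
                {"user": "u1", "ts": "2024-01-05 10:01:10"}],
    "like": [{"user": "u2", "ts": "2024-01-05 10:00:45"}],
}

def derive_events(tables):
    """Turn every timestamped row into an event and sort by time."""
    events = []
    for activity, rows in tables.items():
        for row in rows:
            ts = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S")
            events.append({"user": row["user"], "activity": activity, "ts": ts})
    return sorted(events, key=lambda e: e["ts"])

def traces_by_user(events):
    """Group a time-ordered event list into one activity trace per user."""
    traces = {}
    for e in events:
        traces.setdefault(e["user"], []).append(e["activity"])
    return traces

events = derive_events(tables)
traces = traces_by_user(events)
print(traces)
```

Per-user traces of this kind are the usual input to process mining and to the behavioral profiling steps discussed elsewhere in the paper.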

G. Modification of Social Network Approach

Building upon the traditional social network approach, the proposed system introduces modifications focused on understanding user behavior within social media platforms. By considering a broader range of social interactions and relationships, the system enhances its ability to detect cyberbullying instances effectively.
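One way to build a user graph from behavior rather than from explicit friendship links is to represent each user as a vector of activity counts, connect users whose vectors are similar, and read groups off the connected components. The sketch below uses cosine similarity with a fixed threshold. The similarity measure, the threshold of 0.8, and the profile values are all assumptions chosen for illustration and are not specified in the paper.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_graph(profiles, threshold=0.8):
    """Link every pair of users whose profiles are at least `threshold` similar."""
    graph = {u: set() for u in profiles}
    for u, v in combinations(profiles, 2):
        if cosine(profiles[u], profiles[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def components(graph):
    """Return the connected components of an undirected graph."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical behavioral profiles: activity counts per user.
profiles = {
    "u1": {"comment": 9, "like": 1},
    "u2": {"comment": 8, "like": 2},
    "u3": {"post_photo": 5, "like": 5},
}
graph = build_graph(profiles)
print(components(graph))
```

In this toy example the two heavy commenters fall into one group and the third user stays isolated. With high-dimensional profile vectors the choice of threshold strongly affects how readable the resulting network is.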

H. Leveraging Cyberbullying Meta-Models

Cyberbullying meta-models are leveraged to describe the structural aspects of objects involved in cyberbullying incidents. These meta-models provide a standardized framework for analyzing and categorizing cyberbullying instances, improving the system's overall accuracy and efficiency.

I. Flexibility in Social Media Processes

Unlike traditional business processes, social media interactions lack strict or structured processes. Instead, declarative constraints and rules govern user behavior, allowing for greater flexibility and adaptability in cyberbullying detection. The proposed system incorporates these nuances into its analysis, ensuring comprehensive coverage of cyberbullying incidents.
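Declarative constraints can be checked directly against activity traces. The sketch below implements two rule templates in the style of declarative process mining: a response rule (every occurrence of one activity must eventually be followed by another) and an upper bound on how often an activity may occur. The activity names and the specific rules are hypothetical, and the paper does not list the constraints it uses.

```python
def response(trace, a, b):
    """True if every occurrence of `a` is eventually followed by `b`."""
    pending = False
    for activity in trace:
        if activity == a:
            pending = True   # an `a` is waiting for a later `b`
        elif activity == b:
            pending = False  # all earlier `a`s are now answered
    return not pending

def at_most(trace, a, n):
    """True if `a` occurs no more than `n` times in the trace."""
    return trace.count(a) <= n

# Hypothetical traces and rules.
ok_trace = ["post", "report", "comment", "moderate"]
bad_trace = ["report", "moderate", "report"]

print(response(ok_trace, "report", "moderate"))
print(response(bad_trace, "report", "moderate"))
print(at_most(["comment", "comment", "comment"], "comment", 2))
```

Because such rules only forbid certain behavior instead of prescribing a fixed sequence, any trace that satisfies all constraints is accepted, which suits loosely structured social media activity.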

IV. HARDWARE AND SOFTWARE REQUIREMENTS

A. Backend Technologies:

Python: The system is built using the Python programming language, offering flexibility and a wide range of libraries for data analysis and machine learning.
NumPy: NumPy is utilized for numerical computing, providing efficient array operations and mathematical functions essential for data processing.
scikit-learn: scikit-learn is a Python library used for machine learning tasks, including classification, regression, clustering, and model evaluation.
Jupyter Notebook: Jupyter Notebook serves as the interactive computing environment for developing and presenting the system's code and analysis. It enables seamless integration of code, visualizations, and explanatory text, facilitating reproducible research and collaboration.

B. Frontend Technologies:

C. Web Technologies:

The frontend of the system utilizes web technologies for user interaction and visualization. The specific technologies employed may include:

HTML: HTML is used for structuring the content of web pages, providing a standardized markup language for creating web interfaces.
CSS: CSS (Cascading Style Sheets) is used for styling web pages, allowing for customization of layout, colors, fonts, and other visual aspects.
JavaScript: JavaScript is employed for client-side scripting, enabling dynamic and interactive elements within web pages.
Frameworks (e.g., React, Angular, Vue.js): Frontend frameworks may be used to facilitate the development of complex web applications, providing reusable components, state management, and routing capabilities.

VI. PROPOSED ALGORITHM

Proposed Algorithm: Probabilistic Analytical Learning Algorithm

The proposed algorithm, Probabilistic Analytical Learning Algorithm, leverages probabilistic and analytical techniques to facilitate effective cyberbullying detection and classification. By integrating principles from probability theory and analytical methods, the algorithm offers a robust framework for analyzing complex social media data and identifying cyberbullying instances.

Advantages of Proposed Algorithm:

Ability to Work with Insufficient Knowledge: The Probabilistic Analytical Learning Algorithm demonstrates resilience in scenarios where complete knowledge is unavailable or incomplete. By leveraging probabilistic reasoning and analytical capabilities, the algorithm can make informed decisions even with limited information.

Parallel Features for Fault Tolerance: In the event of a neural network component failure, the algorithm can continue its operations seamlessly due to its parallel processing features. This fault tolerance ensures uninterrupted performance and mitigates the impact of individual component failures on the overall system functionality.

Multifunctional Capabilities: The Probabilistic Analytical Learning Algorithm exhibits the ability to perform multiple functions simultaneously. By capitalizing on its parallel processing capabilities and sophisticated analytical techniques, the algorithm can handle diverse tasks efficiently, ranging from data analysis and feature extraction to classification and decision-making.

Conclusion

Our application is a pioneering advancement in process mining, aimed at optimizing processes by dissecting their execution traces. Unlike conventional methods, which often struggle with the complexities of systems where user behavior exhibits wide variability, our innovative solution offers a fresh approach. By employing a unique visualization technique, we unveil latent connections between users exhibiting similar behaviors within dynamic systems. One of our key accomplishments is overcoming the challenge of constructing a clear user network, typically hindered by the sheer dimensionality of user profile vectors. Our approach not only uncovers nuanced behavior patterns but also serves as a powerful tool for identifying instances of cyberbullying within user interactions. Moreover, by discerning representative user behavior and patterns, our application contributes to a more comprehensive understanding of system dynamics, enabling targeted interventions and ongoing process improvement initiatives.

References

[1] R. Kowalski, S. Limber, and P. W. Agatston. "Cyberbullying." Malden, MA: Blackwell, 2008.
[2] P. Bocij. "Cyberstalking: Harassment in the Internet Age and How to Protect Your Family." Greenwood Publishing Group, 2004.
[3] J. Bishop. "Representations of trolls in mass media communication: a review of media-texts and moral panics relating to internet trolling." International Journal of Web Based Communities 10(1), 7, 2014.
[4] M. O. Lwin, B. Li, and R. P. Ang. "Stop bugging me: An examination of adolescents' protection behavior against online harassment." Journal of Adolescence 35(1), 31–41, 2012.
[5] S. Hinduja and J. W. Patchin. "Bullying beyond the schoolyard: Preventing and responding to cyberbullying." Thousand Oaks, CA: Sage, 2009.
[6] T. Varinder and P. Kanwar. "Understanding social media." Bookboon, 2012.
[7] A. Lenhart, M. Madden, A. Smith, K. Purcell, K. Zickuhr, and L. Rainie. "Teens, Kindness and Cruelty on Social Network Sites: How American Teens Navigate the New World of 'Digital Citizenship'." Pew Internet & American Life Project, 2011.
[8] M. Fishbein and I. Ajzen. "Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research." Reading, MA: Addison-Wesley, 1975.
[9] S. Jameson. "Cyberharassment: Striking a balance between free speech and privacy." CommLaw Conspectus 17, 231, 2008.
[10] P. P. Nicolle and L. J. Moriarty. "Cyberstalking: Utilizing What We Do Know." Victims & Offenders: An International Journal of Evidence-based Research, Policy, and Practice 4(4), 435–441, 2009.
[11] K. K. J. Seo, J. Tunningley, Z. Warner, and J. Buening. "An Insight Into Student Perceptions of Cyberbullying." American Journal of Distance Education 30(1), 39–47, 2016.
[12] T. Heiman and D. Olenik-Shemesh. "Cyberbullying Experience and Gender Differences Among Adolescents in Different Educational Settings." Journal of Learning Disabilities 48(2), 146–155, 2015.
[13] C. Fornell and D. F. Larcker. "Evaluating structural equation models with unobservable variables and measurement error." Journal of Marketing Research, 39–50, 1981.
[14] W. Baker, M. Goudie, A. Hutton, C. D. Hylender, J. Niemantsverdriet, C. Novak, and P. Tippett. "Data breach investigations report." Verizon RISK Team, 2011.
[15] K. Williams and N. Guerra. "Prevalence and predictors of Internet bullying." Journal of Adolescent Health 41(6), S14–S21, 2007.
[16] W. M. Al-Rahmi, M. S. Othman, and L. M. Yusuf. "Effect of Engagement and Collaborative Learning on Satisfaction Through the Use of Social Media on Malaysian Higher Education." Research Journal of Applied Sciences, Engineering and Technology 9(12), 1132–1142, 2015.
[17] M. Tezer. "Cyber Bullying and University Students: Behaviours, Opinions and Reactions." International Journal of Educational Sciences 19(2-3), 199–204, 2017.
[18] B. Henson, B. W. Reyns, and B. S. Fisher. "Fear of crime online? Examining the effect of risk, previous victimization, and exposure on fear of online." 475–497, 2013, doi: 10.1177/1043986213507403.
[19] R. S. Tokunaga. "Following you home from school: A critical review and synthesis of research on cyberbullying victimization." Comput. Hum.
[20] J. L. Abrantes, C. Seabra, and L. F. Lages. "Pedagogical affect, student behavior, and engagement in the classroom: Affective learning as an emerging area of educational technology research." Computers in Human Behavior 37, 347–353, 2014.
[21] F. D. Davis. "Perceived usefulness, perceived ease of use, and user acceptance of information technology." MIS Quarterly, 319–340, 1989.
[22] M. Norliza et al. "Women participation in business: A focus on franchising venture." Dept. Inf. Syst., Univ. Teknologi Malaysia, Malaysia, Tech. Rep. 104, Dec. 2006.
[23] W. M. Al-Rahmi et al. "Use of E-learning by University Students in Malaysian higher educational institutions: A case in Universiti Teknologi."
[24] P. Cohen, S. G. West, and L. S. Aiken. "Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences." New York, NY, USA: Psychology Press, 2014.
[25] R. Junco and S. R. Cotten. "No A 4 U: The relationship between multitasking and academic performance." Computers & Education 59(2), 505–514, Sep. 2012.
[26] W. Tariq, M. Mehboob, M. Khan, and F. Ullah. "The impact of social media and social networks on education and students of Pakistan." Int. J. Educ. Dev. using Inf. Commun. Technol. 10(1), 27–36, 2014.
[27] J. Pennington, R. Socher, and C. Manning. "Glove: Global vectors for word representation." In Proc. Conf. Empirical Methods Natural Lang. Process., 1532–1543, 2014.
[28] S. Shayaa et al. "Sentiment analysis of big data: Methods, applications, and open challenges." IEEE Access 6, 37807–37827, 2018.
[29] C. Junyi, S. Yan, and K.-C. Wong. "Verbal aggression detection on Twitter comments: Convolutional neural network for short-text." Neural Comput. Appl. 32(17), 12649–12662, 2018.

Copyright

Copyright © 2024 K. Subha, D. Perarivalan, S. Santhosh, S. Sanjay. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET61503

Publish Date : 2024-05-02

ISSN : 2321-9653

Publisher Name : IJRASET
