Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Anurag Rathore, Anmol Kumar, Aman Negi, Nidhi Chandra
DOI Link: https://doi.org/10.22214/ijraset.2023.51463
Speech recognition offers many benefits and is used in multiple fields. Language differences restrict communication between people. In this project we develop a system that lets people exchange information with end-device users through voice or speech, with the aim of breaking down communication barriers. The project attempts to identify speech and transform audio input into text. To counter offensive content in real time, every social media platform should implement an effective hate speech detection system. Hate speech can be classified using several approaches, including machine learning, rule-based, deep learning, and hybrid methods.
I. INTRODUCTION
Hate speech is proliferating as the volume of data on the internet grows. We therefore identify and investigate the problems faced by automated online tools for text-based hate speech identification. The difficulties include the nuances of language, differing definitions of what constitutes hate speech, and the limitations of the data available for training and testing these systems. Many current methods also have interpretability issues, making it difficult to understand why the system chooses a given course of action. We suggest a multi-view SVM method that is simpler and gives more easily interpretable results than neural approaches.
Sadly, hate crimes are not new to our culture. Social media and other platforms for virtual communication now play an increasingly important role in hate crimes. Recent terrorist attacks suggest that social media has played a role in the radicalization of some suspects in incidents with strong racial or religious undertones. In one such incident in New Zealand, the attack was live-streamed on Facebook and the video went viral.
Users can express their views freely and anonymously on a wide array of virtual communication platforms, including social media. The ability to express oneself freely is a human right to be cherished, but inciting and spreading hate against other groups is an abuse of that freedom.
This project is important because it enables people from different cultures, nationalities, and languages to share their thoughts with one another. It removes the barriers between people that arise from language. Language translation technology has brought people of many nationalities together and has strengthened the sense of shared humanity between them. It also stimulates economic activity and has a vast impact on society. Identification of hate speech is a difficult task that proceeds in several stages. The raw data contains a lot of noise, so it must first be preprocessed. Classification algorithms are then applied to recognize abusive or hate speech in the data. There are many machine learning (ML) algorithms for detecting hate speech, and each is suited to different scenarios.
In this project we aim to develop an algorithm that takes various texts as input and checks them for the presence of hate speech. If a text contains hate speech, it is labelled as hate speech; otherwise it is labelled as clean text. The algorithm should also be able to identify the target of the hate speech (e.g. hate directed at a caste, creed, gender, nationality, or religion). However, during this project we found that hate speech is indistinct and has many discriminative properties, which makes it difficult to detect reliably from a dataset in the long run.
II. LITERATURE REVIEW
A lot of research has been conducted in the domains of video surveillance, object detection, and machine learning. Some of the most notable work relevant to this project is mentioned below.
III. METHODOLOGIES
1. Phase 1
Using Twitter Dataset
The converted text is compared against the Twitter dataset, and the output is labelled as either hate speech or offensive language.
2. Phase 2
Speech To Text Translation
Using various machine learning (ML) and deep learning (DL) technologies, we convert the speech to text. We then apply our hate speech detection algorithms to that text to detect the presence of hate speech.
3. Phase 3
Data Preprocessing
In this phase, duplicate records and noise are removed from the data. Data cleaning is a data-mining method applied to remove noisy data.
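The cleaning step described above can be sketched as follows. This is a minimal illustration only: the paper does not specify which kinds of noise are removed, so the rules here (URLs, user mentions, punctuation, extra whitespace) and the helper names `clean_tweet` and `preprocess` are assumptions chosen for tweet-like text.

```python
import re


def clean_tweet(text: str) -> str:
    """Lower-case a tweet and strip common noise (assumed rules, not the paper's)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)                # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def preprocess(tweets: list[str]) -> list[str]:
    """Clean every tweet, then drop empty results and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


# Hypothetical sample input for illustration
sample = [
    "Check this out!!! http://example.com @someone",
    "check this out",
    "RT @user: Totally   fine day :)",
]
print(preprocess(sample))
```

Duplicates are detected after cleaning rather than before, so two tweets that differ only in links or punctuation count as the same record.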
4. Phase 4
Count Vectorizer
CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector based on the frequency (count) of each word that occurs in the text. This is helpful when we have multiple texts and wish to convert each of them into a vector for use in further text analysis.
5. Phase 5
Decision Tree Classifier
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
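The sketch below shows how word-count vectors might be fed to a decision tree with scikit-learn. It is illustrative only: the four training sentences and the label names ("hate speech", "clean") are hypothetical stand-ins for the Twitter dataset, the `classify` helper is an assumed name, and the default hyperparameters are an assumption rather than the configuration used in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy training data standing in for the labelled Twitter dataset
train_texts = [
    "i hate those people",
    "those people are awful and i hate them",
    "have a lovely day",
    "i love this sunny day",
]
train_labels = ["hate speech", "hate speech", "clean", "clean"]

# Turn each text into a word-count vector
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Learn simple decision rules over the word counts
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, train_labels)


def classify(text: str) -> str:
    """Vectorise a new text with the fitted vocabulary and return the predicted label."""
    return str(clf.predict(vectorizer.transform([text]))[0])


print(classify("have a lovely day"))
```

New text must be transformed with the same fitted vectorizer so that its columns line up with those seen during training. A tree trained on so few examples will not generalise; a realistic evaluation needs a held-out test split.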
IV. RESULTS AND DISCUSSIONS
The hate speech detector is built with the help of a Twitter dataset; using this dataset, we can check whether the speaker has used hate speech or offensive language.
To run the detector, we need some important libraries and engines. We work in Jupyter and first convert speech to text; to do so, we use two libraries.
A. Speech Recognition
Pyttsx3 is a cross-platform (Mac OS X, Windows, and Linux) text-to-speech library. You can set voice metadata such as age, gender, id, language, and name. The speech engine comes with a large number of voices.
Due to the anonymity and mobility of such platforms, as well as the shifting political landscape in many parts of the world, the spread of hate speech on social media has increased considerably in recent years. Despite significant effort by legislative authorities, law enforcement agencies, and social media firms, it is widely acknowledged that successful countermeasures rely on computerised semantic analysis of such content. The identification and classification of hate speech according to its targeting traits is a critical task in this direction. Because of the numerous users who may compete with each other on Twitter, this is crucial for the success or ruin of one's image on today's social media. Words used in hate speech are examples of words with a negative connotation. Hate speech, which is covered under Article 28 of the ITE Law, may be classified as expressing malicious viewpoints. There are many people who, both knowingly and unknowingly, oppose hate speech on social media. Social media, unfortunately, lacks the capability to draw a conclusion from a discourse already in progress. Text mining is one method for deriving conclusions from aggregate data.
The goal of this work is to categorise whether or not a sentence contains elements of hate speech. To that end, the authors aim to develop a computer-based method for classifying hate speech elements in text, using the Multinomial Logistic Regression technique. The authors expect that a program developed in this way will be able to detect and categorise hate speech in text posted on the social networking site Twitter. According to test findings, the average precision, recall, and accuracy were 80.02%, 82%, and 87.68%, respectively.
Copyright © 2023 Anurag Rathore, Anmol Kumar, Aman Negi, Nidhi Chandra. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET51463
Publish Date : 2023-05-02
ISSN : 2321-9653
Publisher Name : IJRASET