Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Prof. Ravishankar Sir, Prathamesh P. Bhamanage, Pratik S. Patil, Suraj Y. Ahire, Durgesh P. Jadhav
DOI Link: https://doi.org/10.22214/ijraset.2022.47524
Sentiment analysis is mainly concerned with identifying and classifying the opinions or emotions expressed within a text. Sharing opinions and expressing emotions through social networking websites has become very common. This paper presents an idea for extracting sentiment from a tweet and an approach to classifying a tweet as positive, negative or neutral. This approach can be useful in many ways to any organization that is mentioned or tagged in a tweet. Because tweets are generally unstructured, they must first be converted into a structured format. In this paper, tweets are cleaned in a pre-processing phase, and access to tweets is obtained via libraries that use the Twitter API. We also provide additional comparisons of alternative techniques to find the one with the higher overall performance, and several scoring criteria have been developed for the different techniques.
I. INTRODUCTION
The age of the internet has changed the way people express their ideas. With the ever-increasing popularity of social networking, microblogging and blogging websites, a large amount of data is generated every day. These social networking websites depend largely on user-generated content. Typically, when people intend to purchase a product, they browse many websites to gather information about it, taking into account the reviews and ratings available on those websites before making the purchase. The amount of information is too large for an ordinary person to analyze using naive techniques.
Thus, to make this process efficient and to automate it, several sentiment analysis techniques are used. Symbolic (knowledge-based) approaches and machine learning techniques are usually used to develop such models. The objective of sentiment analysis is to recognize the sentiment polarity of a text, so it can be treated as a classification problem: it separates text into positive, negative or neutral opinion. Deep neural networks and the Gaussian mixture model are among the robust models for natural language processing. Sentiment analysis is useful for consumers who are researching a product or service, and for marketers researching public opinion of their company or product. However, analyzing tweets that express human emotions is not an easy job. Many challenges are involved in terms of the tonality, polarity, lexicon and grammar of tweets, particularly when sentiment analysis relies on a bag-of-words representation. The proposed research is undertaken to address these challenges.
II. LITERATURE REVIEW
2. Data Mining: With the growth of information technology, a large number of databases have become available. Data is present in huge amounts and spans various fields. To support future decision making, the data needs to be stored and manipulated. For this purpose, various databases have been developed and research has been carried out on their management. The process of extracting useful information and patterns from large amounts of stored data is known as data mining. It examines large existing volumes of data (usually stored in a database) to generate new information, and it is used to find relationships and patterns in that data. Various types of data are analyzed with the help of data mining tools.
III. METHODOLOGY
Twitter acts as an opinionated data bank with an enormous amount of information available for sentiment analysis. Twitter is very convenient for research because there are enormous numbers of messages, many of which are publicly accessible, and obtaining them is technically simple compared with scraping sites from the web. Twitter data is collected for analysis using the Twitter API. Two widely used approaches to the analysis are the machine learning approach and the dictionary-based approach. We use the dictionary-based approach to analyze the sentiment of the data posted by various users. Polarity classification of this data is then carried out: the analyzed tweets are grouped into three classes, Positive, Negative and Neutral.
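The dictionary-based polarity classification described above can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon and a simple sum-of-word-values rule; the `LEXICON` contents and the `classify_tweet` helper are hypothetical and are not taken from the paper.

```python
# Hypothetical toy lexicon: word -> polarity value (illustrative only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def classify_tweet(text, lexicon=LEXICON):
    """Sum the lexicon values of the words in a tweet and map the total to a class."""
    score = 0
    for word in text.lower().split():
        # Strip common punctuation and Twitter markers before the lookup.
        score += lexicon.get(word.strip(".,!?#@"), 0)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

for tweet in [
    "I love this phone, great battery!",
    "Terrible service, I hate waiting",
    "Flight lands at noon",
]:
    print(classify_tweet(tweet), "-", tweet)
```

Words that are missing from the lexicon contribute zero, so a tweet with no known words falls into the Neutral class.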
A. Techniques of Sentiment Analysis
The semantic concepts of entities extracted from tweets can be used to measure the overall correlation of a group of entities with a given sentiment polarity. Polarity, in its most basic form, refers to whether a text or sentence is positive or negative. However, sentiment analysis has techniques in assigning polarity such as:
B. Application Programming Interface (API)
IV. RESULT AND DISCUSSION
A. Twitter Retrieved
To connect to the Twitter API, a developer needs to agree to the terms and conditions of the Twitter developer platform in order to obtain authorization to access the data. The output of this process is saved in a JSON file. JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and simple for machines to generate and parse. It is a text format that is completely language independent, but it uses conventions familiar to programmers of the C family of languages, including Python and many others. The size of the output depends on how long tweets are retrieved from Twitter. The output falls into two forms, encoded and un-encoded. For security reasons related to data access, some of the output is shown in ID form, such as a string ID.
Sentiment analysis: each word of a tweet is assigned a value and categorized as a positive or negative word according to the lexicon dictionary. The result is shown in .txt, .csv and HTML formats.
Keyword extraction is difficult on Twitter because of misspellings and slang words, so a pre-processing step is performed before feature extraction. Pre-processing includes removing URLs and handling misspellings and slang words. Misspellings are reduced by replacing runs of a repeated character with two occurrences. Slang words contribute considerably to the emotion of a tweet, so they cannot simply be removed. Instead, a slang word dictionary is maintained to replace slang words occurring in tweets with their associated meanings. Domain information contributes much to the construction of this dictionary. We also use a technique in which, once the overall sentiment of a tweet has been obtained, the sentiment score of a new term can be found by looking at its relative position in the sentence.
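The pre-processing steps described above (URL removal, collapsing repeated characters to two occurrences, and slang replacement) might look roughly like the following sketch. The `SLANG` dictionary, the regular expressions and the `preprocess` helper are assumptions made for illustration, not the authors' code.

```python
import re

# Hypothetical slang dictionary: slang word -> associated meaning.
SLANG = {"gr8": "great", "luv": "love", "u": "you", "thx": "thanks"}

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # any character repeated three or more times

def preprocess(tweet, slang=SLANG):
    """Remove URLs, collapse repeated characters, and expand slang words."""
    text = URL_RE.sub("", tweet)
    # "sooooo" -> "soo": keep exactly two occurrences of the repeated character.
    text = REPEAT_RE.sub(r"\1\1", text)
    # Replace each slang token with its meaning; other tokens stay unchanged.
    tokens = [slang.get(token.lower(), token) for token in text.split()]
    return " ".join(tokens)

print(preprocess("sooooo gr8 http://t.co/abc luv it!!!!"))
```

URLs are removed first so that repeated characters inside a link (for example the "www" prefix) are not rewritten before the link is dropped.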
B. Sentiment Analysis
Each word of the tweets read from the JSON file is assigned a value by matching it against the lexicon dictionary. Because the lexicon dictionary contains a limited number of words, it cannot assign a value to every word in the tweets. However, Python, as a scientific language, is able to analyze the sense of each tweet as positive or negative to obtain a result.
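As a rough sketch of this matching step, the snippet below parses tweets from JSON text and looks each word up in a lexicon, keeping track of the words the lexicon cannot score. The lexicon contents, the sample records and the `id_str` and `text` field names are assumptions for illustration; a JSON string stands in for the saved file so that the example is self-contained.

```python
import json

# Hypothetical lexicon; a real one would be far larger.
LEXICON = {"happy": 1, "awesome": 2, "sad": -1, "awful": -2}

# Stand-in for the contents of the saved JSON file (field names are assumed).
RAW_JSON = '''[
  {"id_str": "1", "text": "Awesome day and happy people"},
  {"id_str": "2", "text": "Awful traffic today"}
]'''

def score_tweet(text, lexicon=LEXICON):
    """Assign a lexicon value to each word and record the words without one."""
    matched, unmatched = [], []
    for word in text.lower().split():
        if word in lexicon:
            matched.append((word, lexicon[word]))
        else:
            unmatched.append(word)
    score = sum(value for _, value in matched)
    return {"score": score, "matched": matched, "unmatched": unmatched}

results = [
    dict(id=tweet["id_str"], **score_tweet(tweet["text"]))
    for tweet in json.loads(RAW_JSON)
]
for result in results:
    print(result["id"], result["score"], result["unmatched"])
```

The `unmatched` list makes the lexicon's coverage limitation visible: those words contribute nothing to the tweet's score.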
C. Information Presented
The result is shown in a pie chart that represents the percentages of positive, negative and null sentiment hashtags. A null hashtag is one that was assigned a value of zero. The program is also able to list the top ten positive and negative hashtags.
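The figures behind such a chart can be computed as in the sketch below, which assumes that each hashtag has already been given a numeric sentiment score. The `summarize_hashtags` helper and the sample scores are hypothetical; the resulting percentages could then be passed to any plotting library to draw the pie chart.

```python
from collections import Counter

def summarize_hashtags(scores, top_n=10):
    """Return class percentages and the top-N positive and negative hashtags."""
    counts = Counter(
        "positive" if s > 0 else "negative" if s < 0 else "null"
        for s in scores.values()
    )
    total = len(scores)
    percentages = {
        label: (100.0 * counts[label] / total if total else 0.0)
        for label in ("positive", "negative", "null")
    }
    # Highest scores first for positive hashtags, lowest first for negative ones.
    top_positive = sorted(
        (h for h, s in scores.items() if s > 0), key=scores.get, reverse=True
    )[:top_n]
    top_negative = sorted(
        (h for h, s in scores.items() if s < 0), key=scores.get
    )[:top_n]
    return percentages, top_positive, top_negative

# Hypothetical hashtag scores.
sample = {"#win": 3, "#fun": 1, "#fail": -2, "#news": 0}
percentages, top_positive, top_negative = summarize_hashtags(sample)
print(percentages)
print(top_positive, top_negative)
```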
D. Data Pre-Processing
Pre-processing is the process of cleaning the data of redundant elements. It increases the accuracy of the results by reducing errors in the data. Pre-processing is one of the most important tasks that must be done before a dataset can be used for machine learning. Real-world data are noisy, incomplete and inconsistent, so they need to be cleaned. Skipping pre-processing steps such as spelling correction may lead the system to disregard important words.
V. ACKNOWLEDGEMENT
The authors would like to acknowledge the management and guides of SKN Sinhgad Institute of Technology and Science, Lonavala, for providing the necessary support and guidance in carrying out this work.
There are different symbolic and machine learning techniques for identifying sentiment in text. Machine learning techniques are simpler and more efficient than symbolic techniques, and a combination of the two can be used to improve accuracy. In this paper we used the Sanders Analytics dataset to analyze the tweets. After pre-processing the data, we created the feature vector that is used for evaluating Twitter sentiment with machine learning techniques. As a result, the program categorizes sentiment as positive or negative and presents it in a pie chart and an HTML page. Although the program was planned as a web application, this could not be realized owing to the limitation that Django could only be run on a Linux server or LAMP. Further enhancement of this element is therefore recommended for future study. Among the various algorithms available, the KNN algorithm is used to increase the efficiency of sentiment analysis, whereas Naïve Bayes is used for simple and efficient sentiment analysis that classifies tweets as positive, negative or zero. Whenever a tweet is fed in for sentiment analysis, it goes through various phases. To analyze a tweet, it is necessary to know the structure and elements of the tweet. Each of these components and phases of sentiment analysis is briefly described in this review paper.
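To make the Naïve Bayes step more concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing. It is a generic textbook formulation rather than the authors' implementation, and the four training tweets are hypothetical, not drawn from the Sanders dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (text, label). Returns the counts needed for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            logp += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical training tweets.
TRAIN = [
    ("love this great phone", "positive"),
    ("great service love it", "positive"),
    ("hate this awful phone", "negative"),
    ("awful service hate it", "negative"),
]
model = train_naive_bayes(TRAIN)
print(predict("love the great service", model))
print(predict("awful phone", model))
```

Working in log space keeps the product of many small word probabilities from underflowing on longer texts.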
Copyright © 2022 Prof. Ravishankar Sir, Prathamesh P. Bhamanage, Pratik S. Patil, Suraj Y. Ahire, Durgesh P. Jadhav. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET47524
Publish Date : 2022-11-18
ISSN : 2321-9653
Publisher Name : IJRASET