Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Anvay Akhil Palherkar , Aditya Singh, Satyam Sahib Sharma, Udipt Srivastava, Dr. Mohammed Tajuddin
DOI Link: https://doi.org/10.22214/ijraset.2023.49232
Sentiment analysis, also called opinion analysis, is essential for making estimates of public opinion that are as accurate as feasible. Carefully designed and executed sentiment analysis can produce better and more accurate projections in both politics and business. At its most fundamental, sentiment analysis is based on the views that users and individuals share or express. Users post and exchange a vast amount of material on various internet platforms every day. Finding the patterns in such data can reveal how different products, services, political figures, businesses, governments, and other entities are perceived. Sentiment analysis must also handle a range of challenges, including reliability. Although numerous methods for sentiment analysis exist, a successful plan for regularly extracting and producing reliable sentiment analysis still needs to be constructed. Machine learning algorithms have improved significantly, and Naive Bayes, Support Vector Machine, and Maximum Entropy stand out as particularly popular techniques, yet sentiment classification, including the separation of good and bad views, is still very much under study.
I. INTRODUCTION
Sentiment analysis identifies and separates the different emotions expressed in a text. Tweets constitute a massive body of data that can be categorized in different ways. These data allow us to draw conclusions about people's views and opinions. Therefore, we need to create an automated machine learning model for sentiment analysis. Modeling tweets is difficult because they contain both useful and unhelpful content.
We created a machine learning pipeline to assess the sentiment of the tweets provided in the dataset. The pipeline combines Term Frequency-Inverse Document Frequency (TF-IDF) features with three classifiers (Logistic Regression, Bernoulli Naive Bayes, and SVM). The accuracy and F1 scores of these classifiers are then used to evaluate their efficacy.
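As a rough illustration of two pieces named above, the following sketch computes TF-IDF weights for tokenized tweets and the accuracy and F1 score of a set of predictions. It is a minimal standard-library version written for this explanation, not the pipeline's actual code; the function names and the plain `tf * log(N / df)` weighting are assumptions, and library implementations usually add smoothing and normalization.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (assumed helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weighted

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A term that occurs in every document receives weight zero under this weighting, which is why very common words contribute little to the classifiers' input.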
In this study, we apply an NLP model for Twitter sentiment analysis that helps to overcome the challenges of determining the sentiment of tweets. The dataset used in the Twitter sentiment analysis project has the following details:
The Sentiment140 dataset contains 1,600,000 tweets that were obtained through the Twitter API. The columns in the dataset include:
II. SENTIMENT CLASSIFICATION METHODS
Sentiment classification methods fall into three groups. The first includes approaches that use well-known machine learning (ML) methods together with linguistic features [5]. The second relies on a sentiment lexicon, a pre-compiled collection of sentiment expressions; it is further broken down into corpus-based techniques, which employ statistical or semantic techniques to determine sentiment polarity. The third, the hybrid strategy, combines machine learning with lexicon-based methodologies. The purpose of the following picture is to give an overview of some of the more popular sentiment classification methods.
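To make the lexicon-based idea concrete, the sketch below scores a text by counting matches against a tiny sentiment lexicon. The word lists and the function name `lexicon_polarity` are hypothetical and chosen only for illustration; real lexicons are far larger and often carry graded weights.

```python
import re

# Hypothetical miniature lexicon; a real one would be pre-compiled and much larger.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

def lexicon_polarity(text):
    """Label a text by the balance of positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A hybrid method could feed such a lexicon score into a trained classifier as one additional feature.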
A. Use of Machine Learning
With this approach, sentiment analysis (SA) is conducted almost entirely, and in great detail, using machine learning (ML) algorithms [6, 7, 8, 9], in addition to linguistic and syntactic features.
B. Supervised Learning
Supervised learning is a type of learning that uses labeled datasets. Such datasets are used to train supervised learning models, which then make predictions on new data [6, 7, 10].
With supervised learning, annotated material is automatically categorized into specified classes. Khan et al. (2010) [12] discussed several supervised learning techniques at length, including decision tree algorithms, Naive Bayes, K-means, neural networks, and support vector machines. According to Medhat, Hassan, and Korashy (2014) [13], supervised learning methods can also be categorized as decision tree classifiers, linear classifiers, probabilistic classifiers, and rule-based classifiers. Probabilistic classifiers are based on mixture models, in which the probability of sampling is specified for certain keywords. The Naïve Bayes classifier, Maximum Entropy, and Multidimensional algorithms are a few of these classifiers. They are widely used for sentiment classification and provide its benchmark results [14, 16, 18].
Decision tree classifiers divide the training data to create a hierarchical structure in which a leaf node gives the classification outcome [12]. The methods include the ID3, C3, and C5 algorithms as well as several spanning tree and graph-based methods. For a fuller introduction, refer to the studies that employed decision tree algorithms for sentiment categorization [21].
Rule-based classifiers specify a set of rules that follow from the associations between items. A rule has a condition on one side, which depends on the feature set and term presence, and a class label on the other; see [22, 24]. Linear classifiers categorize data from a weighted term representation of the document such as TF-IDF. SVM and neural networks are examples of the linear classification methods discussed in the literature [25, 27, 28, 29].
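The condition-to-label form of a rule can be sketched as follows. The rules themselves are invented for illustration, and the first-match-wins ordering is an assumption of this sketch rather than something the cited works prescribe.

```python
# Each rule pairs a condition (terms that must all be present) with a class label.
# These example rules are hypothetical.
RULES = [
    ({"not", "good"}, "negative"),
    ({"good"}, "positive"),
    ({"terrible"}, "negative"),
]

def classify_by_rules(tokens, rules=RULES, default="neutral"):
    """Return the label of the first rule whose condition terms all occur."""
    present = set(tokens)
    for condition, label in rules:
        if condition <= present:  # subset test: every condition term is present
            return label
    return default
```

Placing the more specific rule (`not` together with `good`) before the general one lets negation override the plain positive rule.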
C. Decision Tree Classifiers
This kind of classifier breaks down the training data space hierarchically, using attribute values to separate the data. The division depends on whether one or more words are present or absent, and it continues recursively until the leaf nodes, which are used for classification, hold only a certain number N of records [10].
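The recursive division on word presence or absence can be sketched in a few lines. This is a simplified illustration that picks each split by Gini impurity, which is an assumption of the sketch (ID3-style trees use information gain instead); the names `build_tree`, `predict_tree`, and `min_records` are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most frequent label; ties resolve to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(docs, labels, min_records=1):
    """docs is a list of token sets; returns a nested dict or a leaf label."""
    if len(set(labels)) == 1 or len(docs) <= min_records:
        return majority(labels)
    best = None
    for word in sorted(set().union(*docs)):
        present = [l for d, l in zip(docs, labels) if word in d]
        absent = [l for d, l in zip(docs, labels) if word not in d]
        if not present or not absent:
            continue  # this word does not separate the records
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return majority(labels)
    word = best[1]
    has = [(d, l) for d, l in zip(docs, labels) if word in d]
    lacks = [(d, l) for d, l in zip(docs, labels) if word not in d]
    return {
        "word": word,
        "present": build_tree([d for d, _ in has], [l for _, l in has], min_records),
        "absent": build_tree([d for d, _ in lacks], [l for _, l in lacks], min_records),
    }

def predict_tree(tree, doc):
    """Follow present/absent branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["word"] in doc else tree["absent"]
    return tree
```

Raising `min_records` stops the division earlier and yields a smaller tree, at the cost of less pure leaves.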
D. Naïve Bayes Classifier (NB)
This classification method is very widely used to categorize text documents and to perform SA on them. It is based on a probabilistic approach: it estimates the probability of a particular class from the joint probabilities of the individual words in a text document [11].
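A minimal sketch of this idea, assuming a multinomial model with add-one (Laplace) smoothing, is given below. The function names are hypothetical, and words never seen in training are simply skipped, which is one common convention among several.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # unseen words carry no evidence in this sketch
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.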
E. Maximum Entropy Classifier (ME)
It is a probabilistic classifier that belongs to the exponential family of models and, unlike Naive Bayes, does not assume that its features are independent. Instead, ME relies on the Principle of Maximum Entropy: among the models consistent with the training data, the one with the highest entropy is chosen. Applications of ME classifiers include language identification, sentiment analysis, topic classification, etc.
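For two classes, a maximum entropy classifier takes the same form as logistic regression, so a small sketch can be written with stochastic gradient ascent on the log-likelihood. The learning rate, epoch count, word-presence features, and function names below are all assumptions made for illustration.

```python
import math

def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit per-word weights and a bias; labels are 1 (positive) or 0 (negative)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            feats = set(tokens)  # binary word-presence features
            z = bias + sum(weights.get(w, 0.0) for w in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = y - p  # gradient of the log-likelihood for this example
            bias += lr * grad
            for w in feats:
                weights[w] = weights.get(w, 0.0) + lr * grad
    return weights, bias

def prob_positive(model, tokens):
    """Probability that the tokens belong to the positive class."""
    weights, bias = model
    z = bias + sum(weights.get(w, 0.0) for w in set(tokens))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the weights are fitted jointly, words that occur in both classes end up with weights near zero instead of being counted as independent evidence.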
III. TOOLS
Massive amounts of heterogeneous data are present on social media, and analyzing them requires specialized tools that can handle large volumes of text. A variety of tools can perform sentiment analysis on social media data, and different tools are needed to visualize social network data in real time. Numerous for-profit companies offer sentiment analysis tools aimed at analyzing customer reviews of various products.
Many tools are available for free, including MATLAB and Python-NLTK. They offer a wide range of functionality for machine learning and data analytics.
Sentiment visualization tools are employed to keep track of data in real time and to gauge the emotion behind the text. They display the structure of the data and its properties. Tools for sentiment visualization include IN-SPIRE, Pulse, VISA, TIARA, etc. Table 7 lists the lexical resources and mash-up websites that can be used for sentiment analysis and also includes some visualization tools.
IV. LITERATURE SURVEY
[15]: Here, a collection of tweets written by many end users was constructed and used to inform the user about behavior, with the main focus on sentiment analysis. Limitations: data was acquired from social sites to determine a person's activity. A website might be developed where people can type a Twitter search keyword.
[17]: Real and reversed reviews are used in pairs by the dual training (DT) and dual prediction (DP) algorithms, respectively. In DT, the likelihood is maximized over both the real and the reversed data. In DP, both sides of a review are taken into account. In other words, both the good and bad parts of the original review and the good and bad parts of the reversed review are considered.
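The pairing of a review with its reversed counterpart can be sketched as below. The antonym table, the weight `alpha`, and the weighted-average combination are assumptions of this sketch; the exact reversal rules and weighting used in [17] may differ.

```python
# Hypothetical antonym table used to build the reversed review.
ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love"}

def reverse_review(tokens, label):
    """Swap sentiment words for their antonyms and flip the 0/1 label."""
    return [ANTONYMS.get(t, t) for t in tokens], 1 - label

def dual_predict(p_pos_original, p_neg_reversed, alpha=0.5):
    """Combine how positive the original looks with how negative its reversal looks."""
    return (1 - alpha) * p_pos_original + alpha * p_neg_reversed
```

For dual training, each labeled review and its reversed copy would both be added to the training set; for dual prediction, a classifier scores both versions and `dual_predict` merges the two probabilities.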
[19]: The goal of this research is to outline some of the problems of Arabic natural language processing and to offer suggestions that will benefit researchers and practitioners. It explores general facets of the Arabic language. There are millions of native Arabic speakers in the world. Additionally, 1.4 billion Muslims use this language for their regular prayers. Morphology is crucial in Arabic, which is a highly structured and derivational language.
[20]: The SVM method has shown its practical usefulness for classification. Face detection, text categorization, and bioinformatics are just a few of the areas where SVMs have been successfully applied [32], and SVMs are increasingly used as tools for data analysis. Despite these well-known strengths, SVM is challenging to apply to massive data: its computational cost is at least quadratic in the number of data points.
Scaling up learning methods is therefore necessary for large data.
The SVM algorithm does not do well with large data sets. It also performs poorly when the target classes overlap and the data set contains more noise.
[26]: The suggested method extracts features from tweets using both methods. Supervised learning is used for the sentiment analysis, and a number of classifiers are trained on the extracted features. A real-time system architecture is designed and implemented in Storm; it involves feature extraction and classification tasks, which scale well in terms of input data size and data arrival rate, but are not critical steps. Experimental evaluations demonstrate the advantages of the proposed system in terms of efficiency, scalability, and classification accuracy.
[5]: Sentiment analysis can handle a wide range of challenges, such as reliability issues, binary classification issues, data issues, and polarity. Although numerous methods have been created and proposed for finding the emotions behind a text, a successful plan for regularly extracting and producing reliable sentiment analysis still needs to be constructed. Despite the significant advancements in machine learning algorithms, among which Naive Bayes, Support Vector Machine, and Maximum Entropy stand out as particularly prevalent in research, sentiment classification by category, including positive and negative sentiments, is still a research interest. This article surveys popular sentiment analysis approaches and procedures in an effort to provide a clear evaluation report with supporting evidence.
[13]: Sentiment analysis (SA) is an ongoing field of research within text mining. SA handles the subjectivity, emotions, and viewpoints of a text algorithmically. This survey undertakes a thorough analysis of the most recent advances in the topic. Numerous recently proposed algorithm improvements and diverse SA applications are examined and briefly described. The articles are divided into groups based on how they contribute to the various SA techniques. The recent interest of researchers in the SA-related domains of transfer learning, emotion recognition, and resource building is also explored.
[7]: The goal of sentiment analysis is to uncover any subjectivity, opinions, or feelings in the text. Sentiment analysis can be carried out with vocabulary-based techniques or with machine learning algorithms. This article addresses research works on sentiment analysis using machine learning: (i) they are categorized according to the tasks they perform in terms of information extraction; and (ii) the difficulties that have been encountered, and those that may arise in relation to this research topic, are reviewed and discussed.
[8]: Utilizing sentiment analysis on more than 1000 Facebook posts on newscasts, this study contrasts the sentiment towards Rai, the Italian public broadcasting service, and La7, a young and more vibrant commercial enterprise. The findings of this study are contrasted with those of the Osservatorio di Pavia, an Italian research centre that examines political communication in the media and focuses on media analysis at both the theoretical and empirical levels. This study takes into account Auditel's statistics on the size of broadcast audiences and merges quantitative data from the public domain with social media analysis, notably Facebook analysis.
[9]: The sentiment analysis problem is stated as a multi-class classification task in which a supervised classifier must label each tweet as "positive," "negative," or "neutral." A Support Vector Machine (SVM) [34] is trained on the training tweets. The tests use L1 regularization and a linear kernel, and cross-validation is employed to choose the C parameter. Emoticons are removed from the tweets that are used as input for the SVM.
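Choosing a parameter such as C by cross-validation can be sketched independently of any particular classifier. In the sketch below, `score_fn` is a hypothetical callback that would train a model with the candidate value on the training indices and return its score on the held-out indices; the interleaved fold assignment is likewise an assumption.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs that split range(n) into k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def choose_parameter(candidates, score_fn, n, k=5):
    """Return the candidate with the best mean score across the k folds."""
    def mean_score(candidate):
        scores = [score_fn(candidate, train, test) for train, test in k_fold_indices(n, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)
```

In practice the data would usually be shuffled, or stratified by class, before the folds are formed.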
[35]: Sentiment analysis is one of the most crucial areas of analysis that makes use of the vast amount of social data and of the value that lies in its unprocessed and diverse character. Handling such massive amounts of data when conducting analytics and making forecasts raises questions. To address these issues, sentiment analysis uses text analytics, natural language processing, and other computational techniques to extract, preprocess, and detect subjective data, so that the emotions portrayed in the text can be understood and conveyed and predictions can be made. Social media usage and analytics are growing exponentially, which raises the need for security and vulnerability analyses of such data, covering the leak of confidential material, provenance and trust issues, and fraud and spam.
Social media security and analysis heavily rely on sentiment analysis. It has been used extremely successfully to analyze social media content, identify numerous security aspects, and aid in the development of workable solutions. The role of sentiment analysis in identifying hostile actors, spammers, and online fraudsters is a key focus of this article. It also covers topics like data provenance, social media mistrust, e-commerce security, event prediction for disaster assistance, risk assessment, and other social media security solutions. Social media sentiment analysis is frequently used to examine user behavior and their interactions with other users.
[36]: HaterNet, an intelligent tool that monitors and recognises hate speech on Twitter, is now used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security. This study makes several contributions, some of which are as follows: (1) It introduces the first intelligent system that uses social network analysis techniques to track and broadcast hate speech in social media. (2) It provides a brand-new dataset on hate speech in Spanish that is freely accessible and is made up of 6000 tweets that have been expertly categorised. (3) Using several document representation methods and text classification models, it compares various classification methodologies. (4) The best method combines an LSTM+MLP neural network with the word, emoji, and expression token embeddings from tweets, enhanced using TF-IDF. This approach outperforms other approaches previously addressed in the literature, with an area under the curve (AUC) of 0.828 on the study's dataset.
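The AUC figure quoted above can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and the 0/1 label encoding are assumptions, and the quadratic loop is only suitable for small samples.

```python
def auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.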
Hate crimes are a specific category of legal infractions in which the way the victim is perceived is the main driving force. They occur when the perpetrator chooses his or her victims based on their affiliation with a specific group that is predominantly characterized by the traits mentioned before. There is evidence that certain highly publicised events, such as terrorist attacks, unchecked migration, protests, and riots, have an impact on hate crimes [37]. These situations frequently serve as triggers, and within social media (SM) their impact is greatly heightened. As a result, SM functions as a sensor of the real world [38] and as a significant resource for crime prediction [39]. Social media platforms are flooded with posts from users calling for the punishment of various targeted groups. After a trigger event, these signals can be accumulated over time and used to study hate crimes in all their phases [37]: the rise, stabilization, duration, and fall of the threat. The analysis, forecasting, and identification of hate crimes thus depend heavily on monitoring social media.
[40]: Twitter "bots" post content that can be either beneficial (such as recent news articles or PSAs) or harmful (such as spam or phishing links). Harmful Twitter bots have grown to be an annoyance, recently inspiring a lengthy tirade in The New Yorker [1].
They allegedly contribute to distorting the public's perception of political candidates. For instance, the website botornet.net asserts that Newt Gingrich, a past candidate for the US presidency, obtained more than a million Twitter followers by deploying bots, a claim Mr. Gingrich apparently denied. Additionally, Hill [2] states that "up to 29.9% of Barack Obama's [Twitter] followers and 21.9% of Mitt Romney's followers may be fake." By the company's own estimate, 83 million of Facebook's users are fake.
In conclusion, it is now widely accepted that a sizable portion of social media is made up of bots, many of which have malicious intentions.
We covered information extraction as well as preprocessing methods for tweets from Twitter. Additionally, we studied the Support Vector Machine for text categorization, a supervised learning method that can be used to determine the polarity of tweets. From the studies reviewed, we can draw the conclusion that SVM copes with some characteristics of text, such as high dimensionality. Various findings demonstrate that SVM performs well on text categorization when compared to ANN. SVM reduces the need for feature selection because of its capacity to generalize in a high-dimensional feature space. In Twitter sentiment analysis, the polarity of tweets serves as the label: the labeled tweets are passed to the machine learning model to train and then test it, which allows us to employ the model going forward in accordance with the findings. The process entails gathering data, text preparation, sentiment detection, sentiment categorization, model training, and testing.
However, the dimension of data diversification is still missing. Other application difficulties are consequences of the language and acronyms used. The performance of analyzers suffers as the number of classes rises. Additionally, the accuracy of the models has not been tested yet. Even so, sentiment analysis has a highly promising future. Finding the most effective method for recognising sentiment in Twitter data proved to be one of the biggest challenges, as comparing different methods is very difficult when there are no established benchmarks. Future research could examine how sentiment analysis algorithms perform for a particular feature: integrating different features usually improved performance but in certain situations gave mixed results, so further work is needed to address these performance limitations. Another option may be to look into the problem of data sparsity using both ensemble and hybrid methods. The goal would be to assess how well different Twitter sentiment techniques cope with data scarcity.
[1] W. Medhat, A. Hassan, and H. Korashy, "Sentiment analysis algorithms and applications: A survey", Ain Shams Engineering Journal, Vol. 5, No. 4, pp. 1093-1113, December 2014, doi: 10.1016/j.asej.2014.04.011.
[2] X. Fang and J. Zhan, "Sentiment analysis using product review data", Journal of Big Data, Vol. 2, No. 1, p. 5, June 2015, doi: 10.1186/s40537-015-0015-2.
[3] H. Kang, S.J. Yoo, and D. Han, "Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews", Expert Systems with Applications, Vol. 39, No. 5, pp. 6000-6010, 2012.
[4] Batrinca and P.C. Treleaven, "Social media analytics: a survey of techniques, tools and platforms", AI & Society, Vol. 30, No. 1, pp. 89-116, 2015.
[5] Raktim Kumar Dey, Debabrata Sarddar, Indranil Sarkar, Rajesh Bose, Sandip Roy, "A Literature Survey on Sentiment Analysis Techniques Involving Social Media and Online Platforms", International Journal of Scientific & Technology Research, Vol. 9, No. 5, May 2020, ISSN 2277-8616.
[6] W. Medhat, A. Hassan, and H. Korashy, "Sentiment analysis algorithms and applications: A survey", Ain Shams University, Faculty of Engineering, Computer & Systems Department, Egypt, Vol. 5, No. 4, pp. 1093-1113, December 2014.
[7] E. Erdogan and M. A. Akyol, "A Comprehensive Survey for Sentiment Analysis Tasks Using Machine Learning Techniques", In Proc. 2016 International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), IEEE, Sinaia, Romania, August 2016, doi: 10.1109/INISTA.2016.7571856.
[8] F. Neri, C. Aliprandi, F. Capeci, M. Cuadros, T. By, "Sentiment Analysis on Social Media", In Proceedings of 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, August 2012, doi: 10.1109/ASONAM.2012.164.
[9] P. Chikersal, S. Poria, and E. Cambria, "SeNTU: Sentiment Analysis of Tweets by Combining a Rule based Classifier with Supervised Learning", In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, June 2015, pp. 647–651.
[10] L. I. Tan, W.S. Phang, K.O. Chain, "Rule-based Sentiment Analysis for Financial News", In Proc. Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference, Kowloon, China, January 2016, doi: 10.1109/SMC35812.2015.
[11] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment analysis", Computational Linguistics, Vol. 37, No. 2, pp. 267-307, 2011.
[12] Khan, A., Baharudin, B., Lee, L. H., & Khan, K. (2010). A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology, 1(1), 4–20.
[13] Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113.
[14] Aggarwal, C. C., & Zhai, C. (2012). A survey of text clustering algorithms. In Mining text data (pp. 77–128). Springer US.
[15] Sai Madhu K, Damarukanadhan Ch, Chakradhar Reddy B, Polireddy M (Computer Science and Engineering, KLU), "Real Time Sentiment Analysis on Twitter", Proceedings of the Sixth International Conference on Inventive Computation Technologies (ICICT 2021), IEEE Xplore Part Number: CFP21F70-ART; ISBN: 978-1-7281-8501-9.
[16] Kim, S.-B., Rim, H.-C., Yook, D., & Lim, H.-S. (2002). Effective methods for improving naive bayes text classifiers. In the Pacific rim international conference on artificial intelligence (pp. 414–423).
[17] Rui Xia, Feng, Chengqing Zong, Qianmu Li, Yong Qi, Tao Li, "Dual Sentiment Analysis: Considering Two Sides of One Review", IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 8, August 2015.
[18] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing – Volume 10 (pp. 79–86).
[19] Farghaly, A., & Shaalan, K. (2009). Arabic NLP: Challenges and Solutions. ACM Transactions on Asian Language Information Processing (TALIP), 8(4), 1–22.
[20] Thanh-Nghi Do, François Poulet, "Parallel Learning of Local SVM Algorithms for Classifying Large Datasets", December 2016.
[21] Kim, J. W., Lee, B. H., Shaw, M. J., Chang, H.-L., & Nelson, M. (2001). Application of decision-tree induction techniques to personalized advertisements on internet storefronts. International Journal of Electronic Commerce, 5(3), 45–62.
[22] Mukund, S., Ghosh, D., & Srihari, R. K. (2011). Using sequence kernels to identify opinion entities in Urdu. In Proceedings of the fifteenth conference on computational natural language learning (pp. 58–67).
[23] Medhat, W., Yousef, A. H., & Mohamed, H. K. (2014). Combined algorithm for data mining using association rules. arXiv preprint arXiv:1410.1343.
[24] Apté, C., Damerau, F., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS), 12(3), 233–251.
[25] Chakrabarti, S., Roy, S., & Soundalgekar, M. V. (2003). Fast and accurate text classification via multiple linear discriminant projections. The VLDB Journal, 12(2), 170–185.
[26] Maria Karanasou, Anneta Ampla, Christos Doulkeridis, and Maria Halkidi, "Scalable and Real-time Sentiment Analysis of Twitter Data", Department of Digital Systems, School of Information and Communication Technologies, University of Piraeus, Piraeus, Greece.
[27] Ku, L.-W., Lee, C.-Y., & Chen, H.-H. (2009). Identification of opinion holders. International Journal of Computational Linguistics & Chinese Language Processing, 14(4), 383–402.
[28] Ruiz, M. E., & Srinivasan, P. (1999). Hierarchical neural networks for text categorization. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 281–282).
[29] Li, Y.-M., & Li, T.-Y. (2013). Deriving market intelligence from microblogs. Decision Support Systems, 55(1), 206–217.
[30] Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: System demonstrations (pp. 55–60).
[31] Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995).
[32] Guyon, I.: Web page on SVM applications (1999). http://www.clopinet.com/ isabelle/Projects/-SVM/app-list.html
[33] Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods Support Vector Learning, pp. 185–208 (1999).
[34] Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20.3 (1995): 273-297.
[35] Sharma, Sanur, and Anurag Jain. "Role of sentiment analysis in social media security and analytics." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10.5 (2020): e1366.
[36] Pereira-Kohatsu, J. C., Quijano-Sánchez, L., Liberatore, F., & Camacho-Collados, M. (2019). Detecting and monitoring hate speech on Twitter. Sensors, 19(21), 4654.
[37] Downs, A. 2.1. Up and Down with Ecology: The "Issue-Attention Cycle". In The Politics of American Economic Policy Making; Peretz, P., Ed.; M.E. Sharpe, Inc: Armonk, NY, USA, 1996; Volume 48.
[38] Sui, X.; Chen, Z.; Wu, K.; Ren, P.; Ma, J.; Zhou, F. Social media as sensor in real world: Geolocate user with microblog. In Natural Language Processing and Chinese Computing; Springer: Berlin/Heidelberg, Germany, 2014; pp. 229–237.
[39] Scanlon, J.R.; Gerber, M.S. Forecasting violent extremist cyber recruitment. IEEE Trans. Inf. Forensics Secur. 2015, 10, 2461–2470.
[40] Dickerson, J. P., Kagan, V., & Subrahmanian, V. S. (2014, August). Using sentiment to detect bots on twitter: Are humans more opinionated than bots? In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014) (pp. 620-627). IEEE.
Copyright © 2023 Anvay Akhil Palherkar , Aditya Singh, Satyam Sahib Sharma, Udipt Srivastava, Dr. Mohammed Tajuddin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET49232
Publish Date : 2023-02-24
ISSN : 2321-9653
Publisher Name : IJRASET