Social media is a platform on which people from any corner of the world share their opinions with each other; the current era is therefore often called the social media era. Social networking sites such as Facebook and Twitter play an important role in information retrieval and web data analysis. According to one survey, Twitter produces more than 500 million tweets each day, about 8 TB of data, which can be mined and subjected to sentiment analysis. The purpose of mining and extracting opinions is to discover and categorize the positive and negative sentiments of society. Because the repository of data is so large, clustering is an efficient and quick way to study people’s expressions and draw conclusions. This paper surveys the different mining techniques used to carry out opinion and sentiment analysis and to present the results based on subjectivity and polarity.
I. INTRODUCTION
Social media is a web-based service that allows people to create public or private profiles and to connect, communicate, and share with other people. As a result, social media has gained a great deal of attention. Social media mining has emerged as a way to summarize what billions of people think and to draw reliable conclusions from it, which makes it an excellent marketing tool. It can be used for many business-related activities; for example, to analyze customer experience, one needs to recognize the opinions and sentiments of customers, from which a product review can be derived. Social media data have three characteristics:
Very large amount of data
Dynamic data
Noisy data
These characteristics make research on the data difficult. Hence, various data mining techniques are used to extract knowledge from it. The social media mining techniques are as follows:
a. Opinion Mining
b. Sentiment Analysis
c. Clustering
d. Text Mining
e. Web Mining
II. SOCIAL MEDIA MINING APPLICATIONS AND TASKS
Social media mining is an elaborate field of data mining that is used for representing, analyzing, and identifying interesting, actionable patterns in raw social media data [1].
A. Social Media Mining Tasks
Social media mining performs the following tasks:
Social Media Event Detection
Community Structure
Network Measures and Models
Social Search
Trust in Social Media
Sentiment Analysis in Social Media
Social Spammer Detection
Distrust and Negative Links
Role of Social Media in Crises
Location based Social Network Mining
Information Cascade
B. Social Media Mining Applications
Social media mining is performed by social media analysts; hence, the process is also called social media analytics. Social media mining/analytics can be used in:
Business Development
Social Science Research
Health Service
Educational Purpose
Influence Marketing
Weather Forecasting
Fraud Detection
Financial Banking
Price Detection in Share Market
Criminal Investigation
III. RELATED WORK
In this paper, we focus on data from social networking sites such as Twitter. Twitter is a social networking site that provides a platform for posting real-time reactions and opinions about anything. Messages on this platform are short, and people often do not follow the rules of grammar in them. That is why traditional methods of social media mining give poor results. Hence, to obtain better-quality results, two methods are used:
Sentiment Analysis: The aim of this process is to mine opinions from raw social media data at the sentence and document level [1].
Text Classification: The aim of this process is to group similar text messages so that the information becomes manageable. It uses a framework that can form clusters of similar texts that belong to the same or similar topics [1].
A. Sentiment Analysis
The process of sentiment analysis follows a specific structure which contains four stages [3]:
Data Collection Stage- Trending topics, topic definition and tweets are downloaded in this stage to generate a document.
Preprocessing Stage- In this stage some techniques are used to convert the raw social media data into an understandable form with best quality.
Data Modeling Stage- Documents that were created from social media data in the first stage are transformed into tokens.
Sentiment Analysis Stage- In this stage, tweets will be classified based on polarity scores.
B. Working of Sentiment Analysis
SentimentIntensityAnalyzer()- It assigns sentiment intensity scores to sentences.
polarity_scores()- It returns floating-point scores for sentiment strength based on the input text. A positive value indicates positive valence, a negative value indicates negative valence, and the remainder indicates neutral valence.
The polarity scores of the words are then compared against maximum and minimum threshold values.
In the next step, charts are created to plot the sentiment-based results.
To understand the sentiment analysis process easily we can take the example of trending technologies.
a. Keywords passed are: ‘Smartphone’, ‘Laptop’, ‘Tablet’.
b. Final result after Sentiment Analysis Process
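The scoring and thresholding steps described in this section can be illustrated with a minimal sketch. It does not use the analyzer named above; instead it assumes a tiny hand-made lexicon (`LEXICON`), a hypothetical threshold of 0.5, and made-up sample tweets for the three keywords, all of which are placeholders chosen only for illustration.

```python
from collections import Counter

# Toy lexicon: word -> valence. The entries and weights are invented for
# illustration; a real analyzer relies on a much larger, validated lexicon.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "slow": -1.0, "hate": -2.0}

def polarity_score(text):
    """Average valence of the lexicon words found in text (0.0 if none)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def classify(score, threshold=0.5):
    """Compare a score against symmetric upper and lower thresholds."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def summarize(tweets_by_keyword):
    """Count sentiment labels per keyword, ready to be plotted as a chart."""
    return {kw: Counter(classify(polarity_score(t)) for t in tweets)
            for kw, tweets in tweets_by_keyword.items()}

# Made-up sample tweets for the three keywords used in the example above.
SAMPLE = {
    "Smartphone": ["I love my new smartphone!", "Battery life is bad."],
    "Laptop": ["This laptop is great", "The laptop arrived today"],
    "Tablet": ["I hate this slow tablet"],
}

if __name__ == "__main__":
    for keyword, counts in summarize(SAMPLE).items():
        print(keyword, dict(counts))
```

The per-keyword counts returned by `summarize` are the kind of data a bar or pie chart of sentiment results would be drawn from.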
C. Opinion Mining
In this paper, manually labelled data have been used as the training data to build a new model for the sentiment classification process. The newly created architecture is known as ‘Opinion Miner’ [2].
First, tweets are collected from Twitter and a preprocessing step is applied to them. Preprocessing is used to identify various properties of the messages that users have posted on the social networking platform. Each tweet holds an opinion of a user. In the next step, these tweets are extracted and then classified into labeled classes [2].
Messages that users post on the social networking platform have unique properties such as [3]:
Usernames: These are the Twitter user names that users include to direct their messages to someone. The ‘@’ symbol is used for this.
Hash Tags: Twitter lets users mark text or keywords with hash tags, which take the form “#<tagname>”.
RT (Re-Tweeting): If someone’s tweet is interesting enough to others, they can re-tweet it; Twitter uses RT to mark a re-tweet.
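These three properties can be pulled out of a tweet with regular expressions. The following is a rough sketch, assuming that usernames and tag names consist of word characters and that a re-tweet starts with the token RT; the function name `tweet_properties` is hypothetical.

```python
import re

MENTION_RE = re.compile(r"@(\w+)")   # '@' followed by a user name
HASHTAG_RE = re.compile(r"#(\w+)")   # '#' followed by a tag name
RETWEET_RE = re.compile(r"^RT\b")    # tweet text beginning with 'RT'

def tweet_properties(text):
    """Return the usernames, hash tags, and re-tweet flag found in a tweet."""
    return {
        "usernames": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "is_retweet": bool(RETWEET_RE.match(text.strip())),
    }

if __name__ == "__main__":
    print(tweet_properties("RT @alice: loving the new #Laptop #tech"))
```

A simple pattern like `@(\w+)` will also match the part after `@` in an e-mail address, so a production system would need stricter rules.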
In the pre-processing step, some tweets are eliminated, such as [4]:
a. Tweets that are not in the English language.
b. Tweets that have very few words (the threshold value for the length of a tweet is set to five).
c. Tweets that have very few words apart from greeting words.
d. Tweets that contain only links/Uniform Resource Locators (URLs).
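The four elimination rules can be sketched as a single filter function. This is only an illustration under several assumptions: the threshold of five is read as "fewer than five words are dropped", the greeting list and the cutoff `MIN_CONTENT_WORDS` are hypothetical, and `looks_english` is a crude stand-in (share of ASCII letters) for real language identification.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MIN_WORDS = 5          # length threshold stated in the text
MIN_CONTENT_WORDS = 3  # hypothetical cutoff for rule (c)
GREETINGS = {"hi", "hello", "hey", "morning", "evening", "night"}  # hypothetical

def looks_english(text):
    """Crude heuristic: at least 90% of the letters are ASCII.
    It cannot tell English from other Latin-script languages."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    return sum(ch.isascii() for ch in letters) / len(letters) >= 0.9

def keep_tweet(text):
    """Return True if the tweet survives rules (a) to (d)."""
    body = URL_RE.sub(" ", text).strip()
    if not body:                    # rule (d): nothing but links
        return False
    if not looks_english(body):     # rule (a): not English
        return False
    words = body.split()
    if len(words) < MIN_WORDS:      # rule (b): too few words
        return False
    content = [w for w in words
               if w.lower().strip(".,!?") not in GREETINGS]
    return len(content) >= MIN_CONTENT_WORDS  # rule (c): mostly greetings

if __name__ == "__main__":
    for tweet in ["Hey, the new laptop has a great screen!",
                  "http://example.com/x",
                  "nice phone"]:
        print(keep_tweet(tweet), tweet)
```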
D. Clustering
Clustering means grouping a set of objects according to similarities in their characteristics. It is a simple and effective way of handling large amounts of data to extract knowledge from them.
Clustering can be used in social media mining to analyze and find textual similarities between users’ content. The main tasks of clustering in social media mining are:
1. Pre-Processing: This includes collecting tweets from Twitter. It also contains sub-tasks such as:
a. Data Extraction: Data Collection process
b. Stop Words Removal: Removing stop words like “the”, “and”, “a”, “as”, “about”, “at” etc.
c. Stemming of the Text: Reducing inflected words such as “talk”, “talked” and “talking” to a common stem. This can be performed by a stemming algorithm.
d. Lexical Analysis: Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms.
2. Clustering Techniques: Various clustering algorithms are applied to the set of objects, which makes the objects easier to differentiate. Some of the algorithms used in clustering are:
a. Centroid-Based Algorithm: In this algorithm, every group is represented by a vector value.
b. Distribution-Based Algorithm: This algorithm groups objects that belong to the same distribution.
c. Connectivity-Based Algorithm: This algorithm builds a hierarchical representation based on the distance between objects.
d. Density-Based Algorithm: This algorithm creates clusters based on density. The idea is to keep expanding a cluster as long as the density of its neighborhood exceeds a threshold value.
3. Experimental Results: This step presents the final results of the analysis in the form of graphs or datasets.
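The pre-processing and clustering tasks above can be combined into one small sketch. It is not the method of any cited work: it assumes the stop word list given in the text, a naive suffix-stripping stemmer in place of a real stemming algorithm, simple word-count vectors, and a bare-bones centroid-based algorithm (k-means) whose initial centroids are just the first k vectors. All function names and sample tweets are hypothetical.

```python
import re

STOP_WORDS = {"the", "and", "a", "as", "about", "at"}  # examples from the text

def naive_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, remove stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def vectorize(docs):
    """Turn token lists into word-count vectors over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    return [[d.count(term) for term in vocab] for d in docs], vocab

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=10):
    """Centroid-based clustering: each group is represented by a mean vector.
    Initial centroids are the first k vectors (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Made-up tweets used only to exercise the sketch.
TWEETS = [
    "talking about the laptop screen",
    "the phone camera and phone battery",
    "laptop screens talked about",
    "a phone battery at the camera shop",
]

if __name__ == "__main__":
    vectors, vocab = vectorize([preprocess(t) for t in TWEETS])
    labels, _ = kmeans(vectors, k=2)
    print(vocab)
    print(labels)
```

Taking the first k vectors as starting centroids keeps the sketch deterministic; practical implementations usually choose the starting points randomly or with a seeding scheme, and the result can depend on that choice.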
Conclusion
In this paper, we have mined raw data collected from microblogging sites such as Twitter and Facebook. On these raw data, techniques such as Sentiment Analysis and Opinion Mining have been performed to classify the sentiments of users on the basis of their posts. With Sentiment Analysis, data such as users’ posts on Twitter and Facebook are mined, and the Opinion Mining technique is then applied to those data to cluster the sentiments of users. The final result of the Opinion Mining process is shown as a graph of the different sentiments for better understanding. It thus becomes easy and efficient to identify interesting behaviour patterns in society, which can be very useful for brands, companies, businesses and even individuals.
References
[1] Ardra, Blessy Merin Varughese, Merline Susan Joseph, Preethi Elsa Thomas, Sherly K K “Analyzing the Behavior of Youth to Sociality Using Social Media Mining” IEEE 2017.
[2] Donia Gamal, Marco Alfonse, El-Sayed M. El-Horbaty, Abdel Badeeh M. Salem “A Comparative Study on Opinion Mining Algorithms of Social Media Statuses” IEEE 2017.
[3] Po-Wei Liang, Bi-Ru Dai “Opinion Mining on Social Media Data” IEEE 2013.
[4] Andrei Pavel, Vasile Palade, Rahat Iqbal, Diana Hintea “Using Short URLs in Tweets to Improve Twitter Opinion Mining” IEEE 2017.
[5] Baocheng Huang, Guang Yu “Research on the mining of opinion community for social media based on sentiment analysis and regional distribution” IEEE 2016.
[6] Devendra K. Tayal, Sumit K. Yadav “Analysis of sentiments & polarity computation of opinions” IEEE 2017.
[7] Shreya Ahuja, Gaurav Dubey “Clustering and sentiment analysis on Twitter data” IEEE 2017.
[8] Jainee Vora, Anu Mary Chacko “Sentiment analysis of tweets to identify the correlated factors that influence an issue of interest” IEEE 2017.
[9] Rajeshwari Dembala, S. Vagdevi “Conceptual notion for opinion mining from upcoming big data” IEEE 2017.
[10] Anitha Anandhan, Liyana Shuib “Social Media Recommender Systems: Review and Open Research Issues” IEEE 2018.