CNN LSTM Hybrid Approach for Sentiment Analysis

Authors: Mansi Jain, Purvit Vashishtha, Aman Satyam, Smriti Sehgal

DOI Link: https://doi.org/10.22214/ijraset.2023.52191

Abstract

Sentiment analysis has been one of the most popular research topics in recent years. It is used to determine the underlying intent of a text and is primarily concerned with processing and analyzing natural language data. Advances in technology and the rapid growth of social media have produced a vast volume of ambiguous textual information, and it is important to examine the sentiment behind such writing. Sentiment analysis extracts the subjective opinions contained in large volumes of text. The primary objective is to have the computer understand the context of the data so that it can be classified as positive or negative. In this study, (i) several machine learning models, including Naive Bayes, XGBoost, Random Forest, and Light Gradient Boosting Machine (LGBM), are trained; (ii) the deep learning model Bi-LSTM is implemented, and its accuracy shows promise; and (iii) Bidirectional Encoder Representations from Transformers (BERT), a pre-trained language model used together with an external Bi-LSTM model, is implemented. Finally, a new CNN-LSTM hybrid model is applied to the IMDb dataset, where it performed better than all the other models.

I. INTRODUCTION

Nowadays, individuals prefer to base their decisions on recommendations to save time, whether they are purchasing a product or choosing a movie. Understanding customer behavior is crucial for successful marketing, so companies allow customers to leave reviews in order to better understand their decisions. Managing such a massive volume of data, however, is difficult. Sentiment analysis is a practical way to determine whether a product achieves its goal. Besides the benefits that consumers gain from this user-generated material, many business sectors are also using sentiment analysis to examine the preferences of their customers. This is why it is crucial to understand the motivation underlying any text. Figure 1 illustrates three different approaches to sentiment analysis: machine learning methods, BERT, and Bidirectional LSTM, the latter two being deep learning methods.

A. Challenges

Sentiment analysis poses several considerable challenges that must be addressed to obtain the best results.

The majority of the available data is written in English, while other languages are severely underrepresented, which makes analyzing and training on such data troublesome. Relying on previously stored data can also be a problem, because customers' opinions may change over time.

Traditional machine learning algorithms do not perform well on larger datasets: their performance on such datasets is lower than that of deep learning models.

B. Contributions of the Paper

This study uses machine learning, deep learning, and pre-trained models to perform sentiment analysis and determines the most effective model among these approaches.

The machine learning models implemented in this paper are Random Forest, LGBM, XGBoost, Naïve Bayes, Gradient Boosting, and Decision Tree; they are compared using accuracy and receiver operating characteristic (ROC) scores.

Deep learning techniques are employed to develop a Bidirectional Long Short-Term Memory (Bi-LSTM) model, a specialized type of Recurrent Neural Network (RNN). Bi-LSTM networks have demonstrated strong capabilities and surpass traditional machine learning models in handling long-term dependencies.

BERT, a pre-trained model, is also used; it performed better than the Bi-LSTM model on both datasets.

A hybrid CNN-LSTM model is applied to the IMDb dataset, where it performed better than all the other models trained.

C. Approaches Considered in the Paper

1. Machine Learning Based Approach

Figure 2 illustrates the three types of machine learning-based methodologies: supervised, unsupervised, and semi-supervised learning. In supervised learning, the documents are explicitly labelled. Unsupervised learning groups texts that are neither categorized nor tagged [2]. Semi-supervised learning, in contrast, trains a model using a large volume of unlabeled data together with a small amount of labelled data [2]. In the early days of machine learning, sarcastic language created a lot of uncertainty because the models could not discern the meaning intended by such phrases. Negation detection was another obstacle that machine learning was unable to handle. Machine learning algorithms alone were therefore unable to resolve all of these problems.

2. BERT Approach

BERT, which stands for Bidirectional Encoder Representations from Transformers, is an advanced deep learning technique used in natural language processing (NLP). Created by Google AI Language, BERT is a neural network model that uses the Transformer architecture to learn the contextual relationships among words in a text. Unlike conventional NLP models that process text sequentially in a single direction, BERT is a bidirectional model that takes the entire input sentence or paragraph into account to generate context-sensitive word embeddings. BERT is pre-trained on massive volumes of text and then fine-tuned on specific NLP tasks such as text classification, question answering, and named entity recognition. It has achieved strong results on a diverse range of NLP benchmarks and has become a common choice for many NLP applications.

3. Bidirectional LSTM Model Approach

Bidirectional Long Short-Term Memory (Bi-LSTM) is a neural network architecture that has proven highly effective in modeling sequential data. In contrast to conventional LSTMs, which only consider past inputs, Bidirectional LSTMs use a dual recurrent structure that takes both past and future context into account. This bidirectional approach lets them capture long-range dependencies and intricate patterns in the input sequence. Bidirectional LSTMs employ a gating mechanism that regulates the flow of information and enables them to selectively remember or forget information based on its relevance to the task at hand. The result is a robust model that can efficiently process and interpret complex sequential data. Because of their strong performance and versatility, Bidirectional LSTMs are widely used in natural language processing, speech recognition, and time series forecasting.

4. CNN-LSTM Model

This hybrid model is trained on the IMDb dataset, which contains English and French text, because of the large amount of data it offers. More data gives the model more information and therefore leads to a more generalized model.

II. RELATED WORK

Depending on the dataset and the problem specification, a range of machine learning and state-of-the-art deep learning techniques are available for natural language processing and sentiment analysis. Several academic publications have built customized algorithms simply by stacking various classifiers and fine-tuning them.

Below is a discussion of a few of them:

In paper [2], Marta Fernandes introduced a technique for aiding triage medical professionals in patient stratification and in identifying patients at increased risk of ICU admission. Stratified random sampling was used to divide the data into training (70%) and testing (30%) sets, and the models were trained using 10-fold cross-validation. Of the three models compared (logistic regression, random forests, and a random under-sampling boosting technique), logistic regression performed best.

In his research [3], Joshua Acosta used Google's Word2Vec model to perform sentiment analysis on tweets mentioning US airline companies. The model's word embeddings are frequently used to capture nuance by representing words as high-dimensional vectors, which are subsequently classified using machine learning methods. The Word2Vec-based approach performed best in terms of accuracy (72%), followed by the support vector machine and logistic regression models.

According to the study by Arman S. Zharmagambetov [5], Word2Vec is a vector-based encoding of words that preserves semantic relationships between them, which can be expressed through basic linear algebraic operations. In terms of computational performance, this approach outperforms the alternatives. The authors concluded that deep learning outperformed the Bag of Words model, although only by a small margin, which may have been due to the noise and word-grouping inaccuracy introduced during the pre-processing stage.

In her study [6], Monisha Kanakaraj used word-sense disambiguation and semantic analysis as natural language processing (NLP) techniques to substantially increase classification accuracy. The extracted linguistic data is passed to ensemble-based processing in order to evaluate sentiment. Ensemble classification combines the effect of several individual classifiers on a given classification task. Tests showed that the ensemble classifier outperforms traditional machine learning classifiers by three to five percent, owing to the greater variability used in estimating the partitions during the selection of feature-vector subsets.

In his article [7], Jeevan Anil Phand used Twitter data to conduct sentiment analysis. Tweets were first extracted and then categorized with Stanford NLP as positive, negative, or neutral. On some datasets, such as the India vs. Pakistan (Match) data, the Stanford NLP technique made better predictions, with an accuracy of roughly 100% as opposed to 89% for the Amazon data.

In paper [4], an LSTM classifier, a type of Recurrent Neural Network (RNN), was employed to examine the sentiment expressed in movie reviews on IMDb. The data was appropriately processed and divided to enhance performance. The results show a top classification accuracy of 89.9%, indicating that incorporating the suggested technique into current text-based sentiment analysis is a promising approach.

N Sriram, in paper [8], proposed a recurrent neural network language model, namely LSTM, which is able to retain and forget information because of the gates used in its architecture. Sentiment analysis is performed on a US Airline dataset with two classes, positive and negative; with enhanced sentiment analysis, a third class, neutral, could be introduced in the future.

Although deep learning models performed significantly better than machine learning models, they remain prone to overfitting and to poor performance on test or validation data. To overcome this obstacle, we turn to hybrid neural network models.

III. METHODOLOGY

A. Dataset and Preprocessing

The datasets used were the Amazon Reviews dataset and the IMDb Review dataset. Both were created by combining data from different sources. The Amazon Reviews data consists of 10,000 reviews in English and German, with three feature columns: text, sentiment, and title. In this research, the text column, which contains the full review of the product, is used.

The IMDb dataset contains 75,000 user reviews of various movies in English and French and consists of two columns: review and sentiment. For both datasets, the task is a binary classification problem with two classes, positive and negative, encoded as 1 and 0 respectively.

Although reviews for a product or movie range from 1 to 5 stars, the metadata provided with both datasets states that the reviews have already been grouped into the two classes, positive and negative.

1. Data Cleaning

The initial step in training a model involves data cleaning, which aims to eliminate redundant words and phrases from texts. The objective of this process is to enhance the machine learning model's performance by removing unnecessary elements from the data.

For example, the raw text: “#5 star is My review regarding the movie Titanic! which I watched @ hall/cinema.</p>”

The following items must be eliminated at this stage:

Punctuation: redundant punctuation is removed. After this step: “5 star is My review regarding the movie Titanic which I watched hall cinema </p>”
HTML tags and emojis: HTML tags and certain emojis are removed from the text, and uppercase letters are converted to lowercase. After this step: “5 star is my review regarding the movie titanic which i watched hall cinema”
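The cleaning steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the function name `clean_text`, the regular expressions, and the emoji ranges are assumptions. Unlike the order in the list, the sketch strips HTML tags before punctuation, because removing punctuation first would destroy the angle brackets that identify a tag.

```python
import html
import re
import string

# Matches HTML tags such as </p>.
TAG_RE = re.compile(r"<[^>]+>")
# Rough emoji ranges; an assumption, not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_text(raw: str) -> str:
    """Remove HTML tags, emojis, and punctuation, then lowercase the text."""
    text = html.unescape(raw)          # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)       # drop HTML tags
    text = EMOJI_RE.sub(" ", text)     # drop emojis in the assumed ranges
    text = text.translate(PUNCT_TABLE)  # replace punctuation with spaces
    text = text.lower()
    return " ".join(text.split())      # collapse repeated whitespace


raw_review = "#5 star is My review regarding the movie Titanic! which I watched @ hall/cinema.</p>"
cleaned = clean_text(raw_review)
print(cleaned)
```

Replacing punctuation with a space rather than deleting it keeps "hall/cinema" as two separate words, which matches the example in the text.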

After these steps, pre-processing is applied to the cleaned text data.

2. Text Pre-processing

Text pre-processing is also a crucial step in natural language processing, because a machine learning model cannot work with raw text, which must be transformed into numerical data. The pre-processing steps are:

Lemmatization: The lemmatizer from the nltk.stem.wordnet library is used to reduce inflected words to their base forms, which removes tense from sentences (for example, "watched" becomes "watch"). Unlike stemming, lemmatization returns valid dictionary words, although it is generally the slower of the two. After this step: “5 star review regarding movie titanic watch hall cinema.”
TF-IDF Vectorizer: TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical technique in natural language processing. It is applied to the cleaned text column, after the data has been separated into training and testing sets, to tokenize the text and generate word-frequency scores. The TF-IDF vectorizer takes a corpus (an array of texts) as input and weights each unique word by its frequency within a document, scaled by how common the word is across all documents in the corpus. The output is an array containing, for each document, a value for every word in the corpus vocabulary.
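To make the TF-IDF weighting concrete, here is a small standard-library sketch. It uses the textbook variant, tf = count / document length and idf = ln(N / df), which is an assumption: library vectorizers usually apply smoothing and normalization, so their numbers differ. The function names and the toy corpus are hypothetical. The inverse document frequencies are fitted on the training texts only, mirroring the train/test separation described above.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute idf = ln(N / df) for every term seen in the training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc.split()))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Return a sparse {term: tf * idf} mapping; terms unseen in training are ignored."""
    tokens = doc.split()
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


# Hypothetical toy corpus standing in for cleaned review text.
train = ["great movie great cast", "boring movie", "great plot"]
idf = fit_idf(train)
vec = tfidf_vector("great boring movie", idf)
print(vec)
```

In this toy corpus "boring" appears in only one training document, so it receives a higher weight than "great" or "movie", which each appear in two.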

B. Implementation using Machine Learning

Ensemble methods, such as stacking, are meta-algorithms that combine two or more machine learning models into a single predictive model in order to reduce variance or bias, or to improve predictions. Such a combination typically achieves better prediction performance than a single model. In this paper, the three machine learning models with the best ROC scores after training on both datasets are stacked. After preprocessing and applying TF-IDF to the text column, the stacked model is trained on the training set and then tested on the testing set.
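The selection step, choosing the three models with the best ROC scores, can be illustrated as follows. This sketch covers only the ranking, not the stacking itself. It computes the area under the ROC curve through its pairwise interpretation (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The model names and scores are hypothetical placeholders, not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is O(n^2) and is
    meant for illustration on small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_models(labels, model_scores, k=3):
    """Return the names of the k models with the highest ROC AUC."""
    ranked = sorted(
        model_scores,
        key=lambda name: roc_auc(labels, model_scores[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities of the positive class on four reviews.
labels = [1, 0, 1, 0]
model_scores = {
    "model_a": [0.9, 0.1, 0.8, 0.3],
    "model_b": [0.6, 0.7, 0.8, 0.2],
    "model_c": [0.4, 0.6, 0.3, 0.7],
    "model_d": [0.5, 0.5, 0.9, 0.1],
}
selected = top_k_models(labels, model_scores)
print(selected)
```

The selected models would then serve as the base learners of the stacked ensemble.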

C. Implementation using Bidirectional Long Short-Term Memory

The Bidirectional LSTM (shown in Figure 8) is an enhanced version of the gated recurrent neural network model. It builds on the bidirectional RNN [10], which processes sequential data in both the forward and backward directions using separate hidden layers, as the figure illustrates. In a Bidirectional LSTM, the two hidden layers are connected to the same output layer. Bidirectional networks have been shown to outperform unidirectional networks in several domains.
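The two-direction idea can be sketched with a scalar LSTM cell written in plain Python. This is a toy illustration under strong assumptions: inputs and hidden states are single numbers, the weights are arbitrary hypothetical values, and nothing is trained. A real model would use vector-valued states from a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_lstm(seq, w):
    """Run the cell over a sequence and return the hidden state at each step."""
    h, c, outputs = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


def run_bilstm(seq, w_fwd, w_bwd):
    """Pair forward and backward hidden states for every position."""
    fwd = run_lstm(seq, w_fwd)
    bwd = run_lstm(seq[::-1], w_bwd)[::-1]  # reverse back to align positions
    return list(zip(fwd, bwd))


# Hypothetical weights and inputs, chosen only for demonstration.
weights = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
sequence = [1.0, -0.5, 0.25]
states = run_bilstm(sequence, weights, weights)
print(states)
```

At the first position the forward state has seen only the first input, while the backward state has already processed the whole sequence; combining both gives every position access to past and future context.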

D. Implementation using Bidirectional Encoder Representations from Transformers (BERT)

BERT is a pre-trained language model that provides context for words, learned beforehand from an unannotated training corpus. There are many variants of the BERT model; in this paper, the DistilBERT base model is used. DistilBERT is a distilled version of BERT: a small, light, and fast transformer model with fewer parameters than bert-base-uncased. It was pre-trained with the following objectives:

Distillation loss: The model was trained to produce the same outputs as the BERT base model.
Masked language modelling (MLM): Given a sentence, the model masks 15% of the input words at random, runs the full masked text through the network, and has to predict the masked words. This differs from standard recurrent neural networks (RNNs), which typically process the words sequentially, and it enables the model to learn a bidirectional representation of the sentence.
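The masking step of MLM can be sketched as below. This is a simplified illustration: it masks a fixed 15% of the positions and always substitutes a `[MASK]` token, whereas the original BERT procedure also leaves some selected tokens unchanged or swaps them for random words. The function name, the seed parameter, and the sample sentence are assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Mask roughly mask_prob of the tokens; return the masked list and the targets.

    targets maps each masked position to the original word the model must predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded for reproducibility
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


tokens = "the plot was thin but the acting and the score kept me watching until the very last scene of it".split()
masked, targets = mask_tokens(tokens)
print(" ".join(masked))
print(targets)
```

Because the model sees the words on both sides of each `[MASK]`, predicting the targets forces it to use left and right context together.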

An input sequence is given to the BERT model, which first generates contextualized embeddings through its encoder layers. The embedding generated for each token serves as the feature representation of that token and is passed to a decoder that predicts the class, whether the classification task is binary or multiclass.

E. The Proposed Hybrid CNN-LSTM Model

The proposed hybrid model is described in detail in this section. The IMDb review dataset is chosen because its large number of samples allows better training and testing. The first step of model training is data cleaning, in which unnecessary punctuation, HTML tags, certain emojis, articles, and conjunctions are removed and uppercase letters are converted to lowercase. Next, tokenization breaks the raw text into small units. An embedding layer follows the tokenizer and converts each word into a fixed-length vector of a defined size. A batch normalization layer is then used to stabilize training by re-scaling and re-centring the inputs to the following layer. A convolutional layer performs feature extraction, and its output feeds a max-pooling layer, which performs feature reduction. Then, an LSTM layer is used to obtain a sequence output rather than a single-valued output. After this, a dense layer with the ReLU activation function is added to generalize the output of the LSTM layer, and finally a dense layer with the sigmoid activation function classifies each review into one of the two classes, 0 or 1.
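The feature extraction and feature reduction stages can be illustrated with a plain-Python sketch of a one-dimensional convolution followed by max pooling. It covers only these two stages of the pipeline and makes several assumptions: a single filter, a ReLU activation on the convolution (the text does not name one), and toy two-dimensional embeddings with hand-picked weights.

```python
def conv1d(seq, kernel, bias=0.0):
    """Valid 1D convolution with one filter, followed by ReLU.

    seq is a list of embedding vectors (one per token); kernel is a list of
    k weight vectors, one per position inside the sliding window.
    """
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        total = bias
        for offset in range(k):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        out.append(max(0.0, total))  # ReLU (assumed activation)
    return out


def max_pool1d(features, pool_size=2):
    """Keep the maximum of each non-overlapping window of pool_size values."""
    return [
        max(features[i:i + pool_size])
        for i in range(0, len(features) - pool_size + 1, pool_size)
    ]


# Hypothetical embeddings for a six-token review and one filter of width 3.
embedded = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
kernel = [[1, 0], [0, 1], [1, -1]]

features = conv1d(embedded, kernel)  # 6 tokens, width 3 -> 4 features
pooled = max_pool1d(features)        # pool size 2 -> 2 features
print(features, pooled)
```

The pooled sequence is shorter than the token sequence, so the LSTM layer that follows processes fewer time steps while still receiving the strongest local features.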

The sigmoid function is a logistic function whose output ranges from 0 to 1, as defined by the formula below:

\sigma(x) = \frac{1}{1 + e^{-x}}

Conclusion

In this paper, machine learning models such as Light Gradient Boosting Machine (LGBM), Naïve Bayes, Random Forest, Linear Support Vector Classifier (SVC), Decision Tree, AdaBoost, Gradient Boosting, and Extreme Gradient Boosting (XGBoost) are trained, along with an ensemble model that combines Light Gradient Boosting Machine, Linear Support Vector Classifier, and Random Forest. The ensemble model outperformed all the individual machine learning models. For diversity, two multilingual datasets are used: the Amazon review dataset, which contains reviews in English and German, and the IMDb review dataset, which contains reviews in English and French. The Bi-LSTM model achieved an accuracy of 85.5% and a ROC score of 0.92 on the IMDb review dataset, and an accuracy of 77% and a ROC score of 0.81 on the Amazon review dataset. The BERT approach achieved an accuracy of 0.86 and a ROC score of 0.935 on the IMDb dataset, and an accuracy of 0.79 and a ROC score of 0.86 on the Amazon dataset, which are better than the Bi-LSTM results. To classify IMDb reviews, a hybrid approach built from CNN and LSTM layers can be used. The experiments showed that our proposed deep learning model, built on a combination of CNN and LSTM layers, beat all the other models with about 90% accuracy and a ROC score of 0.96, demonstrating the effectiveness of the approach. This method could greatly help in making use of others' opinions by separating positive from negative feedback, in order to better understand the preferences and interests of individuals from varied backgrounds and to strengthen the link between consumers and enterprises. We have not applied the hybrid model to the Amazon dataset because of its small size.
For future work, accuracy could be improved through tuning, by adding methods such as dropout to avoid overfitting, and by trying different optimizers such as SGD and Adam. The proposed model can also be trained on multi-class sentiment analysis problems with a slight modification of the last layer, namely using the softmax function instead of the sigmoid function.

References

[1] E. Aydoğan and M. A. Akcayol, "A comprehensive survey for sentiment analysis tasks using machine learning techniques," 2016 International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), 2016, pp. 1-7, doi: 10.1109/INISTA.2016.7571856.
[2] Fernandes, Marta & Sun, Haoqi & Jain, Aayushee & Alabsi, Haitham & Brenner, Laura & Ye, Elissa & Ge, Wendong & Collens, Sarah & Leone, Michael & Das, Sudeshna & Robbins, Gregory & Mukerji, Shibani & Westover, M Brandon. (2020). Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing (Preprint). 10.2196/preprints.25457.
[3] Acosta, Joshua, Norissa Lamaute, Mingxiao Luo, Ezra Finkelstein and Andreea Cotoranu. “Sentiment Analysis of Twitter Messages Using Word 2 Vec.” (2017).
[4] S. M. Qaisar, "Sentiment Analysis of IMDb Movie Reviews Using Long Short-Term Memory," 2020 2nd International Conference on Computer and Information Sciences (ICCIS), 2020, pp. 1-4, doi: 10.1109/ICCIS49240.2020.9257657.
[5] A. S. Zharmagambetov and A. A. Pak, "Sentiment analysis of a document using deep learning approach and decision trees," 2015 Twelve International Conference on Electronics Computer and Computation (ICECCO), 2015, pp. 1-4, doi: 10.1109/ICECCO.2015.7416902.
[6] Kanakaraj, Monisha & Guddeti, Rammohana Reddy. (2015). Performance analysis of Ensemble methods on Twitter sentiment analysis using NLP techniques. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing, IEEE ICSC 2015. 169-170. 10.1109/ICOSC.2015.7050801.
[7] Phand, S.A., & Phand, J.A. (2017). Twitter sentiment classification using stanford NLP. 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM), 1-5.
[8] R. Monika, S. Deivalakshmi and B. Janet, "Sentiment Analysis of US Airlines Tweets Using LSTM/RNN," 2019 IEEE 9th International Conference on Advanced Computing (IACC), 2019, pp. 92-95, doi: 10.1109/IACC48062.2019.8971592.
[9] P. Vateekul and T. Koomsubha, "A study of sentiment analysis using deep learning techniques on Thai Twitter data," 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), 2016, pp. 1-6, doi: 10.1109/JCSSE.2016.7748849.
[10] Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. (2020) Predicting Intensive Care Unit admission among patients presenting to the emergency department using machine learning and natural language processing. PLoS ONE 15(3): e0229331. https://doi.org/10.1371/journal.pone.0229331
[11] Szlosek, Donald A, and Jonathan Ferrett. “Using Machine Learning and Natural Language Processing Algorithms to Automate the Evaluation of Clinical Decision Support in Electronic Medical Record Systems.” EGEMS (Washington, DC) vol. 4,3 1222. 10 Aug. 2016, doi:10.13063/2327-9214.1222
[12] Abu Kwaik, Kathrein & Saad, Motaz & Chatzikyriakidis, Stergios & Dobnik, Simon. (2019). LSTM-CNN Deep Learning Model for Sentiment Analysis of Dialectal Arabic. 10.1007/978-3-030-32959-4_8.
[13] Ghourabi, Abdallah, Mahmood A. Mahmood, and Qusay M. Alzubi. 2020. "A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English Messages" Future Internet 12.
[14] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.

Copyright

Copyright © 2023 Mansi Jain, Purvit Vashishtha, Aman Satyam, Smriti Sehgal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET52191

Publish Date : 2023-05-13

ISSN : 2321-9653

Publisher Name : IJRASET