Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: K. Nivas, M. Rajesh Kumar, G. Suresh, T. Ramaswamy, Yerraboina Sreenivasulu
DOI Link: https://doi.org/10.22214/ijraset.2023.48585
The use of machines to perform various tasks is ever increasing in society. By imbuing machines with perception, they can take on a wide variety of tasks, including very complex ones such as aged care. Machine perception requires the machine to understand its surrounding environment and the intentions of its interlocutor, and recognizing facial emotions can help in this regard. In this work, deep learning techniques were applied to images showing the facial emotions happiness, sadness, anger, surprise, disgust, and fear. A pure convolutional neural network approach outperformed the results that other authors obtained with statistical methods involving feature engineering. Convolutional networks learn their own features, which looks very promising for a task where good features are not easy to define by hand. The network was evaluated on two different corpora. The first was used during training and also helped tune parameters and define the network architecture; this corpus consisted of acted (mimicked) emotions. The network that yielded the highest classification accuracy was then tested on the second dataset. Although trained on only one corpus, the network reported promising results on the second dataset, which shows posed rather than genuine facial emotions. The results achieved did not reach the state of the art, but the collected evidence indicates that deep learning may be suitable for facial expression classification. Deep learning therefore has the potential to improve human-machine interaction: the ability to learn features allows machines to develop perception, and through perception a machine can respond more smoothly, greatly improving the user's experience.
I. INTRODUCTION
Automated emotion recognition is a large and important research area that brings together two distinct topics: the psychology of human emotion and artificial intelligence (AI). Human emotional states can be inferred from verbal and non-verbal information captured by various sensors, e.g. facial changes, voice pitch, and physiological signals. In 1967, Mehrabian reported that emotional information is conveyed 55% facially, 38% vocally, and 7% verbally. Facial changes during communication are the first signs that convey an emotional state, so most researchers are very interested in this modality. Extracting features that distinguish one face from another is a difficult and delicate task, and doing it well is key to better classification. In 1978, Ekman and Friesen, among the first to take a scientific interest in facial expression, developed the Facial Action Coding System (FACS), in which facial movements are described by action units (AUs). They decompose the human face into 46 action units, each encoded in one or more facial muscles. According to statistics published by Philipp et al., automated FER is the most researched modality compared with the others, but it is not an easy task, as each person presents emotions in their own way. Several obstacles and challenges in this field cannot be ignored, such as head posture, brightness, age, gender, background changes, and occlusion caused by sunglasses, scarves, skin diseases, and so on. Traditional methods extract geometric and texture features of the face, such as local binary patterns (LBP), facial action units (FACs), local directional patterns (LDP), and Gabor wavelets. In recent years, deep learning, e.g. convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has become a very successful and efficient approach, and researchers have made considerable efforts to apply it to detecting human emotions. This article provides an overview of recent advances in emotion recognition through facial expression recognition using various deep learning architectures.
II. LITERATURE REVIEW, MATERIALS AND METHODOLOGY
A. Literature Review
1. Jones proposed fast multi-view face detection in 2003 in Mitsubishi Electric Research Laboratory technical report TR2003-96.
2. A. T. Lopes proposed in 2017 the use of convolutional neural networks for facial expression recognition, coping with sparse data and the ordering of training samples. Darwin had already described facial expressions in his book The Expression of the Emotions in Man and Animals.
3. R. Jiang worked on pattern recognition in 2017. Facial expression verification is widely used due to its broad applications in affective computing, robot vision, human-machine interaction, and medical diagnosis. With the recent development of the Internet of Things (IoT), there is a need for targeted facial expression verification on mobile devices. In this context, face encryption is proposed to protect privacy during image/video distribution over public networks.
B. Libraries and Terminologies
1. TensorFlow
TensorFlow (TF) is an open-source machine learning software library written in Python and C++. It received a great deal of media coverage, mainly because TF is developed by the Google Brain Team. Google already uses TF to improve tasks across multiple products, including voice recognition in Google Now, search in Google Photos, and Smart Reply in Inbox by Gmail. Several design decisions in TF led to early adoption of the framework by a large community. One of them is the easy transition from prototype to production: code does not have to be compiled or modified for use in production, so the framework is intended not only as a research tool but also as a production tool. Another important design aspect is that no separate API is needed when working with CPUs or GPUs. Additionally, computation can be run on desktops, servers, and mobile devices. A key component of the library is the data flow graph.
The idea of representing mathematical computations with nodes and edges is a hallmark of TF. Nodes are typically mathematical operations, and edges define the input/output relationships between them. Information travels through the graph as tensors, i.e. multidimensional arrays.
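As a minimal sketch of this model (using TensorFlow 2.x eager syntax; the values are illustrative):

```python
import tensorflow as tf

# Nodes are operations; edges carry tensors (multidimensional arrays).
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # 2x2 tensor
b = tf.constant([[0.5], [0.5]])             # 2x1 tensor

# matmul is a single operation node; its inputs and output are tensors.
c = tf.matmul(a, b)
print(c.numpy())  # [[1.5] [3.5]]
```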
2. Pandas
Pandas is a Python library for data analysis. Started by Wes McKinney in 2008 out of a need for a powerful and flexible quantitative analysis tool, it has become one of the most popular Python libraries and has a very active community of contributors.
Pandas is built on top of two core Python libraries: Matplotlib for data visualization and NumPy for mathematical operations. Pandas acts as a wrapper around these libraries, giving access to many Matplotlib and NumPy methods with less code. For example, Pandas' plot() combines multiple Matplotlib methods into a single call, allowing a graph to be drawn in a few lines. Before Pandas was introduced, most analysts used Python for data collection and preparation but switched to more domain-specific languages such as R for the rest of their workflow. Pandas introduced two new types of objects that simplify analysis tasks and eliminate the need to change tools: Series, with a list-like structure, and DataFrames, with a tabular structure.
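A minimal sketch of the two core objects (the data is illustrative):

```python
import pandas as pd

# Series: a labeled, list-like one-dimensional structure.
accuracy = pd.Series([0.55, 0.60, 0.63], index=["epoch_1", "epoch_2", "epoch_3"])

# DataFrame: a tabular structure of named columns.
df = pd.DataFrame({"epoch": [1, 2, 3], "accuracy": [0.55, 0.60, 0.63]})
print(df.describe())              # NumPy-backed summary statistics

df.plot(x="epoch", y="accuracy")  # one call wrapping several Matplotlib methods
```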
3. Random
The Python random module is a built-in Python module used to generate random numbers. These are pseudo-random numbers, not truly random. The module can be used to perform random actions such as generating random numbers or picking random elements from lists or strings.
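A few representative calls (output changes on every run):

```python
import random

print(random.random())        # pseudo-random float in [0.0, 1.0)
print(random.randint(1, 10))  # pseudo-random integer, 1..10 inclusive
print(random.choice(["happy", "sad", "angry"]))  # pick one item from a list

labels = [0, 1, 2, 3]
random.shuffle(labels)        # shuffle a list in place
print(labels)
```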
4. OS
Python's os module provides functions for interacting with the operating system. It is one of Python's standard utility modules and offers a portable way of using operating-system-dependent functionality. The os and os.path modules contain many functions for interacting with the file system.
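A few representative calls (the directory contents depend on where this is run):

```python
import os

print(os.getcwd())                    # current working directory
for name in os.listdir("."):          # entries in a directory
    path = os.path.join(".", name)    # portable path construction
    print(name, os.path.isdir(path))  # file-system queries
```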
5. GLOB
The glob module is a convenient part of the Python standard library. glob (short for global) returns all file paths matching a particular pattern. It can be used to search for files matching a specific pattern, and it becomes even more useful with wildcard characters that match file names against a pattern.
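A minimal sketch with wildcard patterns (the dataset layout shown is an assumption):

```python
import glob

# '*' matches any characters within a single path segment.
image_paths = glob.glob("dataset/train/happy/*.jpg")

# '**' with recursive=True also descends into subdirectories.
all_images = glob.glob("dataset/**/*.jpg", recursive=True)
print(len(image_paths), len(all_images))
```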
6. Keras
Keras is a deep learning API written in Python that runs on top of the TensorFlow machine learning platform. It was designed with a focus on enabling rapid experimentation.
Keras is:
a. Simple: Keras reduces the developer's cognitive load so they can focus on the parts of the problem that really matter.
b. Flexible: Keras follows the principle of progressive disclosure of complexity: simple workflows should be quick and easy, while arbitrarily advanced workflows should be possible via clear paths that build on what has already been learned.
c. High Performance: Keras provides industry-leading performance and scalability. It is used by organizations and companies such as NASA, YouTube, and Waymo.
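A minimal sketch of the Keras workflow (define, compile, fit); the layer sizes and dummy data are illustrative, not this paper's setup:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(16, 4)                # dummy features
y = np.random.randint(0, 2, size=(16,))  # dummy class labels
model.fit(x, y, epochs=2, verbose=0)
```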
7. Max Pooling
Max pooling is a pooling operation that selects the largest element from the region of the feature map covered by the filter. So the output after the max pooling layer will be a feature map containing the most salient features of the previous feature map.
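A worked example of the operation, pooling a 4x4 feature map with a 2x2 window and stride 2 (the values are illustrative):

```python
import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [1, 2, 8, 7],
                 [3, 4, 5, 6]])

# Group the map into 2x2 blocks and keep the maximum of each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [4 8]]
```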
8. Flattening
Flattening is a technique used to transform a multidimensional array into a one-dimensional array. It is commonly used in deep learning to feed feature data to classification layers as a one-dimensional array.
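A small worked example of the reshaping (the sizes are illustrative):

```python
import numpy as np

# A 2x2 stack of 3-channel feature values: 12 numbers in total.
features = np.arange(12).reshape(2, 2, 3)

flat = features.flatten()  # one-dimensional array of the same 12 values
print(flat.shape)          # (12,)
```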
9. Dense Layer
A dense layer is a simple layer of neurons in which each neuron receives input from all neurons in the previous layer, hence the name dense layer. Dense layers are used to classify images based on the output of convolutional layers.
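A worked example of what a single dense layer computes, y = activation(W·x + b), with illustrative sizes:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.1])   # output of the previous layer (3 neurons)
W = np.random.rand(4, 3)        # 4 neurons, each connected to all 3 inputs
b = np.zeros(4)                 # one bias per neuron

y = np.maximum(0.0, W @ x + b)  # ReLU activation applied to the weighted sums
print(y.shape)                  # (4,)
```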
10. Convolution Operation
In mathematics, a convolution operation is defined as a way of mixing two functions. A commonly used analogy is that this operation acts as a filter. The kernel filters out everything that is not important for the feature map and focuses only on specific information.
Two things are required to perform this operation:
a. Input Data
b. Convolution Filter (Kernel)
The result of this operation is a feature map. Using several feature maps (output channels) allows the neural network to learn multiple features: each channel is independent, and each aims to learn a different feature from the image being convolved.
Finally, the padding type defines how the convolution behaves at the borders of the input, where the kernel runs out of values to scan. One type of padding ('valid') discards the edges of the input because there are no neighboring values left to scan; the other ('same') extends the input with zeros so that the spatial size is preserved. Discarding the edges also shrinks the output, which reduces the number of parameters further along the network.
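The Keras sketch below wires these pieces together (convolution, padding, max pooling, flattening, and a dense classification head); the input size and filter counts are illustrative, not the paper's exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),       # e.g. 48x48 grayscale face images
    # 32 kernels -> 32 feature maps; padding="same" pads the borders with
    # zeros, while padding="valid" would discard edge positions instead.
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),           # strongest response per 2x2 region
    layers.Flatten(),                      # feature maps -> 1-D vector
    layers.Dense(7, activation="softmax"), # one output per emotion class
])
model.summary()
```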
11. Train and Test Data Sets
The training dataset and the test dataset are two important concepts in machine learning: the training dataset is used to fit the model, and the test dataset is used to evaluate it. The training data is the largest subset of the original dataset and is used to train or fit the machine learning model; it is fed to the ML algorithm first so that the model can learn to make predictions for the task at hand.
Once the model has been trained on the training dataset, it is tested on the test dataset. This dataset measures model performance and checks that the model generalizes well to new or unseen data. The test dataset is another subset of the original data, independent of the training dataset but with similar feature types and class probability distributions, and it serves as a benchmark for evaluating the model once it is trained. Well-organized test data contains examples of each type of scenario the model would face if deployed in the real world. A test dataset typically accounts for about 20-25% of the total original data in an ML project.
Need to Split the Dataset into Training and Testing Sets
Splitting the dataset into training and testing sets is one of the key parts of data preprocessing. The split produces four arrays (see the sketch after this list):
o x_train: used to represent training data features
o x_test: used to represent test data features
o y_train: used to represent training data dependent variables
o y_test: used to represent the dependent variable for the test data
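A minimal sketch of such a split with scikit-learn's train_test_split (the data here is randomly generated for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)             # feature matrix
y = np.random.randint(0, 7, size=100)  # labels, e.g. 7 emotion classes

# Hold out 20% for testing, matching the 20-25% rule of thumb above.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)     # (80, 5) (20, 5)
```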
C. Methodology
This proposed system is capable of automatic facial expression, or emotion, recognition for the seven emotions thought to be universal across cultures: disgust, anger, fear, happiness, sadness, neutrality, and surprise. The system analyzes facial images and generates computed predictions of the expressed emotion. The approach integrates an automatic face recognition module and uses a training dataset to build a neural network.
How to improve accuracy:
Use deep convolutional neural networks. Three neural network architectures are adapted and trained for different classification tasks. Input image data is provided to the network, the output layer's return values form the final model's prediction matrix, and the maximum value in that matrix is computed. This value represents the emotion predicted for the provided input.
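A sketch of this last step, turning the network's output vector into a predicted emotion (the label order here is an assumption made for illustration):

```python
import numpy as np

labels = ["disgust", "anger", "fear", "happiness",
          "sadness", "neutrality", "surprise"]
probs = np.array([0.02, 0.05, 0.03, 0.70, 0.08, 0.07, 0.05])  # softmax output

prediction = labels[int(np.argmax(probs))]  # index of the maximum value
print(prediction)  # happiness
```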
In this project, research on classifying facial emotion from static facial images using deep learning techniques was carried out. This is a complex problem that has been addressed many times with various techniques. Feature engineering has produced good results, but this project focused on one of the promises of DL: feature learning. The results obtained were not state-of-the-art, but they were slightly better than other techniques involving feature engineering. This suggests that DL techniques can eventually solve this problem, given enough labeled examples. No feature engineering is required, although image preprocessing improves classification accuracy by reducing noise in the input data.
Today, facial emotion recognition software relies on feature engineering. A purely feature-learning-based solution still falls short because of one major limitation: the lack of large datasets of emotions. The ImageNet competition, for example, uses datasets containing far more images. Larger datasets would allow networks with better feature learning capabilities, and in this way deep learning techniques could be used for emotion classification.
This work attempts to give a straightforward account of how emotion recognition with deep learning works. A convolutional neural network model was trained starting from pretrained weights (ResNet50), and computer vision was used as part of the pipeline: Haar cascade is a package OpenCV uses to detect objects such as faces in images. The model was trained on multiple images and evaluated on test images to see how well the results match. The model was trained for 100 epochs. If the model reaches its threshold and training continues, the results become unpredictable and accuracy drops; beyond that point, increasing the number of epochs does not help. The epoch count therefore plays a very important role in determining the accuracy of the model, and its value is found by trial and error. The model achieves an accuracy of 63% on the validation set. To further improve accuracy, either the existing training dataset can be extended or the number of training steps increased; these parameters allow the accuracy of this model to be improved.
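As a hedged sketch of the face-detection step described above, using the Haar cascade file that ships with OpenCV (the image path is a placeholder):

```python
import cv2

# OpenCV bundles pretrained Haar cascade XML files under cv2.data.haarcascades.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("test_image.jpg")            # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the detector works on grayscale
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = gray[y:y + h, x:x + w]  # cropped face; would be resized and fed to the CNN
    print("face found at", (x, y, w, h))
```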
Copyright © 2023 K. Nivas, M. Rajesh Kumar, G. Suresh, T. Ramaswamy, Yerraboina Sreenivasulu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET48585
Publish Date : 2023-01-08
ISSN : 2321-9653
Publisher Name : IJRASET