Advancements in Emotion Detection: A Comprehensive Review of Text and Audio-Based Approaches

Authors: Arjun Raj K, Muhammed Afthab Aslam, Sneha Sreenivasan Panatte, Anjana K Das, Muneebah Mohyiddeen

DOI Link: https://doi.org/10.22214/ijraset.2024.59451

Abstract

Emotion detection from text and audio has garnered significant attention in recent years due to its applications in domains such as mental health support, conversational agents, and stress detection. In this review, we analyze methodologies and results reported in recent work in this field. The reviewed papers cover a range of topics, including machine learning models for emotion detection from speech, deep learning methods for text-based emotion classification, and the integration of AI in healthcare and education. Traditional machine learning algorithms such as logistic regression and SVM are compared with deep learning architectures like CNNs and RNNs/LSTMs for their effectiveness in emotion detection tasks. Spectral and acoustic features for speech emotion detection are examined alongside text-based emotion detection approaches utilizing lexicon-based methods and machine learning techniques. Furthermore, the impact of voice assistant personality on user attitudes and behavior is explored, shedding light on the importance of designing emotionally intelligent AI systems. Additionally, challenges and ethical considerations in deploying AI-driven solutions in mental health and healthcare settings are discussed. Through this comprehensive review, we provide insights into the current state of emotion detection research, highlighting trends, challenges, and future directions in the field.

I. INTRODUCTION

Emotion detection from text and audio has emerged as a crucial area of research with wide-ranging applications spanning mental health support, conversational agents, and stress detection systems. Understanding and interpreting human emotions from linguistic and acoustic cues can significantly enhance human-computer interaction and facilitate personalized services. In recent years, advancements in machine learning and deep learning techniques have revolutionized the field, enabling more accurate and efficient emotion detection models.

In this paper, we present a comprehensive review of methodologies and results in emotion detection from text and audio. The reviewed studies encompass a diverse range of topics, including the utilization of machine learning models for emotion classification from speech, the exploration of deep learning architectures for text-based emotion analysis, and the integration of artificial intelligence in the healthcare and education sectors to support emotional well-being.

The review compares traditional machine learning algorithms such as logistic regression, decision trees, and support vector machines (SVM) with state-of-the-art deep learning architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with long short-term memory (LSTM) cells. Additionally, the effectiveness of various feature extraction techniques, including spectral and acoustic features for speech emotion detection, is examined in detail.

Furthermore, the review sheds light on the challenges associated with deploying emotion detection systems in real-world scenarios, including issues related to data privacy, ethical considerations, and model interpretability. The impact of voice assistant personality on user attitudes and behavior is also explored, emphasizing the importance of designing emotionally intelligent AI systems to enhance user experience and engagement.

By synthesizing insights from these research papers, this review aims to provide a comprehensive overview of the current state of emotion detection research, identify emerging trends, and outline potential future directions in the field. Through this analysis, we seek to contribute to the advancement of emotion-aware computing systems and their applications in various domains, ultimately fostering more empathetic and user-centric technology solutions.

II. RELATED WORK

In the realm of automatic emotion detection from text, various methodologies have been explored extensively to address the nuances of human expression. One prominent approach involves leveraging a diverse array of machine learning (ML) techniques such as logistic regression, decision trees, and support vector machines (SVM), among others. These algorithms are often employed alongside ensemble methods like random forest and AdaBoost, enabling the extraction of intricate patterns from datasets such as DailyDialog, which typically contain a rich spectrum of emotions categorized into distinct classes [1]. Furthermore, to meet the demand for more nuanced and accurate emotion classification, deep learning (DL) methods have emerged as powerful alternatives. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs)/long short-term memory (LSTM) networks, in particular, have demonstrated remarkable efficacy in capturing complex patterns and dependencies in textual data, thereby enhancing the precision of emotion detection models [1]. These sophisticated approaches are complemented by the robust ecosystem of Python-based libraries like Scikit-Learn, TensorFlow/Keras, NLTK, and NumPy, which offer extensive support for various stages of the machine learning pipeline, including data preprocessing, model training, and result visualization [1].
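As a rough illustration of the classical ML route described above, the following minimal sketch trains a binary logistic regression classifier on bag-of-words counts using plain stochastic gradient descent. It is not the pipeline from [1]: the six training sentences, the two-class labelling (positive versus negative emotion), and all function names are assumptions made for this example, and a real system would use a multi-class corpus such as DailyDialog together with a library such as Scikit-Learn.

```python
import math
from collections import Counter

# Toy, hand-written training set (hypothetical; not taken from DailyDialog).
# Label 1 = positive emotion, 0 = negative emotion.
EMOTION_SAMPLES = [
    ("i am so happy today", 1),
    ("what a wonderful surprise", 1),
    ("i love this", 1),
    ("i am so sad today", 0),
    ("this is terrible", 0),
    ("i hate this", 0),
]

def featurize(text):
    """Bag-of-words counts over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def predict_proba(weights, bias, text):
    """Probability of the positive class under the logistic model."""
    z = bias + sum(weights.get(tok, 0.0) * n for tok, n in featurize(text).items())
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            # Gradient of the log loss with respect to the linear score z.
            error = predict_proba(weights, bias, text) - label
            bias -= lr * error
            for tok, n in featurize(text).items():
                weights[tok] = weights.get(tok, 0.0) - lr * error * n
    return weights, bias

weights, bias = train(EMOTION_SAMPLES)
p_pos = predict_proba(weights, bias, "happy wonderful")
p_neg = predict_proba(weights, bias, "sad terrible")
print(f"P(positive | 'happy wonderful') = {p_pos:.3f}")
print(f"P(positive | 'sad terrible')    = {p_neg:.3f}")
```

Words that occur only in positive sentences end up with positive weights and words that occur only in negative sentences with negative weights, so the first probability comes out higher than the second. Words shared by both classes, such as "i" or "today", carry little signal, which is one reason practical systems weight terms (for example with TF-IDF) before training.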

In the domain of speech emotion detection, spectral and acoustic features play a pivotal role in capturing the nuances of human expression. Mel-frequency cepstral coefficients (MFCC), for instance, provide a compact representation of speech signals by capturing frequency bands that are perceptually relevant to human hearing. Similarly, linear prediction cepstral coefficients (LPCC) and the fast Fourier transform (FFT) offer complementary insights into the spectral characteristics of speech, enabling the extraction of distinctive features associated with various emotional states [2]. Notably, MFCC is reported to remain robust in noisy environments, which supports the use of emotion detection systems in real-world scenarios. LPCC, on the other hand, emphasizes the spectral components corresponding to vowel sounds, which are often rich in emotional cues, while linear predictive coding (LPC) aids in reconstructing speech signals from their spectral representations, facilitating a more accurate analysis of emotional content [2]. By combining these spectral and acoustic features, researchers aim to enhance the accuracy and reliability of speech emotion detection systems, paving the way for applications in diverse domains ranging from human-computer interaction to mental health monitoring.
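To make the "perceptually relevant frequency bands" behind MFCC more concrete, the sketch below converts between hertz and the mel scale and places filter centers evenly on that scale. It uses one widely quoted formulation of the mel mapping, m = 2595 log10(1 + f/700); whether [2] relies on this exact variant is an assumption, and the helper names and the 0 to 8000 Hz range are illustrative choices. A full MFCC pipeline would additionally frame the signal, apply the FFT, pass the spectrum through triangular mel filters, take logarithms, and apply a discrete cosine transform.

```python
import math

def hz_to_mel(freq_hz):
    """Map a frequency in Hz to mels (one common formulation)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(n_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * i) for i in range(1, n_filters + 1)]

# Ten filter centers between 0 Hz and 8 kHz (illustrative range).
centers = mel_filter_centers(n_filters=10, low_hz=0.0, high_hz=8000.0)
print([round(c) for c in centers])
```

Because the spacing is uniform in mels, the centers sit close together at low frequencies and spread out at high frequencies, mirroring the finer resolution of human hearing in the lower range.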

Moreover, the realm of emotion detection from text presents its own set of challenges and methodologies. Here, researchers explore a spectrum of approaches, including lexicon-based and machine-learning paradigms, to decipher the emotional content embedded within textual data. Lexicon-based methods rely on curated resources such as lexicons and ontologies, which map words to their associated emotions based on predefined rules or semantic relationships. These methods offer a structured framework for analyzing text but may face limitations in capturing context-dependent nuances of emotion [3]. In contrast, machine-learning approaches leverage statistical methods such as Latent Semantic Analysis (LSA) to discern underlying patterns in textual data, enabling more nuanced and context-aware emotion detection. Within the machine-learning paradigm, both supervised and unsupervised learning techniques find application, with supervised approaches relying on labeled datasets to train emotion detection models, while unsupervised methods uncover hidden structures and patterns in unlabelled data [3]. Examples of supervised classifiers include support vector machines (SVM) and Naïve Bayes, which excel in categorical emotion models by learning from labeled examples, while LSA offers a dimensionality reduction technique that can effectively capture semantic relationships in textual data, contributing to more accurate emotion analysis.
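The lexicon-based idea, and its difficulty with context, can be seen in a few lines. The sketch below is a minimal example under stated assumptions: the nine-word lexicon, the emotion categories, and the function names are invented for illustration and do not correspond to any specific resource surveyed in [3].

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; real systems rely on
# curated lexicons or ontologies with thousands of entries.
LEXICON = {
    "happy": "joy", "delighted": "joy", "excited": "joy",
    "sad": "sadness", "lonely": "sadness",
    "angry": "anger", "furious": "anger",
    "afraid": "fear", "scared": "fear",
}

def emotion_counts(text):
    """Count how many tokens in the text map to each emotion."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)

def dominant_emotion(text):
    """Return the most frequent emotion, or 'neutral' if no word matches."""
    counts = emotion_counts(text)
    return counts.most_common(1)[0][0] if counts else "neutral"

print(dominant_emotion("I was delighted and excited, though a little scared."))
print(dominant_emotion("I am not happy about this."))
print(dominant_emotion("The meeting starts at noon."))
```

The second call returns "joy" even though the sentence is negated, because the lookup treats each word in isolation. This is the kind of context-dependent nuance that motivates the machine-learning approaches discussed in the same paragraph.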

The integration of AI-driven conversational agents and virtual counselors into healthcare and mental health support systems represents a significant advancement in leveraging technology to address critical emotional and psychological needs. These systems, as highlighted in [4], harness the power of emotion detection from both text and speech inputs to offer personalized interventions and real-time emotional support to individuals seeking assistance. By analyzing linguistic cues and vocal inflections, these AI-driven agents can discern the user's emotional state and tailor their responses accordingly, providing empathetic and contextually relevant guidance [5]. This integration not only extends the reach of mental health services but also helps overcome barriers such as stigma and accessibility, making support more readily available to those in need.

Furthermore, research efforts have delved into understanding the impact of voice assistant personality on user attitudes and behaviors, shedding light on the importance of designing emotionally intelligent AI systems [6]. In the context of healthcare and mental health support, the personality and demeanor of AI-driven conversational agents can significantly influence user engagement, trust, and receptiveness to the provided assistance. By imbuing these systems with traits that evoke warmth, empathy, and understanding, developers can foster stronger connections with users, enhancing the overall effectiveness of the support provided [7]. This highlights the need for a holistic approach to AI design, considering not only functional capabilities but also the emotional resonance of the system with its users.

Moreover, AI-based systems have found application in stress detection and classification, particularly in occupational settings such as IT workplaces, as discussed in [8]. By leveraging machine learning techniques like the K-Nearest Neighbour (KNN) algorithm, these systems can analyze various data inputs, including physiological signals, behavioral patterns, and textual communications, to infer the presence and severity of stress among employees. This proactive approach to stress management enables organizations to identify at-risk individuals early on and implement targeted interventions to mitigate stress and promote well-being. Additionally, the utilization of AI-driven stress detection systems underscores the growing recognition of the importance of mental health in the workplace and the role of technology in supporting employee well-being.
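A K-Nearest Neighbour classifier of the kind mentioned above can be sketched in a few lines. The example is illustrative only: the two features (resting heart rate and daily working hours), the six labelled samples, and the names `STRESS_SAMPLES` and `knn_predict` are assumptions made here and are not taken from [8].

```python
import math
from collections import Counter

# Hypothetical training data:
# (resting heart rate in bpm, hours worked per day) -> stress label.
STRESS_SAMPLES = [
    ((65.0, 7.0), "low"),
    ((70.0, 8.0), "low"),
    ((68.0, 6.0), "low"),
    ((95.0, 11.0), "high"),
    ((100.0, 12.0), "high"),
    ((92.0, 10.0), "high"),
]

def knn_predict(samples, query, k=3):
    """Label a query point by majority vote among its k nearest samples."""
    # Sort by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(samples, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(STRESS_SAMPLES, (98.0, 11.5)))
print(knn_predict(STRESS_SAMPLES, (66.0, 7.5)))
```

The features here are left unscaled for brevity, so heart rate dominates the distance. In practice each feature would typically be standardized first so that no single measurement outweighs the others simply because of its units.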

In the realm of education and mental health support, innovative AI architectures like ESCAP [9] strive to alleviate exam stress by delivering timely stress management strategies based on the analysis of speech prosody cues. Similarly, AI-powered chatbots such as I-BOT [10] are designed with a focus on stress relief, employing natural language processing (NLP) techniques and Dialogflow integration to understand user queries and provide tailored positive responses. By leveraging these technologies, users can receive personalized solutions that cater to their specific needs, contributing to enhanced well-being and academic performance.

Furthermore, therapy chatbot applications, as explored in [11], extend support beyond stress relief to address a myriad of personal problems including relationship issues and career/job challenges. These applications utilize supervised learning methodologies to generate responses, ensuring that users receive meaningful and empathetic interactions. Meanwhile, in the field of psychiatry, AI's potential in disease detection, understanding disease progression, and treatment discovery has been underscored [12]. However, as highlighted in [13], challenges such as maintaining patient autonomy and addressing ethical concerns surrounding privacy and trust must be carefully navigated to ensure responsible deployment and utilization of AI-driven solutions in mental healthcare.

The integration of artificial intelligence (AI) into psychiatry has garnered increasing attention. One comprehensive overview traced the evolution of AI from its inception by John McCarthy to its current applications in medical domains, including mental health [14]. While AI has made significant strides in disease detection and diagnosis in physical health, its utilization in mental health remains relatively limited. This discrepancy stems from the inherent challenges in replicating softer skills crucial for mental health diagnosis and treatment, such as rapport building and behavioral observation [15]. Despite these challenges, AI holds considerable potential to redefine mental health diagnosis by supporting pre-diagnostic screening tools and risk models and by aiding the understanding of mental illnesses, including affective disorders like depression, through techniques like EEG analysis.

Voice conversion technology has emerged as a promising area within AI research, offering potential applications in emotional expression and communication. One notable study presented a Duration Controllable Voice Conversion (DCVC) model, which comprises a content encoder, a duration converter, and a decoder [16]. By conditioning on speaker embeddings and employing techniques such as frame-level phoneme-based downsampling and adversarial classifiers, this model demonstrates significant potential in preserving linguistic and emotional information during voice conversion tasks.

Another significant area of exploration in AI research pertains to its contribution to combating global health crises such as the COVID-19 pandemic [17]. A bibliometric analysis delved into the scholarly production during 2020 concerning the intersection of deep learning techniques and COVID-19 research. Through quantitative and qualitative analyses, this study aimed to identify patterns, influential papers, and key features in the application of deep learning techniques for addressing various aspects of the pandemic, including disease detection, prognosis, and treatment [18].

Speech recognition systems represent another facet of AI with profound implications across various domains, including accessibility and communication. An exploration into the complexities and potentials of computerized speech synthesis and recognition highlighted the significance of understanding the benefits and limitations of voice technology [19]. By outlining fundamental concepts, installation procedures, and potential enhancements, this study underscored the importance of continued research and development in speech recognition software to improve its capabilities and address existing limitations.

AI's role in revolutionizing online education has been a subject of systematic investigation. A mapping study aimed to address key questions regarding the approaches, algorithms, and feature engineering techniques employed in education analysis and development using AI. Through the identification of primary approaches, commonly utilized algorithms, and prevalent feature engineering techniques, this research provided valuable insights into the diverse AI-based approaches shaping the landscape of online education.

Moreover, advancements in emotion intensity control within emotional voice conversion frameworks, exemplified by Emovox [20], present exciting prospects for finely manipulating emotions in speech synthesis. By employing sequence-to-sequence models and relative attributes, these frameworks enable superior speech quality and precise emotion control, paving the way for more nuanced and emotionally resonant human-computer interactions. These developments signify a convergence of AI and emotional intelligence, promising richer and more empathetic experiences in various domains, including mental health support and therapeutic interventions.

The exploration of AI-driven solutions extends to sentiment analysis and emotional disclosure in chatbot interactions [21]. BERT-based sentiment analysis models demonstrate superior performance in detecting emotions such as anger, disgust, fear, and sadness in textual data [22]. Furthermore, investigations into the usage of pre-trained speech recognition deep layers reveal potential applications in emotion detection behind the screen [23]. These studies underscore the importance of AI in understanding and responding to user emotions in various contexts [24].

Lastly, the implementation of interactive healthcare advisor systems using chatbots and visualization tools showcases the practical applications of AI in health monitoring and education [25]. These systems enable users to self-diagnose and receive personalized health information, contributing to effective health management [25]. Overall, the integration of AI-driven technologies in emotion detection, mental health support, and healthcare domains reflects ongoing efforts to enhance human well-being through technological innovation.

III. DISCUSSIONS

The integration of artificial intelligence (AI) into emotion detection, mental health support, and healthcare domains reveals the transformative potential of technology in addressing complex human needs. As highlighted in the preceding section, AI-driven solutions offer promising avenues for enhancing emotional intelligence, enabling personalized interventions, and fostering stronger connections between individuals and support systems. However, several key points emerge from this discussion that merit further exploration and consideration.

Firstly, while AI-based emotion detection systems leverage sophisticated algorithms and data-driven approaches to discern emotional states from text and speech inputs, they must navigate challenges related to cultural nuances, context-dependent expressions, and subjective interpretations of emotions. Addressing these challenges requires interdisciplinary collaboration between computer scientists, linguists, psychologists, and cultural experts to develop more culturally sensitive and context-aware models. Additionally, ensuring the ethical deployment of emotion detection technologies entails transparency, informed consent, and safeguards against potential biases or privacy infringements.

Secondly, the integration of AI-driven conversational agents and virtual counselors into mental health support systems holds significant promise for extending access to care and reducing the stigma associated with seeking help. However, the effectiveness of these systems hinges on their ability to provide empathetic, contextually relevant, and evidence-based interventions. Human-centered design principles should guide the development of these systems, prioritizing user trust, autonomy, and confidentiality. Moreover, ongoing monitoring, evaluation, and refinement are essential to ensure the quality and safety of AI-driven mental health interventions.

The discussion underscores the importance of considering user preferences, needs, and cultural backgrounds when designing AI-driven technologies for emotion detection and mental health support. Tailoring interventions to individual characteristics and preferences can enhance engagement, efficacy, and user satisfaction. Additionally, fostering transparency and accountability in the design and deployment of AI-driven systems can help build user trust and mitigate concerns related to data privacy, algorithmic bias, and unintended consequences.

Lastly, while AI holds immense potential for revolutionizing emotion detection and mental health support, it is essential to recognize its limitations and potential risks. Overreliance on technology may inadvertently dehumanize interactions, undermine the therapeutic alliance, or perpetuate inequalities in access to care. Thus, AI should complement rather than replace human expertise and empathy in mental health care delivery. Moreover, ongoing research, collaboration, and interdisciplinary dialogue are crucial for harnessing the full potential of AI while mitigating its risks and ethical challenges.

In conclusion, the integration of AI into emotion detection, mental health support, and healthcare domains offers unprecedented opportunities for improving human well-being. By leveraging advanced technologies, interdisciplinary collaboration, and human-centered design principles, we can harness the transformative power of AI to enhance emotional intelligence, extend access to care, and foster healthier, more resilient communities. However, realizing this vision requires a thoughtful and holistic approach that prioritizes ethical considerations, user needs, and ongoing collaboration across disciplines and stakeholders.

IV. ACKNOWLEDGMENT

The authors extend sincere gratitude to Mrs. Muneebah Mohyiddeen for her guidance and to the project coordinators, Mrs. Najla Nazar and Mrs. Gishma K M, for their mentorship. Their expertise played a vital role in shaping the direction and focus of this research.

Conclusion

The exploration of automatic emotion detection from text and speech, alongside the integration of AI-driven systems in various domains, underscores the transformative potential of technology in understanding and addressing human emotions and psychological needs. Through a diverse array of methodologies, including machine learning, deep learning, and natural language processing, researchers have made significant strides in developing sophisticated emotion detection models capable of capturing the nuances of human expression with remarkable accuracy. These advancements hold promise for applications ranging from mental health support and therapy to stress management in occupational settings and education.

The deployment of AI-driven conversational agents and virtual counselors represents a paradigm shift in healthcare delivery, offering personalized interventions and real-time emotional support to individuals in need. By leveraging linguistic and vocal cues, these systems can provide empathetic guidance and assistance, extending the reach of mental health services and overcoming barriers to access. Additionally, the emphasis on designing emotionally intelligent AI systems underscores the importance of user engagement and trust in fostering meaningful interactions.

The integration of AI technologies in stress detection, sentiment analysis, and healthcare advisory systems highlights the multifaceted role of AI in promoting human well-being and effective health management. However, as with any technological advancement, ethical considerations surrounding privacy, trust, and autonomy must be carefully addressed to ensure the responsible deployment and utilization of AI-driven solutions. Overall, the convergence of AI and emotional intelligence presents exciting opportunities for enhancing human-computer interactions, mental health support, and healthcare delivery. By continuing to innovate and collaborate across interdisciplinary boundaries, researchers and practitioners can harness the full potential of AI to positively impact society and improve the quality of life for individuals worldwide.

References

[1] Nataliia Kholodna, Victoria Vysotska, Solomiia Albota, "A Machine Learning Model for Automatic Emotion Detection from Speech," 2021.
[2] Anjali Tripathi, Upasana Singh, Garima Bansal, Rishabh Gupta, Ashutosh Kumar Singh, "A Review on Emotion Detection and Classification using Speech."
[3] Lea Canales, Patricio Martínez-Barco, "Emotion Detection from text: A Survey."
[4] Sercan Ö. Arık, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi, "Deep Voice: Real-time Neural Text-to-Speech," 2017.
[5] Caterina Bérubé, Theresa Schachner, Roman Keller, Elgar Fleisch, Florian v Wangenheim, Filipe Barata, Tobias Kowatsch, "Voice-based conversational agents for the prevention and management of chronic and mental health conditions: systematic literature review," 2021.
[6] Farzaneh Nasirian, Mohsen Ahmadian, One-Ki Daniel Lee, "AI-Based Voice Assistant Systems: Evaluating from the Interaction and Trust Perspectives."
[7] Atieh Poushneh, "Humanizing voice assistant: The impact of voice assistant personality on consumers' attitudes and behaviors."
[8] Suresh Kumar Kanaparthi, Surekha P, Lakshmi Priya Bellamkonda, Bhavya Kadiam, Beulah Mungara, "Detection of Stress in IT Employees using Machine Learning Technique."
[9] Tarashankar Rudra, Manning Li, Manolya Kavakli, "ESCAP: Towards the Design of an AI Architecture for a Virtual Counselor to Tackle Students' Exam Stress."
[10] Prof. Kalyani Pendke, Avanti Sayare, Vishal Chore, Manish Wasnik, "Designing of I-BOT for Stress Relief."
[11] Pranav Kapoor, Pratham Agrawal, Zeeshan Ahmad, "Therapy Chatbot: A Relief From Mental Stress And Problems."
[12] Adwitiya Ray, Akansha Bhardwaj, Yogender Kumar Malik, Shipra Singh, Rajiv Gupta, "Artificial intelligence and Psychiatry: An overview."
[13] Sang-Hoon Lee, Hyeong-Rae Noh, Woo-Jeoung Nam, and Seong-Whan Lee, "Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck."
[14] Tae-Ho Kim, Sungjae Cho, Shinkook Choi, Sejik Park, and Soo-Young Lee, "Emotional Voice Conversion Using Multi-Task Learning with Text-to-Speech."
[15] Janneth Chicaiza, Stephany D. Villota, Paola G. Vinueza-Naranjo, and Rubén Rumipamba-Zambrano, "Contribution of Deep-Learning Techniques Toward Fighting COVID-19: A Bibliometric Analysis of Scholarly Production," 2020.
[16] Rajat Saini, "Speech Recognition System (Speech to Text) (Text to Speech)."
[17] Rahman Shafique, Wajdi Aljedaani, Furqan Rustam, Ernesto Lee, Arif Mehmood, and Gyu Sang Choi, "Role of Artificial Intelligence in Online Education: A Systematic Mapping Study."
[18] Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, and Haizhou Li, "Emotion Intensity and its Control for Emotional Voice Conversion."
[19] Najla Alkaabi, Nazar Zaki, Heba Ismail, and Manzoor Khan, "Detecting Emotions behind the Screen."
[20] Christopher Burr, Jessica Morley, Mariarosaria Taddeo, and Luciano Floridi, "Digital Psychiatry: Risks and Opportunities for Public Health and Wellbeing."
[21] Prabod Rathnayaka, Nishan Mills, Donna Burnett, Daswin De Silva, Damminda Alahakoon, and Richard Gray, "A Mental Health Chatbot with Cognitive Skills for Personalised Behavioural Activation and Remote Health Monitoring."
[22] Jenifa Gnanamanickam, Yuvaraj Natarajan, and Sri Preethaa K.R., "A Hybrid Speech Enhancement Algorithm for Voice Assistance Application."
[23] Gain Park, Jiyun Chung, Seyoung Lee, "Effect of AI chatbot emotional disclosure on user satisfaction and reuse intention for mental health counseling: a serial mediation model."
[24] Jorge Oliveira and Isabel Praça, "On the Usage of Pre-Trained Speech Recognition Deep Layers to Detect Emotions."
[25] Tae-Ho Hwang, JuHui Lee, Se-Min Hyun, KangYoon Lee, "Implementation of interactive healthcare advisor model using chatbot and visualization."

Copyright

Copyright © 2024 Arjun Raj K, Muhammed Afthab Aslam, Sneha Sreenivasan Panatte, Anjana K Das, Muneebah Mohyiddeen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET59451

Publish Date : 2024-03-26

ISSN : 2321-9653

Publisher Name : IJRASET
