Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Abhijeet Dewtarse, Saurabh Davkhar, Aniket Davkhar, Sujit Aare, Prof. Palwe P. M.
DOI Link: https://doi.org/10.22214/ijraset.2024.58830
The SignSense project presents an innovative solution aimed at breaking the communication barriers faced by the hearing-impaired community. Leveraging advanced technologies, including Convolutional Neural Networks (CNNs) and Java-based algorithms, SignSense pioneers a Sign Language Recognition System. This system empowers individuals with hearing impairments to express themselves naturally, ensuring accurate translation of sign language gestures into text or voice. The project's motivation stems from fostering inclusivity and understanding between hearing-impaired individuals and the broader society.
I. INTRODUCTION
In a world where communication is fundamental, the hearing-impaired often face profound challenges because sign language, their primary mode of expression, remains unfamiliar to most of the hearing population. Sign language is a rich and expressive mode of communication, yet this unfamiliarity creates a divide between the hearing-impaired and the general populace. Recognizing this divide, our project, SignSense, emerges as a beacon of inclusivity and understanding. The need for a new communication system stems from the inherent limitations of existing methods: sign language, while a potent tool, grapples with challenges such as a limited shared vocabulary, cultural differences in interpretation, and the absence of universally standardized rules. The proposed system addresses these limitations by leveraging advanced technologies such as deep learning and machine learning, which offer a more comprehensive, accurate, and inclusive means of communication. By applying artificial intelligence, the system strives to break down communication barriers, fostering a society in which every individual can express themselves freely and meaningfully.
The developed system not only accurately recognizes sign language gestures but also offers a user-friendly interface and cross-platform compatibility. Rigorous testing ensures reliability and scalability, allowing for future enhancements and integration possibilities. This report encapsulates the journey of creating a compassionate and connected society, where every voice resonates, regardless of its form, enabling meaningful interactions and mutual understanding.
II. LITERATURE SURVEY
1. Deep Convolutional Network with Long Short-Term Memory Layers for Dynamic Gesture Recognition by Rostyslav Siriak, Inna Skarga-Bandurova, Yehor Boltov at (IEEE-2019)
In this research, a significant advancement in the field of gesture recognition has been made through the development and implementation of a CNN-LSTM network. The primary problem addressed was the accurate, real-time recognition of hand gestures, which is crucial for applications in sign language translation and human-computer interaction. The study is grounded in the rapidly evolving fields of deep learning and computer vision, with a focus on dynamic gesture recognition. The proposed solution centers on a hybrid architecture incorporating Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers: CNNs are widely known for their effectiveness in image-related tasks, while LSTMs specialize in capturing temporal patterns, making them well suited to sequential data such as video streams. The network comprises multiple convolutional layers followed by max-pooling, dense layers, and LSTM layers, and techniques such as dropout are used to prevent overfitting.
The research methodology involves the creation of a labeled dataset of hand gestures and preprocessing steps, including grayscale conversion, segmentation, and simplification. The CNN-LSTM model is trained on this dataset, and extensive experiments are conducted to optimize the hyperparameters and validate the approach. The training process involves constructing a database, data preprocessing, data augmentation, feature identification, and hand gesture recognition. The chosen loss function, categorical cross-entropy, ensures effective training, while the Adam optimizer updates the network weights.
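To make the surveyed architecture concrete, the following is a minimal Keras sketch of a CNN-LSTM of this kind, using the categorical cross-entropy loss and Adam optimizer described above; the layer sizes, clip length, and class count are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of a CNN-LSTM for gesture clips (sizes are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10            # hypothetical number of gesture classes
FRAMES, H, W = 16, 64, 64   # hypothetical clip length and frame size

model = models.Sequential([
    # Apply the same small CNN to every frame in the clip.
    layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"),
                           input_shape=(FRAMES, H, W, 1)),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(128),        # LSTM captures the temporal pattern across frames
    layers.Dropout(0.5),     # dropout against overfitting, as in the survey
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Loss and optimizer named in the survey: categorical cross-entropy + Adam.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```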
In conclusion, this research contributes significantly to the domain of gesture recognition by introducing a robust and efficient CNN-LSTM network. The approach demonstrates superior accuracy and real-time performance, addressing the challenges associated with dynamic gesture recognition. The use of established frameworks and libraries ensures the reliability and replicability of the research findings, making it a valuable addition to the field of computer vision and assistive technology [1].
2. Indian Sign Language Recognition Using ANN and SVM Classifiers by Miss Juhi Ekbote and Mrs. Mahasweta Joshi at (IEEE-2018)
The paper delves into the significance of sign language as a primary mode of communication for the deaf and mute community. Sign language offers a structured way of communication, where specific gestures represent words or alphabets. Each region typically has its own sign language, and this research focuses on Indian Sign Language (ISL). While extensive research has been conducted on sign languages like BSL and ASL, ISL has received relatively less attention.
The research aims to bridge this gap by developing an automatic recognition system specifically tailored for ISL numerals (0-9). The study uses various techniques for feature extraction, including shape descriptors, SIFT, and HOG. Shape descriptors, such as eccentricity, aspect ratio, compactness, extent, solidity, orientation, spreadness, and roundness, are employed to characterize the gestures. SIFT, a method for detecting and describing local features in images, is also utilized to extract keypoints from the sign images.
The research employs a database of 1000 images, with 100 images representing each numeral sign. The images are pre-processed and segmented, and various features are extracted using the aforementioned techniques. The extracted features are then fed into classification algorithms, including ANN and SVM, for sign recognition.
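As an illustration of this pipeline, the sketch below extracts HOG features and trains an SVM with scikit-learn; the image size, HOG parameters, and kernel choice are assumptions for illustration, and the random array merely stands in for the real 1000-image database.

```python
# Sketch: HOG features from pre-processed sign images, fed to an SVM.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def hog_features(images):
    """images: pre-processed grayscale sign images (assumed 64x64)."""
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# Placeholder for the real database: 1000 images, 100 per numeral 0-9.
X = np.random.rand(1000, 64, 64)
y = np.repeat(np.arange(10), 100)

X_tr, X_te, y_tr, y_te = train_test_split(hog_features(X), y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)          # SVM classifier
print("test accuracy:", clf.score(X_te, y_te))
```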
The results showcase the effectiveness of different combinations of feature extraction techniques and classifiers. The study reveals that the combination of HOG and ANN yields exceptional accuracy in recognizing ISL numerals.
In conclusion, the research provides a comprehensive overview of the challenges in recognizing ISL gestures and proposes an innovative solution using advanced feature extraction techniques and classifiers. The high accuracy achieved demonstrates the system's efficacy, making it a significant contribution to the field of sign language recognition. [3]
3. Sign Language Recognition Using Machine Learning Algorithm by Greeshma Pala, Jagruti Bhagwan Jethwani, Satish Shivaji Kumbhar, and Shruti Dilip Patil at (IEEE - 2021)
This project addresses the communication gap between the speechless community and others by developing an American Sign Language (ASL) recognition system. The project utilizes machine learning algorithms including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN) to translate ASL gestures into text and speech in real-time.
In the context of existing solutions, this project stands out due to its comprehensive approach. While other applications exist for sign language recognition, many face limitations in accuracy and real-time implementation. By employing a combination of KNN, SVM, and CNN algorithms, this project achieves high accuracy, ensuring effective communication between individuals who understand sign language and those who do not.
The project's methodology involves creating a dataset of ASL gestures captured through a webcam. These images undergo pre-processing steps such as grayscale conversion and resizing. The algorithms, KNN, SVM, and CNN, are trained on this pre-processed dataset: KNN classifies a gesture by majority vote among its nearest neighbors, SVM separates classes with support-vector hyperplanes, and CNN employs convolutional layers for feature extraction, all contributing to accurate gesture recognition.
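A minimal sketch of the described pre-processing (grayscale conversion and resizing) feeding a KNN classifier might look as follows; the image size, neighbor count, and placeholder frames are assumptions.

```python
# Sketch of the pre-processing above: grayscale + resize, then flatten
# into feature vectors for the classical classifiers (KNN/SVM).
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def preprocess(frame, size=(64, 64)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # grayscale conversion
    return cv2.resize(gray, size).flatten() / 255.0  # resize + normalize

# Hypothetical training data: raw BGR webcam captures with gesture labels.
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(5)]
labels = [0, 1, 0, 1, 0]

X = np.array([preprocess(f) for f in frames])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
```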
The information sources include academic papers, online resources, and textbooks, guiding the project in image pre-processing, feature extraction, and algorithm optimization. Python, Jupyter Notebooks, and libraries like NumPy, Pandas, TensorFlow, and OpenCV were instrumental in project development. Speech recognition and text-to-speech capabilities enhance the system's usability, enabling communication with users unfamiliar with sign language.
In conclusion, this project's innovative integration of multiple algorithms, real-time implementation, and user-friendly interface marks a significant advancement in sign language recognition technology. By effectively bridging the communication gap, it promotes inclusivity and enables seamless interaction between diverse communities. The choice of tools and methodologies ensures a robust, accurate, and efficient ASL recognition system, making it a valuable contribution to assistive technology. [4]
4. Live Action and Sign Language Recognition Using Neural Network by Mrs. Indumathy P, Ms. Nithyalakshmi J, Ms. Monisha P, and Ms. Mythreyee M at (ICICC-2023)
This research project addresses the crucial need for effective communication tools for the speech and hearing-impaired community. Sign language, a vital means of nonverbal communication, inspired this study. The project proposes a solution involving live action tracking and recognition of sign language gestures through advanced machine learning techniques.
The project delves into the challenges faced by the speech and hearing-impaired population, emphasizing the significant percentage of the global population affected by hearing loss. It explores the intricate nature of sign language, highlighting the importance of body movements and gestures in communication.
A survey of existing solutions reveals diverse approaches. Some studies employ LSTM and GRU models for Indian Sign Language, achieving high accuracy; other researchers use YOLOv5 and CNN for real-time sign gesture recognition, demonstrating substantial improvements. The research incorporates technologies such as TensorFlow, Keras, LSTM, and MediaPipe for developing the sign language recognition system. The use of OpenCV and MediaPipe for dataset collection and pre-processing ensures high-quality input for the machine learning models.
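The following sketch illustrates landmark-based dataset collection with OpenCV and MediaPipe of the kind described; the exact settings (single hand, detection confidence) are assumptions.

```python
# Sketch: per-frame hand landmarks via MediaPipe, collected from a webcam.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            # 21 (x, y, z) landmarks -> 63-dim feature vector per frame;
            # sequences of these vectors would feed the LSTM.
            features = [c for p in lm for c in (p.x, p.y, p.z)]
cap.release()
```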
The project acknowledges limitations related to processing speed, considering the time taken for recognizing actions. The throughput might be slow due to the complexity of gesture recognition and language conversion.
In conclusion, the research project offers a comprehensive exploration of sign language recognition. By employing advanced deep learning techniques and leveraging powerful libraries and frameworks, the proposed system aims to bridge the communication gap for the speech and hearing-impaired community. The study contributes to the field by providing a detailed methodology and utilizing state-of-the-art tools, emphasizing the potential for real-time sign language recognition systems in facilitating inclusive communication. [5]
III. METHODOLOGY
The SignSense system proposed in this project leverages advanced technologies to enable accurate and accessible communication for deaf individuals.
The core methodology revolves around Convolutional Neural Networks (CNNs) for sign language recognition. The algorithmic approach involves several stages, combining computer vision techniques with machine learning algorithms to achieve precise recognition of sign language gestures. Below is the proposed methodology for developing the Sign Sense system:
1. Step 1: Data Collection and Preprocessing
a. Data Gathering: Gather a diverse dataset of sign language gestures, capturing various hand movements, facial expressions, and body postures. This dataset forms the basis for training the CNN models.
b. Data Cleaning and Annotation: Clean the dataset by removing noise and annotating each gesture with relevant information. Proper labeling is essential for supervised learning; a loading sketch follows this step.
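As a minimal sketch of loading such a labeled dataset, Keras can read one folder per gesture class; the directory name, image size, and split are assumptions.

```python
# Sketch: load a labeled gesture dataset (one subfolder per gesture class).
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "signsense_dataset/",      # hypothetical path: one subfolder per gesture
    image_size=(224, 224),     # resize to the CNN input size
    batch_size=32,
    label_mode="categorical",  # one-hot labels for categorical cross-entropy
    validation_split=0.2,
    subset="training",
    seed=42,                   # fixed seed so the split is reproducible
)
```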
2. Step 2: CNN Model Architecture Design
a. Architecture Selection: Choose an appropriate CNN architecture for sign language recognition. This may involve using pre-trained models like VGG or designing a custom architecture tailored to the specific gestures of sign language.
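For example, a pre-trained VGG16 backbone with a new classification head could be set up as follows; the input size, head layers, and class count are illustrative assumptions.

```python
# Sketch of one option named above: a frozen VGG16 backbone + new head.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_GESTURES = 26  # hypothetical: one class per letter sign

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone for initial training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_GESTURES, activation="softmax"),
])
```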
3. Step 3: Training and Optimization
a. Training the CNN: Train the selected CNN model using the preprocessed dataset. Implement techniques such as transfer learning and fine-tuning to optimize the model's performance on sign language recognition.
b. Parameter Tuning: Experiment with hyperparameters, such as learning rates and batch sizes, to fine-tune the CNN model for accuracy and efficiency; a training sketch follows this step.
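A hedged sketch of this training step, reusing the `model` and `train_ds` from the earlier sketches: the learning rate, epoch count, and early-stopping patience are illustrative values to be tuned, not prescribed settings.

```python
# Sketch: compile and train the transfer-learning model from Step 2.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # tunable
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    train_ds,    # labeled dataset from Step 1
    epochs=20,   # tunable
    callbacks=[tf.keras.callbacks.EarlyStopping(
        patience=3, restore_best_weights=True)],
)
```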
4. Step 4: Real-time Gesture Recognition
a. Integration with Web Cameras and Microphones: Develop modules to capture real-time sign language gestures through web cameras and microphones. Ensure synchronization and alignment between video and audio inputs.
b. Real-time Prediction: Implement algorithms for real-time prediction of sign language gestures using the trained CNN model. Optimize the prediction process for low latency to enable instantaneous recognition.
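A minimal real-time loop of this kind, assuming the trained CNN `model` from the earlier sketches and hypothetical class names, might look like:

```python
# Sketch: webcam -> preprocess -> CNN prediction -> on-screen label.
import cv2
import numpy as np

# Hypothetical gesture labels; length must match the model's class count.
CLASS_NAMES = ["hello", "thanks", "yes"]

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    x = cv2.resize(frame, (224, 224)).astype("float32") / 255.0
    probs = model.predict(x[np.newaxis], verbose=0)[0]  # trained CNN
    label = CLASS_NAMES[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("SignSense", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```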
5. Step 5: User Interface Development
a. Intuitive Interface Design: Create a user-friendly interface that allows users to interact with the system seamlessly. Design intuitive controls for initiating, pausing, or terminating gesture recognition.
b. Output Presentation: Display the recognized sign language gestures as text or voice output in real time. Provide options for users to customize the output format based on their preferences.
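For the voice-output option, one possibility is an offline text-to-speech library such as pyttsx3; the report does not name a specific engine, so this choice is an assumption.

```python
# Sketch: speak a recognized gesture label aloud with pyttsx3 (offline TTS).
import pyttsx3

def speak(recognized_text: str) -> None:
    """Render a recognized gesture label as speech."""
    engine = pyttsx3.init()
    engine.say(recognized_text)
    engine.runAndWait()

speak("hello")  # e.g. after the CNN recognizes the 'hello' gesture
```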
6. Step 6: Testing and Evaluation
a. Unit Testing: Perform rigorous unit testing using JUnit to validate the functionality of individual components, ensuring they produce the expected results.
b. User Testing: Conduct user testing with deaf individuals and individuals proficient in sign language to evaluate the system's accuracy, responsiveness, and user experience. Gather feedback for further improvements.
7. Step 7: Documentation and Support
a. Technical Documentation: Prepare comprehensive technical documentation, including system architecture, algorithms used, and API references. This documentation serves as a reference for developers and users.
b. User Manuals: Create user manuals and guides explaining how to use the SignSense system effectively. Include step-by-step instructions and troubleshooting tips.
c. User Support: Establish a support system, such as a dedicated email helpline or online forum, where users can seek assistance in case of issues or inquiries.
IV. CONCLUSION
Throughout this project we focus on its primary objective: bridging the communication gap between hearing-impaired and hearing individuals. By employing a deep learning approach that integrates K-means clustering with a Convolutional Neural Network (CNN), we aim for an accuracy rate of over 90% in recognizing symbolic expressions from images. This accuracy is crucial for enabling meaningful and effective communication among individuals, regardless of their hearing abilities. A critical evaluation of the project's outcomes reveals a robust implementation of design and technological choices; the integration of K-means clustering and CNN within our deep learning framework reflects a methodical approach to the communication challenge. The application not only fulfills its intended purpose but also offers real-time capabilities, ensuring inclusivity in communication. The success of this project will not only address the long-standing problem of communication barriers but also raise the standard of inclusivity in society. Through the rigorous application of advanced deep learning techniques and thoughtful technological choices, we lay the foundation for a more inclusive future, where effective communication knows no boundaries.
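The report does not specify how K-means clustering and the CNN are combined; one common pattern, sketched below purely as an assumption, is to use K-means colour clustering to segment the hand region before CNN classification.

```python
# Sketch: K-means colour clustering as a hand-segmentation step.
import cv2
import numpy as np

def kmeans_segment(frame, k=2):
    """Cluster pixel colours into k groups and return the cluster map."""
    pixels = frame.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 3,
                              cv2.KMEANS_RANDOM_CENTERS)
    return labels.reshape(frame.shape[:2])  # per-pixel cluster index

# The segmented hand region would then be cropped, resized, and passed
# to the trained CNN from the methodology above.
```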
REFERENCES
[1] Rostyslav Siriak, Inna Skarga-Bandurova, Yehor Boltov, "Deep Convolutional Network with Long Short-Term Memory Layers for Dynamic Gesture Recognition", IEEE, 2019.
[2] Wenwen Yang, Jinxu Tao, Changfeng Xi, Zhongfu Ye, "Sign Language Recognition System Based on Weighted Hidden Markov Model", IEEE, 2015.
[3] Juhi Ekbote, Mahasweta Joshi, "Indian Sign Language Recognition Using ANN and SVM Classifiers", IEEE, 2018.
[4] Greeshma Pala, Jagruti Bhagwan Jethwani, Satish Shivaji Kumbhar, Shruti Dilip Patil, "Sign Language Recognition Using Machine Learning Algorithm", IEEE, 2021.
[5] Indumathy P, Nithyalakshmi J, Monisha P, Mythreyee M, "Live Action and Sign Language Recognition Using Neural Network", ICICC, 2023.
[6] Qazi Mohammad Areeb, Maryam, Mohammad Nadeem, Roobaea Alroobaea, Faisal Anwer, "Helping Hearing-Impaired in Emergency Situations: A Deep Learning-Based Approach", IEEE, 2021.
[7] Suharjito, Herman Gunawan, Narada Thiracitta, Gunawan Witjaksono, "The Comparison of Some Hidden Markov Models for Sign Language Recognition", IEEE, 2018.
[8] Citra Suardi, Anik Nur Handayani, Rosa Andrie Asmara, Aji Prasetya Wibawa, Lilis Nur Hayati, Huzain Azis, "Design of Sign Language Recognition Using E-CNN", IEEE Xplore, 2021.
[9] Ruchi Gajjar, Jinalee Jayeshkumar Raval, "Real-time Sign Language Recognition using Computer Vision", IEEE Xplore, 2021.
Copyright © 2024 Abhijeet Dewtarse, Saurabh Davkhar, Aniket Davkhar, Sujit Aare, Prof. Palwe P. M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET58830
Publish Date : 2024-03-07
ISSN : 2321-9653
Publisher Name : IJRASET