IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Mr. Amol Jagtap, Gaurav Pagare, Anushka Sandbhor, Vivek Patil, Sakshi Divate
DOI Link: https://doi.org/10.22214/ijraset.2024.56887
This research paper explores the intersection of sign language recognition and deep learning, with a focus on the utilization of CNNs and the Inception V3 architecture. The study emphasizes how crucial accurate translation and recognition of sign language are to closing the communication gap between the hearing and Deaf populations. It discusses challenges such as limited datasets and ambiguity in sign language, and outlines the potential for future advancements in accessibility and education. By combining the power of deep learning with culturally tailored datasets, this research paper paves the way for more effective sign language recognition, offering the promise of improved communication and accessibility for individuals with hearing impairments. It emphasizes how important it is to remove communication obstacles and promote inclusion in all facets of daily life for the Deaf and hard of hearing community.
I. INTRODUCTION
In recent years, advancements in technology have paved the way for innovative solutions that bridge gaps between diverse communities. One such development is the Sign Language Translator (SLT) system. This review paper explores the intersection of sign language interpretation and deep learning, focusing specifically on the utilization of CNNs and the Inception V3 architecture. Our project aims to develop an innovative Sign Language Translator that leverages the power of these neural networks to accurately interpret and translate sign language gestures into short sentences.
In this paper, we survey recent literature, discussing key studies, methodologies, and advancements in the field of sign language recognition and translation.
II. LITERATURE REVIEW
First, Sagar P. More and Prof. Abdul Sattar, in "Hand gesture recognition system for dumb people", presented a system that combines RGB and grayscale processing to produce a white (binary) hand image, followed by morphological filtering [11]. Chandandeep Kaur and Nivit Gill, in "An Automated System for Indian Sign Language Recognition", regard gestures as expressive and meaningful bodily motions involving genuine physical movements of the hands, face, and fingers; they adopt a combined sensor- and vision-based approach, noting that for many Deaf people sign language is the exclusive means of communication [12].

Automatic recognition of human gestures is a complex problem that has not yet been solved completely. A number of approaches, including machine learning techniques, have been used over the years to recognize sign language, and since the emergence of deep learning there have been renewed efforts to recognize human gestures [13], [1]. Every motion in sign language has a particular meaning, so complicated meanings can be expressed by combining different fundamental elements [3].

Masood et al. demonstrated a vision-based system for interpreting isolated hand gestures of Argentinean Sign Language, using an Inception model together with a Recurrent Neural Network (RNN): the model was trained on spatial features with the CNN and on temporal features with the RNN. Using an Argentinean Sign Language gesture dataset of 46 gesture categories, they achieved an accuracy of 95.2% [14]. Sign language recognition (SLR) is a growing field of study whose primary goal is recognizing the hand motions that make up the significant components of a sign language [16]. J. Singha et al. [17] proposed a method for real-time recognition in which eigenvalue-weighted Euclidean distance was used to classify signs. P. Kishore et al. [18] proposed a system that finds active contours from a boundary edge map and classifies the signs with an Artificial Neural Network (ANN). Another approach used the Viola-Jones algorithm with LBP features for hand gesture recognition in a real-time environment [19]; it had the advantage of requiring less processing capacity to detect the movements.
Real-Time Isolated ISL Gesture Recognition: A statistical approach was shown to identify two-handed ISL gestures in real time. A direction histogram feature is employed, and two classification methods are used: Euclidean distance matching and the K-nearest neighbour (KNN) rule. [1]
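To make this style of pipeline concrete, the following is a minimal sketch under our own illustrative assumptions (the feature design, bin count, and placeholder training variables are ours, not the cited authors' implementation): a direction histogram is computed from the image gradient and classified with scikit-learn's K-nearest-neighbour classifier.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def direction_histogram(gray_image, bins=18):
    """Histogram of gradient directions: a simple global shape descriptor."""
    gx = cv2.Sobel(gray_image, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray_image, cv2.CV_32F, 0, 1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    angle = np.arctan2(gy, gx)  # orientation in [-pi, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    return hist / (hist.sum() + 1e-8)  # normalize for contrast invariance

# train_images, train_labels, test_image are assumed preprocessed hand crops
# and their sign labels (placeholders for illustration).
features = np.array([direction_histogram(img) for img in train_images])
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(features, train_labels)
predicted_sign = knn.predict([direction_histogram(test_image)])
```

Because the histogram is normalized, the descriptor is largely insensitive to overall image contrast, which suits simple real-time use.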
However, for real-time systems, researchers needed a faster way to solve this problem. Advances in deep learning have enabled the automation of image recognition using various models; convolutional neural networks, for example, have made great strides in recent years. [20], [21]
III. SIGN LANGUAGE BACKGROUND
Sign language recognition is increasingly vital for bridging the communication divide between Deaf and hearing communities, and understanding the rich background of sign language is paramount.
Sign language, a powerful visual language used by the deaf and hard-of-hearing, extends beyond gestures. It possesses its own unique grammar, syntax, and vocabulary. Comprehending these linguistic intricacies is essential for the development of effective recognition systems.
Moreover, acknowledging cultural diversity is crucial. Sign language varies across regions and cultures. Understanding these nuances is key to creating inclusive and adaptable recognition technology. It also allows for the incorporation of regional dialects. Research in sign language recognition should delve into historical evolution, sociolinguistics, and the impact of technology. This knowledge contributes to preserving and promoting sign languages in the digital era.
In conclusion, an in-depth understanding of sign language's linguistic, cultural, and historical aspects is essential for the development of accurate and culturally sensitive recognition systems. This empowers technology to bridge communities and foster understanding effectively.
A. Sign Language Translation and Recognition
B. Approaches
Sign language recognition entails using technology to interpret and comprehend sign language gestures and expressions.
There are numerous methods and techniques for recognizing sign language, including:
1. Computer Vision-Based Recognition
a. 2D/3D Pose Estimation: The positions of key body landmarks can be estimated in two or three dimensions using computer vision algorithms; this information is then used to recognize and interpret the signs (see the sketch after this list).
b. Depth Sensors: Depth sensors such as the Microsoft Kinect or time-of-flight cameras can capture the 3D positions of body parts, providing rich data for sign language recognition.
2. Hand Gesture Recognition
Focusing specifically on hand movements, these systems track and recognize hand shapes, gestures, and movements to interpret sign language. Machine learning and pattern recognition techniques include:
a. Supervised Learning: Machine learning models such as neural networks and support vector machines can be trained on large datasets of labeled sign language gestures, learning to recognize patterns and classify signs.
b. Unsupervised Learning: Clustering and dimensionality-reduction techniques can be used to identify meaningful features and group similar signs together.
c. Deep Learning: Deep neural networks, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are employed to learn hierarchical representations of sign language motions for recognition.
3. Gesture Recognition using Wearable Sensors: Wearable devices equipped with accelerometers, gyroscopes, and other sensors can record the movement and orientation of the signer's hands and body, enabling real-time recognition of sign language gestures.
4. Sign Language Databases and Datasets: The development of such recognition systems often relies on large databases and datasets consisting of video or image recordings of sign language gestures. Researchers use these datasets to train and evaluate their models.
5. Interactive and Real-Time Systems: Some sign language recognition systems are designed for real-time, interactive communication. They allow users to converse with hearing people by translating sign language into text or speech in real time.
6. Sign Language Translation Apps: Mobile apps can use a combination of image recognition and machine learning to recognize sign language gestures and translate them into written or spoken language.
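To make approach 1(a) above concrete, here is a minimal sketch of 2D hand-keypoint extraction using Google's MediaPipe Hands library. The choice of library and the file name are our illustrative assumptions; any comparable pose-estimation toolkit would serve. MediaPipe returns 21 landmarks per detected hand, which can then feed a downstream classifier.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_keypoints(image_path):
    """Return a list of 21 (x, y) landmark tuples per detected hand."""
    image = cv2.imread(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        results = hands.process(rgb)
    if not results.multi_hand_landmarks:
        return []  # no hands detected in this frame
    return [[(lm.x, lm.y) for lm in hand.landmark]
            for hand in results.multi_hand_landmarks]

keypoints = extract_hand_keypoints("sign_frame.jpg")  # hypothetical input frame
```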
We have decided to take a hybrid approach in our quest to create a sign language recognition system, combining the capabilities of hand gesture recognition with computer vision.
This approach leverages the powerful capabilities of deep learning, with a specific emphasis on Convolutional Neural Networks (CNN), in addition to 2D and 3D pose estimation techniques.
By employing 2D/3D pose estimation, we can accurately determine the spatial positions and movements of key points on a signer's body, providing us with rich data to decode sign language gestures.
This approach enables us to extract intricate details of the signer's hand shapes, movements, and body posture, enhancing the precision of our recognition system.
Moreover, our utilization of deep learning, specifically CNNs, empowers our system to autonomously discover and learn complex hierarchical patterns within sign language gestures. These neural networks are adept at automatically recognizing and classifying these patterns, ensuring the accuracy of our recognition system.
The fusion of 2D/3D pose estimation and deep learning is a dynamic strategy that can significantly improve the effectiveness of our sign language recognition system. It enables us to combine the strengths of deep learning for pattern recognition with those of computer vision for spatial awareness, resulting in a more robust and accurate interpretation of sign language motions, as sketched below, and, in turn, improved communication and accessibility for the Deaf and hard of hearing community.
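This sketch assumes a frame-level classifier with a small CNN branch for image appearance and an MLP branch for flattened keypoint coordinates; the layer sizes, input shapes, and class count are placeholders of ours, not a finalized architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 100            # placeholder: number of sign classes
NUM_KEYPOINTS = 2 * 21 * 2   # placeholder: (x, y) for 21 landmarks on 2 hands

# Image branch: a small CNN for spatial appearance features.
image_in = layers.Input(shape=(128, 128, 3), name="frame")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
img_feat = layers.GlobalAveragePooling2D()(x)

# Keypoint branch: an MLP over the flattened pose-estimation landmarks.
kp_in = layers.Input(shape=(NUM_KEYPOINTS,), name="keypoints")
kp_feat = layers.Dense(128, activation="relu")(kp_in)

# Fuse appearance and geometry, then classify the sign.
merged = layers.Concatenate()([img_feat, kp_feat])
output = layers.Dense(NUM_CLASSES, activation="softmax")(merged)
model = tf.keras.Model(inputs=[image_in, kp_in], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```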
IV. MODELS
The success of a sign language recognition system depends largely on choosing the right deep learning architecture. In this study, we adopt the Inception V3 model as the underlying architecture, exploiting its strengths in image recognition and feature extraction.
Convolutional neural networks (CNNs), such as the Inception V3 model, have shown remarkable efficacy in a variety of computer vision applications, such as object detection and image classification.
A. Inception V3 Overview
Inception V3, developed by Google, is known for its ability to effectively capture complex hierarchies and spatial patterns in images. It uses a deep neural network architecture with an emphasis on parallel processing, thereby reducing computational complexity while maintaining high accuracy. The main components of the Inception V3 model include Inception modules, which run convolutions of several sizes in parallel and concatenate their outputs; factorized convolutions, which replace large filters with stacks of smaller and asymmetric ones; efficient grid-size reduction blocks; an auxiliary classifier that aids gradient flow during training; and extensive use of batch normalization.
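To illustrate the parallel-processing idea, below is a simplified Inception-style module in Keras. This is a didactic sketch of the design principle, not the exact Inception V3 block, whose branches also use the factorized and asymmetric convolutions mentioned above; the filter counts are arbitrary.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_style_module(x, f1, f3, f5, fpool):
    """Run 1x1, 3x3, and 5x5 convolutions plus pooling in parallel,
    then concatenate the branch outputs along the channel axis."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(299, 299, 3))
outputs = inception_style_module(inputs, f1=64, f3=128, f5=32, fpool=32)
demo = tf.keras.Model(inputs, outputs)  # a single module, for demonstration
```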
B. Tuning for Sign Language Recognition
To adapt Inception V3 to our sign language recognition task, several modifications and considerations were implemented, chiefly the standard transfer-learning adaptations: the original 1000-way ImageNet classification head is replaced with a softmax layer sized to the sign vocabulary; input frames are resized to the 299x299 resolution the network expects; the convolutional base starts from pretrained ImageNet weights and is fine-tuned on the sign data; and data augmentation is applied to offset the limited size of available sign language datasets.
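A minimal sketch of this adaptation in TensorFlow/Keras follows; the class count, dropout rate, and optimizer settings are illustrative assumptions rather than a reported configuration.

```python
import tensorflow as tf

NUM_CLASSES = 100  # placeholder: one class per sign/sentence in the vocabulary

# Inception V3 pretrained on ImageNet, without its original classifier head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the convolutional base for the first stage

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# After the new head converges, the upper Inception blocks can be unfrozen
# and trained with a much lower learning rate for further gains.
```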
In summary, adopting the Inception V3 model as the basis for our sign language recognition system promises to deliver accurate and efficient recognition, paving the way for evaluation and experimental results.
V. COMPONENTS
This section outlines the basic requirements for the system's components, beginning with the dataset.
VI. DATASET
In this section we introduce a publicly available dataset: https://data.mendeley.com/datasets/kcmpdxky7p/1. The dataset was created specifically for the recognition and translation of Indian Sign Language (ISL). It is organized chronologically, containing image sequences for short sentences. With over 700 fully annotated videos, 18,863 sentence-level frames, and 1,036 word-level images for 100 spoken-language sentences delivered by 7 distinct signers, the ISL corpus has a sizable vocabulary. A machine-learning ISL translator can convert Indian Sign Language to text and vice versa, assisting local Deaf and hard of hearing people; being able to converse with hearing individuals in real time would eliminate the need for an interpreter. Medical or instructional information can also be translated, allowing Deaf users to access it in their own language. In addition to data gathered from the internet, we will collect data directly from people who are Deaf or unable to speak: since they use sign language in their day-to-day lives, they can provide more plentiful and precise data for words and short sentences.
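Assuming the frames are exported into one folder per label (a hypothetical directory layout of ours, not the dataset's published structure), a training pipeline could be assembled with standard Keras utilities as follows:

```python
import tensorflow as tf

# Hypothetical layout: isl_frames/<label_name>/<frame>.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "isl_frames",
    validation_split=0.2, subset="training", seed=42,
    image_size=(299, 299),   # Inception V3's expected input resolution
    batch_size=32,
    label_mode="categorical")

val_ds = tf.keras.utils.image_dataset_from_directory(
    "isl_frames",
    validation_split=0.2, subset="validation", seed=42,
    image_size=(299, 299), batch_size=32, label_mode="categorical")

# Inception V3 expects pixel values scaled to [-1, 1].
preprocess = tf.keras.applications.inception_v3.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.map(lambda x, y: (preprocess(x), y)).prefetch(tf.data.AUTOTUNE)
```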
VII. CHALLENGES
This section lists the difficulties that have been reported in the literature, as well as additional difficulties that the authors of this survey have observed.
A. Limited Dataset
The field of sign language recognition research is faced with a significant challenge due to the insufficient availability of datasets. Recognizing the significance of this challenge is crucial as it goes beyond being a theoretical concept and directly affects the practicality and success of projects in this particular field. The sign language datasets currently available possess value, yet they suffer from deficiencies in terms of quantity and diversity. This presents a notable challenge as it limits the range of signs that AI and ML models can accurately detect. In addition, the issue is further complicated by the existence of regional disparities and diverse cultural aspects within sign language. This implies that a comprehensive and inclusive dataset is required to address these subtleties adequately. It is crucial to address these issues to overcome the limitations caused by the lack of extensive sign language datasets. This will guarantee the project's relevance and impact, while also supporting the overall objective of advancing sign language recognition technology for the betterment of the deaf and hard of hearing communities.
B. Multi Signer Scenario
Sign language recognition encounters a prominent challenge when addressing multi-signer scenarios, especially in the context of practical applications. An obvious challenge would emerge if two deaf individuals conversed using a sign language recognition model. When faced with such a situation, it becomes necessary for the system to differentiate the signals made by each person and guarantee that the signals are accurately assigned to the correct signers. Not only does this concern relate to the accuracy of recognition, but it also touches upon the fundamental aspect of enabling effective communication among several deaf individuals who primarily rely on sign language for interaction. The crucial step towards fostering accessibility and inclusivity for the deaf and hard of hearing community is overcoming the obstacles linked with scenarios involving multiple signers. By resolving these challenges, we facilitate the utilization of this technology for meaningful and smooth sign language conversations.
C. Non-Representable Signs
Sign language recognition faces a notable obstacle in dealing with non-representable signs. Sign languages are characterized by remarkable diversity, encompassing signs that are uniquely tailored to local dialects, specialized groups, or even personal expressions. Recognition systems face a significant challenge when dealing with such non-representable signs, commonly known as "out-of-vocabulary" signs. Due to the limited availability of comprehensive datasets, numerous exclusive signs are left unrepresented, which makes it difficult for a system to interpret and translate them accurately. The problem becomes even worse when signers use their own unique gestures or signs that are not included in any standard dataset. To develop sign language recognition systems that can effectively interpret all the signs used by signers, it is crucial to address this issue. Innovative approaches such as continual learning and adaptability are required to ensure accurate recognition and inclusion of non-standard signs in the communication process, enhancing the technology's utility and inclusivity.
D. Technologically Illiterate Citizens
The significant challenge of illiteracy in the context of technology cannot be overlooked. Technology permeates every aspect of our contemporary society, actively influencing the way we communicate, learn, and access essential services. In spite of that, individuals lacking literacy skills encounter obstacles when using digital interfaces, which ultimately restricts their ability to access vital information and services. The existence of this digital divide has the potential to amplify pre-existing inequalities. Illiterate citizens frequently encounter difficulties in effectively utilizing complex technologies or AI models, such as sign language recognition systems. To overcome this challenge, it is necessary to develop interfaces that are easy for users to navigate and increase awareness and knowledge about digital technology. Ensuring equitable access to the benefits of the digital age is not just dependent on technology, but is also a societal necessity.
E. Ambiguity in Sign Language
Sign language recognition projects present a complex challenge regarding ambiguity, which necessitates thorough contemplation. The meaning of signs in sign languages often depends on the context and the ongoing conversation, making them dynamic and context-dependent. Recognition systems are faced with a daunting task due to the significant variability in context. There can be multiple explanations for signs due to various factors such as how fast the sign was made, the facial expressions accompanying the sign, or the signs that were made before it. As a result, deciphering the intended message of a sign in real-time can be extremely complex. To tackle the issue of ambiguity, it is important to create models that can accurately recognize signs and consider the wider context in which these signs are employed. In order to make sign language recognition systems more practical and useful in real-world communication situations, it is crucial to address ambiguity by ensuring accurate and contextually appropriate interpretations.
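Because the meaning of a sign depends on the signs around it, one common remedy, suggested by the CNN+RNN work surveyed above, is to classify whole frame sequences rather than single frames so the model can integrate temporal context. A minimal sketch follows; the sequence length, frame resolution, layer sizes, and class count are placeholders of ours.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, NUM_CLASSES = 16, 100  # placeholder sequence length and class count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, 64, 64, 3)),
    # Apply the same small CNN to every frame in the sequence.
    layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    # The LSTM integrates evidence across frames, capturing signing context.
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```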
VIII. FUTURE SCOPE
The potential for advancements and broad applications in sign language recognition projects is enormous, indicating a promising future scope. As technology keeps advancing, there are numerous thrilling paths worth exploring.
Firstly, incorporating sign language recognition into mainstream technology shows great potential. One aspect of our work involves creating user-friendly sign language interfaces for smartphones, tablets, and smart devices, allowing those who are deaf or hard of hearing to communicate more readily. Moreover, the enhancement of immersive communication and learning environments can be achieved by incorporating real-time sign language interpretation into virtual reality (VR) and augmented reality (AR) experiences.
Secondly, the focus of research can be directed towards improving and broadening sign language datasets. In order to create more comprehensive and culturally inclusive models, it is essential to incorporate regional variations, dialects, and idiosyncratic signs. In order to ensure recognition systems can serve a wider audience, the involvement of crowdsourcing and community-driven efforts is essential.
Thirdly, the prospects that sign language recognition offers for education and accessibility should not be overlooked. The development of educational platforms and tools that incorporate sign language recognition can greatly empower deaf and hard of hearing students, making learning not only more engaging but also highly interactive. These systems can also aid in the instantaneous interpretation of sign language during online conferences, meetings, and educational webinars, thereby fostering inclusivity in digital communication.
In conclusion, sign language recognition projects hold great potential for various future advancements. These advancements could include integrating the technology into mainstream devices and systems, increasing the diversity of datasets used for training, and developing more accessible solutions for individuals who rely on sign language. The potential of these advancements is immense, as they can greatly improve the lives of the deaf and hard of hearing community. By dismantling communication obstacles and fostering inclusivity in various facets of everyday life, these developments hold the key to enhancing their quality of life.
IX. CONCLUSION
In summary, applying the Inception V3 model to a tailored dataset showcasing Indian Sign Language (ISL) is a significant step towards enhancing accessibility, communication, and inclusivity for individuals with hearing impairments. The approach used in this project offers several advantages over the alternatives. The primary benefit of employing this deep convolutional neural network is its strong feature extraction and classification: the model's capacity to capture complex spatial information within signs, and its adaptability to varying sign sizes and locations, ensure high recognition accuracy. For the intricate and varied gestures of ISL, covering a wide range of words, phrases, and expressions, this is a considerable advantage.
Furthermore, the purpose-built ISL dataset, carefully compiled for Indian Sign Language, plays a crucial role in the project's success. By incorporating regional variations and culturally significant signs, it offers a more comprehensive and inclusive representation of ISL, addressing a critical gap commonly found in generic sign language datasets. The use of this dataset is expected to yield results that are precise and relevant for the Indian Deaf and hard of hearing community, well suited to their communication requirements.
Essentially, pairing the Inception V3 model with the ISL dataset has the potential to open up a whole new range of possibilities for recognizing sign language. The approach is a compelling choice over other models because of its superior recognition accuracy, robust feature extraction, and culturally tailored data. By breaking down communication barriers, promoting inclusivity, and enhancing the quality of life for the Indian Deaf and hard of hearing community, the outcomes attained in this endeavor are set to have a significant impact, and the ongoing progress in sign language recognition technology will further shape a future that is welcoming and accessible to everyone.
REFERENCES
[1] Varsha M., et al., "Indian Sign Language Gesture Recognition Using Deep Convolutional Neural Network," IEEE, 2021.
[2] Wanbo Li, et al., "Sign Language Recognition Based on Computer Vision," IEEE, 2021.
[3] Neil Buckley, et al., "A CNN sign language recognition system with single & double-handed gestures," IEEE, 2021.
[4] Aakash Deep, et al., "Realtime Sign Language Detection and Recognition," IEEE, 2022.
[5] Ritik Kumar, et al., "An Optimum Approach to Indian Sign Language Recognition using Efficient Convolution Neural Networks," IEEE, 2022.
[6] Adithya V., et al., "Convolutional Neural Network based Sign Language Recognition to Assist Online Assessment," IEEE, 2021.
[7] Jiangbin Zheng, et al., "An Improved Sign Language Translation Model with Explainable Adaptations for Processing Long Sign Sentence," Hindawi, 2020.
[8] Triyono, et al., "Sign Language Translator Application Using OpenCV," IOP Publishing, 2018.
[9] Kohsheen Tiku, et al., "Real-time Conversion of Sign Language to Text and Speech," IEEE, 2020.
[10] Purva C. Badhe, et al., "Indian Sign Language Translator Using Gesture Recognition Algorithm," IEEE, 2020.
[11] Prof. Supriya Pawar, Sainath Muluk, Sourabh Koli, "Real Time Sign Language Recognition using Python," International Journal of Innovative Research in Computer and Communication Engineering, vol. 6, issue 3, 2018.
[12] U. Zeshan, "Indo-Pakistani Sign Language Grammar: A Typological Outline," Sign Language Studies, vol. 3, pp. 157-212, 2003.
[13] Badhe, Purva C., and Vaishali Kulkarni, "Indian sign language translator using gesture recognition algorithm," 2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS), IEEE, 2015.
[14] Masood, Sarfaraz, Adhyan Srivastava, Harish Chandra Thuwal, and Musheer Ahmad, "Real-time sign language gesture (word) recognition from video sequences using CNN and RNN," in Intelligent Engineering Informatics, pp. 623-632, Springer, Singapore, 2018.
[15] Galicia R., Carranza O., et al., "Mexican Sign Language Recognition using Movement Sensor," in 2015 IEEE 24th International Symposium on Industrial Electronics, Buzios, 3-5 June 2015, pp. 573-578.
[16] M. J. Cheok, Z. Omar, and M. H. Jaward, "A review of hand gesture and sign language recognition techniques," Int. J. Mach. Learn. Cyber., vol. 10, pp. 131-153, 2019.
[17] Singha J., Das K., "Recognition of Indian sign language in live video," Int. J. Comput. Appl., vol. 70, no. 19, pp. 17-22, 2013.
[18] Kishore P.V.V., Kumar D.A., "Optical flow hand tracking and active contour hand shape features for continuous sign language recognition with artificial neural networks," in IEEE 6th International Conference on Advanced Computing, 2016.
[19] Swamy Shanmukha, Chethan M.P., Gatwadi Mahantesh, "Indian sign language interpreter with android implementation," Int. J. Comput. Appl., 2014: 975-8887.
[20] Sahoo Ashok K., Mishra Gouri Sankar, Kumar Ravulakollu Kiran, "Sign language recognition: state of the art," ARPN Journal of Engineering and Applied Sciences, 2014.
[21] Bachani Shailesh, Dixit Shubham, Chadha Rohin, Bagul Prof Avinash, "Sign language recognition using neural network," International Research Journal of Engineering and Technology (IRJET), vol. 7, no. 4, 2020.
Copyright © 2024 Mr. Amol Jagtap, Gaurav Pagare, Anushka Sandbhor, Vivek Patil, Sakshi Divate. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET56887
Publish Date : 2023-11-21
ISSN : 2321-9653
Publisher Name : IJRASET