Sign language is the most effective means of communication available to the hearing-impaired population; its importance is therefore growing, and research is needed for the successful implementation of recognition systems. This medium is a vital part of the communication process for individuals with hearing loss. In this paper, we present a framework for developing a word recognition system using deep learning methods, each of which has its own strengths and limitations. We review the main approaches to sign language recognition, namely the convolutional neural network (CNN) combined with long short-term memory (LSTM), the recurrent neural network (RNN), and the hidden Markov model (HMM), and examine how each model behaves and works. The goal is to create a model that reads both static and dynamic hand gestures and produces output in written form.
I. INTRODUCTION
More than 400 million people worldwide are deaf, and this number is expected to grow over time. In India alone, 63 million people suffer from hearing loss or difficulty in hearing. According to a 2015 survey by the WHO (World Health Organization), approximately 70 million people are mute and cannot speak. As a result, the communication gap widens: without common ground, there can be no communication with people who do not have these impairments. To prevent the isolation of these people, sign language has been introduced, which helps them convey messages among themselves as well as to others. The drawback of sign language is that it changes from region to region; there is no single standard, and the language depends on the country: India uses Indian Sign Language (ISL), Britain uses British Sign Language (BSL), and America uses American Sign Language (ASL).
Not everyone can understand sign language, nor does everyone have time to learn it. Finding a person who is well versed in sign language is difficult, and keeping a person as a translator is not feasible at all times. Online meetings become a problem if the other person does not know sign language, which creates a barrier to communication. Therefore, there is a need for an automated sign language translator that can easily be used by vocally impaired people as well as people with hearing problems. An automated sign language translator involves many challenges: first, the machine must detect the hand and its movement and position from the camera, and identify the actions performed by palm and finger variations in sign language. Knowing the angle and the distance between the person signing and the camera is needed to increase the accuracy of hand-sign detection, as is the ability to distinguish the hand from the background and to eliminate the background if needed. A good recognition system should also be able to identify alphabets, numerals, static and dynamic words, non-manual features such as head nods and shakes, and various kinds of facial expressions. The goal of the system is to convert the detected hand signs into written text.
II. MOTIVATION
The motivation for this project is that creating a platform which establishes communication and interaction with deaf and mute people is of utmost importance nowadays. These people interact through hand signs. Gestures are physical actions performed by a person to convey meaningful information, and hand gestures are a powerful means of communication among humans. Many signs express complex meanings, and recognizing them is a challenging task for people who have no understanding of the language. This project aims to take a basic step toward bridging the communication gap between hearing people and the deaf and mute using Indian Sign Language (ISL). Effectively extending this project to words and common expressions may not only let deaf and mute people communicate faster and more easily with the outside world, but also provide a boost toward developing an autonomous system for understanding and aiding them.
III. RELATED WORK
The convolutional neural network (CNN) is the most widely used deep learning algorithm for images, since it can perform image recognition, classification, segmentation, and other related tasks. An image is given as input to the CNN, which assigns importance, i.e., learnable weights and biases, to various aspects of the image; the image is then fed to the convolution layer, where an operation on two functions produces a third function that expresses how the shape of one is modified by the other.
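A minimal sketch of such a CNN classifier for static hand-sign images is given below. The input size (64x64 grayscale), layer widths, and 26-class output are illustrative assumptions, not a specification from any of the cited works.

```python
# Minimal sketch of a CNN classifier for static hand-sign images
# (assumed 64x64 grayscale input, hypothetical 26-class output).
import torch
import torch.nn as nn

class SignCNN(nn.Module):
    def __init__(self, num_classes=26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learnable filters slide over the image
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)        # convolution layers extract spatial features
        x = torch.flatten(x, 1)
        return self.classifier(x)   # one score (logit) per sign class

model = SignCNN()
dummy = torch.randn(8, 1, 64, 64)   # batch of 8 normalized hand images
print(model(dummy).shape)           # torch.Size([8, 26])
```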
Camgoz et al. [4] use transformers, a type of artificial neural network that solves the problem of transforming input sequences into output sequences in deep learning applications. Along with the transformer, CTC is also utilized. CTC loss is used for sequence tasks where alignment between the sequences is required; it helps tackle sequence problems where the timing is variable. A pretrained CNN followed by ReLU is used for the spatial embedding.
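To make the role of CTC concrete, the sketch below shows how a CTC loss aligns variable-length frame sequences with shorter gloss label sequences. The frame count, batch size, and vocabulary size are illustrative assumptions, not values from [4].

```python
# Hedged sketch: CTC loss over per-frame network outputs, with the blank
# symbol at index 0. Shapes and vocabulary size are assumptions.
import torch
import torch.nn as nn

T, N, C = 50, 4, 30                                        # frames per clip, batch size, gloss vocab (+ blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)        # per-frame class scores from the network

targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # gloss label sequences (length 10 each)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())   # the frame-to-gloss alignment is learned implicitly by minimizing this loss
```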
Koller et al. [5] test their approach on three publicly available tasks, namely continuous sign language recognition, mouth shape detection, and hand shape classification. A hybrid model is built that uses BiLSTM and CNN models, which are further embedded in an HMM. The algorithm learns from weak and noisy labelled data; weakly supervised learning uses high-level and noisy sources of supervision to create much larger training sets, far more quickly than they could be produced by manual supervision.
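The sketch below illustrates the general shape of such a hybrid pipeline, under stated assumptions: per-frame CNN features pass through a bidirectional LSTM, and a linear head produces per-frame scores that an HMM-style decoder could consume. Feature sizes and the number of states are hypothetical, and this is not the authors' code.

```python
# Illustrative sketch of a CNN-feature -> BiLSTM -> emission-score pipeline.
import torch
import torch.nn as nn

frames = torch.randn(16, 75, 512)          # (batch, frames, CNN feature dim) - assumed sizes
bilstm = nn.LSTM(input_size=512, hidden_size=256,
                 num_layers=2, batch_first=True, bidirectional=True)
emission_head = nn.Linear(2 * 256, 40)     # 40 hypothetical HMM states / sign classes

seq_out, _ = bilstm(frames)                # temporal context in both directions
emissions = emission_head(seq_out)         # per-frame scores that a decoder could use
print(emissions.shape)                     # torch.Size([16, 75, 40])
```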
In [8], Singh takes video samples of gestures as input, and the model outputs a gesture label. The first step extracts the frames from each video sample and saves them to disk in sequential order; the number of frames extracted from each video is recorded for reference. The sequence of contiguous images is then fed to the 3D-CNN network.
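A rough sketch of that frame-extraction-plus-3D-convolution pipeline is shown below. The video file name, frame size, and layer sizes are illustrative assumptions, not the setup used in [8].

```python
# Sketch: extract grayscale frames from a gesture video with OpenCV, stack
# them, and pass the clip through a 3D convolution.
import cv2
import numpy as np
import torch
import torch.nn as nn

def extract_frames(path, size=(64, 64)):
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(gray, size) / 255.0)      # normalize to [0, 1]
    cap.release()
    return torch.from_numpy(np.stack(frames)).float()       # (num_frames, H, W)

clip = extract_frames("gesture_sample.avi")                  # hypothetical video file
clip = clip.unsqueeze(0).unsqueeze(0)                        # (batch=1, channels=1, T, H, W)

conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
print(conv3d(clip).shape)                                    # spatio-temporal feature maps
```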
IV. SYSTEM ARCHITECTURE
The proposed system architecture takes as input a hand image extracted from the user's video. This hand image is captured and analyzed; the analysis starts by normalizing the image, i.e., adjusting its pixel intensities so that every input has a similar data distribution, and this data is stored. The dataset of images gathered from different sources is then split into training and testing sets. A CNN model is created and is trained and tested with the help of this dataset.
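A minimal sketch of the normalization and train/test split step is given below, assuming the collected hand images are held in a NumPy array; the array contents, split ratio, and class count are placeholders.

```python
# Sketch: normalize pixel intensities and split the image dataset into
# training and testing sets (placeholder data, illustrative 80/20 split).
import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.randint(0, 256, size=(1000, 64, 64), dtype=np.uint8)  # placeholder dataset
labels = np.random.randint(0, 26, size=(1000,))                          # placeholder sign labels

# Normalization: scale and center so every input has a similar distribution.
images = images.astype(np.float32) / 255.0
images = (images - images.mean()) / (images.std() + 1e-7)

X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)
print(X_train.shape, X_test.shape)   # e.g. (800, 64, 64) (200, 64, 64)
```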
The model can then classify hand-sign images and predict the signs in the images previously taken by the user. The accuracy is checked by comparing the predicted output with the desired output, and the errors are calculated.
Furthermore, these errors help refine the parameters set earlier, after which the model is trained again; this is the standard forward-propagation and backward-propagation cycle of a neural network. After successful prediction, the signs detected in the video frames are translated into the desired written language.
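The sketch below illustrates this train-and-refine cycle: a forward pass, the error between predicted and desired output, a backward pass to update the parameters, and an accuracy check. The stand-in linear classifier and random tensors are assumptions for the sake of a self-contained example, not the system's actual model.

```python
# Hedged sketch of the forward/backward propagation training cycle.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 26),            # simple stand-in classifier; the real system uses a CNN
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

X = torch.randn(128, 1, 64, 64)        # normalized hand images (placeholder)
y = torch.randint(0, 26, (128,))       # desired sign labels (placeholder)

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(X)                  # forward propagation
    loss = criterion(logits, y)        # error between prediction and desired output
    loss.backward()                    # backward propagation
    optimizer.step()                   # refine the parameters

    accuracy = (logits.argmax(dim=1) == y).float().mean()
    print(f"epoch {epoch}: loss={loss.item():.3f} accuracy={accuracy.item():.3f}")
```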
V. CONCLUSION
Communication with hearing people is usually a challenging task for a mute person. In this paper, a hand gesture recognition system is introduced that serves as a good communication aid for mute people. The system uses advanced image processing techniques to ensure maximum accuracy, and it is also convenient compared with existing systems. Capturing the hand without a glove can lead to inaccurate outputs. Database creation and testing through a GUI make the system more user friendly. The database can be expanded with a larger number of hand gestures and their different variations to improve the performance of the system. The GUI offers a platform for the user to carry out hand gesture recognition.
REFERENCES
[1] Sruthi C.J, Lijiya A. “Signet: A Deep Learning based Indian Sign Language Recognition System.” 2019 International Conference on Communication and Signal Processing (ICCSP). IEEE 2019
[2] Adaloglou, Nikolas and Chatzis, Theocharis and Papastratis, Ilias and Stergioulas, Andreas and Papadopoulos, Georgios Th and Zacharopoulou, Vassia and Xydopoulos, George J and Atzakas, Klimnis and Papazachariou, Dimitris and Daras, Petros “A Comprehensive Study on Deep Learning-Based Methods for Sign Language Recognition.” IEEE Transactions on Multimedia 2021.
[3] Anderson, Ricky and Wiryana, Fanny and Ariesta, Meita Chandra and Kusuma, Gede Putra and others “Sign Language Recognition Application Systems for Deaf-Mute People: A Review Based on Input-Process-Output”. Procedia computer science Elsevier 2017
[4] Camgoz, Necati Cihan and Koller, Oscar and Hadfield, Simon and Bowden, Richard “Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 2020
[5] Koller, Oscar and Camgoz, Necati Cihan and Ney, Hermann and Bowden, Richard “Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos” IEEE transactions on pattern analysis and machine intelligence 2019
[6] Cui, Runpeng and Liu, Hu and Zhang, Changshui “A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training” IEEE Transactions on Multimedia 2019
[7] Koller, Oscar and Zargaran, Sepehr and Ney, Hermann “Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs” Proceedings of the IEEE conference on computer vision and pattern recognition 2017
[8] Singh, Dushyant Kumar. "3D-CNN based Dynamic Gesture Recognition for Indian Sign Language Modeling." Procedia Computer Science 2021
[9] Shanta, Shirin Sultana and Anwar, Saif Taifur and Kabir, Md Rayhanul “Bangla sign language detection using sift and cnn”. 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) IEEE 2018
[10] Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison" Proceedings of the IEEE/CVF winter conference on applications of computer vision 2020