Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Kaustubh Trivedi, Priyanka Gaikwad, Mahalaxmi Soma, Komal Bhore, Prof. Richa Agarwal
DOI Link: https://doi.org/10.22214/ijraset.2022.43220
Image classification is one of the classical problems in image processing, and various techniques exist for solving it. Sign languages are natural languages used to communicate with deaf and mute people, and many different sign languages exist around the world. The main focus of this system is a sign language (SL) that is in the process of standardization, and the system concentrates on hand gestures only. The hand is a very important part of the body for exchanging ideas, messages, and thoughts among deaf and mute people. The proposed system recognizes the numbers 0 to 9 and the alphabets of American Sign Language (ASL). It is divided into three parts: preprocessing, feature extraction, and classification. It first identifies the gestures from American Sign Language and then processes each gesture to recognize the number or alphabet through classification with a CNN. Additionally, the system plays the speech of the identified alphabets.
I. INTRODUCTION
Sign language is one of the best means of communication between deaf and mute people and hearing people. A sign language is a visual language that uses a system of manual, facial, and body movements. Sign language is not a universal language: different sign languages are used in different countries, and some countries, such as the UK, the USA, and India, have more than one sign language. Hundreds of sign languages are in use around the world; American Sign Language, Indian Sign Language, Japanese Sign Language, and Turkish Sign Language are some examples.
American Sign Language (ASL) is one of the most widely used sign languages among deaf and mute people worldwide.
An Artificial Neural Network (ANN), a brain-inspired computational model, has been used in many applications, and researchers have developed various ANN structures suited to their problems. Once trained, a network can be used for image classification. The Support Vector Machine (SVM) is a theoretically well-founded machine learning method that gives strong results in classifying high-dimensional datasets and has been found competitive with the best machine learning algorithms. In the past, CNNs have been tested and evaluated only as pixel-based image classifiers. Moving from pixel-based techniques towards object-based representation significantly increases the dimensionality of the feature space of remote sensing imagery. This increases the complexity of the classification process and causes problems for traditional sample-based classification schemes.
In this study, we have developed a novel approach for recognizing hand number gestures by recognizing, or labeling, hand parts in depth images. The proposed approach consists of two main processes: hand-part recognition with a random forest (RF) classifier, and rule-based recognition of hand number gestures. Its main advantage is that the state of each finger is identified directly from the recognized hand parts, and number gestures are then recognized from the state of each finger.
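To illustrate the rule-based stage, the following Python sketch maps a finger-state vector to a number. It assumes the hand-part recognition stage has already labeled each finger as active (extended) or inactive; the rule table `NUMBER_RULES` is a hypothetical approximation of ASL number handshapes (for 6 to 9 the thumb touches one finger, so both count as inactive), not the rules used in this study.

```python
from typing import Optional, Sequence

# Finger order: (thumb, index, middle, ring, pinky); True means "active" (extended).
# Hypothetical rule table approximating ASL number handshapes 0-9.
NUMBER_RULES = {
    (False, False, False, False, False): 0,
    (False, True, False, False, False): 1,
    (False, True, True, False, False): 2,
    (True, True, True, False, False): 3,
    (False, True, True, True, True): 4,
    (True, True, True, True, True): 5,
    (False, True, True, True, False): 6,   # thumb touches pinky
    (False, True, True, False, True): 7,   # thumb touches ring finger
    (False, True, False, True, True): 8,   # thumb touches middle finger
    (False, False, True, True, True): 9,   # thumb touches index finger
}


def recognize_number(state: Sequence[bool]) -> Optional[int]:
    """Return the number whose finger pattern matches exactly, or None if no rule fires."""
    return NUMBER_RULES.get(tuple(state))


print(recognize_number((False, True, True, False, False)))  # prints 2
```

A lookup table keeps every rule explicit, so an unrecognized finger pattern returns `None` rather than a forced guess.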
II. RELATED WORK
The first work covers feature extraction and gesture recognition: the authors proposed a novel approach for recognizing hand number gestures from the hand parts recognized in a depth image. The approach is divided into two stages: (i) hand-part recognition by random forests (RFs) and (ii) rule-based hand number gesture recognition. A database (DB) containing pairs of depth maps and their corresponding hand-part label maps was generated and then used to train the RFs. In the second stage, a depth image was first captured from a depth camera, and a hand depth silhouette was extracted by removing the background. Next, the hand parts of the depth silhouette were recognized using the trained RFs, and a set of features was extracted from the labeled hand parts. Finally, a rule-based approach was applied to these features to recognize the number gesture (2014) [1].
Another work uses thresholding and feature extraction to propose a simple method for recognizing numbers based on a threshold value. The method has three stages: an image is first captured with a web camera, a threshold value is then applied to it, and the numbers are recognized using that threshold (2011) [2].
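The background-removal step can be sketched as keeping a depth band around the nearest object. This minimal Python example assumes depths in millimetres, treats 0 as a missing reading, and assumes the hand is the object closest to the camera; the function name and the 150 mm band are hypothetical choices, not values from [1].

```python
from typing import List

DepthImage = List[List[float]]  # depth in millimetres; 0 means "no reading"


def extract_hand_silhouette(depth: DepthImage, band_mm: float = 150.0) -> List[List[int]]:
    """Return a 0/1 mask of pixels lying within band_mm behind the nearest valid reading.

    Assumes (hypothetically) that the hand is the object closest to the camera."""
    valid = [d for row in depth for d in row if d > 0]
    if not valid:
        return [[0] * len(row) for row in depth]
    nearest = min(valid)
    return [[1 if 0 < d <= nearest + band_mm else 0 for d in row] for row in depth]


depth = [
    [0, 900, 910, 1500],
    [880, 600, 620, 1500],
    [890, 610, 640, 1490],
]
mask = extract_hand_silhouette(depth)
# mask == [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```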
Another work covers image preprocessing, HOG, feature extraction, and gesture recognition; on this basis, the authors proposed a system that decomposes gestures into single-handed and double-handed gestures. Classifying gestures into these subcategories simplifies gesture recognition in Indian Sign Language (ISL), because each subcategory contains fewer gestures. Histogram of Oriented Gradients (HOG) features and geometric descriptors were tried with KNN and CNN classifiers on a dataset of images of the 26 English alphabets in ISL under variable backgrounds. HOG features classified with a Support Vector Machine (SVM) were found to be the most efficient approach, with an accuracy of 94.23% [3].
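A simplified version of the HOG descriptor is sketched below for a single cell: gradients by central differences, unsigned orientations binned into 9 bins over 0° to 180°, weighted by gradient magnitude and L2-normalised. Full HOG implementations add bin interpolation, a grid of cells and block normalisation; those are omitted here, and the resulting vector would be passed to a classifier such as the SVM mentioned in [3].

```python
import math
from typing import List


def orientation_histogram(img: List[List[float]], n_bins: int = 9) -> List[float]:
    """Simplified single-cell HOG: magnitude-weighted histogram of unsigned
    gradient orientations (0-180 degrees), L2-normalised.
    Omits HOG's bin interpolation and block normalisation."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central difference, horizontal
            gy = img[y + 1][x] - img[y - 1][x]   # central difference, vertical
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += mag
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist
```

For an image whose intensity changes only from left to right, all gradient energy falls into the first (0° to 20°) bin.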
This proposed system is an application for people with speech and hearing disabilities. It discusses an improved method for sign language recognition and for conversion of speech to signs. The algorithm extracts signs from video sequences with minimally cluttered and dynamic backgrounds using skin color segmentation. It distinguishes between static and dynamic gestures, extracts the appropriate feature vector, and classifies the vectors using Support Vector Machines. Speech recognition is built on the standard Sphinx module. Experimental results show satisfactory segmentation of signs under diverse backgrounds and relatively high accuracy in gesture and speech recognition (2016) [4].
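The exact skin color model of [4] is not given here, so the sketch below substitutes a widely used explicit RGB skin rule (attributed to Kovač, Peer and Solina) to show what per-pixel skin color segmentation looks like; the thresholds belong to that rule, not to the cited system.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def is_skin(pixel: RGB) -> bool:
    """Explicit RGB skin rule (uniform-daylight variant); a stand-in, not the model of [4]."""
    r, g, b = pixel
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(image: List[List[RGB]]) -> List[List[int]]:
    """Return a 0/1 mask marking pixels classified as skin."""
    return [[1 if is_skin(p) else 0 for p in row] for row in image]
```

Fixed thresholds like these are fast but sensitive to lighting, which is one reason such systems assume a controlled or minimally cluttered background.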
This work presented a dynamic hand gesture recognition system for home appliance control that uses only a depth camera. A dynamic hand gesture is recognized from static hand postures and the hand trajectory. The system can recognize seven commonly used dynamic hand gestures, and experimental results show that it is effective for home appliance control [5].
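The trajectory part of such a system can be illustrated with a hypothetical classifier that labels a sequence of hand-centroid positions by its dominant net displacement. The gesture names and the 50-pixel minimum motion are assumptions; [5] also uses the static posture, which would be combined with this direction label.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; y grows downward


def classify_swipe(track: List[Point], min_dist: float = 50.0) -> Optional[str]:
    """Label a hand-centroid trajectory by its dominant net displacement (hypothetical labels)."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if max(abs(dx), abs(dy)) < min_dist:
        return None  # too little motion: treat as a static posture
    if abs(dx) >= abs(dy):
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"
```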
This paper introduces a novel hand gesture recognition scheme based on depth data. The hand is first extracted from the acquired depth maps, with the aid of color information from the associated views, and is then segmented into palm and finger regions. Next, two different sets of feature descriptors are extracted: one based on the distances of the fingertips from the hand center, and the other on the curvature of the hand contour. Finally, a multiclass CNN classifier is used to recognize the performed gestures [6].
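The fingertip-distance descriptor can be sketched as follows. This assumes fingertip coordinates have already been detected and uses the centroid of the hand mask as a simple stand-in for the hand center; distances are sorted, divided by the largest one for scale invariance, and zero-padded to five entries. These choices are illustrative, not the exact descriptor of [6].

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def hand_center(mask: List[List[int]]) -> Point:
    """Centroid of the foreground pixels (a simple stand-in for the palm center)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no hand pixels")
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def fingertip_distance_features(mask: List[List[int]], fingertips: Sequence[Point],
                                n_max: int = 5) -> List[float]:
    """Fingertip-to-center distances, sorted descending, scaled by the largest, zero-padded."""
    cx, cy = hand_center(mask)
    dists = sorted((math.hypot(x - cx, y - cy) for x, y in fingertips), reverse=True)[:n_max]
    scale = dists[0] if dists and dists[0] > 0 else 1.0
    feats = [d / scale for d in dists]
    return feats + [0.0] * (n_max - len(feats))
```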
In this work, geometric moments and orthogonal moments, namely the Zernike, Tchebichef, and Krawtchouk moments, are explored. The system detects the hand region through skin color identification and obtains its binary silhouette. These images are normalized for rotation and scale changes, and the moment features of the normalized hand gestures are classified using a minimum distance classifier [7].
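As a sketch of the geometric-moment branch only (Zernike, Tchebichef and Krawtchouk moments are not shown), the code below computes normalised central moments of a binary silhouette, which are invariant to translation and scale, and classifies them with a minimum distance (nearest class mean) classifier. The class labels and means are hypothetical.

```python
import math
from typing import Dict, List

ORDERS = [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3), (2, 1), (1, 2)]


def normalized_central_moments(mask: List[List[int]]) -> List[float]:
    """eta_pq = mu_pq / mu_00^(1 + (p+q)/2) for the orders in ORDERS."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = float(len(pts))
    if m00 == 0:
        raise ValueError("empty silhouette")
    xc = sum(x for x, _ in pts) / m00   # centroid makes the moments translation invariant
    yc = sum(y for _, y in pts) / m00
    feats = []
    for p, q in ORDERS:
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        feats.append(mu / m00 ** (1 + (p + q) / 2))  # scale normalisation
    return feats


def min_distance_classify(features: List[float], class_means: Dict[str, List[float]]) -> str:
    """Return the label whose mean feature vector is closest in Euclidean distance."""
    return min(class_means, key=lambda label: math.dist(features, class_means[label]))
```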
This work focuses on building a robust part-based hand gesture recognition system using a Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, the authors propose a novel distance metric, the Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. Because it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures that differ only slightly [8].
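FEMD builds on the Earth Mover's Distance. The sketch below shows only plain 1-D EMD between two histograms over the same bins (for example, hypothetical finger-angle signatures), computed as the L1 distance between their cumulative distributions; the finger-cluster matching and the penalty for unmatched mass that make it FEMD are not included.

```python
from itertools import accumulate
from typing import List


def emd_1d(p: List[float], q: List[float], bin_width: float = 1.0) -> float:
    """Earth Mover's Distance between two 1-D histograms over the same bins.

    Both histograms are normalised to unit mass; in 1-D the EMD equals the
    L1 distance between the cumulative distributions."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive mass")
    cp = accumulate(x / sp for x in p)
    cq = accumulate(x / sq for x in q)
    return bin_width * sum(abs(a - b) for a, b in zip(cp, cq))
```

Moving all mass from the first to the third of three bins costs 2 bin widths, whereas moving it one bin costs 1, which is the graded dissimilarity that a bin-by-bin comparison lacks.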
The proposed method recognizes image-based numbers of Persian Sign Language (PSL) by applying a thinning method to the segmented image. After the thinned image is cleaned, its real endpoints are used for recognition. The method provides real-time recognition and is not affected by hand rotation or scaling [9].
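Endpoint detection on an already thinned image can be sketched as counting skeleton pixels with exactly one 8-connected neighbour. This minimal example assumes a one-pixel-wide 0/1 skeleton and omits the cleaning (spur removal) step described in [9]; in a hypothetical use, the endpoints other than the one at the wrist would correspond to extended fingers.

```python
from typing import List, Tuple


def skeleton_endpoints(skel: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of skeleton pixels that have exactly one 8-connected neighbour."""
    h, w = len(skel), len(skel[0])
    ends = []
    for r in range(h):
        for c in range(w):
            if not skel[r][c]:
                continue
            n = sum(skel[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))
                    if (rr, cc) != (r, c))
            if n == 1:
                ends.append((r, c))
    return ends
```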
This work discusses a simple recognition algorithm that recognizes the numbers 0 to 10 using thresholding. The algorithm has three main steps: capturing the image, applying a threshold, and recognizing the number. It assumes that the user wears colored hand gloves [10].
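A hypothetical version of threshold-based number recognition is sketched below: a grayscale image (for instance, one color channel that responds to colored glove markers) is thresholded, and the number of separate foreground blobs is counted. The glove-marker setup, the threshold and the `min_size` noise filter are assumptions, not details from [10].

```python
from collections import deque
from typing import List


def threshold(gray: List[List[float]], t: float) -> List[List[int]]:
    """Binarise: 1 where the intensity is at least t."""
    return [[1 if v >= t else 0 for v in row] for row in gray]


def count_blobs(binary: List[List[int]], min_size: int = 1) -> int:
    """Count 4-connected foreground components with at least min_size pixels (BFS flood fill)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                count += 1
    return count
```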
III. PROPOSED SYSTEM
The proposed work focuses on a hybrid approach that combines algorithms for classifying numbers and alphabets. The proposed model consists of four phases: preprocessing, feature extraction, classification, and recognition. The model uses the thinning method for feature extraction and a CNN to classify the extracted features into numbers and alphabets.
The first phase of the proposed model consists of image capture (acquisition) and preprocessing. The image is first captured through a camera or taken from a video. It is then resized, and the hand region is extracted from it. Any noise present is removed, and the image is converted into a binary image.
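The preprocessing phase can be sketched in plain Python as grayscale conversion, nearest-neighbour resizing, 3×3 median filtering for noise removal and binarisation. The 64×64 size and the threshold of 128 are arbitrary illustrative values, the ITU-R BT.601 luma weights are one common choice for grayscale conversion, and hand extraction is omitted.

```python
from statistics import median
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[float]]


def to_gray(img: List[List[RGB]]) -> Gray:
    # ITU-R BT.601 luma weights
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in img]


def resize_nearest(img: Gray, out_h: int, out_w: int) -> Gray:
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)] for y in range(out_h)]


def median3(img: Gray) -> Gray:
    """3x3 median filter (window clipped at the borders) to suppress salt-and-pepper noise."""
    h, w = len(img), len(img[0])
    return [[median(img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]


def binarize(img: Gray, t: float) -> List[List[int]]:
    return [[1 if v >= t else 0 for v in row] for row in img]


def preprocess(img: List[List[RGB]], size: Tuple[int, int] = (64, 64), t: float = 128.0) -> List[List[int]]:
    small = resize_nearest(to_gray(img), *size)
    return binarize(median3(small), t)
```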
The second phase is feature extraction. Many features are available for gesture recognition, but this system concentrates on fingertips and active/inactive fingers, found using thinning techniques.
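The thinning technique is not specified further, so the sketch below uses the classic Zhang–Suen thinning algorithm as an example: two sub-iterations repeatedly remove boundary pixels that do not break connectivity until a one-pixel-wide skeleton remains. Whether the proposed system uses this particular algorithm is an assumption.

```python
from typing import List


def zhang_suen_thin(binary: List[List[int]]) -> List[List[int]]:
    """Zhang-Suen thinning of a 0/1 image; returns a new one-pixel-wide skeleton."""
    h, w = len(binary), len(binary[0])
    # Pad with a zero border so every foreground pixel has eight neighbours.
    img = ([[0] * (w + 2)]
           + [[0] + [1 if v else 0 for v in row] + [0] for row in binary]
           + [[0] * (w + 2)])

    def neighbours(y: int, x: int) -> List[int]:
        # P2..P9, clockwise starting from the north neighbour
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:  # delete simultaneously after the scan
                img[y][x] = 0
            if to_delete:
                changed = True
    return [row[1:-1] for row in img[1:-1]]
```

Endpoints of the resulting skeleton (pixels with a single neighbour) can then serve as fingertip candidates.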
The extracted features are then passed to the CNN, which has been trained on the training dataset; it classifies the features, aiming for higher accuracy, and produces the output. Additionally, the system plays the speech of the identified alphabets.
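The final classification step can be sketched as turning the CNN's output scores into a label. This assumes a hypothetical 36-class output layer ordered as the digits 0 to 9 followed by the letters A to Z; softmax converts scores to probabilities and the most probable class is returned. The speech output would then pass the returned label to a text-to-speech engine, which is not shown.

```python
import math
import string
from typing import List, Tuple

# Hypothetical label set: ten digits followed by 26 letters (36 classes).
LABELS = list(string.digits + string.ascii_uppercase)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```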
A. Proposed System Architecture
IV. ALGORITHM
A. Convolutional Neural Network (CNN) Algorithm
The CNN structure includes two kinds of layers. The first is the feature extraction layer, in which the input of each neuron is connected to a local receptive field of the previous layer, and local features are extracted. Once a local feature has been extracted, its spatial relationship to other features is also determined. The second is the feature map layer: each feature map in this layer is a plane, and all neurons in one plane share the same weights. The feature maps use the sigmoid function as the CNN's activation function, and because the weights are shared across each plane, a feature is detected wherever it appears in the input (shift invariance). In a CNN, each convolutional layer is followed by a computational layer that computes local averages and performs a second feature extraction; this two-stage feature extraction is a distinctive structure that reduces the resolution.
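The structure described above (shared-weight convolution, sigmoid activation and a subsampling layer that averages locally and halves the resolution) can be sketched for a single feature map in plain Python. Kernel values, bias and the 2×2 pooling size are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution).
    One kernel is shared by every output position of the feature map."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[bias + sum(k[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid_map(m: Matrix) -> Matrix:
    return [[_sigmoid(v) for v in row] for row in m]


def avg_pool2x2(m: Matrix) -> Matrix:
    """Subsampling layer: average over non-overlapping 2x2 blocks, halving the resolution."""
    return [[(m[r][c] + m[r][c + 1] + m[r + 1][c] + m[r + 1][c + 1]) / 4.0
             for c in range(0, len(m[0]) - 1, 2)]
            for r in range(0, len(m) - 1, 2)]


def conv_block(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """Convolution -> sigmoid -> 2x2 average pooling for one feature map."""
    return avg_pool2x2(sigmoid_map(conv2d_valid(x, k, bias)))
```

A 6×6 input with a 3×3 kernel yields a 4×4 feature map, which the pooling layer reduces to 2×2.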
V. MATHEMATICAL MODEL
The network is pretrained in a purely unsupervised way, using the greedy layer-wise procedure in which each added layer is trained as a restricted Boltzmann machine (RBM), e.g., by Contrastive Divergence. The inputs are:
$P$ is the input training distribution for the network
$\epsilon$ is a learning rate for the RBM training
$\ell$ is the number of layers to train
$W^k$ is the weight matrix for level $k$, for $k$ from 1 to $\ell$
$b^k$ is the visible-unit offset (bias) vector for the RBM at level $k$, for $k$ from 1 to $\ell$
$c^k$ is the hidden-unit offset (bias) vector for the RBM at level $k$, for $k$ from 1 to $\ell$
mean_field_computation is a Boolean that is true if the training data at each additional level is obtained by a mean-field approximation instead of stochastic sampling
Step 1: for $k = 1$ to $\ell$ do
Step 2: initialize $W^k = 0$, $b^k = 0$, $c^k = 0$
Step 3: while not stopping criterion do
Step 4: sample $h^0 = x$ from $P$
Step 5: for $i = 1$ to $k - 1$ do
Step 6: if mean_field_computation then
Step 7: assign $h^i_j$ to $Q(h^i_j = 1 \mid h^{i-1})$, for all elements $j$ of $h^i$
Step 8: else
Step 9: sample $h^i_j$ from $Q(h^i_j \mid h^{i-1})$, for all elements $j$ of $h^i$
Step 10: end if
Step 11: end for
Step 12: RBMupdate($h^{k-1}$, $\epsilon$, $W^k$, $b^k$, $c^k$) {thus providing $Q(h^k \mid h^{k-1})$ for future use}
Step 13: end while
Step 14: end for
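The greedy layer-wise procedure in Steps 1 to 14 can be sketched in plain Python, with the RBM update implemented as a single Contrastive Divergence (CD-1) step: sample the hidden units from the data, sample a reconstruction, and move the parameters by the difference between data and reconstruction statistics. Assumptions: binary visible units, a fixed number of epochs as the stopping criterion, and a sweep over the training list in place of random sampling from P; the function names are hypothetical, and the comments map the loops to the steps.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float], List[float]]  # (W, b, c); W[i][j]: hidden i, visible j


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(v, W, c):
    # Q(h_i = 1 | v) = sigm(c_i + sum_j W[i][j] v_j)
    return [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v)))) for i in range(len(c))]


def visible_probs(h, W, b):
    # P(v_j = 1 | h) = sigm(b_j + sum_i W[i][j] h_i)
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h)))) for j in range(len(b))]


def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def rbm_update(x1, eps, W, b, c, rng):
    """One CD-1 step; updates W, b, c in place."""
    h1 = sample(hidden_probs(x1, W, c), rng)   # positive phase (sampled hidden states)
    x2 = sample(visible_probs(h1, W, b), rng)  # reconstruction
    q2 = hidden_probs(x2, W, c)                # negative phase
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += eps * (h1[i] * x1[j] - q2[i] * x2[j])
    for j in range(len(b)):
        b[j] += eps * (x1[j] - x2[j])
    for i in range(len(c)):
        c[i] += eps * (h1[i] - q2[i])


def train_greedy(data, eps, sizes, mean_field_computation=True, epochs=5, seed=0) -> List[Layer]:
    """Greedy layer-wise RBM training; sizes = [n_visible, n_hidden_1, ..., n_hidden_l]."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for k in range(1, len(sizes)):                               # Step 1
        W = [[0.0] * sizes[k - 1] for _ in range(sizes[k])]      # Step 2
        b = [0.0] * sizes[k - 1]
        c = [0.0] * sizes[k]
        for _ in range(epochs):                                  # Step 3 (fixed epochs)
            for x in data:                                       # Step 4 (sweep, not random draws)
                h = [float(v) for v in x]
                for Wi, _bi, ci in layers:                       # Steps 5-11
                    q = hidden_probs(h, Wi, ci)
                    h = q if mean_field_computation else sample(q, rng)
                rbm_update(h, eps, W, b, c, rng)                 # Step 12
        layers.append((W, b, c))
    return layers
```

Sampling the hidden states in the positive phase, rather than using their probabilities, is what breaks the symmetry between hidden units when all weights start at zero as in Step 2.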
VI. RESULTS AND DISCUSSION
VII. FUTURE SCOPE
We can develop a model for ASL word- and sentence-level recognition. We can also develop a complete product that helps speech- and hearing-impaired people and thereby reduces the communication gap.
Because of the demands on robustness, accuracy, and efficiency, hand gesture detection for real-world applications is extremely difficult. Building on the suggested approach that uses a thinning algorithm and a support vector machine, a comparative study has been adopted in this work. This will help boost number gesture recognition accuracy and recognize the names and positions of active fingers. In the suggested system, we preprocess the input gesture, use a thinning method to extract features, and then use a CNN to recognize the specific gesture name, alphabet, or number. We also translate the recognized alphabets into speech. A physical implementation of the system has shown the design's usefulness in a real-world context. The system will be implemented entirely in Python.
[1] Munir Oudah, Ali Al-Naji, Javaan Chahl, “Hand Gesture Recognition Based on Computer Vision: A Review of Techniques,” 2020.
[2] Ali Moin, Andy Zhou, Alisha Menon, Simone Benatti, George Alexandrov, Senam Tamakloe, Jonathan Ting, Natasha Yamamoto, Yasser Khan, Fres Burghardt, Luca Benini, Ana C. Arias, Jan M. Rabaey, “A wearable biosensing system with in-sensor adaptive machine learning for hand gesture recognition,” 2021.
[3] Abdullah Mujahid, Mazhar Javed Awan, Awais Yasin, Mazin Abed Mohammed, Robertas Damasevicius, Rytis Maskeliunas, Karrar Hameed Abdulkareem, “Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model,” 2021.
[4] Yong Soon Tan, Kian Ming Lim, Chin Poo Lee, “Hand gesture recognition via enhanced densely connected convolutional neural network,” 2021.
[5] Priyanka Parvathy, Kamalraj Subramaniam, G. K. D. Prasanna Venkatesan, P. Karthikaikumar, Justin Varghese, T. Jayasankar, “Development of hand gesture recognition system using machine learning,” 2020.
[6] Yong Soon Tan, Kian Ming Lim, Connie Tee, Chin Poo Lee, Chenn Yaw Low, “Convolutional neural network with spatial pyramid pooling for hand gesture recognition,” 2020.
[7] Manisha Kowdiki, Arti Kharpade, “Automatic hand gesture recognition using hybrid metaheuristic-based feature selection and classification with Dynamic Time Warping,” 2021.
[8] Gibran Benitez-Garcia, Lidia Prudente-Tixteco, Luis Carlos Castro-Madrid, Rocio Toscano Medina, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, Luis Javier Garcia Villalba, “Improving Real-Time Hand Gesture Recognition with Semantic Segmentation,” 2021.
[9] Wen Qi, Salih Ertug Ovur, Zhijun Li, Aldo Marzullo, Rong Song, “Multi-Sensor Guided Hand Gesture Recognition for a Teleoperated Robot Using a Recurrent Neural Network,” 2021.
[10] Fangtai Guo, Zaixing He, Shuyou Zhang, Xinyue Zhao, Jinhui Fang, Jianrong Tan, “Normalized edge convolutional networks for skeleton-based hand gesture recognition,” 2021.
[11] Sakshi Sharma, Sukhwinder Singh, “Vision-based hand gesture recognition using deep learning for the interpretation of sign language,” 2021.
[12] Hira Ansar, Ahmad Jalal, Munkhjargal Gochoo, Kimbum Kim, “Hand Gesture Recognition Based on Auto-Landmark Localization and Reweighted Genetic Algorithm for Healthcare Muscle Activities,” 2021.
[13] Janmenjoy Nayak, Binghnaraj Naik, Pandit Byomkesha Dash, Alireza Souri, Vimal Shanmuganathan, “Hyper-parameter tuned light gradient boosting machine using memetic firefly algorithm for hand gesture recognition,” 2021.
[14] Yinfeng Fang, Xuguang Zhang, Dalin Zhou, Honghai Liu, “Improve Inter-day Hand Gesture Recognition Via Convolutional Neural Network-based Feature Fusion,” 2021.
[15] Shweta Saboo, Joeeta Singha, Rabul Hussain Laskar, “Dynamic hand gesture recognition using a combination of two-level tracker and trajectory-guided features,” 2021.
[16] Qing Gao, Yongquan Chen, Zhaojie Ju, Yi Liang, “Dynamic Hand Gesture Recognition Based on 3D Hand Pose Estimation for Human-Robot Interaction,” 2021.
[17] M. S. Amin, M. T. Amin, M. Y. Latif, A. A. Jathol, N. Ahmed, M. I. N. Tarar, “Alphabetical Gesture Recognition of American Sign Language using E-Voice Smart Glove,” in Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 5–7 November 2020.
[18] A. Mehta, K. Solanki, T. Rathod, “Automatic Translate Real-Time Voice to Sign Language Conversion for Deaf and Dumb People,” Int. J. Eng. Res. Technol. (IJERT), 2021.
[19] Farman Shah, M. S., “Sign Language Recognition Using Multiple Kernel Learning: A Case Study of Pakistan Sign Language,” IEEE Access, 2021.
[20] W. Pan, X. Zhang, Z. Ye, “Attention-Based Sign Language Recognition Network Utilizing Keyframe Sampling and Skeletal Features,” IEEE Access, 2020.
[21] O. M. Sincan, H. Y. Keles, “AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods,” IEEE Access, 2020.
[22] T. Zhao, J. Liu, Y. Wang, H. Liu, Y. Chen, “Towards Low-Cost Sign Language Gesture Recognition Leveraging Wearables,” IEEE Trans. Mob. Comput., 2021.
[23] M. Al-Qurishi, T. Khalid, R. Souissi, “Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues,” IEEE Access, 2021.
[24] D. S. Breland, S. B. Skriubakken, A. Dayal, A. Jha, P. K. Yalavarthy, L. R. Cenkeramaddi, “Deep Learning-Based Sign Language Digits Recognition from Thermal Images with Edge Computing System,” IEEE Sens. J., 2021.
[25] I. Papastratis, K. Dimitropoulos, D. Konstantinidis, P. Daras, “Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space,” IEEE Access, 2020.
Copyright © 2022 Kaustubh Trivedi, Priyanka Gaikwad, Mahalaxmi Soma, Komal Bhore, Prof. Richa Agarwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET43220
Publish Date : 2022-05-24
ISSN : 2321-9653
Publisher Name : IJRASET