IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Yuvasri J, Sujitha S, Pavithra B, Kamali K, Rajalakshmi S
DOI Link: https://doi.org/10.22214/ijraset.2023.52128
All over the world, hearing- and speech-impaired people communicate through sign language, as it is their only reliable means of interaction, both among themselves and with hearing people. The language is made up chiefly of hand shapes, movements, and gestures. This system aims to bridge the communication gap and help the deaf and mute use technology for their daily transactions through a simple, easily implementable approach, and to help hearing- and speech-impaired people communicate with those who do not understand sign language. The system concentrates on the 26 letters of the alphabet and some simple phrases such as Hello, Good morning, and Thank you. The proposed system aims to recognize sign language and convert it to text. In this paper, a technique is proposed for a system that first collects the datasets used as input. The input given to the system is an image of the hand depicting a sign. OpenCV is used as the image-processing tool in the proposed system. The system is trained in a specific manner, discussed in the upcoming sections, to predict the result of the given input; it mainly uses TensorFlow for prediction and detection analysis.
I. INTRODUCTION
Sign language is the means of communication of hearing- and speech-impaired people. It is their major tool for interacting and sharing thoughts through signs. In general, we use letters, phrases, sentences, or even paragraphs to convey a message, but in sign language we use only signs made with hand gestures and expressions of the body. It acts as a bridge for communication between hearing people and hearing- and speech-impaired people. The world has various cultures and regions: as you move from one place to another, the language may change, and its grammar and lexicon may change with the region. The syntax and semantics of a language may even differ from village to village within the same region of a country. For example, English is spoken in America, Chinese in China, Japanese in Japan, and so on. In the same way, sign language also changes from region to region.
The expressions and gestures, along with the grammar and lexicon, vary according to the region in which the language is used. For example, in Pakistan, Pakistan Sign Language (PSL) is used. In earlier periods, a special tutor or separate assistant was needed for deaf and mute people to communicate with others, but this practice later became common everywhere in the world. As tutoring spread across the world, technology also grew more capable, and scientists devised a variety of systems to make such communication easier.
They are still working on it today. This paper describes a bridge between hearing- and speech-impaired people and hearing people; Section II reviews related work. The main value of sign language is communication itself: using sign language is crucial for many people in daily life and in building relationships. Signs and hand gestures are not the only elements important for interaction; facial expressions and body language matter too. Learning how to sign can therefore make you more expressive.
II. RELATED WORKS
Saba Jadooki et al. [1] and Bauer, Britta et al. [2] used a Kinect device and coloured gloves, respectively. Jitcharoenport, Rujira et al. [3] used flex sensors and gyroscopes for Thai sign language recognition. Nada B. Ibrahim et al. [4] recognized Arabic sign language by extracting multiple hand features. Fused-features mining is presented in [5], where the extracted hand features are classified by an artificial neural network (ANN). Although this approach claims an error rate of 0.8, its dictionary is very limited, covering only 8 alphabets of ASL. Sumaira Kausar et al. [6] recognized Pakistan sign language with a fuzzy-classifier approach: they identified the different positions of the fingers and the orientation of the hand, extracted using colour-marked gloves. Aleem Khalid Alvi et al. [7] used a statistical template-matching technique for Pakistan sign language recognition.
The mean value and standard deviation of the sensor readings for a gesture are identified, through which PSL is recognized with up to 78.2% accuracy; the accuracy is affected by environmental changes. This approach also relied on sensors and gloves. Image-based recognition of Pakistan sign language was done by Muhammad Raees et al. [8].
They recognized the positions of the fingers in 2-D. Although this approach is robust for PSL recognition, it suffers a 16% gesture-failure rate caused by hand orientation. Ahmed et al. [9] used an SVM-based approach for PSL recognition in which region-based and boundary-based features are extracted. Experiments were conducted on only 10 static alphabets of PSL with a dictionary size of 60 samples, and an accuracy rate of 83% was achieved. Halim, Zahid et al. [10] recognized Pakistan sign language using a DTW-algorithm-based approach.
This work considers body parts including the head, right wrist, left wrist, right hand, left hand, spine, hip bone, left shoulder, centre shoulder, and right shoulder.
Their accuracy rate varies based on the distance of the signer from the camera, which shows that the approach is not scale-invariant; by keeping the signer at a specific distance from the camera, they achieved an accuracy rate of 91%. Tauseef et al. [11] proposed a new approach for recognizing static PSL alphabets that achieved a high accuracy of 97.4%, although the colour-segmentation approach they used can degrade the accuracy in some situations. Syed Saqlain et al. [12] presented a new approach for categorizing the static alphabets of PSL.
The approach computes a histogram of local binary patterns (LBP) over the input images, from which several features are extracted: the standard deviation, skewness, kurtosis, variance, entropy, and energy of the LBP histogram. For classification, a multi-class SVM is applied, and an accuracy rate of 78.18% is reported for a dictionary of over 3400 samples.
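To make this feature set concrete, here is a minimal sketch, not the code of [12], of how these histogram statistics could be computed with NumPy, SciPy, and scikit-image; the LBP parameters and sample image are illustrative assumptions.

```python
# Minimal sketch (not the code of [12]): statistical features of an LBP
# histogram. LBP parameters and the sample image are illustrative only.
import numpy as np
from scipy.stats import kurtosis, skew
from skimage import data
from skimage.feature import local_binary_pattern

def lbp_statistics(gray, points=8, radius=1):
    """Std dev, skewness, kurtosis, variance, entropy, and energy of the
    uniform-LBP histogram of a greyscale image."""
    lbp = local_binary_pattern(gray, points, radius, method="uniform")
    n_bins = points + 2                       # labels of the uniform mapping
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins))
    p = hist / hist.sum()                     # normalised histogram
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    energy = np.sum(p ** 2)
    return np.array([p.std(), skew(p), kurtosis(p), p.var(), entropy, energy])

print(lbp_statistics(data.camera()))          # bundled scikit-image test image
```

The resulting six-value vector per image would then be passed to a multi-class SVM, for example sklearn.svm.SVC.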
III. EXISTING SYSTEM
In Pakistan, deaf people use Pakistan Sign Language (PSL) as a means of communication. In the scientific literature, many studies have addressed PSL recognition and classification. Most of this work relies on coloured gloves or hand markers, while other approaches are sensor- or Kinect-based.
These techniques are costly and not user-friendly. In that work, a technique was proposed for the recognition of thirty-six static alphabets of PSL using bare hands.
The dataset is obtained from sign language videos. Four vision-based features are then extracted: local binary patterns, histogram of oriented gradients, edge-oriented histogram, and speeded-up robust features. The extracted features are individually classified using multiple kernel learning (MKL) within a support vector machine (SVM).
Disadvantages: The input image must have a coloured background, and accessories must not be present on the hand depicting the letter. The system is costly and not user-friendly, and time consumption is high when searching for an image in a large database.
IV. PROPOSED SYSTEM
Sign language recognition is a broad area of research, and researchers have proposed many techniques for it in the past. All of these approaches are well developed for their own scenarios: although each technique achieves a reasonable recognition rate, each is bound to a specific scenario and has restrictions and issues. Mostly, researchers have put their effort into recognizing sign language accurately using different approaches. In an image-based recognition approach, the process usually starts with image acquisition and ends with recognition. This system follows the same process, but with a different implementation, as described below.
The input given to the system is an image of the hand depicting a sign alphabet, captured by the camera at a given instant of time. The image is then saved to disk. The pre-saved images of sign letters and the test image are then loaded, and these images are labelled using the LabelImg tool in Python. For training and prediction analysis we use TensorFlow. The model is then trained and tested iteratively until it predicts outcomes correctly.
Advantages: It is cost-efficient and platform-independent. A large database of images can be maintained for comparison against the input image, which improves the speed of the system and increases its scope, reducing the time needed to search for an image in a large database.
V. ARCHITECTURE
VI. METHODOLOGY
A. Collecting Datasets
Before taking input from the user, we must collect the datasets. For dataset collection we use OpenCV, a Python library that, among other things, provides webcam access, through which we gather the dataset for the system. To cover the 26 alphabets and a few simple phrases, we collect 76 images using a Jupyter Notebook. OpenCV is an open-source computer-vision library; it gives the machine the ability to recognize faces or objects. The purpose here is to understand or recognize the content of images, whether that is an object, a text description, a model, or a hand gesture. Just as humans understand things by what they see, a computer can also capture its surroundings, and one way to do so is to access the camera through OpenCV. Accessing the camera, collecting a dataset for each value, and assigning each to its own variable leads to the next step. Datasets are the essential raw material for training the system: each dataset carries its own weight, which is calculated and then assigned to its particular value. The dataset is collected as sketched below.
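As an illustration of this step, the following is a minimal collection script, assuming the default webcam at index 0; the label names, image count, and folder layout are hypothetical rather than the authors' exact configuration.

```python
# Sketch of dataset collection with OpenCV; labels, counts, and folder
# layout are hypothetical, not the authors' exact configuration.
import os
import time
import cv2

labels = ["hello", "thank_you", "A", "B"]   # illustrative subset of signs
images_per_label = 15                       # assumed count per sign

cap = cv2.VideoCapture(0)                   # default webcam
for label in labels:
    os.makedirs(os.path.join("dataset", label), exist_ok=True)
    print(f"Collecting images for '{label}'...")
    time.sleep(3)                           # time to position the hand
    for i in range(images_per_label):
        ret, frame = cap.read()             # grab one frame
        if not ret:
            break
        cv2.imwrite(os.path.join("dataset", label, f"{label}_{i}.jpg"), frame)
        cv2.imshow("capture", frame)
        cv2.waitKey(500)                    # roughly two frames per second
cap.release()
cv2.destroyAllWindows()
```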
B. Labelling Images
After collecting the datasets, we assign and label them for particular gestures. Each labelled gesture is stored as an array of elements and then processed by the system. For labelling we use LabelImg, a graphical tool for annotating images for object detection: you draw boxes around the objects in an image, and the tool automatically saves the annotations in XML files that feed the rest of the labelling process. Labelling is the major step after dataset collection: each image is assigned to a particular element, and each element must be labelled before training. The system is then trained with TensorFlow, as discussed in detail below.
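LabelImg saves its annotations in Pascal VOC XML by default. As a minimal sketch (the annotation path is hypothetical), such a file can be read with Python's standard library:

```python
# Sketch: reading one LabelImg (Pascal VOC) annotation with the standard
# library. The annotation path is hypothetical.
import xml.etree.ElementTree as ET

def read_annotation(xml_path):
    """Return (image filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")        # the gesture name typed in LabelImg
        bb = obj.find("bndbox")
        coords = tuple(int(float(bb.findtext(t)))
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, *coords))
    return root.findtext("filename"), boxes

print(read_annotation("dataset/hello/hello_0.xml"))
```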
C. Setting up TensorFlow
The next step is setting up TensorFlow, the API used to train the system's model and to carry out prediction, analysis, and testing. Using the labelled dataset images, the training step is performed with TensorFlow, which is also used to predict the results of the analysis. TensorFlow is an open-source API for building machine-learning models; it equips the system for prediction and detection analysis and plays the key role in training the dataset against its labels for the predictive-analysis process. A possible model definition is sketched below.
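The paper does not list its exact model, so the following is only one hedged possibility: a small convolutional classifier defined in TensorFlow/Keras, with the class count and image size assumed.

```python
# One possible model: a small convolutional classifier in TensorFlow/Keras.
# The paper does not specify its architecture; class count and image size
# are assumptions (26 letters plus a few phrases).
import tensorflow as tf

NUM_CLASSES = 29
IMG_SIZE = 128

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),             # map pixels to [0, 1]
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```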
D. Training and Testing
We create separate folders for training and testing so that prediction and analysis happen on distinct data. Using the images annotated with LabelImg, the datasets are trained and tested accordingly. The training part uses its images to teach the model the prediction task via TensorFlow, while the testing part holds back a separate set of images to check whether the system behaves correctly; bugs and errors are rectified during the testing phase, which follows training. These two processes play a major role in the system's operation and check its reliability before the detection step, as sketched below.
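Continuing the model sketch above, and assuming train/ and test/ directories with one sub-folder per sign (an assumed layout, not the authors' exact one), training and evaluation could look like this:

```python
# Continuation of the sketch above: training and testing. Assumes "train/"
# and "test/" directories with one sub-folder per sign, and reuses `model`
# from the previous sketch.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "train", image_size=(128, 128), batch_size=16)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "test", image_size=(128, 128), batch_size=16)

model.fit(train_ds, validation_data=test_ds, epochs=10)   # training phase
loss, acc = model.evaluate(test_ds)                       # testing phase
print(f"test accuracy: {acc:.2%}")
```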
E. Detection
Detection is the final stage of the develop-train-test-fine-tune cycle. It checks that the requirements for the system's functioning are in place, verifies the prerequisites, and flags any threats in the system, if there are any. Once the prediction-analysis, training, and testing processes have fully trained the system on the datasets, an input is given and the output sign is detected successfully, as sketched below.
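As a final hedged sketch, a live detection loop could classify each webcam frame with the model from the earlier sketches; the class names and image size are again assumptions.

```python
# Hedged sketch of live detection: classify each webcam frame with the
# trained model from the earlier sketches. Class names must match the
# training folders (alphabetical order) and are hypothetical here.
import cv2
import numpy as np

class_names = sorted(["A", "B", "hello", "thank_you"])
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)          # training used RGB
    img = cv2.resize(rgb, (128, 128)).astype("float32")[None, ...]
    probs = model.predict(img, verbose=0)[0]
    label = class_names[int(np.argmax(probs))]
    cv2.putText(frame, f"{label} ({probs.max():.0%})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("sign detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):                 # 'q' quits
        break
cap.release()
cv2.destroyAllWindows()
```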
VII. CONCLUSION
Sign language recognition is an important and evolving field that involves using machine learning and computer vision techniques to interpret sign language gestures and translate them into spoken or written language. There are various approaches to sign language recognition, including vision-based recognition, sensor-based recognition, and hybrid approaches. While there are still challenges in accurately recognizing sign language, the potential benefits for improving communication accessibility for people who are deaf or hard of hearing are significant, and ongoing research and development in this area continue to advance the technology.
REFERENCES
[1] Alvi A. K., Azhar M. Y. B., Usman M., Mumtaz S., Rafiq S., Rehman R. U., and Ahmed I. (2004). Pakistan sign language recognition using statistical template matching. Int. J. Inf. Technol., vol. 1, no. 1, pp. 1–12.
[2] Kausar S., Javed M. Y., and Sohail S. (2008). Recognition of gestures in Pakistani sign language using fuzzy classifier. In Proc. 8th Conf. Signal Process., Comput. Geometry Artif. Vis., pp. 101–105.
[3] Halim Z. and Abbas G. (Jan. 2015). A Kinect-based sign language hand gesture recognition system for hearing- and speech-impaired: A pilot study of Pakistani sign language. Assistive Technol., vol. 27, no. 1, pp. 34–43.
[4] Ren Z., Li H., Yang C., and Sun Q. (Jan. 2020). Multiple kernel subspace clustering with local structural graph and low-rank consensus kernel learning. Knowl.-Based Syst., vol. 188, Art. no. 105040.
[5] Lauriola I., Gallicchio C., and Aiolli F. (May 2020). Enhancing deep neural networks via multiple kernel learning. Pattern Recognit., vol. 101, Art. no. 107194.
[6] Jadooki S., Mohamad D., Saba T., Almazyad A. S., and Rehman A. (Nov. 2017). Fused features mining for depth-based hand gesture recognition to classify blind human communication. Neural Comput. Appl., vol. 28, no. 11, pp. 3285–3294.
[7] Bauer B. and Hienz H. (Mar. 2000). Relevant features for video-based continuous sign language recognition. In Proc. 4th IEEE Int. Conf. Autom. Face Gesture Recognit., pp. 440–445.
[8] Ibrahim N. B., Selim M. M., and Zayed H. H. (Oct. 2018). An automatic Arabic sign language recognition system (ArSLRS). J. King Saud Univ.-Comput. Inf. Sci., vol. 30, no. 4, pp. 470–477.
[9] Raees M., Ullah S., Rahman S. U., and Rabbi I. (Apr. 2016). Image based recognition of Pakistan sign language. J. Eng. Res., vol. 4, no. 1, pp. 1–21.
[10] Ahmed H., Gilani S. O., Jamil M., Ayaz Y., and Shah S. I. A. (Jul. 2016). Monocular vision-based signer-independent Pakistani sign language recognition system using supervised learning. Indian J. Sci. Technol., vol. 9, no. 25, pp. 1–16.
[11] Tauseef H., Fahiem M. A., and Farhan S. (Jul. 2009). Recognition and translation of hand gestures to Urdu alphabets using a geometrical classification. In Proc. 2nd Int. Conf. Visualisation, pp. 213–217.
[12] Shah S. M. S., Naqvi H. A., Khan J. I., Ramzan M., Zulqarnain, and Khan H. U. (2018). Shape based Pakistan sign language categorization using statistical features and support vector machines. IEEE Access, vol. 6, pp. 59242–59252.
[13] Behura A. (2021). The cluster analysis and feature selection: Perspective of machine learning and image processing. In Data Analytics in Bioinformatics: A Machine Learning Perspective, pp. 249–280.
[14] Bay H., Ess A., Tuytelaars T., and Van Gool L. (Jun. 2008). Speeded-up robust features (SURF). Comput. Vis. Image Understand., vol. 110, no. 3, pp. 346–359.
Copyright © 2023 Yuvasri J, Sujitha S, Pavithra B, Kamali K, Rajalakshmi S. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET52128
Publish Date : 2023-05-12
ISSN : 2321-9653
Publisher Name : IJRASET