Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Sanjay S Tippannavar, Surya M S, Pilimgole Sudarshan Yadav, Mohammed Zaid Salman, Ajay M
DOI Link: https://doi.org/10.22214/ijraset.2022.47301
This document presents a sign-to-speech translator for people who are speech-impaired. In contemporary society, mute people have very little ability to converse with others, and they cannot interact with common people unless those people learn to communicate through sign language. Sign language is not something that everyone can learn, because it is challenging to master, and as a result few people can approach and talk with these physically challenged persons. The system presented here makes it possible for mute people to communicate with everyone. With today's technology, even the deaf and mute may communicate by making hand gestures: each hand gesture is mapped to a different pre-recorded audio message. Hand motions are captured by a flex sensor and an accelerometer, and the microcontroller accepts these inputs. The flex sensor tracks hand movements and detects deflections, converting them into signals that trigger the system to play recorded sound through the speakers. Many different Arduino boards are available today for a variety of purposes; the Arduino Mega 2560 is the ideal choice since it is the most cost-effective, allows faster computation, and can integrate and interface various sensors while consuming less power. The device under discussion basically consists of a glove and a microcontroller-based system, and the data glove includes four strategically placed flex sensors.
I. INTRODUCTION
According to the WHO, there are 300 million deaf people and one million mute people in the world. The power of communication can be both a blessing and a curse: expressing ideas and emotions is beneficial, yet it can be quite difficult for mute persons to communicate with non-mute people. Communication becomes extremely difficult because the majority of individuals are not trained in hand sign language, and conveying a message becomes exceedingly hard in an emergency, or when a mute person is travelling or among unknown individuals. We propose a smart speaking system that uses hand gestures and body language to enable mute persons to interact with non-mute people. The system consists of a speaker unit, a hand-motion reading system, and motion and flex sensors, and it is powered by battery-operated circuitry. An Arduino Mega operates the system and processes the data. The system offers pre-stored messages such as "Good morning", "Can you please do me a favour", and "Can you help me cross the road", along with other common phrases, to assist mute people in communicating basic messages. The system interprets people's hand movements across various movement variations, using accelerometers and flex sensors as input sensors. Sensor input is continuously received and processed by the microcontroller, which employs simple logic to look for the message that matches the current set of sensor values. When a matching value is found, the corresponding sound signal is retrieved from memory and played aloud through the interfaced speaker. With the help of a simple wearable system, mute people can now communicate with hearing people using a fully functional smart speaking system, which additionally helps establish a connection between people with special needs and the rest of society.
We outline a new technique for deciphering and interpreting gestures without words, which makes communication easier and less expensive because a human interpreter is no longer required. The messages can be customised to the needs of the individual because they are taken from a database built using predictive analysis.
The primary aim of the proposed work is to enable mute and speech-impaired people to communicate everyday messages through a low-cost, wearable, gesture-based device.
The remaining sections are organised as follows: Section II reviews prior research related to the work. Section III explains the proposed methodology thoroughly and algorithmically. Section IV describes the proposed method's output with a comparative analysis. Section V concludes the research project and discusses potential future research.
II. RELATED WORK
A CNN-based translator for American Sign Language (ASL) fingerspelling was developed by Garcia et al. They apply transfer learning to this problem using a GoogLeNet architecture pre-trained on the ILSVRC2012 dataset as well as on ASL datasets. The result is one model that frequently classifies the letters a–e correctly for first-time users and another that typically does the same for the letters a–k. One of the major flaws of this paper is that the authors merely hypothesise that efficiency and accuracy could increase if more datasets were available; a work built on conjecture cannot be relied upon or applied in real life [1].
In this paper, Gupta et al. employed a specific kind of deep neural network. They built on a comparative examination of various convolutional neural network models while looking at various sign language gestures, and the technique for understanding sign language was created by analysing and employing CNN models. Again, the key problem is the lack of evidence that this can be applied in real time for hearing-impaired people: the work is supported only by simple offline analyses, and it would be difficult for people with disabilities to carry a PC and its software around [2].
Computer vision-based models classify movements by using at least one camera and image-processing algorithms. The key benefit of this technique is that movements may be made more conveniently for the user, since the bare hand is used. The camera, computer, and data-processing software are the only expenses associated with the system. One of the greatest difficulties of these systems is carrying a camera and CPU around in a box or container. Lighting also plays a significant role: if the lighting is inadequate, the system might not be able to recognise the hand gesture and might interpret the displayed sign incorrectly, making it challenging for the average person to use this system [4].
In order to recognise hand gestures and translate them into speech, Sumadeep et al. have proposed a hardware-based system. The project consists of two units: the first is the transmitter circuit and the second is the receiver circuit. The transmitter circuit is made up of a microcontroller, an accelerometer, and a flex sensor, while the receiver consists of a speaker, an amplifier, and an audio module. When the accelerometer and flex sensor detect a gesture, the A-to-D converter generates the appropriate digital output, which is then passed to the microcontroller. The microcontroller matches these values against the database, and for a specific match the corresponding information is sent to the receiver side [5].
Recognising various signs requires a Sign Language Recognition (SLR) platform built around a recognition algorithm. The goal of this article's authors, Safeel et al., was to examine a variety of recent SLR techniques that have been applied at various stages of recognition. The review provides a comprehensive viewpoint that enables us to design a system that effectively addresses the majority of the world's issues. All things considered, we can say that the evaluation was well written and helped readers assess the potential future work they might undertake [6].
According to research by Zhou et al., a wearable system that uses machine learning is capable of properly translating American Sign Language hand movements into voice. The stretchy yarn-based sensor arrays used in the wearable sign-to-speech translation system, together with a wireless printed circuit board, provide high sensitivity and quick reaction times, enabling real-time translation of signs into spoken words with 98.6% accuracy. This work was done using wearable gloves and machine learning; even though the efficiency is high, in real life a disabled person will not be able to spend more money on the software, in addition to the device, just to express their thoughts to a commoner [8].
Punsara et al. developed a smartphone app that can be used in two modes, a user mode and a training mode, and that also displays the battery level. Using a smartphone-based mobile application together with a Raspberry Pi is laborious. Additionally, the application is built using Adobe XD and Flutter. After reading the work, it appears that the authors overcomplicated the goal by offering a sophisticated solution with complicated usage. Every consumer needs to understand how the item works simply, in order to operate it without any problems at all [10].
III. METHODOLOGY
The hardware-based glove system is divided into three parts: a sensing unit (the flex sensors and the accelerometer), a processing unit (the Arduino microcontroller), and an output unit (the SD-card audio module and the speaker).
Fig. 1 displays the proposed system's block diagram. The system uses two kinds of sensors: the flex sensor, also known as the bend sensor, which records bending or deflection, and the accelerometer. In layman's terms, the flex sensor can be thought of as a variable resistor. The Arduino microcontroller receives analogue values that represent the force variation measured by the flex sensor. The analogue signal is matched to the hand gesture using sensory fusion, and the pre-recorded sound is played through the speakers after the contents of the SD card are verified. The circuit is simple, effective, and has minimal latency. Our work is practically supported by the 10-bit, high-precision ADC built into the Arduino.
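As a hedged illustration of this analogue front end, the sketch below reads a single flex sensor through the Arduino's 10-bit ADC and converts the count to a resistance value. The pin assignment (A0), the 47 kΩ divider resistor, and the serial logging are assumptions added for the example, not values taken from the paper.

```cpp
// Illustrative sketch (not the authors' code): one flex sensor wired as a
// voltage divider into analogue pin A0 of the Arduino.
const int   FLEX_PIN = A0;       // assumed analogue input pin
const float VCC      = 5.0;      // supply voltage
const float R_DIV    = 47000.0;  // assumed fixed divider resistor (ohms)

void setup() {
  Serial.begin(9600);
}

void loop() {
  int adc = analogRead(FLEX_PIN);           // 0..1023 from the 10-bit ADC
  float v = adc * VCC / 1023.0;             // ADC count -> voltage
  if (v > 0.0) {
    float rFlex = R_DIV * (VCC / v - 1.0);  // voltage -> flex-sensor resistance
    Serial.println(rFlex);                  // resistance rises as the finger bends
  }
  delay(100);
}
```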
The Arduino Nano has been heavily utilised to miniaturise the system and lower its cost. Fig. 2 shows the circuit diagram of the suggested system. The flex sensors are connected to the microcontroller's analogue pins, while the speaker and the SD-card module (over SPI) are interfaced through its digital pins. The use of high-quality flex sensors is essential because the material and the sensor's sensitivity are crucial factors. We have utilised a user-defined function avg(), called in the loop section, to decode a particular audio message. This method takes 200 readings of the flex-sensor position and returns an average value to the main function. We have six separate audio messages encoded, one for each of six permutations of flex-sensor positions. The sensors are waterproof, resistant to high temperatures and humid conditions, and can survive direct sunshine without affecting how the system functions.
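The following is a minimal sketch of the averaging-and-matching logic described above, assuming four flex sensors on A0–A3. The ADC threshold of 600, the specific bent/straight patterns, and the playMessage() stub (which in the real device would trigger SD-card playback through the speaker) are illustrative assumptions, not the authors' implementation.

```cpp
// Sketch of the avg()-based decoding: average 200 readings per sensor, then
// map the bent/straight pattern of the fingers to one of six stored messages.
const int FLEX_PINS[4] = {A0, A1, A2, A3};  // four flex sensors on analogue pins

// Average 200 readings of one flex sensor, mirroring the avg() routine above.
int avgRead(int pin) {
  long sum = 0;
  for (int i = 0; i < 200; i++) {
    sum += analogRead(pin);
  }
  return (int)(sum / 200);
}

// Stand-in for SD-card playback of one of the six recorded messages.
void playMessage(int index) {
  Serial.print("Play message #");
  Serial.println(index);
}

void setup() {
  Serial.begin(9600);
}

void loop() {
  bool bent[4];
  for (int i = 0; i < 4; i++) {
    // A finger counts as "bent" above an assumed ADC threshold.
    bent[i] = avgRead(FLEX_PINS[i]) > 600;
  }

  // The bent/straight pattern selects one of the six stored messages.
  if (bent[0] && !bent[1] && !bent[2] && !bent[3])     playMessage(0); // e.g. "Hello, Good Morning"
  else if (bent[0] && bent[1] && !bent[2] && !bent[3]) playMessage(1); // e.g. "Can you please do me a favour"
  // ... the remaining four patterns map to the other recorded messages.

  delay(500);
}
```

Averaging 200 samples smooths out sensor jitter before the pattern is compared against the stored permutations, which is consistent with the low-latency behaviour described for the glove.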
A mute person may have additional disabilities, such as the loss of a hand or limb, and it takes a lot of courage to express one's own needs to the outside world under such circumstances. Taking the urgency of such situations into account, we designed a variant that integrates an accelerometer with the microcontroller. This system does not require a glove to be worn, because the user might be disabled. We have interfaced the SD-card module to play pre-recorded messages when the accelerometer is tilted in different directions and from different sides; all the user needs to do is tilt the accelerometer in the prescribed directions. The 3D accelerometer uses the SPI interface to transmit the x, y, and z values to the board, and the messages are played over the speaker based on these values. The circuit diagram of the suggested system for the disabled with a three-dimensional accelerometer is shown in Fig. 3.
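A hedged sketch of the tilt-to-message mapping for this accelerometer-only variant is shown below. The readAxisX/readAxisY stubs stand in for the SPI accelerometer driver, and the tilt threshold and direction-to-message assignments are assumptions made for illustration rather than values from the paper.

```cpp
// Sketch (assumptions only): reduce raw x/y accelerometer readings to a tilt
// direction and map each direction to a pre-recorded message index.
int readAxisX() { return 0; }  // stand-ins for the SPI accelerometer driver
int readAxisY() { return 0; }

// Return the index of the stored message for the detected tilt (-1 = level).
int tiltToMessage(int x, int y) {
  const int T = 300;           // assumed raw-count tilt threshold
  if (x >  T) return 0;        // tilt right   -> first recorded message
  if (x < -T) return 1;        // tilt left    -> second recorded message
  if (y >  T) return 2;        // tilt forward -> third recorded message
  if (y < -T) return 3;        // tilt back    -> fourth recorded message
  return -1;
}

void setup() {
  Serial.begin(9600);
}

void loop() {
  int msg = tiltToMessage(readAxisX(), readAxisY());
  if (msg >= 0) {
    Serial.print("Play message #");  // the real system plays the SD-card file
    Serial.println(msg);
  }
  delay(100);
}
```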
IV. RESULTS AND DISCUSSIONS
Fig. 4 shows the hardware setup of the hand-gesture glove, integrated with flex sensors and a power-supply unit, which places the least possible weight on the user's hands. One important goal is to lighten the glove. The works in Section II show that a few of them use the Raspberry Pi single-board computer, which adds weight and prevents real-time use of the system. As a result, users prefer our device over competing models because it is lighter, has shorter connections, and uses lithium rechargeable batteries inside the glove.
As shown in Fig. 4, the device is completed with few connections. When the hand gloves are worn, the waving action communicates a "Hello, Good Morning" message through the speakers interfaced on the hand gloves, depicted in Fig. 5. The 5V electromagnetic speaker produces high-quality, clear sound that is audible to the person being communicated with.
Similarly, various messages are communicated by the speaker for various actions. Fig. 6 shows a hand sign that produces "Can you please do me a favour". This is one of the most important signs, as the disabled person might need it at any time. During emergencies, these sounds help commoners and people in the surroundings understand that the mute person is facing trouble and needs help.
Expressing gratitude is among the most crucial things anyone can do, especially those who are disabled. To communicate the message "I'm appreciative, thanks a lot," the action shown in Fig. 7 must be performed. Both the user and the helper to whom this message is directed are pleased when this action is used. To make sure that the disabled person does not feel denied any form of communication, numerous signs have been added. The system has been shrunk down to fit completely on the gloves.
The suggested system for the disabled, shown in Fig. 8, interfaces an accelerometer without the use of flex sensors. The accelerometer's position is read 200 times continuously and the average value is used, so that an accurate reading is obtained. Based on the movement of the accelerometer, the system needs just two seconds to recognise the gesture and activate the corresponding sound.
Table I allows us to compare the outcomes of the related studies mentioned in Section II. Trials have proven that all of the aforementioned tasks produced 100% correct results, and we can therefore confirm that the system's working prototype is suitable for use by individuals with disabilities. Compared with the findings of [1], we were able to achieve greater precision and efficiency, with no relative error in the work carried out. The work done in [4] has a superior ideology, but the experiments carried out reveal a few flaws that can be categorised as downsides, and the system may even risk the person's life if it fails to perform at a crucial moment. The system in [13] is an improved version; however, because people face hardship in real life, IoT usage and dependence should be reduced, taking into account that a loss of network or Wi-Fi connection might impair the device's functionality. The proposed system makes it easier for those who have trouble hearing or speaking to communicate in real time and to get assistance or attention when required, which is one of the primary goals a system must meet in order to be deployed in real time.
Table I: Comparison of related works with the proposed method

Method                       | [1]  | [2] | [4] | [5] | [13] | Proposed Method
Dataset (DT) / Real-time (RT)| RT   | DT  | DT  | RT  | RT   | RT
Accuracy                     | 85%  | 90% | 80% | 80% | 70%  | 100%
Error Rate                   | 15%  | 10% | 20% | 20% | 20%  | 0%
V. CONCLUSION AND FUTURE WORK
The mute and deaf communities and the general public can communicate more easily thanks to sign language. The main goal of this work is to build a hand glove with a sensing unit, an actuator, and communication capabilities to help a disabled person interact with society and meet their needs. The majority of people in the modern world don't understand sign language, making it challenging for people with disabilities to interact with society. Consequently, a smart, portable, and affordable device has been created. Patients with limited speech or who are paralysed might also benefit from it, and it may also be employed in commercial and intelligent-home applications. The suggested system is user-friendly because it is easy to use and enables productive and successful interactions between people and computers. We have achieved maximum efficiency and 100% accuracy in contrast to earlier attempts. Further comparison demonstrates that delivering the work in hardware is feasible and less expensive: with software-based applications, the user or customer may need to buy a compatible device that supports the programme, carry the device around with the software, and spend time initialising the laptop or PC.
In order to reduce cost and weight and make the system smaller, we want to improve the sensors and combine the complete system onto a specially created PCB in the future. By evaluating the data, we also wish to interface an LCD module so that the speaker system and LCD may be merged, as the message frequently has to be conveyed silently in quiet settings; using an LCD module is therefore practical. Additionally, the device may be developed for the blind, deaf, and mute populations so they can use it in home automation systems to manage their environment independently and without the need for assistance from others.
REFERENCES
[1] Garcia, B., & Viesca, S. A. (2016). Real-time American sign language recognition with convolutional neural networks. Convolutional Neural Networks for Visual Recognition, 2, 225-232.
[2] Gupta, N. (2022). Sign Language Recognition Using Diverse Deep Learning Models. In International Conference on Artificial Intelligence and Sustainable Engineering (pp. 463-475). Springer, Singapore.
[3] Tippannavar, S. S., Shivaprasad, N., & Kumar, P. (2022, March). Smart Home Automation Implemented using LabVIEW and Arduino. In 2022 International Conference on Electronics and Renewable Systems (ICEARS) (pp. 644-649). IEEE.
[4] Jadhav, A. J., & Joshi, M. P. (2016). Hand Gesture Recognition System for Speech Impaired People. International Research Journal of Engineering and Technology (IRJET), 3, 1171-1175.
[5] Sumadeep, J., Aparna, V., Ramani, K., Sairam, V., Kumar, O. P., & Krishna, R. L. P. (2019). Hand Gesture Recognition and Voice Conversion System for Dumb People.
[6] M. Safeel, T. Sukumar, S. K. S, A. M. D, S. R and P. S. B (2020). Sign Language Recognition Techniques - A Review. 2020 IEEE International Conference for Innovation in Technology (INOCON), pp. 1-9. doi: 10.1109/INOCON50539.2020.9298376.
[7] Shashidhar, R., Arunakumari, B. N., Manjunath, A. S., & Roopa, M. (2022). Indian Sign Language Recognition Using 2-D Convolution Neural Network and Graphical User Interface. International Journal of Image, Graphics and Signal Processing (IJIGSP), 14(2), 61-73. doi: 10.5815/ijigsp.2022.02.06.
[8] Zhou, Z., Chen, K., Li, X., Zhang, S., Wu, Y., Zhou, Y., ... & Chen, J. (2020). Sign-to-speech translation using machine-learning-assisted stretchable sensor arrays. Nature Electronics, 3(9), 571-578.
[9] Maharjan, P., Bhatta, T., Salauddin, M., Rasel, M. S., Rahman, M. T., Rana, S. M. S., & Park, J. Y. (2020). A human skin-inspired self-powered flex sensor with thermally embossed microstructured triboelectric layers for sign language interpretation. Nano Energy, 76, 105071.
[10] Punsara, K. K. T., Premachandra, H. H. R. C., Chanaka, A. W. A. D., Wijayawickrama, R. V., & Nimsiri, A. (2020, December). IoT Based Sign Language Recognition System. In 2020 2nd International Conference on Advancements in Computing (ICAC) (Vol. 1, pp. 162-167). IEEE.
[11] Kudrinko, K., Flavin, E., Zhu, X., & Li, Q. (2020). Wearable sensor-based sign language recognition: A comprehensive review. IEEE Reviews in Biomedical Engineering, 14, 82-97.
[12] Dubey, P., & Shrivastav, M. P. (2021). IoT Based Sign Language Conversion. International Journal of Research in Engineering and Science (IJRES), 9(2), 84-89.
[13] Choudhary, D. K., Singh, R., & Kamthania, D. (2021, April). Sign language recognition system. In Proceedings of the International Conference on Innovative Computing & Communication (ICICC).
[14] More, V., Sangamnerkar, S., Thakare, V., Mane, D., & Dolas, R. (2021). Sign language recognition using image processing. JournalNX, 85-87.
Copyright © 2022 Sanjay S Tippannavar, Surya M S, Pilimgole Sudarshan Yadav, Mohammed Zaid Salman, Ajay M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET47301
Publish Date : 2022-11-04
ISSN : 2321-9653
Publisher Name : IJRASET