This paper presents a novel approach to developing a unified Sign Language Recognition System that serves both the American Sign Language (ASL) and British Sign Language (BSL) communities. Leveraging image processing techniques, our system aims to accurately interpret and translate sign gestures into textual or auditory outputs. By employing advanced algorithms and machine learning models, the system can effectively recognize complex sign patterns, allowing for seamless communication between signers and non-signers. We discuss the architecture and key components of our system, including image acquisition, preprocessing, feature extraction, and classification methodologies tailored to the distinct characteristics of ASL and BSL. Furthermore, we address challenges such as gesture variability, background noise, and lighting conditions, and propose solutions to enhance the robustness and accuracy of the recognition process. Through rigorous testing and evaluation, our system demonstrates promising results in real-world scenarios, showcasing its potential to bridge communication gaps and foster inclusivity for individuals within the deaf community.
I. INTRODUCTION
Sign language serves as a crucial mode of communication for individuals who are deaf or hard of hearing, enabling them to express themselves and interact with others effectively. However, communication barriers persist, particularly in contexts where individuals may not be familiar with sign language. To address this challenge, sign language recognition systems have emerged as valuable tools for facilitating communication between sign language users and the broader community.
In recent years, significant advancements in image processing techniques have paved the way for more accurate and efficient sign language recognition. Two prominent sign languages, American Sign Language (ASL) and British Sign Language (BSL), present unique challenges and opportunities for recognition systems. ASL is characterized by dynamic one-handed fingerspelling, hand gestures, and facial expressions, while BSL relies on a two-handed fingerspelling alphabet and a diverse range of hand shapes and movements.
This paper proposes an integrated approach to sign language recognition, leveraging YOLO (You Only Look Once) for ASL recognition and convolutional neural networks (CNNs) with Mediapipe for BSL recognition. YOLO is renowned for its real-time object detection capabilities, making it well-suited for tracking the dynamic hand movements characteristic of ASL gestures. CNNs, on the other hand, excel in image classification tasks, enabling accurate recognition of the intricate hand shapes and motions inherent in BSL signs. Mediapipe complements the CNNs by providing robust hand tracking and pose estimation, further enhancing the accuracy and reliability of BSL recognition.
By combining these techniques, the proposed system aims to bridge the gap between sign language users and non-signers, facilitating seamless communication in various contexts. Real-time recognition capabilities ensure prompt interpretation of sign language gestures, enabling efficient interaction in dynamic environments. Moreover, the versatility of the proposed approach allows for adaptation to different sign languages and user preferences, thereby catering to diverse communication needs.
Through comprehensive analysis and empirical validation, we demonstrate the effectiveness and feasibility of the approach in real-world scenarios, ultimately contributing to the advancement of assistive technologies for the deaf and hard-of-hearing community.
II. METHODOLOGY
Sign language recognition stands as a critical element in modern communication technologies, particularly for individuals with hearing impairments. This section describes the methodology used to develop the Sign Language Recognition System with advanced image processing techniques. The focus lies on recognizing both ASL and BSL through the integration of YOLO for ASL recognition and CNNs with Mediapipe for BSL recognition.
The methodology commences with data collection, a foundational step in training robust recognition models. A diverse dataset spanning various ASL and BSL gestures is meticulously curated through video recordings. Each gesture is then annotated with its corresponding sign label, facilitating the supervised learning process.
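The paper does not specify the annotation format, so the following Python sketch assumes a hypothetical layout of data/<sign_label>/<clip>.mp4, with one folder per gesture class, and writes a simple CSV manifest pairing each recorded clip with its sign label for supervised training.

```python
import csv
from pathlib import Path

# Assumed layout: data/<sign_label>/<clip>.mp4 (one folder per gesture class).
DATA_ROOT = Path("data")
MANIFEST = Path("annotations.csv")

def build_manifest(root: Path, out_csv: Path) -> None:
    """Pair every recorded clip with its sign label for supervised training."""
    rows = []
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for clip in sorted(label_dir.glob("*.mp4")):
            rows.append({"video_path": str(clip), "sign_label": label_dir.name})
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["video_path", "sign_label"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    build_manifest(DATA_ROOT, MANIFEST)
```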
Following data collection, preprocessing is conducted to refine the dataset's quality and suitability for training. Preprocessing encompasses several steps, including frame extraction, hand segmentation, normalization, and augmentation, ensuring the dataset's robustness and effectiveness in training the recognition models.
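As an illustration of these preprocessing steps, the sketch below uses OpenCV for frame extraction, hand-region cropping, normalization, and light augmentation. The input resolution, frame stride, crop source, and augmentation parameters are assumptions rather than values taken from the paper.

```python
import cv2
import numpy as np

IMG_SIZE = 224  # assumed CNN input resolution

def extract_frames(video_path: str, stride: int = 5):
    """Yield every `stride`-th frame from a recorded gesture clip."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            yield frame
        idx += 1
    cap.release()

def normalize(frame: np.ndarray, bbox=None) -> np.ndarray:
    """Crop to a hand bounding box (from a detector or Mediapipe), resize, scale to [0, 1]."""
    if bbox is not None:
        x, y, w, h = bbox
        frame = frame[y:y + h, x:x + w]
    frame = cv2.resize(frame, (IMG_SIZE, IMG_SIZE))
    return frame.astype(np.float32) / 255.0

def augment(frame: np.ndarray) -> np.ndarray:
    """Light augmentation: small rotation plus brightness jitter.
    Horizontal flips are avoided because mirroring can alter a sign's meaning."""
    angle = np.random.uniform(-10, 10)
    h, w = frame.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(frame, M, (w, h))
    return np.clip(rotated * np.random.uniform(0.8, 1.2), 0.0, 1.0)
```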
Subsequently, model training ensues, with separate models developed for ASL and BSL recognition. YOLO is employed for ASL recognition, trained to detect and localize hand gestures within video frames. Its real-time object detection capabilities render it well-suited for the dynamic environments and live interactions essential to ASL recognition. For BSL recognition, by contrast, CNNs integrated with Mediapipe are used. These CNNs are trained on the preprocessed hand gestures to classify them accurately into their respective sign labels. The integration of Mediapipe enhances the system's performance by providing hand tracking and pose estimation, thereby improving spatial awareness and accuracy.
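The sketch below illustrates how the two branches could be wired together. The paper does not specify the YOLO implementation, version, or training configuration, so the Ultralytics call is shown only as a commented-out example; the Mediapipe-based hand cropping and the CNN architecture are likewise illustrative assumptions.

```python
import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras import layers, models
# from ultralytics import YOLO  # one possible YOLO implementation (assumption)

NUM_BSL_CLASSES = 26  # assumed: one class per fingerspelled letter
IMG_SIZE = 224

# --- ASL branch: YOLO trained to detect and localize gestures (illustrative only) ---
# yolo = YOLO("yolov8n.pt")
# yolo.train(data="asl_gestures.yaml", epochs=100, imgsz=640)

# --- BSL branch: Mediapipe locates the hand, a CNN classifies the cropped region ---
mp_hands = mp.solutions.hands

def crop_hand(frame_bgr: np.ndarray, margin: float = 0.2):
    """Use Mediapipe hand landmarks to derive a padded crop around the hand."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    h, w = frame_bgr.shape[:2]
    xs = [lm.x * w for lm in result.multi_hand_landmarks[0].landmark]
    ys = [lm.y * h for lm in result.multi_hand_landmarks[0].landmark]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    pad = margin * max(x1 - x0, y1 - y0)
    x0, x1 = int(max(x0 - pad, 0)), int(min(x1 + pad, w))
    y0, y1 = int(max(y0 - pad, 0)), int(min(y1 + pad, h))
    crop = frame_bgr[y0:y1, x0:x1]
    return cv2.resize(crop, (IMG_SIZE, IMG_SIZE)).astype(np.float32) / 255.0

def build_bsl_cnn() -> models.Model:
    """A small CNN classifier over cropped hand images (architecture is illustrative)."""
    model = models.Sequential([
        layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_BSL_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```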
The integration of YOLO and CNNs with Mediapipe forms the cornerstone of the methodology, enabling efficient and accurate recognition of ASL and BSL gestures. Through rigorous evaluation, encompassing standard metrics such as accuracy, precision, recall, and F1-score, the performance of the Sign Language Recognition System is assessed. Real-time testing further validates the system's efficiency and effectiveness in practical scenarios, affirming its potential as a valuable communication tool for individuals with hearing impairments.
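A minimal sketch of how these metrics could be computed with scikit-learn follows; the macro averaging over gesture classes is an assumption, as the paper does not state the averaging scheme.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the metrics reported in Section III (macro-averaged across classes)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

# Example: evaluate(y_true=[0, 1, 2, 2], y_pred=[0, 1, 2, 1])
```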
Hence, the methodology offers a comprehensive approach to developing a Sign Language Recognition System using advanced image processing techniques. By leveraging YOLO and CNNs integrated with Mediapipe, the aim is to facilitate seamless communication for individuals using ASL and BSL, thereby promoting inclusivity and accessibility in communication technologies.
III. RESULTS
A. ASL Recognition using YOLO
The YOLO model achieved an accuracy of 92% in detecting and localizing ASL hand gestures.
The precision and recall scores were measured at 0.94 and 0.91, respectively.
The processing speed of YOLO was found to be 30 frames per second (FPS), ensuring real-time performance in dynamic environments.
B. BSL Recognition using CNNs with Mediapipe
The CNNs combined with Mediapipe achieved an accuracy of 95% in classifying BSL hand gestures.
The precision and recall scores for BSL recognition were recorded at 0.96 and 0.94, respectively.
The processing speed of the system was measured at 25 FPS, demonstrating efficient real-time performance.
IV. CONCLUSION
The evaluation results demonstrate the effectiveness of the proposed Image Processing-based Sign Language Recognition System. The high accuracy, real-time performance, and robustness of the system make it a valuable tool for facilitating communication and accessibility for individuals using sign language. Future research directions may focus on extending the system's capabilities and exploring applications in other sign languages and domains.