Android Based Object Detection System for Visually Impaired

Authors: Riya Gawande, Mrs. Shraddha P. Mankar, Dipanshu Sankhala, Nikhita Watpal, Siddhi Vispute

DOI Link: https://doi.org/10.22214/ijraset.2024.58593

Abstract

This review paper presents the design and development of a comprehensive mobile application aimed at enhancing the daily lives of visually impaired individuals. The proposed mobile app offers real-time assistance for various visual recognition tasks, including object detection, distance estimation, currency recognition, barcode detection, color recognition, and emotion analysis. The primary objective of this application is to provide visually impaired users with a powerful tool to navigate their surroundings, identify objects and currency, scan barcodes, discern colors, and even gauge the emotions of those they interact with. The app integrates pre-trained deep learning models for object detection and facial emotion analysis, computer vision techniques for distance estimation, and optical character recognition for currency recognition. User feedback and engagement are central to the ongoing improvement of the application, ensuring that it remains a valuable resource for the community it serves. Ethical considerations and privacy concerns are also addressed, with a focus on data security and user privacy. By presenting the development and functionalities of this app, this review paper not only contributes to the field of assistive technology but also underscores the importance of harnessing cutting-edge technology to improve the quality of life for visually impaired individuals.

Introduction

I. INTRODUCTION

In a world where millions contend with visual impairments, comprehending and navigating indoor environments presents a unique set of challenges. The visually impaired often encounter difficulties moving autonomously within enclosed spaces, where obstacles and objects can vary widely in position and type. Traditional aids, such as white canes, though helpful in some instances, do not provide real-time information about indoor objects and surroundings. Recognizing the specific challenges faced by visually impaired individuals in indoor settings, this project aims to introduce a pioneering Android application tailored to enhance their independence and confidence. Our application is designed to assist with object detection, distance estimation, currency recognition, barcode scanning, color identification, and emotion analysis, with a Leveraging the device's built-in camera, the application employs a combination of deep learning models, including object detection using You Only Look Once (YOLO) V3 and Single Shot Detector (SSD), distance estimation with the Mono-depth algorithm, currency recognition based on OCR, barcode detection, color identification, and emotion analysis. It detects, describes, and recognizes indoor objects, provides real-time information about object distances, and delivers comprehensive object-related details through audio output, which can be seamlessly channeled through headphones or the device's speaker. An essential characteristic of this system is its suitability for indoor environments, eliminating the need for external cameras or sensors. By emphasizing indoor functionality, this project addresses the unique challenges faced by the visually impaired within enclosed spaces, enhancing their ability to navigate and interact with their surroundings. The central aim of this project is to showcase how advanced computer vision techniques can significantly improve the quality of life for visually impaired individuals during indoor activities. We will present a comprehensive overview of our application, detailing its various modules, functionalities, and its potential to empower and support the visually impaired in indoor settings. The project implemented an Android-based application designed to solve the problems faced by visually impaired people while navigating indoor environments. Traditional tools such as the white cane are limited in providing real information about indoor products and problems. The app uses the Android device's built-in camera and computer vision technology to fill this gap. The app facilitates interaction with visually impaired users by providing real-time information about indoor equipment, distance and other relevant details through voice output. More importantly, it is designed for an indoor environment and should not have external cameras or sensors. The project aims to increase the independence and confidence of blind people in prisons by focusing on in-house work. The main goal of the project is to show that the use of computer vision technology can improve the quality of life of visually impaired people while navigating at home. From an overview of the app's structure and features, the project focuses on the app's ability to improve and support visually impaired people manage the indoor environment.

II. MOTIVATION

The motivation behind this project, aimed at assisting visually impaired individuals with their indoor activities, is multifaceted and deeply rooted in a commitment to making a positive impact on the lives of those with visual impairments. Several key factors drive the project's motivation. First and foremost, the project is motivated by the goal of empowering independence among visually impaired individuals. Vision loss often poses significant challenges to performing everyday tasks, which can lead to a greater reliance on assistance from others. This project seeks to change that by providing technological solutions that enhance self-sufficiency. By offering tools and features tailored to the unique needs of the visually impaired, it aims to foster greater autonomy in daily life. this system. Raspberry pie has to be powered constantly with the help of a power bank which entails the user to carry a power bank along with him/her all the time. Moreover, this system tells the user the name of the detected objects.

The motivation behind the program stems from a strong commitment to promoting the participation, independence and empowerment of visually impaired people. Despite advances in technology, navigating the home environment is still a challenge for visually impaired people. Traditional service tools, while useful in some cases, often do not provide real-time information about indoor equipment and issues. By creating an Android application suitable for indoor use, we aim to close this important gap and offer practical solutions that will improve the lives of visually impaired people. We are motivated by our belief that all people, regardless of their abilities, should be equal and be able to manage their environment safely and freely. In addition, our activities are directed to create solutions to new problems that are beneficial to people's lives by using the power of today's technologies, especially computer vision and deep learning. Using the capabilities of smartphones and image processing equipment, we strive to create efficient and effective tools that will increase the independence and quality of life of visually impaired people. Finally, our passion is rooted in our belief in the power of technology to break down barriers and create greater unity. Through this project, we hope to help visually impaired people move around in a more comfortable, safe and dignified environment, allowing them to participate and contribute to their communities.

III. RELATED WORK

The study on the Android application mentioned above for indoor navigation and object search for the visually impaired consists of research and technological studies aimed at solving a similar problem. Some of the main areas of work are:

Technological Services for the Visually Impaired: Various services have been developed to assist the visually impaired with navigation and navigation. This includes devices such as wearable cameras, smart sticks, and navigation apps. Research in this area focuses on improving the accuracy, usability, and effectiveness of technology to meet user needs.
Computer Vision for Object Detection: Extensive research has been done on computer vision for object detection, including deep learning-based methods such as YOLO (You Only Look at One Leg) and SSD (Single Shot Detector). This technology, which is used in many areas including object recognition, can provide instant detection of objects in complex environments.
Distance and Depth Estimation: Examining distance estimation and depth perception using relevant computer vision techniques to provide instant information about the distances of objects in the indoor environment. Depth algorithms and stereo vision are often used to estimate the depth of images captured by cameras.
Accessibility and Inclusive Design: Accessibility and inclusive design studies focus on the use of technology to provide easy access to users with disabilities, including the visually impaired. This includes research on designing user interfaces, developing assistive technologies, and ensuring digital content is accessible to all users.

IV. PROPOSED WORK

A. Architecture

B. Activity Diagram

C. Object Detection

Object detection is the principal objective of this system. It encompasses two key elements: object classification and object localization. Object detection entails categorizing objects into predefined classes. In essence, object classification assigns a label to the object within the user's surroundings, thereby providing real-time information about the object's identity.
Object detection is an important task in computer vision that involves identifying and locating objects of interest in images or videos. With the rapid development of deep learning and convolutional neural networks (CNN), object detection has made great progress in recent years and has been widely used in many fields such as driving, surveillance, robotics, healthcare, and virtual reality.
Object positioning: The first step in object detection is to determine the position of the object in the input image. This involves identifying regions of interest (ROIs) that may contain artifacts. The most common placement methods include sliding window methods, regional bidding methods (such as search engines, EdgeBoxes), or new technologies such as Regional Bidding Networks (RPN).
Feature Extraction: Once ROIs are identified, feature extraction is performed to extract information from these regions and display the information contained within them. Convolutional neural networks (CNN) are frequently used for extraction due to their ability to learn hierarchical representations of images. Features obtained from ROIs help distinguish different objects.
Object Classification: In this step, subtraction is used to classify the ROI into different classes or groups. Classification is usually done using a softmax classifier or a similar algorithm, which provides a classification of possible objects for each ROI. CNN is widely used for classification due to its effectiveness in learning discrimination.
Bounding box regression: After splitting the ROI, use bounding box regression to adjust the initial bounding box integration obtained during placement. This step helps improve the fidelity of the object by adjusting the bounding box to better align with the border of the object.
Post-processing: Once object detection is completed in all ROIs, post-processing steps can be used to improve the results. This may include filtering out findings below a certain confidence level, removing minor or irrelevant items, or applying certain restrictions.

D. Distance Calculation

The distance of the object from the visually impaired person will be provided. The OpenCV (Open source Computer Vision Library) will be utilized to compute the distance. The Triangle Similarity Law will be utilized to compute the distance.
Knowing the object's distance from the visually impaired user would be more advantageous than simply knowing the object's name since it will help the user get a sense of the relative space surrounding him. The calibration module must be run each time the program is started for the first time since the device's camera needs to be calibrated before it can calculate the distance.
Finding the camera lens's focal length is the primary goal of calibration since it will be required for other calculations. Finding the camera lens's focal length is the primary goal of calibration since it will be required for other calculations.
Monocular depth estimation: Monocular depth estimation extracts depth information from a single image captured by a monocular camera. This can be done using a variety of techniques, including geometric methods, deep learning, or both. Deep learning models such as convolutional neural networks (CNN) can directly estimate the depth map of the input image by examining large datasets with real depth data.
Stereo Vision: Stereo vision is based on the following principles: Triangulation estimates depth by comparing the difference between corresponding points in two images captured on a stereo camera set. By calibrating the stereo camera system and calculating the difference using algorithms such as block comparison or half-world comparison, the depth of an object will be accurately estimated. Stereo vision provides depth maps and is often used in robotics and depth measurement.
Time-of-flight (ToF) sensors: Time-of-flight sensors emit a pulse of light, or infrared light, and measure the time it takes for the pulse to travel to an object for the sensor and back. The ToF sensor can estimate the distance of objects in space by calculating the travel time of the flight. ToF sensors provide accurate, fast depth measurements and are widely used in electronic devices such as smartphones and game consoles.
Structured Light: Structured light creates a familiar structure on the surface and examines object deformation caused by the geometry of the object. By measuring the distortion of the projected pattern, the depth of the object can be inferred. Structured light technology is used in 3D scanning systems and depth cameras such as Microsoft Kinect.

E. Currency Recognition

To implement currency recognition in the app, collect a diverse dataset of currency images, preprocess them, and extract relevant features. Train a machine learning model, like a CNN, to classify denominations. Integrate the model into the app for real-time recognition using the device's camera. Ensure accessibility with text-to-speech or audio feedback for the visually impaired. Test rigorously, make the system adaptable to various currencies, and gather user feedback for ongoing improvements, creating an inclusive tool.
Image collection: The first step is to collect images of the results to be seen. This can be done using a camera, scanner or other imaging device. In practical applications such as mobile banking applications, images are often captured using the device's camera.
Pre-Processing: Pre-processing is done to improve the quality of the images obtained and to prepare them for feature extraction. Preprocessing techniques include transforming, noise removal, contrast enhancement, and normalization.
Feature extraction: Extract features from the previous image to represent the features of the currency. These features may include color histograms, texture descriptors, edge histograms, or other image descriptors. Feature extraction techniques aim to capture unique patterns and features that distinguish one result from another.
Preparation of training data: In model-based control, recording data containing different images of different results must be trained to recognize the model. Each image in the dataset is associated with a label indicating the corresponding value.
Training a recognition model: Use a dataset to train a recognition model, such as a machine learning classifier or deep learning neural network. During the training process, the model learns to identify features extracted for similar results. Various machine learning algorithms such as support vector machine (SVM), random forests, or convolutional neural networks (CNN) can be used for currency recognition.
Evaluation and evaluation: Once the recognition model is trained, it is evaluated using separate data sets to evaluate its performance. Test data includes images that the model does not see at run time. Metrics such as accuracy, precision, recall, and F1 score were calculated to evaluate the performance of the recognition model.
Integration with applications: Integrate the value recognition model into the target application, such as a mobile application or accounting system. The application uses an authentication model to instantly verify and distribute currency values.

F. Emotion Recognition

To implement emotion recognition in the app for visually impaired individuals, a multifaceted approach is essential. This involves several key steps. First, gather a diverse dataset of facial images displaying a range of emotions, ensuring representation of happiness, sadness, anger, and other emotional states, and including images of different individuals for variety.
Preprocess these images by resizing, normalizing lighting conditions, and enhancing contrast to optimize their suitability for analysis. Next, extract relevant features from these images, which could include facial landmarks, texture, and color patterns associated with different emotions. The core of the system involves training a machine learning model, commonly a Convolutional Neural Network (CNN), specifically designed for multi-class emotion classification.
Image capture: The first step is to capture a photo or video frame containing a human face. This can be done using a camera, webcam or other imaging device. For real-time applications such as emotion recognition or video analysis, images are often captured continuously from real-time streams.
Face Detection: Perform face detection and extract faces from input images or photos. video post. This step involves identifying regions of interest (ROIs) that may contain human faces. face detection algorithms such as Viola-Jones, Histogram of Directed Gradins (HOG), and deep learning such as convolutional neural network (CNN).
Feature extraction: Extract features from front-face images to capture facial and emotional cues. These features include geometric features (e.g., distance between facial features), shape features (e.g., texture descriptors, color histograms), or deep learning representation (e.g., feature vector from pre-trained CNNs (e.g., VGG, ResNet) or Inception). may contain. ).
Curriculum preparation: Training the cognitive model in the supervised learning approach requires recording documents containing facial images with emotional labels (such as happy, sad, angry, surprised). Each image in the dataset is associated with a label that indicates the desired expression.
Train the recognition model: Use recorded data to train a recognition model, such as machine learning classification or deep neural network learning. During training, the model learns to identify extracted features for corresponding emotions. Various machine learning algorithms such as support vector machine (SVM), random forest, or deep learning (e.g., CNN, neural networks) can be used for emotion recognition.
Testing and Evaluation: After the recognition model is trained, it is evaluated using separate data sets to evaluate its performance. Test data includes images that the model does not see at run time. Calculate metrics such as accuracy, precision, improvement rate, F1 score, and confusion matrix to evaluate the effectiveness of the model's recognition theory.

G. Color Recognition

Implementing color recognition in the app for visually impaired users involves a structured approach. First, gather an extensive dataset of color images representing a wide range of colors and variations. Preprocess these images by standardizing lighting conditions, resizing, and enhancing contrast to prepare them for analysis.
Image capture: The first step is to capture a photo or video frame with the color of the object or scene to be known. This can be done using a camera, scanner or other imaging device. For real-time applications, images are typically captured continuously from real-time streams.
Preprocessing: Preprocessing of the input image to improve its quality and prepare it for color recognition. This will include things like resizing, removing noise, increasing contrast, and changing color settings. Preprocessing helps standardize input data and improve the efficiency of subsequent steps.
Color space conversion: Convert the input image from the RGB color space to a color space suitable for color analysis. Color systems commonly used in color recognition include RGB, HSV (hue, saturation, value), LAB (CIELAB), and YUV. Each color space has its own advantages and can be chosen according to the specific needs of the application.
Segmentation: Segmentation is done to define the region of interest (ROI) corresponding to objects or areas with different characteristics in the input image. color These steps may include thresholding, integration, or edge detection techniques to separate different colors from the background.
Feature extraction: Extract features from segmented color regions to characterize color behavior. These features may include color histograms, color duration, or other descriptions of the color distribution in each area.
Curriage preparation: In the educational audit method, color coders must train the color recognition model of informative text containing images accompanying relevant explanations. Each image in the dataset is associated with a label that indicates a known color.
Training a recognition model: Use a dataset to train a recognition model, such as a machine learning classifier or deep learning neural network. During training, the model learns to match the extracted color with the corresponding color. Various machine learning algorithms such as support vector machines (SVM), k-nearest neighbors (k-NN), or deep learning (such as CNN) can be used for color recognition.

H. Barcode Detection

Implementing barcode detection in the app for visually impaired users involves several key steps. Begin by integrating a barcode scanning library or API that's capable of recognizing and decoding barcodes. These libraries often come with pre- trained models for barcode recognition.
Leverage the device's camera to capture real-time images of barcodes. When a barcode is detected, the app can overlay feedback on the screen, such as the decoded information or a description of the product or item associated with the barcode. For accessibility, the app should provide audio feedback through text-to-speech, conveying the barcode's content to users.
Image capture: The first step is to capture the image or video frame containing the barcode to be scanned. This can be done using a camera, scanner or other imaging device. For real-time applications, images are typically captured continuously from a real-time video stream.
Pre-Processing: Pre-processing the input image to improve its quality and prepare it for barcode scanning. This will include functions such as conversion, noise removal, contrast enhancement and edge detection. Preprocessing helps standardize input data and improve the efficiency of subsequent steps.
Edge Detection: There is often a difference between black and white lines in barcodes. First use an edge detection algorithm (such as Sobel, Canny, or Gaussian Laplace) to detect edges in the image. Edges represent the transition between dark and light in the barcode.
Contour Detection: Contour detection algorithms are used to identify potential barcodes in input images based on visible edges. A rectangle or text corresponding to a rectangle is generally considered a candidate barcode area. Additional parameters such as aspect ratio and area can be used to filter out non-barcode text.
Barcode localization: Once the candidate barcode region is identified, further analysis is performed to show the precise boundaries of the barcode. This will involve placing a bounding box or rectangle around the displayed text to accurately define the barcode area.
Decoding: Determination of barcode characters extracted to store data encoding. Many barcode symbols are available, including UPC, EAN, Code 128, QR Code and Data Matrix. Use decision algorithms specific to each symbol to interpret the barcode and extract data.
Checking and correcting errors: The determined barcode data will be checked and corrected for accuracy and reliability. Error correction algorithms, such as Reed-Solomon error correction for some types of barcodes, can be used to correct errors introduced during encoding or decoding.
Integration with applications: Barcode data can be integrated into applications such as retail stores, inventory management software or documents. Please use mobile barcode scanning. The application uses decision-making information for a variety of purposes, such as identifying products, finding items for sale, or accessing online resources.

V. RESULTS

The system is able to detect the objects around the user with 87% accuracy. In order to detect more objects, the system needs to be re-trained with a larger dataset.

VI. FUTURE SCOPE

The future scope for an app designed for visually impaired people, featuring modules such as object detection, emotion recognition, color recognition, currency detection, and barcode detection, holds significant promise. With ongoing advancements in machine learning and AI, these modules can become more accurate and efficient, greatly enhancing their utility for the visually impaired. Furthermore, integration with wearable devices like smart glasses can provide a hands-free experience, and expanding the app's capabilities to include navigation and wayfinding would be invaluable. Language and OCR integration, a supportive user community, and educational modules can further enhance the app's functionality. Maintaining a user friendly interface, ensuring data privacy, and adhering to global accessibility standards are key considerations, as well as collaborating with relevant organizations for wider adoption and support. It is important to continuously improve the accuracy and efficiency of target detection algorithms. This will include improving the fundamental structure of machine learning, optimizing real-time processing algorithms, and reducing errors and omissions. Integration of advanced sensors such as LiDAR (light detection and ranging) or radar with cameras can provide more in-depth information, improving the ability to accurately identify and separate objects, especially in a challenging or challenging environment. weather. In addition to simple object detection, future iterations of the system may include semantic understanding and context recognition. This means not only identifying objects, but also understanding their relationships, functions and effects in different situations. Integrating object detection systems with navigation tools will provide greater assistance to the visually impaired. This may include using GPS and map data to generate audio or feedback to guide users around obstacles or to specific locations. Designing features that provide immediate feedback or descriptions of the environment can improve the user experience. This can include not only the identification of objects but also elements such as lighting, textures and spatial layout. Providing the user with a customizable interface based on the user's preferences and needs is crucial to ensuring interaction. Future work will include creating a better information management system, support for multiple languages, and better compatibility with assistive devices. Integration of object detection with wearable devices such as smart glasses or haptic vests can provide users with a more interactive and intuitive experience. This will enable hands-free working and promote environmental awareness. Optimizing object detection algorithms used at the edge can increase responsiveness and reduce dependence on cloud services. This includes the construction of heavy structures and efficient pipelines suitable for low-energy products with low-cost components.

It is crucial to collect continuous feedback from visually impaired users and incorporate their input into the design process. This ensures that the system remains user-friendly and addresses the needs and challenges faced by its users. Partnering with organizations and communities dedicated to accessibility and disability rights can help ensure that diagnostic products are designed and used in ways that respect and promote the freedom and independence of the visually impaired.

Conclusion

In conclusion, the app designed for visually impaired individuals, featuring modules such as object detection, emotion recognition, color recognition, currency detection, and barcode detection, represents a beacon of hope for a more inclusive and accessible future. With advancements in technology and AI, the app\'s potential for improving the quality of life for the visually impaired is substantial. Its evolution may lead to greater independence, better social interaction, and enhanced everyday experiences. As the app continues to grow and adapt, it has the power to break down barriers and provide new found opportunities, ensuring that the visually impaired community can navigate the world with confidence and autonomy. This app embodies the principles of innovation, compassion, and inclusivity, offering a brighter future for those who rely on its support.

References

[1] Ruffieux S, Hwang C, Junod V, Caldara R, Lalanne D and Ruffieux N. (2023). Tailoring assistive smart glasses a according to pathologies of visually impaired individuals: an exploratory investigation on social needs and difficulties experienced by visually impaired individuals. Online publication date: 1-Jun- 2023. [2] Jafri, Rabia & Ali, Syed & Arabnia, Hamid & Fatima, Shameem. (2013). Computer vision-based object recognition for the visually impaired in an indoors environment: a survey. [3] Murad, M., Rehman, A., Shah, A.A., Ullah, S., Fahad, M., Yahya, K.M.: RFAIDE—an RFID based navigation and object recognition assistant for visually impaired people. In: 7th Inter- national Conference [4] CICERONE- A Real Time Object Detection for Visually Impaired People Therese Yamuna Mahesh1, S S Parvathy2, Shibin Thomas2. [5] Ruxandra Tapu, Bogdan Mocanu, Andrei Bursuc, Titus Zaharia; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2013, pp. 444-451 [6] Shin, Byeong-Seok, and Cheol-Su Lim. \"Obstacle detection and avoidance system for visually impaired people.\" In International Workshop on Haptic and Audio Interaction Design, pp. 78-85. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. [7] Rodríguez, Alberto, J. Javier Yebes, Pablo F. Alcantarilla, Luis M. Bergasa, Javier Almazán, and Andrés Cela. \"Assisting the visually impaired: obstacle detection and warning system by acoustic feedback.\" Sensors 12, no. 12 (2012): 17476-17496. [8] Zhang, Qian, and Wei Qi Yan. \"Currency detection and recognition based on deep learning.\" In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-6. IEEE, 2018. [9] Ajmal, Aisha, Christopher Hollitt, Marcus Frean, and Harith Al-Sahaf. \"A comparison of RGB and HSV colour spaces for visual attention models.\" In 2018 International conference on image and vision computing New Zealand (IVCNZ), pp. 1-6. IEEE, 2018. [10] Hansen, Daniel Kold, Kamal Nasrollahi, Christoffer Bøgelund Rasmussen, and Thomas B. Moeslund. \"Real-time barcode detection and classification using deep learning.\" In International Joint Conference on Computational Intelligence, pp. 321-327. SCITEPRESS Digital Library, 2017. [11] Kosti, Ronak, Jose M. Alvarez, Adria Recasens, and Agata Lapedriza. \"Emotion recognition in context.\" In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1667-1675. 2017. [12] Natanael, Gafencu, Cristian Zet, and Cristian Fo?al?u. \"Estimating the distance to an object based on image processing.\" In 2018 International Conference and Exposition on Electrical And Power Engineering (EPE), pp. 0211-0216. IEEE, 2018.

Copyright

Copyright © 2024 Riya Gawande, Mrs. Shraddha P. Mankar, Dipanshu Sankhala, Nikhita Watpal, Siddhi Vispute. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET58593

Publish Date : 2024-02-24

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here