Implementing Image-to-Speech Recognition by Capturing Image Frames for Visually Impaired

Authors: Prof. Jayant V. Kulkarni, Vedant Bhoir, Prathamesh Deshpande, Sakshi Deshpande

DOI Link: https://doi.org/10.22214/ijraset.2023.50917

Abstract

The difficulties that visually impaired people face are often overlooked. They struggle to read everyday written materials such as newspapers, posters, and many other frequently used items. Daily life is hard for them because communication relies heavily on both speech and text. According to the literature survey, existing research is limited in terms of software accessibility, and the available products are not cost-effective. The main objective of this paper is therefore to build software that is easily accessible and cost-effective. The paper presents a software prototype that helps visually impaired people, as well as people with low vision, to read and understand the text in an image they scan. The prototype is based on an OCR tool and the Google text-to-speech function. After completion, the software was tested on numerous photos and correctly identified the text. This will make it easier for such users to read efficiently.

I. INTRODUCTION

Visual impairment is one of the biggest limitations a person can face, especially now that so much information is communicated through text rather than voice. Our aim is to help people with visual impairment. In this project we develop a device that converts the text in an image to speech.

Prior research covers applications of OCR in various fields. Dr. Yusuf Parvej presents an introduction to Optical Character Recognition, the major research work on it, and its applications in various fields [1]. The main drawback of existing systems is the limited set of supported languages. The proposed system is a software prototype that can be used to detect text in any language and output the scanned text as speech.

Recently, Miss Tejashree U. Bagayatkar proposed a system that contains four modules: a camera module, an image processing module, an optical character recognition module, and a text-to-speech module. The device output is in the form of voice, so it can easily be heard by visually impaired people [5]. The main disadvantage is that the device is not cost-effective. Hence, the main aim of this paper is to build a prototype that helps visually impaired and low-vision people read efficiently without assistance, and to make it cost-effective so that it is easily available.

II. RELATED WORK

Dr. Yusuf Parvej [6]: The paper presents an introduction to Optical Character Recognition, the major research work on it, and its applications in various fields. It first introduces OCR, then highlights the major research works that have had a great impact on character recognition, and finally covers the most important applications of OCR before concluding.

Karez Hamad [7]: Optical character recognition is an active research area that attempts to develop a computer system with the ability to extract and process text from images automatically. The objective of OCR is to convert any form of text or text-containing document, such as handwritten text or printed or scanned text images, into an editable digital format for further processing. OCR therefore enables a machine to automatically recognize text in such documents.

Saharsa Vanria Thota [8]: Visual impairment, or vision loss, is defined as a decreased ability to see clearly that cannot be corrected with glasses. Blindness is the term used for complete vision loss. The common causes of vision loss are uncorrected refractive errors, cataracts, and glaucoma. People with visual impairment face a number of difficulties in normal daily activities such as walking, driving, and reading.

Bhalaji Natarajan [9]: This project is implemented using a handheld page or document scanner, an external Bluetooth module for scanners that do not have a built-in one, an Android application that performs OCR and speech synthesis, and an Android mobile phone. The cost of developing the system is low, and the system provides a friendly user interface for visually impaired people.

K Karthick [10]: In this paper, the author discusses the classification of handwritten OCR systems and the steps involved in OCR, which is one of the automatic identification techniques, and also gives information on recent applications.

G.A.E. Satish Kumar [11]: In this paper, the author presents a computer vision technique to extract text from scene images, with an electronic aid used to convert the text to speech. In "Design and Development of Smart Assistive Device for Visually Impaired People", presented in May 2021, 2016, the author introduces a smart stick with RFID technology and IR sensors to give directions and to detect obstacles, respectively.

Jamshed Menon [12]: An offline system is a static system in which the input data is in the form of scanned images, while in an online system the input is more dynamic and is based on the movement of the pen tip, which has a certain velocity, projection angle, position, and locus point. An online system is therefore considered more complex and advanced, as it resolves the problem of overlapping input data that is present in offline systems.

III. METHODOLOGY/EXPERIMENTAL

Algorithm:

Step 1: Installation

Tesseract, one of the most accurate free OCR engines, is installed first. To identify the text-containing area, the user's image is first converted to grayscale. Tesseract then extracts all the text from the .png file and writes it to a .txt file.

Step 2: Image Capturing Process

The software accesses the system's camera using OpenCV and other tools. As soon as an image is captured, it is saved to disk as a .png file.
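The capture step could be sketched roughly as follows. This is a minimal illustration, assuming the `opencv-python` package and a default webcam at index 0; the function names and the `capture.png` path are hypothetical and not taken from the paper. The camera and the image writer are passed in as parameters so that the logic can be exercised without hardware.

```python
def capture_frame(camera, save_image, path="capture.png"):
    """Grab one frame from `camera` and store it at `path`.

    `camera` needs read() -> (ok, frame) and release(), which matches
    cv2.VideoCapture; `save_image(path, frame)` matches cv2.imwrite,
    which returns True on success.
    """
    try:
        ok, frame = camera.read()
    finally:
        camera.release()  # always free the device, even if read() raises
    if not ok:
        raise RuntimeError("camera did not return a frame")
    if not save_image(path, frame):
        raise IOError(f"could not write {path}")
    return path


def capture_from_webcam(path="capture.png", index=0):
    """Assumed wiring to OpenCV; requires the opencv-python package."""
    import cv2  # imported lazily so capture_frame has no hard dependency

    return capture_frame(cv2.VideoCapture(index), cv2.imwrite, path)
```

Calling `capture_from_webcam()` on a machine with a camera would write a single frame to `capture.png`; a real application would likely also show a preview and let the user trigger the capture.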

Step 3: Text to Speech

Pytesseract extracts the text from the captured image and writes it to the generated .txt file. The contents of the text file are then passed to gTTS, a library that converts text to speech.
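A rough sketch of this step is shown below. It assumes the `pytesseract` package (which also needs the Tesseract binary installed) and the `gTTS` package (which needs network access); the helper names, file names, and the whitespace clean-up are illustrative assumptions rather than the authors' code.

```python
def clean_ocr_text(raw):
    """Collapse the stray whitespace and line breaks OCR output tends to contain."""
    return " ".join(raw.split())


def image_to_text_file(image_path, text_path="output.txt", ocr=None):
    """Run OCR on image_path and write the cleaned text to text_path."""
    if ocr is None:
        import pytesseract  # assumed dependency, imported lazily

        ocr = pytesseract.image_to_string  # accepts an image file path
    text = clean_ocr_text(ocr(image_path))
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text


def text_to_mp3(text, audio_path="output.mp3", lang="en"):
    """Synthesize speech for `text` with gTTS and save it as an MP3 file."""
    if not text:
        # gTTS refuses empty input, so fail early with a clearer message.
        raise ValueError("no text was recognized, nothing to speak")
    from gtts import gTTS  # assumed dependency, imported lazily

    gTTS(text=text, lang=lang).save(audio_path)
    return audio_path
```

With both packages installed, `text_to_mp3(image_to_text_file("capture.png"))` would chain the two stages. The `ocr` parameter exists only so that the file-handling logic can be tried without Tesseract.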

Step 4: Audio File

The speech output is stored as an .mp3 file. The operating system then plays that file, producing audio of the text read from the image.
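The paper does not say how the operating system is asked to play the file. One possible approach, sketched here under that caveat, is to hand the MP3 to the platform's default player. The commands `afplay` (macOS) and `xdg-open` (Linux desktops) are assumptions about the target machine, and the function names are hypothetical.

```python
import os
import subprocess
import sys


def player_command(audio_path, platform):
    """Pick a command-line player for non-Windows systems (assumed tools)."""
    if platform == "darwin":
        return ["afplay", audio_path]  # audio player that ships with macOS
    return ["xdg-open", audio_path]  # opens the default app on Linux desktops


def play(audio_path):
    """Play audio_path through the operating system's default player."""
    if sys.platform.startswith("win"):
        os.startfile(audio_path)  # Windows only: opens the default media player
    else:
        subprocess.run(player_command(audio_path, sys.platform), check=True)
```

For example, `play("output.mp3")` would open the generated speech file in whatever player the system associates with MP3 files.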

IV. FLOWCHART

We take a picture that includes text and characters using the laptop's camera.
After taking the picture, we prepare it for image pre-processing.
Pre-processing covers operations such as cropping, noise removal, edge detection, grayscale conversion, and thresholding.
This step recognises and isolates the text-containing area of the image, turning it into a scanned document.
We installed Tesseract to convert images to text.
It is one of the best free OCR engines. To identify the text-containing area, the user's image is first converted to grayscale. The text in the .png file is then written to a .txt file.
The captured image is saved as a file as soon as it is obtained.
The image is then read from that file and converted to text using Pytesseract.
The text file is then passed to gTTS, a library that turns text into speech.
Once we obtain the audio file, we can play it back through the laptop's audio output.
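Two of the pre-processing operations listed above, grayscale conversion and thresholding, can be illustrated in plain Python. This is only a sketch on nested lists of pixels: it uses the common luminance weights 0.299, 0.587, and 0.114, and the threshold value of 127 is an assumed default. In practice a library such as OpenCV would perform these operations on whole image arrays.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=127):
    """Map pixels above `threshold` to white (255) and the rest to black (0)."""
    return [[255 if pixel > threshold else 0 for pixel in row] for row in gray_image]


# A 2x2 test image: white, black, pure red, pure green.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
binary = binarize(to_grayscale(image))
```

Binarizing the image in this way separates dark text from a light background, which is what makes the text-containing area easier for the OCR engine to isolate.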

V. RESULTS

The system recognizes both words and numbers. When an image was placed in front of the camera, the OpenCV library successfully captured it and the pytesseract library converted its text, from which the speech output was generated. After the model was trained to read numbers, the accuracy of the system increased, which completed the coding part of the system. The finished software was then tested on a total of 500 images and was able to detect the text accurately.
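The paper does not state how accuracy was measured across the test images. As one assumed way to quantify it, the recognized text could be compared with a hand-typed reference for each image. The similarity measure (`difflib.SequenceMatcher`), the 0.95 cut-off, and the function names below are illustrative choices, not the authors' method.

```python
from difflib import SequenceMatcher


def text_similarity(recognized, expected):
    """Return a 0.0-1.0 similarity, ignoring case and extra whitespace."""
    a = " ".join(recognized.lower().split())
    b = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def fraction_correct(pairs, min_similarity=0.95):
    """Share of (recognized, expected) pairs whose similarity meets the bar."""
    if not pairs:
        return 0.0
    hits = sum(text_similarity(r, e) >= min_similarity for r, e in pairs)
    return hits / len(pairs)
```

Feeding `fraction_correct` one `(recognized, expected)` pair per test image would give a single figure that could be reported alongside the number of images tested.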

VII. ACKNOWLEDGEMENT

It is our great pleasure to express our gratitude to all who made it possible for us to complete this EDI project successfully. We are deeply grateful to our guide, Jayant Kulkarni, for their support and help from time to time during the project work. We also thank Prof. Rajesh Jalnekar, our director, for their great support and encouragement in the project work. Finally, we would like to thank all faculty members and all our colleagues for their valuable suggestions and support.

Conclusion

With this system in place, visually impaired people can easily listen to a document's text. The text can also be converted to a desired language using translation software: the user can translate the text into another language and then turn it into audio with the Google text-to-speech tool. In future work, the field of view could be widened for long-range capture, and effort could be directed toward creating an effective portable capture device.

References

[1] Prof. Vaibhav V. Mainkar, et al.: Raspberry Pi Based Intelligent Reader for Visually Impaired Persons. IEEE Xplore, June 2020.
[2] Dr. I. S. Akila, et al.: A Text Reader for the Visually Impaired Using Raspberry Pi. IEEE Xplore, 2018.
[3] Mr. Rajesh M., et al.: Text Recognition and Face Detection Aid for Visually Impaired Person Using Raspberry Pi. 2017 International Conference on Circuits Power and Computing Technologies [ICCPCT], 2017.
[4] Rithika H., et al.: Image Text to Speech Conversion in the Desired Language by Translating with Raspberry Pi. IEEE Xplore, 2016.
[5] http://elinux.org/RPi_Text_to_Speech_(Speech_Synthesis)
[6] https://en.wikipedia.org/wiki/Visual_impairment
[7] https://en.wikipedia.org/wiki/Braille
[8] https://www.classycyborgs.org/braille-literacy-statistics-india/
[9] www.raspberrypi.org
[10] http://www.zdnet.com/article/raspberry-pi-11-reasons-why-its-the-perfect-small-server/
[11] http://aishack.in/tutorials/opencv/
[12] http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_setup/py_intro/py_intro.html
[13] http://hackaday.com/2016/02/28/introducing-the-raspberry-pi-3/

Copyright

Copyright © 2023 Prof. Jayant V. Kulkarni, Vedant Bhoir, Prathamesh Deshpande, Sakshi Deshpande. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET50917

Publish Date : 2023-04-24

ISSN : 2321-9653

Publisher Name : IJRASET
