International Journal for Research in Applied Science and Engineering Technology (IJRASET)
Authors: Arpitha Vasudev, Karthik M R, K Pranav, Monish C, Sai Divya
DOI Link: https://doi.org/10.22214/ijraset.2023.52178
With the exponential growth of data, it is essential to analyze it effectively. Data is a valuable source of information and knowledge, but it must be interpreted to be useful, and it should be presented in different forms depending on user requirements to yield precise information. Smartphones are now the most commonly used electronic devices, and a smartphone is not only a communication device but also a powerful computing device, so computationally demanding techniques such as translation, text extraction, and summarization can run on it. This paper presents an application for analyzing the text of documents on smartphones, where interpreting documents is otherwise challenging. The proposed application converts documents or images into searchable, editable digital text, which can then be interpreted in different forms. The objective of this application is four-fold: 1) recognize text in documents or images using Optical Character Recognition; 2) summarize the text; 3) translate the extracted text into different languages; 4) generate speech from the text using a text-to-speech algorithm.
I. INTRODUCTION
The application accepts documents as input and generates a text file using Tesseract OCR [1], which extracts the text from the document. This text file is uploaded to cloud storage, where it can be accessed for summarization, translation, and speech generation. The text recognition module uses Tesseract OCR to extract text: it converts the color image to a binary image, separates the characters, extracts the information from the image, and finally performs post-processing. The application uses extractive summarization, which selects the most important subset of sentences from the text to form a summary; the machine learning model uses the TextRank algorithm [2].
Text translation analyzes the structure, syntax, and grammar of sentences in the source language and, based on the grammar and rules of the target language, translates the text. This application uses the Firebase ML Kit API [3] to translate the text into different languages. Text-to-speech has two stages: first, the text is normalized and converted into tokens, which are assigned phonetic transcriptions and prosodic units; these are then converted into a waveform, and a synthesizer generates the speech. The text file can be accessed from cloud storage and used to generate speech with the text-to-speech module, which uses the Android text-to-speech API [4].
II. RELATED WORK
OCR, text translation, text-to-speech, and text summarization are the key technologies used to interpret text in different forms for better understanding, and several applications use them. [5] presents a web app that extracts text from images using real-time OCR. The extracted text is editable, but the app cannot extract text from a document and cannot store the extracted text. [6] proposed an Android application that extracts text from images and stores the extracted text in local storage. It cannot extract text from a document, the text cannot be interpreted into other forms, and because the text is kept in local storage, data may be lost when the app is uninstalled or the device crashes. [7], [8] researched Android applications capable of live translation of text by accessing each frame from the device's camera. These applications are limited to the small set of users who want to read the text in their local language, since they directly translate a given frame into that language. [9] researched automatic text summarization: the summarizer uses a sentence-ranking algorithm that accepts text from a document and generates a summary, which can also be converted into audio. [10] describes an Android application that converts text in an image to speech after translating it into another language: the text is extracted using OCR, translated, and the translated text is converted to speech. This application has no storage facility and works only on images, not on documents. [11] researched OCR on the Android OS.
That paper mainly focuses on OCR using Tesseract for Android. [12] presents a desktop application that extracts text using Tesseract OCR; it also supports operations on the text such as translation and text-to-speech, and it is capable of speech recognition, but it is restricted to desktops. All the papers mentioned above use these technologies to extract text, and some use them to interpret it, but none embeds all of them in a single application. We propose an Android application, "Text Interpreter & Converter", that accepts a document as input, extracts the text from it, and converts it into digital, editable, searchable text. This text is stored in the cloud, and the text file can then be summarized, translated, or converted to speech. In this way, we embed all the existing technologies above in one Android application.
A. Text Recognition
Many open-source OCR engines are available, but there are fewer OCR engines for mobile because mobile devices have less power and processing capacity. Tesseract is one of the few: for Android, Tesseract OCR [1] provides an open-source library that is helpful for extracting text from scanned images. In this application, the Tesseract engine is provided with English trained data. Method: First, the application accepts images or documents (documents must be in PDF format), and then text extraction takes place. If the input is a document, it is rendered first through a PDF renderer: each page of the document is copied into a bitmap, and the bitmap is passed on for text extraction; every page of the document follows the same procedure. If the input is an image, the image is converted directly into a bitmap, which is passed on for text extraction. The bitmap collected from the preceding step is passed to the Tesseract engine, which performs pre-processing operations such as grayscale conversion, binarization, and text segmentation, and then carries out text recognition to extract the characters present in the bitmap. Finally, the output from Tesseract is appended to a text editor, through which the user can edit it and perform other operations.
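The following Kotlin sketch illustrates this pipeline, assuming the tess-two Tesseract wrapper for Android and the platform PdfRenderer; the function name and the 2x render scale are our illustrative choices, not the authors' exact code.

```kotlin
import android.graphics.Bitmap
import android.graphics.Color
import android.graphics.pdf.PdfRenderer
import android.os.ParcelFileDescriptor
import com.googlecode.tesseract.android.TessBaseAPI
import java.io.File

// Render every page of the PDF to a bitmap and run Tesseract on each one.
fun extractTextFromPdf(pdf: File, dataPath: String): String {
    val tess = TessBaseAPI()
    tess.init(dataPath, "eng")  // English trained data under dataPath/tessdata
    val out = StringBuilder()
    val fd = ParcelFileDescriptor.open(pdf, ParcelFileDescriptor.MODE_READ_ONLY)
    val renderer = PdfRenderer(fd)
    for (i in 0 until renderer.pageCount) {
        val page = renderer.openPage(i)
        // Copy the page into a bitmap at a standard scale
        val bmp = Bitmap.createBitmap(page.width * 2, page.height * 2,
                                      Bitmap.Config.ARGB_8888)
        bmp.eraseColor(Color.WHITE)  // pages render over a white background
        page.render(bmp, null, null, PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY)
        page.close()
        tess.setImage(bmp)           // Tesseract pre-processes internally
        out.append(tess.getUTF8Text()).append('\n')
        bmp.recycle()
    }
    renderer.close()
    tess.end()
    return out.toString()
}
```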
B. Text-to-speech
Text-to-speech is a module through which users can listen to their text converted to speech or audio; it is mainly helpful for people who are visually impaired. Using text-to-speech, the user can listen to the content of their document, which is read out by the text-to-speech engine. Android's text-to-speech API is used. Method: The text-to-speech module is initialized through the Android TextToSpeech (TTS) API. During initialization, the TTS engine is assigned a voice and other parameters. After initialization, the text is accessed and given to the TTS API as input, and the TTS API generates speech as output. The user can select a male or female voice and can stop the speech at any point.
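A minimal sketch of this flow with the standard android.speech.tts.TextToSpeech API; the class name Reader and the fixed Locale are our illustrative assumptions.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

class Reader(context: Context) : TextToSpeech.OnInitListener {
    private val tts = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US)  // assign voice parameters after init
            ready = true
        }
    }

    fun readAloud(text: String) {
        // QUEUE_FLUSH replaces any utterance currently being spoken
        if (ready) tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "doc")
    }

    fun stop() = tts.stop()             // user can stop the speech in between
}
```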
C. Text Translator
ML Kit's on-device translation API [3] is used for text translation. ML Kit is a mobile SDK that brings Google's machine learning expertise to mobile applications. Because translation runs on-device, it is fast; because of the complexity of natural language processing (NLP), however, the translations may not be appropriate for all users. The primary source language is English, and the text can be translated into any of the provided languages, which is mainly helpful for users who prefer reading in their local language rather than English. Each target language has its own machine learning model of about 30 MB: when the user chooses a language for the first time, the model is downloaded; for subsequent translations into that language, the already-downloaded model is reused. Method: In the text translator module, the translator is initialized and the text is provided to it as input. When the user selects a target language, the translator translates the text into that language and displays it to the user.
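A sketch of this flow against ML Kit's on-device Translation API [3]; the Hindi target and the Wi-Fi download condition are illustrative choices, and the model download happens only on first use, as described above.

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

fun translate(text: String, onResult: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.ENGLISH)  // primary source language
        .setTargetLanguage(TranslateLanguage.HINDI)    // user-selected target
        .build()
    val translator = Translation.getClient(options)
    val conditions = DownloadConditions.Builder().requireWifi().build()
    translator.downloadModelIfNeeded(conditions)       // ~30 MB, first use only
        .addOnSuccessListener {
            translator.translate(text)
                .addOnSuccessListener { translated -> onResult(translated) }
        }
}
```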
D. Text Summarizer
There are two main types of summarizers: abstractive and extractive. An abstractive summarizer conveys the core content of the original text in new words, while an extractive summarizer selects the most important subset of its sentences. An extractive summarizer is quicker to implement with unsupervised learning because it requires no training data: the similarities among the sentences are calculated, and the summary is built from the sentences with the highest similarity scores. In this application, the summarizer uses the TextRank algorithm [2] to generate the summary. We deployed the summarization model on the Heroku cloud using the Flask framework.
Steps in TextRank:
1. Extract sentences from the original text.
2. Create vectors for all text units (sentences) based on the words (tokens) present in the original text.
3. Calculate the cosine similarity between each pair of sentences.
4. Create an n×n matrix, where n is the number of sentences.
5. Create a graph from the similarity matrix, where each vertex is a sentence and each edge weight is a similarity.
6. Rank the text units by similarity score and return the top n sentences to be included in the summarized text.
The cosine distance between two vectors is computed from the cosine of the angle between them:
Cosine Distance(x, y) = 1 − Cosine Similarity(x, y), where Cosine Similarity(x, y) = cos θ for the angle θ between the vectors.
For identical vectors the cosine similarity is 1 and the cosine distance is 0; for perpendicular vectors the cosine similarity is 0 and the cosine distance is 1.
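The authors' summarizer is a Python/Flask service; the condensed Kotlin sketch below re-implements steps 1–6 for illustration (bag-of-words vectors, a cosine-similarity matrix, and a PageRank-style power iteration) and is not the deployed model.

```kotlin
import kotlin.math.sqrt

// Extractive TextRank-style summary: top-N sentences by graph centrality.
fun summarize(text: String, topN: Int): List<String> {
    // 1. Extract sentences from the original text
    val sentences = text.split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    // 2. Bag-of-words vectors over the shared vocabulary
    val vocab = sentences.flatMap { it.lowercase().split(Regex("\\W+")) }
        .filter { it.isNotEmpty() }.distinct()
    val vectors = sentences.map { s ->
        val tokens = s.lowercase().split(Regex("\\W+"))
        DoubleArray(vocab.size) { i -> tokens.count { t -> t == vocab[i] }.toDouble() }
    }
    // 3-4. n x n cosine-similarity matrix
    fun cosine(a: DoubleArray, b: DoubleArray): Double {
        val dot = a.indices.sumOf { a[it] * b[it] }
        val norm = sqrt(a.sumOf { it * it }) * sqrt(b.sumOf { it * it })
        return if (norm == 0.0) 0.0 else dot / norm
    }
    val n = sentences.size
    val sim = Array(n) { i ->
        DoubleArray(n) { j -> if (i == j) 0.0 else cosine(vectors[i], vectors[j]) }
    }
    // 5-6. Rank vertices with a PageRank-style power iteration over the graph
    var score = DoubleArray(n) { 1.0 }
    repeat(30) {
        score = DoubleArray(n) { i ->
            0.15 + 0.85 * (0 until n).sumOf { j ->
                val out = sim[j].sum()
                if (out == 0.0) 0.0 else sim[j][i] / out * score[j]
            }
        }
    }
    // Return the top-N sentences in their original order
    return score.withIndex().sortedByDescending { it.value }
        .take(topN).sortedBy { it.index }.map { sentences[it.index] }
}
```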
Method: In the text summarization module, the text is accessed from the cloud. The accessed text and the number of lines required in the summary are the two parameters of a POST request, which is handled and forwarded to the text summarizer. The summarizer generates the summary and converts it to JSON; the application handles the JSON object and finally displays the summary of the original text to the user.
The application works with all the modules mentioned above as its main features. In addition, database and authentication services are integrated with the application. When Tesseract OCR extracts the text, the extracted text is stored as a text file in the database (cloud storage); this text file is accessed whenever the text translator, text summarizer, or text-to-speech modules are invoked. An authentication service is also required to keep track of users and display each user's files. For authentication, users can log in with their phone number, email id, or Google account; these are the three authentication providers offered in this application. For the database and authentication we use Firebase: Firebase Firestore for the database, Firebase Cloud Storage for storing text files, and Firebase Authentication for authentication. A user must register through an authentication service to use the application.
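As a concrete illustration of the summarization call described in the Method above, here is a minimal client-side sketch; the Heroku endpoint URL and the JSON field names (text, lines, summary) are assumptions for illustration, since the paper does not specify them.

```kotlin
import org.json.JSONObject
import java.net.HttpURLConnection
import java.net.URL

// POST the text and the requested number of lines to the Flask service,
// then parse the JSON reply (run off the main thread in a real app).
fun requestSummary(text: String, lines: Int): String {
    val conn = URL("https://summarizer-app.herokuapp.com/summarize")  // illustrative URL
        .openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.setRequestProperty("Content-Type", "application/json")
    conn.doOutput = true
    val body = JSONObject().put("text", text).put("lines", lines)
    conn.outputStream.use { it.write(body.toString().toByteArray()) }
    val reply = conn.inputStream.bufferedReader().readText()
    return JSONObject(reply).getString("summary")  // assumed response field
}
```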
E. Working of Application
When a user opens the application, if the user is registered, all of that user's files are displayed; otherwise, the user must register using one of the service providers (phone number, email, or Google) offered by the application. The user can then upload documents or images through the text recognition module, which extracts the characters from them. After text extraction, the user can edit the data, and on clicking the save button, the text file is uploaded to cloud storage and the database. The uploaded file is then displayed in the main activity. The user can click on any file they wish to translate, summarize, or listen to; in the text translation, text summarization, and text-to-speech modules, the text is accessed from the text file stored in cloud storage.
III. SYSTEM DESIGN
USE CASE DIAGRAM: Use case diagrams represent the overall scenario of the system. A scenario is a sequence of steps describing an interaction between a user and a system; a use case is thus a set of scenarios tied together by a common goal. Use case diagrams are drawn to expose the functionalities of the system: the use case view models the functionality of the system as perceived by outside users, and a use case is a coherent unit of functionality expressed as a transaction among actors and the system. Here the main actor is the user, who interacts with the system; the database is an internal entity, as it does not interact with any external systems. Users can upload a document or view the extracted text files that they have backed up. When a user uploads a document, the system is triggered to extract the text from it. Users can invoke the text-to-speech engine to listen to the audio, the summarizer module to view the summary, or the translator to translate the text into other languages. Text-to-speech, summarizer, and translator are subsystems.
The use cases are: register or log in, upload a document or image, extract and edit the text, save the text file to cloud storage, summarize the text, translate the text, and listen to the text through text-to-speech.
IV. SYSTEM ARCHITECTURE
Architectural design is the process of identifying the subsystems that make up a system and the framework for subsystem control and communication; its goal is to establish the overall structure of the software system. System architecture concerns the high-level structure of the software system, obtained by decomposition and composition under an architectural style and quality attributes. A software architecture must satisfy the major functional and performance requirements of the system, as well as non-functional requirements such as reliability, scalability, portability, and availability.
The system consists of four subsystems:
1) Text Recognition
2) Text Summarization
3) Text Translation
4) Text-to-speech
Initially, the user uploads an image or document for conversion. When the user uploads it and presses the convert button, text recognition is invoked: it pre-processes the input and invokes optical character recognition, which extracts the text in the images. The extracted text is stored in a text file; this intermediate file can be edited by the user and is backed up in cloud storage. Users can view the intermediate text files they have backed up. When the user opens a file, there are three options: text summarization, text translation, and text-to-speech. Clicking the text summarization button generates a summary of the text. Clicking the text translation button prompts the user to select a target language and translates the text into it. Clicking text-to-speech initializes the text-to-speech engine; the user can change the pitch, the speed, and the voice (male or female), and the engine then starts reading the text.
V. IMPLEMENTATION
The project is implemented with Android Studio, a platform for building Android apps. We found Android Studio easier for implementing our technology and building the app, since it offers:
• Instant app run
• Visual layout editor
• Fast emulator
• Intelligent code editor
• Color previews
A. Text Recognition
OCR stands for optical character recognition. OCR systems transform a two-dimensional image of text, which may contain machine-printed or handwritten text, from its image representation into machine-readable text. OCR generally consists of several subprocesses performed as accurately as possible: pre-processing (grayscale conversion and binarization), text segmentation, character recognition, and post-processing.
If the input is an image, there is a single image: we convert it to a bitmap and pass it to the text recognizer to extract the text. If the input is a document, it contains multiple pages, so the whole document cannot be passed to the text recognizer at once; it must be pre-processed first. To pre-process it, we traverse each page of the document and extract a bitmap of the page: we take a standard bitmap size, copy the page's pixel data into it, and generate the bitmap. Each bitmap generated from each page is then passed to the text recognizer, which extracts its text.
B. Database
In this project, the database is implemented with Firebase. Google Firebase is a Google-backed application development platform that enables developers to build iOS, Android, and web apps; it provides tools for tracking analytics, reporting and fixing app crashes, and creating marketing and product experiments. We mainly used three Firebase components: Authentication, Cloud Firestore, and Cloud Storage.
C. Authentication
Firebase Authentication provides backend services, easy-to-use SDKs, and ready-made UI libraries to authenticate users to our app. It supports authentication using passwords, phone numbers, and popular federated identity providers such as Google, Facebook, and Twitter. We used three providers to authenticate the user to our app:
• Email
• Google
• Phone number
Users click the login or register button and choose a provider to log in or register with. After the user selects a provider and enters their credentials, the app checks whether the user is already registered: if so, it fetches the user's details and displays them on the screen; if not, it registers the user by generating a unique userid and creating an account document.
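A sketch of wiring these three providers with the FirebaseUI Auth library (one of the "ready-made UI libraries" mentioned above); how the returned intent is launched and its result handled is omitted here.

```kotlin
import com.firebase.ui.auth.AuthUI

// Build a sign-in intent offering exactly the three providers described above.
fun signInIntent() = AuthUI.getInstance()
    .createSignInIntentBuilder()
    .setAvailableProviders(listOf(
        AuthUI.IdpConfig.EmailBuilder().build(),   // Email
        AuthUI.IdpConfig.GoogleBuilder().build(),  // Google
        AuthUI.IdpConfig.PhoneBuilder().build()))  // Phone number
    .build()  // launch with an ActivityResultLauncher and handle the result
```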
Cloud Firestore is a NoSQL document database that lets us store, sync, and query data for mobile and web apps at global scale. It structures data into collections of documents, and hierarchies of related data can be built and retrieved with expressive queries; all queries scale with the size of the result set, so the app is ready to scale from day one. Cloud Firestore ships with mobile and web SDKs and a comprehensive set of security rules, so we can access the database without standing up our own server, and Cloud Functions, a serverless compute product, can execute hosted backend code that responds to data changes in the database. Cloud Firestore can also be accessed with traditional client libraries (Node, Python, Go, and Java). It automatically synchronizes app data between devices and notifies the app of data changes as they occur, making it easy to build collaborative experiences and real-time apps; users can access and change their data at any time, even offline, on iOS, Android, and the web. With its declarative security language, we can restrict data access based on user identity and pattern matching on the data, and Cloud Firestore integrates with Firebase Authentication for simple, intuitive user authentication.
In our Firestore, we created a collection called documents, in which we maintain every document of the users. Each document has four fields:
• filename (name of the user's file)
• link (link to the file stored in cloud storage)
• time (time when the file was uploaded)
• userid (id of the user who owns the file)
When a user uploads a document, the app collects these parameters and passes them to Firestore: the extracted text is converted into a file, the text file is stored in cloud storage, a link to that file is generated, the time is set by Firestore, and the userid is added; the document is then uploaded to Firestore. We can also keep track of documents using the Firestore console.
When a registered user enters the app, the Files activity is invoked, and the app runs a query to access that user's files and display them. The query fetches the documents whose userid equals the current user's id, ordered by upload time in descending order so that the latest document appears on top; to support this ordering, the corresponding index must be defined in Firestore.
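A sketch of the two Firestore operations just described, using the standard Android Firestore SDK; the field names follow the four parameters above, and the serverTimestamp() choice for time is our assumption.

```kotlin
import com.google.firebase.auth.FirebaseAuth
import com.google.firebase.firestore.FieldValue
import com.google.firebase.firestore.FirebaseFirestore
import com.google.firebase.firestore.Query

val db = FirebaseFirestore.getInstance()
val uid = FirebaseAuth.getInstance().currentUser?.uid

// Add one document to the "documents" collection with the four fields.
fun saveDocument(filename: String, link: String) {
    db.collection("documents").add(mapOf(
        "filename" to filename,
        "link" to link,                          // Cloud Storage download link
        "time" to FieldValue.serverTimestamp(),  // upload time set by Firestore
        "userid" to uid))
}

// Fetch this user's files, latest first (needs a composite index on userid+time).
fun loadUserFiles() =
    db.collection("documents")
        .whereEqualTo("userid", uid)
        .orderBy("time", Query.Direction.DESCENDING)
        .get()
```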
D. Cloud Storage
Cloud Storage for Firebase lets us upload and share user-generated content, such as images and video, which allows rich media content to be built into apps. User data is stored in a Cloud Storage bucket, an exabyte-scale object storage solution with high availability and global redundancy; Cloud Storage for Firebase lets us securely upload these files directly from mobile devices and web browsers, handling spotty networks with ease. In our application, Cloud Storage is used as a simple bucket: we store the generated text files under a folder named "files". The files in this storage do not follow any order; each is stored under a random name, but each file is linked to its document in Firestore. When a text file is created from the extracted text, it is given a random name and uploaded to Cloud Storage, which generates a link to the file; this link is passed to the Firestore document. We can keep track of the files in the Cloud Storage console.
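A sketch of this upload path with the Firebase Storage SDK; the .txt suffix and the callback shape are our choices.

```kotlin
import com.google.firebase.storage.FirebaseStorage
import java.util.UUID

// Store the text file under "files" with a random name, then fetch the
// download link that will be recorded in the Firestore document.
fun uploadTextFile(text: String, onLink: (String) -> Unit) {
    val ref = FirebaseStorage.getInstance().reference
        .child("files/${UUID.randomUUID()}.txt")   // random file name
    ref.putBytes(text.toByteArray())
        .continueWithTask { ref.downloadUrl }      // generate link to the file
        .addOnSuccessListener { uri -> onLink(uri.toString()) }
}
```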
VI. CONCLUSION
This paper introduced "Text Interpreter & Converter", an Android application for analyzing lengthy text. The growing volume of text is a huge source of information that must be analyzed to extract what is useful. We proposed an Android application that recognizes, summarizes, translates, and converts text to speech: it extracts the text from documents, and the text can then be summarized, translated, or converted to speech. The UI is friendly enough that users can easily interact with it. Using the technologies described above, we are able to interpret text in different ways. In this application, the source language is English; in the future, we can add other languages, extracting characters from local languages and translating, summarizing, and generating speech for those languages.
REFERENCES
[1] S. Dome and A. P. Sathe, "Optical Charater Recognition using Tesseract and Classification," 2021 International Conference on Emerging Smart Computing and Informatics (ESCI), 2021, pp. 153-158, doi: 10.1109/ESCI50559.2021.9397008.
[2] S. Pattnaik, S. R. Laha, B. K. Pattanayak and B. C. Pattanaik, "A Framework to Detect Digital Text Using Android Based Smartphone," 2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON), 2021, pp. 1-6, doi: 10.1109/ODICON50556.2021.9428993.
[3] S. Revathy and S. Nath, "Android Live Text Recognition and Translation Application using Tesseract," 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), 2020, pp. 1259-1263, doi: 10.1109/ICICCS48265.2020.9120973.
[4] D. Ahuja, J. Amesar, A. Gurav, S. Sachdev and V. Zope, "Text Extraction and Translation from Image using ML in Android," International Journal of Innovative Research in Science, Engineering and Technology, 7(1), 2018, pp. 176-179, doi: 10.15680/IJIRSET.2018.0701028.
[5] "Multilingual Speech and Text Recognition and Translation using Image," International Journal of Engineering Research and Technology (IJERT), 2016.
[6] S. Ramiah, T. Y. Liong and M. Jayabalan, "Detecting text based image with optical character recognition for English translation and speech using Android," 2015 IEEE Student Conference on Research and Development (SCOReD), 2015, pp. 272-277, doi: 10.1109/SCORED.2015.7449339.
[7] M. Brisinello, R. Grbić, D. Stefanović and R. Pečkai-Kovač, "Optical Character Recognition on images with colorful background," 2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2018, pp. 1-6, doi: 10.1109/ICCE-Berlin.2018.8576202.
[8] H. Jiang, T. Gonnot, W. Yi and J. Saniie, "Computer vision and text recognition for assisting visually impaired people using Android smartphone," 2017 IEEE International Conference on Electro Information Technology (EIT), 2017, pp. 350-353, doi: 10.1109/EIT.2017.8053384.
[9] O. Foong, S. Yong and F. Jaid, "Text Summarization Using Latent Semantic Analysis Model in Mobile Android Platform," 2015 9th Asia Modelling Symposium (AMS), 2015, pp. 35-39, doi: 10.1109/AMS.2015.15.
[10] L. Cabral et al., "Automatic Summarization of News Articles in Mobile Devices," 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), 2015, pp. 8-13, doi: 10.1109/MICAI.2015.8.
[11] S. R. Rahimi, A. T. Mozhdehi and M. Abdolahi, "An overview on extractive text summarization," 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), 2017, pp. 0054-0062, doi: 10.1109/KBEI.2017.8324874.
[12] R. Dhar and S. Mukherjee, "Android-based Text Reader for Partial Vision Impairment," 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), 2018, pp. 1-5, doi: 10.1109/UPCON.2018.8597074.
Copyright © 2023 Arpitha Vasudev, Karthik M R, K Pranav, Monish C, Sai Divya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET52178
Publish Date : 2023-05-13
ISSN : 2321-9653
Publisher Name : IJRASET