Voice Assistant Notepad

Authors: Shyam Nayan Tirupathi, Bhavani M, Samba Siva Naidu Etamsetti, Shreya Anumukonda

DOI Link: https://doi.org/10.22214/ijraset.2023.50278

Abstract

The notepad is built on real-time voice-to-text conversion, which turns spoken words into text as the user pronounces them. We developed a real-time speech recognition system and tested it in normal surroundings. The system has two parts: the first processes the acoustic signal acquired by a microphone, and the second interprets the processed signal and translates it into words. Our aim is a voice recognition system that is reliable, inexpensive, and efficient. The system lets users take notes faster, which helps increase productivity while maintaining a good work-life balance. It helps people of all age groups: children who find writing difficult can take down notes, adults can note down paragraphs or important points more easily, and elderly people who find typing hard can make use of the technology. The software was created using an object-oriented analysis and design methodology, and it performs speech recognition by detecting and capturing audio through the device's microphone. Along with other advantages, the proposed system reduced note-making time by 50% or more, depending on the user's speed. We took on this project to address the present issues with note taking.

I. INTRODUCTION

People document important points from day-to-day activities or observations, which makes notes an essential part of any data documentation. Most people use technology to take notes more efficiently, and how convenient the software is to use determines how simple it is to take notes.

The earliest voice recognition systems concentrated on numbers rather than words. Bell Laboratories created the "Audrey" system in 1952, which could recognise digits spoken aloud by a single voice. Several years later, IBM released the "Shoebox," which understood and responded to 16 English words. These systems led to speech-to-text technology, which identifies the words being spoken. It uses speech processing techniques to extract speech from raw microphone input.

To convert raw input audio into words, such a system usually includes a microphone, a processor, and an application that can perform advanced speech recognition. A monitor shows the processed output. Using this technique, the words are extracted and information about their features is collected.

Effective system implementations have made note taking quicker and simpler. Users can record information at a much faster rate, which supports a productive workflow.

II. LITERATURE REVIEW

A. Speech Recognition

Speech recognition is the procedure by which spoken words are recorded and recognised. It uses speech processing algorithms to turn raw audio into processed data.

The system typically consists of two parts: a microphone that records the raw audio, and software that uses speech recognition to convert the raw audio input into readable characters and words. The essential components of this process are as follows:

B. Audio Capture

Audio capture is the first stage. A microphone records the user's speech. The key steps in audio pre-processing are creating a basic version of the audio, eliminating noise, and enhancing the key features. Pre-processing is typically done by audio filtering.
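The pre-processing steps named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the audio is assumed to be a non-empty list of float samples, the function names are hypothetical, and a moving average stands in for whatever filter the real system applies.

```python
def remove_dc_offset(samples):
    """Centre the signal around zero by subtracting its mean.

    Assumes a non-empty list of samples.
    """
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def moving_average(samples, window=3):
    """Smooth sample-to-sample noise with a centred moving average.

    Near the edges the window is truncated to the samples that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        neighbours = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


def normalize_peak(samples):
    """Scale the signal so the loudest sample has magnitude 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    return [s / peak for s in samples]


def preprocess(samples):
    """Chain the three steps: centre, smooth, then normalise."""
    return normalize_peak(moving_average(remove_dc_offset(samples)))
```

A real recogniser would more likely use a band-pass or spectral noise filter; the moving average is chosen here only because it fits in a few lines.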

C. Feature Extraction

The feature extraction step, which comes next, performs a number of tasks, including scaling the audio to a workable range. It also converts the speech into a set of objects.
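One common way to turn speech into a set of objects is to cut the signal into short frames and compute a few numbers per frame. The Python sketch below assumes that approach; the paper does not say which features the system uses, and the function names, frame size, and hop are hypothetical.

```python
def frame_signal(samples, frame_size, hop):
    """Split the signal into fixed-length frames, stepping by `hop` samples."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


def short_time_energy(frame):
    """Mean squared amplitude of a frame (typically low during silence)."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Count sign changes between neighbouring samples.

    Zero is treated as non-negative, so a step from -1 to 0 counts.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def extract_features(samples, frame_size=4, hop=2):
    """Return one (energy, zero_crossings) pair per frame."""
    return [
        (short_time_energy(frame), zero_crossings(frame))
        for frame in frame_signal(samples, frame_size, hop)
    ]
```

Production systems usually compute richer features such as mel-frequency cepstral coefficients; energy and zero-crossing counts are used here only to keep the example self-contained.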

D. Feature Segmentation

Feature segmentation is a technique that separates the audio of words or phrases into individual letters. The goal is to break down the audio of a string of letters into smaller objects, each representing a single symbol.[6]
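A simple way to picture segmentation is to cut the audio wherever it falls silent. The Python sketch below assumes a list of per-frame energy values and a hand-picked threshold; both the function name and the threshold are hypothetical, and the paper does not state which segmentation method the system uses.

```python
def segment_by_energy(energies, threshold):
    """Group consecutive frames at or above `threshold` into segments.

    Returns a list of (start, end) frame indices, with `end` exclusive.
    """
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i  # a new segment begins here
        elif energy < threshold and start is not None:
            segments.append((start, i))  # silence closes the open segment
            start = None
    if start is not None:
        # The signal ended while a segment was still open.
        segments.append((start, len(energies)))
    return segments
```

Each returned pair marks one candidate unit that a later classification step could try to label.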

E. Feature Classification

Feature classification is the process of identifying the letters in a given audio sample and transforming them into readable text in a standard character encoding or another machine-editable format. Each segment is assigned to one letter of an established letter set.
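As a toy illustration of assigning a segment to an established class, the Python sketch below picks the label whose stored template is closest to the segment's feature vector. The nearest-template rule, the `classify` name, and the example templates are assumptions made for illustration; real recognisers rely on trained statistical or neural models.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(feature, templates):
    """Return the label of the template nearest to `feature`.

    `templates` maps each label to a reference feature vector.
    """
    return min(templates, key=lambda label: squared_distance(feature, templates[label]))


# Hypothetical two-class example with 2-dimensional features.
templates = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
label = classify([0.9, 0.8], templates)
```

In this example `label` is `"b"`, because the input lies closer to the second template than to the first.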

III. PROBLEM IDENTIFICATION AND OBJECTIVES

A. The Problem

There have been several problems regarding documentation or note making in day-to-day life. They are as follows:

If a person has to note down a large amount of data in a short time, the person's efficiency can suffer.
This may cause users to become tired or to record inaccurate data.
The recorded data may be in hard copy, which takes more time to note down.

B. The Proposed Solution

Users can use a voice-assisted notepad that records and processes speech, identifies the text, and lets them take notes faster.
This software leads to a productive workflow and allows more consistency in note making.
The data lasts longer than a hard copy, as it can be downloaded and stored as files on the system itself.

IV. USAGE REPRESENTATION

The flowchart shows how the speech recognition function used in this project works. With this as a base, the front end of the system is developed both as a web application and as a mobile application. The audio is first acquired from the device's microphone, then converted step by step into editable text and displayed to the user.
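The flow described above, from captured audio to editable text, can be sketched as a recogniser feeding a notepad buffer. This is a minimal Python illustration, not the authors' web or Android code: `Notepad`, `dictate`, and `fake_recognizer` are hypothetical names, and the recogniser is a stand-in for a real speech engine.

```python
class Notepad:
    """Holds transcribed text so the user can view and edit it."""

    def __init__(self):
        self.lines = []

    def append(self, text):
        # Skip empty results, for example when nothing was recognised.
        text = text.strip()
        if text:
            self.lines.append(text)

    def text(self):
        """Return the whole note as one editable string."""
        return "\n".join(self.lines)


def dictate(audio_chunks, recognizer, notepad):
    """Pass each captured chunk through the recogniser and store the result."""
    for chunk in audio_chunks:
        notepad.append(recognizer(chunk))
    return notepad


def fake_recognizer(chunk):
    # Stand-in for a real engine: each "chunk" already carries its transcript.
    return chunk["transcript"]


pad = dictate(
    [{"transcript": "hello how are you"}, {"transcript": "  "}],
    fake_recognizer,
    Notepad(),
)
print(pad.text())
```

Passing the recogniser in as a function keeps the notepad logic independent of the speech engine, so the same flow could sit behind either a web or a mobile front end.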

V. OVERVIEW OF TECHNOLOGIES

A. Hardware Technologies

Microphone: Microphones are used in the audio recording phase of the process. They capture the user's audio or speech.
System: The physical processing system serves as the main machine for this application and applies the different filtering algorithms. In this project, both a computer and a smartphone are used to run the application.

B. Software Technologies

1. In Computer

a. Speech Recognition Software: This program's speech recognition features enable it to extract the required audio sample from raw audio input.

b. HTML: HTML is an abbreviation for Hyper Text Markup Language. HTML is the industry standard markup language for developing Web pages. The structure of a Web page is described in HTML. HTML is made up of a number of elements. HTML elements instruct the browser on how to render the material.

c. CSS: CSS is an abbreviation for Cascading Style Sheets.

CSS specifies how HTML elements should appear on screen, on paper, or in other media.

CSS saves a significant amount of time. It has the ability to control the layout of numerous web pages at the same time.

External stylesheets are stored in CSS files.

d. JS: JavaScript is the programming language of the Web.

JavaScript has the ability to update and modify both HTML and CSS.

JavaScript has the ability to calculate, modify, and validate data.

2. In Smartphone App Development

a. Java: Java lets us create the classes, functions, and UI/UX based on file templates. With the help of the Java Development Kit, it is used to write and maintain the code of the Android application.

b. XML: XML (Extensible Markup Language) is used to describe data, in contrast to HTML, which displays data. It is stored in text files, is very adaptable, and is used for a variety of things, such as designing the interface of the Android app.

c. Gradle: Android Studio includes Gradle, a modern build toolkit that handles the build and execution of the Android application according to the required settings.

d. Android Studio: Android Studio is the Integrated Development Environment (IDE) used for the development process, which makes it the most important part of developing an Android application. It serves as the editor for the Android code.

VI. RESULTS

We examined the note-taking procedure to understand the issue better and measured how long it took to take notes while collecting textual data.

After identifying the problems, we addressed them by greatly decreasing the data collection time, as shown in the results table below.

The time was shortened by a wide margin, and the saving grows as the input gets longer. The accuracy is above 90%.

S. No.	Input text	Original method duration	Software duration
1.	hello how are you	6 s	3 s
2.	The leather jacked showed the scars of being his favorite for years. It wore those scars with pride, feeling that they enhanced his presence rather than diminishing it. The scars gave it character and had not overwhelmed to the point that it had become ratty. The jacket was in its prime and it knew it.	1 min 20 s	24 s
3.	He scolded himself for being so tentative. He knew he shouldn't be so cautious	22 s	7 s
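The durations in the results table can be turned into a percentage time saving with a short calculation. The Python sketch below uses only the figures from the table; the helper names `to_seconds` and `percent_saved` are hypothetical.

```python
def to_seconds(minutes=0, seconds=0):
    """Convert a duration given in minutes and seconds to seconds."""
    return minutes * 60 + seconds


def percent_saved(original_s, software_s):
    """Percentage of time saved relative to the original method."""
    return round(100 * (original_s - software_s) / original_s, 1)


# (row, original method duration, software duration), taken from the table.
rows = [
    (1, to_seconds(seconds=6), to_seconds(seconds=3)),
    (2, to_seconds(minutes=1, seconds=20), to_seconds(seconds=24)),
    (3, to_seconds(seconds=22), to_seconds(seconds=7)),
]

for number, original, software in rows:
    print(f"Row {number}: {percent_saved(original, software)}% time saved")
```

For the three rows this gives 50.0%, 70.0%, and 68.2%, with the largest saving on the longest input.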

Conclusion

This study was motivated by the issues and difficulties of data entry in notepads. Its major objective was to create an Android application that acts as an automatic voice assistant for managing notes. The complete method we proposed addresses the difficulties faced during the textual data collection procedure. The advantages achieved by the completed system are: 1) Data collection has been made virtual, which lessens the need to maintain physical notebooks or records. 2) The note-taking procedure is faster, which shortens the time required. 3) Thorough documentation of note data. 4) A method for simple information backup and exchange. 5) Real-time information sharing with the user. 6) Easier examination of the recorded data.

References

[1] Nikhil Jain, Manya Goyal, Agravi Gupta, Vivek Kumar, Speech to text conversion for using sentiment analysis (v-3, June 2021)
[2] Android Studio software development kit, Tutorialspoint
[3] Voice Recognition System, ResearchGate (Pranab Das, Nov 2015)
[4] JavaScript Languages, Speech recognition, GeeksforGeeks
[5] Automatic Speech Recognition Survey (Dr. Arbana Kadriu, 2020)
[6] HTML, CSS, JS basics from W3Schools

Copyright

Copyright © 2023 Shyam Nayan Tirupathi, Bhavani M, Samba Siva Naidu Etamsetti, Shreya Anumukonda. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET50278

Publish Date : 2023-04-10

ISSN : 2321-9653

Publisher Name : IJRASET
