Virtual Assistant

Authors: Atish Patil, Madhuri Kardule, Praveen Gupta

DOI Link: https://doi.org/10.22214/ijraset.2023.54064

Abstract

Python is a widely used programming language, and writing a script for a voice assistant in Python is straightforward. How the assistant responds to requests is entirely under the developer's control. Speech recognition converts spoken words into text, and this approach underlies voice-activated assistants such as Alexa, Siri, and Cortana. Python's SpeechRecognition library makes it easy to convert speech into text. Building a personal assistant was an exciting challenge. With the assistant described here, a single voice command can launch a preferred IDE, send an email, perform a web search, or play music without opening a browser. The project's capabilities also include reading PDF files aloud and sending WhatsApp messages. Current technology can perform many such tasks at least as well as a person can. This raises a fundamental question: in what sense is the assistant an AI? The virtual assistant we developed is not artificial intelligence in the full sense; it is built from a series of conditional statements. The main objective of artificial intelligence is to carry out tasks as effectively and efficiently as people do, and by that measure our virtual assistant is a modest example of AI, but an example nonetheless.

I. INTRODUCTION

Artificial intelligence, applied to machines, gives them the ability to imitate human thinking. In this sense, computer systems are designed to carry out tasks that would usually require human involvement. Python is a popular language with a large ecosystem of libraries, so it is easy to write a voice assistant in it, and the assistant's instructions can be tailored to the user's needs. Python's SpeechRecognition library converts speech to text. Building the assistant was an enjoyable exercise. With it, everyday tasks become easier: sending an email without typing a word, searching Google without opening a browser, playing music, or opening a favourite IDE with a voice command.

Technological advances have made it possible for machines to perform many tasks as efficiently as humans, or even more so. Through this project, we found that artificial intelligence can reduce human effort and save time in many fields. Because voice assistants use artificial intelligence, the results they provide can be accurate and efficient. They remove much of the need for typing and behave like another person to whom we talk and give tasks. In the tasks they support, such assistants can match human assistants and often work faster. The libraries and packages used to create this assistant were chosen with time complexity and time saving in mind.

A virtual assistant is usually a cloud-based application that requires an internet-connected device to function. The technology behind virtual assistants draws on extensive knowledge not only of the platform but also of machine learning, natural language processing, and speech recognition. A virtual assistant is a software program that helps with everyday tasks, such as checking the weather forecast, creating reminders, or making a shopping list.

Virtual assistants can take commands through text input or by voice. Voice-based intelligent assistants need an invoking word, or wake word, to activate the assistant, followed by the command. Many voice assistants are available today, such as Apple's Siri, Amazon's Alexa, and Microsoft's Cortana. For this project, the wake word chosen is "Hello Mark".
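The wake-word step described above can be sketched as follows. This is a minimal illustration that assumes the speech has already been converted to text; the function name extract_command and the punctuation handling are hypothetical and are not taken from the project's code.

```python
WAKE_WORD = "hello mark"  # wake word named in the text


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the command following the wake word, or None if it is absent."""
    text = transcript.strip().lower()
    if not text.startswith(wake_word):
        return None
    rest = text[len(wake_word):]
    # Reject longer words that merely begin with the wake word ("hello markus").
    if rest and rest[0].isalnum():
        return None
    # Drop separators such as the comma in "Hello Mark, open Google".
    return rest.strip(" ,.!?")
```

A transcript that does not begin with the wake word returns None, so the caller can simply keep listening.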

A. Objectives

The main objective behind creating this self-help software (virtual assistant) is to use semantic information available on the web to create content and provide information to users.
The purpose of this virtual assistant is to answer the questions that the user may have; thus, this technology can be used in a business environment, for example, on a business website with a chat interface.
It includes call-to-action programmes, which ask for and then respond to feedback. This project aims to provide Windows users with a virtual assistant that helps with daily tasks such as web browsing, playing music, and many other things.
The long-term goal of this project is to create a complete service provider that manages all server management processes (deployment, backup, autoscaling, logging, monitoring, and processing) and is smart enough to replace the average worker.

B. Purpose

The purpose of virtual assistants is to interact by voice, play music, make to-do lists, play audiobooks, and report news such as weather, work, and sports updates, while providing real-time information. Virtual assistants allow users to control devices and applications with voice commands. Millennial consumers in particular are showing increased awareness of, and comfort with, this technology. In an ever-evolving digital world that is constantly optimised for speed, efficiency, and convenience, users are moving towards less screen interaction.

II. LITERATURE REVIEW

In today's world, machines are trained to think like humans and perform tasks on their own, taking over work that humans would otherwise do. From this situation arose the concept of voice assistants, which carry out various tasks in response to the human voice. A virtual assistant can filter the voice commands given by the user and return relevant information. People around the world are transforming their digital experiences with emerging technologies such as virtual reality, augmented reality, and voice interaction. Voice assistants are a new step in human-machine interaction, in which the analog audio signal of the user's voice is converted into digital form for processing. Over the past few years, smartphone usage has grown significantly, leading to the widespread use of voice assistants such as Apple's Siri, Google Assistant, Microsoft's Cortana, and Amazon's Alexa. Voice assistants are built using technologies such as speech recognition, text-to-speech, and natural language processing (NLP), which support a wide range of applications that make users' lives easier and more convenient.

Voice assistants provide many services to satisfy their users, such as:

Answer questions from users.
Play music from streaming music services and YouTube videos.
Set alarms.
Send WhatsApp messages.
Send emails.
Tell the weather forecast.
Control other smart appliances.

The capabilities of voice assistants are expanding according to the needs of users.

Deepak Shende and Ria Umabiya (2018) discuss the assistants from Microsoft and Google together with an assistant named "AIVA", which aims to be a voice assistant that can do many things, such as searching the Internet. It has additional features such as commenting on social media platforms such as Facebook and Twitter. With just a few simple commands, users can get information about the weather in their area.

Tulshan explains that a user's fingers can be injured by constant typing. To avoid such problems, we need a system that allows us to do everything with voice commands. The system recognizes the speech; the recognized words are collected, clarified if necessary, and printed on the screen. Each recognized word is then matched against specific keywords, and if a match is found, the corresponding program is executed.

Dr. Kshama V. Kulhalli presents research on voice assistants such as Google Assistant, Apple's Siri, and Microsoft's Cortana. This research concluded that Google Assistant's answers are more accurate than the others' because it copes better with variations in the speaker's voice.

III. METHODOLOGY

Speech Recognition: This system uses Google's speech recognition service to convert speech input into text. The audio captured from the microphone is held temporarily in a variable and sent to the Google Cloud Speech API for recognition. The equivalent text is then returned and passed to the system.
Python Backend: The Python backend analyzes the output of the speech recognition module to determine whether the command requires an API call, content extraction, or a system call. The result of that operation is then passed back to the Python backend, which gives the user the desired output.
API Calls: API stands for Application Programming Interface. An API is a software interface that allows two programs to communicate with each other. In other words, an API acts as a messenger between client and server.
Content Extraction: Content extraction is the process of extracting information from machine-readable data and converting it into a structured format. These tasks often require natural language processing (NLP) to handle human-language text. Recent work in multimedia processing applies the same idea to automatic annotation and content extraction from images, audio, and video.
System Call: A system call is the mechanism by which a program requests a service from the kernel of the operating system on which it runs. Examples include hardware-related services (for example, accessing a hard drive), creating and executing new processes, and communicating with kernel services such as process scheduling. System calls act as the interface between processes and the operating system.
Text-to-Speech: Text-to-speech (TTS) is the computer's ability to read text aloud. A TTS engine converts the text into a phonetic representation, which it then converts into waveforms that can be played as sound. Third-party providers offer text-to-speech engines in a variety of languages and vocabularies.
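The backend's routing decision described above can be illustrated with a small sketch. The paper does not give the actual rules, so the keyword table, the Route enumeration, and the classify function below are all hypothetical; the sketch only shows one plausible way to map recognized text onto the three kinds of operation.

```python
from enum import Enum


class Route(Enum):
    API_CALL = "api_call"
    CONTENT_EXTRACTION = "content_extraction"
    SYSTEM_CALL = "system_call"
    UNKNOWN = "unknown"


# Hypothetical keyword table: the paper does not list the real routing rules.
ROUTE_KEYWORDS = {
    Route.API_CALL: ("weather", "news", "wikipedia"),
    Route.CONTENT_EXTRACTION: ("read pdf", "summarise", "extract"),
    Route.SYSTEM_CALL: ("open", "close", "screenshot"),
}


def classify(text):
    """Pick the first route whose keywords appear in the recognized text."""
    lowered = text.lower()
    for route, keywords in ROUTE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return route
    return Route.UNKNOWN
```

Simple substring matching keeps the example short; a real backend would likely need stricter matching to avoid false positives.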

A. Existing System

Many voice assistants already exist, such as Alexa, Siri, Google Assistant, and Cortana, which rely on language processing and voice recognition. They listen to the command given by the user and perform the requested function efficiently and effectively. Because these voice assistants use artificial intelligence, the results they provide are generally accurate. They reduce human effort and save time on many tasks; they remove much of the need for typing and behave like another individual to whom we talk and give tasks. For the tasks they support, these assistants compare well with human assistants and can be more effective and efficient. The algorithms behind them focus on reducing time complexity. However, using these assistants requires an account (such as a Google account for Google Assistant or a Microsoft account for Cortana) and an internet connection, because they work only with internet connectivity. They are integrated with many devices, such as phones, laptops, and speakers.

B. Proposed System

Developing this assistant was an interesting task. The virtual assistant makes it possible to send an email without typing a word, search Google without opening a browser, play music, or open a favourite application with a voice command, and many other tasks can be done just as easily. It differs from other virtual assistants because it is specific to desktops, does not require an account, and does not require an Internet connection to receive instructions for certain tasks. The IDE used for this project is Visual Studio Code. All Python files are written in Visual Studio Code, and all required packages can be easily installed from it. We created a live GUI for interacting with the virtual assistant, which gives conversations an engaging look and design. Advances in technology allow virtual assistants to do many tasks as efficiently as we do, or even more so. Through this project, we found that artificial intelligence can reduce human effort and save time in many fields. The features of this project are:

It can send emails.
It can read PDF files.
It can send text messages on WhatsApp.
It can open and close applications.
It can play music.
It can do Wikipedia searches for you.
It can open any website.
It can give the weather forecast.
It can read the news.
It can record the screen.
It can check your internet speed.
It can perform arithmetic calculations.
It can capture screenshots.
It can hold some random conversation, etc.

IV. SYSTEM DESIGN

A. Data Flow

The data flow of the virtual assistant is as follows:

Initially, the system is in idle mode. As soon as it receives a command, it begins to execute. The received command is recognised as either a question or a task to be performed, and the appropriate action is taken. Once the question has been answered or the task performed, the system waits for another command. This cycle repeats until it receives a quit command.
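The cycle above can be sketched as a simple loop. This is an illustration under stated assumptions: commands arrive as already-recognized text, the question test is a naive heuristic, and the names kind_of and run_assistant are hypothetical rather than taken from the project.

```python
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how")


def kind_of(command):
    """Label a non-empty command as a question or a task (naive heuristic)."""
    first_word = command.split()[0]
    if first_word in QUESTION_WORDS or command.endswith("?"):
        return "question"
    return "task"


def run_assistant(commands):
    """Handle commands one by one until a quit command is received."""
    handled = []
    for raw in commands:  # the system idles until the next command arrives
        command = raw.strip().lower()
        if not command:
            continue  # nothing was heard; keep waiting
        if command == "quit":
            break  # leave the cycle
        handled.append((kind_of(command), command))
    return handled
```

Anything that follows the quit command is never processed, which mirrors the exit condition of the data flow.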

D. Sequence Diagram

The end user sends a command to the voice assistant in audio form. The command is passed to the system's interpreter, which identifies what the end user has asked for and directs it to the task execution function. If the command is incorrect or missing some information, the voice assistant asks the end user for it. The information received is then passed back to the task, which is completed. After execution, feedback is returned to the end user.

This sequence diagram describes the sequence of interactions that happen in the virtual assistant.
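The clarification step in this sequence, where the assistant asks for missing information before completing a task, might look like the sketch below. The REQUIRED_FIELDS table, the execute function, and the ask callback are hypothetical names introduced for illustration; in the real assistant the question would be spoken and the answer captured from the microphone.

```python
# Hypothetical table of the details each task needs before it can run.
REQUIRED_FIELDS = {
    "send email": ("recipient", "message"),
    "send whatsapp": ("contact", "message"),
}


def execute(task, details, ask):
    """Fill in missing details by asking the user, then report back."""
    filled = dict(details)  # copy so the caller's dictionary is not modified
    for field in REQUIRED_FIELDS.get(task, ()):
        if not filled.get(field):
            # The assistant asks the end user for the missing information.
            filled[field] = ask(f"Please tell me the {field}.")
    # Feedback is returned to the end user after execution.
    return {"task": task, "details": filled, "feedback": f"Completed: {task}"}
```

Passing ask as a parameter keeps the sketch independent of any particular speech library and makes it easy to exercise with a stand-in function.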

V. SOFTWARE DETAILS

The IDE used for this project is Visual Studio Code. All Python files are written in Visual Studio Code, and all required packages can be easily installed from it. Modules and libraries such as pyttsx3, SpeechRecognition, datetime, wikipedia, keyboard, pywhatkit, pyjokes, PyPDF2, pyautogui, and PyQt are used in this project. A live GUI is created for interacting with the virtual assistant, as it gives the conversation a unique and interesting look.

A. Visual Studio Code

Visual Studio Code is a source-code editor that, together with its Python extension, works as an integrated development environment (IDE). In that configuration it provides an integrated Python debugger, code completion, refactoring, and code and project navigation, and it can be used with scientific libraries (such as Matplotlib, NumPy, and SciPy) and web frameworks (such as Django, web2py, and Flask).

VI. IMPLEMENTATION WORK DETAILS

Virtual Assistant is a desktop voice assistant that can perform many daily tasks on the desktop, such as playing music or opening a favourite IDE, with a single voice command. It differs from traditional voice assistants in that it is specific to desktops, the user does not need to create an account, and it does not require an internet connection to receive instructions for a specific task.

A. Real Life Application

Saves Time: Virtual Assistant is a desktop voice assistant that acts on the voice commands given to it, can perform voice searches, and lets the user complete a set of tasks without typing.
Conversational Interaction: It completes tasks automatically using the relevant Python modules and libraries, in a conversational way. Instructing it therefore feels like giving a task to a human assistant, because the exchange runs from spoken input to the desired output in the form of a completed task.
Reactive Nature: The desktop assistant is reactive: it interprets the user's spoken English, takes the context provided by the user into account, and responds in human-understandable English. The user therefore experiences its responses as informed and smart.
Multitasking: Its main strength is its ability to handle many tasks. It keeps asking for instructions, one after another, until the user quits it.
No Trigger Phrase: It asks for instructions and listens to the user's response without needing a trigger phrase, and only then executes the task.

B. Data Implementation

As the first step, we install all the necessary libraries and packages using the "pip install" command and then import them. The packages used are as follows:

pyttsx3: This is a Python library used for converting text to speech.
SpeechRecognition: This is a Python library that converts speech to text.
pywhatkit: This is a Python library for sending WhatsApp messages with some additional features.
datetime: This standard-library module provides the current date and time.
wikipedia: This is a Python module for searching Wikipedia.
PyPDF2: This is a Python module that can read, split, and merge PDF files.
pyjokes: This is a Python library that returns one-line jokes for programmers.
webbrowser: This standard-library module opens a given URL in the default browser.
pyautogui: This is a cross-platform GUI automation Python module that is used to programmatically control the mouse and keyboard.
os: The os module in Python provides functionality for interacting with the operating system.
sys: This module gives access to variables and functions associated with the interpreter, allowing some control over it.
requests: This library is used to send HTTP requests to a specified URL.
get: This function of the requests library retrieves data from a given URL.
sleep: This function of the time module pauses the execution of a program for a given number of seconds.
keyboard: This is a Python library that gives control over the keyboard.
Wolfram Alpha: This is an API that computes expert-level answers using Wolfram's algorithms, knowledge base, and artificial intelligence.
json: This standard-library module is used for reading and processing JSON data.
googletrans: This is a Python library that provides methods such as detect and translate.
gTTS: This is a Python library that converts text to audio that can be saved as MP3 files.
playsound: This is a cross-platform module that can play audio files.
pywikihow: This is a Python library that is used to search WikiHow.
openai: This is a Python library that provides convenient access to the OpenAI API for applications written in Python.
tkfilebrowser: This is a replacement for tkinter.filedialog that allows the user to select files or directories.
cv2: This is the Python module of the OpenCV library, designed to solve computer vision problems.
win32api: This module, part of the pywin32 package, provides access to many Windows APIs from Python.
numpy: This is a Python library that is used for working with arrays and numeric data.
PyQt5: PyQt5 is a set of Python bindings for the Qt application framework. It provides many GUI widgets and includes modules such as QtWidgets, QtCore, and QtGui.
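Because the assistant depends on many third-party packages, a quick check of which ones are installed can be useful before the first run. The sketch below is not part of the project; the function name missing_packages is hypothetical, and the mapping pairs the name given to "pip install" with the module name used in the import statement, which differ for a few of the packages listed above.

```python
from importlib.util import find_spec

# pip distribution name -> module name used in "import ..."
PACKAGES = {
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
    "pywhatkit": "pywhatkit",
    "wikipedia": "wikipedia",
    "PyPDF2": "PyPDF2",
    "pyjokes": "pyjokes",
    "PyAutoGUI": "pyautogui",
    "requests": "requests",
    "gTTS": "gtts",
    "opencv-python": "cv2",
    "pywin32": "win32api",
    "numpy": "numpy",
    "PyQt5": "PyQt5",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    )
```

Calling missing_packages() returns a list of names that can be passed to "pip install"; an empty list means every listed module was found.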

C. Functions

takeCommand(): This function takes the user's command as input through the microphone and returns it as a string.
wishMe(): This function greets the user according to the time of day, with "Good Morning", "Good Afternoon", or "Good Evening".
taskExecution(): This function contains all the task execution definitions, such as sendEmail(), pdf_reader(), and news(), together with the conditions of an if-elif-else ladder, such as "open Google", "open Notepad", "search on Wikipedia", "play music", and "open command prompt".
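A minimal sketch of the greeting and the if-elif-else ladder is given below. It is not the authors' code: the hour boundaries, the returned action tuples, and the snake_case names are assumptions, and the ladder returns a description of the action instead of performing it so that the example has no side effects.

```python
import datetime


def greeting_for(hour):
    """Choose a greeting for an hour in the range 0-23 (boundaries assumed)."""
    if 0 <= hour < 12:
        return "Good Morning"
    if 12 <= hour < 18:
        return "Good Afternoon"
    return "Good Evening"


def wish_me(now=None):
    """Greet the user according to the current time."""
    if now is None:
        now = datetime.datetime.now()
    return greeting_for(now.hour)


def task_execution(query):
    """Map a recognized query to an action using an if-elif-else ladder."""
    query = query.lower()
    if "open google" in query:
        return ("open_url", "https://www.google.com")
    elif "open notepad" in query:
        return ("launch", "notepad.exe")
    elif "open command prompt" in query:
        return ("launch", "cmd.exe")
    elif "wikipedia" in query:
        topic = query.replace("search on wikipedia", "").replace("wikipedia", "")
        return ("wikipedia", topic.strip())
    elif "play music" in query:
        return ("play_music", None)
    else:
        return ("unknown", query)
```

In the assistant itself, each branch would call the relevant library (for example webbrowser for URLs) and speak the result through the text-to-speech engine.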

VII. RESULT

This section briefly describes the results of our project. We chose Python as the programming language for the project, mainly because of its robust standard libraries. Our focus is on the activities performed by voice assistants.

Following are some screenshots of the output that our virtual assistant gives on executing the following commands:

Open Google
Play Song
Open Command Prompt

VIII. FUTURE SCOPE

The virtual assistant can be made to learn more on its own and develop new skills.
A virtual assistant Android application can also be developed.
Increase the number of voice terminals available.
Voice commands can be encrypted to maintain security.

The following are some of the areas in which virtual assistants may be implemented in the future:

a. Organisational Inquiry Desk: The system may be utilised in different organisations for simple access to information about the organisation using voice commands.

b. Embedded Systems: In embedded systems, voice commands may be used to handle multiple activities using speech recognition technology. This promotes the automation of labour and can thus be very advantageous in industrial process automation.

c. Application for People with Disabilities: People with disabilities may also benefit from voice recognition software. It is particularly beneficial for those who are unable to use their hands.

Conclusion

This report has covered a Python-based personal virtual assistant for Windows. Virtual assistants make people's lives simpler: they give access to services with a single voice command. This virtual assistant is written in Python for Windows desktops and is similar in purpose to Alexa, Cortana, Siri, and Google Assistant, which are available on smartphones. The project draws on artificial intelligence, and virtual personal assistants are an excellent way to keep track of a calendar because of their portability, accuracy, and availability at any moment. For such routine tasks, virtual personal assistants can be more dependable than human personal assistants. Over time, a virtual assistant can get to know its user better, offer suggestions, and follow instructions, and this technology is likely to remain part of daily life.

Immersive technology can also enhance education, and voice assistants may help students study in new and innovative ways. This article has surveyed the use of AI voice assistants in day-to-day life. Relatively little research has been done on voice assistants so far, but that is changing, and further discoveries can build on this work. Audio devices such as smart speakers and virtual assistants are expected to grow in importance over the next few years, although how they will succeed in the classroom remains unclear. Not all voice assistants are bilingual, which can be a problem, as can the lack of sufficient security and protection filters. The use of these devices in the classroom can succeed only if instructors are given proper training and incentives. The system leaves adequate scope for future modification if necessary.

Copyright

Copyright © 2023 Atish Patil, Madhuri Kardule, Praveen Gupta. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET54064

Publish Date : 2023-06-14

ISSN : 2321-9653

Publisher Name : IJRASET
