WorkBuddy- A Personal Desktop Assistant

Authors: P. Pooja, Mehak Jain, K. Sahithya, K. Sravani, U. Jayasri

DOI Link: https://doi.org/10.22214/ijraset.2023.54225

Abstract

Artificial intelligence (AI) is a core discipline for building intelligent machines, particularly computer programs. It is related to the well-known task of using computers to understand human intelligence, and it is roughly defined as the study of computations that enable perception, reasoning, and action. This paper describes a Python-based personal assistant for Windows-based platforms. It is a software program that performs tasks or answers questions based on the user's instructions. It understands voice commands and carries out tasks through a command-line interface. A user can get product reports, track products for price drops, get weather and news updates, check trending topics on Twitter, send email messages, have unread email messages read aloud, download images, browse the Internet, and open and close applications, all without using a keyboard. People have become increasingly dependent on computers as technology has advanced. Users have moved from text input to speech input because they want their assistants to be smarter and to personalize their results. As a result, personal assistants are in greater demand. Because the assistant takes over routine, repetitive tasks, the user saves time and effort and becomes more productive.

I. INTRODUCTION

AI has taken on a major role in our lives, and recent years have seen many innovations and technological developments. AI has changed our personal and professional lifestyles. AI assistants increase our productivity by taking over repetitive jobs and reducing the time we spend on them. They also improve accessibility by making technology more usable for people with disabilities. For example, people who are visually impaired can use voice commands to access information and control their devices more easily. We can generate personalized taglines and notifications as needed, which adds personalization to our workplace. We keep the convenience of doing tasks either manually or with the help of an AI assistant. AI assistants can also help us exchange text messages or make calls through voice commands when we are occupied, which makes communication easier.

Using virtual assistants to control our environment has recently become commonplace. We use Google Assistant, Siri, Alexa, Cortana, and a variety of other virtual assistants to do things for us with a simple voice command. We can instruct them to play music, open a specific file, or carry out any comparable operation, and they do so with ease. While these products are impressive, it is also interesting to create our own AI voice assistant that can operate a desktop by voice. Such an assistant can talk with the user, open videos, play music, and do a variety of other things.

In this paper, we work on an introductory project: an AI assistant that lets the user operate a computer or a comparable device by voice. Recognizing speech and voices is a simple, basic task for humans. We can sense and respond to most human emotions by listening to tone of voice and figures of speech. Machines, however, do not yet have a thorough comprehension of the emotions behind speech. Although we have yet to build machines that fully grasp human feeling, a number of technologies that can detect and interpret speech have been developed successfully. As currently programmed, the assistant recognizes speech, maps the recognized text to a command, and performs the corresponding task.

Many voice assistants are available in the market, such as Google Assistant for Google and Android devices, Siri for Apple devices, and Cortana for Windows, but they have some drawbacks. They require a linked account even to perform daily tasks on the system, and they need high-speed internet connectivity. The account is also linked to every device that has access to it, such as mobile phones, tablets, laptops, and speakers, which raises the risk of data leaks. In some cases they are slow to respond because they depend on a fast internet connection.

Python is a versatile programming language with an extensive library ecosystem. Its simple, intuitive syntax makes it easy to learn and use, which has contributed to its popularity among beginners and experts alike. This ease of use, along with its powerful libraries and modules, makes it a popular choice. We start with an overview of the essential prerequisites for building this project before putting it all together in a Python file that allows the AI voice assistant to respond to our requests.

The instructions given to the assistant are natural human speech, which is converted into a machine-readable format; the output is returned in human language. The assistant can automate regular tasks and perform many daily functions, such as getting current news, getting weather updates, opening and closing applications, playing music, reading a PDF, creating a note, setting notifiers, answering Wikipedia-related queries, taking a screenshot, sending an email, reading unread email messages, sending a WhatsApp message, searching for a product report, adding a product to track, downloading images, checking the battery, running a speed test, checking the IP address, sleeping for some time, telling today's date and time, telling a joke, shutting down or restarting the system, and emptying the recycle bin.

II. LITERATURE REVIEW

A. Existing System

Many voice assistants are already available to perform tasks through voice commands on mobile phones, tablets, and desktops.

Alexa [1] is a virtual assistant that responds to voice commands. She can play music, run your smart home, answer questions, and link you to your favorite services in order to keep you organized, informed, safe, connected, and delighted. She's also your personal shopper because she's an Amazon product. Alexa, which is hosted in the cloud, is accessible via an increasing number of smart speakers and other Alexa-enabled devices.

Google Assistant [2] is a Google-developed virtual assistant software programme that is largely available on mobile and home automation devices. Google Assistant, which is powered by artificial intelligence, can hold two-way conversations, unlike the company's previous virtual assistant, Google Now.

Siri [3] is a virtual assistant that is included in Apple Inc.'s operating systems iOS, iPadOS, watchOS, macOS, tvOS, and audioOS. It answers questions, makes suggestions, and performs activities by delegating requests to a collection of Internet services using voice inquiries, gesture-based control, focus-tracking, and a natural-language user interface. It adjusts to users' specific language usages, searches, and preferences over time, giving personalized results.

Some of their limitations are:

1. Alexa

Privacy concerns: Because Alexa is constantly listening for the "wake word," she may inadvertently capture conversations or noises that you did not want to share.

Alexa [4] may not be compatible with all devices and services, and some functions may be accessible only in specific regions.

Inconsistent performance: Several users have expressed frustration with Alexa's speech recognition and response accuracy.

2. Google Assistant

Privacy concerns: Google Assistant, like Alexa, is constantly listening for the "Hey Google" wake word, and some users may be concerned about Google gathering data on their chats.

Inadequate device control: Google Assistant may not be able to manage all of your home's gadgets, and certain functionalities may not be accessible on all platforms.

While Google Assistant [4] is compatible with many major services and platforms, it may not be compatible with all of the apps and devices you use.

3. Siri

Siri [4] has limited device compatibility, which may restrict its utility for people who prefer alternative platforms.

Siri has fewer customization possibilities than Alexa or Google Assistant, and it may be unable to do some tasks or integrations.

Several customers have complained about Siri's speech recognition and response accuracy, which may be aggravating.

However, the major existing system for desktop-based assistants is Windows Cortana, which controls the laptop through voice commands and can also perform OS-related tasks. To overcome issues such as privacy concerns, limited customizability, and browser tracking of our commands, we have created a personal desktop assistant in Python that does not track our history and collects information from widely available open data on the internet.

The table below compares Windows Cortana [5] with a desktop assistant in Python.

Table 1: Comparison of Windows Cortana and Desktop assistant using python

FEATURE	WINDOWS CORTANA	DESKTOP ASSISTANT IN PYTHON
Built into OS	Yes	No (additional software is needed)
Availability	Windows only	Can be installed on any OS
Prerequisite	Microsoft account needed	No account needed
Device compatibility	Limited	Broad (Python is easy to install)
Integration	Highly integrated with Microsoft services such as Outlook, OneDrive, and Bing	Not integrated with any account
Feature set	Limited	Extensible
User base	Large	Small
Open source	No	Yes
Customizability	No	Yes
Flexibility	No	Yes
Technical expertise required	No	Yes
Privacy concerns	High risk, as data is shared	Lower risk

B. Proposed System

The existing systems have several major drawbacks. The same command did not always produce the same result. They could not offer a feeling of personalization in their responses to the user. There were many security issues, as the user's search data was shared online. There were prerequisites for using them. Accounts were tightly integrated with different services.

In our proposed system, very little of the user's command data is shared, and information is drawn largely from publicly available data on the internet. Product tracking and web searches can be fully anonymous. For confidential actions such as sending email messages or reading unread email messages, face authentication is required, and an alert SMS is sent to the user's registered mobile number.

Contact details of family, friends, and colleagues are stored in the database: pet name, full name, email ID, mobile number, and birth date. Users need not remember a contact's full name, email ID, or mobile number to send a message; the assistant maps the details whenever the contact is referred to by pet name or full name.

Using web scraping, we can obtain a product comparison report from the Amazon and Flipkart websites for a given product search. We can compare prices across the options and then either order the product or add its link to the tracking list to watch for price drops.

A list of items, together with their links, is saved in the database to track price drops. If a product's price falls below its set price, an email alert containing the product link is sent to the user's email ID. This check runs once every hour. Products are simple to add to and remove from the tracking list.
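The price-drop check described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the `TrackedProduct` fields, the example links, and the rule that an alert fires when the current price is at or below the set price are hypothetical choices, not details taken from the WorkBuddy code.

```python
from dataclasses import dataclass


@dataclass
class TrackedProduct:
    name: str
    link: str
    target_price: float  # assumed rule: alert when price is at or below this


def find_price_drops(tracked, current_prices):
    """Return (product, price) pairs whose latest price has reached the target.

    current_prices maps a product link to its latest scraped price.
    Links with no known price are skipped.
    """
    drops = []
    for product in tracked:
        price = current_prices.get(product.link)
        if price is not None and price <= product.target_price:
            drops.append((product, price))
    return drops


# Hypothetical tracking list and latest prices.
tracked = [
    TrackedProduct("Headphones", "https://example.com/p/1", 1500.0),
    TrackedProduct("Keyboard", "https://example.com/p/2", 900.0),
]
latest = {"https://example.com/p/1": 1399.0, "https://example.com/p/2": 950.0}
alerts = find_price_drops(tracked, latest)
```

Each entry in `alerts` would then be turned into an email to the user; sending the email is left out of the sketch.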

It can also get news updates, weather updates, trending topics on Twitter, and Wikipedia results from publicly available data on the internet, none of which requires a linked account.

It can also perform OS functions such as playing music, opening and closing applications, reading a PDF, taking a screenshot, checking the battery percentage, running a speed test, checking the IP address, sleeping for some time, telling today's date and time, telling a joke, turning the volume up or down, muting the volume, shutting down and restarting the system, and emptying the recycle bin.

1. Speech Recognition

Using the speech recognition library in Python, input is taken through the microphone and converted into text. Speech recognition draws on computer science and linguistics to convert spoken words to text, allowing computers to understand human language. Voice assistants such as Siri, Cortana, and Google Assistant all use speech recognition to answer queries in conversation.
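A minimal sketch of capturing one spoken command might look like the following. It assumes the third-party SpeechRecognition package (imported as `speech_recognition`) with PyAudio for microphone access, and it assumes Google's web recognizer as the backend; the paper does not say which recognizer WorkBuddy uses. The helper names are hypothetical.

```python
def normalize_command(text):
    """Lower-case the text and collapse whitespace to simplify later matching."""
    return " ".join(text.lower().split())


def listen_once():
    """Capture one utterance from the default microphone and return it as text.

    Returns None when the audio cannot be understood or the service fails.
    """
    # Imported lazily so the helper above works without the package installed.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio and a working microphone
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        # Assumption: Google's web recognizer; needs an internet connection.
        return normalize_command(recognizer.recognize_google(audio))
    except (sr.UnknownValueError, sr.RequestError):
        return None
```

Calling `listen_once()` in a loop gives the main program a stream of normalized text commands to act on.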

2. API Calls

An API acts as a messenger that carries a request to the provider and returns the reply. To get news and weather updates, we make API calls to websites that we trust and follow regularly.
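As a rough illustration of such a call, the sketch below builds a request URL, fetches JSON, and turns it into a sentence the assistant could speak. The endpoint, query parameters, and response fields (`city`, `temp_c`, `condition`) are all hypothetical, since the paper does not name the services it calls.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/weather"  # hypothetical endpoint


def build_weather_url(city, api_key):
    """Build the request URL with properly encoded query parameters."""
    return f"{BASE_URL}?{urlencode({'q': city, 'key': api_key})}"


def summarize_weather(payload):
    """Turn a decoded JSON payload into a sentence (field names are assumed)."""
    return (
        f"It is {payload['temp_c']} degrees Celsius "
        f"with {payload['condition']} in {payload['city']}."
    )


def fetch_weather(city, api_key):
    """Send the request and return the spoken summary of the reply."""
    with urlopen(build_weather_url(city, api_key), timeout=10) as response:
        return summarize_weather(json.load(response))
```

Keeping URL building and response parsing separate from the network call makes both easy to check without an internet connection.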

3. System Calls

A system call is the mechanism through which software requests a service from the kernel of the operating system on which it is running. Checking the battery percentage, checking the IP address, sleeping for some time, telling today's date and time, telling a joke, turning the volume up or down, muting the volume, shutting down and restarting the system, and emptying the recycle bin all rely on system calls to the kernel to give the user the required details.
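A few of these OS-level tasks can be sketched with the Python standard library alone. The function names and the five-second delay are hypothetical, and the `shutdown` arguments shown are the Windows forms; `power_action` defaults to a dry run so that trying the sketch does not power off the machine.

```python
import socket
import subprocess
from datetime import datetime


def current_date_time(now=None):
    """Format the current (or supplied) date and time for the assistant to speak."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d %H:%M")


def local_ip_address():
    """Best-effort lookup of this machine's IP address via its hostname."""
    return socket.gethostbyname(socket.gethostname())


# Windows shutdown.exe arguments: /s = shut down, /r = restart, /t = delay in seconds.
POWER_COMMANDS = {
    "shutdown": ["shutdown", "/s", "/t", "5"],
    "restart": ["shutdown", "/r", "/t", "5"],
}


def power_action(action, dry_run=True):
    """Return the command for the action and run it only when dry_run is False."""
    command = POWER_COMMANDS[action]
    if not dry_run:
        subprocess.run(command, check=True)
    return command
```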

4. Threads

A thread is a distinct sequence of execution. Using threads means the program can have more than one thing going on at the same time. Alarms, notifiers, and tasks such as tracking, which need to run every minute, are created as separate threads and run once a minute alongside the main thread, which processes our requests.
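One way to run such a background task alongside the main thread is sketched below with the standard `threading` module. The helper name `run_periodically` and the very short interval are hypothetical; a real tracker would use an interval of 60 seconds or more.

```python
import threading
import time


def run_periodically(task, interval_seconds, stop_event):
    """Call task() every interval_seconds on a daemon thread until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout and True once the event is set,
        # so the loop ends promptly when the main thread asks it to stop.
        while not stop_event.wait(interval_seconds):
            task()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


ticks = []
stop = threading.Event()
worker = run_periodically(lambda: ticks.append("checked"), 0.01, stop)

time.sleep(0.1)  # the main thread would be handling voice commands here

stop.set()
worker.join(timeout=1)
```

Using an `Event` rather than `time.sleep` inside the worker lets the assistant shut the background task down without waiting for a full interval to elapse.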

5. Database Calls

Python supports a variety of databases. SQLite is perhaps the easiest to connect to from a Python application, since it does not require installing any additional Python SQL modules. Whenever we need to map contact details, we can just say the pet name; a call is made to the database with that input, and the required information is returned and passed on to the next part of the program.
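The contact lookup described above could be sketched with the built-in `sqlite3` module as follows. The table name, column names, and sample row are assumptions made for illustration, and an in-memory database stands in for the real database file.

```python
import sqlite3


def create_contacts(conn):
    """Create the contacts table (hypothetical schema) if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS contacts (
               pet_name TEXT, full_name TEXT, email TEXT,
               mobile TEXT, birth_date TEXT)"""
    )


def find_contact(conn, name):
    """Look up a contact by pet name or full name, ignoring case.

    Returns (full_name, email, mobile), or None when nothing matches.
    """
    return conn.execute(
        "SELECT full_name, email, mobile FROM contacts "
        "WHERE lower(pet_name) = lower(?) OR lower(full_name) = lower(?)",
        (name, name),
    ).fetchone()


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
create_contacts(conn)
conn.execute(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    ("Sam", "Samantha Rao", "sam@example.com", "+910000000000", "1999-01-31"),
)
```

Passing the spoken name as a query parameter, rather than formatting it into the SQL string, keeps misheard or unusual input from breaking the query.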

6. Data Extraction (Web Scraping)

Data extraction is the process of extracting structured data from unstructured or semi-structured machine-readable resources. Product tracking and getting product reports use web scraping to get the data from different websites.
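To illustrate the idea, the sketch below extracts a price from a small HTML fragment using only the standard library. The sample markup and the `price` class name are invented for the example; real Amazon and Flipkart pages use different and frequently changing markup, and the paper does not state which scraping library WorkBuddy uses.

```python
import re
from html.parser import HTMLParser


def parse_price(text):
    """Convert a string such as 'Rs. 1,499.00' to a float, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


class PriceParser(HTMLParser):
    """Collect prices from elements whose class attribute contains 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_data(self, data):
        # Only the first non-empty text inside a price element is used.
        if self._in_price and data.strip():
            price = parse_price(data)
            if price is not None:
                self.prices.append(price)
            self._in_price = False


# Hypothetical markup standing in for a downloaded product page.
SAMPLE_HTML = (
    '<div><span class="title">Mouse</span>'
    '<span class="price">Rs. 1,499.00</span></div>'
)
parser = PriceParser()
parser.feed(SAMPLE_HTML)
```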

7. Face Recognition

Facial recognition is a means of identifying or verifying an individual's identity using their face. The Haar Cascade classifier algorithm is used in our system for real-time face recognition. The model is trained on a set of faces, and only those people can access the assistant to perform tasks.
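The sketch below shows the face detection step with OpenCV's bundled frontal-face Haar cascade, assuming the third-party `opencv-python` package. It covers detection only: matching a detected face against the trained identities needs an additional recognizer, which the paper does not specify. The function names and detector parameters are hypothetical.

```python
def largest_face(boxes):
    """Pick the (x, y, w, h) box with the largest area, or None when there are none."""
    boxes = list(boxes)
    if not boxes:
        return None
    return max(boxes, key=lambda box: box[2] * box[3])


def detect_face(frame):
    """Return the largest face box found in a BGR image, or None.

    frame is an image array, for example one read from a webcam with OpenCV.
    """
    # Imported lazily so largest_face works without OpenCV installed.
    import cv2  # third-party: pip install opencv-python

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are commonly used starting values, not tuned ones.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_face(boxes)
```

Choosing the largest box is a simple way to prefer the person sitting closest to the camera when several faces are in view.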

III. IMPLEMENTATION AND RESULTS

A. Prerequisites

We need to train our model on the faces of the people who are allowed to access the assistant.

We can log in to use the assistant only when the match accuracy is 80% or above, which is acceptable for the Haar Cascade algorithm in Python. Details of the people we contact frequently are inserted into the database. Contact details such as pet name, full name, phone number, WhatsApp number, email ID, and date of birth are stored so that we need not remember them every time. We also created tables to store the details of the products we will track, including the name of the product, the link to the product, the set price for the price-drop alert, and the name of the website from which the data is tracked.

To run the program, we start the main program. Face recognition takes place first; the assistant then waits for a hot word, which can be personalized, and after hearing it starts listening for our command. It listens for key words in the command to decide which task to perform.

If it cannot match the input to any command, it asks the user to repeat the command.
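The hot-word check and keyword matching described above can be sketched as follows. The hot word `buddy`, the keyword table, and the handler names are hypothetical placeholders; the paper does not list the exact words WorkBuddy listens for.

```python
import re

HOT_WORD = "buddy"  # hypothetical; the paper says the hot word can be personalized

# Each entry pairs the key words that must all be present with a handler name.
COMMANDS = [
    (("send", "whatsapp"), "send_whatsapp"),
    (("play",), "play_music"),
    (("read",), "read_pdf"),
    (("weather",), "weather_update"),
]


def words_in(utterance):
    """Split an utterance into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z']+", utterance.lower())


def heard_hot_word(utterance):
    """True when the hot word appears anywhere in the utterance."""
    return HOT_WORD in words_in(utterance)


def match_command(utterance):
    """Return the first handler whose key words all appear, else None."""
    words = words_in(utterance)
    for keywords, handler in COMMANDS:
        if all(keyword in words for keyword in keywords):
            return handler
    return None
```

When `match_command` returns `None`, the assistant would ask the user to repeat the command, as described above.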

11. Send WhatsApp Message

Users can send WhatsApp messages through the voice assistant using Python's pywhatkit library. The user just needs to say what message to send and to whom, and the message will be delivered.

The user need not remember the recipient's full name or number: saying the recipient's pet name is enough, and the assistant maps it to the stored contacts. If the assistant doesn't find a matching name or number, it returns an error and asks for the command again.

To initiate this function, the user says send whatsapp message, then the recipient, and then the message to be sent.

12. Listening to Music

Whenever you want to listen to a song, you can say its name and the assistant checks the songs available on your system. If it finds a match, it plays the song locally; otherwise it uses Python's pywhatkit module to play the song on YouTube. The function playonyt() opens YouTube in your default browser and plays the video you named. If you pass a topic name as the parameter, it plays a random video on that topic.

To initiate this function the user has to say play <song-name>.
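The local-first, YouTube-fallback behaviour can be sketched as below. The set of audio extensions, the substring match on file names, and the function names are assumptions; `os.startfile` is available only on Windows, and `pywhatkit` is imported lazily because it is a third-party package.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}  # assumed set of playable formats


def find_local_song(song_name, music_dir):
    """Return the first audio file whose name contains song_name, else None."""
    wanted = song_name.lower()
    for path in sorted(Path(music_dir).rglob("*")):
        if path.suffix.lower() in AUDIO_EXTENSIONS and wanted in path.stem.lower():
            return path
    return None


def play_song(song_name, music_dir):
    """Play a local file when one matches, otherwise fall back to YouTube."""
    local = find_local_song(song_name, music_dir)
    if local is not None:
        import os

        os.startfile(local)  # Windows only: opens the file in the default player
        return f"local:{local.name}"

    import pywhatkit  # third-party: pip install pywhatkit

    pywhatkit.playonyt(song_name)
    return "youtube"
```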

13. Read PDF

You can ask WorkBuddy to read a PDF. It tells you the number of pages in the PDF and starts reading from the page number you request. It uses the pypdf Python library to extract text from the PDF and read it aloud to the user.

To initiate this function, the user says read <pdf-name>.
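A minimal sketch of the page-by-page text extraction, assuming the third-party `pypdf` package, is shown below. The helper names are hypothetical, and passing each page's text to a text-to-speech engine is left out because the paper does not name the one it uses.

```python
def pages_to_read(total_pages, start_page):
    """Zero-based page indexes from a 1-based start page to the end of the file."""
    if not 1 <= start_page <= total_pages:
        raise ValueError(f"start_page must be between 1 and {total_pages}")
    return range(start_page - 1, total_pages)


def read_pdf_text(path, start_page=1):
    """Yield (page_number, text) pairs starting at the requested page."""
    # Imported lazily so pages_to_read works without pypdf installed.
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    total = len(reader.pages)
    for index in pages_to_read(total, start_page):
        # extract_text can return an empty result for image-only pages.
        yield index + 1, reader.pages[index].extract_text() or ""
```

The assistant could announce the page count first and then speak each yielded page in turn.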

Conclusion

Voice assistants are helpful in this digital era, in which we want to reduce manual work and have tasks done automatically with little effort. They can also help blind people and people who cannot read, who may not know how to use a desktop to perform some tasks. Because this is a personal assistant, users' inquiries and responses are personalized. This personal desktop assistant makes our lives easier: we have tried to automate the manual tasks a user performs daily, which saves time and increases efficiency with very little effort. Good speakers and microphones are required. The amount of data shared over the internet while searching is reduced.

The assistant can do almost all the tasks we perform daily. Its main limitation is that the user must remember the specific command that initiates each function; even though these are ordinary phrases from daily life, remembering them may become difficult. Adding AI and ML and training a model can solve that problem. The interface of WorkBuddy can be made more user-friendly by building a chatbot-style application that gives output as both voice and text, which would be friendlier than console-based output. We can also customize the program by adding birthdays and important events with personalized messages and scheduling the messages to be sent. We can integrate Google Calendar into the code so that the assistant can send a customized message via email or WhatsApp through voice alone, without any manual work. We can also extend the scope of the project to home applications, and add support for more languages so that WorkBuddy can understand them and perform tasks accordingly.

References

[1] Purington, A., Taft, J.G., Sannon, S., Bazarova, N.N., Taylor, S.H.: Alexa is my new BFF: social roles, user satisfaction, and personification of the Amazon Echo. ACM, 6–11 May 2017. ISBN 978-1-4503-4656-6/17/05
[2] Google, "Google Assistant," https://assistant.google.com/
[3] Bellegarda, J.R. (2014). Spoken Language Understanding for Natural Interaction: The Siri Experience. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_1
[4] C.-R. Yoo, S.-H. Kim, and J.-W. Kim, "A Comparative Study of the Use of Intelligent Personal Assistant Services Experiences: Siri, Google Assistant, Bixby," Korean Society for Emotion and Sensibility, vol. 23, no. 1. Korean Society for Emotion and Sensibility, pp. 69–78, 31-Mar-2020.
[5] Zhao, Y., Li, J., Zhang, S., Chen, L., Gong, Y.: Domain and speaker adaptation for Cortana speech recognition. In: ICASSP
[6] Vora, Jash, Deepak Yadav, Ronak Jain, and Jaya Gupta. "JARVIS: A PC Voice Assistant." (2021).
[7] S. Malodia, N. Islam, P. Kaur and A. Dhir, "Why Do People Use Artificial Intelligence (AI)-Enabled Voice Assistants?," in IEEE Transactions on Engineering Management, doi: 10.1109/TEM.2021.3117884.
[8] Këpuska, V. and Bohouta, G., 2017. Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx). Int. J. Eng. Res. Appl, 7(03), pp. 20-24.
[9] Nivedita Singh, Dr. Diwakar Yagyasen, Mr. Surya Vikram Singh, Gaurav Kumar, Harshit Agrawal, Department of CSE, Babu Banarasi Das National Institute of Technology and Management, Lucknow, India.
[10] Sangpal, R., Gawand, T., Vaykar, S. and Madhavi, N., 2019, July. JARVIS: An interpretation of AIML with integration of gTTS and Python. In 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT) (Vol. 1, pp. 486-489). IEEE.
[11] P. Milhorat, S. Schlögl, G. Chollet, J. Boudy, A. Esposito, G. Pelosi, "Building the Next Generation of Personal Digital Assistants," ATSIP'2014, March 17-19, 2014, Sousse, Tunisia. ©2014 IEEE
[12] A.M. Weeratunga, S.A.U. Jayawardena, Hasindu P.M.A.K., W.P.M. Prashan and S. Thelijjagoda, "Project Nethra - An Intelligent Assistant for the Visually Disabled to Interact with Internet Services," 978-1-4799-1876-8/15/$31.00 2015 IEEE.
[13] Prajyot Mane, Shubham Senone, Nachiket Gaikwad and Prof. Jyoti Ramteke, "Smart Personal Assistant using Machine Learning," 978-1-5386-1887-5/17/$31.00 2017 IEEE.
[14] Veton Kepuska, Gamal Bohouta, "Next-Generation of Virtual Personal Assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)," 978-1-5386-4649-6/18/$31.00 ©2018 IEEE
[15] T. Kim, "Short Research on Voice Control System Based on Artificial Intelligence Assistant," 2020 International Conference on Electronics, Information, and Communication (ICEIC), Barcelona, Spain, 2020, pp. 1-2, doi: 10.1109/ICEIC49074.2020.9051160.
[16] D. M. Thomas and S. Mathur, "Data Analysis by Web Scraping using Python," 2019 3rd International Conference on Electronics, Communication, and Aerospace Technology (ICECA), Coimbatore, India, 2019, pp. 450-454, doi: 10.1109/ICECA.2019.8822022.
[17] K. N., R. V., S. S. S., and D. R., "Intelligent Personal Assistant - Implementing Voice Commands enabling Speech Recognition," 2020 International Conference on System, Computation, Automation, and Networking (ICSCAN), Pondicherry, India, 2020, pp. 1-5, doi: 10.1109/ICSCAN49426.2020.9262279.
[18] S. Kumari, Z. Naikwadi, A. Akole, and P. Darshankar, "Enhancing College Chat Bot Assistant with the Help of Richer Human-Computer Interaction and Speech Recognition," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2020, pp. 427-433, doi: 10.1109/ICESC48915.2020.9155951.

Copyright

Copyright © 2023 P. Pooja, Mehak Jain, K. Sahithya, K. Sravani, U. Jayasri. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET54225

Publish Date : 2023-06-18

ISSN : 2321-9653

Publisher Name : IJRASET