IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Soham Pardeshi, Madhuvanti Apar, Chaitanya Khot, Atharv Deshmukh
DOI Link: https://doi.org/10.22214/ijraset.2022.40919
Drawing or visualizing characters in real time is among the most fascinating and challenging research areas in image processing and pattern recognition for the coming years. A few projects in this field have been built, but the continuing focus is on increasing accuracy and resolution while reducing the time the system takes to produce the resulting image. Air Doodle is another project in this field: the user can draw characters in real time with a pre-defined object, after telling the system which object to track. The project proposes to reduce the usage of paper, ease the discomfort of marking an important part of a presentation, and much more. We use computer vision through OpenCV to build the project. The language chosen for this project is Python, whose exhaustive libraries help us attain the desired result.
I. INTRODUCTION
With the evolution of technology, we will slowly move from the traditional pen-and-paper method to more advanced human-computer interaction systems. The COVID-19 situation gave us this idea, as we faced ample difficulties while studying online: teachers had to write on paper while teaching, or, when sharing their screen, draw in MS Paint, which was a difficult task. The aim of our project is to build a hand-movement recognition system for writing digitally, where drawing in the air is made possible. Our system has the potential to challenge the old methods. Digital art includes many ways of writing, such as using a keyboard, a touch-screen surface, a digital pen, etc.
In this system, however, we use hand-movement recognition built with the Python programming language, which creates natural interaction between man and machine. We have built Air Doodle, which draws the output the user expects on the screen simply by capturing the motion of a coloured marker with a camera. The marker is an object of a specific colour; it can be a pen or a small piece of cloth attached to the user's finger. The system works by creating a mask around the path the user traces, which resets when the user clears the canvas, and it finally shows the output on the screen. We implement the project in Python because of its exhaustive libraries and easy-to-use syntax, though a good understanding of the basics of Python and OpenCV is needed first. The interface is built to detect solid colours in the environment so that their position can be taken as a reference point for input. The project's execution is not exclusive to Python; it can be implemented in any OpenCV-supported language. Along with OpenCV, we also use several Python modules and scientific packages: NumPy for array processing; PyAutoGUI, an automation library used to move the cursor; OpenCV's cv2 library to read video; the deque class, whose methods for adding and removing elements can be invoked directly with arguments; and the Tkinter library, which provides a fast and easy way to create graphical user interface applications and is used for taking screenshots. Air Doodle also has a feature named 'Clear All' that resets the output screen when the user clicks on it with the marker, so once the program runs there is no need to touch the computer screen. All features are accessible to the user just by using the coloured marker in the air as a navigation tool.
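As a minimal sketch (with names of our own choosing, not code from the paper), the deque mentioned above can serve as a bounded buffer of tracked marker positions that is emptied when 'Clear All' is triggered:

```python
from collections import deque

points = deque(maxlen=1024)          # oldest point is dropped once the buffer is full

def on_marker_detected(center):
    """Record the latest (x, y) centre of the coloured marker tip."""
    points.appendleft(center)

def on_clear_all():
    """Empty the trajectory when the marker touches the Clear All button."""
    points.clear()

# Consecutive points in the buffer are later joined with line segments
# to render the doodle on the canvas.
```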
The remainder of this paper is organized as follows. Section II presents the other pieces of literature we referred to before working on this project. Section III defines the problem statement we set out to solve, and Section IV describes the challenges we faced while building the system. Section V presents the system methodology and workflow we followed, with subsections covering colour-recognition dataset creation and colour-recognition model training. Section VI describes the algorithm of the workflow, and Section VII concludes the paper.
II. LITERATURE REVIEW
A. Title: Interactive Object Registration and Recognition for Augmented Desk Interface
Authors: Takahiro Nishi, Yoichi Sato, Hideki Koike
In previously conducted research, identification of the human hand, or of the object to be tracked, played an important role in human-computer interaction. The authors state that identifying an object is necessary for human-computer interaction, and in this paper they propose a new approach that, unlike earlier projects, does not rely on tags. Their augmented desk interface, also called the EnhancedDesk, lets the user work with both system applications and physical objects; the proposal is to use human hands directly to make the system perform the tasks the user wants. In earlier projects a tag was used to track or locate the object, but tagging is sometimes difficult because the object may be too big or too small to carry a tag, and this research was conducted to solve that problem. The process starts with registering the object: the object is placed in a provided 60x60-pixel box and the system takes a snapshot of it, called the reference image, in order to register and recognize the object to be tracked. A colour histogram of the reference image is then created for future tracking. After this setup, the user can test the system by putting an object on the provided table and pointing to it with a single finger. If the system notices a single finger pointing at an object for a certain duration, it captures a snapshot of that object and compares it to the reference image by checking the colour histograms of both, reporting whether the objects match, mostly as a percentage. Sometimes a wrong image can have the highest matching percentage for various reasons, such as inadequate lighting or an irregular background; to overcome this, the authors added user feedback to confirm whether the system's answer was right or wrong. In the end, the research was concluded successfully with the optimal results that were required.
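The histogram comparison described above can be sketched with standard OpenCV calls; the bin counts and the use of histogram correlation are illustrative assumptions rather than details taken from the cited paper:

```python
import cv2

def colour_histogram(bgr_image):
    # 3-D colour histogram over all three channels; 8 bins per channel
    # (the bin count is an illustrative choice, not taken from the paper).
    hist = cv2.calcHist([bgr_image], [0, 1, 2], None,
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def match_percentage(reference_img, candidate_img):
    # Correlation between the two histograms, scaled to a percentage,
    # mirroring the "matching percentage" result the paper describes.
    ref = colour_histogram(reference_img)
    cand = colour_histogram(candidate_img)
    return 100.0 * cv2.compareHist(ref, cand, cv2.HISTCMP_CORREL)
```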
B. Title: Hand Tracking and Gesture Recognition Using Lensless Smart Sensors
Authors: Lizy Abraham, Andrea Urru, Niccolò Normani, Mariusz P. Wilk, Michael Walsh and Brendan O'Flynn
This research aimed to create a low-cost, affordable end product using the LSS, a Lensless Smart Sensor, which captures information in a tiny, segmented form through optical sensing. The product uses formula-based geometric algorithms that give results with millimetre-level accuracy, helping the research team track finger movements with the help of LEDs fitted on the human hand; with the LEDs in place, tracking the hand with geometric algorithms becomes easier. The human hand is one of the most complex and beautiful parts of the human body and can move at significantly high speed, so tracking it is not an easy task. Much research has been conducted so far with hand recognition as the main aim, applying different techniques and technologies to achieve optimal results. Until this point, research conducted with bulky cameras or low-resolution cameras carried many limitations, and making the camera resolution better made the system bulkier still. Some studies used a fixed setup and background to get better movement-tracking results. The authors suggest that the major drawback was the camera: cameras used in research up to this point could capture only 2D images rather than a 3D view of the object, so they decided to reconstruct the hand in 3D to obtain better results. Various technologies such as infrared cameras, sensor gloves, and accelerometers had been used for better hand tracking without satisfactory results, and while VR gloves with highly effective sensors were used for 3D capture, the system still could not provide the exact location of the tracked hand. Since the first research in this field, technology has improved considerably through open-source software such as OpenCV, Leap Motion, and the Kinect motion sensor. The authors' LSS is the fundamental part of their system: it tracks the optical elements, produces predictable hand patterns with the help of the sensor, and captures the smallest data possible for object tracking. Tracking the object at such depth means the system captures a lot of information, so the team had to feed the right amount of data to the algorithms to achieve the result. LEDs were used for better finger-movement tracking, a controlled setup environment was created, many arithmetic operations were performed, the object was tracked from different positions in different setups, and several technologies were tried to get a better result. The work concludes with tracking of an object or a human hand with better accuracy, achieved through arithmetic formulas, the setup environment, and LEDs mounted on the hand.
C. Title: Motion Detection Based on Frame Difference Method
Author: Nishu Singla
This research focused on building a system that can detect motion across frames, inspired by the concepts of human-computer interaction and computer vision. Recent studies in computer vision are based on human motion analysis, which is very popular: the system tries to produce an output after analysing gestures that humans provide as input, a concept that was part of earlier research. The cost of the hardware used, such as cameras, and of new applications like person identification and visual surveillance can vary. The objective of the research is to find the motion of objects in a given set of pictures, and its main goal is to recognize the pixels belonging to the same tracked object. The research rests on some assumptions: a well-fixed camera, stable light without flickering, a contrasting background, and a camera with a high frame rate as well as resolution. In the end the research was successful, but with a few limitations; one was that, while tracking the motion of a particular object, the required result was hindered by the movement of air, and the team later noticed that in some sets the system was also capturing the movement of air.
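A minimal sketch of the frame-difference method the paper describes, assuming a fixed camera as the paper requires; the threshold value of 25 is an illustrative choice, not taken from the paper:

```python
import cv2

cap = cv2.VideoCapture(0)            # a well-fixed camera, as the paper assumes
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)                # pixel-wise frame difference
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    print("moving pixels:", cv2.countNonZero(motion))  # pixels of the moving object
    cv2.imshow("Motion mask", motion)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```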
D. Title: Robust Hand Recognition with a Web Camera
Authors: Zhou Ren, Junsong Yuan, Zhengyou Zhang, Jingjing Meng
Hand recognition was a very important piece of research that led to various discoveries and developments in the field of technology. It was among the pioneering projects behind the concept now called human-computer interaction, and it enabled many applications, especially in the context of tracking an object. Until this point, the research conducted in this particular segment had never given satisfactory results, which is why the transition was not as smooth as expected.
This research was conducted to understand more about human-computer interaction and to improve the transition for similar kinds of projects. In it, hand gesture recognition was divided into two parts: (1) locating the hand or the object, and (2) understanding the gesture shown by the developer or the user. One problem faced during the research was that a sufficient amount of lighting, a necessary requirement, was not always present; the user should have a proper amount of lighting to track or locate the hand or object, since in low lighting it becomes very difficult for the camera to track the movement or locate the hand.
The research was conducted after the requirement of adequate lighting was fulfilled. In the related systems and published works, a major problem identified was locating the object with utmost clarity; at times this was due to technical limitations, the camera quality not being up to the mark. While tracking the hand using a Kinect sensor, the user has a few options for tracking the object: the user can specify the colour to locate, or specify the shape of the user's hand. In the first case, the camera's sensor scans the whole frame for the specified colour; in the other, the camera captures a picture and then scans it for a specified hand shape, with the segmented hand shapes represented as time-series curves.
With the Kinect sensor it is very easy to track large objects such as a human hand with great accuracy, but it becomes difficult to track a small object like a button or the cap of a pen or marker. The research team therefore adopted FEMD, the Finger-Earth Mover's Distance, which tracks the human hand with the required accuracy and uses all the data provided to it as input to recognize hands of different types and shapes. The research achieved 90% accuracy on the tests performed on the team's datasets, and it worked accurately in an uncontrolled environment, even with cluttered backgrounds. Two applications were created with the end product: (1) an arithmetic computation system, in which the system recognizes hand gestures for performing arithmetic operations on the numbers provided, accepts up to 14 different inputs, evaluates the answer, and displays it; and (2) a Rock-Paper-Scissors game, a similar application except that the system decides the winner and loser depending on the gestures of two players. The work concluded with a successfully robust result for human-hand computer interaction.
III. PROBLEM DEFINITION
To develop an interface between the system and its environment such that the system can identify colours and take them as input reference points, allowing the user to interact with the system to perform simple tasks such as doodling in the drawing area.
Air Doodle has the potential to solve some major real-world problems like:
A. Reduce Paper Wastage
Paper waste is generated on a large scale; paper is often wasted in scribbling, drawing, painting, etc. Making paper requires large amounts of wood and water, which is harmful to the environment. This is where Air Doodle comes into the picture: one can perform these simple tasks, such as scribbling, drawing, and painting, with the help of our tool. This would result in less paper wastage and would also be eco-friendly.
B. As an E-learning Tool for Teachers and Professionals
During the lockdown due to COVID-19, all work as well as teaching shifted to online mode, resulting in online meetings and classes. In such situations, Air Doodle can be a great e-learning tool for the host of a meeting, as the host has a variety of options to mark, point, or draw, and can take screenshots of the screen with the help of Air Doodle.
C. Substitute for Keyboard and Mouse
Air Doodle has the potential to replace computer hardware such as the mouse and keyboard, which can reduce the cost of computer hardware. A large portion of e-waste is generated from computer hardware; with the help of this application, the e-waste generated by these hardware components can be reduced.
IV. CHALLENGES IDENTIFIED
Our end problem description is to create an interface through which the user can interact with a provided object, whose data is loaded into the proposed system to obtain the required diagram. In the work system, a major problem identified was locating the object with utmost clarity; at times this was due to technical limitations, namely the camera quality not being up to the mark. There were instances where the object to be tracked was too small or too big to be traced: a tiny object is complex to track because it can be hard for the camera to locate something that small within the whole frame, and the resolution of the camera also matters. In finger-movement tracking, adequate light was identified as another important factor, since tracking becomes difficult if enough light is not present. In practical work, defining what type of object to track makes it easier for the system: the system can look directly for the preferred object without searching the whole frame, and can process it much more quickly.
Other limitations we faced, in brief, were: the webcam not being of the required quality; the object not being defined; technical requirements not being met; system errors, system limitations, and memory limitations; object-tracking difficulties due to inadequate light or the tracked object's colour being too bright; the object's shape changing, which makes tracking difficult; the system not recognizing the object even though it is inside the input frame; too many objects of a similar type inside the input frame being tracked at once; and the user giving input so quickly that the system cannot track the whole path in such quick succession.
V. METHODOLOGY
While working on a practical solution for these problems, the very first step is to get the technical aspects right and cover all the technical requirements, especially the webcam's resolution, which is the most important factor in tracking the object and drawing images on the provided canvas. The webcam needs a clear view of the input object to produce the result. The object being tracked through the webcam should be of an ideal size, and its colour should be bright enough to be tracked. The object should be specified properly so that the input arrives in the form the developer requires. The right amount of light in the room is another aspect that must be covered for object tracking and image drawing; too much light can reflect off the object, so the light should be at the right level, and the position of the light source also matters.
We use technologies such as OpenCV, gesture recognition, Tkinter, and human-computer interaction (HCI) for this project. OpenCV is used for image processing, video capture, and analysis, including features like face detection and object detection. To understand gestures and execute commands based on them, we use colour recognition. To enable the computer to take a screenshot of the user's doodle, we use some libraries of the Tkinter module. As this project requires a lot of input from the surrounding environment, we studied human-computer interaction (HCI) to get a good grasp of what we needed to do to make the project more efficient.
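The paper does not spell out the screenshot mechanism; as a minimal sketch, PyAutoGUI (the automation library named in the Introduction) offers a one-call screen capture. The filename below is an illustrative placeholder:

```python
import pyautogui

def save_doodle(path="doodle.png"):   # the filename is a placeholder
    image = pyautogui.screenshot()    # returns a PIL Image of the full screen
    image.save(path)                  # persist the captured doodle to disk
```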
Here is how Air Doodle works: frames are captured, and the captured frames are converted to the HSV colour space, which makes colour detection easy.
In Fig. 1 above, the colour detection algorithm is in effect; here it is detecting the blue colour from the surroundings. The black-and-white screen shows the detected contours. With the help of NumPy, the trajectory of the solid-coloured tip is tracked and taken as input. The algorithm identifies the solid colour thanks to the colour detection routine in our code.
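A minimal end-to-end sketch of this pipeline, assuming OpenCV 4 and a blue marker; the HSV bounds, the radius cut-off, and the window title are illustrative assumptions, not the authors' exact values:

```python
import cv2
import numpy as np
from collections import deque

# Illustrative HSV bounds for a blue marker; in practice these are tuned
# to the user's coloured cap and the room lighting.
LOWER_BLUE = np.array([100, 60, 60], dtype=np.uint8)
UPPER_BLUE = np.array([140, 255, 255], dtype=np.uint8)

points = deque(maxlen=1024)          # trajectory of the marker tip
cap = cv2.VideoCapture(0)
canvas = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)       # mirror the frame for natural drawing
    if canvas is None:
        canvas = np.zeros_like(frame)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)      # HSV eases colour thresholding
    mask = cv2.inRange(hsv, LOWER_BLUE, UPPER_BLUE)   # the black-and-white mask of Fig. 1
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4 signature
    if contours:
        largest = max(contours, key=cv2.contourArea)  # biggest blob = marker tip
        (x, y), radius = cv2.minEnclosingCircle(largest)
        if radius > 5:                                # ignore tiny noise blobs
            points.appendleft((int(x), int(y)))
    for i in range(1, len(points)):                   # join successive positions
        cv2.line(canvas, points[i - 1], points[i], (255, 0, 0), 2)
    cv2.imshow("Air Doodle (sketch)", cv2.add(frame, canvas))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```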
VI. ALGORITHM OF WORKFLOW
This is the part where the user interacts with the application, and it is by far the most exciting and fascinating part. Here, the user can operate all the on-screen functions and features simply by using the solid-coloured cap that serves for drawing and navigation; a sketch of this interaction follows below. The main features of the application are shown in Fig. 2 below.
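As a minimal sketch of this marker-driven interaction (the button coordinates are assumptions for illustration, not the authors' layout), the tracked tip can be tested against the 'Clear All' rectangle on every frame:

```python
# Button coordinates are assumptions for illustration, not the authors' layout.
CLEAR_BUTTON = (40, 10, 180, 60)      # x1, y1, x2, y2 in frame coordinates

def handle_point(x, y, points, canvas):
    """Route a tracked marker position: press Clear All or keep drawing."""
    x1, y1, x2, y2 = CLEAR_BUTTON
    if x1 <= x <= x2 and y1 <= y <= y2:
        points.clear()                # forget the drawn trajectory
        canvas[:] = 0                 # wipe the drawing layer (a NumPy image)
    else:
        points.appendleft((x, y))     # otherwise record the point for drawing
```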
VII. CONCLUSION
The project has the potential to change the traditional ways of interpreting information. It annihilates the need to carry a notebook to jot down notes, providing a simple way of saving a soft copy. It makes it easier to draw characters or write something in real time without needing to jot it down in a notebook and then share the book, and it helps in drawing characters without using a mouse or an input pen. Drawing in real time also becomes practicable, which in turn makes the user's work easier. In the future, the functionality of the system can be improved by introducing hand gestures with a pause, which could be used to control the real-time system instead of an object.
ACKNOWLEDGMENT
We offer profound gratitude to Dr S. N. Gujar (Head of the Department, Computer Engineering) for providing us with all the excellent academic facilities required to complete this work. We would like to thank him for his stimulating suggestions and encouragement, along with areas for improvement, which helped us in the implementation and writing of this dissertation.
REFERENCES
[1] X. Liu, Y. Huang, X. Zhang, and L. Jin, "Fingertip in the eye: A cascaded CNN pipeline for the real-time fingertip detection in egocentric videos," CoRR, abs/1511.02282, 2015.
[2] E. B. Sudderth, M. I. Mandel, W. T. Freeman, and A. S. Willsky, "Visual hand tracking using nonparametric belief propagation," MIT Laboratory for Information & Decision Systems Technical Report P-2603, presented at the IEEE CVPR Workshop on Generative Model-Based Vision, pp. 1-9, 2004.
[3] R. Yang and S. Sarkar, "Coupled grouping and matching for sign and gesture recognition," Computer Vision and Image Understanding, Elsevier, 2008.
[4] M. Khosravi Nahouji, "2D finger motion tracking, implementation for Android based smartphones," Master's Thesis, CHALMERS Applied Information Technology, 2012, pp. 1-48.
[5] K. Oka, Y. Sato, and H. Koike, "Real-time fingertip tracking and gesture recognition," IEEE Computer Graphics and Applications, 2002, pp. 64-71.
[6] Y. Huang, X. Liu, X. Zhang, and L. Jin, "A pointing gesture based egocentric interaction system: Dataset, approach, and application," 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, pp. 370-377, 2016.
[7] P. Ramasamy, G. Prabhu, and R. Srinivasan, "An economical air writing system converting finger movements to text using a web camera," 2016 International Conference on Recent Trends in Information Technology (ICRTIT), Chennai, pp. 1-6, 2016.
[8] S. Beg, M. F. Khan, and F. Baig, "Text writing in air," Journal of Information Display, vol. 14, no. 4, 2013.
[9] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, no. 4, article 13, pp. 1-45, 2006.
Copyright © 2022 Soham Pardeshi, Madhuvanti Apar, Chaitanya Khot, Atharv Deshmukh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET40919
Publish Date : 2022-03-22
ISSN : 2321-9653
Publisher Name : IJRASET