Face Mask Detection Using Machine Learning

Authors: Chandan S, Lohith S, Yamini G B, Nithin Gowda, Shruthi N

DOI Link: https://doi.org/10.22214/ijraset.2022.43727

Abstract

The COVID-19 pandemic has rapidly affected our day-to-day life, disrupting world trade and movement. Wearing a protective face mask has become the new normal. In the near future, many public service providers will ask customers to wear masks correctly in order to use their services. Face mask detection has therefore become a crucial task for society. This paper presents a simplified approach to this task using basic machine learning packages: TensorFlow, Keras and OpenCV. The application of “machine learning” and “artificial intelligence” has become popular within the last decade. Both terms are frequently used in science and media, sometimes interchangeably, sometimes with different meanings. In this work, we specify the contribution of machine learning to artificial intelligence. We review relevant literature and present a conceptual framework which clarifies the role of machine learning in building (artificially) intelligent agents. The proposed method detects the face in the image and then identifies whether or not it has a mask on it. As a surveillance task performer, it can also detect a face along with a mask in motion. The method attains accuracy of up to 95.77% and 94.58% respectively on two different datasets. We explore optimized parameter values using MobileNetV2, a Convolutional Neural Network architecture, to detect the presence of masks correctly without causing overfitting.


I. INTRODUCTION

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus.

Most people infected with the COVID-19 virus experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, particularly those in their eighties and nineties, as well as those with underlying medical conditions such as cardiovascular disease, diabetes, chronic obstructive pulmonary disease, and cancer, are more likely to develop serious illness.

The best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol-based rub frequently and by not touching your face. The COVID-19 virus spreads predominantly through droplets of saliva or discharge from the nose when an infected individual coughs or sneezes, so respiratory etiquette is particularly vital.

Masks are an important tool for preventing transmission and saving lives. They should be part of a comprehensive approach that also includes physical distancing, avoiding crowded, closed and close-contact settings, sufficient ventilation, cleaning hands, and covering sneezes and coughs.

Masks can be used to protect healthy people or to prevent onward transmission, depending on the type. With the pandemic disrupting world trade and movement, wearing a protective face mask has become the new normal, and many public service providers will ask customers to wear masks correctly in order to use their services. Face mask detection has therefore become a crucial task.

II. CONCEPTS TO BE UNDERSTOOD: MACHINE LEARNING

A. What is Machine Learning?

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy. Machine learning is an important component of the growing field of data science.

Through the use of statistical methods, algorithms are trained to make classifications or predictions, uncovering key insights within data mining projects.

These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. As big data continues to expand and grow, the market demand for data scientists will increase, requiring them to assist in the identification of the most relevant business questions and subsequently the data to answer them.

B. Machine Learning vs. Deep Learning

The way in which deep learning and machine learning differ is in how each algorithm learns. Deep learning automates much of the feature extraction piece of the process, eliminating some of the manual human intervention required and enabling the use of larger data sets. You can think of deep learning as "scalable machine learning". Classical, or "non-deep", machine learning is more dependent on human intervention to learn. Deep learning (also called deep machine learning) can leverage labeled datasets, also known as supervised learning, to inform its algorithm, but it doesn’t necessarily require a labeled dataset. It can ingest unstructured data in its raw form (e.g. text, images), and it can automatically determine the set of features which distinguish different categories of data from one another.

Unlike classical machine learning, it does not require human intervention to process data, allowing us to scale machine learning in more interesting ways.

We break out the learning system of a machine learning algorithm into three main parts:

A Decision Process: In general, machine learning algorithms are used to make a prediction or classification. Based on some input data, which can be labeled or unlabeled, the algorithm produces an estimate about a pattern in the data.
An Error Function: An error function serves to evaluate the prediction of the model. If there are known examples, an error function can make a comparison to assess the accuracy of the model.
Model Optimization Process: If the model can fit better to the data points in the training set, then weights are adjusted to reduce the discrepancy between the known example and the model estimate. The algorithm will repeat this evaluate and optimize process, updating weights autonomously until a threshold of accuracy has been met.
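
The three parts above can be illustrated with a minimal sketch: a one-variable linear model fitted by gradient descent. The toy data, the learning rate, and the stopping threshold are assumptions chosen for illustration and are not taken from the proposed system.

```python
# Toy data following y = 2x + 1 (hypothetical values chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """Decision process: produce an estimate from the input."""
    return w * x + b


def mean_squared_error(w, b):
    """Error function: compare the estimates with the known examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate=0.05, tolerance=1e-6, max_steps=10_000):
    """Optimization: adjust the weights until the error is below a threshold."""
    w, b = 0.0, 0.0
    for _ in range(max_steps):
        if mean_squared_error(w, b) < tolerance:
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
```

After training, w and b end up close to 2 and 1, the values used to generate the toy data.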

Machine learning classifiers fall into three primary categories:

a. Supervised Machine Learning: Supervised learning, also known as supervised machine learning, is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.

b. Unsupervised Machine Learning: Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms to analyze and cluster unlabeled datasets.

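c. Semi-supervised Learning: Semi-supervised learning offers a happy medium between supervised and unsupervised learning. During training, it uses a smaller labeled dataset to guide classification and feature extraction from a larger, unlabeled dataset.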

Here are just a few examples of machine learning you might encounter every day:

Speech Recognition

It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, and it is a capability which uses natural language processing (NLP) to process human speech into a written format. Many mobile devices incorporate speech recognition into their systems to conduct voice search—e.g. Siri—or provide more accessibility around texting.

Customer Service

Online chatbots are replacing human agents along the customer journey. They answer frequently asked questions (FAQs) around topics, like shipping, or provide personalized advice, cross-selling products or suggesting sizes for users, changing the way we think about customer engagement across websites and social media platforms. Examples include messaging bots on e-commerce sites with virtual agents, messaging apps, such as Slack and Facebook Messenger, and tasks usually done by virtual assistants and voice assistants.

Computer Vision

This AI technology enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and based on those inputs, it can take action. This ability to provide recommendations distinguishes it from image recognition tasks. Powered by convolutional neural networks, computer vision has applications within photo tagging in social media and self-driving cars within the automotive industry.

III. SOFTWARE

A. PyCharm

B. Anaconda

C. Incorporated packages

TensorFlow: TensorFlow, an interface for expressing machine learning algorithms, is used to implement ML systems in production across many areas of computer science, including sentiment analysis, voice recognition, geographic information extraction, computer vision, text summarization, information retrieval, computational drug discovery and flaw detection. In the proposed model, the whole Sequential CNN architecture (which consists of several layers) uses TensorFlow as its backend. It is also used to reshape the data (images) during data processing.
Keras: Keras provides fundamental abstractions and building blocks for creating and shipping ML solutions with high iteration velocity. It takes full advantage of the scalability and cross-platform capabilities of TensorFlow. The core data structures of Keras are layers and models [19]. All the layers used in the CNN model are implemented using Keras. Besides converting the class vector to a binary class matrix during data processing, it helps to compile the overall model.
OpenCV: OpenCV (Open Source Computer Vision Library), an open-source computer vision and ML software library, is used to detect and recognize faces, identify objects, classify movements in videos, track moving objects, follow eye movements, track camera movements, remove red eyes from pictures taken with flash, find similar pictures in an image database, recognize scenery and set up markers to overlay it with augmented reality, and so forth [20]. The proposed method makes use of these features of OpenCV for resizing and color conversion of the data images.

IV. METHODOLOGY

We use two-stage detectors to detect the face and apply the model. Two-stage detectors follow a long line of reasoning in computer vision for the prediction and classification of region proposals. They first predict proposals in an image and then apply a classifier to these regions to classify potential detections. Various two-stage region proposal models have been proposed by researchers in the past. The region-based convolutional neural network, abbreviated as R-CNN, was described in 2014 by Ross Girshick et al. It may have been one of the first large-scale applications of CNNs to the problem of object localization and recognition.

The model generated state-of-the-art results when tested on benchmark datasets such as VOC-2012 and ILSVRC-2013. R-CNN first uses a selective search technique to extract a set of object proposals, and then uses an SVM (Support Vector Machine) classifier to predict objects and their associated classes. SPPNet modifies R-CNN with an SPP (spatial pyramid pooling) layer: it collects features from several region proposals and feeds them into a fully connected layer for classification.

The ability of SPPNet to compute feature maps of the entire image in a single shot resulted in a nearly 20-fold increase in object detection speed over R-CNN. Then there is Fast R-CNN, which combines R-CNN and SPPNet. To fine-tune the model, it adds a new layer, called the Region of Interest (ROI) pooling layer, between shared convolutional layers. It also allows a detector and a regressor to be trained at the same time without changing the network configuration.
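
To make the idea of ROI pooling more concrete, the following is a simplified sketch written in plain Python, not the Fast R-CNN implementation. It max-pools an arbitrary rectangular region of a 2D feature map into a fixed grid, so that regions of different sizes yield outputs of the same shape. The function name roi_max_pool, the (top, left, bottom, right) region format, and the toy feature map are assumptions made for illustration.

```python
def roi_max_pool(feature_map, roi, output_size):
    """Max-pool the region roi = (top, left, bottom, right) into a fixed grid.

    bottom and right are exclusive. The region is split into
    output_size x output_size bins and the maximum of each bin is kept.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(output_size):
        # Bin edges along the rows; every bin covers at least one row.
        r0 = top + (i * height) // output_size
        r1 = max(top + ((i + 1) * height) // output_size, r0 + 1)
        row = []
        for j in range(output_size):
            # Bin edges along the columns; every bin covers at least one column.
            c0 = left + (j * width) // output_size
            c1 = max(left + ((j + 1) * width) // output_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map (hypothetical values).
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]

whole_map = roi_max_pool(feature_map, (0, 0, 4, 6), output_size=2)
sub_region = roi_max_pool(feature_map, (1, 1, 4, 4), output_size=2)
```

Both calls return a 2 x 2 grid even though the two regions differ in size, which is what lets a fully connected layer follow the pooling step.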

First, we open the Anaconda prompt and change the directory to the folder containing the “face_mask_detector” Python script.
We run the command python detect_mask_video.py to execute the Python code.
The camera starts and detects faces on the screen.
Each face is assigned a bounding box with an annotation based on our machine learning model.
The model checks whether the person is wearing a mask: if so, it assigns a green bounding box; otherwise, it assigns a red bounding box.

Pressing the q key ends the process.

V. ALGORITHM

A. Input

Dataset including faces with and without masks

OUTPUT: Categorized image depicting the presence of a face mask

Initially, the model must be trained so that it can detect whether or not a person is wearing a mask.
Visualize the images in the two categories and label them.
Perform one-hot encoding on the given dataset.
Split the dataset into a training dataset and a testing dataset.
Construct the training image generator for data augmentation.
Load the MobileNetV2 network, and create the base model and the head model.
Compile the model.
Serialize the model to disk.
Plot the training loss and accuracy.
Import the necessary packages for detecting a face in the given livestream.
Grab the dimensions of the frame and construct a blob from it.
Pass the blob through the network and obtain the face detections.
Loop over the detections.
Load the face mask detector model from disk.
Initialize the video stream.
Loop over the frames from the video stream.
Detect whether or not the person is wearing a mask and show the prediction probability, as a percentage, for the ROI (region of interest).
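
Two of the early steps, one-hot encoding and splitting the dataset, can be sketched with the standard library alone. This is an illustrative stand-in rather than the authors' code: the file names, the helper names one_hot and split_dataset, and the random seed are hypothetical, and a real pipeline would more likely rely on library helpers for both steps.

```python
import random


def one_hot(label, num_classes):
    """Return a list with a 1 at the label's index and 0 elsewhere."""
    return [1 if i == label else 0 for i in range(num_classes)]


def split_dataset(samples, test_size=0.2, seed=42):
    """Shuffle the samples and split them into training and testing parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names; label 0 = with mask, label 1 = without mask.
samples = [(f"img_{i}.jpg", one_hot(i % 2, 2)) for i in range(10)]
train_set, test_set = split_dataset(samples)
```

With ten samples and a test size of 0.2, eight samples go to training and two to testing.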

VI. WORKING

A. Data Processing

Data preprocessing involves converting data from a given format into a more user-friendly, desired and meaningful format. The data can be in any form, such as tables, images, videos or graphs. Such organized information fits an information model or schema and captures the relationships between different entities. The proposed method deals with image and video data using NumPy and OpenCV.

B. Data Visualization

Data visualization is the process of transforming abstract data into meaningful representations, using encodings to communicate knowledge and discover insights. It is helpful for studying a particular pattern in the dataset.

The total number of images in the dataset is visualized in both categories: ‘with mask’ and ‘without mask’. The categories are obtained by listing the directories in the specified data path. The variable categories now looks like: [‘with mask’, ‘without mask’]

Now, each category is mapped to its respective label using zip, which returns an iterator of tuples (a zip object) in which the items from each passed iterable are paired together in order. The mapped variable looks like: {‘with mask’: 0, ‘without mask’: 1}
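
The mapping described above can be reproduced in a few lines. In this sketch the category list is written out by hand instead of being read from the dataset directory, and the variable names labels and label_dict are assumptions.

```python
# Category names as given in the text; in practice they would come from
# listing the sub-directories of the dataset folder.
categories = ["with mask", "without mask"]
labels = list(range(len(categories)))

# zip pairs each category with its label; dict turns the pairs into a lookup.
label_dict = dict(zip(categories, labels))
```

Looking up label_dict["without mask"] then gives the integer label 1.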

C. Image Reshaping

When an image is classified, the input is a three-dimensional tensor in which each channel holds its own pixel values. All the images must have the same size, corresponding to the 3D feature tensor. However, images are not usually the same size, and neither are their corresponding feature tensors. Most CNNs can only accept images of a fixed size. This creates several problems during data collection and implementation of the model. Resizing the input images before feeding them into the network helps to overcome this constraint.

The images are normalized so that the pixel values lie in the range 0 to 1. As the final layer of the neural network has 2 outputs, with mask and without mask, i.e. a categorical representation, the data is converted to categorical labels.
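
A minimal NumPy sketch of these two steps follows. The batch of random images, its 100 x 100 size, and the label values are placeholders, not the dataset used in this work. The indexing trick with np.eye yields a binary class matrix with one row per image, similar in spirit to what the to_categorical utility in Keras produces.

```python
import numpy as np

# Hypothetical batch of 4 RGB images, 100 x 100 pixels, values 0..255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 100, 100, 3), dtype=np.uint8)
labels = np.array([0, 1, 1, 0])  # 0 = with mask, 1 = without mask

# Scale the pixel values into the range [0, 1].
data = images.astype("float32") / 255.0

# Convert the integer labels into a binary class matrix (one row per image).
targets = np.eye(2, dtype="float32")[labels]
```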

D. Building the model using CNN architecture

CNNs have become dominant in various computer vision tasks. The current method makes use of a Sequential CNN. The first convolution layer is followed by Rectified Linear Unit (ReLU) and max pooling layers.

The convolution layer learns 128 filters. The kernel size is set to 7 x 7, which specifies the height and width of the 2D convolution window. As the model must know the shape of the input it should expect, the first layer in the model needs to be given the input shape. The following layers can infer their shapes automatically. The default padding is “valid”, where the spatial dimensions are allowed to shrink and the input volume is not zero-padded. The activation parameter of the Conv2D class is set to “relu”. ReLU is a nearly linear function, so it keeps the properties that make linear models easy to optimize with gradient-descent methods. In terms of performance and generalization in deep learning, it compares favorably with other activation functions. Max pooling is used to reduce the spatial dimensions of the output volume. Pool_size is set to 7 x 7. To reduce overfitting, a Dropout layer with a 50% chance of setting inputs to zero is added to the model. The final layer (Dense), with two outputs for the two categories, uses the Softmax activation function.
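
The effect of “valid” padding and pooling on the spatial dimensions, and the role of Softmax in the final layer, can be checked with a short sketch. The input side length of 100 pixels and the example scores are assumptions, since the text does not state them; the pooling stride is assumed to equal the pool size, which is the usual default.

```python
import math


def conv_output_size(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one dimension."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Output length of max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


side = 100  # assumed input height and width; not stated in the text
after_conv = conv_output_size(side, kernel=7)      # 7 x 7 kernel, no padding
after_pool = pool_output_size(after_conv, pool=7)  # 7 x 7 pooling window
probabilities = softmax([2.0, 0.5])  # two outputs: with mask, without mask
```

Under these assumptions, a 100 x 100 input shrinks to 94 x 94 after the convolution and to 13 x 13 after pooling.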

E. Splitting the data and training the CNN model

After setting the blueprint to analyze the data, the model needs to be trained on one dataset and then tested against a different one. A proper model and an optimized train_test_split help to produce accurate predictions. The test_size is set to 0.2, i.e. 80% of the dataset is used for training and the remaining 20% for testing. The validation loss is monitored using ModelCheckpoint. Next, the images in the training set and the test set are fitted to the Sequential model. Here, 20% of the training data is used as validation data. The model is trained for 20 epochs (iterations), which maintains a trade-off between accuracy and the chance of overfitting.

With our deep learning models now in memory, our next step is to load and pre-process an input image. Upon loading the image from disk, we make a copy and grab the frame dimensions for future scaling and display purposes. Pre-processing is handled by OpenCV's blobFromImage function. Through its parameters, we resize to 224×224 pixels and perform mean subtraction.
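
The mean subtraction step can be sketched in NumPy. This is not the OpenCV function itself: the helper to_blob is hypothetical, it skips the resizing, and the mean values are placeholders, not the ones used by the authors. It only shows how a per-channel mean is removed and how the image is rearranged into the (batch, channels, height, width) layout that a blob uses.

```python
import numpy as np


def to_blob(image, mean):
    """Subtract a per-channel mean and reorder to (batch, channels, H, W)."""
    blob = image.astype("float32") - np.array(mean, dtype="float32")
    blob = np.transpose(blob, (2, 0, 1))  # (H, W, C) -> (C, H, W)
    return blob[np.newaxis, ...]          # add the batch dimension


# Hypothetical frame already resized to 224 x 224; the mean values are
# placeholders chosen for illustration.
frame = np.full((224, 224, 3), 128, dtype=np.uint8)
blob = to_blob(frame, mean=(104.0, 177.0, 123.0))
```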

Once we know where each face is predicted to be, we ensure that the detections meet the confidence threshold before we extract the face ROI. We then compute the bounding box values for each face and ensure that the box falls within the boundaries of the image.
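
The filtering and clipping described here can be sketched as a small function. The detection format (a confidence followed by box corners given as fractions of the frame size), the threshold of 0.5, and the name select_boxes are assumptions made for illustration; the actual detector output may be laid out differently.

```python
def select_boxes(detections, frame_w, frame_h, min_confidence=0.5):
    """Keep confident detections and return pixel boxes clipped to the frame.

    Each detection is (confidence, x0, y0, x1, y1), with coordinates given
    as fractions of the frame size.
    """
    boxes = []
    for confidence, x0, y0, x1, y1 in detections:
        if confidence < min_confidence:
            continue  # below the threshold: ignore this detection
        # Scale to pixels, then clamp to the image boundaries.
        start_x = max(0, int(x0 * frame_w))
        start_y = max(0, int(y0 * frame_h))
        end_x = min(frame_w - 1, int(x1 * frame_w))
        end_y = min(frame_h - 1, int(y1 * frame_h))
        boxes.append((start_x, start_y, end_x, end_y))
    return boxes


# Hypothetical detections for a 400 x 300 frame.
detections = [
    (0.98, 0.25, 0.25, 0.5, 0.75),     # confident, inside the frame
    (0.25, 0.125, 0.125, 0.25, 0.25),  # too uncertain, dropped
    (0.75, -0.125, 0.75, 0.25, 1.25),  # confident, partly outside the frame
]
boxes = select_boxes(detections, frame_w=400, frame_h=300)
```

The second detection is discarded, and the third is clamped so that its box stays inside the frame.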

Next, we run the face ROI through our mask detection model. In this block, we:

Extract the face ROI using NumPy slicing
Preprocess the ROI the same way we did during training
Perform mask detection to predict with mask or without mask

From here, we annotate and display the result. First, we determine the class label based on the probabilities returned by the mask detector model and assign an associated color for the annotation: “green” for with mask and “red” for without mask. We then draw the label text (including class and probability), as well as a bounding box rectangle for the face, using OpenCV drawing functions.
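
The choice of label and color can be sketched as a small helper. The function name annotate and the exact text format are assumptions; the colors are written as (blue, green, red) tuples because that is the channel order OpenCV uses. The returned values would then typically be passed to drawing calls such as cv2.putText and cv2.rectangle.

```python
GREEN = (0, 255, 0)  # OpenCV colors are in BGR order
RED = (0, 0, 255)


def annotate(mask_prob, no_mask_prob):
    """Return the label text and the box color for one face."""
    if mask_prob > no_mask_prob:
        label, color = "with mask", GREEN
    else:
        label, color = "without mask", RED
    # Include the winning probability as a percentage.
    text = f"{label}: {max(mask_prob, no_mask_prob) * 100:.2f}%"
    return text, color


text, color = annotate(0.75, 0.25)
```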

VII. EXPECTED RESULT

As the graph obtained from training the model shows, the accuracy of the model increases as the number of epochs increases, and when we use the model we get an accuracy of up to 95% in determining whether or not the person is wearing a mask.

In the proposed project, as discussed above, the bounding box and its annotation are visible in the figure below, and the model detects whether or not the person is wearing a mask.

A. Application

The Face Mask Detection System can be used at airports to detect travelers without masks.
Using the Face Mask Detection System, hospitals can monitor whether their staff are wearing masks during their shifts.
The Face Mask Detection System can be used at office premises to detect whether employees are maintaining safety standards at work.

VIII. FUTURE OBJECTIVE

Integrate the proposed model with public CCTV cameras to detect people not wearing masks in public places. For this, a wide-ranging dataset is needed to train the model. Integrate the proposed model with thermal sensor cameras to detect both the temperature of the person in the frame and whether or not the person is wearing a mask.

Conclusion

The pandemic is not yet over, but the government has lifted the lockdown, and schools, colleges and offices will soon reopen. In these times, we have to constantly monitor whether everybody is wearing a mask and practicing social distancing. With a few modifications, the proposed project can help battle COVID-19 by monitoring people in public places, and thus reduce the spread of this dangerous virus.

References

[1] Arjya Das, Mohammad Wasif Ansari - https://ieeexplore.ieee.org/document/9342585
[2] Samuel Adu Sanjaya, Suryo Adi Rakhmawan - https://ieeexplore.ieee.org/document/9325631
[3] Adrian Rosebrock - https://www.pyimagesearch.com/2020/05/04/covid-19-face-mask-detector-with-opencv-keras-tensorflow-and-deep-learning
[4] Guru Charan MK - https://towardsdatascience.com/covid-19-face-mask-detection-using--tensorflow-and-opencv-702dd833515b

Copyright

Copyright © 2022 Chandan S, Lohith S, Yamini G B, Nithin Gowda, Shruthi N. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET43727

Publish Date : 2022-06-02

ISSN : 2321-9653

Publisher Name : IJRASET
