According to WHO statistics, nearly 27 crore (270 million) people have been affected by COVID-19, including over 5 lakh (500,000) deaths. A disease is classified as a pandemic when it spreads across the world; COVID-19 spreads through contact between infected people and others. Therefore, to contain the spread of the virus we require an effective monitoring system that observes people in public places. This paper proposes an approach for social distancing detection using deep learning, which estimates the distance between people in order to reduce the spread of the coronavirus. The detection tool was developed to assess a video feed and show the violations. Video captured using a camera is given as input, and the open-source pre-trained object detection model YOLOv3 (with transfer learning) is used for detection, because experimental analysis shows that YOLOv3 gives reliable results, with better mAP and FPS scores, for detecting people and objects in real time. Using this system, we can help ensure that people follow the social distancing rules.
I. INTRODUCTION
It is essential to practise social distancing during the current world epidemic. Keeping a certain distance from other people reduces your chances of coming into contact with them, which in turn slows the transmission of diseases such as the coronavirus. COVID-19, also known as corona, was first identified in Wuhan, China, in late December 2019 [1-2]. It can be difficult and time-consuming to precisely calculate the distance between individuals [10] in any setting. Therefore, it is the responsibility of every member of society to abide by the social distancing laws prescribed by governing organisations in order to safeguard the entire community. A social distancing detector that helps the respective bodies determine whether people are complying with the social distancing standards can be built [11] to support government efforts [3] and the provided public health guidelines [24].
Table 1: Comparison of different object detection models

Model         TT (in sec)   mAP     NoI     TL     FPS
SSD           9651          0.969   12135   0.02   3
Faster RCNN   2124          0.691   1200    0.22   10
YOLOv3        5659          0.846   7560    0.87   23
From Table 1 we can observe that, although SSD attains the highest mAP, YOLOv3 offers the best overall trade-off between accuracy (mAP) and speed (FPS) when compared to SSD and Faster RCNN [4].
II. PROPOSED APPROACH
The proposed model makes use of the YOLO v3 neural network, a fully convolutional neural network technique, to identify persons in video frames [5-7]. Input frames from pictures or videos taken by cameras are then fed to YOLO v3. Following that, the system presents the findings for the specific video or photographs. Trained on the COCO dataset, YOLO is capable of detecting 80 COCO object classes, including people, bicycles, cars, motorcycles, and more. As is customary for object detectors, the features learnt by the convolutional layers [8] are handed on to a classifier, which predicts the detection. In YOLO, a convolutional layer [14-15] with 1x1 convolutions serves as the foundation for the prediction. The YOLO method, which stands for "you only look once", makes its predictions with 1x1 kernels, which means that the prediction map has the same spatial size as the feature map that precedes it. YOLOv3 currently uses Darknet-53, a backbone also created by the YOLO authors Joseph Redmon and Ali Farhadi [10, 16, 24]. YOLO has three versions: v1, v2, and v3. The first version, YOLO v1, is made up of 24 convolutional layers and 2 fully connected layers, and has only one reduction layer. Next, YOLO v2 was developed, which has better accuracy [10]. YOLO v2 makes use of Darknet-19, a backbone network made up of 19 convolution layers [12]; five max-pooling layers, an output softmax layer, and a layer for object categorisation are also included.
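The person-filtering step described above can be sketched in pure NumPy. This is a minimal illustration, not the paper's exact code: it assumes the common YOLOv3 output layout of one row per candidate box, `[cx, cy, w, h, objectness, 80 class scores]`, with "person" at COCO class index 0; the synthetic detections are made up for the example.

```python
import numpy as np

PERSON_CLASS = 0  # "person" is class index 0 in the COCO label list

def filter_people(preds, conf_threshold=0.5):
    """Keep YOLO detections whose best class is 'person'.

    preds: (N, 85) array -- [cx, cy, w, h, objectness, 80 class scores]
    Returns an (M, 5) array of [cx, cy, w, h, confidence] rows.
    """
    class_scores = preds[:, 5:]
    class_ids = class_scores.argmax(axis=1)
    # Detection confidence = objectness * best class score
    confidences = preds[:, 4] * class_scores.max(axis=1)
    keep = (class_ids == PERSON_CLASS) & (confidences > conf_threshold)
    return np.hstack([preds[keep][:, :4], confidences[keep, None]])

# Two synthetic detections: a confident person and a confident car (class 2)
preds = np.zeros((2, 85))
preds[0, :5] = [100, 200, 50, 120, 0.9]; preds[0, 5 + PERSON_CLASS] = 0.95
preds[1, :5] = [300, 220, 80, 60, 0.9];  preds[1, 5 + 2] = 0.95
people = filter_people(preds)
print(people.shape)  # only the person row survives: (1, 5)
```

In a full pipeline the kept boxes would additionally be passed through non-maximum suppression before distance checking.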
"The performance of the next version, YOLO v2, was significantly better than that of YOLO v1, thanks to improvements in mAP, FPS, and the object categorisation score. Instead of utilising softmax [9] as it did in YOLO v1 and v2, YOLO v3 performs multi-label classification with the aid of logistic classifiers." [10] Darknet-53 was the backbone architecture Redmon suggested for YOLO v3 for extracting feature maps for classification. Darknet-53, in contrast to "Darknet-19, is made up of residual blocks (shortcut links) and upsampling layers for concatenation and further network depth." [10] The issue of not being able to detect small objects effectively was addressed in YOLO v3 [13], which makes three predictions for each position at different scales of an image. 1) Loss function: The overall loss function of YOLO v3 is the sum of the localisation loss, the confidence loss, and the classification loss.
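As a schematic sketch (not the authors' exact formulation), the loss decomposition above can be written as the sum of three terms, with the classification term expressed as the per-class logistic (binary cross-entropy) loss that YOLO v3 uses in place of softmax:

```latex
\mathcal{L}_{\text{YOLOv3}}
  = \mathcal{L}_{\text{loc}} + \mathcal{L}_{\text{conf}} + \mathcal{L}_{\text{cls}},
\qquad
\mathcal{L}_{\text{cls}}
  = -\sum_{i}\mathbb{1}_{i}^{\text{obj}}
     \sum_{c=1}^{C}\Big[y_{i,c}\log\hat{p}_{i,c}
       + (1-y_{i,c})\log\big(1-\hat{p}_{i,c}\big)\Big]
```

Here the indicator selects grid cells responsible for an object, y_{i,c} in {0,1} marks the ground-truth labels, and the predicted probabilities are sigmoid class outputs; the localisation and confidence terms penalise box-coordinate and objectness errors respectively.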
III. IMPLEMENTATION
The sequence diagram below describes how the proposed model starts by obtaining the video or images, followed by the object detection model, YOLO v3. First, the object detection model starts; using the pre-trained weights, it processes a video captured by a camera. The view is then transformed into a top-down perspective so that distances can be measured on a 2D ground plane [19]. This is followed by checking the social distance between all the people in the captured video. The frames obtained from the recorded video are then shown with an indication of whether the people are following the COVID rules or not.
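The top-down transform step can be sketched as follows. This is a minimal NumPy illustration under assumed inputs: a homography is estimated from four hand-picked image points of a floor region and their known ground-plane positions (the coordinates below are hypothetical), and detected people's foot points are then projected onto that plane. OpenCV's `cv2.getPerspectiveTransform` computes the same matrix.

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 perspective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)  # fix H[2,2] = 1

def to_top_down(H, points):
    """Project image points (e.g. feet of detected people) onto the ground plane."""
    pts = np.hstack([np.asarray(points, float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale

# Hypothetical calibration: four pixel corners of a floor region -> metres
src = [(320, 400), (960, 400), (1180, 700), (100, 700)]   # pixels
dst = [(0, 0), (8, 0), (8, 6), (0, 6)]                    # metres
H = homography(src, dst)
print(to_top_down(H, [(640, 550)]))  # a person's feet in ground-plane metres
```

Distances measured between the resulting ground-plane coordinates are in real-world units, so a fixed threshold (e.g. 2 m) can be applied regardless of where a person stands in the image.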
IV. RESULTS
Here are a few images of the working model's results. In the first step, the video input is used to identify pedestrians; red and blue coloured boxes are then used to signal violations and compliant detections respectively, and the overall count is shown in the bottom-right corner. Furthermore, when compared with other detection models such as SSD, YOLO v3 achieved far better overall results, with a better balance of mAP, training time, and FPS scores. This model can be used further to monitor social distancing violations.
V. CONCLUSION
This model proposes an effective object recognition and tracking approach based on real-time deep learning methods, in which people are detected in real time with the use of bounding boxes, to automate the process of monitoring social distance. Bounding boxes are constructed in order to find groups of persons that satisfy the closeness property, computed using a pairwise vectorised technique. The number of violations is calculated by counting the groups that are present, and the violation index is calculated as the proportion of individuals to groups. The modern object detection methods Faster RCNN, SSD, and YOLO v3 were compared in extensive trials, and YOLO v3 demonstrated effective performance with good FPS and mAP scores. Since the method is extremely sensitive to the spatial placement of the camera, it may be fine-tuned to better adapt to the corresponding field of view.
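The pairwise vectorised closeness check and the violation index described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the 2-metre threshold and the top-down coordinates are assumed, and groups are formed by merging any pair of people closer than the threshold.

```python
import numpy as np

def violation_stats(points, min_dist=2.0):
    """Group people closer than min_dist (ground-plane units) and compute
    the violation index = number of individuals / number of groups."""
    pts = np.asarray(points, float)
    n = len(pts)
    # Vectorised pairwise Euclidean distance matrix, ignoring self-distances
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    close = (dist < min_dist) & ~np.eye(n, dtype=bool)

    # Union-find: merge people linked by a too-close pair into one group
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if close[i, j]:
                parent[find(i)] = find(j)

    groups = len({find(i) for i in range(n)})
    violators = int(close.any(axis=1).sum())
    return violators, groups, n / groups

# Hypothetical top-down coordinates in metres: two people close, one isolated
print(violation_stats([(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]))  # (2, 2, 1.5)
```

An index of 1.0 means everyone stands alone (no violations); larger values indicate denser clustering.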
REFERENCES
[1] W. H. Organization, "WHO corona-viruses (COVID-19)," https://www.who.int/emergencies/diseases/novel-corona-virus-2019, 2020. [Online; accessed May 02, 2020].
[2] WHO, "WHO director-general's opening remarks at the media briefing on COVID-19, 11 March 2020," https://www.who.int/dg/speeches/detail/, 2020. [Online; accessed March 12, 2020].
[3] D. Cunado, M. S. Nixon, and J. N. Carter, “Using gait as a biometric, via phase-weighted magnitude spectra” in International Conference on Audio-and Video-Based Biometric Person Authentication. Springer, 1997, pp. 93–102
[4] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in 2017 IEEE international conference on image processing (ICIP). IEEE, 2017, pp. 3645–3649.
[5] N. Wojke and A. Bewley, “Deep cosine metric learning for person re-identification,” in 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, 2018, pp. 748–756.
[6] D. Cunado, M. S. Nixon, and J. N. Carter, "Using gait as a biometric, via phase-weighted magnitude spectra," in International Conference on Audio-and Video-Based Biometric Person Authentication. Springer, 1997, pp. 93–102.
[7] A. Samal and P. A. Iyengar, “Automatic recognition and analysis of human faces and facial expressions: A survey,” Pattern recognition, vol. 25, no. 1, pp. 65–77, 1992.
[8] C. Schuldt, I. Laptev, and B. Caputo, “Recognizing human actions: a local svm approach,” in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., vol. 3. IEEE, 2004,pp. 32–36.
[9] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” Acm computing surveys (CSUR), vol. 38, no. 4, pp. 13–es, 2006.
[10] M. Andriluka, S. Roth, and B. Schiele, “People-tracking-by-detection and people detection-by-tracking,” in 2008 IEEE Conference on computer vision and pattern recognition. IEEE, 2008, pp. 1–8.
[11] N. S. Punn, S. K. Sonbhadra, S. Agarwal, and G. Rai, "Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLOv3 and Deepsort techniques," arXiv preprint, 2020. [Online]. Available: https://arxiv.org/abs/2005.01385
[12] M. Robakowska, A. Tyranska-Fobke, J. Nowak, D. Slezak, P. Zuratynski, P. Robakowski, K. Nadolny, and J. R. Ładny, "The use of drones during mass events," Disaster and Emergency Medicine Journal, vol. 2, no. 3, pp. 129–134, 2017.
[13] N. Sulman, T. Sanocki, D. Goldgof, and R. Kasturi, “How effective is human video surveillance performance?” in 2008 19th International Conference on Pattern Recognition. IEEE, 2008, pp. 1–3.
[14] X. Wang, “Intelligent multi-camera video surveillance: A review,” Pattern recognition letters, vol. 34, no. 1, pp. 3–19, 2013.
[15] O. Javed and M. Shah, “Tracking and object classification for automated surveillance,” in European Conference on Computer Vision. Springer, 2002, pp. 343–357.
[16] K. A. Joshi and D. G. Thakore, “A survey on moving object detection and tracking in video surveillance system,” International Journal of Soft Computing and Engineering, vol. 2, no. 3, pp. 44–48, 2012.
[17] N. Wojke and A. Bewley, “Deep cosine metric learning for person re-identification,” in 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, 2018, pp. 748–756.
[18] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L. Shyu, S.-C. Chen, and S. Iyengar, “A survey on deep learning: Algorithms, techniques, and applications,” ACM Computing Surveys (CSUR), vol. 51,no. 5, pp. 1–36, 2018.
[19] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer vision and deep learning techniques for pedestrian detection and tracking: A survey,” Neurocomputing, vol. 300, pp. 17–33, 2018.
[20] N. S. Punn and S. Agarwal, “Crowd analysis for congestion control early warning system on foot over bridge,” in 2019 Twelfth International Conference on Contemporary Computing (IC3). IEEE, 2019, pp. 1–6.
[21] Pias, “Object detection and distance measurement,”https://github.com/paulpias/object-Detection-and-DistanceMeasurement, 2020, [Online; accessed 01 March-2020].
[22] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263–7271.
[23] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A.C.Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.
[24] J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," arXiv preprint, 2018.
[25] G. Mathurkar, C. Parkhi, M. Utekar, and P. H. Chitte, "Ensuring social distancing using machine learning," published by EDP Sciences, 2021.