In recent years, deep learning has had a significant impact on how the world is adapting to artificial intelligence. Region-based Convolutional Neural Networks (R-CNN), Faster R-CNN, Single Shot Detector (SSD), and You Only Look Once (YOLO) are among the well-known object detection algorithms. Of these, Faster R-CNN and SSD offer better accuracy, while YOLO performs better when speed is prioritized over accuracy. This work combines SSD with MobileNets to perform detection and tracking efficiently. The combined method detects objects effectively without sacrificing speed.
I. INTRODUCTION
Since AlexNet took the research community by storm at the 2012 ImageNet Large Scale Visual Recognition Challenge, deep learning has far outperformed the conventional computer vision techniques used in the literature for detection. Convolutional neural networks stand out in computer vision for classifying images.
The basic block diagram of detection and tracking is shown in Fig. 1. In this work, detection and tracking algorithms based on SSD and MobileNets are implemented in a Python environment. Object detection involves locating the region of interest of an object from a given class within an image.
Common detection methods include frame differencing, optical flow, and background subtraction, each of which detects and locates a moving object with the help of a camera. For security applications, detection and tracking algorithms are described in terms of the features extracted from images and video [3] [7] [8]. CNNs and deep learning are used to extract features [9]. Classifiers are used for image classification and counting [6]. A YOLO-based approach combined with a GMM model is reported to give high accuracy for feature extraction and classification [10]. The SSD and MobileNets algorithms are described in Section II, the implementation strategy is explained in Section III, and the simulation results and analysis are discussed in Section IV.
II. ALGORITHMS FOR OBJECT DETECTION AND TRACKING
A. Single Shot Detector (SSD) Algorithm
SSD is a well-known object detection algorithm developed at Google Inc. [1]. It is built on the VGG-16 architecture, which makes SSD simple and easy to implement.
The VGG-16-based SSD model is depicted in Fig. 2. A set of default boxes is evaluated convolutionally over several feature maps. During prediction, a score is produced for each object class that may be present in a default box, and the box is adjusted to better fit the shape of the object. Shape offsets and confidence scores are thus predicted for each box.
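As a rough illustration of how predicted shape offsets adjust a default box, the sketch below uses the center-size parameterization described in the SSD paper [1]. The function names `encode_box` and `decode_box` and the sample boxes are hypothetical, and the variance scaling that many implementations add is omitted.

```python
import math

def encode_box(default_box, target_box):
    """Offsets that map a default box onto a target box; boxes are (cx, cy, w, h)."""
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = target_box
    return ((t_cx - d_cx) / d_w,      # center shift, relative to default width
            (t_cy - d_cy) / d_h,      # center shift, relative to default height
            math.log(t_w / d_w),      # log-scale change in width
            math.log(t_h / d_h))      # log-scale change in height

def decode_box(default_box, offsets):
    """Apply predicted offsets to a default box (inverse of encode_box)."""
    d_cx, d_cy, d_w, d_h = default_box
    o_cx, o_cy, o_w, o_h = offsets
    return (d_cx + o_cx * d_w,
            d_cy + o_cy * d_h,
            d_w * math.exp(o_w),
            d_h * math.exp(o_h))

# Illustrative values only (normalized image coordinates).
default_box = (0.5, 0.5, 0.2, 0.4)
ground_truth = (0.55, 0.45, 0.3, 0.3)
offsets = encode_box(default_box, ground_truth)
recovered = decode_box(default_box, offsets)
```

Zero offsets leave the default box unchanged, so the network only has to learn corrections relative to each default box.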
During training, the default boxes are matched to the ground truth boxes. The SSD design discards the fully connected layers. The model loss is a weighted sum of the confidence loss and the localization loss. The localization loss measures the mismatch between the predicted box and the ground truth box.
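The matching step and the weighted loss can be sketched as follows. This is a simplified sketch rather than the training code used in this work: it assumes the 0.5 overlap threshold, smooth L1 localization loss, and softmax confidence loss from the SSD paper [1], it scores matched (positive) boxes only, and all function names are hypothetical.

```python
import math

def iou(a, b):
    """Jaccard overlap of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Map each default box index to its best ground truth index if IoU > threshold."""
    matches = {}
    for i, d in enumerate(default_boxes):
        best_j, best_iou = None, threshold
        for j, g in enumerate(gt_boxes):
            overlap = iou(d, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matches[i] = best_j
    return matches

def smooth_l1(x):
    """Smooth L1 penalty: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def localization_loss(pred_offsets, target_offsets):
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))

def confidence_loss(logits, true_class):
    """Softmax cross-entropy for one box, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]

def ssd_loss(positives, alpha=1.0):
    """positives: list of (logits, true_class, pred_offsets, target_offsets)."""
    n = len(positives)
    if n == 0:
        return 0.0  # no matched boxes, so the loss is defined as zero
    conf = sum(confidence_loss(p[0], p[1]) for p in positives)
    loc = sum(localization_loss(p[2], p[3]) for p in positives)
    return (conf + alpha * loc) / n
```

The weight `alpha` balances the two terms. A full implementation would also include unmatched (negative) boxes in the confidence term, typically after hard negative mining.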
The confidence loss measures how confident the system is that a predicted object is the actual object. Because SSD eliminates feature resampling and encapsulates all computation in a single network, it is straightforward to train with MobileNets. SSD is faster than YOLO and, unlike methods such as Faster R-CNN, does not require explicit region proposals and pooling.
III. IMPLEMENTATION METHODS
A. Detection of Objects
Frame Differencing: The camera captures frames at regular intervals, and the difference between consecutive frames is estimated.
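A minimal sketch of frame differencing on grayscale frames is given below. Frames are represented as plain nested lists to keep the example dependency-free; the threshold value and the helper names are assumptions, not settings from this work.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Binary motion mask: 1 where the intensity change exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]

def bounding_box(mask):
    """Smallest (x_min, y_min, x_max, y_max) enclosing all motion pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

# Two 4x4 frames: a bright object appears in column 2, rows 1 and 2.
previous = [[0] * 4 for _ in range(4)]
current = [[0] * 4 for _ in range(4)]
current[1][2] = 200
current[2][2] = 200

mask = frame_difference(previous, current)
box = bounding_box(mask)
```

In an OpenCV pipeline the same idea would normally be applied to NumPy arrays read from the camera rather than nested lists.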
Optical Flow: This technique estimates the optical flow field with an optical flow algorithm. A local mean algorithm is then used to enhance it, and a self-adaptive algorithm filters the noise. The method adapts well to the number and size of the objects and helps avoid time-consuming and complicated preprocessing.
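The text does not name a specific optical flow algorithm, so the following sketch swaps in a single-window Lucas-Kanade estimate as one possible choice. It returns one displacement for the whole patch; the local mean enhancement and adaptive noise filtering mentioned above are not shown, and the function name is hypothetical.

```python
def lucas_kanade(frame1, frame2):
    """Estimate one (u, v) displacement in pixels for a whole grayscale patch."""
    height, width = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Spatial gradients: central differences averaged over both frames.
            ix = (frame1[y][x + 1] - frame1[y][x - 1]
                  + frame2[y][x + 1] - frame2[y][x - 1]) / 4.0
            iy = (frame1[y + 1][x] - frame1[y - 1][x]
                  + frame2[y + 1][x] - frame2[y - 1][x]) / 4.0
            # Temporal gradient between the two frames.
            it = frame2[y][x] - frame1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # no texture or aperture problem: flow is not determined
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic test pattern shifted one pixel to the right between frames.
frame1 = [[x * x + y * y for x in range(8)] for y in range(8)]
frame2 = [[(x - 1) ** 2 + y * y for x in range(8)] for y in range(8)]
u, v = lucas_kanade(frame1, frame2)
```

A dense flow field would repeat this estimate in a small window around every pixel, which is what library implementations do far more efficiently.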
Background Subtraction: Background subtraction (BS) is a rapid method of localizing moving objects in a video captured by a stationary camera. It forms the primary step of a multi-stage vision system. This type of processing separates the foreground object from the background in a sequence of images.
Fig. 3 shows the detection of a person using background subtraction. The foreground, or subject, of the image is detected and separated from the background for further preprocessing. The separation is shown step by step, after which the region of interest is localized.
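One simple way to realize background subtraction is a running-average background model, sketched below. This is an assumed variant chosen for brevity, not necessarily the model used here; the class name, learning rate, and threshold are hypothetical.

```python
class RunningAverageBackground:
    """Per-pixel running-average background model for a stationary camera."""

    def __init__(self, first_frame, learning_rate=0.05, threshold=30):
        self.background = [[float(v) for v in row] for row in first_frame]
        self.learning_rate = learning_rate
        self.threshold = threshold

    def apply(self, frame):
        """Return a binary foreground mask and update the background model."""
        lr = self.learning_rate
        mask = []
        for bg_row, row in zip(self.background, frame):
            mask_row = []
            for x, value in enumerate(row):
                is_foreground = abs(value - bg_row[x]) > self.threshold
                mask_row.append(1 if is_foreground else 0)
                if not is_foreground:
                    # Adapt slowly to lighting changes in static regions only.
                    bg_row[x] = (1 - lr) * bg_row[x] + lr * value
            mask.append(mask_row)
        return mask

# A 3x3 scene with uniform background; an object appears at the center pixel.
first = [[10] * 3 for _ in range(3)]
model = RunningAverageBackground(first)
frame = [[10] * 3 for _ in range(3)]
frame[1][1] = 200   # foreground object
frame[0][0] = 20    # small lighting change, still background
mask = model.apply(frame)
```

Updating only the static pixels keeps a stationary foreground object from being absorbed into the background too quickly.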
B. Tracking of Objects
The goal is to track the path and speed of an object in video feeds from surveillance cameras and other security systems. The rate of real-time detection can be increased by employing object tracking and running classification on only a few frames captured at fixed intervals. Object detection can run at a slow frame rate while looking for objects to lock onto. Once those objects are detected and locked, object tracking can run at a faster frame rate.
Fig. 4 depicts a car being tracked. In this example, the object can be tracked in two ways. (1) Tracking in a sequence of detections: a CCTV video sequence of moving traffic is recorded. To track the movement of a car or a person in the scene, snapshots or frames are taken at different instants of time. An object such as a car or a person is targeted in these images and then tracked by examining how it has moved across the frames. The displacement of the object can be verified from frames captured at different instants, and its velocity can then be calculated. (2) Detection with dynamics: this improved approach estimates the trajectory or movement of the car. Its position is determined at a particular time "t" and predicted for a later time, say "t+10". From the actual image, an estimated image of the car at time "t+10" can then be proposed.
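Both approaches can be sketched with a few lines of arithmetic. The example below assumes constant velocity between frames and uses made-up bounding boxes and timestamps; the helper names are hypothetical.

```python
def centroid(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def velocity(p0, t0, p1, t1):
    """Average velocity (pixels per time unit) from two timed positions."""
    dt = t1 - t0
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)

def predict(position, vel, dt):
    """Constant-velocity prediction of the position dt time units ahead."""
    return (position[0] + vel[0] * dt, position[1] + vel[1] * dt)

# (1) Tracking in a sequence of detections: the same car detected at t=0 and t=5.
p0 = centroid((0, 0, 10, 10))      # (5.0, 5.0)
p1 = centroid((20, 10, 30, 20))    # (25.0, 15.0)
v = velocity(p0, 0, p1, 5)         # displacement divided by elapsed time

# (2) Detection with dynamics: predict where the car will be 10 time units later.
p_future = predict(p1, v, 10)
```

The predicted position can be used to narrow the search region in the later frame, so the detector does not have to scan the whole image.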
IV. SIMULATION RESULTS AND ANALYSIS
A Python program based on the SSD algorithm was developed and implemented in OpenCV [5]. OpenCV is run in an Ubuntu environment. The model was trained on a total of 21 object classes. After successfully scanning, detecting, and tracking objects in the video sequence provided by the camera, the following results were obtained.
Real-time detection of a bicycle, bus, train, and dog is shown in Figs. 6 to 8, with confidence levels of 86%, 68%, 90%, and 77%, respectively. The model was trained to detect 21 object classes, such as dog, motorbike, person, potted plant, bird, car, cat, sofa, sheep, bottle, chair, airplane, train, and bicycle, with an accuracy of about 90%.
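Detections such as these are usually filtered by a minimum confidence before they are drawn on the frame. The sketch below reuses the four confidence values reported above; the 0.7 threshold and the function names are assumptions for illustration only.

```python
def filter_detections(detections, min_confidence=0.7):
    """Keep (label, confidence) pairs at or above the threshold, best first."""
    kept = [d for d in detections if d[1] >= min_confidence]
    return sorted(kept, key=lambda d: d[1], reverse=True)

def format_label(label, confidence):
    """Text drawn next to a bounding box, e.g. 'dog: 77%'."""
    return f"{label}: {confidence:.0%}"

# Confidence values reported for the four example detections.
detections = [("bicycle", 0.86), ("bus", 0.68), ("train", 0.90), ("dog", 0.77)]

kept = filter_detections(detections)
labels = [format_label(label, conf) for label, conf in kept]
```

With this assumed threshold the bus detection at 68% would be discarded. Lowering the threshold keeps more detections at the cost of more false positives.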
V. CONCLUSION
Objects are detected in real-time scenarios using the SSD algorithm, and SSD has produced results with a high level of confidence. The main objective of the SSD algorithm is to detect multiple objects in a real-time video sequence and to track them. The trained model showed good detection and tracking results, and it can be applied in other scenarios to detect, track, and respond to targeted objects in video surveillance. Real-time analysis of the environment in this way can yield useful results by supporting security, order, and utility for any organization. Future work includes extending the approach to detect weapons and ammunition in order to raise an alarm in the event of terrorist attacks.
References
[1] Wei Liu and Alexander C. Berg, “SSD: Single Shot MultiBox Detector”, Google Inc., Dec 2016.
[2] Andrew G. Howard, and Hartwig Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, Google Inc., 17 Apr 2017.
[3] Justin Lai, Sydney Maples, “Ammunition Detection: Developing a Real-Time Gun Detection Classifier”, Stanford University, Feb 2017.
[4] Shreyamsh Kamate, “UAV: Application of Object Detection and Tracking Techniques for Unmanned Aerial Vehicles”, Texas A&M University, 2015.
[5] Adrian Rosebrock, “Object detection with deep learning and OpenCV”, PyImageSearch.
[6] Mohana and H. V. R. Aradhya, “Elegant and efficient algorithms for real time object detection, counting and classification for video surveillance applications from single fixed camera”, 2016 International Conference on Circuits, Controls, Communications and Computing (I4C), Bangalore, 2016, pp. 1-7.
[7] Akshay Mangawati, Mohana, Mohammed Leesan, H. V. Ravish Aradhya, “Object Tracking Algorithms for video surveillance applications” International conference on communication and signal processing (ICCSP), India, 2018, pp. 0676-0680.
[8] Apoorva Raghunandan, Mohana, Pakala Raghav and H. V. Ravish Aradhya, “Object Detection Algorithms for video surveillance applications” International conference on communication and signal processing (ICCSP), India, 2018, pp. 0570-0575.
[9] Manjunath Jogin, Mohana, “Feature extraction using Convolution Neural Networks (CNN) and Deep Learning” 2018 IEEE International Conference On Recent Trends In Electronics Information Communication Technology,(RTEICT) 2018, India.
[10] Arka Prava Jana, Abhiraj Biswas, Mohana, “YOLO based Detection and Classification of Objects in video records” 2018 IEEE International Conference On Recent Trends In Electronics Information Communication Technology,(RTEICT) 2018, India.
[11] Chandan, G., Jain, A., Jain, H., & Mohana. (2018). Real Time Object Detection and Tracking Using Deep Learning and OpenCV. 2018 International Conference on Inventive Research in Computing Applications (ICIRCA).