Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Mr. CH. Vijayananda Ratnam, Pradhyumna Sai Maddigunta, Leela Siddardha Annappureddy, Venkata Krishna Kolli, Saiteja Gurrala
DOI Link: https://doi.org/10.22214/ijraset.2024.59201
As the number of vehicles on the road increases, identifying them quickly and accurately becomes essential for detecting traffic congestion and improving traffic management. This paper applies the DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) multi-target tracking algorithm to vehicle tracking. Because DeepSORT depends strongly on the quality of target detection, a YOLOv8 vehicle detector is used to supply it with accurate and fast detections. DeepSORT introduces deep learning into the SORT algorithm by adding an appearance descriptor, which reduces identity switches and makes tracking more reliable. Its association metric combines motion and appearance information, so objects are tracked not only by their velocity and motion but also by their appearance. The tracker's output is also used to count the vehicles that pass through the video. Together, YOLOv8 and DeepSORT form a vehicle recognition model that detects and follows cars efficiently. The paper reports state-of-the-art results for the YOLOv8 model.
I. INTRODUCTION
Intelligent traffic and vehicle information systems rely on vehicle data recognition. Academic interest in the topic has grown significantly since the start of this decade, driven by advances in digital imaging technology and increases in processing power. With every passing year, the burden on those who oversee a growing population and its supporting infrastructure gets heavier. The number of automobiles and other machines in use has soared, and traffic jams on major roads and in large cities are one visible consequence. The main goal of a traffic monitoring system is to identify moving objects in a video, determine their location and speed, and offer a thorough framework for analyzing traffic conditions. Vehicle detection is a crucial step: it identifies the type of each target object and localizes it inside a video frame. To recognize vehicles precisely in adverse conditions, this research uses a deep learning-based object detection algorithm, specifically YOLOv8. DeepSORT is an improved version of SORT, one of the most popular object-tracking frameworks, and can be used for vehicle tracking. DeepSORT cannot track an object for which YOLOv8 produces no bounding box, so missed detections degrade tracking and cause identity switches. In these situations, an object missed by YOLOv8 is assigned a new identity when it is detected again in subsequent frames. Car counting can be achieved by combining object detection, tracking, and classification to count the objects that meet specific criteria. Vehicles can be counted either by counting the unique vehicles tracked or by analyzing the flow of vehicles across predefined boundaries. Finding and categorizing cars effectively during traffic flow remains challenging, especially in complicated settings with several vehicle models and high density.
II. LITERATURE SURVEY
This literature review covers recent advances in real-time car detection and tracking, focusing on the integration of YOLOv8 (You Only Look Once, version 8) for object detection and DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) for multi-object tracking. YOLOv8 is an object detection algorithm known for its speed and accuracy. YOLO detects objects, including cars, in real time in a single pass over the image. Recent studies (Ali Osman Gökcan, Resul Çöteli, Derya Avcı, 2023) have shown that YOLOv8 is particularly effective at handling occlusions, varying scales, and complex backgrounds, making it a strong choice for real-time car detection. A study by Fuheng Guo and Yi Xu (May 20, 2022) integrated DeepSORT with YOLOv5 and reported improved tracking accuracy and robustness.
Several state-of-the-art object detection and tracking algorithms, including SORT and DeepSORT, were deployed by V. Mandal and Y. Gyamfi in [10] to detect and track different classes of vehicles in their region of interest. They reported that the trackers did not perform ideally at predicting vehicle trajectories, which resulted in ID switches during occlusions. The combination of YOLOv8 and DeepSORT is intended to address the tracking of vehicles with non-linear motion and tracking through occlusions, with a reduced number of ID switches. According to studies, the two algorithms work well together, with YOLOv8 providing precise detection and DeepSORT offering effective tracking across video sequences. Although the YOLOv8-DeepSORT integration represents a significant advancement, limitations remain in handling occlusions, scale variations, and diverse environmental scenarios. In conclusion, YOLOv8 coupled with DeepSORT provides a powerful solution for real-time automobile recognition and tracking. This study aims to provide a comprehensive solution that integrates accurate detection, robust tracking, and reliable counting of vehicles in real-time scenarios.
III. PROPOSED SYSTEM
A. YOLOv8
YOLOv8 (You Only Look Once, version 8) is an object detection algorithm with modifications such as spatial attention, feature fusion, and context aggregation. It is a convolutional neural network (CNN) with two parts, the backbone and the head. CSPDarknet53 forms the backbone, and the head consists of multiple convolutional layers. The algorithm learns to predict bounding boxes and class probabilities by minimizing a loss function that combines localization and classification errors: as described here, mean squared error measures the localization error and binary cross-entropy is used for class prediction. In the initial step, YOLOv8 divides the input image into n grid cells, and each grid cell is responsible for predicting bounding boxes. These boxes are parameterized by their coordinates (x, y, width, height) relative to the grid cell. A feature pyramid network lets the model detect both large and small objects within an image. Lastly, non-maximum suppression (NMS) filters out redundant or overlapping bounding boxes, using intersection over union (IoU) as its overlap measure.
IoU = Area of Intersection / Area of Union
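The IoU ratio and the NMS filtering step can be illustrated with a minimal sketch. It assumes boxes are given as corner coordinates (x1, y1, x2, y2) and uses a plain greedy NMS; the function names and the 0.5 threshold are illustrative choices, not the paper's implementation or the internals of any YOLOv8 release.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU = 81/119 (about 0.68), which exceeds the threshold, so only the higher-scoring box of that pair survives.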
B. DeepSORT
DeepSORT is an extension of SORT with a deep association metric. After the detection step, a Kalman filter is used for state estimation: from the observed measurements it estimates each target's position, scale (area), and aspect ratio.
The next step is target association, in which existing targets' predicted states are assigned to new detections with the help of the Hungarian algorithm. Whenever objects enter or leave the image, unique identities must be created or destroyed accordingly. For this, a new tracker is created for any detection whose overlap with the existing targets is less than IOUmin. DeepSORT adds a second distance metric based on the appearance of the object, the appearance feature vector.
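The predict and update cycle of the Kalman filter can be sketched in one dimension. This is a simplified illustration under stated assumptions: the state holds only one coordinate and its velocity, whereas the tracker's filter covers the whole bounding box. The class name `Kalman1D` and the noise values `p_var`, `q`, and `r` are hypothetical.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for a single coordinate (illustrative only)."""

    def __init__(self, position, velocity=0.0, p_var=10.0, q=0.01, r=1.0):
        self.x = [position, velocity]              # state: [position, velocity]
        self.P = [[p_var, 0.0], [0.0, p_var]]      # state covariance
        self.q, self.r = q, r                      # process / measurement noise (assumed)

    def predict(self, dt=1.0):
        """Propagate the state with x' = F x and P' = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        p, v = self.x
        (p00, p01), (p10, p11) = self.P
        self.x = [p + dt * v, v]
        a = p00 + dt * p10
        b = p01 + dt * p11
        self.P = [[a + dt * b + self.q, b],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                 # innovation covariance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        y = z - self.x[0]                # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


# A target moving 2 units per frame: the velocity estimate approaches 2.
kf = Kalman1D(position=0.0)
for frame in range(1, 31):
    kf.predict()
    kf.update(2.0 * frame)
print(round(kf.x[1], 2))
```

When a detection is missing for a frame, only `predict` would be called, which is how a filter of this kind carries a track through a short gap.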
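The association step can be sketched as a small assignment problem over a cost that blends motion and appearance. The sketch below makes several assumptions that are not taken from the paper: the motion cost is 1 - IoU, the appearance cost is the cosine distance between feature vectors, the blend `weight` and gate `max_cost` are arbitrary, and a brute-force search over permutations stands in for the Hungarian algorithm (practical only for a handful of objects; `scipy.optimize.linear_sum_assignment` is the usual choice in practice).

```python
import math
from itertools import permutations


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity of two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def associate(tracks, detections, weight=0.5, max_cost=0.7):
    """Match tracks to detections; each item is a dict with 'box' and 'feature'."""
    cost = [[weight * (1.0 - iou(t["box"], d["box"]))
             + (1.0 - weight) * cosine_distance(t["feature"], d["feature"])
             for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    # Enumerate every one-to-one assignment of the smaller side to the larger.
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[t][d] for t, d in pairs))
    # Gate: reject pairs whose cost is too high to be the same object.
    matches = [(t, d) for t, d in best if cost[t][d] <= max_cost]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_t) if t not in matched_t]
    unmatched_detections = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_detections


tracks = [{"box": (0, 0, 10, 10), "feature": [1.0, 0.0]},
          {"box": (50, 50, 60, 60), "feature": [0.0, 1.0]}]
detections = [{"box": (51, 51, 61, 61), "feature": [0.0, 1.0]},
              {"box": (1, 1, 11, 11), "feature": [1.0, 0.0]},
              {"box": (200, 200, 210, 210), "feature": [1.0, 1.0]}]
print(associate(tracks, detections))  # ([(0, 1), (1, 0)], [], [2])
```

Unmatched detections would start new tracks, and tracks that stay unmatched for too long would be deleted, which corresponds to creating and destroying identities as objects enter and leave the image.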
IV. METHODOLOGY
A. Dataset Structure
The COCO dataset is divided into three subsets, each serving distinct purposes in training and evaluating computer vision models. The "Train2017" subset comprises 118,000 images and is utilized for training models in object detection, segmentation, and captioning. The "Val2017" subset, consisting of 5,000 images, is specifically designated for validation during the training process, helping practitioners assess the model's performance on unseen data.
The "Test2017" subset, comprising 20,000 images, serves as a benchmark for evaluating trained models. It is crucial to note that ground truth annotations for the Test2017 subset are not publicly available. Instead, model results are submitted to the COCO evaluation server for performance assessment.
B. Model Training
The model training method starts with initializing the YOLOv8 architecture using pre-trained weights from the COCO dataset. The model is then fine-tuned based on the acquired and pre-processed data. During training, the model learns to recognize and categorize vehicles in pictures or video frames by improving its parameters using techniques such as backpropagation and gradient descent.
C. Object Tracking
The trained YOLOv8 model is used for car detection. It is applied to each frame of a video or image stream to detect the presence and location of cars, producing a bounding box around every identified car. Tracking techniques then follow the identified cars across frames. A Kalman filter, a recursive estimation algorithm, estimates the state of each tracked car: it predicts the car's next position from its previous state and updates the prediction with the actual measurement from the current frame. DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) builds on this filter and enhances tracking accuracy by associating detections with existing tracks and reducing identity switches. Associating the detections in each frame with existing tracks maintains continuity and yields a consistent, accurate path for each identified car.
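Once each car has a stable track, the boundary-based counting mentioned in the Introduction can be sketched as follows. This is a hypothetical illustration: it assumes the tracker's output has already been collected into a dictionary mapping track IDs to box-center coordinates in frame order, and that the boundary is a horizontal line at `line_y`. Neither the data layout nor the function name comes from the paper.

```python
def count_line_crossings(track_history, line_y):
    """Count the unique track IDs whose center crosses the line y = line_y.

    track_history: dict mapping track_id -> list of (cx, cy) centers in frame order
    (an assumed layout for the tracker output).
    """
    counted = set()
    for track_id, points in track_history.items():
        # Compare each pair of consecutive centers of the same track.
        for (_, y_prev), (_, y_curr) in zip(points, points[1:]):
            crossed_down = y_prev < line_y <= y_curr
            crossed_up = y_prev > line_y >= y_curr
            if crossed_down or crossed_up:
                counted.add(track_id)  # each vehicle is counted at most once
                break
    return len(counted)


history = {
    1: [(5, 90), (5, 100), (5, 110)],   # crosses downward
    2: [(20, 80), (20, 85)],            # never reaches the line
    3: [(40, 120), (40, 95)],           # crosses upward
    4: [(1, 90), (1, 110), (1, 90)],    # crosses twice, counted once
}
print(count_line_crossings(history, line_y=100))  # 3
```

Because the count is keyed on track IDs, every identity switch can inflate it, which is why the count depends on the tracker keeping identities stable.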
D. Evaluation metrics
The YOLOv8 car detector is evaluated on the COCO dataset using precision, recall, and mean Average Precision (mAP). Precision (P) indicates the correctness of the detections: the ratio of correct identifications to all instances flagged by the model. Recall (R) measures the model's ability to detect all instances of cars in the images, indicating its thoroughness. mAP50 is the mean average precision at an IoU threshold of 0.50, a lenient localization criterion. mAP50-95 is the mean of the average precision calculated across IoU thresholds from 0.50 to 0.95 in steps of 0.05.
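These metrics can be made concrete with a small single-class sketch. It is a simplification under stated assumptions: detections are matched to ground truth greedily in order of confidence, and average precision is the area under the precision-recall curve with all-point interpolation, whereas the official COCO evaluation samples the curve at 101 recall points. All function names and the sample data are illustrative.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(detections, ground_truth, iou_threshold):
    """True-positive flags for detections, sorted by descending confidence.

    detections: list of (box, confidence); ground_truth: list of boxes.
    """
    used, flags = set(), []
    for box, _ in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in used and overlap >= best_iou:
                best_j, best_iou = j, overlap
        flags.append(best_j is not None)
        if best_j is not None:
            used.add(best_j)  # each ground-truth box can be matched only once
    return flags


def precision_recall(flags, n_gt):
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def average_precision(flags, n_gt):
    """Area under the precision-recall curve (all-point interpolation)."""
    if n_gt == 0:
        return 0.0
    tp, points = 0, []
    for k, flag in enumerate(flags, start=1):
        tp += flag
        points.append((tp / n_gt, tp / k))  # (recall, precision) after k detections
    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        ap += (recall - prev_recall) * max(p for _, p in points[i:])
        prev_recall = recall
    return ap


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.8), ((50, 50, 60, 60), 0.7)]

p, r = precision_recall(match(dets, gts, 0.50), len(gts))
ap50 = average_precision(match(dets, gts, 0.50), len(gts))
thresholds = [(50 + 5 * i) / 100 for i in range(10)]  # 0.50, 0.55, ..., 0.95
ap50_95 = sum(average_precision(match(dets, gts, t), len(gts)) for t in thresholds) / 10
print(p, r, ap50, ap50_95)
```

In this sample the second detection overlaps its ground-truth box with IoU of about 0.68, so it counts as correct at the looser thresholds and as a false positive from 0.70 upward, which is why the value averaged over 0.50 to 0.95 is lower than the value at 0.50.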
E. Optimization
Several optimization techniques can improve YOLOv8's performance. Data augmentation and regularization can be applied during training. Feature pyramids help capture multi-scale features. Post-processing can be refined by tuning the NMS threshold, and hyperparameters can be adjusted to increase detection accuracy.
V. RESULTS AND DISCUSSION
The pre-trained YOLOv8 model is a deep neural network trained on the large set of labeled images in the COCO dataset. It identifies and locates objects by splitting the image into a grid and predicting bounding boxes and class probabilities for each grid cell. YOLOv8+DeepSORT combines the YOLOv8 object detection model with the DeepSORT tracking algorithm to perform car detection and tracking with a detector trained on the COCO dataset. YOLOv8 detects cars in each video frame and provides their bounding box coordinates.
DeepSORT then associates these bounding boxes with unique IDs to track the detected cars across frames, maintaining their identities. This combination allows real-time detection and tracking of cars in complex scenes.
The tracking mechanism kept identity switches low and maintained consistent tracks for each car throughout the video sequence. YOLOv8, known for its real-time processing capabilities, detected cars efficiently and rapidly in each frame. DeepSORT's ability to handle occlusion helped maintain accurate tracks when cars temporarily disappeared from the field of view. The combined use of YOLOv8 for car detection and DeepSORT for object tracking proved effective for real-time, accurate monitoring of car movements. YOLOv8 can be readily adapted to other types of vehicles, not only cars, making it a flexible solution for different applications. These results demonstrate the efficacy of YOLOv8 for vehicle detection in the case of cars.
VI. ACKNOWLEDGMENT
We would like to express our sincere gratitude to our guide, whose guidance helped us time and again to complete this project successfully, and we appreciate everyone who contributed to the fulfillment of this project.
In this paper, the model for car detection, tracking, and counting based on YOLOv8 and DeepSORT showed adaptable performance. Newer versions of YOLO are still being released, but compared with earlier versions such as YOLOv4, YOLOv5, and YOLOv7, YOLOv8 combined with DeepSORT allowed faster convergence and more precise identification of occluded and tiny vehicles. It also provides important observations about traffic patterns on highways, helping transportation authorities make well-informed choices that enhance the effectiveness and safety of the transportation network. On the other hand, the algorithm does not account for detecting cars in different weather conditions, such as foggy or rainy days, or in blurred video. The proposed system needs to be trained with more car videos covering dynamic situations. In future work, we intend to add more features and more reliable algorithms to improve the classification accuracy of cars and increase the efficiency of the system.
[1] Lixiong Lin, Hongqin He, Zhiping Xu, Dongjie Wu, “Realtime Vehicle Tracking Method Based on YOLOv5 + DeepSORT,” June 15, 2023.
[2] “Ultralytics,” GitHub. https://github.com/ultralytics (accessed Jul. 07, 2023).
[3] N. Sharma, S. Baral, M. P. Paing, and R. Chawuthai, “Parking Time Violation Tracking Using YOLOv8 and Tracking Algorithms,” Sensors, vol. 23, no. 13, p. 5843, Jun. 2023, doi: 10.3390/s23135843.
[4] N. Kavitha, D. Chandrappa, “YOLOv2 based vehicle classification and tracking for an intelligent transportation system,” April 1, 2021.
[5] Yuhan Wang, Hanlong Yang, “Multi-target Pedestrian Tracking Based on YOLOv5 and DeepSORT,” April 14, 2022.
[6] J. Xiang, “Vehicle Counting with YOLO and DeepSORT.” [Online]. Available: https://github.com/ultralytics/yolov5
[7] Zhuang Li, Xincheng Tian, Yan Liu, Xiaorui Shi, “Vehicle Tracking Method Based on Attention YOLOv5 and Optimized DeepSort Models,” August 3, 2022.
[8] Muhammad Azhad Bin Zuraimi, F. H. Kamaru Zaman, “Vehicle Detection and Tracking using YOLO and DeepSORT,” April 3, 2021.
[9] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” https://arxiv.org/pdf/1804.02767.pdf (accessed February 2020).
Copyright © 2024 Mr. CH. Vijayananda Ratnam, Pradhyumna Sai Maddigunta, Leela Siddardha Annappureddy, Venkata Krishna Kolli, Saiteja Gurrala. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET59201
Publish Date : 2024-03-20
ISSN : 2321-9653
Publisher Name : IJRASET