People Counting and Tracking System in Real-Time Using Deep Learning Techniques

Authors: O Obulesu, A. Hanshika, A. Lahari, P. Arunodaya, R. Ashwini

DOI Link: https://doi.org/10.22214/ijraset.2023.54182

Abstract

The aim of project is to automatically estimate the number of people at indoor and outdoor places. People counting systems can be used in retail environment such as determining conversion ratio, advertising and promotional evaluation. This system can be used for transportation management system and video surveillance. The number of customers is indispensable data for management and decision making in public places like large-scale markets, shopping centers, airports, stations, museums, laboratories, classrooms, cafeteria etc. In this system, firstly extract the frames from the video, then draw a desired reference line on the input frame, detect the people using MobileNet-SSD object detection model, mark the centroid on the detected person, track the movement of that marked centroid and calculate the direction of centroid movement whether it is moving upwards or downwards. If the centroid movement is downward direction, then increment in counter, else if the centroid movement is upward direction, then increment out counter. People counting is essential for retailers of any size, but it\'s especially important for small businesses that don’t have the benefit of assaying data from multitudinous locales whenmaking pivotal opinions. When used intelligently, people counting can shape businesses in multitudinous ways other than just furnishing information on nethermost business.

Introduction

I. INTRODUCTION

The analysis of crowds is a popular research topic that has gained attention due to its various practical applications. Automated people counting systems are designed to estimate the number of individuals present in both indoor andoutdoor environments. These countingtechnologies serve as the foundation for a rangeof advanced solutions, such as retail analytics, queue management, and space utilization. The use of real-time information on people flow is especially beneficial for applications related to crowd management and security, including the management of pedestrian traffic and visitor flow evaluations. It is widely acknowledged that tracking and monitoring the movement of peoplein an area is of significant importance. People counting and tracking systems have been used for many years in various applications. For example, in the early 20th century, manual people counting systems were used in retail stores to track the number of customers. Later, inthe 1960s and 70s, mechanical and electronic systems were developed to automate people counting.

In the 1990s, video-based systems were introduced, and since then, there have been numerous technological advancements in computer vision and artificial Intelligence that have greatly improved the accuracy and efficiency of these systems. In today's world, it is crucial for companies to consider potential solutions that can ensure the safety of both staffand customers. This is especially important as new legislation is being introduced worldwide to regulate occupancy and social distancing. The importance of people has never been greater, andit is now possible to accurately count visitors using technology. Sensors connected to software can provide real-time occupancy data, whichhelps determine whether a location can safely accommodate more people or not. By using thistechnology, overcrowding can be prevented before it becomes a serious issue, ensuring that alocation does not exceed its capacity. This is essential in keeping both customers and employees safe. Collecting data on theefficiency of a shop's operations is essential for maximizing a company's potential. By monitoring customer satisfaction, companies can implement ideas to enhance the consumer experience, which can lead to increased revenues and customer loyalty. People counters are a practical solution for counting and monitoring the flow of visitors in and out of a facility. The information gathered by these counters can be segmented and used to determine visitor arrival and departure times, aswell as the level of engagement customers have with products and services. People counting technology can also reveal where visitors spendthe most time in a shop and which products are most popular.

This information is valuable for companies seeking to optimize their marketing efforts and increase sales. In the retail industry, people counting technology is used to determine the precise number of consumers who visit a shop. People counting technology can providevaluable insights to executives and organizers ofevents and exhibits. By monitoring foot traffic and fluctuations throughout the day and week, executives can design effective marketing programs and plan for the future.

For event organizers, people counting technology can be used to report the total number of participants and identify the event's busiest hours. This information allows management to assign staff members more effectively, based on the numberof people attending the event. These insights canhelp ensure that events and exhibits run smoothly and provide a positive experience for attendees. People counting technology has beenaround for several decades, and its developmenthas been shaped by various technological advances and market demands. The evolution ofpeople counting technology has been quite significant, from manual tallying methods to modern sensor-based systems, and has been driven by the need for more accurate and real- time data on human behavior. The key drivers behind the adoption of people counting technology have been the increasing demand fordata-driven decision-making, the need for efficient resource allocation, and the desire to enhance customer and patient experience.

There are numerous sorts of human beings counting technology, every with its personal benefits and disadvantages. Thermal imaging, infrared sensors, video analytics, and RFID (radio-frequency identification) are some of the most common types of people counting technology. Thermal imaging is based on detecting body heat,while infrared sensors use beams of light to detect human presence. Video analytics use cameras and software to analyze video footage, while RFID relies on radio signalsto detect the presence of a tag or card. Each of these technologies has its own unique features and is suitable for different applications. Peoplecounting technology generates large amounts ofdata, which must be collected, processed and analyzed to derive insights. There are different methods of data collection, such as cloud-based systems and edge computing. Cloud-based systems store data on remote servers, while edgecomputing processes data on local devices, suchas sensors or cameras. The tools and techniquesused for data analysis include machine learning and predictive analytics, which help to identify patterns and trends in the data and make predictions about future behavior accuracy andreliability of people counting technology are crucial for its usefulness and adoption. Several factors influence accuracy and reliability, such as sensor placement, lighting conditions, and occlusions. The methods used to validate the accuracy of people counting systems includemanual counting, video verification, and statistical analysis. Validation is essential to ensure that the data generated by people counting technology is accurate and reliable, andcan be used for decision making.

II. RELATED WORKS

The task of people counting and tracking is ofgreat importance in many fields, including security, retail, transportation, and crowd management. Over the years, various methods have been proposed for people counting andtracking, ranging from traditional approachessuch as video analysis and sensor networks to more advanced techniques based on deep learning. One of the earliest and most widely used approaches for people counting and tracking is video analysis. Video analysis techniques involve processing video data to detect and track people in real-time. These techniques can be classified into two categories:frame-based and pixel-based. Frame-based techniques, such as background subtraction, template matching, and optical flow, analyze each frame of the video independently to detect and track people. Pixel-based techniques, such as blob analysis and contour tracking, analyze the movement of pixels over time to detect and track people. While these techniques can be effective in simple scenarios, they are often limited by issues such as occlusions, changes in lighting conditions, and the presence of clutter inthe environment. To overcome the limitations oftraditional video analysis techniques, researchers have proposed various deep learning-based approaches for people counting and tracking. Deep learning techniques are basedon the use of neural networks to automatically learn features from large amounts of data. Thesetechniques have shown promising results in a variety of computer vision tasks, including object detection, recognition, and tracking. Some of the most commonly used deep learningtechniques for people counting and tracking include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks. One of the most popular deep learningtechniques for object detection and recognition is Convolutional Neural Networks (CNNs). CNNs have been successfully applied to various computer vision tasks, including object detectionand tracking.

D. Zhang et al. [1] reviewed the existing techniques for people counting and crowd density estimation, including traditional methods and recent advances in deep learning- based approaches and provided a comprehensiveoverview of the various techniques, their strengths and limitations, and their applications in different scenarios. He further highlights the importance of accurate people counting and tracking in crowd management.

S. L. Choi et al. [2] developed a multi-camerapeople counting system using deep learning-based object detection and tracking techniques.

The system uses a combination of CNNs and LSTMs to detect and track people across multiple cameras in real-time. The system is evaluated on a public dataset and achieves high real-time performance.

K. Raja et al. [3] provided an overview of various deep learning techniques for object detection and recognition, including their strengths and limitations.

He discussed about some popular techniques such as Faster R-CNN,YOLO, and SSD, and compares their performance on different datasets, but does not focus specifically on people counting andtracking, it provides a valuable overview of the state-of-the-art deep learning techniques in the field of computer vision. Another deep learningtechnique that has been used for people countingand tracking is Recurrent Neural Networks (RNNs). RNNs are designed to handlesequential data, making them well-suited for tasks such as tracking objects over time.

J. Guo et al. [4] propose a pedestrian detectionsystem using RNNs. The system is designed to detect pedestrians in video sequences and track them over time. An analysis of the existing works shows the effectiveness of deep learning techniques for people counting and tracking.These techniques have been applied successfullyin various scenarios, including object detection,recognition, and tracking. The use of multiple cameras and the integration of different deep learning techniques have also been investigated.While these techniques have shown promising results, there are still challenges to be addressed,such as occlusions, changes in lighting conditions, and the presence of clutter in the environment. Future research in this area shouldfocus on addressing these challenges and developing more robust people counting andtracking systems.

Joseph Redmon et al. [5] proposed a real-timeobject detection system using CNNs. The systemis called YOLO (You Only Look Once) and is designed to detect objects in real-time. The system is evaluated on several datasets and achieves high performance in real-time.

III. PROPOSED METHODOLOGY

The objective of this project is to develop a system capable of tracking and keeping a recordof the number of individuals present in a given location. The system can track the movement of people entering and leaving a building and stores this information. If the number of people exceeds a specific threshold, an alert message is sent to the user. A webcam is used as the video input source. The FPS throughput rate is determined, and the frames are converted to RGB and resized for image processing and computer vision using OpenCV. The OpenCV library is also used to perform deep neural network inference, to show output frames on thescreen, and to open and write video files. The proposed system employs Object Detection and Object Tracking in two phases to increase accuracy. For Object Detection, Mobile-Net Single Shot Detector (SSD) is used, which is a pre-trained deep learning model. We only run itonce every N frames due to its computational cost. To track detected objects, we use a combination of Correlations filters and centroid tracking algorithm.

To track objects, the coordinates of the bounding boxes are used to determine the center or centroid. We then calculate the Euclidean distance between existing and new centroids andassign unique object IDs. Objects that enter the field are registered, and those that leave arederegistered. The Dlib library is used for implementing object tracking. OpenCV and Python are utilized for the implementation and prediction of crowd count.

This system has various practicalapplications, such as monitoring the number of people in public spaces like shopping centers, museums, and libraries, and alerting authorities when the number of people exceeds the permitted threshold. It can also be used for security purposes in restricted areas where only a limited number of people are allowed to enter.Furthermore, the system could be used to monitor and manage large crowds at events,concerts, or festivals, ensuring that the capacity of the venue is not exceeded.

A. Description of MobileNet-SSD
MobileNet SSD is an item detection version that computes the output bounding field and itemelegance from the enter image. This Single ShotDetector (SSD) item detection version makes use of MobileNet as a spine and might reap rapiditem detection optimized for cell devices. The architecture is shown in Fig 1 MobileNet SSD layered Architecture.

This model is optimized for mobile devices and provides fast and accurate object detection. The architecture of MobileNet SSD consists of two main components: MobileNet and Single Shot Detector (SSD). MobileNet is a lightweightdeep neural network architecture that is designedfor mobile devices with limited computational resources. It is built upon depth-wise separable convolutions, which decompose a standardconvolution into a depth-wise convolution and apoint-wise convolution. The depth-wise convolution applies a single filter to each input channel, while the point-wise convolution applies a 1x1 filter to each output channel of thedepth-wise convolution. This approach significantly reduces the number of parameters in the network, resulting in a smaller and more efficient model.

MobileNet is used as a feature extractor in theMobileNet SSD architecture. The input image ispassed through several convolutional layers, with the depth-wise separable convolutions applied at each layer. The output of the last convolutional layer is a feature map, which contains high-level features that are useful for object detection. The SSD component of the architecture takes the feature map generated by MobileNet and performs object detection on it. The SSD approach is a single-shot detector that predicts the object class and location in a single forward pass of the network. The SSD approachinvolves dividing the input image into a grid of cells, with each cell responsible for predicting the presence of objects within it.

For each grid cell, the SSD predicts a set of bounding boxes, each with a different aspect ratio and size. These bounding boxes are definedas offsets from the coordinates of the grid cell. The SSD also predicts the confidence score for each bounding box, which represents the likelihood that the object is present in that box. The confidence score is calculated based on the overlap between the predicted bounding box andthe ground truth bounding box. The SSD also predicts the class of the object in each boundingbox. The class prediction is based on a set of learned features that are specific to each object class. These features are computed from the feature map generated by MobileNet, using a setof convolutional layers that are specific to each class.

The final step of the MobileNet SSD architecture is non-maximum suppression (NMS). NMS is used to eliminate redundant bounding boxes that overlap significantly with each other. The algorithm works by selecting thebounding box with the highest confidence scoreand removing all other boxes that overlap significantly with it. Repeat this procedure until there are no longer any overlapping boxes.

B. Description of Centroid Tracker

The Centroid Tracker algorithm works by maintaining a list of the most recent centroids foreach tracked object and computing the Euclideandistance between the centroids of the new detections and the previous ones. Centroid Tracker is a simple object tracking algorithm that uses the centroid of objects to track their motion. It is particularly useful for tracking objects in real-time video streams. The algorithm works by associating centroids in subsequent frames of the video stream, using a distance-based matching algorithm.

In the context of object detection, the Centroid Tracker is typically used to track the motion of objects that have been detected by a detection algorithm such as MobileNet SSD.Once objects are detected in a video frame, the Centroid Tracker uses the coordinates of the bounding boxes around the objects to calculate their centroids. It then associates these centroidswith objects detected in subsequent frames, based on their distance from the centroids in theprevious frame. One of the advantages of the Centroid Tracker algorithm is that it is computationally efficient and can be implemented in real-time. It is also able to handle cases where objects move in and out of the frame, and where objects are partially occluded by other objects in the frame.

To use the Centroid Tracker, we first need to initialize it by creating an instance of the class and specifying the maximum number of frames to keep track of. Next, we iterate over each frameof the video stream, perform object detection oneach frame using MobileNet SSD, and pass the resulting bounding box coordinates to the Centroid Tracker. The Centroid Tracker then uses these coordinates to calculate the centroidsof the detected objects and track their motion over time. If the distance is below a certain threshold, the new detection is considered a match and associated with the correspondingtracked object. If the distance is above the threshold, a new object is created and tracked separately.

IV. RESULTS AND DISCUSSION

This section discusses the results given by MobileNet-SSD and Centroid-Tracking algorithms. The major input for this system is a video. This research project aimed to develop a people counter and location tracker to monitor crowds in real-time. The system was designed tobe used in video surveillance settings, particularly in security and traffic monitoring applications, where people counting and trackingare crucial components. The system's ability to detect objects in real-time video and track them using background subtraction has made it a popular choice for real-time security applications. The camera's range determines thearea that can be monitored, and organizations such as banking institutions, jewelry stores, andthe military can benefit greatly from this technology.
The process of detecting people in crowded scenarios involves feeding input images as shown in Fig 2 to our model. However, the inputimage needs to be preprocessed to meet therequirements of our model. The frame is resizedto a maximum width of 500 pixels and changed from BGR to RGB to accomplish this.

It will detect the people in the images and givebounding box coordinates to each of the detectedobject along with the confidence value.

These bounding box coordinates are then passed as input to our centroid-based object tracker, which uses them to derive the centroidsof the detected objects. By tracking these centroids, we can accurately count the number ofpeople in a crowded scenario. Overall, this process involves image preprocessing, object detection using a deep neural network, and objecttracking using centroid-based methods to accurately detect and count people in crowded scenarios. The CentroidTracker is a simple yet effective object tracking algorithm that uses the Euclidean distance between object centroids to track their motion across frames. The Algorithm maintains a set of previously detected object centroids and matches them with the current setof detected centroids using the Hungarianalgorithm, which finds the optimal assignment of object centroids based on minimizing the sumof their Euclidean distances.

The bounding box coordinates returned by theMobileNet SSD model are used to derive the object centroids, which are then tracked using the CentroidTracker module. The number of people in the room is determined by counting thenumber of unique object IDs that have been tracked over time as shown in the Fig.4 and Fig.5.

Once the object centroids have been matched,the CentroidTracker updates the position of the tracked objects and adds any newly detectedobjects to the set.

Conclusion

In conclusion, people counting and trackingsystems can provide valuable insights into howa business is operating, allowing for thedevelopment and implementation of strategies tooptimize performance. These systems can countand monitor visitors as they enter and exit aphysical space, providing separate reports foreach. This data is especially useful for businessleaders seeking to understand visitor behavior,including when they enter, how they movethrough the space, and when they exit.Additionally, people counting technology canshow where visitors are spending their time andwhich products are attracting the most attention.People counting systems in the retail sector can determine the precise number of visitors andmonitor the proportion of those visitors who make purchases. This information is essential for understanding how effective marketing campaigns are attracting visitors and converting them into customers. In shoppingmalls, people counting technology can helpexecutives understand which areas of the mallare the most attractive and how traffic changesover time. This information can be used to shapemarketing campaigns and plan ahead for peakperiods. People counting systems are also usefulin exhibitions and events, where they can helpmanagers understand the peak hours of the eventand optimize staff allocation accordingly.Additionally, people counting systems canprovide total visitor numbers for events, whichis essential for evaluating the success of the event and making plans for future events. Object detection and recognition are critical components of people counting and tracking systems. These technologies are essential forreal-time applications and can be used to detect any type of object in a physical space. By combining object detection and recognition withcentroid-based object tracking, people counting and tracking systems can accurately track the movements of people in a video and count the number of people entering and leaving in real- time. This technology can be applied to a varietyof scenarios, such as in retail stores, airports, and other public spaces where monitoring and tracking people\'s movements is necessary.

References

[1] D. Zhang, X. Du, and L. Zhang, “A Review on people counting and crowd density estimation”. [2] S. L. Choi et al, “Multi camera people counting with deep learning-based object detection” [3] K. Raja, S. S. S. Soumya,“A survey pf deep learning techniques for object detection and recognition”. [4] J. Guo, “pedestrian detection using recurrent neuralnetworks”.a [5] Joseph Redmon et al, “real time object detection with yolo”. [6] Afiq Harith Ahamad, Norliza Zaini, Mohd Fuad Abdul Latip, “Person Detection for Social Distancing and Safety Violation Alert based on Segmented ROI”, 10th IEEE International Conference on Control System, Computing andEngineering (ICCSCE), 2020 [7] D. Gozler, B. Isik and C. Topal, \"A Real-time Queue Tracking Method for Waiting Time Estimation,\" 2022 30th Signal Processing and Communications Applications Conference (SIU), 2022 [8] J. Pegoraro and M. Rossi, \"Real-Time People Tracking and Identification From Sparse mm- Wave Radar Point- Clouds,\" in IEEE Access, vol. 9,pp. 78504-78520, 2021. [9] Jia Wan, Qingzhong Wang, Antoni B. Chan, “Kernel-Based Density Map Generation for Dense Object Counting”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, Issue 3, 2022. [10] Kai Li, Chau Yuen, Salil S. Kanhere, Kun Hu, WeiZhang, Fan Jiang, Xiang Liu, “An Experimental Study for Tracking Crowd in Smart Cities”, IEEE Systems Journal, Volume 13, Issue 3, 2019 [11] M.I.H.Azhar, F. H. K. Zaman, N. M. Tahir and H. Hashim, \"People Tracking System Using DeepSort,” 2020 10th IEEE InternationalConference on Control System, Computing andEngineering (ICCSCE), 2020 [12] Shijie Sun, Naveed Akhtar, Huansheng Song, Chaoyang Zhang, Jianxin Li, Ajmal Mian, “Benchmark Data and Method for Real-Time People Counting in Cluttered Scenes Using Depth Sensors”, IEEE Transactions on Intelligent Transportation Systems, Volume: 20, Issue: 10, 2019 [13] Weihong Ren, Xinchao Wang, Jiandong Tian, Yandong Tang, Antoni B. Chan, “Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets”, IEEE Transactions on Image Processing (Volume: 30), 2020 [14] Xiaoheng Jiang, Li Zhang, Pei Lv, Yibo Guo, Ruijie Zhu, Yafei Li, Yanwei Pang; Xi Li, Bing Zhou, Mingliang Xu, “Learning Multi-Level Density Maps for Crowd Counting”, IEEE Transactions on Neural Networks and LearningSystems (Volume: 31, Issue: 8), 2019 [15] Yuren Zhou, Billy Pik Lik Lau, Zann Koh, Chau Yuen, Benny Kai Kiat Ng, “Understanding CrowdBehaviors in a Social Event by Passive WiFi Sensing and Data Mining”, IEEE Internet of Things Journal. Vol.7, Issue: 5, 2020.

Copyright

Copyright © 2023 O Obulesu, A. Hanshika, A. Lahari, P. Arunodaya, R. Ashwini. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET54182

Publish Date : 2023-06-18

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here