Abnormal driving behaviour detection helps to ensure the safety of both the driver and the passengers. Recent studies have concluded that talking on a phone while driving can reduce a driver's attention by up to 20%, which leads to accidents. Deep learning models can be used to detect these distracted actions. In this system, abnormal driver behaviours such as reaching behind, doing hair and makeup, drinking, and texting are detected through deep learning. Densely connected convolutional neural networks (DenseNet) and residual networks (ResNet) are used for detection. The AWGRD model, a DenseNet-based model formed by superposition of its previous layers, is used to detect the driver's behaviour. The input to the proposed system is a video file, which can be a real-time live video or an uploaded recording. As output, the system detects the activity or behaviour of the driver and warns them.
I. INTRODUCTION
Distracted-driver detection has recently attracted growing attention because it is extremely useful for ensuring the safety of both the driver and the passengers in the vehicle.
Researchers found in a recent study that using a phone while driving, whether to make a call or to send a text, can cause a driver to lose a quarter of their attention, making them twenty-three times more likely to get into a fatal automobile accident than a normally attentive driver. According to research, distracted driving is responsible for nearly 3,000 road accidents each year, of which only about 8% end in death.
A driver may be tipsy, on a call, conversing with a passenger, operating the radio, doing hair and makeup, texting on a phone, and so on; a small mistake could cost lives. The key objective is to ensure that the driver is warned by the system whenever he or she exhibits distracted behaviour, and an alarm sounds to notify the driver[1].
The goal of this paper is therefore to describe a means of detecting the aforementioned abnormal behaviours.
II. OBJECTIVES
The following are the paper's goals:
To classify the actions in an uploaded video file.
To design a user interface.
To classify the actions in real-time live video based on how people act while distracted.
To integrate the project modules.
III. LITERATURE SURVEY
It is widely acknowledged that high-resolution videos now appear in an increasing number of visual applications. In video surveillance[2], for example, numerous high-resolution cameras need to be placed at various locations; they collaborate to identify[3], re-identify, and then track a moving target, making it easier to conduct subsequent high-level analyses of that target, such as inferring its potential intention or even behaviour[4]. In emotion analysis, high-resolution cameras are needed to capture both obvious and subtle changes in the target person's emotions in real time, which has significant security implications these days. It is easy to see from these descriptions that acquiring and storing a large volume of high-resolution videos is, at present, quite straightforward[5]. The main problem, though, is how to use those large volumes of low-level video clips to make accurate high-level decisions in an efficient and effective manner. This study considers high-quality videos of drivers taken inside vehicles, where the central decision is identifying drivers' abnormal driving behaviour, also known as driving patterns. Some research studies have already been conducted. The first line of work, for example, identifies human physiological signals, such as the electrooculogram, the electroencephalogram, changes in respiratory activity, and changes in blood pressure, using a variety of sensors[6].
The second line of work relies on facial details. The third is based on the movement characteristics of the steering wheel and can detect steering time, braking behaviour, the driver's hand pressure, and other factors. It is important to note that although the detection of human physiological signals is very accurate and works in real time, its main drawback, namely that the sensors interfere with the normal driving of drivers, cannot be ignored. Moreover, physiological signals vary greatly from person to person and with each individual's environmental circumstances. Providing quantitative and objective criteria for interpreting people's physiological signals is therefore also challenging.
IV. PROPOSED SYSTEM
A. Methodology
The working of the project can be understood and analysed through the following steps; a short code sketch of the video-handling steps follows the list:
Step-1: Generate and load the AWGRD model.
Step-2: Upload a recorded MP4 video file as input.
Step-3: Access the input video stream using the VideoCapture class of OpenCV.
Step-4: Loop over the number of required sample frames and read them.
Step-5: Start behaviour monitoring.
Step-6: For real-time live-camera detection, click the Live Camera button.
Step-7: Predict and classify the actions.
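The video-handling steps above translate into only a few lines of OpenCV and Keras code. The following is a minimal sketch, assuming a Keras-format model file awgrd_model.h5, a sample count of 16 frames, and a 224 x 224 input size; all three are illustrative assumptions rather than the authors' exact configuration:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

NUM_FRAMES = 16          # assumed number of sample frames per clip
INPUT_SIZE = (224, 224)  # assumed model input resolution

# Step 1: load the trained model (file name is a placeholder).
model = load_model("awgrd_model.h5")

# Steps 2-3: open the uploaded MP4 file via OpenCV's VideoCapture class.
cap = cv2.VideoCapture("driver_clip.mp4")

# Step 4: loop over the required number of sample frames and read them.
frames = []
while len(frames) < NUM_FRAMES:
    ok, frame = cap.read()
    if not ok:                         # stop early if the clip ends
        break
    frame = cv2.resize(frame, INPUT_SIZE)
    frames.append(frame / 255.0)       # scale pixel values to [0, 1]
cap.release()

# Step 7: predict a class per frame, then majority-vote over the clip.
probs = model.predict(np.asarray(frames))      # shape: (frames, 10)
votes = probs.argmax(axis=1)
predicted_class = np.bincount(votes).argmax()  # most frequent class wins
print(f"Predicted behaviour class: {predicted_class}")
```

For the live-camera path (step 6), the same loop applies with cv2.VideoCapture(0), which opens the default camera instead of a file.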
B. Architecture
Figure 1 shows the architecture of the proposed system. The input to the system is an uploaded recorded video or a live camera feed. First, frames are extracted from the input video, and feature learning is then performed on those frames. If the input is a recorded video, the AWGRD DenseNet model is used; if it is a live camera feed, the ResNet model is used. The driver's behaviour is then monitored, and an alarm is raised if the driver displays activities that can lead to fatal accidents.
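As a minimal illustration of this dispatch, the sketch below routes an uploaded file to the DenseNet-based predictor and a camera index to the ResNet-based predictor; the helpers predict_densenet, predict_resnet, and alarm are hypothetical names, not functions from the actual implementation:

```python
import cv2

# Every class except 0 ("Safe Drive") counts as distracted behaviour.
DISTRACTED_CLASSES = set(range(1, 10))

def monitor(source, predict_densenet, predict_resnet, alarm):
    """Route input to the right model and alarm on distracted behaviour.

    source     -- file path (uploaded video) or camera index (live input)
    predict_*  -- assumed helpers mapping a frame to a class index
    alarm      -- assumed helper that warns the driver (e.g. plays a sound)
    """
    live = isinstance(source, int)     # an integer source means a live camera
    predict = predict_resnet if live else predict_densenet
    cap = cv2.VideoCapture(source)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if predict(frame) in DISTRACTED_CLASSES:
            alarm()
    cap.release()
```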
C. Algorithms
AWGRD DenseNet
DenseNet, a fundamental deep learning model, has inspired several novel deep learning-based models; in general, this broad DenseNet family builds on a number of well-established models, including the Highway Network, GoogLeNet, and others. When ResNet and DenseNet are compared, DenseNet has been found to be the more efficient of the two[8]. One explanation is that ResNet only adds the outputs of two contiguous layers, while DenseNet appends the current layer's output to those of all of its previous layers; even when there are many layers, ResNet retains only direct connections between adjacent layers. One of DenseNet's major advantages is that it deals with the vanishing-gradient problem efficiently[9],[10].
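The dense connectivity described above can be made concrete with a short PyTorch sketch, in which each layer receives the concatenation of all preceding feature maps rather than only its immediate predecessor; the growth rate and layer count here are illustrative, not the AWGRD model's actual hyperparameters:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Toy dense block: layer i sees the concatenation of all earlier outputs."""

    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            )
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate ALL previous feature maps, then apply the layer.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64)
out = block(torch.randn(1, 64, 56, 56))  # -> (1, 64 + 4 * 32, 56, 56)
```

Because every layer has a short path to the loss through these concatenations, gradients do not have to traverse long chains of transformations, which is exactly the vanishing-gradient relief noted above.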
ResNet
ResNet, or residual networks, is recognized today as one of the most powerful learning-based models[11],[12]. Residual networks have been very successful because they efficiently address the well-known vanishing-gradient problem found in many very deep models. The basic idea is to include an identity mapping in parallel with the underlying network, creating a residual learning structure: given a desired nonlinear mapping H(x) realised by a stack of nonlinear layers, the stack is instead made to fit the residual mapping F(x) = H(x) - x, so that the block's output becomes F(x) + x. Optimizing this residual mapping is easier than optimizing the original nonlinear mapping directly, which is what makes residual networks useful and important[13].
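A minimal residual block makes this concrete: the stacked layers learn F(x) while the shortcut adds the identity back, so the block outputs F(x) + x. The sketch below is a generic basic block (channel count illustrative), not the exact architecture used in the system:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = relu(F(x) + x), identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(   # this stack fits F(x) = H(x) - x
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The identity mapping runs in parallel with the stacked layers.
        return torch.relu(self.body(x) + x)

block = ResidualBlock(channels=64)
out = block(torch.randn(1, 64, 56, 56))  # same shape in and out
```

Even if the stacked layers learn nothing (F(x) close to zero), the block still passes x through unchanged, which is why very deep ResNets remain trainable.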
V. IMPLEMENTATION
A. Dataset
1. State Farm Distracted Driver Dataset
The State Farm Distracted Driver Dataset is a publicly available dataset for distracted-driving detection. The sample contains 44 distinct drivers, of whom 29 are men and 15 are women. Some drivers took part in several recording sessions under various driving conditions and at different times of day. There are 102,150 photos in total[15]: 79,726 training images and 22,424 validation images. Each photo is 640 x 480 pixels and falls into one of the ten categories listed below.
The 10 classes, which translate directly into the label map sketched after this list, are:
Class 0: Safe drive
Class 1: Texting on phone - right hand
Class 2: Talking on phone - right hand
Class 3: Texting on phone - left hand
Class 4: Talking on phone - left hand
Class 5: Operating the radio
Class 6: Drinking while driving
Class 7: Reaching behind
Class 8: Hair and makeup
Class 9: Talking to a passenger
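When decoding a prediction, this class list becomes a simple lookup table; a small sketch (variable and function names are illustrative):

```python
# Map from the classifier's output index to a human-readable label.
CLASS_LABELS = {
    0: "Safe drive",
    1: "Texting on phone - right hand",
    2: "Talking on phone - right hand",
    3: "Texting on phone - left hand",
    4: "Talking on phone - left hand",
    5: "Operating the radio",
    6: "Drinking while driving",
    7: "Reaching behind",
    8: "Hair and makeup",
    9: "Talking to a passenger",
}

def decode(class_index: int) -> str:
    """Turn a predicted class index into the label shown to the driver."""
    return CLASS_LABELS[class_index]
```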
2. Kinetics Human Action Dataset
To handle live video, the Kinetics human-action dataset from Kaggle was utilised. The Kinetics dataset is used to detect human actions: it comprises around 500,000 video clips covering 400 human-action classes, with at least 400 clips per class. Each video is 10 seconds long and carries a single class label. The videos include human-object interactions, such as playing musical instruments, as well as human-human interactions, such as handshakes. The Kinetics dataset's major goal is to become the video-data counterpart of ImageNet.
VI. CONCLUSION
The study of video-based detection of abnormal driving behaviour has attracted many researchers and is more important today than ever, owing to the increasing number of accidents occurring worldwide. An automated and reliable system of this kind helps to ensure the safety of drivers and passengers, reducing the risk of accident-related deaths, and it has also become a crucial step towards fully automated driving. The proposed method uses a deep learning-based fusion of two recently developed models, DenseNet and ResNet, for abnormal-driving detection. The approach is efficient because the AWGRD DenseNet model superposes the outputs of previous layers, which is very beneficial for video-based abnormal-driving-behaviour detection: the hidden spatial and temporal information can be comprehensively described by superpositions of previous layers. The AWGRD model classified the driver actions with an accuracy of 85%. For real-time live-camera detection of distracted driving, residual networks (ResNet) were used. In the future, efficient and effective deep learning models could be realised on mobile chips for abnormal-driving-behaviour detection, and email notification and full-body behaviour monitoring could also be added.
REFERENCES
[1] W. Cao, J. Yuan, Z. He, Z. Zhang, and Z. He, "Fast deep neural networks with knowledge guided training and predicted regions of interests for real-time video object detection," IEEE Access, vol. 6, pp. 8990–8999, 2018.
[2] H. Shuai, Q. Liu, K. Zhang, J. Yang, and J. Deng, "Cascaded regional spatio-temporal feature-routing networks for video object detection," IEEE Access, vol. 6, pp. 3096–3106, 2018.
[3] S. Masood, A. Rai, A. Aggarwal, M. N. Doja, and M. Ahmad, "Detecting distraction of drivers using convolutional neural network," Pattern Recognit. Lett., vol. 139, p. 79, Nov. 2020.
[4] K. R. Dhakate and R. Dash, "Distracted driver detection using stacking ensemble," in Proc. IEEE Int. Students Conf. Electr., Electron. Comput. Sci. (SCEECS), Feb. 2020.
[5] A. Nanda, P. K. Sa, S. K. Choudhury, S. Bakshi, and B. Majhi, "A neuromorphic person re-identification framework for video surveillance," IEEE Access, vol. 5, pp. 6471–6482, 2017.
[6] L. Sun, Z. Jiang, H. Song, Q. Lu, and A. Men, "Semi-coupled dictionary learning with relaxation label space transformation for video-based person re-identification," IEEE Access, vol. 6, pp. 12587–12597, 2018.
[7] Y. Wu, Y. Sui, and G. Wang, "Vision-based real-time aerial object localization and tracking for UAV sensing system," IEEE Access, vol. 5, pp. 23969–23978, 2017.
[8] S.-H. Lee, M.-Y. Kim, and S.-H. Bae, "Learning discriminative appearance models for online multi-object tracking with appearance discriminability measures," IEEE Access, vol. 6, pp. 67316–67328, 2018.
[9] M. S. Hossain and G. Muhammad, "An emotion recognition system for mobile applications," IEEE Access, vol. 5, pp. 2281–2287, 2017.
[10] Z. Pan, X. Yi, and L. Chen, "Motion and disparity vectors early determination for texture video in 3D-HEVC," Multimedia Tools Appl., to be published, doi: 10.1007/s11042-018-6830-7.
[11] J. Wang, Z. Zhang, B. Li, S. Lee, and R. S. Sherratt, "An enhanced fall detection system for elderly person monitoring using consumer home networks," IEEE Trans. Consum. Electron., vol. 60, no. 1, pp. 23–29, Feb. 2014.
[12] M. D. Hssayeni, S. Saxena, R. Ptucha, and A. Savakis, "Distracted driver detection: Deep learning vs handcrafted features," in Proc. IS&T Int. Symp. Electronic Imaging, 2017, pp. 20–26.
[13] Y. Abouelnaga, H. M. Eraqi, and M. N. Moustafa, "Real-time distracted driver posture classification," arXiv preprint arXiv:1706.09498, 2017.