The proposed project aims to detect alcohol consumption in real-time video footage using machine learning and display the message "Alcohol consumption is injurious to health".
I. INTRODUCTION
Alcohol consumption is a significant public health concern worldwide, leading to various social, economic, and health-related issues. Detecting instances of alcohol consumption from video content can aid in enforcing regulations, promoting responsible drinking behaviour, and raising awareness about the dangers of alcohol abuse. In this report, we discuss the application of machine learning techniques for the automated detection of alcohol consumption from video data.
II. OBJECTIVE
The primary objective of this project is to develop a machine learning model capable of identifying instances of alcohol consumption from video content accurately. By leveraging computer vision and machine learning algorithms, we aim to create a robust system capable of detecting subtle cues indicating alcohol use in diverse video contexts.
III. METHODOLOGY
First, let us outline how the model will work:
To make this system work, the first thing we require is a large amount of data. To build the dataset, we aim to collect scenes from movies, CCTV footage, and web image searches that depict a person consuming alcohol. After assembling the dataset, we will use YOLO (You Only Look Once), a popular object detection model.
The YOLO algorithm takes an image as input and then uses a simple deep convolutional neural network to detect objects in the image. YOLO divides an input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is. YOLO predicts multiple bounding boxes per grid cell. At training time, we only want one bounding box predictor to be responsible for each object. YOLO assigns one predictor to be “responsible” for predicting an object based on which prediction has the highest current IOU with the ground truth. This leads to specialization between the bounding box predictors. Each predictor gets better at forecasting certain sizes, aspect ratios, or classes of objects, improving the overall recall score.
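The grid-cell assignment and IOU computation described above can be sketched in plain Python. This is a minimal illustration of the idea, not the actual YOLO implementation; the grid size S = 7 and the (x1, y1, x2, y2) box format are assumptions for the example:

```python
def owning_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the S x S grid cell containing an object's center."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

At training time, the predictor in the owning cell whose box has the highest `iou` against the ground truth is the one made "responsible" for that object.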
One key technique used in the YOLO models is non-maximum suppression (NMS). NMS is a post-processing step that is used to improve the accuracy and efficiency of object detection. In object detection, it is common for multiple bounding boxes to be generated for a single object in an image. These bounding boxes may overlap or be located at different positions, but they all represent the same object. NMS is used to identify and remove redundant or incorrect bounding boxes and to output a single bounding box for each object in the image.[1]
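The NMS post-processing step can be sketched as a short greedy loop. This is an illustrative plain-Python version (box format (x1, y1, x2, y2) and the 0.5 threshold are assumptions), not YOLO's internal implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it above the
    threshold, and repeat; returns the indices of the surviving boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For two heavily overlapping boxes on the same object plus one distant box, only the top-scoring overlapping box and the distant box survive.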
Moreover, we will use OpenCV, an open-source computer vision library containing implementations of nearly every common image-processing algorithm. Object detection is a computer technology related to computer vision, image processing, and deep learning that deals with detecting instances of objects in images and videos. We will perform object detection in this project using Haar cascades. Haar Cascade classifiers are an effective method for object detection, proposed by Paul Viola and Michael Jones in their paper Rapid Object Detection using a Boosted Cascade of Simple Features. Haar Cascade is a machine learning-based approach in which a large number of positive and negative images are used to train the classifier. Positive images contain the object we want our classifier to identify; negative images contain everything else, i.e., they do not contain the object we want to detect.[2]
We also use a Convolutional Neural Network (CNN), a deep learning architecture especially well suited to image recognition.
Then we train the selected model using the preprocessed video data and corresponding labels indicating the presence or absence of alcohol consumption. We employ techniques such as data augmentation and transfer learning to improve the model's performance and generalization capabilities. After that, we fine-tune the model parameters and architecture based on the evaluation results to optimize its performance. We then deploy the trained model into a real-world application, where it can automatically analyze video content in real time and flag instances of alcohol consumption for further review or action.
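One of the augmentation techniques mentioned above, horizontal flipping, can be sketched in plain Python. Frames are represented here as nested lists of pixel values and boxes as (x, y, w, h) tuples; both representations are assumptions for illustration, since the real pipeline would operate on image arrays:

```python
def hflip_frame(frame):
    """Horizontally flip a frame given as a list of pixel rows."""
    return [list(reversed(row)) for row in frame]

def hflip_box(box, frame_width):
    """Mirror an (x, y, w, h) bounding box across the vertical axis,
    so labels stay aligned with the flipped frame."""
    x, y, w, h = box
    return (frame_width - x - w, y, w, h)
```

Applying both functions to every labeled frame doubles the effective training set while keeping boxes consistent with the flipped pixels.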
IV. FEATURES
Real-Time Detection: The system is designed to identify alcohol consumption scenes in videos as they occur, ensuring immediate detection and display of warnings or educational content, promoting timely viewer awareness.
Adaptability to Varied Environments: The algorithm is versatile, capable of accurately recognizing alcohol consumption scenes across diverse video settings, accommodating different contexts, lighting conditions, and scenarios related to alcohol portrayal in media. This adaptability enhances the project's effectiveness across a wide range of video content.
V. CONCLUSION
The proposed machine learning model succeeds in detecting instances of alcohol consumption in real-life data. The model demonstrated high accuracy and robustness across different video scenarios and environmental conditions. By accurately identifying alcohol consumption cues, the system can assist in monitoring and enforcing regulations related to alcohol use in various settings.