This paper introduces an automated system for gym exercise form detection, leveraging MediaPipe[1] for real-time pose estimation and OpenCV[2] for computer vision processing. The system analyzes key body landmarks during exercises like squats, deadlifts, and bicep curls, providing immediate feedback on form accuracy. By detecting incorrect postures, such as improper knee alignment or back curvature, the system aims to reduce the risk of injury and enhance workout effectiveness. The proposed approach is designed to be lightweight, accessible, and capable of running on consumer-grade hardware, making it practical for widespread use. Experimental results demonstrate high accuracy in detecting common form errors, showcasing the potential of this system as a cost-effective alternative to traditional personal training. This work contributes to the growing field of automated fitness monitoring and highlights the role of computer vision in improving exercise safety and performance.
I. INTRODUCTION
Maintaining proper exercise form is essential for maximizing the benefits of a workout and reducing the risk of injury. Whether it's squats, deadlifts, or bicep curls, correct posture ensures that the targeted muscles are effectively engaged while preventing undue stress on joints and ligaments. However, achieving and maintaining proper form can be challenging, especially for beginners and those training without professional supervision. Traditionally, personal trainers have provided the necessary guidance and corrections, but this approach is often costly and not accessible to everyone. The increasing prevalence of digital fitness solutions has highlighted the need for automated systems that can monitor and correct exercise form in real-time, providing an affordable and scalable alternative to traditional training methods.
This paper presents an innovative approach to gym exercise form detection using a combination of MediaPipe, OpenCV, and machine learning techniques. MediaPipe is a powerful framework for real-time pose estimation, capable of detecting and tracking key body landmarks during physical activities. OpenCV, a widely used computer vision library, processes the landmark data to analyze posture and movement patterns. Together, these tools allow for the detection of common form errors such as improper knee alignment during squats, incorrect back curvature during deadlifts, and flawed arm positioning during bicep curls. The system provides immediate feedback, enabling users to correct their form on the spot, thereby improving workout effectiveness and reducing injury risk.
In addition to real-time pose estimation and computer vision processing, our system incorporates a machine learning model trained on a labeled dataset of workout exercise images obtained from Kaggle. This dataset, containing a wide range of exercise images, allowed us to develop and fine-tune a model capable of accurately classifying different exercises and identifying form errors. By integrating this dataset, our system achieves a higher level of precision in detecting both correct and incorrect exercise forms across various user profiles. The model leverages the power of deep learning to analyze subtle variations in posture, enhancing the system’s ability to provide detailed feedback.
One of the key advantages of our approach is its accessibility. Designed to run on consumer-grade hardware, the system does not require specialized equipment, making it practical for home users, gym enthusiasts, and personal trainers alike. This lightweight and efficient solution offers a cost-effective alternative to traditional personal training, democratizing access to high-quality exercise form guidance.
This paper details the design, implementation, and evaluation of the system. We discuss the integration of MediaPipe and OpenCV for pose estimation and vision processing, the use of the Kaggle dataset for model training, and the experimental results demonstrating the system's accuracy and effectiveness. Our findings suggest that this approach has significant potential to enhance workout safety and performance, offering a scalable solution to the challenge of maintaining proper exercise form.
II. METHODOLOGY
The methodology of this project involves the integration of computer vision, machine learning, and real-time pose estimation techniques to detect and analyze gym exercise form. The system is built using MediaPipe for pose estimation, OpenCV for image processing, and a machine learning model trained on a dataset of workout exercises obtained from Kaggle. This section outlines the key components, algorithms, and formulas used in the system.
A. Data Collection and Preprocessing
The dataset used for training the machine learning model was obtained from Kaggle[3]. It contains images of various workout exercises, including different poses and variations. The dataset was preprocessed to ensure consistency in image dimensions and quality. This preprocessing involved resizing the images, normalizing pixel values, and augmenting the dataset to create a more robust model capable of handling diverse input conditions.
The images were resized to a standard dimension (e.g., 224x224 pixels) and normalized by scaling the pixel values to a range of [0, 1]. Data augmentation techniques[4], such as rotation, flipping, and zooming, were applied to increase the diversity of the training data and prevent overfitting.
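A minimal sketch of such a preprocessing and augmentation pipeline, written with the TensorFlow/Keras stack used elsewhere in the system, is shown below. The directory path, batch size, and augmentation magnitudes are illustrative assumptions rather than the exact settings used in this work.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

# Hypothetical directory of the Kaggle workout images, organized one folder per exercise class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/workout_exercises/train",
    image_size=IMG_SIZE,
    batch_size=32,
    label_mode="categorical",
)

# Scale pixel values from [0, 255] to [0, 1].
normalize = tf.keras.layers.Rescaling(1.0 / 255)

# Augmentation: random flips, rotations, and zooms to diversify the training data.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

train_ds = train_ds.map(lambda x, y: (augment(normalize(x), training=True), y))
```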
B. Pose Estimation with MediaPipe
MediaPipe was employed to detect and track key body landmarks in real-time. The pose estimation model generates 33 key points (landmarks) on the body, such as shoulders, elbows, hips, knees, and ankles. These landmarks are used to determine the position and orientation of different body parts during exercise.
For each frame of the video, MediaPipe provides the (x, y, z) coordinates of these key landmarks. These coordinates are normalized with respect to the dimensions of the input frame, enabling consistent analysis across varying camera perspectives.
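For concreteness, the sketch below shows how the 33 normalized landmarks can be read out for a single frame using the MediaPipe Pose solution together with OpenCV. The input file name is hypothetical and the detection thresholds are left at their defaults.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Static-image mode: estimate the 33 pose landmarks for one frame.
frame = cv2.imread("squat_frame.jpg")  # hypothetical input image
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB

if results.pose_landmarks:
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        # x and y are normalized to the frame width/height; z is a relative depth estimate.
        print(idx, round(lm.x, 3), round(lm.y, 3), round(lm.z, 3))
```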
Fig. 1 Landmark detection using the MediaPipe library.
C. Feature Extraction with OpenCV
Once the landmarks are detected, OpenCV processes these coordinates to extract features that are critical for evaluating exercise form. The system calculates angles between key joints to determine the correctness of the posture. For example, during a squat, the angle between the hip, knee, and ankle is calculated to assess whether the knees are properly aligned.
Angle Calculation Formula:
The angle at a joint is the angle between the two vectors formed by three landmarks. For points A (x1, y1), B (x2, y2), and C (x3, y3), with B as the joint vertex, it is obtained from the standard angle-between-two-vectors formula [5]:

θ = arccos( (BA · BC) / (|BA| |BC|) )

Where:
- BA is the vector from point B to point A
- BC is the vector from point B to point C
- θ is the angle at point B
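A compact implementation of this calculation, assuming NumPy and the (x, y) landmark coordinates produced by MediaPipe, might look as follows; the example coordinates are illustrative.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by points a-b-c, each given as (x, y)."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    ba, bc = a - b, c - b
    cos_theta = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: knee angle from hip, knee, and ankle coordinates (illustrative values).
print(joint_angle((0.48, 0.52), (0.50, 0.70), (0.51, 0.88)))
```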
D. Error Detection Algorithm
The system applies threshold-based rules to the calculated angles to determine whether the form is correct. For example, if the knee angle during a squat falls outside a predefined range (e.g., 70-90 degrees), the system flags the repetition as incorrect form.
In addition to angle-based checks, distances between key points, such as the vertical distance between the knee and the foot during a deadlift, are also monitored. Form errors are identified based on deviations from the optimal ranges.
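The sketch below illustrates one such threshold rule for the squat. The 70-90 degree range follows the example above; the function and constant names, and the exact thresholds, are hypothetical and would be tuned per exercise.

```python
# Illustrative, tunable threshold for the knee angle at squat depth (degrees).
SQUAT_KNEE_RANGE = (70.0, 90.0)

def check_squat_form(knee_angle_deg):
    """Return a list of human-readable warnings for a single squat frame."""
    warnings = []
    low, high = SQUAT_KNEE_RANGE
    if not (low <= knee_angle_deg <= high):
        warnings.append(
            f"Knee angle {knee_angle_deg:.0f} deg is outside the {low:.0f}-{high:.0f} deg range"
        )
    return warnings

print(check_squat_form(65.0))  # one warning
print(check_squat_form(80.0))  # no warnings
```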
E. Machine Learning Model
A convolutional neural network (CNN) was trained using the preprocessed Kaggle dataset to classify different exercises and detect form errors. The CNN architecture consists of multiple convolutional layers, followed by pooling layers and fully connected layers. The model outputs class probabilities indicating the type of exercise being performed and whether the form is correct.
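The paper does not list the exact layer configuration, so the following is an illustrative Keras sketch of the kind of architecture described: stacked convolutional and pooling layers followed by fully connected layers and a softmax output, compiled with the categorical cross-entropy loss defined below. The filter counts, dropout rate, and NUM_CLASSES value are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # illustrative; set to the number of exercise/form classes

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),          # matches the 224x224 preprocessing
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy matches the one-hot labels produced during preprocessing.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```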
The loss function used for training is categorical cross-entropy [6], defined as:

L = − Σ_{i=1}^{N} y_i log(ŷ_i)

Where:
- y_i is the true (one-hot) label for class i
- ŷ_i is the predicted probability for class i
- N is the number of classes
Fig. 2 Validation and Training Loss over Epochs
F. Real-Time Feedback
During live video analysis, the system processes each frame, applies the trained model, and calculates joint angles using the formulas mentioned. Based on the classification and calculated angles, the system provides real-time feedback on whether the form is correct or needs adjustment.
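Putting these pieces together, a minimal real-time loop might look like the sketch below. It redefines the joint_angle helper from the angle-calculation sketch so the snippet stands alone; the feedback message and thresholds are illustrative, and the trained classifier is omitted for brevity.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

def joint_angle(a, b, c):
    """Angle at b (degrees) for points a-b-c; see the feature-extraction sketch."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    ba, bc = a - b, c - b
    cos_theta = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        msg = "No pose detected"
        if results.pose_landmarks:
            lm = results.pose_landmarks.landmark
            hip = (lm[mp_pose.PoseLandmark.LEFT_HIP].x, lm[mp_pose.PoseLandmark.LEFT_HIP].y)
            knee = (lm[mp_pose.PoseLandmark.LEFT_KNEE].x, lm[mp_pose.PoseLandmark.LEFT_KNEE].y)
            ankle = (lm[mp_pose.PoseLandmark.LEFT_ANKLE].x, lm[mp_pose.PoseLandmark.LEFT_ANKLE].y)
            angle = joint_angle(hip, knee, ankle)
            msg = "Good form" if 70 <= angle <= 90 else f"Adjust squat depth ({angle:.0f} deg)"
            mp_draw.draw_landmarks(frame, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        cv2.putText(frame, msg, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Form feedback", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```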
G. System Implementation
The entire system was implemented using Python, with MediaPipe and OpenCV as the core libraries. The model was trained using TensorFlow and Keras. The system is optimized for real-time performance and can run on consumer-grade hardware.
III. RESULTS
The system demonstrated high accuracy in detecting correct and incorrect exercise forms across multiple workout types, including squats, deadlifts, and bicep curls. The machine learning model, trained on the Kaggle workout exercise dataset, achieved a classification accuracy of 92% in identifying proper form. The real-time feedback provided by the system effectively highlighted common errors, such as improper knee alignment and incorrect back posture. Additionally, the system was able to run efficiently on consumer-grade hardware, making it accessible and practical for home use. These results validate the system’s potential as a cost-effective alternative to personal training.
Fig. 3 Validation and Training Accuracy over Epochs
IV. CONCLUSION
In conclusion, this paper presents a comprehensive system for gym exercise form detection that leverages the strengths of MediaPipe for real-time pose estimation, OpenCV for feature extraction, and a machine learning model trained on a diverse workout exercise dataset from Kaggle. The system successfully identifies and corrects common exercise form errors, providing users with immediate feedback to enhance their workout safety and effectiveness. The combination of angle-based analysis and deep learning classification enables accurate detection across various exercises, while its lightweight design ensures that it can run efficiently on consumer-grade hardware. This solution offers a practical and cost-effective alternative to traditional personal training, making high-quality form guidance accessible to a broader audience. Future work could involve expanding the system’s capabilities to support a wider range of exercises and improving the feedback mechanism to offer more personalized and adaptive recommendations based on user progress.
REFERENCES
[1] Google. (n.d.). MediaPipe (Version x.x.x) [Software]. Python Package Index (PyPI). https://pypi.org/project/mediapipe/
[2] OpenCV.org. (n.d.). OpenCV. https://opencv.org/
[3] Abdillah, H. (n.d.). Workout exercises images [Data set]. Kaggle. Retrieved August 24, 2024, from https://www.kaggle.com/datasets/hasyimabdillah/workoutexercises-images/data
[4] van Dyk, D. A., & Meng, X.-L. (2001). The art of data augmentation. *Journal of Computational and Graphical Statistics*, *10*(1), 1-50. https://doi.org/10.1198/10618600152418584
[5] Coumans, E. (2017). Physics simulation workflow [Figure]. In *Game Physics Cookbook*. Packt Publishing. https://www.oreilly.com/library/view/game-physics-cookbook/9781787123663/ch01s08.html
[6] PyTorch Forums. (n.d.). CrossEntropyLoss getting value 1. In *PyTorch Forums*. Retrieved August 24, 2024, from https://discuss.pytorch.org/t/crossentropyloss-getting-value-1/188115