Human pose estimation localizes body key points in order to accurately recognize the postures of individuals in a given image. This step is a crucial prerequisite to multiple computer vision tasks, including human action recognition, human tracking, human-computer interaction, gaming, sign languages, and video surveillance.
Fitness exercises are very beneficial to personal health and fitness; however, they can also be ineffective and potentially dangerous if performed incorrectly by the user. Exercise mistakes are made when the user does not use the proper form, or pose. In our work, we introduce Pose Trainer, an application that detects the user’s exercise pose and provides personalized, detailed recommendations on how the user can improve their form. Pose Trainer uses the state of the art in pose estimation to detect a user’s pose, then evaluates the vector geometry of the pose through an exercise to provide useful feedback. We record a dataset of over 100 exercise videos of correct and incorrect form, based on personal training guidelines, and build geometric heuristic and machine learning algorithms for evaluation. Pose Trainer works on four common exercises and supports any Windows or Linux computer with a GPU.
I. INTRODUCTION
Fitness is a growing trend. According to the Wellness Creatives report, the revenue of the fitness industry grows by 8.7% every year, and fitness apps have made their way into this field.
There are many ways in which technology can help to improve your body, from tracking exercise activity to adjusting nutrition. The question is how well such apps can help improve the performance of physical exercises compared to human coaches.
Artificial Intelligence (AI, a broad name for a group of advanced methods, tools, and algorithms for the automatic execution of various tasks) has invaded practically all functional areas of business over the years. Pose estimation is among the most popular solutions that AI has to offer; it is used to determine the position and orientation of the human body given an image containing a person. Unsurprisingly, such a useful tool has found many use cases: for instance, it can be used in avatar animation for Artificial Reality, markerless motion capture, worker pose analysis, and many more.
With the arrival of human pose estimation technology, the fitness technology market has been filling up with AI-based personal trainer apps. Being powered by computer vision, human pose estimation and natural language processing algorithms, these technologies lead end-users through a number of workouts and give real-time feedback.
Exercises such as squats, deadlifts, and shoulder presses are beneficial to health and fitness, but they can also be very dangerous if performed incorrectly. The heavy weights involved in these workouts can cause severe injuries to the muscles or ligaments. Many people work out and perform these exercises regularly but do not maintain the proper form (pose). This could be due to a lack of formal training through classes or a personal trainer, or could also be due to muscle fatigue or using too much weight. For our course project, we seek to aid people in performing the correct posture for exercises by building Pose Trainer, a software application that detects the user’s exercise pose and provides useful feedback on the user’s form, using a combination of the latest advances in pose estimation and machine learning. Our goal for Pose Trainer is to help prevent injuries and improve the quality of people’s workouts with just a computer and a webcam. The first step of Pose Trainer uses human pose estimation, a difficult but highly applicable domain of computer vision.
II. LITERATURE REVIEW
Alexander Toshev and Christian Szegedy of Google proposed DeepPose [1]. In this approach, pose estimation is formulated as a CNN-based regression problem towards body joints. The model consists of an AlexNet backend (7 layers) and is trained using an L2 loss for regression. This approach reasons about pose in a holistic fashion, i.e., even if certain joints are hidden, they can be estimated if the pose is reasoned about holistically. The paper argues that CNNs naturally provide this sort of reasoning and demonstrates strong results.
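To make the regression formulation concrete, the following is a minimal sketch of an L2 loss over joint coordinates. It is an illustration only and not the DeepPose implementation: the function name `l2_joint_loss`, the use of plain (x, y) tuples, and the sample coordinates are all assumptions made for this example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l2_joint_loss(predicted: Sequence[Point], target: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances between predicted and target joints."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must contain the same number of joints")
    return sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(predicted, target)
    )


# Hypothetical normalized coordinates for two joints.
pred = [(0.5, 0.5), (0.2, 0.9)]
true = [(0.5, 0.4), (0.3, 0.9)]
print(l2_joint_loss(pred, true))
```

A regression network trained with such a loss is penalized in proportion to the squared distance of each predicted joint from its annotated position, which is why every joint receives an estimate even when it is occluded in the image.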
Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, and Jitendra Malik [3] present a method that predicts an initial estimate and then corrects it iteratively. Instead of directly predicting the outputs in one go, they use a self-correcting model that progressively changes an initial solution by feeding back error predictions; this process is called Iterative Error Feedback (IEF). It employs a standard ConvNet architecture pre-trained on ImageNet: the very deep GoogLeNet.
Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler of New York University [2] propose an approach that generates heatmaps by running an image through multiple resolution banks in parallel to simultaneously capture features at a variety of scales. The output is a discrete heatmap instead of a continuous regression. A heatmap predicts the probability of the joint occurring at each pixel. A multi-resolution CNN architecture (the coarse heatmap model) is used to implement a sliding window detector that produces a coarse heatmap output.
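As an illustration of how a joint location can be read from such a heatmap, the sketch below picks the highest-scoring cell. This is a simplified assumption about the decoding step, not the procedure from the paper; the function name `heatmap_to_keypoint` and the toy 3x3 heatmap are hypothetical.

```python
from typing import List, Tuple


def heatmap_to_keypoint(heatmap: List[List[float]]) -> Tuple[int, int, float]:
    """Return (row, col, score) of the highest-scoring cell in a 2D heatmap."""
    best_row, best_col, best_score = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_row, best_col, best_score = r, c, score
    return best_row, best_col, best_score


# Toy heatmap for a single joint; values are made up for illustration.
toy = [
    [0.01, 0.02, 0.01],
    [0.05, 0.80, 0.05],
    [0.01, 0.04, 0.01],
]
row, col, score = heatmap_to_keypoint(toy)
print(row, col, score)
```

Because the heatmap is discrete, the recovered location is limited to the heatmap resolution, which is one reason a coarse heatmap is usually followed by a refinement step.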
Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, and Bernt Schiele, in their paper “Learning to Refine Human Pose Estimation” [4], give insights on refining human pose estimates. Their model takes an RGB image and a body pose estimate as input and outputs a refined pose. Exploiting the dependencies between the image and the predicted body pose makes it easier for the model to identify the errors in the initial estimate and to learn how to refine them. The method achieves state-of-the-art results on the challenging MPII Human Pose and PoseTrack benchmarks.
Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh, in their paper “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields” [5], address the detection of the 2D pose of multiple people in an image. Their system, OpenPose, performs 2D human pose estimation and uses the COCO human pose dataset as training data.
Simen Thys, Wiebe Van Ranst, and Toon Goedemé, in their paper “Fooling automated surveillance cameras: adversarial patches to attack person detection” [6], target the popular YOLOv2 object detector and present an example of a real-world adversarial attack on person detection. As related work, Sharif et al. demonstrate printed eyeglasses that can be used to fool facial recognition systems, and the paper considers how to make such patches more robust. Thus, the vulnerability of certain neural networks has been exposed.
Manisha Patel and Dr. Nilesh Kalani cover the basic understanding and elementary concepts of CNNs and deep learning approaches for human pose estimation. OpenPose is described as a real-time multi-person system that detects human body key points in a single image, alongside the formulation of pose estimation as a CNN-based regression problem towards body joints in a holistic fashion. To resolve the vanishing gradient problem, intermediate supervision is applied at every stage of the stacked hourglass network.
Bruno Artacho and Andreas Savakis of the Rochester Institute of Technology, Rochester, NY, proposed UniPose. The UniPose pipeline utilizes the WASP module, which features a waterfall flow with a cascade of atrous convolutions and multi-scale representations. The large field of view (FOV) of WASP obtains a better interpretation of the contextual information in the frame and contributes to more accurate pose estimation.
III. METHODOLOGY: DEEP LEARNING EQUIPPED FITNESS TRAINER
In general, our project aims to give a personal fitness training experience. It is powered by computer vision that captures the real-time video feed of the person, then processes and analyses it using deep learning techniques to estimate the person's body posture and provide corrective feedback so that the user can perform exercises efficiently. We achieve this by estimating the human pose and then marking the landmarks on it. Once we have the landmarks, we form vectors from them in order to analyze the posture of the person. We employ a simple geometric heuristic approach to evaluate each exercise. We currently provide exercise modules for bicep curls, front raises, squats, shoulder shrugs, and lateral raises.
For the pose estimation component, we utilize a pre-trained real-time system, called BlazePose, that can detect human body key points in videos. (More detail on BlazePose and our reasoning for choosing this system can be found in Technical Approach.) This model is functional out of the box, and is thus very simple for users of our application to install. Using BlazePose allows us to take advantage of the state of the art in pose estimation algorithms for our task, and lets us focus on the actual evaluation of exercise posture. For the posture evaluation (pose training) component, we have recorded videos of ourselves performing exercises. Our videos include our best effort to correctly perform the exercise, as well as intentionally incorrect examples. The evaluation of our posture identifier is dependent on the performance of the pose estimator. We work under the assumption that the pose estimator is accurate the majority of the time, with small measurement deviations due to noise, which we correct for. We evaluate our posture identifier in different ways depending on the algorithm: for heuristic algorithms, we feed in all videos for evaluation, while for machine learning algorithms, we split our video dataset into train and test sets and report results on the test set.
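The text does not specify how the measurement noise is corrected. One common option, shown here purely as an assumption, is a trailing moving average over a per-frame measurement such as a key point coordinate or a joint angle. The function name `moving_average` and the window size are hypothetical choices for this sketch.

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Smooth a series with a trailing moving average of the given window size."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # shorter window at the start of the series
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical noisy per-frame measurements.
print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))
```

A larger window suppresses more jitter but also delays the response to real movement, so the window size would need to be tuned against the frame rate of the video feed.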
To design the basic curl counter, we select, from the 33 key points, the three that are responsible for the movement.
The curl counter works by tracking the angle formed by these three points: a change in this angle from 180 degrees to less than 45 degrees and back from 45 degrees to 180 degrees counts as one curl.
There are two positions considered in this design:
UP State - 180 degrees to 45 degrees
DOWN State - 45 degrees to 180 degrees
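The two states above can be sketched as a small state machine over a series of per-frame angles. This is a minimal sketch, not the authors' code: the function name `count_curls` and its parameters are hypothetical, and the `extended_threshold` of 160 degrees is an assumption, chosen because a noisy estimate rarely reaches exactly 180 degrees.

```python
from typing import Iterable


def count_curls(
    angles: Iterable[float],
    flexed_threshold: float = 45.0,     # from the text: curl reaches below 45 degrees
    extended_threshold: float = 160.0,  # assumption: "close enough" to 180 degrees
) -> int:
    """Count full curl cycles (extended -> flexed -> extended) in an angle series."""
    reps = 0
    flexed = False  # True once the angle has dropped below flexed_threshold
    for angle in angles:
        if angle < flexed_threshold:
            flexed = True  # UP phase completed
        elif angle > extended_threshold and flexed:
            reps += 1      # DOWN phase completed, one full curl
            flexed = False
    return reps


# Hypothetical per-frame angles covering two full curls.
angles = [175, 120, 60, 40, 35, 70, 130, 170, 172, 100, 42, 90, 165]
print(count_curls(angles))  # prints 2
```

Counting only on the return to the extended position means that a half repetition, where the arm is lowered without having been fully curled, is not counted.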
We select the trigonometric function tan for this recognition, as it is defined as the ratio of the opposite side to the adjacent side. However, tan has a drawback: at 90 degrees, where the opposite side equals the hypotenuse (sin is one) and cos is zero, the ratio becomes undefined.
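One way to avoid the undefined case, offered here as an assumption rather than as the method used in this work, is the two-argument arctangent `math.atan2`, which takes the opposite and adjacent components separately and stays defined when the adjacent component is zero. The function name `joint_angle` and the example points (standing in for shoulder, elbow, and wrist) are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at vertex b formed by points a-b-c, in the range [0, 180]."""
    # Direction of each limb segment as seen from the middle joint b.
    angle_ba = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_bc = math.atan2(c[1] - b[1], c[0] - b[0])
    degrees = abs(math.degrees(angle_bc - angle_ba))
    # Fold reflex angles back into [0, 180].
    return 360.0 - degrees if degrees > 180.0 else degrees


# Hypothetical coordinates: a vertical upper arm and a horizontal forearm.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(joint_angle(shoulder, elbow, wrist))
```

In this example the upper arm is vertical, so its adjacent component is zero and a plain opposite-over-adjacent ratio would divide by zero, while `atan2` still returns a valid direction.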
IV. RESULTS
Successfully implemented detection of human pose landmarks using MediaPipe.
Extracted the pose landmarks as Cartesian coordinates for analysis.
Implemented a simple yet effective mathematical technique to find the angles between three joints.
Applied a simple heuristic approach to recognise the states of exercises.
Integrated a simple user interface to display exercise count and state information.
V. CONCLUSION
A. Successfully implemented a basic gym trainer that can detect and analyse the human position.
B. Implemented a way to recognise the exercise and to count the repetitions more efficiently.
C. The work can be further extended to implement even more advanced and capable techniques.
REFERENCES
[1] Alexander Toshev, Christian Szegedy (Google). DeepPose: Human Pose Estimation via Deep Neural Networks (CVPR’14).
[2] Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christoph Bregler (New York University). Efficient Object Localization Using Convolutional Networks (CVPR’15).
[3] Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik. Human Pose Estimation with Iterative Error Feedback.
[4] Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, Bernt Schiele. ‘Learning to Refine Human Pose Estimation’. Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.
[5] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh. ‘OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields’. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Simen Thys, Wiebe Van Ranst, Toon Goedemé. ‘Fooling automated surveillance cameras: adversarial patches to attack person detection’. EAVISE, Technology Campus De Nayer, KU Leuven, Belgium.
[7] Zhao Liu, Jianke Zhu, Jiajun Bu, Chun Chen. A survey of human pose estimation: The body parts parsing based methods. Elsevier.
[8] Naimat Ullah Khan, Wanggen Wan. A Review of Human Pose Estimation from Single Image.
[9] Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu