A 2019 Guide to Human Pose Estimation

By Derrick Mwiti, Data Scientist on August 28, 2019 in AI, Computer Vision, Image Recognition, Video recognition

Human pose estimation refers to the process of inferring poses in an image. Essentially, it entails predicting the positions of a person’s joints in an image or video. This problem is also sometimes referred to as the localization of human joints. It’s also important to note that pose estimation has various sub-tasks such as single pose estimation, estimating poses in an image with many people, estimating poses in crowded places, and estimating poses in videos.

Pose estimation can be performed in either 3D or 2D. Some of the applications of human pose estimation include:

The papers we’ll highlight use the terms bottom-up and top-down in two different senses. Within a single network, bottom-up processing goes from high to low resolutions, while top-down processing goes from low to high resolutions.

For multi-person pose estimation, the terms describe the overall pipeline. The top-down approach starts by identifying and localizing individual person instances using a bounding box object detector, and then estimates the pose of each person separately. The bottom-up approach starts by localizing identity-free semantic entities, such as individual keypoints, and then groups them into person instances.
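To make the difference between the two multi-person pipelines concrete, here is a minimal Python sketch. The functions `top_down` and `bottom_up`, and all of the `fake_*` helpers, are hypothetical names invented for illustration. They stand in for a real person detector, single-person pose estimator, part detector, and grouping step, none of which are specified here.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height) in image pixels
Keypoint = Tuple[float, float]    # (x, y) in image pixels


def top_down(image, detect_people: Callable, estimate_single_pose: Callable) -> List[List[Keypoint]]:
    """Detect person boxes first, then estimate one pose per box."""
    poses = []
    for (x, y, w, h) in detect_people(image):
        crop = [row[x:x + w] for row in image[y:y + h]]
        # The single-person estimator works in crop coordinates,
        # so shift its keypoints back into image coordinates.
        poses.append([(kx + x, ky + y) for kx, ky in estimate_single_pose(crop)])
    return poses


def bottom_up(image, detect_parts: Callable, group_parts: Callable) -> List[Dict[str, Keypoint]]:
    """Detect identity-free parts over the whole image, then group them into people."""
    return group_parts(detect_parts(image))


# Hypothetical stand-ins so that the sketch runs end to end.
def fake_person_detector(image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 1, 2, 2)]


def fake_single_pose(crop) -> List[Keypoint]:
    # Pretend the only "joint" sits at the centre of the crop.
    return [(len(crop[0]) / 2, len(crop) / 2)]


def fake_part_detector(image):
    # (part name, x, y) with no information about which person owns the part.
    return [("head", 1, 0), ("head", 3, 0), ("neck", 1, 1), ("neck", 3, 1)]


def fake_grouping(parts) -> List[Dict[str, Keypoint]]:
    # Toy rule: parts in the same image column belong to the same person.
    people: Dict[int, Dict[str, Keypoint]] = {}
    for name, x, y in parts:
        people.setdefault(x, {})[name] = (x, y)
    return list(people.values())


image = [[0] * 4 for _ in range(3)]   # a blank 4 x 3 "image"
top_down_poses = top_down(image, fake_person_detector, fake_single_pose)
bottom_up_poses = bottom_up(image, fake_part_detector, fake_grouping)
```

In the top-down sketch the cost grows with the number of detected people, because the pose estimator runs once per box. In the bottom-up sketch the part detector runs once per image and the difficulty moves into the grouping step, which real systems solve with far more elaborate rules than the toy column rule used here.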

We’ll now look at some research that’s been conducted in an attempt to solve the problem of human pose estimation:

DeepPose: Human Pose Estimation via Deep Neural Networks (CVPR, 2014)

This paper proposes using deep neural networks (DNNs) to tackle pose estimation. The authors of this paper are Alexander Toshev and Christian Szegedy from Google. Pose estimation is formulated as a DNN-based regression on the joint coordinates. The authors report state-of-the-art results on standard benchmarks such as the FLIC and LSP datasets, using a cascade of DNN regressors that refines the joint predictions stage by stage.
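Regression on the joints can be sketched in a few lines of Python. The sketch assumes that joint coordinates are normalized relative to a bounding box around the person and that training minimizes an L2 distance between predicted and ground-truth normalized joints. The helper names `normalize`, `denormalize`, and `l2_loss`, as well as the example numbers, are illustrative and not taken from the authors’ code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]   # (center_x, center_y, width, height)


def normalize(joints: List[Point], box: Box) -> List[Point]:
    """Express joints relative to the box centre, in units of box width and height."""
    cx, cy, w, h = box
    return [((x - cx) / w, (y - cy) / h) for x, y in joints]


def denormalize(joints: List[Point], box: Box) -> List[Point]:
    """Map normalized joints back to image coordinates."""
    cx, cy, w, h = box
    return [(x * w + cx, y * h + cy) for x, y in joints]


def l2_loss(predicted: List[Point], target: List[Point]) -> float:
    """Sum of squared coordinate differences over all joints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target))


# Made-up example: a 200 x 200 box centred at (100, 100) and two joints.
box = (100.0, 100.0, 200.0, 200.0)
truth = normalize([(100.0, 50.0), (150.0, 100.0)], box)

# Pretend this is the network output for the same image.
prediction = [(0.0, -0.25), (0.5, 0.0)]

loss = l2_loss(prediction, truth)
predicted_pixels = denormalize(prediction, box)
```

Working in normalized coordinates means the regression target does not depend on where the person is in the image or how large they appear, and the prediction is mapped back to pixels only at the end.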

The DNN is able to capture the context of all the joints and doesn’t require the use of graphical models. The network is made up of seven layers, built from convolution, pooling, and fully-connected layers.

The convolution and fully-connected layers are the only layers that have learnable parameters. Both consist of a linear transformation followed by a rectified linear unit (ReLU). The network takes an input image of size 220 × 220, and the learning rate is set to 0.0005. The dropout regularization for the fully-connected layers is set to 0.6. The datasets used to evaluate this model include Frames Labeled In Cinema (FLIC) and the Leeds Sports Dataset.
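As a rough illustration of what “a linear transformation followed by a rectified linear unit” with dropout looks like, here is a tiny fully-connected layer in plain Python. The constants repeat the values quoted above. Two things are assumptions: that 0.6 is the probability of dropping a unit, and that dropout is applied in its “inverted” form, which rescales the surviving units during training. The weights and inputs are made up.

```python
import random
from typing import List

# Values quoted in the text above.
INPUT_SIZE = (220, 220)
LEARNING_RATE = 0.0005
DROPOUT = 0.6   # assumption: probability of dropping a unit


def linear(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """weights[j][i] connects input i to output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def dropout(values: List[float], p: float, rng: random.Random, training: bool) -> List[float]:
    """Inverted dropout: zero each unit with probability p and rescale the rest."""
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def fully_connected(inputs, weights, biases, rng, training=True):
    """Linear transformation, then ReLU, then dropout."""
    return dropout(relu(linear(inputs, weights, biases)), DROPOUT, rng, training)


# Made-up weights for a layer with two inputs and two outputs.
weights = [[1.0, -1.0], [0.5, 0.5]]
biases = [0.0, -2.0]
rng = random.Random(0)

eval_output = fully_connected([2.0, 1.0], weights, biases, rng, training=False)
train_output = fully_connected([2.0, 1.0], weights, biases, rng, training=True)
```

At evaluation time dropout is switched off, so `eval_output` is just the ReLU of the linear transformation. During training each unit is either zeroed or scaled up, which is why `train_output` can differ from run to run unless the random generator is seeded.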

DeepPose: Human Pose Estimation via Deep Neural Networks (CVPR, 2014)

Efficient Object Localization Using Convolutional Networks (2015)

Human Pose Estimation with Iterative Error Feedback (2016)

Stacked Hourglass Networks for Human Pose Estimation (2016)

Convolutional Pose Machines (2016)

DeepCut: Joint Subset Partition and Labeling for Multi-Person Pose Estimation (CVPR 2016)

Simple Baselines for Human Pose Estimation and Tracking (ECCV, 2018)

RMPE: Regional Multi-Person Pose Estimation (2018)

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (2019)

Human Pose Estimation for Real-World Crowded Scenarios (AVSS, 2019)

DensePose: Dense Human Pose Estimation In The Wild (2018)

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model (2018)

Conclusion
