160. DeepPose

DeepPose

DeepPose is a research done by Google for human pose estimation.

Pose Vector

First, the paper encodes all “k” body joints into a pose vector. To avoid using absolute coordinates for the body joints like right now, the paper normalizes each joint’s coordinate with reference to a bounding box including the whole person or part of it, and uses the normalized coordinates paired with the corresponding image data for training.

Cascade Regressor

Since the input size is fixed, the network has limited capacity to look at the details of the image. So, the paper cascades the pose regressor to solve this problem.

In the initial stage, the model will output the joints as a pose vector. Then, in the 2nd stage, it will use the bounding box information outputted from the previous model and use that as the input for the current stage. This helps narrow down the area the model will be able to learn. By repeating this process the precision of the joints gradually increases.