346. CaDDN - My Blog

Depth Estimation

The main challenge in monocular 3D object detection lies in accurately predicting object depth. CaDDN (Categorical Depth Distribution Network) predicts a categorical depth distribution for each pixel and uses it to project that pixel's image features to the appropriate depth in 3D space.

Approaches

There are several approaches to predicting the object’s depth.

The TOO BIG Approach
Projects each pixel's features to every possible depth along its camera ray. Because the same features are repeated at all depths, they are smeared across the 3D space, which makes it difficult to localize objects accurately.
The TOO SMALL Approach
Projects each pixel's features to a single location in the 3D space, chosen from one depth estimate. Localization suffers whenever that depth estimate is poor.

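Considering the above, the paper proposes a JUST RIGHT approach: each pixel's features are projected along its ray and weighted by the estimated depth probabilities, so the likely depths receive most of the weight while the uncertainty of the estimate is preserved.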
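To make the difference between the three approaches concrete, here is a minimal sketch for a single pixel. It is not the paper's code: the function names, the mode names (`too_big`, `too_small`, `just_right`) and the use of a softmax over raw depth scores are assumptions made for illustration only.

```python
import math


def depth_weights(depth_logits, mode):
    """Return one weight per depth bin for a single pixel (hypothetical helper)."""
    num_bins = len(depth_logits)
    if mode == "too_big":
        # Every depth bin receives the feature with the same weight.
        return [1.0] * num_bins
    if mode == "too_small":
        # Only the single most likely depth bin receives the feature.
        best = max(range(num_bins), key=lambda i: depth_logits[i])
        return [1.0 if i == best else 0.0 for i in range(num_bins)]
    if mode == "just_right":
        # Softmax turns the scores into a categorical depth distribution.
        peak = max(depth_logits)
        exps = [math.exp(x - peak) for x in depth_logits]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown mode: {mode}")


def lift_feature(feature, depth_logits, mode):
    """Spread a pixel's feature vector over its depth bins: result[d][c]."""
    weights = depth_weights(depth_logits, mode)
    return [[w * f for f in feature] for w in weights]
```

With `just_right`, the copies of the feature along the ray sum back to the original feature vector, so nothing is duplicated (as in `too_big`) and nothing is bet on a single bin (as in `too_small`).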

Architecture

The network takes the following steps.

Frustum Feature Network
Predicts image features and per-pixel depth distributions using convolutional networks, then combines the two outputs into frustum features.
Frustum to Voxel Transformation
The frustum features are sampled using trilinear interpolation to populate the voxel features.
Voxel Collapse
The voxel features are collapsed into bird's-eye-view (BEV) features, which serve as input to the 3D object detector.
3D Object Detector
Predicts the final 3D bounding boxes from the BEV features.
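The sampling and collapse steps can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: `trilinear_sample` and `collapse_to_bev` are hypothetical names, the mapping from a voxel centre to continuous frustum coordinates (which needs the camera calibration) is assumed to be done already, and the collapse is shown as simply stacking the height slices into the channel axis, with any later channel reduction omitted.

```python
import math


def trilinear_sample(grid, d, v, u):
    """Sample grid[d][v][u] (one channel) at a continuous (d, v, u) location.

    Coordinates outside the grid are clamped to its border.
    """
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    coords = [min(max(c, 0.0), s - 1.0) for c, s in zip((d, v, u), sizes)]
    lows = [int(math.floor(c)) for c in coords]
    highs = [min(lo + 1, s - 1) for lo, s in zip(lows, sizes)]
    fracs = [c - lo for c, lo in zip(coords, lows)]

    value = 0.0
    # Blend the 8 surrounding grid cells, weighted by proximity on each axis.
    for di, wd in ((lows[0], 1.0 - fracs[0]), (highs[0], fracs[0])):
        for vi, wv in ((lows[1], 1.0 - fracs[1]), (highs[1], fracs[1])):
            for ui, wu in ((lows[2], 1.0 - fracs[2]), (highs[2], fracs[2])):
                value += wd * wv * wu * grid[di][vi][ui]
    return value


def collapse_to_bev(voxels):
    """Collapse voxels[z][y][x] (a list of C channels each) into bev[y][x].

    Each BEV cell holds the channels of all height slices, Z * C values.
    """
    num_z, num_y, num_x = len(voxels), len(voxels[0]), len(voxels[0][0])
    return [
        [
            [c for z in range(num_z) for c in voxels[z][y][x]]
            for x in range(num_x)
        ]
        for y in range(num_y)
    ]
```

In a full pipeline, `trilinear_sample` would be called once per voxel and per channel to fill the voxel grid, and the BEV grid returned by `collapse_to_bev` would be handed to the detector.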

Reference

Categorical Depth Distribution Network for Monocular 3D Object Detection
