Abstract
PointPillars is an architecture for 3D object detection that takes point clouds as input.
Architecture
The architecture consists of three main components.
- Pillar Feature Net
- BackBone
- Detection Head
1. Pillar Feature Net
This phase takes the following steps.
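- Divide the point cloud into an evenly spaced grid in the X-Y plane, which creates a set of pillars (vertical columns with no binning along Z).
- Because the point cloud is sparse, most pillars are empty, so the network builds a dense [D, P, N] shaped tensor from the non-empty pillars only (D features per point, P pillars, N points per pillar).
- With the dense tensor as input, the paper applies a simplified version of PointNet to produce a [C, P, N] shaped tensor, then takes the maximum over the N points in each pillar to encode the features into a [C, P] shaped tensor.
- Scatter the encoded features back to the original pillar locations, which yields a [C, H, W] shaped pseudo-image.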
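The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function names (`pillarize`, `pillar_encode`, `scatter`) and all parameters are hypothetical, the points carry only 4 raw features (x, y, z, reflectance) without the extra per-pillar offsets the paper adds, points beyond the per-pillar limit are truncated rather than randomly sampled, and the simplified PointNet is reduced to one linear layer with ReLU (no BatchNorm).

```python
import numpy as np

def pillarize(points, x_range, y_range, pillar_size, max_pillars, max_points):
    """Group points (M, D) into a dense (D, P, N) tensor of non-empty pillars."""
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    # Drop points that fall outside the X-Y grid.
    in_range = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
                (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    points = points[in_range]
    # Integer grid cell of every point.
    ix = np.minimum(((points[:, 0] - x_range[0]) / pillar_size).astype(np.int64), nx - 1)
    iy = np.minimum(((points[:, 1] - y_range[0]) / pillar_size).astype(np.int64), ny - 1)
    flat = iy * nx + ix
    # Only non-empty pillars get a slot in the dense tensor.
    pillar_ids = np.unique(flat)[:max_pillars]
    dense = np.zeros((points.shape[1], max_pillars, max_points), dtype=np.float32)
    coords = np.zeros((max_pillars, 2), dtype=np.int64)  # (row, col) per pillar
    for p, pid in enumerate(pillar_ids):
        pts = points[flat == pid][:max_points]  # assumption: truncate, no sampling
        dense[:, p, :len(pts)] = pts.T          # remaining slots stay zero-padded
        coords[p] = (pid // nx, pid % nx)
    return dense, coords, len(pillar_ids), (ny, nx)

def pillar_encode(dense, weight, bias):
    """Linear + ReLU per point, then max over the N points: (D,P,N) -> (C,P)."""
    feats = np.einsum("cd,dpn->cpn", weight, dense) + bias[:, None, None]
    feats = np.maximum(feats, 0.0)  # ReLU
    return feats.max(axis=2)

def scatter(encoded, coords, num_pillars, grid_shape):
    """Write each pillar's feature vector back to its grid cell: (C,P) -> (C,H,W)."""
    height, width = grid_shape
    canvas = np.zeros((encoded.shape[0], height, width), dtype=encoded.dtype)
    rows, cols = coords[:num_pillars, 0], coords[:num_pillars, 1]
    canvas[:, rows, cols] = encoded[:, :num_pillars]
    return canvas

# Usage with toy data: three points in range, one outside the grid.
points = np.array([[0.1, 0.1, 0.0, 1.0],
                   [0.2, 0.15, 0.0, 1.0],
                   [1.5, 0.5, 0.0, 1.0],
                   [5.0, 5.0, 0.0, 1.0]], dtype=np.float32)
dense, coords, num_pillars, grid = pillarize(points, (0, 2), (0, 2), 1.0, 4, 3)
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4)).astype(np.float32)  # C=8, D=4
encoded = pillar_encode(dense, weight, np.zeros(8, dtype=np.float32))
pseudo_image = scatter(encoded, coords, num_pillars, grid)
```

Cells that held no points stay zero in the pseudo-image, which is what lets the following stage treat the result as an ordinary 2D image.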
2. BackBone
The paper uses a 2D convolutional backbone consisting of two sub-networks.
- A top-down network that produces features at increasingly small spatial resolution.
- A second network that upsamples the top-down features and concatenates them.
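The data flow of the two sub-networks can be sketched as below. This only illustrates the shapes, under loud assumptions: average pooling stands in for the strided convolution blocks, nearest-neighbour repetition stands in for the learned upsampling, and the number of channels is left unchanged. The names `downsample`, `upsample`, and `backbone`, and the choice of three stages, are hypothetical.

```python
import numpy as np

def downsample(x, stride):
    """Stand-in for a strided conv block: average-pool (C, H, W) by `stride`."""
    channels, height, width = x.shape
    x = x[:, :height - height % stride, :width - width % stride]
    return x.reshape(channels, height // stride, stride,
                     width // stride, stride).mean(axis=(2, 4))

def upsample(x, factor):
    """Stand-in for learned upsampling: nearest-neighbour repetition."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def backbone(pseudo_image, num_stages=3):
    # Top-down network: each stage halves the spatial resolution.
    features = []
    x = pseudo_image
    for _ in range(num_stages):
        x = downsample(x, 2)
        features.append(x)
    # Second network: bring every stage to the first stage's resolution,
    # then concatenate along the channel axis.
    target = features[0].shape[1]
    upsampled = [upsample(f, target // f.shape[1]) for f in features]
    return np.concatenate(upsampled, axis=0)

out = backbone(np.ones((8, 16, 16), dtype=np.float32))
print(out.shape)  # (24, 8, 8)
```

Concatenating the stages gives the detection head features that mix fine spatial detail from the early stages with wider context from the later ones.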
3. Detection Head
The paper uses the Single Shot Detector (SSD) setup to perform 3D object detection.
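In an SSD-style setup, predefined anchor boxes are matched to ground-truth boxes by 2D IoU, and the matches decide which anchors are trained as positives or negatives. The sketch below shows that matching step only. It is a simplified illustration: boxes are axis-aligned (x1, y1, x2, y2) in the bird's-eye view with rotation ignored, the function names are hypothetical, the thresholds are illustrative defaults, and the extra SSD rule that assigns each ground truth to its best anchor is omitted.

```python
import numpy as np

def iou_2d(anchors, gts):
    """Pairwise IoU between axis-aligned boxes given as (x1, y1, x2, y2)."""
    a = anchors[:, None, :]  # (A, 1, 4)
    g = gts[None, :, :]      # (1, G, 4)
    inter_w = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def match_anchors(anchors, gts, pos_thresh=0.6, neg_thresh=0.45):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored."""
    iou = iou_2d(anchors, gts)
    best_iou = iou.max(axis=1)
    best_gt = iou.argmax(axis=1)  # ground truth each anchor would regress to
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[best_iou < neg_thresh] = 0
    labels[best_iou >= pos_thresh] = 1
    return labels, best_gt

anchors = np.array([[0, 0, 2, 2], [1, 0, 3, 2], [0, 0, 2, 1]], dtype=np.float32)
gts = np.array([[0, 0, 2, 2]], dtype=np.float32)
labels, best_gt = match_anchors(anchors, gts)
print(labels)  # [ 1  0 -1]
```

Anchors whose best IoU falls between the two thresholds are ignored, so they contribute to neither the positive nor the negative part of the loss.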