347. Point Pillar: 3D Object Detection from Point Clouds

Abstract

Point Pillar is an architecture for 3D object detection that takes point clouds as input.

Architecture

The architecture consists of three main components.

  1. Pillar Feature Net
  2. BackBone
  3. Detection Head

1. Pillar Feature Net

This phase takes the following steps.

  1. Divide the point cloud into a grid in the X-Y plane, which creates a set of pillars.
  2. Most pillars will be empty, so the network builds a dense tensor from only the non-empty pillars.
  3. Taking the dense tensor as input, the paper applies a simplified PointNet to produce a [C, P, N] shaped tensor (C channels, P pillars, N points per pillar), then takes the maximum over the N points to encode the features into a [C, P] shaped tensor.
  4. Scatter the encoded features back to the original pillar locations.
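
The four steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the paper's implementation: the function names (pillarize, encode_pillars, scatter_to_canvas), the grid parameters, and the single linear layer with ReLU standing in for the simplified PointNet are all assumptions made for the example. The per-point feature augmentation and batch normalization used in the paper are left out.

```python
import numpy as np

def pillarize(points, x_range, y_range, pillar_size, max_points):
    """Steps 1-2: group (M, D) points into non-empty pillars.

    Returns a zero-padded (P, N, D) tensor, (P, 2) grid coordinates
    as (row, col), per-pillar point counts, and the grid shape.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)

    # Drop points that fall outside the grid.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    points, ix, iy = points[inside], ix[inside], iy[inside]

    # Only cells that contain at least one point become pillars.
    flat = iy * nx + ix
    pillar_ids, inverse = np.unique(flat, return_inverse=True)
    num_pillars = len(pillar_ids)

    pillars = np.zeros((num_pillars, max_points, points.shape[1]))
    counts = np.zeros(num_pillars, dtype=int)
    for point, p in zip(points, inverse):
        if counts[p] < max_points:  # extra points in a full pillar are dropped
            pillars[p, counts[p]] = point
            counts[p] += 1

    coords = np.stack([pillar_ids // nx, pillar_ids % nx], axis=1)
    return pillars, coords, counts, (ny, nx)

def encode_pillars(pillars, counts, weight, bias):
    """Step 3: per-point linear layer + ReLU, then max over points -> (C, P)."""
    feats = np.maximum(pillars @ weight + bias, 0.0)  # (P, N, C)
    # Zero out padded slots; safe for the max because ReLU outputs are >= 0.
    valid = np.arange(pillars.shape[1])[None, :] < counts[:, None]
    feats = np.where(valid[:, :, None], feats, 0.0)
    return feats.max(axis=1).T  # (C, P)

def scatter_to_canvas(encoded, coords, grid_shape):
    """Step 4: place each pillar's (C,) feature at its grid cell -> (C, H, W)."""
    canvas = np.zeros((encoded.shape[0], *grid_shape), dtype=encoded.dtype)
    canvas[:, coords[:, 0], coords[:, 1]] = encoded
    return canvas
```

The result of scatter_to_canvas is an image-like [C, H, W] tensor with zeros at empty cells, which is what lets the next stage use ordinary 2D convolutions.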

2. BackBone

The paper uses a 2D convolutional backbone consisting of two sub-networks.

  1. A top-down network that produces features at increasingly small spatial resolution.
  2. A second network that upsamples and concatenates the top-down features.
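
The shape flow through the two sub-networks can be sketched as below. To keep the example dependency-light, average pooling stands in for the strided convolution blocks and nearest-neighbour repetition stands in for the learned upsampling layers, so this shows only how resolutions shrink and are then brought back to a common size and concatenated. The function names and the number of levels are assumptions for illustration.

```python
import numpy as np

def downsample(x, stride=2):
    """Average-pool a (C, H, W) map by `stride` (stand-in for a strided conv block)."""
    c, h, w = x.shape
    h2, w2 = h // stride, w // stride
    x = x[:, :h2 * stride, :w2 * stride]
    return x.reshape(c, h2, stride, w2, stride).mean(axis=(2, 4))

def upsample(x, factor):
    """Nearest-neighbour upsampling (stand-in for a learned upsampling layer)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def backbone(pseudo_image, num_levels=3):
    # Top-down path: each level halves the spatial resolution.
    features = []
    x = pseudo_image
    for _ in range(num_levels):
        x = downsample(x)
        features.append(x)

    # Second path: bring every level to the resolution of the first one,
    # then concatenate along the channel axis.
    target_h = features[0].shape[1]
    upsampled = [upsample(f, target_h // f.shape[1]) for f in features]
    return np.concatenate(upsampled, axis=0)
```

For example, an (8, 16, 16) input yields top-down features of spatial size 8, 4, and 2, which are all upsampled to 8 x 8 and concatenated into a (24, 8, 8) tensor.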

3. Detection Head

The detection head uses the Single Shot Detector (SSD) setup to perform 3D object detection.
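
In an SSD-style setup, predefined anchor boxes are matched to ground-truth boxes by IoU before training the classification and regression outputs. The sketch below illustrates that matching step on axis-aligned bird's-eye-view boxes. It is a simplified illustration: the function names, the (x_min, y_min, x_max, y_max) box format, and the threshold defaults are assumptions chosen for the example, and box rotation is ignored.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned BEV boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_anchors(anchors, gt_boxes, pos_thr=0.6, neg_thr=0.45):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored.

    The thresholds are illustrative defaults, not values taken from this note.
    """
    labels = []
    for anchor in anchors:
        best = max((bev_iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # between thresholds: excluded from the loss
    return labels
```

Anchors labelled positive would contribute to both the classification and box regression losses, negatives only to classification, and ignored anchors to neither.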

Reference

Point Pillar