I’ve learned about S3D(Separable 3D CNN) today so I like to share it here.
S3D helps solve three challenges for video analysis.
- How to understand spatial information. (Recognizing the appearance of an object)
- How to understand temporal information. (Such as context and causation through time)
- The trade-off between model complexity and inference speed.
Before this, 3DCNN was thought of as a promising way to tackle the challenges above, but it was prone to overfitting and it became complex and expensive.
By using a SEPARABLE CONVOLUTION in the earlier layers, S3D significantly improves performance compared with previous state-of-the-art 3D video classification models.