184. Pyramid Vision Transformers
Background: Traditional CNN-backbone architectures struggle to adapt dynamically to different inputs, because the weights of their convolutional filters are fixed after training. Vision Transformers attempted to remove convolution from the backbone, but since it is…