Computer vision architectures fall mainly into two types.
- Pyramid Architecture
The size/shape of the elements is progressively reduced as the network gets deeper. Ex. Traditional Convolutional Networks
- Isotropic Architecture
All elements keep the same size and shape throughout the network. Ex. Transformers
Recent research suggests that isotropic architectures can reach state-of-the-art performance with lighter components.
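
To make the difference concrete, here is a minimal sketch that only traces tensor shapes through each kind of network. It is not taken from any particular model. The function names and the specific numbers (four stages, halving resolution while doubling channels, a patch size of 16, an embedding width of 384) are assumptions chosen for illustration.

```python
def pyramid_shapes(height, width, base_channels=64, num_stages=4):
    """Trace (channels, height, width) through a hypothetical pyramid network.

    Assumption: every stage halves the spatial resolution and doubles
    the channel count, as many convolutional backbones roughly do.
    """
    shapes = []
    channels = base_channels
    for _ in range(num_stages):
        height, width = height // 2, width // 2
        shapes.append((channels, height, width))
        channels *= 2
    return shapes


def isotropic_shapes(height, width, patch_size=16, embed_dim=384, num_blocks=4):
    """Trace (num_tokens, embed_dim) through a hypothetical isotropic network.

    Assumption: the image is split once into non-overlapping patches,
    and every block after that preserves both token count and width.
    """
    num_tokens = (height // patch_size) * (width // patch_size)
    return [(num_tokens, embed_dim) for _ in range(num_blocks)]


if __name__ == "__main__":
    print("pyramid:  ", pyramid_shapes(224, 224))
    print("isotropic:", isotropic_shapes(224, 224))
```

For a 224x224 input, the pyramid trace shrinks from 112x112 down to 14x14 while the channels grow from 64 to 512. The isotropic trace stays at 196 tokens of width 384 in every block.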