Category: AI

138. Variational Autoencoders

Autoencoders encode an input to a smaller representation vector (also called a latent vector) and decode that to restore the original input. For example, you can encode an image, send the encoded image to someone, and have them decode it…
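
Below is a minimal sketch of the plain-autoencoder idea in PyTorch (the layer sizes and the flattened 28×28 input are illustrative assumptions, not the post's exact model):

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch (illustrative sizes): the encoder compresses a
# flattened 28x28 image to a small latent vector, and the decoder
# reconstructs the original input from that vector.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),        # latent ("bottleneck") vector
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                    # compress: 784 -> 32
        return self.decoder(z)                 # restore:  32 -> 784

x = torch.rand(16, 784)                        # batch of flattened images
x_hat = AutoEncoder()(x)
print(x_hat.shape)                             # torch.Size([16, 784])
```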

137. Vision Transformers

Vision Transformers (ViT) are inspired by Transformers for natural language processing. Unlike traditional convolutional networks with pyramid architectures, ViT has an isotropic architecture, in which the representation is not downsized as it flows through the network. The steps are the following: split the image into “patches”, flatten each patch…
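
A rough PyTorch sketch of the patchify step (the 224×224 input, 16×16 patches, and 768-dimensional embedding are illustrative assumptions):

```python
import torch
import torch.nn as nn

# ViT "patchify" sketch: split a 224x224 image into 16x16 patches, flatten
# each patch, and project it to the embedding dimension. The resulting token
# sequence keeps the same length through every layer, which is what makes
# the architecture isotropic.
image = torch.rand(1, 3, 224, 224)             # (batch, channels, H, W)
patch, dim = 16, 768

# Unfold extracts non-overlapping 16x16 patches and flattens each one
patches = nn.Unfold(kernel_size=patch, stride=patch)(image)  # (1, 768, 196)
patches = patches.transpose(1, 2)              # (1, 196, 768): 196 patch tokens
tokens = nn.Linear(3 * patch * patch, dim)(patches)          # linear projection
print(tokens.shape)                            # torch.Size([1, 196, 768])
```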

136. Computer Vision Architecture Types

There are mainly two types of architectures in computer vision. Pyramid architecture: the size/shape of the elements is progressively reduced (e.g., traditional convolutional networks). Isotropic architecture: all elements keep the same size and shape throughout the network (e.g., Transformers). Recent research…
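
A toy PyTorch contrast of the two styles (the layers below are illustrative, not from any specific model):

```python
import torch
import torch.nn as nn

# A pyramid stage halves the spatial size; an isotropic stage preserves it.
x = torch.rand(1, 3, 224, 224)

pyramid_stage = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
print(pyramid_stage(x).shape)      # torch.Size([1, 64, 112, 112]) -- downsized

isotropic_stage = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(isotropic_stage(x).shape)    # torch.Size([1, 64, 224, 224]) -- same size
```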

135. UNet

UNet may be one of the most fundamental works on segmentation tasks. It consists of 3 parts: an encoding phase (applying convolutions to classify the object) -> a bridge -> a decoding phase (restoring spatial information so that the output is 388×388). During the final…
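
A toy PyTorch sketch of the encoder -> bridge -> decoder layout with a skip connection (far smaller than the paper's 388×388 model; all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Tiny U-Net-style model: the encoder downsamples, the bridge works at low
# resolution, and the decoder upsamples and concatenates the matching
# encoder feature map (the skip connection) to restore spatial information.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bridge = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, 2, 1)             # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)                             # encoding phase
        b = self.bridge(self.down(e))               # bridge at lower resolution
        d = self.up(b)                              # decoding: restore resolution
        d = self.dec(torch.cat([e, d], dim=1))      # skip connection
        return self.head(d)

print(TinyUNet()(torch.rand(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```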

134. HRNet

HRNet is research by Microsoft that led to higher performance compared with state-of-the-art architectures. Traditional segmentation models utilize skip connections in order to recover spatial information from previous layers. The problem with this method is that it can’t…
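
A toy PyTorch sketch of HRNet's core idea (channel counts are illustrative): keep a high-resolution stream alive in parallel with a low-resolution one and repeatedly fuse them, rather than recovering resolution only at the end via skip connections:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_hi = torch.rand(1, 16, 64, 64)   # high-resolution stream
x_lo = torch.rand(1, 32, 32, 32)   # low-resolution stream

hi_to_lo = nn.Conv2d(16, 32, 3, stride=2, padding=1)   # downsample hi stream
lo_to_hi = nn.Conv2d(32, 16, 1)                        # project, then upsample

# Fuse the streams; each keeps its own resolution afterwards
fused_lo = x_lo + hi_to_lo(x_hi)
fused_hi = x_hi + F.interpolate(lo_to_hi(x_lo), scale_factor=2, mode="nearest")
print(fused_hi.shape, fused_lo.shape)
```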

133. FCN Upscaling

In order to classify images more precisely, the traditional way is to apply convolutions and pooling to lower the dimensionality of the input so that the model can learn more complex features. This is fine for classification tasks because you…
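
A minimal PyTorch sketch of the upscaling step used in fully convolutional networks (the sizes and the 21-class output are illustrative assumptions):

```python
import torch
import torch.nn as nn

# After convolution and pooling shrink the feature map, a transposed
# convolution learns to upscale it back to the input resolution so that
# every pixel gets a class prediction.
coarse = torch.rand(1, 512, 7, 7)                  # downsampled feature map
upscale = nn.ConvTranspose2d(512, 21, kernel_size=32, stride=32)
dense = upscale(coarse)
print(dense.shape)                                 # torch.Size([1, 21, 224, 224])
```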

132. Attention UNet

Attention UNet highlights only the relevant activations during training. Not only can this perform better when the target you want to detect is relatively tiny compared to the size of the picture, it can also reduce unnecessary computation. The overall…
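
A toy PyTorch sketch of an attention gate along these lines (channel counts are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Attention gate sketch: the decoder's gating signal scores which spatial
# locations of the encoder skip feature are relevant, and irrelevant
# activations are suppressed before the skip feature is used.
class AttentionGate(nn.Module):
    def __init__(self, g_ch=32, x_ch=16, inter_ch=8):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1)    # project gating signal
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1)    # project skip feature
        self.psi = nn.Conv2d(inter_ch, 1, 1)       # per-pixel attention score

    def forward(self, g, x):
        g = F.interpolate(g, size=x.shape[2:])     # match spatial sizes
        a = torch.sigmoid(self.psi(F.relu(self.w_g(g) + self.w_x(x))))
        return x * a                               # keep only relevant activations

g = torch.rand(1, 32, 32, 32)                      # coarse decoder feature
x = torch.rand(1, 16, 64, 64)                      # encoder skip feature
print(AttentionGate()(g, x).shape)                 # torch.Size([1, 16, 64, 64])
```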