Category Computer Vision

156. Highway Networks

Highway Networks Training models with DEEP networks becomes difficult, even when using variance-preserving initialization. By adding an information highway (Learning how to route information through the network), it makes it easier to train models even when it is really DEEP.…

155. S3D (Separable 3D CNN)

I’ve learned about S3D(Separable 3D CNN) today so I like to share it here. S3D helps solve three challenges for video analysis. How to understand spatial information. (Recognizing the appearance of an object) How to understand temporal information. (Such as…

153. Non-Local Neural Networks

“Local” means only understanding the CURRENT “time” and “space”. To understand “non-local” nuances (What will the person in the image do next? Where will the soccer ball being kicked head towards?), if we were to use traditional methods such as…

140. Spatial Pyramid Pooling

Spatial Pyramid Pooling helps the network output the same shape regardless of any aspect ratio and input size. Instead of Pooling with a fixed filter size, it divides the input with different levels of ratio, so the output would not…

138. Variational Autoencoders

Autoencoders encode an input to a smaller representation vector (also called a latent vector) and decode that to restore the original input. For example, you can encode an image, send the encoded image to someone, and have them decode it…

137. Vision Transformers

Vision Transformers is inspired by Transformers for natural language processing. Unlike traditional convolution networks with pyramid architectures, ViT has an isotropic architecture, where the input does not downsize. The steps are the following. Split images into “patches” Flatten each path…

136. Computer Vision Architecture Types

There are mainly 2 types of architectures in computer vision. Pyramid Architecture The size/shape of the element is reduced: Ex. Traditional Convolution Networks Isotropic Architecture Have equal size and shape for all elements throughout the network: Ex. Transformers Recent research…

135. UNet

Unet may be one of the most basic researches on segmentation tasks. It consists of 3 parts: Encoding Phase (Apply Convolutions to classify object) -> Bridge -> Decoding Phase(Restore information so that the output would be 388×388). During the final…

134. HRNet

HRNet was a research done by Microsoft which lead to higher performance compared with state-of-the-art architectures. Traditional segmentation models utilize skip connections in order to recover spatial information from previous layers. The problem with this method is that it can’t…

133. FCN Upscaling

In order to classify images more precisely, the traditional way is to apply convolution and pooling to lower the dimension of the input so that the model can understand more complex features. This is ok for classification tasks because you…