Category: AI

138. Variational Autoencoders

Autoencoders encode an input to a smaller representation vector (also called a latent vector) and decode that to restore the original input. For example, you can encode an image, send the encoded image to someone, and have them decode it…
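
Below is a minimal sketch of the plain-autoencoder idea in PyTorch (the layer sizes and the flattened 28×28 input are illustrative assumptions, not the post's exact model):

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch (illustrative sizes): the encoder compresses a
# flattened 28x28 image to a small latent vector, and the decoder
# reconstructs the original input from that vector.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),        # latent ("bottleneck") vector
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                    # compress: 784 -> 32
        return self.decoder(z)                 # restore:  32 -> 784

x = torch.rand(16, 784)                        # batch of flattened images
x_hat = AutoEncoder()(x)
print(x_hat.shape)                             # torch.Size([16, 784])
```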

137. Vision Transformers

Vision Transformers (ViT) are inspired by Transformers for natural language processing. Unlike traditional convolutional networks with pyramid architectures, ViT has an isotropic architecture, in which the representation is not downsized as it flows through the network. The steps are the following: split the image into “patches”, flatten each patch…
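
A rough PyTorch sketch of the patchify step (the 224×224 input, 16×16 patches, and 768-dimensional embedding are illustrative assumptions):

```python
import torch
import torch.nn as nn

# ViT "patchify" sketch: split a 224x224 image into 16x16 patches, flatten
# each patch, and project it to the embedding dimension. The resulting token
# sequence keeps the same length through every layer, which is what makes
# the architecture isotropic.
image = torch.rand(1, 3, 224, 224)             # (batch, channels, H, W)
patch, dim = 16, 768

# Unfold extracts non-overlapping 16x16 patches and flattens each one
patches = nn.Unfold(kernel_size=patch, stride=patch)(image)  # (1, 768, 196)
patches = patches.transpose(1, 2)              # (1, 196, 768): 196 patch tokens
tokens = nn.Linear(3 * patch * patch, dim)(patches)          # linear projection
print(tokens.shape)                            # torch.Size([1, 196, 768])
```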

136. Computer Vision Architecture Types

There are mainly two types of architectures in computer vision. Pyramid architecture: the size/shape of the elements is progressively reduced (e.g., traditional convolutional networks). Isotropic architecture: all elements keep the same size and shape throughout the network (e.g., Transformers). Recent research…
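
A toy PyTorch contrast of the two styles (the layers below are illustrative, not from any specific model):

```python
import torch
import torch.nn as nn

# A pyramid stage halves the spatial size; an isotropic stage preserves it.
x = torch.rand(1, 3, 224, 224)

pyramid_stage = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
print(pyramid_stage(x).shape)      # torch.Size([1, 64, 112, 112]) -- downsized

isotropic_stage = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
print(isotropic_stage(x).shape)    # torch.Size([1, 64, 224, 224]) -- same size
```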

135. UNet

UNet may be one of the most fundamental works on segmentation tasks. It consists of 3 parts: an encoding phase (applying convolutions to classify the object) -> a bridge -> a decoding phase (restoring spatial information so that the output is 388×388). During the final…
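
A toy PyTorch sketch of the encoder -> bridge -> decoder layout with a skip connection (far smaller than the paper's 388×388 model; all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Tiny U-Net-style model: the encoder downsamples, the bridge works at low
# resolution, and the decoder upsamples and concatenates the matching
# encoder feature map (the skip connection) to restore spatial information.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bridge = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, 2, 1)             # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)                             # encoding phase
        b = self.bridge(self.down(e))               # bridge at lower resolution
        d = self.up(b)                              # decoding: restore resolution
        d = self.dec(torch.cat([e, d], dim=1))      # skip connection
        return self.head(d)

print(TinyUNet()(torch.rand(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```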

134. HRNet

HRNet is research by Microsoft that led to higher performance compared with state-of-the-art architectures. Traditional segmentation models utilize skip connections in order to recover spatial information from previous layers. The problem with this method is that it can’t…
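
A toy PyTorch sketch of HRNet's core idea (channel counts are illustrative): keep a high-resolution stream alive in parallel with a low-resolution one and repeatedly fuse them, rather than recovering resolution only at the end via skip connections:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_hi = torch.rand(1, 16, 64, 64)   # high-resolution stream
x_lo = torch.rand(1, 32, 32, 32)   # low-resolution stream

hi_to_lo = nn.Conv2d(16, 32, 3, stride=2, padding=1)   # downsample hi stream
lo_to_hi = nn.Conv2d(32, 16, 1)                        # project, then upsample

# Fuse the streams; each keeps its own resolution afterwards
fused_lo = x_lo + hi_to_lo(x_hi)
fused_hi = x_hi + F.interpolate(lo_to_hi(x_lo), scale_factor=2, mode="nearest")
print(fused_hi.shape, fused_lo.shape)
```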

133. FCN Upscaling

In order to classify images more precisely, the traditional way is to apply convolutions and pooling to lower the dimensionality of the input so that the model can learn more complex features. This is fine for classification tasks because you…
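
A minimal PyTorch sketch of the upscaling step used in fully convolutional networks (the sizes and the 21-class output are illustrative assumptions):

```python
import torch
import torch.nn as nn

# After convolution and pooling shrink the feature map, a transposed
# convolution learns to upscale it back to the input resolution so that
# every pixel gets a class prediction.
coarse = torch.rand(1, 512, 7, 7)                  # downsampled feature map
upscale = nn.ConvTranspose2d(512, 21, kernel_size=32, stride=32)
dense = upscale(coarse)
print(dense.shape)                                 # torch.Size([1, 21, 224, 224])
```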

132. Attention UNet

Attention UNet highlights only the relevant activations during training. Not only can this perform better when the target you want to detect is relatively tiny compared to the size of the picture, it can also reduce unnecessary computation. The overall…
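
A toy PyTorch sketch of an attention gate along these lines (channel counts are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Attention gate sketch: the decoder's gating signal scores which spatial
# locations of the encoder skip feature are relevant, and irrelevant
# activations are suppressed before the skip feature is used.
class AttentionGate(nn.Module):
    def __init__(self, g_ch=32, x_ch=16, inter_ch=8):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1)    # project gating signal
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1)    # project skip feature
        self.psi = nn.Conv2d(inter_ch, 1, 1)       # per-pixel attention score

    def forward(self, g, x):
        g = F.interpolate(g, size=x.shape[2:])     # match spatial sizes
        a = torch.sigmoid(self.psi(F.relu(self.w_g(g) + self.w_x(x))))
        return x * a                               # keep only relevant activations

g = torch.rand(1, 32, 32, 32)                      # coarse decoder feature
x = torch.rand(1, 16, 64, 64)                      # encoder skip feature
print(AttentionGate()(g, x).shape)                 # torch.Size([1, 16, 64, 64])
```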