Kyosuke

142. Building a Second Brain

I really like the concept introduced in the book Building a Second Brain by Tiago Forte, so I'd like to share it here. The author says that we try so many things to capture new information but rarely take the…

141. Caching Dataset For Faster Training

Training a computer vision model can take quite a long time, which leads to a slower PDCA cycle. One way to speed up training is to cache the dataset before training starts. When you load your data to the…
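A minimal sketch of the caching idea in plain Python, using functools.lru_cache to memoize a hypothetical decode_image loader (the placeholder stands in for expensive disk I/O and decoding): only the first epoch pays the loading cost, and later epochs read from the in-memory cache.

```python
from functools import lru_cache

# Hypothetical expensive loader; in a real pipeline this would read and
# decode an image from disk or remote storage.
def decode_image(path):
    return (0,) * 16  # placeholder pixel buffer

@lru_cache(maxsize=None)  # first call decodes, repeated calls hit the cache
def decode_image_cached(path):
    return decode_image(path)

# Simulate two epochs touching the same sample: the second call is a cache hit.
x1 = decode_image_cached("img_000.png")
x2 = decode_image_cached("img_000.png")
```

Frameworks offer the same idea built in (for example `tf.data.Dataset.cache()` in TensorFlow), but the principle is identical: trade memory for repeated loading time.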

140. Spatial Pyramid Pooling

Spatial Pyramid Pooling helps the network produce an output of the same shape regardless of the input's size and aspect ratio. Instead of pooling with a fixed filter size, it divides the input into grids at different levels of granularity, so the output would not…
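A small NumPy sketch of the pyramid idea, assuming max pooling and pyramid levels of 1×1, 2×2, and 4×4 grids (the level choice is illustrative): each level splits the feature map into an n×n grid and pools each cell, so the concatenated output length depends only on the channel count and the levels, not on H or W.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map into a fixed-length vector.

    For each level n, split the map into an n x n grid and max-pool each
    cell; output length is C * sum(n*n for n in levels) for any H, W.
    """
    pooled = []
    for n in levels:
        # np.array_split tolerates H or W not being divisible by n
        for rows in np.array_split(feature_map, n, axis=1):
            for cell in np.array_split(rows, n, axis=2):
                pooled.append(cell.max(axis=(1, 2)))  # one (C,) vector per cell
    return np.concatenate(pooled)

# Two inputs with different sizes and aspect ratios yield the same output shape:
a = spatial_pyramid_pool(np.random.rand(8, 13, 21))
b = spatial_pyramid_pool(np.random.rand(8, 32, 32))
```

With 8 channels and levels (1, 2, 4), both outputs have length 8 × (1 + 4 + 16) = 168.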

139. Tools For Research

Here are some tools I use when doing research on academic papers. Arxiv-Sanity: This gives a preview of the research paper without needing to download the PDF. PaperWithCode: This site offers a link to the code implementation as well. ConnectedPapers: This…

138. Variational Autoencoders

Autoencoders encode an input to a smaller representation vector (also called a latent vector) and decode that to restore the original input. For example, you can encode an image, send the encoded image to someone, and have them decode it…
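The encode/decode round trip described above can be sketched with a toy linear autoencoder in NumPy. The weights here are random and untrained, and the 784/32 dimensions are illustrative (e.g. a flattened 28×28 image); the point is only the dimensionality round trip, compressing to a latent vector and restoring the original shape, not reconstruction quality.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 784, 32          # e.g. a flattened 28x28 image
W_enc = rng.normal(size=(input_dim, latent_dim))   # untrained encoder weights
W_dec = rng.normal(size=(latent_dim, input_dim))   # untrained decoder weights

x = rng.normal(size=(input_dim,))        # original input
z = x @ W_enc                            # encode: compress to the latent vector
x_hat = z @ W_dec                        # decode: restore the original shape
```

A variational autoencoder differs in that the encoder outputs the parameters of a distribution (a mean and variance) rather than a single point, and the latent vector is sampled from it.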

137. Vision Transformers

Vision Transformers are inspired by Transformers for natural language processing. Unlike traditional convolution networks with pyramid architectures, ViT has an isotropic architecture, where the input is not downsized. The steps are the following. Split images into “patches” Flatten each patch…
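The first two steps above, splitting an image into patches and flattening each one, can be sketched in NumPy. The 224×224 image and 16×16 patch size are the values commonly used with ViT, but the function works for any divisible sizes.

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an (H, W, C) image into non-overlapping patches and flatten
    each one, giving (num_patches, patch*patch*C) token-like vectors."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    return (img.reshape(rows, patch, cols, patch, c)
               .transpose(0, 2, 1, 3, 4)       # group the grid axes together
               .reshape(rows * cols, patch * patch * c))

tokens = image_to_patches(np.zeros((224, 224, 3)), patch=16)
# 224/16 = 14 per side, so 14*14 = 196 patches, each of length 16*16*3 = 768
```

In the full model each flattened patch is then linearly projected and fed to the Transformer as a token, alongside a positional embedding.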

136. Computer Vision Architecture Types

There are mainly 2 types of architectures in computer vision. Pyramid Architecture: the size/shape of the elements is reduced as the network deepens (e.g. traditional convolution networks). Isotropic Architecture: the elements keep an equal size and shape throughout the network (e.g. Transformers). Recent research…

135. UNet

UNet may be one of the most foundational works on segmentation tasks. It consists of 3 parts: an encoding phase (applying convolutions to classify the object) -> a bridge -> a decoding phase (restoring spatial information so that the output is 388×388). During the final…
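The 388×388 output mentioned above follows from the original UNet's unpadded ("valid") 3×3 convolutions, which shrink the feature map at every step. A short shape walk-through reproduces the number, assuming the 572×572 input, four down/up stages, and 2×2 pooling/upsampling of the original paper:

```python
def valid_conv(size, k=3):
    # an unpadded ("valid") k x k convolution shrinks each side by k - 1
    return size - (k - 1)

def down(size):
    return size // 2   # 2x2 max-pool halves the spatial size

def up(size):
    return size * 2    # 2x2 transposed convolution doubles it

s = 572  # input size used in the original UNet paper
for _ in range(4):                       # contracting path: conv, conv, pool
    s = down(valid_conv(valid_conv(s)))
s = valid_conv(valid_conv(s))            # bridge (bottleneck) convolutions
for _ in range(4):                       # expanding path: upsample, conv, conv
    s = valid_conv(valid_conv(up(s)))
# s is now 388: the unpadded convolutions are why the output is smaller
# than the input, which is also why the skip connections must be cropped.
```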