Category: Deep Learning

163. Why Normalize Inputs?

Why do we normalize inputs? When the input is not normalized, the shape of the cost function can become distorted, like the diagram on the left. This leads to instability when optimizing the model. The training speed decreases depending on…
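A minimal NumPy sketch of standardizing inputs to zero mean and unit variance (the feature scales below are made up for illustration):

```python
import numpy as np

# Features on very different scales (illustrative data).
X_train = np.random.rand(1000, 3) * np.array([1.0, 100.0, 0.01])

# Standardize each feature using statistics computed on the training set only;
# apply the same mean and std to validation/test data later.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_norm = (X_train - mean) / std
```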

162. Residual Blocks

Why are residual blocks called “residual” blocks? I was confused because the equation in the diagram explaining residual blocks in the research paper was f(x) + x. So I thought, “Where is the residual…?” When…
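A minimal PyTorch sketch of the f(x) + x pattern, with illustrative layer choices:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: the stacked layers learn the residual f(x),
    and the skip connection adds the input back, giving f(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))  # f(x)
        return self.relu(residual + x)                   # f(x) + x
```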

156. Highway Networks

Highway Networks Training DEEP networks becomes difficult, even when using variance-preserving initialization. Adding an information highway (learning how to route information through the network) makes it easier to train models even when they are really DEEP.…
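A minimal PyTorch sketch of one highway layer, y = T(x) · H(x) + (1 − T(x)) · x, with an illustrative gate bias:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: a transform gate T(x) decides how much of the
    candidate transformation H(x) to use and how much input to carry through."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        # Bias the gate negative so the layer initially just carries the input.
        nn.init.constant_(self.gate.bias, -2.0)

    def forward(self, x):
        h = torch.relu(self.transform(x))   # candidate transformation H(x)
        t = torch.sigmoid(self.gate(x))     # transform gate T(x)
        return t * h + (1 - t) * x          # route through the "highway"
```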

155. S3D (Separable 3D CNN)

I learned about S3D (Separable 3D CNN) today, so I’d like to share it here. S3D helps solve three challenges in video analysis: how to understand spatial information (recognizing the appearance of an object), how to understand temporal information (such as…
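A minimal PyTorch sketch of the separable idea, factoring a k×k×k 3D convolution into a spatial convolution followed by a temporal one (channel sizes are illustrative):

```python
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Replace a full k x k x k 3D conv with a 1 x k x k spatial conv
    followed by a k x 1 x 1 temporal conv."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                  padding=(k // 2, 0, 0))

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.temporal(self.spatial(x))
```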

153. Non-Local Neural Networks

“Local” means only understanding the CURRENT “time” and “space”. To understand “non-local” nuances (What will the person in the image do next? Where is the soccer ball being kicked heading?), if we were to use traditional methods such as…
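A simplified PyTorch sketch of a non-local block (embedded-Gaussian style) that lets every space-time position attend to every other position; layer choices are illustrative:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Simplified non-local block for video features of shape
    (batch, channels, time, height, width)."""
    def __init__(self, channels):
        super().__init__()
        inner = max(channels // 2, 1)
        self.theta = nn.Conv3d(channels, inner, kernel_size=1)
        self.phi = nn.Conv3d(channels, inner, kernel_size=1)
        self.g = nn.Conv3d(channels, inner, kernel_size=1)
        self.out = nn.Conv3d(inner, channels, kernel_size=1)

    def forward(self, x):
        b, c, t, h, w = x.shape
        n = t * h * w
        theta = self.theta(x).reshape(b, -1, n).transpose(1, 2)  # (b, n, inner)
        phi = self.phi(x).reshape(b, -1, n)                      # (b, inner, n)
        g = self.g(x).reshape(b, -1, n).transpose(1, 2)          # (b, n, inner)
        attn = torch.softmax(theta @ phi, dim=-1)  # affinity between ALL positions
        y = (attn @ g).transpose(1, 2).reshape(b, -1, t, h, w)
        return x + self.out(y)                     # residual connection
```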

146. BERT

What is BERT? BERT is a deep learning architecture for natural language processing. If you stack the Transformer’s encoder layers, you get BERT. What can BERT solve? Neural machine translation, question answering, sentiment analysis, and text summarization. How to solve the problems…
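A minimal PyTorch sketch of stacking Transformer encoder layers with BERT-base-like sizes (this is only the encoder stack, not a full BERT with embeddings and pretraining heads):

```python
import torch
import torch.nn as nn

# 12 encoder layers, hidden size 768, 12 attention heads (BERT-base-like).
encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                           dim_feedforward=3072, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)

tokens = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size), already embedded
contextual = encoder(tokens)       # same shape, now contextualized by self-attention
```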

140. Spatial Pyramid Pooling

Spatial Pyramid Pooling helps the network output the same shape regardless of aspect ratio and input size. Instead of pooling with a fixed filter size, it divides the input into a fixed number of bins at several pyramid levels, so the output would not…
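A minimal PyTorch sketch of pyramid pooling using adaptive pooling at a few illustrative grid sizes:

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool a (batch, channels, H, W) feature map into a fixed-length vector,
    regardless of H and W, by adaptive max pooling over 1x1, 2x2, and 4x4 grids."""
    batch = x.size(0)
    pooled = [F.adaptive_max_pool2d(x, output_size=n).reshape(batch, -1) for n in levels]
    return torch.cat(pooled, dim=1)  # length = channels * (1 + 4 + 16) for these levels

# Two different input sizes produce the same output length.
a = spatial_pyramid_pool(torch.randn(1, 8, 32, 48))
b = spatial_pyramid_pool(torch.randn(1, 8, 57, 19))
assert a.shape == b.shape
```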

138. Variational Autoencoders

Autoencoders encode an input to a smaller representation vector (also called a latent vector) and decode that to restore the original input. For example, you can encode an image, send the encoded image to someone, and have them decode it…
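A minimal PyTorch sketch of a plain autoencoder’s encode/decode round trip (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

# Compress a 784-dim input (e.g. a flattened 28x28 image) to a 32-dim latent
# vector, then decode it back to an approximate reconstruction.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

image = torch.rand(1, 784)
latent = encoder(image)            # the compact representation you could send
reconstruction = decoder(latent)   # the receiver restores an approximation
```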

137. Vision Transformers

Vision Transformers are inspired by Transformers for natural language processing. Unlike traditional convolutional networks with pyramid architectures, ViT has an isotropic architecture, where the representation is not downsampled as it passes through the network. The steps are the following: split images into “patches”, flatten each patch…
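A minimal PyTorch sketch of the patch-splitting and flattening step (patch size 16 is the common ViT choice; the rest is illustrative):

```python
import torch

def patchify(images, patch_size=16):
    """Split (batch, channels, H, W) images into flattened patches of shape
    (batch, num_patches, channels * patch_size * patch_size)."""
    b, c, h, w = images.shape
    p = patch_size
    x = images.reshape(b, c, h // p, p, w // p, p)
    x = x.permute(0, 2, 4, 1, 3, 5)              # (b, h/p, w/p, c, p, p)
    return x.reshape(b, (h // p) * (w // p), c * p * p)

patches = patchify(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768]) -- 14x14 patches, each 3*16*16 values
```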