Category Computer Vision

Computer Vision, Research Paper

196. Feature Pyramid Network

Feature pyramids are a basic component for detecting objects at different scales. Before this paper, much research had avoided these pyramid structures because of their high computational and memory costs. Feature Pyramid Network tackles…

Kyosuke
August 30, 2022

Computer Vision, Research Paper

193. Triplet Networks: Deep Metric Learning

Let’s say we want to create a model that can do face recognition (face identification and face verification). We can use traditional deep learning, since it can perform really well, but it requires a lot of training data.…

Kyosuke
August 23, 2022

Object Detection

183. Additional Parameters To Consider For 3D Object Detection

Compared with 2D object detection, 3D object detection has several additional parameters to consider. 2D Object Detection: X coordinate for the center of the bounding box; Y coordinate for the center of the bounding box…

Kyosuke
August 13, 2022

Image Segmentation, Research Paper

180. Polynomial Learning Rate

The learning rate is one of the most important hyper-parameters when optimizing a deep neural network. Polynomial Learning Rate is a proposed technique for applying learning rate decay to improve this optimization.…
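As a rough illustration of the idea, here is a minimal sketch of the commonly used "poly" decay rule, lr = base_lr * (1 - step / max_steps) ^ power. The excerpt above is truncated, so the function name `poly_lr`, its parameters, and the default `power=0.9` are assumptions chosen for illustration rather than details taken from the post.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial learning rate decay (illustrative sketch).

    base_lr   -- starting learning rate
    step      -- current training step, clamped to [0, max_steps]
    max_steps -- total number of training steps
    power     -- decay exponent (0.9 is an assumed, commonly seen default)
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Clamp so the rate never goes negative or complex past the last step.
    step = min(max(step, 0), max_steps)
    return base_lr * (1.0 - step / max_steps) ** power


if __name__ == "__main__":
    # Print the rate at a few points of a hypothetical 100-step run.
    for s in (0, 25, 50, 75, 100):
        print(s, poly_lr(0.01, s, 100))
```

The rate starts at `base_lr` and reaches zero at `max_steps`. With `power=1.0` the schedule is a straight line, and smaller powers keep the rate higher for longer before it drops.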

Kyosuke
August 10, 2022

Image Segmentation

179. Transfer Learning PIDNet

Today I tried transfer learning with PIDNet (since I had just learned about it). Compared to my first attempt, the output is slightly better, but it is still not at a level where it is actually useful.

Kyosuke
August 9, 2022

Image Segmentation

178. Bilinear Interpolation For Images

Today I learned about bilinear interpolation for images, so I’d like to share it here. To simplify the concept, here is an example of upscaling an image by a factor of 2 using bilinear interpolation. For semantic…
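To make the idea concrete, here is a minimal pure-Python sketch of bilinear resizing for a single-channel image stored as a list of rows. It assumes a corner-aligned coordinate mapping (the corner pixels of input and output coincide), which is only one of several conventions and may differ from the one used in the post. The name `bilinear_resize` is hypothetical.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2D grid of numbers with bilinear interpolation.

    Assumes corner-aligned sampling: output corners map exactly
    onto input corners.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for this output column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


if __name__ == "__main__":
    small = [[0, 10], [20, 30]]
    # Upscale the 2x2 image by a factor of 2 in each dimension.
    for r in bilinear_resize(small, 4, 4):
        print(r)
```

Each output pixel is a weighted average of its four nearest input pixels, with weights given by how close the sampled position is to each of them.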

Kyosuke
August 8, 2022

Image Segmentation

176. CrossEntropyLoss for Segmentation Models

Using torch.nn.CrossEntropyLoss() as a loss function for semantic segmentation models confused me at first, so I’d like to share it here. CrossEntropyLoss is for multi-class models and it expects at least 2 arguments. One for the model prediction…
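A small sketch may help show the tensor shapes involved. It assumes PyTorch is installed and uses made-up sizes (`batch`, `num_classes`, `height`, `width`) and random tensors in place of a real model and dataset. For segmentation, the prediction is a tensor of raw logits shaped (N, C, H, W), and the target holds one integer class index per pixel, shaped (N, H, W).

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch, num_classes, height, width = 2, 3, 4, 4

# Model prediction: raw, unnormalized scores per class and pixel.
# CrossEntropyLoss applies log-softmax internally, so no softmax here.
logits = torch.randn(batch, num_classes, height, width)

# Ground truth: one class index per pixel (int64), with no channel dimension.
target = torch.randint(0, num_classes, (batch, height, width))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)  # scalar: mean over all pixels by default

print(logits.shape, target.shape, loss.item())
```

The common stumbling block is the target: it is not one-hot encoded and has no class dimension, so a mask shaped (N, 1, H, W) usually needs to be squeezed and converted to a long tensor first.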

Kyosuke
August 6, 2022

Object Detection

174. Non-Max Suppression

Non-Max Suppression is a post-processing method for object detection tasks. In most cases, an object detection model will predict multiple boxes for a single object like the picture in my note. However, we don’t want this crowded output. We instead…
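For reference, here is a minimal sketch of the usual greedy form of non-max suppression. The excerpt above is truncated, so this may not match the exact procedure described in the post. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates, and the names `iou` and `nms` and the default threshold of 0.5 are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In the usage example, the first two boxes overlap heavily, so only the higher-scoring one survives, while the third box is far away and is kept as a separate detection.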

Kyosuke
August 4, 2022

Object Detection

165. Selective Search

Selective search is a region proposal method for object detection. It hierarchically groups similar regions based on color, texture, size, and shape. It takes an over-segmented image as input, then performs the following steps. 1. Add all bounding…

Kyosuke
July 26, 2022

Computer Vision, Research Paper

162. Residual Blocks

Why are residual blocks called “residual” blocks? I was confused because the equation in the research paper's diagram explaining residual blocks was f(x) + x. So I thought, “Where is the residual..?” When…

Kyosuke
July 23, 2022