Category Computer Vision

196. Feature Pyramid Network

Feature Pyramid Network Feature pyramids are a basic component for detecting objects on different scales. Before this paper, a lot of research has been avoiding these pyramid structures due to their high computational and memory costs. Feature Pyramid Network tackles…

180. Polynomial Learning Rate

Polynomial Learning Rate For deep learning models, the learning rate is one of the most important hyper-parameters in any deep neural network optimization process. Polynomial Learning Rate is a proposed technique to apply learning rate decay and optimize such process.…

179. Transfer Learning PIDNet

Today I tried to do transfer learning using PIDNet (Since I just learned about PIDNet). Compared to my first attempt, the output is getting slightly better but still not to the level where it is actually useful.

176. CrossEntropyLoss for Segmentation Models

torch.nn.CrossEntropyLoss() Using torch.nn.CrossEntropyLoss() as a loss function for semantic segmentation models was first confusing for me, so I’d like to share it here. CrossEntropyLoss is for multi-class models and it expects at least 2 arguments. One for the model prediction…

174. Non-Max Suppression

Non-Max Suppression is a post-processing method for object detection tasks. In most cases, an object detection model will predict multiple boxes for a single object like the picture in my note. However, we don’t want this crowded output. We instead…

165. Selective Search

Selective Search Selective search is a region proposal method for object detection. It hierarchically groups similar regions based on color, texture, size, and shape. Selective Search uses over-segmented images as input. Then takes the following steps. 1. Add all bounding…

162. Residual Blocks

Why are residual blocks called “residual” blocks? The reason why I was confused was that the equation in the diagram explaining the residual blocks on the research paper was f(x) + x. So I thought, “Where is the residual..?” When…