Category AI

147. Why Squared Loss?

Why do we use squared loss instead of absolute loss? One reason is that squaring magnifies larger errors, which can help train the model. Another reason is that absolute loss is not differentiable when the error equals 0.…
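The two points above can be sketched numerically. This is a minimal illustration (the function names are my own, not from the post): the gradient of the squared loss grows with the error and shrinks smoothly toward 0, while the gradient of the absolute loss has a constant magnitude and is undefined exactly at 0.

```python
def squared_loss_grad(error):
    # d/de of e**2 = 2e: scales with the error, smooth everywhere.
    return 2 * error

def absolute_loss_grad(error):
    # d/de of |e| = sign(e): constant magnitude, undefined at e = 0.
    if error == 0:
        raise ValueError("absolute loss has no derivative at 0")
    return 1.0 if error > 0 else -1.0

print(squared_loss_grad(5.0))    # large error -> large (magnified) gradient
print(squared_loss_grad(0.001))  # tiny error -> gradient fades toward 0
print(absolute_loss_grad(0.001)) # tiny error -> still a full-size step
```

The fading gradient of the squared loss near 0 is also why it converges gently to the minimum, whereas the absolute loss keeps taking fixed-size steps.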

146. BERT

What is BERT? BERT is a deep learning architecture for natural language processing. If you stack the Transformer’s encoder, you get BERT. What can BERT solve? Neural machine translation, question answering, sentiment analysis, and text summarization. How to solve the problems…

141. Caching Dataset For Faster Training

Training a computer vision model can take quite a long time, which leads to a slower PDCA cycle. One way to speed up training is to cache the dataset before training starts. When you load your data to the…

140. Spatial Pyramid Pooling

Spatial Pyramid Pooling helps the network produce an output of the same shape regardless of the input's size or aspect ratio. Instead of pooling with a fixed filter size, it divides the input into grids at several levels of granularity, so the output would not…
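A minimal NumPy sketch of the idea (my own illustration, with an assumed pyramid of 1x1, 2x2, and 4x4 grids): each level splits the feature map into a fixed number of cells and max-pools each cell, so the concatenated vector always has the same length.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over an n x n grid for each level
    and concatenate the results into one fixed-length vector."""
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Split the map into n x n roughly equal cells.
        h_edges = np.linspace(0, h, n + 1).astype(int)
        w_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[h_edges[i]:h_edges[i + 1],
                                   w_edges[j]:w_edges[j + 1]]
                pooled.append(cell.max())
    return np.array(pooled)

# Output length is sum(n * n for n in levels) = 1 + 4 + 16 = 21,
# no matter the input's height, width, or aspect ratio.
a = spatial_pyramid_pool(np.random.rand(13, 9))
b = spatial_pyramid_pool(np.random.rand(32, 48))
print(a.shape, b.shape)  # both (21,)
```

Because the cell boundaries are fractions of the input size rather than a fixed filter size, the same pyramid works for any input resolution.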

139. Tools For Research

Here are some tools I use when doing research on academic papers. Arxiv-Sanity: This gives a preview of a research paper without the need to download the PDF. PaperWithCode: This site offers a link to the code implementation as well. ConnectedPapers: This…