154. Approaches For Tuning Models

There are mainly two approaches to tuning a model. Panda approach: tune one model at a time. Caviar approach: tune multiple models at once.
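To make the contrast concrete, here is a minimal sketch of the caviar approach, assuming random search over hyperparameters; `evaluate` is a hypothetical stand-in for a real training run.

```python
import random

# Hedged sketch of the "caviar" approach: sample many hyperparameter
# settings up front, train one model per setting, and keep the best,
# instead of babysitting a single model (the "panda" approach).

def sample_hyperparams():
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),  # sample on a log scale
        "batch_size": random.choice([32, 64, 128]),
    }

def evaluate(hp):
    # placeholder for a real training run returning a validation score;
    # here we just pretend smaller learning rates score better
    return 1.0 - hp["learning_rate"]

trials = [sample_hyperparams() for _ in range(10)]
best = max(trials, key=evaluate)
print("best hyperparameters:", best)
```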

153. Non-Local Neural Networks

“Local” means only understanding the CURRENT “time” and “space”. To understand “non-local” nuances (What will the person in the image do next? Where is the kicked soccer ball headed?), if we were to use traditional methods such as…
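For a concrete picture of how “non-local” is realized in practice, here is a minimal sketch of a non-local block in the embedded-Gaussian style of Wang et al. (2018), assuming PyTorch; the layer names are illustrative, not from the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2D(nn.Module):
    # Minimal embedded-Gaussian non-local block: the response at each
    # position is a weighted sum over ALL positions, not a local window.
    def __init__(self, channels):
        super().__init__()
        self.inter = channels // 2
        self.theta = nn.Conv2d(channels, self.inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, self.inter, kernel_size=1)
        self.g = nn.Conv2d(channels, self.inter, kernel_size=1)
        self.out = nn.Conv2d(self.inter, channels, kernel_size=1)

    def forward(self, x):
        n, _, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (n, h*w, inter)
        phi = self.phi(x).flatten(2)                      # (n, inter, h*w)
        g = self.g(x).flatten(2).transpose(1, 2)          # (n, h*w, inter)
        attn = F.softmax(theta @ phi, dim=-1)             # (n, h*w, h*w): all pairs
        y = (attn @ g).transpose(1, 2).reshape(n, self.inter, h, w)
        return x + self.out(y)                            # residual connection

x = torch.randn(1, 64, 8, 8)
print(NonLocalBlock2D(64)(x).shape)  # torch.Size([1, 64, 8, 8])
```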

152. KL Divergence

KL Divergence measures how different one distribution is from another (it is not symmetric, so it is not a true distance). This can be used to understand Cross-Entropy and deep learning model architectures such as the VAE. For example, let's say there is a coin which has a 50% chance of being HEADS and 50%…
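A quick worked version of the coin setup, assuming a fair coin P and a hypothetical biased coin Q with 70% heads:

```python
import math

# KL(P || Q) = sum over outcomes x of P(x) * log(P(x) / Q(x))
p = {"heads": 0.5, "tails": 0.5}  # fair coin from the example above
q = {"heads": 0.7, "tails": 0.3}  # assumed biased coin for illustration

kl = sum(p[x] * math.log(p[x] / q[x]) for x in p)
print(f"KL(P || Q) = {kl:.4f} nats")  # ~0.0872

# Note the asymmetry: swapping the arguments gives a different value,
# which is why KL divergence is not a true distance metric.
kl_rev = sum(q[x] * math.log(q[x] / p[x]) for x in q)
print(f"KL(Q || P) = {kl_rev:.4f} nats")  # ~0.0823
```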

150. The Mom Test

These are my key takeaways from the book The Mom Test by Rob Fitzpatrick. The Mom Test: Talk about their life instead of your idea, ask about specifics in the past instead of generics or opinions about the future, talk less…

147. Why Squared Loss?

Why do we use squared loss instead of absolute loss? One reason is that squaring magnifies larger errors, which can help train the model. Another reason is that absolute loss is not differentiable when the error equals 0.…
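A small numeric sketch of both points:

```python
# Squaring magnifies larger errors relative to smaller ones, and the
# derivative of e**2 (namely 2e) is defined everywhere, while the
# derivative of abs(e) jumps from -1 to +1 at e = 0.

for e in [0.1, 1.0, 5.0]:
    print(f"error={e:>4}: absolute={abs(e):6.2f}  squared={e**2:6.2f}")
# error= 0.1: absolute=  0.10  squared=  0.01
# error= 1.0: absolute=  1.00  squared=  1.00
# error= 5.0: absolute=  5.00  squared= 25.00
```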

146. BERT

What is BERT? BERT is a deep learning architecture for natural language processing. If you stack the Transformer’s encoders, you get BERT. What can BERT solve? Neural machine translation, question answering, sentiment analysis, text summarization. How to solve the problems…
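As a hedged sketch of getting started (assuming the Hugging Face transformers library, which the post itself does not name), loading pretrained BERT and pulling out its contextual embeddings looks like this; task-specific heads for the problems listed above are stacked on top of these embeddings.

```python
from transformers import AutoTokenizer, AutoModel

# Load the pretrained BERT encoder stack and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT is a stack of Transformer encoders.", return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per token, 768 hidden units for bert-base.
print(outputs.last_hidden_state.shape)  # (batch, tokens, 768)
```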

145. Multi-Class vs Multi-Label / SoftMax vs Sigmoid

Multi-Class = 1 class per image. Multi-Label = multiple labels in a single image. Softmax = scale outputs to 0~1 and make the sum equal to 1 so that they become probabilities. Useful for multi-class classification. Sigmoid = Scale…
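A numeric sketch of the contrast, applying both functions to the same three raw scores:

```python
import math

logits = [2.0, 1.0, 0.1]  # raw scores for three classes/labels

# softmax: exponentiate, then normalize so the outputs sum to 1
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]
print([round(p, 3) for p in softmax], "sum =", round(sum(softmax), 3))
# [0.659, 0.242, 0.099] sum = 1.0 -> pick the single most likely class

# sigmoid: squash each score to (0, 1) independently; no shared normalizer
sigmoid = [1 / (1 + math.exp(-z)) for z in logits]
print([round(p, 3) for p in sigmoid])
# [0.881, 0.731, 0.525] -> each label can independently be "on"
```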