188. KL Divergence

KL Divergence

KL Divergence measures how different one probability distribution is from another. It is often described loosely as a "distance" between distributions, although it is not symmetric and so not a true distance metric. It appears in many places, for example in connection with the cross-entropy loss, or as a term in the VAE loss that constrains the latent distribution to stay close to a standard normal distribution.
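For two discrete distributions P and Q over the same outcomes, the KL divergence is defined as the sum over outcomes of p(x) * log(p(x) / q(x)). Below is a minimal sketch in Python (the function name is my own, and it assumes both distributions assign nonzero probability to every outcome):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Assumes p and q are valid probability vectors over the same outcomes
    with no zero entries in q (and no zeros in p, to keep the sketch simple).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))
```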

Example

What exactly does distance between probability distributions mean?
Let’s say we have 3 coins.
1. Coin A: Heads 50% Tails 50%
2. Coin B: Heads 55% Tails 45%
3. Coin C: Heads 90% Tails 10%

Which coin’s distribution is closer to Coin A’s: Coin B or Coin C?

For this example, you can easily tell that Coin B is closer. But what happens when the distributions being compared are more complex and the difference is harder to see? If we can quantify the difference (distance) between distributions, the decision becomes much easier.
KL Divergence does exactly that.
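As a sketch using the coin probabilities above and the kl_divergence helper defined earlier (the printed values are in nats, since the helper uses the natural log), the numbers confirm the intuition:

```python
coin_a = [0.50, 0.50]   # Heads, Tails
coin_b = [0.55, 0.45]
coin_c = [0.90, 0.10]

# KL divergence of Coin A's distribution from each of the other coins' distributions
print(kl_divergence(coin_a, coin_b))  # ~0.0050 -> Coin B is close to Coin A
print(kl_divergence(coin_a, coin_c))  # ~0.5108 -> Coin C is far from Coin A
```

The much smaller value for Coin B quantifies what we could already see by eye: its distribution is far closer to Coin A's than Coin C's is.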