188. KL Divergence

KL Divergence

KL Divergence measures how different one probability distribution is from another. It is often described loosely as a "distance" between distributions, although it is not symmetric and so not a true distance metric. It appears in many places, for example in connection with the cross-entropy loss, or as a term in the VAE loss that constrains the latent distribution to stay close to a standard normal distribution.
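For two discrete distributions P and Q over the same outcomes, the KL divergence is defined as the sum over outcomes of p(x) * log(p(x) / q(x)). Below is a minimal sketch in Python (the function name is my own, and it assumes both distributions assign nonzero probability to every outcome):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Assumes p and q are valid probability vectors over the same outcomes
    with no zero entries in q (and no zeros in p, to keep the sketch simple).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))
```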

Example

What exactly does distance between probability distributions mean?
Let’s say we have 3 coins.
1. Coin A: Heads 50% Tails 50%
2. Coin B: Heads 55% Tails 45%
3. Coin C: Heads 90% Tails 10%

Which coin’s distribution is closer to Coin A’s: Coin B or Coin C?

For this example, you can easily tell that Coin B is closer. But what happens when the distributions being compared are more complex and the difference is harder to see? If we can quantify the difference (distance) between distributions, the decision becomes much easier.
KL Divergence does exactly that.
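As a sketch using the coin probabilities above and the kl_divergence helper defined earlier (the printed values are in nats, since the helper uses the natural log), the numbers confirm the intuition:

```python
coin_a = [0.50, 0.50]   # Heads, Tails
coin_b = [0.55, 0.45]
coin_c = [0.90, 0.10]

# KL divergence of Coin A's distribution from each of the other coins' distributions
print(kl_divergence(coin_a, coin_b))  # ~0.0050 -> Coin B is close to Coin A
print(kl_divergence(coin_a, coin_c))  # ~0.5108 -> Coin C is far from Coin A
```

The much smaller value for Coin B quantifies what we could already see by eye: its distribution is far closer to Coin A's than Coin C's is.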