108. Binary Cross Entropy

I’ve learned that, for semantic segmentation, Binary Cross Entropy (BCE) is usually a better first choice of loss function than MSE.

Motivation
When you work out the derivative of the MSE (Mean Squared Error) cost for a sigmoid output neuron, you’ll notice that it contains a factor of sigma prime, the derivative of the sigmoid. Since σ′(z) = σ(z)(1 − σ(z)) is bell-shaped, it is largest near z = 0 and approaches 0 as z moves far from 0. So when the neuron is saturated, the gradient is tiny even if the prediction is badly wrong, which leads to slow convergence.
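As a minimal numeric sketch of this (a single sigmoid neuron with the MSE cost C = (a − y)²/2; the function names and parameter values here are my own, for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_grad_w(x, y, w, b):
    # Single sigmoid neuron, cost C = (a - y)^2 / 2.
    # dC/dw = (a - y) * sigma'(z) * x, with sigma'(z) = a * (1 - a).
    z = w * x + b
    a = sigmoid(z)
    return (a - y) * a * (1 - a) * x

# A mildly wrong neuron (y = 0, z = 1) gets a usable gradient...
print(mse_grad_w(x=1.0, y=0.0, w=0.5, b=0.5))
# ...but a confidently wrong, saturated neuron (y = 0, z = 10)
# gets a gradient near 0, so it barely learns:
print(mse_grad_w(x=1.0, y=0.0, w=5.0, b=5.0))
```

Even though the second neuron’s prediction is far worse, its gradient is orders of magnitude smaller, which is exactly the slow-convergence problem described above.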

Binary Cross Entropy resolves that problem. When you work out the derivative of the BCE cost, C = −[y ln a + (1 − y) ln(1 − a)], the sigma prime factor cancels, so the gradient is proportional to the raw error (a − y) alone. This helps the model converge faster during training, even when neurons start out saturated.