Maximum likelihood, as the name suggests, means maximizing the likelihood of your prediction: you pick the model parameters that make the observed data most probable.
Let’s say you have an input x and you want to predict y. In this scenario, you want to maximize the probability of y given x and the bias (the likelihood function above).
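To maximize the likelihood function, we need to calculate its derivatives. But since the likelihood multiplies together the probability of every single sample, differentiating it directly gets expensive and awkward, and a product of many small probabilities can underflow to zero numerically. So you take the log of both sides, which turns the multiplication into addition. Because the log is monotonically increasing, whatever maximizes the log-likelihood also maximizes the likelihood.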
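To see why the log helps, here is a minimal sketch in plain Python. The function names, the coin-flip data, and the list of per-sample probabilities are all made up for illustration; they are not taken from the model above. It checks two things: the product and the sum of logs peak at the same parameter value, and the raw product collapses to zero once there are many samples while the log version stays finite.

```python
import math

def likelihood(probs):
    """Product of the per-sample probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_likelihood(probs):
    """Sum of the per-sample log-probabilities."""
    return sum(math.log(p) for p in probs)

def bernoulli_probs(theta, flips):
    """Probability of each observed flip (1 = heads) if P(heads) = theta."""
    return [theta if y == 1 else 1.0 - theta for y in flips]

# Hypothetical data: 3 heads and 1 tail.
flips = [1, 1, 0, 1]
# Candidate parameter values 0.01, 0.02, ..., 0.99 (a crude grid search).
grid = [i / 100 for i in range(1, 100)]

best_by_product = max(grid, key=lambda t: likelihood(bernoulli_probs(t, flips)))
best_by_log_sum = max(grid, key=lambda t: log_likelihood(bernoulli_probs(t, flips)))
print(best_by_product, best_by_log_sum)  # both pick the same theta

# With many samples the raw product underflows, the log-sum does not.
many = [0.5] * 2000
print(likelihood(many))
print(log_likelihood(many))
```

The grid search is only there to keep the example free of calculus. In practice you would differentiate the log-likelihood, or hand it to an optimizer, rather than scan candidate values.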