When training a model, we usually use a cost function to measure how far the predictions are from the actual results. An energy-based model is instead built around a function called the energy function, which measures how compatible the input x (the observed variable) and y (the variable to be predicted) are. When y is compatible with x, the energy is low; when it is not, the energy is high.
The figure above is from the paper by Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu Jie Huang.
So, whereas the cost function is used to train the model, the energy function is used for inference: given an input x, the model predicts the y that has the lowest energy.
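To make the idea of inference by energy concrete, here is a minimal sketch in Python. The energy function below is a made-up toy (it treats y as compatible with x when y is close to 2 * x), and the names `energy`, `infer`, and `candidates` are assumptions for illustration, not anything taken from the paper. A real energy-based model would learn its energy function from data.

```python
# Hypothetical toy energy function: low when y is close to 2 * x,
# high otherwise. A real model would learn this from data.
def energy(x: float, y: float) -> float:
    return (y - 2.0 * x) ** 2


def infer(x: float, candidates: list[float]) -> float:
    """Return the candidate y that has the lowest energy for this x."""
    return min(candidates, key=lambda y: energy(x, y))


# A small, made-up set of possible answers to choose from.
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]

x = 1.5
best_y = infer(x, candidates)
print(best_y)  # 3.0, because energy(1.5, 3.0) is 0, the lowest value
```

Here inference is just a search over a short list of candidates. When y can take many or continuous values, the same minimization would need an optimization procedure rather than a simple scan.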