Unsupervised Pre-training is a method to initialize the weights for the hidden layers using unsupervised learning.
And the procedure for doing this is GREEDY LAYER-WISE PRETRAINING.
In this procedure, the machine learns the weights in an unsupervised manner one layer at a time, with all previous layers kept fixed. Finally, an output layer is added on top and the whole model is fine-tuned with supervised learning. It's called greedy because it optimizes one piece of the solution at a time instead of jointly optimizing all pieces, and layer-wise because it trains one layer at a time.
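Here's a minimal sketch of the procedure using stacked autoencoders in PyTorch. The layer sizes, number of training steps, and random data are illustrative assumptions, not part of any fixed recipe:

```python
# A minimal sketch of greedy layer-wise pretraining with stacked
# autoencoders. Layer widths, learning rates, and the random data
# below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 784)          # unlabeled data (assumed shape)
y = torch.randint(0, 10, (512,))   # small labeled set for fine-tuning

sizes = [784, 256, 64]             # hidden layer widths (assumption)
encoders = []

# --- Unsupervised phase: train one layer at a time ---
inputs = X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
    dec = nn.Linear(d_out, d_in)   # decoder is thrown away afterwards
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
    for _ in range(100):           # train this layer to reconstruct its input
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    with torch.no_grad():          # freeze: the next layer sees fixed features
        inputs = enc(inputs)

# --- Supervised phase: stack the layers, add an output layer, fine-tune ---
model = nn.Sequential(*encoders, nn.Linear(sizes[-1], 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
```

Note how the "greedy" part shows up in the code: each autoencoder's loss only sees its own layer's reconstruction, never the final classification objective, and the joint optimization happens only in the fine-tuning loop at the end.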
Unsupervised pre-training can work as a regularizer, and it is most helpful when the amount of labeled data you have is VERY small. Also note that this method is especially useful in NLP, because word embeddings learned from unlabeled text encode similarity between words as distance: similar words end up close together in the embedding space, as the toy example below shows.
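A toy illustration of that "similarity as distance" idea, using made-up 3-d vectors (real embeddings are learned from unlabeled text and have hundreds of dimensions):

```python
# Toy example: similar words should have higher cosine similarity.
# The vectors here are hand-picked for illustration, not learned.
import torch
import torch.nn.functional as F

emb = {
    "king":  torch.tensor([0.9, 0.8, 0.1]),
    "queen": torch.tensor([0.8, 0.9, 0.1]),
    "apple": torch.tensor([0.1, 0.2, 0.9]),
}
for a, b in [("king", "queen"), ("king", "apple")]:
    sim = F.cosine_similarity(emb[a], emb[b], dim=0).item()
    print(f"cos({a}, {b}) = {sim:.2f}")  # similar words score higher
```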