Highway Networks
Training very deep networks is difficult, even with variance-preserving initialization. Highway networks add an information highway, a learned mechanism for routing information through the network, which makes even very deep models easier to train.
Information Highway
In addition to the output of a plain feedforward network's lth layer, the paper adds two gates. The transform gate, T(x,Wt), is a sigmoid function that controls how much of the output is produced by transforming the input (the transformed part, U). The carry gate, C(x,Wc) = 1 - T(x,Wt), controls how much of the input is carried through unchanged.
If T(x,Wt) = 0, then C(x,Wc) = 1 and the layer passes its input directly to the output, hence the term information highway.
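
To make the gating concrete, here is a minimal sketch of a single highway layer in plain Python, computing y = H(x) * T(x) + x * (1 - T(x)) elementwise. The function names (sigmoid, affine, highway_layer), the toy weights, and the choice of tanh for the transform H are all assumptions made for illustration, not details taken from the notes above. Because the input is added to the output elementwise, the layer's input and output must have the same size.

```python
import math

def sigmoid(z):
    """Logistic function, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(x, W, b):
    """Compute W x + b for a vector x, a matrix W (list of rows), and a bias b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)), elementwise."""
    # Transformed input; tanh is an assumed choice of nonlinearity.
    h = [math.tanh(v) for v in affine(x, W_h, b_h)]
    # Transform gate T, each entry in (0, 1); the carry gate is 1 - T.
    t = [sigmoid(v) for v in affine(x, W_t, b_t)]
    return [h_i * t_i + x_i * (1.0 - t_i) for h_i, t_i, x_i in zip(h, t, x)]

# Hypothetical toy values, chosen only to exercise the two extremes of the gate.
x = [0.5, -1.0, 2.0]
W_h = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.1, 0.1, 0.1]]
b_h = [0.0, 0.0, 0.0]
W_t = [[0.0] * 3 for _ in range(3)]

# A very negative gate bias drives T toward 0, so the input is carried through.
y_carry = highway_layer(x, W_h, b_h, W_t, [-50.0] * 3)
# A very positive gate bias drives T toward 1, so the layer acts like a plain layer.
y_transform = highway_layer(x, W_h, b_h, W_t, [50.0] * 3)
```

With the gate pushed toward 0, y_carry matches x up to floating-point precision, which is the highway behavior described above. With the gate pushed toward 1, y_transform matches the plain tanh layer's output.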