“Local” means only understanding the CURRENT “time” and “space”. To understand “non-local” nuances (What will the person in the image do next? Where will the soccer ball being kicked head towards?), if we were to use traditional methods such as convolutions, we will need to apply the convolution repeatedly. This can be inefficient, and it can cause optimization difficulties. Non-Local Neural Networks can help solve that problem.