209. Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?

Findings for Medium-Sized Tabular Datasets

While deep learning has enabled tremendous progress on text and image datasets, results show that tree-based models remain state-of-the-art on medium-sized tabular data (~10K samples).

Finding 1: NNs are biased toward overly smooth solutions.

NNs struggle to fit irregular functions, while decision-tree-based models do not exhibit this bias because they learn piecewise-constant functions.
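
To make the contrast concrete, here is a minimal sketch that is not taken from the paper. It fits a small 1-D regression tree and a Gaussian kernel smoother to a step-shaped target. The smoother is only an assumed stand-in for a learner that can represent nothing but smooth functions; it is not a neural network. The target function, the bandwidth of 0.1, and all function names are hypothetical choices for illustration.

```python
import math

def step_target(x):
    """Irregular target with jumps at x = 0.3 and x = 0.7 (hypothetical example)."""
    if x < 0.3:
        return 0.0
    if x < 0.7:
        return 1.0
    return -1.0

def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(pairs, depth):
    """Fit a 1-D regression tree on (x, y) pairs that are sorted by x.

    Returns a leaf value (float) or a (threshold, left, right) tuple.
    """
    ys = [y for _, y in pairs]
    if depth == 0 or len(set(ys)) == 1:
        return sum(ys) / len(ys)
    # Choose the split index that minimises the summed squared error.
    best_i = min(range(1, len(pairs)),
                 key=lambda i: sum_sq(ys[:i]) + sum_sq(ys[i:]))
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    return (threshold,
            fit_tree(pairs[:best_i], depth - 1),
            fit_tree(pairs[best_i:], depth - 1))

def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

def predict_smooth(pairs, x, bandwidth):
    """Gaussian kernel smoother: the prediction is always a smooth function of x."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi, _ in pairs]
    return sum(w * y for w, (_, y) in zip(weights, pairs)) / sum(weights)

def mse(predict, pairs):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in pairs) / len(pairs)

def compare_fits():
    """Return (tree MSE, smoother MSE) measured on the training grid."""
    pairs = [(i / 100, step_target(i / 100)) for i in range(100)]
    tree = fit_tree(pairs, depth=2)
    tree_mse = mse(lambda x: predict_tree(tree, x), pairs)
    smooth_mse = mse(lambda x: predict_smooth(pairs, x, bandwidth=0.1), pairs)
    return tree_mse, smooth_mse

tree_mse, smooth_mse = compare_fits()
print(f"tree MSE: {tree_mse:.4f}, smoother MSE: {smooth_mse:.4f}")
```

Two splits are enough for the tree to reproduce both jumps, while the smoother blurs them no matter how many points it sees. The errors are measured on the training grid, so the sketch says something about which functions each model can represent, not about generalization.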

Finding 2: Uninformative features hurt MLP-like NNs.

Tabular data contain many uninformative features, and results show that MLP-like architectures are not robust to them.
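
The effect of uninformative features can be sketched on toy data. The example below is an illustration under stated assumptions, not the paper's experiment: the label depends on a single feature, the remaining columns are random noise, and a decision stump is compared with a 1-nearest-neighbour classifier. The nearest-neighbour model is only a simple stand-in for a learner that weighs every feature; it is not an MLP. All names, sizes, and the seed are hypothetical.

```python
import random

def make_data(n, n_noise, rng):
    """Label depends only on feature 0; the other n_noise columns are pure noise."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(1 + n_noise)]
        X.append(row)
        y.append(1 if row[0] > 0.5 else 0)
    return X, y

def fit_stump(X, y):
    """Pick the (feature, threshold) pair with the best training accuracy."""
    best_acc, best_feat, best_thr = -1.0, None, None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            acc = sum((row[j] > thr) == (label == 1)
                      for row, label in zip(X, y)) / len(y)
            if acc > best_acc:  # strict: earlier features win ties
                best_acc, best_feat, best_thr = acc, j, thr
    return best_feat, best_thr

def predict_1nn(X_train, y_train, x):
    """Label of the closest training point under squared Euclidean distance."""
    nearest = min(range(len(X_train)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[nearest]

def noise_experiment(n_noise, n_train=200, n_test=200, seed=0):
    """Return (feature chosen by the stump, stump test accuracy, 1-NN test accuracy)."""
    rng = random.Random(seed)
    X_train, y_train = make_data(n_train, n_noise, rng)
    X_test, y_test = make_data(n_test, n_noise, rng)
    feat, thr = fit_stump(X_train, y_train)
    stump_acc = sum((row[feat] > thr) == (label == 1)
                    for row, label in zip(X_test, y_test)) / n_test
    knn_acc = sum(predict_1nn(X_train, y_train, row) == label
                  for row, label in zip(X_test, y_test)) / n_test
    return feat, stump_acc, knn_acc

for k in (0, 30):
    feat, stump_acc, knn_acc = noise_experiment(k)
    print(f"{k:2d} noise features: stump uses feature {feat}, "
          f"stump acc {stump_acc:.2f}, 1-NN acc {knn_acc:.2f}")
```

The stump selects a single feature, so the noise columns never enter its prediction. The distance-based model has no such selection step, and the noise columns dominate its distances as their number grows.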

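Finding 3: Tabular data are not invariant to rotation, so learning procedures should not be either.

Each column of a table carries its own meaning, so rotating the feature space mixes columns and destroys structure that an axis-aligned learner such as a decision tree can exploit.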
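
A small sketch can illustrate why the orientation of the features matters. It assumes a toy dataset, not taken from the paper, in which the label depends on one column only, and it measures the best training accuracy a single axis-aligned split can reach before and after rotating the features by 45 degrees. The function names, sample size, and seed are hypothetical.

```python
import math
import random

def best_stump_accuracy(X, y):
    """Best training accuracy of any single split of the form 'feature > threshold'."""
    best = 0.0
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            correct = sum((row[j] > thr) == (label == 1)
                          for row, label in zip(X, y))
            best = max(best, correct / len(y))
    return best

def rotate(X, angle):
    """Rotate every 2-D point by `angle` radians, mixing the two columns."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * a - s * b, s * a + c * b] for a, b in X]

def rotation_experiment(n=300, seed=0):
    """Return (accuracy on original features, accuracy on rotated features)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    # The label is aligned with the first column, as is common for tabular features.
    y = [1 if row[0] > 0 else 0 for row in X]
    original = best_stump_accuracy(X, y)
    rotated = best_stump_accuracy(rotate(X, math.pi / 4), y)
    return original, rotated

original_acc, rotated_acc = rotation_experiment()
print(f"best single split, original axes: {original_acc:.2f}")
print(f"best single split, rotated axes:  {rotated_acc:.2f}")
```

On the original axes a single split separates the classes perfectly. After the rotation the class boundary runs diagonally through both new features, so no single axis-aligned split can recover it. A rotation-invariant procedure treats both versions of the data identically and therefore cannot take advantage of the meaningful original axes.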

Reference: Grinsztajn, Oyallon, and Varoquaux, "Why do tree-based models still outperform deep learning on tabular data?", NeurIPS 2022.