Findings for Tabular Medium-sized Datasets
While deep learning has enabled tremendous progress on text and image datasets, results show that tree-based models remain state of the art on medium-sized tabular data (~10K samples).
Finding 1: NNs are biased toward overly smooth solutions.
NNs struggle to fit irregular target functions, while decision-tree-based models do not exhibit this bias because they learn piecewise-constant functions.
Finding 2: Uninformative features affect MLP-like NNs.
Tabular data contain many uninformative features, and results show that MLP-like architectures are not robust to them, whereas tree-based models can simply ignore such features when choosing splits.
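A minimal sketch of this effect, assuming a synthetic dataset where a few informative features are padded with pure-noise columns (the dataset shape and hyperparameters below are illustrative choices, not the paper's benchmark):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# 5 informative features hidden among 45 pure-noise columns.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                    random_state=0).fit(Xtr, ytr)

# Trees can skip noise columns at split time; an MLP must learn to
# down-weight 45 useless inputs, which is harder.
print(f"forest acc: {rf.score(Xte, yte):.3f}, "
      f"MLP acc: {mlp.score(Xte, yte):.3f}")
```

Repeating the run while increasing the number of noise columns typically widens the gap in the forest's favor, which mirrors the paper's ablation on removing uninformative features.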
Finding 3: Data are not invariant to rotation, so learning procedures should not be either.
Each tabular feature carries its own meaning, so rotating the feature space mixes informative and uninformative columns and destroys this structure; rotation-invariant learners such as MLPs cannot exploit the natural feature basis.
Reference: Grinsztajn, Oyallon & Varoquaux, "Why do tree-based models still outperform deep learning on tabular data?" (NeurIPS 2022).