In the sprawling landscape of machine learning, tabular data remains a cornerstone of real-world applications, from healthcare diagnostics to financial forecasting, yet it has stubbornly resisted the transformative advances seen in domains like natural language processing and computer vision. Gradient-boosted decision trees (GBDTs) have long dominated this space due to their robustness and ease of use, but they require extensive, task-specific tuning for each new dataset, a process that becomes prohibitively slow in large-scale or diverse settings. This gap has fueled a pressing need for foundation models in the tabular domain: systems that can leverage pre-trained knowledge to deliver strong performance across varied tasks with minimal adaptation. Enter iLTM, the Integrated Large Tabular Model, which unifies tree-based embeddings, meta-trained hypernetworks, multilayer perceptrons (MLPs), and retrieval mechanisms into a single architecture, promising to bridge the chasm between traditional GBDTs and modern deep learning.
To construct iLTM, the authors employed a meticulous methodology centered on large-scale meta-training and architectural integration. The model begins with an embedding stage that processes raw tabular data through two paths: GBDT-based embeddings, which convert inputs into sparse binary representations using decision tree leaf indices, and a robust preprocessing pipeline that handles categorical encoding, missing values, and feature scaling. These are then projected into a fixed-size, dimensionality-agnostic representation via random features and principal component analysis (PCA), ensuring consistency across datasets of varying sizes. The core of iLTM is its hypernetwork, meta-trained on over 1,800 heterogeneous classification datasets from OpenML, which generates the weights of a main MLP network tailored to each new task. This hypernetwork leverages dataset-level embeddings and labels to produce specialized parameters, while a retrieval-augmented component enhances predictions by incorporating nearest-neighbor information from the training data, controlled by a weight parameter α. The training process, detailed in Algorithm 1 of the paper, involves optimizing the hypernetwork across this vast meta-collection using cross-entropy loss, with embeddings precomputed offline to improve efficiency.
The results reported for iLTM are compelling, demonstrating superior performance across multiple benchmarks. On the TabZilla Hard benchmark, which includes 36 challenging datasets selected for their resistance to simple models, iLTM achieved the best average AUC ranking, outperforming well-tuned GBDTs like XGBoost and CatBoost, as well as leading deep tabular models such as TabPFN and TabR. In high-dimensional settings, particularly biomedical datasets like SMK-CAN-187 and TOX-171, iLTM maintained its advantage, with AUC scores showing significant improvements over XGBoost. Remarkably, despite being pre-trained solely on classification tasks, iLTM transferred effectively to regression after light fine-tuning, achieving the top average rank on 18 public regression datasets and reducing root mean square error (RMSE) by over 46% compared to random initialization. Additional analyses, such as weight-space visualizations, revealed that the hypernetwork generates clustered embeddings for similar tasks, and that ensemble predictors diverge during fine-tuning, boosting diversity and performance without extensive retraining.
The implications of iLTM extend far beyond academic benchmarks, potentially reshaping how industries handle tabular data. By amortizing the cost of hyperparameter tuning through meta-learning, iLTM reduces deployment time and computational overhead, making it suitable for applications in finance, healthcare, and logistics where rapid adaptation is crucial. Its ability to unify tree-based inductive biases with neural network flexibility addresses long-standing limitations in tabular learning, offering a scalable framework that performs robustly from small to large datasets. This integration could accelerate the adoption of foundation models in tabular domains, fostering more automated and efficient machine learning pipelines. Moreover, the open-source release of iLTM's code and weights encourages further research and practical deployment, potentially leading to innovations in areas like personalized medicine and predictive maintenance.
However, iLTM is not without limitations. The reliance on GBDT embeddings introduces additional preprocessing steps that may increase latency in real-time scenarios, though the robust preprocessing option can mitigate this. The retrieval mechanism uses a fixed similarity metric, which might underperform if feature distributions drift over time, limiting its effectiveness in dynamic environments. Additionally, the model's pre-training on classification tasks alone constrains its generalization to other task types without fine-tuning, and the computational demands of meta-training require substantial GPU resources, potentially putting it out of reach for smaller organizations. Future work could explore extending pre-training to include regression and other task types, dynamic retrieval adaptations, or lightweight attention mechanisms to enhance feature interactions, further solidifying iLTM's role in the evolution of tabular AI.
Original Source
Read the complete research paper
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.