Auto-Scikit-DL is a deep tabular learning package that complements scikit-learn. It will contain classical and advanced deep model baselines for tabular (machine) learning, automatic feature engineering and model selection methods, and flexible customization of training paradigms. The project aims to provide a unified baseline interface and benchmark usage for the academic community, convenient pipeline construction for machine learning competitions, and rapid engineering experiments for machine learning projects, helping people focus on specific algorithm design.
It is currently under construction by LionSenSei. More baselines are coming soon, and the project will be packaged for public use in the future. If you have any problems or suggestions, feel free to contact [email protected].
Here is the list of baselines we plan to include in this package (continuously updated):
Paper | Baseline | Year | Link |
---|---|---|---|
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | AutoInt | 2019 | arXiv |
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data | NODE | 2019 | arXiv |
DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | DCNv2 | 2020 | arXiv |
TabNet: Attentive Interpretable Tabular Learning | TabNet | 2020 | arXiv |
Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain | VIME | 2021 | arXiv |
Revisiting Deep Learning Models for Tabular Data | FT-Transformer | 2021 | arXiv |
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training | SAINT | 2021 | arXiv |
T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction | T2G-Former | 2022 | arXiv |
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | TabPFN | 2022 | arXiv |
ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data | ExcelFormer | 2023 | arXiv |
The project is organized into several parts:
- `data`: includes in-built dataset and benchmark files, stores global dataset settings and information, and holds common data preprocessing scripts.
- `models`: includes the baseline implementations, and contains an abstract class `TabModel` that organizes the uniform deep tabular model interface and training paradigm (an illustrative subclassing sketch follows this list).
- `configs`: includes the default hyper-parameters and the hyper-parameter search spaces of the baselines from the original papers.
- `utils`: includes basic functionalities: `model`, building baselines and tuning; `deep`, common deep learning functions and optimizers; `metrics`, metric calculation.
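As an illustration of the `TabModel` contract, a new baseline might subclass it as in the sketch below; the constructor arguments and required overrides are assumptions for illustration, so consult the implementations in `models` (e.g. `models/mlp.py`) for the actual interface:

```python
# A purely illustrative sketch of subclassing TabModel; the constructor
# arguments and required overrides are assumptions, not the confirmed API.
import torch.nn as nn
from models import TabModel  # assumed import path

class TinyNet(TabModel):
    """A minimal baseline reusing TabModel's shared training paradigm."""
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        # Subclasses are assumed to only define the forward pass; fit and
        # predict would come from the shared base-class implementation.
        return self.backbone(x)
```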
Some basic usage examples are provided in the `examples` directory; you can run the scripts with `python examples/script_name.py`. Before running the examples, you can download our prepared in-built datasets from the T2G-Former experiment from this link, then extract them to the `data/datasets` folder.
```bash
mkdir ./data/datasets  # create the directory if it does not exist
tar -zxvf t2g-data0.tar.gz -C ./data/datasets
```
- Add a custom dataset from a single csv file: if you want to load a csv file like the in-built datasets, we provide an interface that automatically processes a raw csv file and stores it in the package, after which you can load it easily (see the first sketch after this list).
- Finetune a baseline: you can easily finetune a model with our `fit` and `predict` APIs (see the second sketch after this list).
- Tune a baseline: we provide an end-to-end `tune` function to perform hyper-parameter search in the spaces defined in `configs`; you can also define your own search spaces (refer to our config files, and the third sketch after this list).
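The csv registration flow might look like the following sketch; the helper names `add_custom_dataset` and `load_dataset` and their arguments are hypothetical placeholders rather than the confirmed API of the `data` package:

```python
# Hypothetical sketch: the function names and arguments below are
# illustrative placeholders, not the package's confirmed API.
from data import add_custom_dataset, load_dataset  # assumed helpers

# Process the raw csv once and store it alongside the in-built datasets
# under data/datasets.
add_custom_dataset(
    csv_path="my_data.csv",
    name="my_dataset",
    target="label",      # column to predict
    task="binclass",     # e.g. 'binclass', 'multiclass', 'regression'
)

# Afterwards the dataset loads like an in-built one.
dataset = load_dataset("my_dataset")
```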
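Finetuning with the `fit`/`predict` APIs could then look like this; the `MLP` import path and constructor arguments are assumptions, so consult `models/mlp.py` for the aligned interface:

```python
import numpy as np
from models import MLP  # assumed import of the built-in MLP baseline

# Toy data for illustration only.
X_train, y_train = np.random.randn(256, 10), np.random.randint(0, 2, 256)
X_test = np.random.randn(64, 10)

model = MLP(d_in=10, d_out=2)   # hypothetical constructor arguments
model.fit(X_train, y_train)     # assumed scikit-learn-style signature
y_pred = model.predict(X_test)
```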
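And an end-to-end search with `tune` might be invoked as in this sketch; the import location and keyword arguments are assumptions used to illustrate the flow:

```python
# End-to-end tuning sketch; the import path and keyword arguments are
# assumptions -- check utils/model.py and the configs folder for specifics.
from utils.model import tune  # assumed location of the tune function

best_model = tune(
    model_name="ft-transformer",
    dataset="my_dataset",
    search_space="configs/ft-transformer.yaml",  # hypothetical space file
    n_trials=50,                                 # hypothetical search budget
)
```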
Currently, you can only add a custom model by manually copying your model code into the `models` folder (refer to `models/mlp.py` for API alignment; we suggest copying it and adding your model code directly). Then modify `MODEL_CARDS` in `utils/model.py` to add and import your model. We will support adding user models with simple scripts in the future.