Knowledge-Enriched ML for Tabular Data

This is the official repository for "Knowledge-Enriched Machine Learning for Tabular Data" (NeuS '25). In this repository, you can find KE‑TALENT, a knowledge‑enriched tabular benchmark derived from TALENT, and reference implementations of four concept‑kernel–based approaches.

How to Run

Clone the repo and create the environment

$ git clone https://github.com/dalgu90/concept-kernels.git
$ cd concept-kernels
$ conda env create -f environment.yml

Dataset (KE-TALENT)

Choose one of the following to obtain KE-TALENT:

Option 1. Downloading KE-TALENT directly: Download the KE-TALENT archive from this link (69MB), and unzip it under data/talent/ so each dataset resides at, e.g. data/talent/Abalone_reg/.
Option 2. Manual preprocessing: Download the original TALENT benchmark from this repo, place the unzipped folders under data/talent/, and execute notebooks scripts/talent_data_preproc.ipynb and scripts/talent_column_embed.ipynb to generate the 11 KE-TALENT datasets.

Run concept kenrel approaches

Below is the commands to run the models with default hyper-parameters. Replace the argument "0" with the index of 11 datasets (0 to 10).

1. Smoothing (second arg: kernel_conv | rkhs_norm | laplacian)
$ bash run_talent_mlp.sh mlp_smooth kernel_conv mpnet 0

2. Value kernel
$ bash run_talent_mlp.sh mlp_gsp default mpnet 0 

3. Concept GAT
$ bash run_talent_gnn.sh gatconcept mpnet 0

4. Self-supervised learning (FT-Transformer)
$ bash run_talent_ssl4.sh ssl_ftt mpnet fixcorr InfoNCE false pretrain 0
$ bash run_talent_ssl4.sh ssl_ftt mpnet fixcorr InfoNCE false finetune 0

To perform hyper-parameter optimization (HPO), please run the below commands. Replace the argument "0" with the index of 11 datasets (0 to 10).

1. Smoothing (second arg: kernel_conv | rkhs_norm | laplacian)
$ bash hparam_search_talent_mlp.sh mlp_smooth kernel_conv mpnet 0 false

2. Value kernel
$ bash hparam_search_talent_mlp.sh mlp_gsp default mpnet 0 false

3. Concept GAT
$ bash hparam_search_talent_gnn.sh gatconcept2 mpnet 0 false gatconcept2

4. Self-supervised learning (FT-Transformer) - run once without HPO first
$ bash hparam_search_talent_ssl4.sh ssl_ftt mpnet fixcorr InfoNCE false 9 false

Cite the work

@InProceedings{juyong2025knowledge,
  title = {Knowledge-Enriched Machine Learning for Tabular Data},
  author = {Kim, Juyong and Squires, Chandler and Ravikumar, Pradeep},
  booktitle = {International Conference on Neuro-symbolic Systems},
  year = {2025},
  volume = {288},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR}
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
concept_kernels		concept_kernels
configs		configs
data/talent		data/talent
results		results
scripts		scripts
.gitignore		.gitignore
README.md		README.md
environment.yml		environment.yml
hparam_search.py		hparam_search.py
hparam_search_talent_gnn.sh		hparam_search_talent_gnn.sh
hparam_search_talent_mlp.sh		hparam_search_talent_mlp.sh
hparam_search_talent_ssl4.sh		hparam_search_talent_ssl4.sh
main.py		main.py
run_talent_gnn.sh		run_talent_gnn.sh
run_talent_mlp.sh		run_talent_mlp.sh
run_talent_ssl4.sh		run_talent_ssl4.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Knowledge-Enriched ML for Tabular Data

How to Run

Clone the repo and create the environment

Dataset (KE-TALENT)

Run concept kenrel approaches

Cite the work

About

Uh oh!

Releases

Packages

Uh oh!

Languages

dalgu90/concept-kernels

Folders and files

Latest commit

History

Repository files navigation

Knowledge-Enriched ML for Tabular Data

How to Run

Clone the repo and create the environment

Dataset (KE-TALENT)

Run concept kenrel approaches

Cite the work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages