This repository serves as a toolbox for working with the Ekstra Bladet News Recommendation Dataset (EB-NeRD)—a rich dataset designed to advance research and benchmarking in news recommendation systems.
EB-NeRD is based on user behavior logs from Ekstra Bladet, a classical Danish newspaper published by JP/Politikens Media Group in Copenhagen. The dataset was created as part of the 18th ACM Conference on Recommender Systems Challenge (RecSys'24 Challenge).
This repository provides:
- Starter notebooks for descriptive data analysis, data preprocessing, and baseline modeling.
- Examples of established models to kickstart experimentation.
- A step-by-step tutorial for running a CodaBench server locally, which is required to evaluate models on the hidden test set.
For more information about the dataset, its usage, and the RecSys'24 Challenge, please visit: recsys.eb.dk.
We recommend using conda for environment management.
# 1. Create and activate a new conda environment
conda create -n <environment_name> python=3.11
conda activate <environment_name>
# 2. Clone this repo within VS Code or from the command line:
git clone https://github.com/ebanalyse/ebnerd-benchmark.git
# 3. Install the core ebrec package into the environment (run from the repository root):
pip install .
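After installation, a quick sanity check from Python can confirm that the interpreter matches the version requested above and that the package can be found. The following is a minimal sketch, assuming the package is importable under the name ebrec (as in the install step); the helper name check_environment is hypothetical and not part of the repository.

```python
import importlib.util
import sys


def check_environment(package: str = "ebrec", expected: tuple = (3, 11)) -> dict:
    """Report the interpreter version and whether `package` can be located."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # True when the running interpreter is the expected major.minor version
        "python_matches": tuple(sys.version_info[:2]) == tuple(expected),
        # find_spec returns None when a top-level package is not installed
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If package_found is False, the most likely cause is that pip install . was run in a different environment than the one currently active.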
We have encountered issues installing TensorFlow on M1 MacBooks (i.e., sys_platform == 'darwin') when using conda.
Workaround: use venv instead of conda:
python3 -m venv .venv
source .venv/bin/activate
Alternatively, create .venv directly in the project folder using conda:
conda create -p .venv python=3.11.8
conda activate ./.venv
To enable GPU support, install the appropriate TensorFlow package based on your platform:
# For Linux
pip install tensorflow-gpu
# For macOS
pip install tensorflow-macos
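Once TensorFlow is installed, it can be worth confirming that it actually detects an accelerator before starting a long training run. The following is a minimal sketch using tf.config.list_physical_devices; the helper name visible_gpus is hypothetical, and it returns an empty list when TensorFlow is not installed.

```python
def visible_gpus() -> list:
    """Return the names of GPUs visible to TensorFlow (empty if none, or if TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the active environment
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"GPUs visible to TensorFlow: {gpus if gpus else 'none'}")
```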
To get started quickly, we have implemented several news recommender systems, including:
Model | Notebook | Example |
---|---|---|
NRMS | NRMS Notebook | NRMS Example |
LSTUR | - | LSTUR Example |
NPA | - | NPA Example |
NAML | - | NAML Example |
NRMSDocVec | - | NRMSDocVec Example |
The implementations of NRMS, LSTUR, NPA, and NAML are adapted from the excellent recommenders repository, with all non-model-related code removed for simplicity. NRMSDocVec is our variation of NRMS where the NewsEncoder is initialized with document embeddings (i.e., article embeddings generated from a pretrained language model), rather than learning embeddings solely from scratch.
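To illustrate what initializing a news encoder from document embeddings can look like, the sketch below builds a lookup matrix from precomputed article vectors so that each article ID maps to a fixed row. This is a framework-free illustration under stated assumptions, not the repository's implementation: the function name build_embedding_matrix, the all-zero padding row at index 0, and the toy vectors are all hypothetical.

```python
def build_embedding_matrix(doc_vectors: dict) -> tuple:
    """Turn {article_id: vector} into (id_to_row, matrix).

    Row 0 is an all-zero vector reserved for padding or unknown articles
    (an assumption made for this sketch).
    """
    if not doc_vectors:
        raise ValueError("doc_vectors must not be empty")
    dim = len(next(iter(doc_vectors.values())))
    id_to_row = {}
    matrix = [[0.0] * dim]  # row 0: padding
    for article_id, vector in sorted(doc_vectors.items()):
        if len(vector) != dim:
            raise ValueError(
                f"article {article_id} has dimension {len(vector)}, expected {dim}"
            )
        id_to_row[article_id] = len(matrix)  # next free row index
        matrix.append([float(x) for x in vector])
    return id_to_row, matrix


# Toy, made-up vectors standing in for pretrained article embeddings
toy_vectors = {9001: [0.1, 0.2, 0.3], 9002: [0.4, 0.5, 0.6]}
id_to_row, matrix = build_embedding_matrix(toy_vectors)
```

A matrix of this shape could serve as the initial weights of an embedding layer, so that the encoder starts from pretrained article representations instead of random values.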
To help you get started, we have created a set of introductory notebooks designed for quick experimentation, including:
- ebnerd_descriptive_analysis: Basic descriptive analysis of EB-NeRD.
- ebnerd_overview: Demonstrates how to join user histories and create binary labels.
Note: These notebooks were developed on macOS. Small adjustments may be required for other operating systems.
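The two steps covered by ebnerd_overview, joining user histories and creating binary labels, can be sketched in plain Python as follows. This is a minimal illustration on made-up data rather than the notebook's actual code: the helper names join_history and add_binary_labels are hypothetical, and the field names user_id, article_ids_inview, and article_ids_clicked are assumed here to mirror the dataset's columns.

```python
def join_history(behaviors: list, histories: dict) -> list:
    """Attach each user's click history to their impressions (left join on user_id)."""
    return [
        {**row, "history": histories.get(row["user_id"], [])}
        for row in behaviors
    ]


def add_binary_labels(rows: list) -> list:
    """Label every in-view article 1 if it was clicked, else 0."""
    labelled = []
    for row in rows:
        clicked = set(row["article_ids_clicked"])
        labels = [int(article in clicked) for article in row["article_ids_inview"]]
        labelled.append({**row, "labels": labels})
    return labelled


# Toy, made-up impressions and histories
behaviors = [
    {"user_id": 1, "article_ids_inview": [10, 11, 12], "article_ids_clicked": [11]},
    {"user_id": 2, "article_ids_inview": [12, 13], "article_ids_clicked": [12]},
]
histories = {1: [5, 6, 7]}  # user 2 has no recorded history

samples = add_binary_labels(join_history(behaviors, histories))
for sample in samples:
    print(sample["user_id"], sample["history"], sample["labels"])
```

The labels list is aligned position by position with article_ids_inview, which is the form a ranking model typically expects for training and evaluation.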
Make sure you have installed the repository and its dependencies, then activate your environment:
conda activate <environment_name>
python examples/reproducibility_scripts/ebnerd_nrms.py \
--datasplit ebnerd_small \
--epochs 5 \
--bs_train 32 \
--bs_test 32 \
--history_size 20 \
--npratio 4 \
--transformer_model_name FacebookAI/xlm-roberta-large \
--max_title_length 30 \
--head_num 20 \
--head_dim 20 \
--attention_hidden_dim 200 \
--learning_rate 1e-4 \
--dropout 0.20
tensorboard --logdir=ebnerd_predictions/runs
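Two of the flags above, --history_size and --npratio, shape how each training sample is built. The sketch below shows the usual convention in news recommendation, which is assumed here rather than taken from the script itself: a user's history is cut or left-padded to a fixed length, and each clicked article is paired with npratio non-clicked articles from the same impression. The function names, the padding ID 0, and the padding of short negative lists are hypothetical choices for illustration.

```python
import random


def pad_or_truncate_history(history: list, history_size: int, pad_id: int = 0) -> list:
    """Keep the most recent `history_size` articles, left-padding with `pad_id`."""
    if history_size <= 0:
        raise ValueError("history_size must be positive")
    recent = history[-history_size:]
    return [pad_id] * (history_size - len(recent)) + recent


def sample_candidates(inview: list, clicked_id: int, npratio: int,
                      rng: random.Random, pad_id: int = 0) -> tuple:
    """Return (candidates, labels): the clicked article followed by `npratio` negatives."""
    negatives = [article for article in inview if article != clicked_id]
    if len(negatives) >= npratio:
        sampled = rng.sample(negatives, npratio)  # sample without replacement
    else:
        # Too few negatives in the impression: pad up to npratio (assumption)
        sampled = negatives + [pad_id] * (npratio - len(negatives))
    return [clicked_id] + sampled, [1] + [0] * npratio


rng = random.Random(0)  # fixed seed so the sketch is reproducible
history = pad_or_truncate_history([1, 2, 3], history_size=5)
candidates, labels = sample_candidates([10, 11, 12, 13, 14, 15], 11, npratio=4, rng=rng)
```

With --npratio 4, every training sample therefore contains five candidates, exactly one of which is labeled as clicked.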
python examples/reproducibility_scripts/ebnerd_nrms_docvec.py \
--datasplit ebnerd_small \
--epochs 5 \
--bs_train 32 \
--history_size 20 \
--npratio 4 \
--document_embeddings Ekstra_Bladet_contrastive_vector/contrastive_vector.parquet \
--head_num 16 \
--head_dim 16 \
--attention_hidden_dim 200 \
--newsencoder_units_per_layer 512 512 512 \
--learning_rate 1e-4 \
--dropout 0.2 \
--newsencoder_l2_regularization 1e-4
tensorboard --logdir=ebnerd_predictions/runs