
Ekstra Bladet News Recommendation Dataset (EB-NeRD)

This repository serves as a toolbox for working with the Ekstra Bladet News Recommendation Dataset (EB-NeRD)—a rich dataset designed to advance research and benchmarking in news recommendation systems.

EB-NeRD is based on user behavior logs from Ekstra Bladet, a classical Danish newspaper published by JP/Politikens Media Group in Copenhagen. The dataset was created as part of the 18th ACM Conference on Recommender Systems Challenge (RecSys'24 Challenge).

What You'll Find Here

This repository provides:

  • Starter notebooks for descriptive data analysis, data preprocessing, and baseline modeling.
  • Examples of established models to kickstart experimentation.
  • A step-by-step tutorial for running a CodaBench server locally, which is required to evaluate models on the hidden test set.

Useful Links

For more information about the dataset, the RecSys '24 Challenge, and its usage, please visit: recsys.eb.dk.

CodaBench

Papers


Getting Started

We recommend using conda to manage the environment.

Installation

# 1. Create and activate a new conda environment
conda create -n <environment_name> python=3.11
conda activate <environment_name>

# 2. Clone this repo (from within VS Code or on the command line) and enter it:
git clone https://github.com/ebanalyse/ebnerd-benchmark.git
cd ebnerd-benchmark

# 3. Install the core ebrec package into the environment:
pip install .
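
After step 3, a quick way to confirm that the package landed in the active environment is to ask Python whether the module can be found. This is a minimal sketch: the module name `ebrec` is taken from the install step above, while the helper name `is_importable` is purely illustrative.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "ebrec" is the package name mentioned in the installation step.
print("ebrec importable:", is_importable("ebrec"))
```

If this prints `False`, the most likely cause is that `pip install .` ran in a different environment than the one currently active.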

M1 Mac Users

We have encountered issues installing TensorFlow on M1 MacBooks (sys_platform == 'darwin') when using conda. As a workaround, use venv instead of conda:

python3 -m venv .venv
source .venv/bin/activate

Alternatively, create the .venv environment directly in the project folder using conda:

conda create -p .venv python=3.11.8
conda activate ./.venv

GPU Support

To enable GPU support, install the appropriate TensorFlow package based on your platform:

# For Linux
pip install tensorflow-gpu
# For macOS
pip install tensorflow-macos
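
Once a TensorFlow package is installed, it can help to check whether TensorFlow actually sees a GPU before starting a long run. The sketch below uses `tf.config.list_physical_devices`; the helper name `visible_gpus` is illustrative, and the function returns `None` when TensorFlow is not installed so the check can run in any environment.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow is installed, but no GPU is visible.")
```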

Algorithms

To get started quickly, we have implemented several news recommender systems, including:

Model        Notebook        Example
NRMS         NRMS Notebook   NRMS Example
LSTUR        -               LSTUR Example
NPA          -               NPA Example
NAML         -               NAML Example
NRMSDocVec   -               NRMSDocVec Example

The implementations of NRMS, LSTUR, NPA, and NAML are adapted from the excellent recommenders repository, with all non-model-related code removed for simplicity. NRMSDocVec is our variation of NRMS where the NewsEncoder is initialized with document embeddings (i.e., article embeddings generated from a pretrained language model), rather than learning embeddings solely from scratch.
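
To illustrate the NRMSDocVec idea, here is a toy, dependency-free sketch of a news encoder that starts from a precomputed document vector and passes it through a few dense layers, instead of learning token embeddings from scratch. It is not the repository's implementation: the function names, the 4-dimensional vectors, and the layer sizes are all assumptions chosen to keep the example small.

```python
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialise the weights and zero biases of one dense layer."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_relu(vec, weights, bias):
    """Apply one fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


def encode_article(doc_vector, layers):
    """Map a precomputed document vector to a news representation."""
    out = list(doc_vector)
    for weights, bias in layers:
        out = dense_relu(out, weights, bias)
    return out


rng = random.Random(0)

# Hypothetical precomputed article embeddings (e.g. from a pretrained language model).
doc_embeddings = {
    "article_a": [0.2, -0.1, 0.4, 0.0],
    "article_b": [-0.3, 0.5, 0.1, 0.2],
}

# Toy stand-in for a "units per layer" setting; real models use far larger layers.
units_per_layer = [8, 8]
layers, n_in = [], 4
for n_out in units_per_layer:
    layers.append(init_layer(n_in, n_out, rng))
    n_in = n_out

news_vector = encode_article(doc_embeddings["article_a"], layers)
print(len(news_vector))
```

In this sketch only the dense layers would be trainable; the document vectors act as fixed inputs, which is one plausible reading of "initialized with document embeddings".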


Data Manipulation & Enrichment

To help you get started, we have created a set of introductory notebooks designed for quick experimentation.

Note: These notebooks were developed on macOS. Small adjustments may be required for other operating systems.


Reproduce EB-NeRD Experiments

Make sure you have installed the repository and its dependencies, then activate your environment:

conda activate <environment_name>

Run the NRMS script:

python examples/reproducibility_scripts/ebnerd_nrms.py \
  --datasplit ebnerd_small \
  --epochs 5 \
  --bs_train 32 \
  --bs_test 32 \
  --history_size 20 \
  --npratio 4 \
  --transformer_model_name FacebookAI/xlm-roberta-large \
  --max_title_length 30 \
  --head_num 20 \
  --head_dim 20 \
  --attention_hidden_dim 200 \
  --learning_rate 1e-4 \
  --dropout 0.20
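
Two of the flags above shape the training samples. The sketch below assumes that `--npratio` is the number of non-clicked in-view articles sampled per clicked article, and that `--history_size` is the fixed length to which a user's click history is truncated or padded. Both readings follow common practice in NRMS-style models rather than anything confirmed by this README. The function names, the left-padding choice, and the padding ID `0` are illustrative.

```python
import random


def sample_negatives(inview_ids, clicked_id, npratio, rng):
    """Sample `npratio` non-clicked articles from the in-view list."""
    negatives = [a for a in inview_ids if a != clicked_id]
    if not negatives:
        return []
    if len(negatives) >= npratio:
        return rng.sample(negatives, npratio)  # without replacement
    # Too few candidates: fall back to sampling with replacement.
    return [rng.choice(negatives) for _ in range(npratio)]


def pad_or_truncate_history(history, history_size, pad_id=0):
    """Keep the most recent `history_size` clicks, left-padding short histories."""
    recent = history[-history_size:]
    return [pad_id] * (history_size - len(recent)) + recent


rng = random.Random(0)
negatives = sample_negatives([1, 2, 3, 4, 5, 6], clicked_id=3, npratio=4, rng=rng)
print(negatives)
print(pad_or_truncate_history([7, 8, 9], history_size=5))
```

With `--npratio 4`, each training example would then hold one clicked article and four sampled negatives.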

TensorBoard:

tensorboard --logdir=ebnerd_predictions/runs

Run the NRMSDocVec script:

python examples/reproducibility_scripts/ebnerd_nrms_docvec.py \
  --datasplit ebnerd_small \
  --epochs 5 \
  --bs_train 32 \
  --history_size 20 \
  --npratio 4 \
  --document_embeddings Ekstra_Bladet_contrastive_vector/contrastive_vector.parquet \
  --head_num 16 \
  --head_dim 16 \
  --attention_hidden_dim 200 \
  --newsencoder_units_per_layer 512 512 512 \
  --learning_rate 1e-4 \
  --dropout 0.2 \
  --newsencoder_l2_regularization 1e-4
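
Note that `--newsencoder_units_per_layer` takes several space-separated values. As a rough illustration of how such a flag can be handled, the sketch below builds a hypothetical `argparse` parser that mirrors a few of the flags above. It is not the parser used by the reproducibility scripts, and the defaults shown are assumptions copied from the example command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the flags in the command above."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument("--datasplit", type=str, default="ebnerd_small")
    parser.add_argument("--epochs", type=int, default=5)
    # nargs="+" collects one or more integers into a list.
    parser.add_argument(
        "--newsencoder_units_per_layer", type=int, nargs="+", default=[512, 512, 512]
    )
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--dropout", type=float, default=0.2)
    return parser


args = build_parser().parse_args(
    ["--newsencoder_units_per_layer", "512", "512", "512", "--learning_rate", "1e-4"]
)
print(args.newsencoder_units_per_layer, args.learning_rate)
```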

TensorBoard:

tensorboard --logdir=ebnerd_predictions/runs
