Note
This repository contains the official implementation of BartABSA++, our XLLM@ACL 2025 paper that revisits the BartABSA framework, achieves state-of-the-art results on aspect-based sentiment analysis, and experiments with bridging the gap between modern decoder-only LLMs and encoder-decoder pointer networks.
This is a complete from-scratch reimplementation of the original BART-ABSA work using modern libraries (PyTorch Lightning, updated Transformers) with significant enhancements for modularity, extensibility, and multi-task support. Our contributions include:
- Enhanced Architecture: Improved pointer networks with:
  - Feature normalization for training stability with larger models
  - Parametrized gating mechanisms replacing static hyperparameters
  - Additional cross-attention mechanisms reusing pretrained weights
- Multi-Architecture Support: Extended beyond BART to support encoder-decoder combinations following Rothe et al. (2020):
  - BERT, RoBERTa, and GPT-2 combinations
  - Scaling experiments up to 3.6B parameters
  - Systematic evaluation of encoder vs. decoder contributions
- Multi-Task Framework: Support for seven different structured prediction tasks beyond ABSA
- Modern Implementation: Updated codebase with better reproducibility, logging, and scalability
- Comprehensive Evaluation: Extensive experiments showing that structured approaches remain competitive with modern LLMs
Important
Key Finding from Our Work: Our experiments demonstrate that structured approaches like pointer networks remain highly competitive with modern LLMs for tasks requiring precise relational information extraction. The quality of token-level representations (encoder) is far more important than generative capabilities (decoder) for these structured prediction tasks.
This implementation offers enhanced experimental capabilities (see the usage example after this list), including:
- Comprehensive logging and metrics tracking via Weights & Biases
- Parameter heatmap visualizations (enable with `experiment.write_heatmaps`)
- Prediction output in multiple formats, including JSON and XMI (enable with `experiment.write_predictions`)
- Multi-architecture support (BART, BERT, RoBERTa, GPT combinations)
- Cluster deployment support for both Kubernetes and SLURM
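Assuming these two options are plain boolean Hydra flags (their exact defaults live in `code/conf/config.yaml`), a run that writes both heatmaps and predictions might look like this:

```bash
# Illustrative only: enable heatmap and prediction outputs for a single run
# (assumes experiment.write_heatmaps / experiment.write_predictions are boolean config options)
python src/run.py experiment.write_heatmaps=true experiment.write_predictions=true
```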
This implementation supports the following structured prediction tasks:
- ABSA (Aspect-based Sentiment Analysis) - Main focus, includes datasets: 14lap, 14res, 15res, 16res
- SSA (Structured Sentiment Analysis)
- SRE (Sentiment Relationship Extraction)
- DEFT (Definition Extraction from Free Text)
- SpaceEval (Space Evaluation)
- GABSA (German Aspect-based Sentiment Analysis)
- GNER (German Named Entity Recognition)
Each task has its own dataset structure and configuration (see the sketch below for selecting a task config). For more details on the data structure, refer to the data README.
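As a rough guide, a task is selected by pointing Hydra at the corresponding config file. Only `ssa.yaml` and `deft.yaml` are referenced explicitly later in this README, so the other file name below is an assumption to be checked against `code/conf/`:

```bash
# Confirmed task configs (also shown in the quick-start commands below)
python src/run.py --config-name ssa.yaml
python src/run.py --config-name deft.yaml

# Hypothetical pattern for the remaining tasks; verify the actual file names in code/conf/
python src/run.py --config-name gner.yaml
```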
When running the code locally, you should be able to install all necessary dependencies in a virtual environment using the following commands:
```bash
cd code
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
As this is an anonymized version of the code, you will need to make several updates to the codebase before everything works (only update what's needed for your use case):
- Most importantly, Weights & Biases: update `entity` in `code/conf/config.yaml` with your W&B team name; otherwise only offline logging will work (see the fallback example below).
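If you just want to smoke-test the setup without a W&B team, the standard W&B offline mode should work. The command-line override below is illustrative only; the exact key path for `entity` depends on how it is nested in `code/conf/config.yaml`:

```bash
# Fallback: log offline via the standard W&B environment variable
WANDB_MODE=offline python src/run.py

# Illustrative override of the W&B team; take the exact key path from code/conf/config.yaml
python src/run.py entity=your-wandb-team
```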
For remote running, you will need to update the following:
- Email notifications: Replace email addresses in the SLURM scripts (`code/slurm/sbatch_*.sh`)
- Container registry: Update Docker/Apptainer image URLs in the cluster scripts
- File paths: Update all `/home/your_user/` paths to match your system
- Cluster configuration: Update server names, namespaces, and resource specifications
```bash
cd code
source venv/bin/activate

# Train BartABSA++ on ABSA (default)
python src/run.py

# Train on different tasks
python src/run.py --config-name ssa.yaml   # Structured Sentiment Analysis
python src/run.py --config-name deft.yaml  # Definition Extraction

# Override specific parameters
python src/run.py model.use_enhanced_architecture=true dataset.name='14res'
```
Configuration is handled via Hydra, and the config files are stored in the `/code/conf/` directory. Some of the most important options are listed below, followed by an example of overriding them on the command line:
| Parameter | Description | Default |
|---|---|---|
| `model.use_enhanced_architecture` | Enable our architectural improvements | `true` |
| `model.encoder_name` | Backbone encoder model | `facebook/bart-base` |
| `dataset.name` | Dataset (14lap, 14res, 15res, 16res) | `14lap` |
| `training.max_epochs` | Maximum training epochs | 200 |
| `training.batch_size` | Batch size | 16 |
| `training.learning_rate` | Learning rate | 5e-5 |
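Any of these options can be combined as Hydra command-line overrides. The values below are chosen purely for illustration:

```bash
# Illustrative combination of the options from the table above
python src/run.py dataset.name='14res' training.batch_size=32 training.learning_rate=3e-5 training.max_epochs=100
```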
Note
See `/code/conf/config.yaml` for all configuration options. Each task has its own config file in `/code/conf/`.
The codebase (`code/`) is structured as follows:
- `src/`: Main source code directory
  - `run.py`: Entry point running a single experiment, including testing
  - `model/`: Contains the BartABSA++ model implementation with our enhancements
  - `dataset/`: Data loading and preprocessing modules for all tasks
  - `metrics/`: Task-specific evaluation metrics
  - `utils/`: Utility functions and helper classes
- `conf/`: Hydra config files for running experiments
- `k8s/`: Kubernetes-related files for cluster deployment
- `slurm/`: SLURM-related files for cluster deployment
Key components:
- `model.py`: The main model class implementing our enhanced architecture
- `module.py`: The PyTorch Lightning module for the model
- `mapping_tokenizer.py`: The label conversion for generating the pointer labels
The training script can be configured using Hydra, either via the config files in the `conf/` directory or by passing parameters directly to the script. By default, the config file `config.yaml` is used, which works for the ABSA task on a local machine.
```bash
cd code
source venv/bin/activate
python src/run.py

# Optionally pass a different config file to the script (especially needed for the non-ABSA tasks)
python src/run.py --config-name other_config.yaml

# Or directly pass parameters to the script
python src/run.py dataset.name='other_dataset' experiment.run_name='other_run'
```
Each task has its own configuration file in the `code/conf/` directory, for example `ssa.yaml` for Structured Sentiment Analysis. Make sure to use the appropriate configuration file when running experiments for a specific task (see Training for how to specify this using Hydra).
Since the special tokens differ from task to task, they are stored in JSON files in the `data/special_tokens_mappings` directory. Ensure that `directories.special_tokens_mappings` in the config points to the correct directory and that `dataset.special_tokens_file` points to the correct file for each task, as in the sketch below.
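For instance, when switching tasks from the command line, these two keys can presumably be overridden alongside the task config. The file name below is a placeholder; check `data/special_tokens_mappings` for the real names:

```bash
# Hypothetical override: point the SSA run at its special-tokens mapping
# (directory and file name are placeholders)
python src/run.py --config-name ssa.yaml \
  directories.special_tokens_mappings=data/special_tokens_mappings \
  dataset.special_tokens_file=ssa_special_tokens.json
```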
Our implementation includes several key improvements over the original:
- Feature Normalization: L2 normalization of embedding spaces + RMSNorm for training stability
- Parametrized Gating: Learnable gates replacing static hyperparameters
- Additional Attention: Cross-attention mechanism reusing BART's pretrained weights
Inspired by Rothe et al. (2020), we support:
- Pure encoder-decoder models (BART)
- Synthetic combinations (BERT2GPT, RoBERTa2RoBERTa, GPT22GPT2, etc.)
- Scaling experiments with various model sizes
```bash
# Enhanced BartABSA++ (our main contribution)
python src/run.py model.gating_mode=full_gating model.normalize_encoder_outputs=true model.attention_mechanism=bart model.use_final_layer_norm=true

# State-of-the-art with BART-Large
python src/run.py model.base_model=facebook/bart-large model.gating_mode=full_gating model.normalize_encoder_outputs=true model.attention_mechanism=bart model.use_final_layer_norm=true

# Multi-architecture experiments (e.g., RoBERTa2GPT-2)
python src/run.py model.base_model=FacebookAI/roberta-base model.decoder_model=gpt2 model.gating_mode=full_gating model.normalize_encoder_outputs=true model.attention_mechanism=custom model.use_final_layer_norm=true
```
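Conversely, a plainer baseline for ablations can presumably be obtained by turning the enhancements off via the flag from the configuration table above; how it interacts with the individual `model.*` options should be verified in `code/conf/config.yaml`:

```bash
# Illustrative ablation: run without the architectural enhancements
python src/run.py model.use_enhanced_architecture=false
```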
For comprehensive experiments including architecture ablations, scaling studies, and multi-task evaluation, see the batch experiment scripts in `code/slurm/experimental_runs/`.
The following hyperparameters are based on the original implementation:
| Hyperparameter | Value |
|---|---|
| Batch Size | 16 |
| Learning Rate | 5e-5 |
| Max Epochs | 200 |
| Early Stopping | 30 |
| Optimizer | AdamW |
| Gradient Clip Val | 5 |
| Warmup Steps | 1% of total steps |
| LR Scheduler | Linear Scheduler |
| Weight Decay | 1e-2 |
| Sampler | Bucket (based on source sequence length) |
| Decoding Strategy | Beam Search (beam size 4) |
Warning
- The original implementation used beam search for decoding. Since decoding is implemented manually in this version, currently only greedy decoding is supported.
- The original implementation used a length penalty of 1.0. Since it is not mentioned in the original paper, it was removed.
- The linear scheduler from the original implementation was replaced with the default PyTorch polynomial decay scheduler, as it seemed to perform better.
- The original implementation uses a custom sampler, which was recreated using the source sequence length as the metric.
- In addition to the `pengb` dataset, the implementation also supports the `Astev2` dataset, which can be specified via the `dataset.source` parameter (see the example below).
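For example, a run against the alternative dataset source might look as follows; the exact accepted values for `dataset.source` should be confirmed in the dataset configs:

```bash
# Illustrative: switch the dataset source from pengb to Astev2
python src/run.py dataset.source='Astev2' dataset.name='14res'
```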
These settings were used as a starting point for the experiments and may be adjusted for optimal performance in different environments.
See `CLUSTER_RUNNING.md` for more information on how to run experiments on a SLURM or Kubernetes cluster; a minimal SLURM submission sketch follows below.
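As a sketch only (the real script names, partitions, and resource requests live in `code/slurm/` and are documented in `CLUSTER_RUNNING.md`; the file name below is a placeholder following the `sbatch_*.sh` pattern mentioned above):

```bash
# Placeholder submission: replace sbatch_example.sh with one of the real code/slurm/sbatch_*.sh scripts
cd code
sbatch slurm/sbatch_example.sh
```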
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{pfister-etal-2025-bartabsa,
title = "{BARTABSA}++: Revisiting {BARTABSA} with Decoder {LLM}s",
author = {Pfister, Jan and
V{\"o}lker, Tom and
Vlasjuk, Anton and
Hotho, Andreas},
editor = "Fei, Hao and
Tu, Kewei and
Zhang, Yuhui and
Hu, Xiang and
Han, Wenjuan and
Jia, Zixia and
Zheng, Zilong and
Cao, Yixin and
Zhang, Meishan and
Lu, Wei and
Siddharth, N. and
{\O}vrelid, Lilja and
Xue, Nianwen and
Zhang, Yue",
booktitle = "Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.xllm-1.13/",
doi = "10.18653/v1/2025.xllm-1.13",
pages = "115--128",
ISBN = "979-8-89176-286-2",
abstract = "We revisit the BARTABSA framework for aspect-based sentiment analysis with modern decoder LLMs to assess the importance of explicit structure modeling today. Our updated implementation - BARTABSA++ - features architectural enhancements that boost performance and training stability.Systematic testing with various encoder-decoder architectures shows that BARTABSA++ with BART-Large achieves state-of-the-art results, even surpassing a finetuned GPT-4o model.Our analysis indicates the encoder{'}s representational quality is vital, while the decoder{'}s role is minimal, explaining the limited benefits of scaling decoder-only LLMs for this task. These findings underscore the complementary roles of explicit structured modeling and large language models, indicating structured approaches remain competitive for tasks requiring precise relational information extraction."
}
```
Also consider citing the original BART-ABSA work:
```bibtex
@inproceedings{yan-etal-2021-unified,
title = "A Unified Generative Framework for Aspect-based Sentiment Analysis",
author = "Yan, Hang and
Dai, Junqi and
Ji, Tuo and
Qiu, Xipeng and
Zhang, Zheng",
editor = "Zong, Chengqing and
Xia, Fei and
Li, Wenjie and
Navigli, Roberto",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.188/",
doi = "10.18653/v1/2021.acl-long.188",
pages = "2416--2429",
abstract = "Aspect-based Sentiment Analysis (ABSA) aims to identify the aspect terms, their corresponding sentiment polarities, and the opinion terms. There exist seven subtasks in ABSA. Most studies only focus on the subsets of these subtasks, which leads to various complicated ABSA models while hard to solve these subtasks in a unified framework. In this paper, we redefine every subtask target as a sequence mixed by pointer indexes and sentiment class indexes, which converts all ABSA subtasks into a unified generative formulation. Based on the unified formulation, we exploit the pre-training sequence-to-sequence model BART to solve all ABSA subtasks in an end-to-end framework. Extensive experiments on four ABSA datasets for seven subtasks demonstrate that our framework achieves substantial performance gain and provides a real unified end-to-end solution for the whole ABSA subtasks, which could benefit multiple tasks."
}
```
- Original BARTABSA Paper and Code
- A Unified Generative Framework for Aspect-Based Sentiment Analysis (Yan et al., 2021)
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks (Rothe et al., 2020)
This repository is provided as-is to support reproducible research. If you find this code helpful for your research, please consider:
- ⭐ Starring this repository
- 📄 Citing our paper (see Citation section)
- 🐛 Opening issues for bugs or questions
- 🔧 Contributing improvements via pull requests
For questions about the original BART-ABSA method, please refer to the original repository.