Replication Package for "LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms"
This repository contains all the necessary artifacts to replicate the experiments and analyses presented in our paper. It includes the source code for both the SBCR and MergeGen approaches, as well as the scripts required to prepare the datasets, run the experiments, and generate the results.
We provide two ways to engage with our research artifacts:
- Use Pre-processed Data (Recommended for Analysis): If you are primarily interested in analyzing our results or using the final datasets, you can download them directly from our archival repositories. This is the fastest way to get started.
- Full Replication (From Scratch): If you wish to replicate our entire experimental process, from data pre-processing to running the tools and analyzing the output, follow the detailed instructions below.
To facilitate reproducibility and further inspection, we have archived our datasets and full experimental results on FigShare.
- Pre-processed Datasets: This repository contains the final, clean datasets used directly in our experiments. This is ideal for researchers who want to bypass the initial data pre-processing steps.
- Full Experimental Results Archive: This repository contains a complete snapshot of our experimental run, including all intermediate files, execution logs, every candidate generated for each conflict, and the trained models produced by MergeGen. This is useful for a deep inspection of all generated artifacts.
Follow these steps to set up the environment and run the entire experimental pipeline.
First, create and activate a new Conda environment with the required dependencies.
# Create a new conda environment using Python 3.8
conda create -n sbcr_study python=3.8
# Activate the environment
conda activate sbcr_study
# Install the required packages
python -m pip install -r requirements.txt
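Because the environment is pinned to Python 3.8, it can be worth confirming that the activated interpreter is the expected one before running anything else. The following is a minimal sketch; `is_python38` and `check_active_python` are hypothetical helper names and are not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeeds only when the
# given `python --version` output reports a 3.8.x interpreter.
is_python38() {
    case "$1" in
        "Python 3.8"|"Python 3.8."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: checks whichever `python` is first on PATH.
check_active_python() {
    if is_python38 "$(python --version 2>&1)"; then
        echo "OK: Python 3.8 is active"
    else
        echo "Warning: the active interpreter is not Python 3.8" >&2
        return 1
    fi
}
```

After `conda activate sbcr_study`, calling `check_active_python` should report success if the environment was created as described above.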
Run the following script to download the original datasets and pre-process them into the format required for the experiments.
- Original Datasets Used:
  - Dataset1: SBES 2022 Dataset
  - Dataset2: FSE 2022 Dataset
# This script will download and prepare the datasets
./prepare_dataset.sh
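Since the later steps can run for a long time, a quick preflight check that the pipeline scripts are present and executable may save a failed run. The sketch below assumes it is executed from the repository root; `check_scripts` is a hypothetical helper, and the script names are the ones listed in this README.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repository): reports every
# given script that is missing or not executable, and fails if any is.
check_scripts() {
    local missing=0 script
    for script in "$@"; do
        if [ ! -x "$script" ]; then
            echo "Missing or not executable: $script" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the script names from this README:
# check_scripts ./prepare_dataset.sh ./train_all_models.sh ./test_all_models.sh \
#     ./tunning_sbcr.sh ./evaluate_sbcr.sh \
#     ./extract_all_mergeGen_similarities.sh ./collect_dataset_stats.sh
```

If a script is reported as not executable, `chmod +x <script>` is the usual fix.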
This step involves training the MergeGen models and running both MergeGen and SBCR on the prepared datasets.
# 3.1 Train the MergeGen models for each dataset
# Note: This action can take a very long time, depending on your hardware.
./train_all_models.sh
# 3.2 Run MergeGen to generate resolution candidates for each conflict
./test_all_models.sh
# 3.3 Run the parameter tuning process for SBCR
./tunning_sbcr.sh
# 3.4 Run the final evaluation of SBCR with the tuned parameters
./evaluate_sbcr.sh
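The four commands above can also be chained so that the pipeline stops at the first failing step and keeps a log per step, which is convenient for unattended runs. This is a sketch only: `run_steps` and the `logs` directory name are assumptions introduced for illustration, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): runs each command in
# order, writes its output to <log_dir>/<name>.log, and stops at the first
# command that exits with a non-zero status.
run_steps() {
    local log_dir="$1" step name
    shift
    mkdir -p "$log_dir" || return 1
    for step in "$@"; do
        name=$(basename "$step" .sh)
        echo "[$(date '+%F %T')] starting $step"
        if ! "$step" > "$log_dir/$name.log" 2>&1; then
            echo "Step failed: $step (see $log_dir/$name.log)" >&2
            return 1
        fi
    done
    echo "All steps completed"
}

# Example call, in the order given in this README:
# run_steps logs ./train_all_models.sh ./test_all_models.sh \
#     ./tunning_sbcr.sh ./evaluate_sbcr.sh
```

Stopping early matters here because each step consumes the output of the previous one, so a failed training run would otherwise produce misleading downstream results.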
After the experiments are complete, run the following scripts to extract the similarity scores and generate the statistics presented in the paper.
# 4.1 Extract similarities for the candidates generated by MergeGen
./extract_all_mergeGen_similarities.sh
# 4.2 Collect and summarize statistics for all datasets and results
./collect_dataset_stats.sh
The analysis notebooks are located in the analysis folder and can be used to regenerate the figures and tables from the paper.
If you use the artifacts from this repository in your research, please cite our paper (to appear).