Replication Package for "LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms"

This repository contains all the necessary artifacts to replicate the experiments and analyses presented in our paper. It includes the source code for both the SBCR and MergeGen approaches, as well as the scripts required to prepare the datasets, run the experiments, and generate the results.

🔬 Replication Options

We provide two ways to engage with our research artifacts:

- Use Pre-processed Data (Recommended for Analysis): If you are primarily interested in analyzing our results or using the final datasets, you can download them directly from our archival repositories. This is the fastest way to get started.
- Full Replication (From Scratch): If you wish to replicate our entire experimental process, from data pre-processing to running the tools and analyzing the output, follow the detailed instructions below.

💾 Archival Repositories (Data and Results)

To facilitate reproducibility and further inspection, we have archived our datasets and full experimental results on FigShare.

- Pre-processed Datasets: This archive contains the final, cleaned datasets used directly in our experiments. It is intended for researchers who want to skip the initial data pre-processing steps.
  Link: https://figshare.com/s/d196f4ccb3ef34d2e770
- Full Experimental Results Archive: This archive contains a complete snapshot of our experimental run, including all intermediate files, execution logs, every candidate generated for each conflict, and the trained models produced by MergeGen. It is intended for in-depth inspection of the generated artifacts.
  Link: https://figshare.com/s/b3cdd351d077a9b08121

⚙️ Full Replication Instructions (From Scratch)

Follow these steps to set up the environment and run the entire experimental pipeline.

Step 1: Environment Setup

First, create and activate a new Conda environment with the required dependencies.

# Create a new conda environment using Python 3.8
conda create -n sbcr_study python=3.8

# Activate the environment
conda activate sbcr_study

# Install the required packages
python -m pip install -r requirements.txt
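
Before moving on, it may help to confirm that the activated environment really provides Python 3.8. The sketch below is not part of the repository: `python_version_matches` is a hypothetical helper that only compares a version string such as the one printed by `python --version` against an expected major.minor prefix.

```shell
# Hypothetical helper (not part of the repository): succeeds when a
# "Python X.Y.Z" version string starts with the expected major.minor.
python_version_matches() {
    version_string="$1"   # e.g. the output of `python --version`
    expected="$2"         # e.g. 3.8
    case "$version_string" in
        "Python $expected."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside the activated environment:
#   python_version_matches "$(python --version 2>&1)" 3.8 \
#       || echo "Warning: the active interpreter is not Python 3.8" >&2
```

The helper takes the version string as an argument rather than calling the interpreter itself, so it can be checked without any particular Python installation.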

Step 2: Data Preparation

Run prepare_dataset.sh to download the original datasets listed below and pre-process them into the format required for the experiments.

Original Datasets Used:
- Dataset1: SBES 2022 Dataset
- Dataset2: FSE 2022 Dataset

# This script will download and prepare the datasets
./prepare_dataset.sh
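
Every step in this README is launched as `./script.sh`, which only works if the file exists in the current directory and has its executable bit set. As a minimal sketch (the function name `check_scripts` is an assumption, not something the repository provides), a pre-flight check could look like this:

```shell
# Hypothetical pre-flight check (not part of the repository): reports every
# listed script that is missing or not executable, and returns non-zero if
# at least one problem was found.
check_scripts() {
    status=0
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "Missing: $script" >&2
            status=1
        elif [ ! -x "$script" ]; then
            echo "Not executable (try: chmod +x $script): $script" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call from the repository root, using script names from this README:
#   check_scripts ./prepare_dataset.sh ./train_all_models.sh ./test_all_models.sh
```

Running such a check before the long training step avoids discovering a permission problem hours into the pipeline.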

Step 3: Run the Experiments

This step trains the MergeGen models, generates resolution candidates with them, tunes the SBCR parameters, and evaluates SBCR on the prepared datasets.

# 3.1 Train the MergeGen models for each dataset
# Note: This action can take a very long time, depending on your hardware.
./train_all_models.sh

# 3.2 Run MergeGen to generate resolution candidates for each conflict
./test_all_models.sh

# 3.3 Run the parameter tuning process for SBCR
./tunning_sbcr.sh

# 3.4 Run the final evaluation of SBCR with the tuned parameters
./evaluate_sbcr.sh
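
Because the four commands above depend on each other and training can run for a long time, it can be convenient to chain them so that a failure stops the pipeline and each step keeps its own log. The wrapper below is a sketch under assumptions: `run_steps` and the `LOG_DIR` variable are hypothetical names, and the repository does not ship such a wrapper.

```shell
# Hypothetical wrapper (not part of the repository): runs each script in
# order, writes its output to $LOG_DIR/<script name>.log (default: logs/),
# and stops at the first script that exits with a non-zero status.
run_steps() {
    log_dir="${LOG_DIR:-logs}"
    mkdir -p "$log_dir" || return 1
    for step in "$@"; do
        log_file="$log_dir/$(basename "$step").log"
        echo "Running $step (log: $log_file)"
        if ! "$step" > "$log_file" 2>&1; then
            echo "Step failed: $step" >&2
            return 1
        fi
    done
    echo "All steps completed"
}

# Example call, using the script names from Step 3:
#   run_steps ./train_all_models.sh ./test_all_models.sh \
#             ./tunning_sbcr.sh ./evaluate_sbcr.sh
```

Stopping at the first failure matters here because the later steps consume the outputs of the earlier ones, for example the trained models needed to generate candidates.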

Step 4: Analyze the Results

After the experiments are complete, run the following scripts to extract the similarity scores and generate the statistics presented in the paper.

# 4.1 Extract similarities for the candidates generated by MergeGen
./extract_all_mergeGen_similarities.sh

# 4.2 Collect and summarize statistics for all datasets and results
./collect_dataset_stats.sh

The analysis notebooks are located in the analysis folder. They can be used to generate the figures and tables from the paper.

📜 Citation

If you use the artifacts from this repository in your research, please cite our paper (to appear).

