This Python 3 script takes two repositories and checks for "code clones" between them, that is, code that has likely been copied from elsewhere. The algorithm comes from Lee et al. (2018), "Tree-Pattern-Based Clone Detection with High Precision and Recall". There are four main types of code clones:
- Type 1: exact copies
- Type 2: copies with renamed elements (ex. variables)
- Type 3: copies that have been slightly modified
- Type 4: "semantic" copies (code that is not copied, but does the same thing)
This code checks for Type 1, 2, and 3 clones.
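To make the first three clone types concrete, here is a minimal sketch built on Python's standard `ast` and `difflib` modules. It is not the tree-pattern algorithm from Lee et al. and not this project's implementation. The names `RenameEraser`, `dump`, `node_types`, and `classify`, as well as the `0.8` similarity threshold, are assumptions chosen only for illustration.

```python
import ast
import difflib


class RenameEraser(ast.NodeTransformer):
    """Replace identifiers with "_" so renamed copies produce equal trees."""

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # keep walking into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Name(self, node):
        node.id = "_"
        return node


def dump(source, erase_names=False):
    """Return a text form of the syntax tree; layout and comments are ignored."""
    tree = ast.parse(source)
    if erase_names:
        tree = RenameEraser().visit(tree)
    return ast.dump(tree)


def node_types(source):
    """List the node type names of the syntax tree in breadth-first order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def classify(a, b, threshold=0.8):
    """Label a pair of snippets; the threshold is an illustrative assumption."""
    if dump(a) == dump(b):
        return "Type 1"
    if dump(a, erase_names=True) == dump(b, erase_names=True):
        return "Type 2"
    ratio = difflib.SequenceMatcher(None, node_types(a), node_types(b)).ratio()
    return "Type 3" if ratio >= threshold else "no clone"


ORIGINAL = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

RENAMED = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

MODIFIED = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    print(acc)
    return acc
"""

print(classify(ORIGINAL, RENAMED))   # Type 2
print(classify(ORIGINAL, MODIFIED))  # Type 3
```

Comparing syntax trees rather than raw text means that whitespace and comments do not affect the result. This sketch only erases function, argument, and variable names, so copies that rename attributes or change literals would fall through to the similarity check.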
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
You need the following to install the software:
- Python 3.7+
- pip 19+ (typically already installed with Python)
- Git
- Clone the repository
git clone https://github.com/calebdehaan/codeDuplicationParser.git
- Enter the repository directory
cd codeDuplicationParser
- Install dependencies
pip3 install -r requirements.txt
- Run the program
python3 -m code_duplication [args]
Alternatively, you can run the program in a Python virtual environment:
./code-duplication.sh
Python packages required for the tool to run:
- gitpython
- bitstring
- fastlog
- windows-curses (Windows only, required by fastlog)
- Python 3.7.3 - The Python version used
- GitPython - Used to pull git repositories
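
As a rough illustration of how two repositories could be fetched with GitPython before comparison, here is a minimal sketch. It is not taken from this project's source: the helper names `repo_dir_name` and `clone_pair`, the temporary directory prefix, and the choice of a shallow clone are all assumptions. `Repo.clone_from` is GitPython's call for cloning a remote repository into a local path.

```python
import tempfile
from pathlib import Path


def repo_dir_name(url):
    """Derive a local directory name from a repository URL (hypothetical helper)."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_pair(first_url, second_url, workdir=None):
    """Clone both repositories and return their local paths (hypothetical helper)."""
    # Imported here so repo_dir_name stays usable without GitPython installed.
    from git import Repo

    base = Path(workdir or tempfile.mkdtemp(prefix="clones-"))
    paths = []
    for index, url in enumerate((first_url, second_url)):
        # Prefix with the index so two repositories with the same name do not collide.
        target = base / f"{index}-{repo_dir_name(url)}"
        # depth=1 requests a shallow clone; an assumption to keep the download small.
        Repo.clone_from(url, str(target), depth=1)
        paths.append(target)
    return paths
```

Calling `clone_pair` with two repository URLs requires network access and a working Git installation, which is why Git is listed among the prerequisites.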