Our code is written in Python; its dependencies are listed in trace_gen/requirements.txt. We install the code as a Python package using poetry.
Use the following commands:
conda create -n tracellm python=3.8 -y
conda activate tracellm
pip install poetry
# In the root directory
poetry install
cd trace_gen
pip install -r requirements.txt

We use the CallGraph data in the Alibaba microservice v2022 traces as our training data.
Fetch the first 20 CallGraph files using the scripts in Alibaba's repository, then preprocess the data with our scripts.
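As a rough illustration of what the preprocessing starts from, each CallGraph file is a table of individual RPC records keyed by trace ID, which our scripts later assemble into call graphs. The column subset below (traceid, rpc_id, um, dm) is an assumption based on the v2022 CallGraph schema; verify it against the files you actually download:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical sketch: group raw CallGraph rows into per-trace call lists.
# The column names are assumptions -- check them against the real files.
SAMPLE = """traceid,rpc_id,um,dm
T1,0.1,USER,MS_A
T1,0.1.1,MS_A,MS_B
T2,0.1,USER,MS_C
"""

def group_by_trace(fileobj):
    traces = defaultdict(list)
    for row in csv.DictReader(fileobj):
        traces[row["traceid"]].append((row["rpc_id"], row["um"], row["dm"]))
    return traces

traces = group_by_trace(StringIO(SAMPLE))
print(len(traces))        # 2 traces
print(len(traces["T1"]))  # 2 calls in trace T1
```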
To convert the separate API calls into call graphs and remove redundant call graphs, use the following commands. Make sure to change the file directories before executing them.
> python trace_gen/preprocess/trace_to_training_data.py
> python trace_gen/preprocess/remove_redundant_training_data.py

To collect the call graph stats required to generate instructions, run the following commands:
> python trace_gen/preprocess/trace_to_cg_stats.py
> python trace_gen/preprocess/merge_cg_stats.py

To convert the call graphs to text representations, run the following command. Make sure to change the file directories before executing it.
Also, make sure to set the task_type correctly for your use case:

- TraceGenTaskType.graph_gen_non_recursive: tabular format
- TraceGenTaskType.graph_gen_recursive: recursive format
- TraceGenTaskType.graph_gen: instruction-tuning
> python trace_gen/preprocess/training_data_to_text_representations.py

We include part of the training datasets in the dataset_examples folder:
- tabular_dataset.txt: dataset in tabular format
- recursive_dataset.txt: dataset in recursive format
- recursive_instruction_dataset.txt: dataset in recursive format with instructions
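To make the distinction between the two representations concrete, here is a minimal sketch of rendering one small call graph both ways. The delimiters and field order below are placeholders, not the repository's actual formats, which are defined in trace_gen/preprocess/training_data_to_text_representations.py:

```python
# Illustrative formats only -- the real tabular and recursive encodings
# are produced by training_data_to_text_representations.py.
calls = [
    ("0.1", "USER", "MS_A"),
    ("0.1.1", "MS_A", "MS_B"),
    ("0.1.2", "MS_A", "MS_C"),
]

def to_tabular(calls):
    # one call per line: rpc_id,caller,callee
    return "\n".join(f"{rid},{um},{dm}" for rid, um, dm in calls)

def to_recursive(calls):
    # index children by the parent rpc_id (the rpc_id minus its last segment)
    children = {}
    for rid, um, dm in calls:
        children.setdefault(rid.rsplit(".", 1)[0], []).append((rid, dm))

    def render(rid, name, depth):
        lines = ["  " * depth + name]
        for child_rid, child_name in children.get(rid, []):
            lines.extend(render(child_rid, child_name, depth + 1))
        return lines

    root_rid, root_um, root_dm = calls[0]
    return "\n".join([root_um] + render(root_rid, root_dm, 1))

print(to_tabular(calls))    # flat edge table, one call per line
print(to_recursive(calls))  # indentation mirrors the caller/callee nesting
```

The tabular form lists each call independently, while the recursive form nests callees under their callers so the graph structure is visible in the text itself.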
Model training scripts are in trace_gen/train.
- pretraining.py: for pretraining LLaMA-2-7B with trace data.
- sft.py: for supervised fine-tuning the model with instruction datasets.
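For orientation, an instruction-tuning example pairs a natural-language instruction with a serialized call graph as the target response. The template below is a made-up placeholder to show the shape of such a pair; the actual prompt format is whatever recursive_instruction_dataset.txt contains:

```python
# Placeholder template -- not the repository's real instruction format.
def make_sft_example(instruction, graph_text):
    return f"### Instruction:\n{instruction}\n\n### Response:\n{graph_text}"

example = make_sft_example(
    "Generate a call graph with depth 3.",
    "USER -> MS_A -> MS_B",
)
print(example)
```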
To get the accuracy report, follow the script in trace_gen/generate/run_accuracy_eval.sh.
Before running the script, you'll need:
- Prompt files - These contain the input instructions for the model
- LoRA adapter checkpoints - These are model weights after training
Example file locations:
- LoRA adapters: find them in the checkpoints directory
- Example prompts: located in various directories starting with trace_gen/sft_heatmap_prompts
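The accuracy computation itself is defined in run_accuracy_eval.sh and the code it invokes. As a rough illustration of one way to score a generated graph against a reference, the sketch below compares their edge sets; this metric is an assumption for illustration, not necessarily the one the script reports:

```python
def edge_accuracy(generated, reference):
    """Fraction of reference edges reproduced by the generated graph.
    Illustrative only -- the real metric comes from run_accuracy_eval.sh."""
    gen, ref = set(generated), set(reference)
    return len(gen & ref) / len(ref) if ref else 1.0

ref = [("USER", "MS_A"), ("MS_A", "MS_B"), ("MS_A", "MS_C")]
gen = [("USER", "MS_A"), ("MS_A", "MS_B")]
print(edge_accuracy(gen, ref))  # 2 of 3 reference edges matched
```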