Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ We design an RL training pipeline to train a base model for generating [Triton K

We design the reward function with two components:

1. ✅ Format Checking: Validate correct usage of `<thinking>` and `<answer>` tags.
1. ✅ Format Checking: Validate correct usage of `<think>` and `<answer>` tags.
2. 🔍 Similarity Score: Measure string similarity between generated and ground-truth Triton kernels using Python’s `difflib.SequenceMatcher`. This idea is inspired by [`SWE-RL`](https://arxiv.org/abs/2502.18449).

### 🧪 Evaluation
Expand Down