**Key Achievement:** Outperforms the base model on AIME with a mean score of **0.232 ± 0.003** (vs. 0.110 ± 0.004 for the base model), roughly doubling the base score without supervised training.
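To make the size of that gap concrete, here is a minimal sketch of the arithmetic on the quoted scores. It assumes the ± values are independent uncertainties that can be combined in quadrature; the source does not say what the ± denotes, so treat the combined error as illustrative only. All variable names are hypothetical.

```python
import math

# Scores quoted above (mean ± reported uncertainty on AIME).
tuned_mean, tuned_err = 0.232, 0.003
base_mean, base_err = 0.110, 0.004

# Absolute improvement over the base model.
abs_gain = tuned_mean - base_mean

# Assumption: the two uncertainties are independent, so they add in quadrature.
abs_gain_err = math.sqrt(tuned_err**2 + base_err**2)

# Ratio of the fine-tuned score to the base score.
rel_gain = tuned_mean / base_mean

print(f"absolute gain: {abs_gain:.3f} ± {abs_gain_err:.3f}")
print(f"ratio to base: {rel_gain:.2f}x")
```

Under those assumptions the gain is far larger than the combined uncertainty, which is consistent with the claim that the fine-tuned model outperforms the base model.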
**Use Case:** Ideal for reasoning tasks, especially where labeled data is scarce. Best suited for evaluation on math and logical reasoning benchmarks.
**Note:** This is a fine-tuned variant of Qwen2.5-7B-Instruct, trained with an unsupervised RL method. The model is not quantized; the original full-precision weights are available in the base repository.