-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Description
Problem Statement
Local model integrations in Kilo Code are unable to reliably invoke tools or retain task context, leading to infinite loops, ignored tasks, and truncated prompts.
These failures occur across multiple models (DevStral, Qwen2.5 Coder, Llama variants).
Details Reported by Octopusman
-
Tool Invocation Failures
- DevStral (q8) and Qwen2.5 Coder 32B (unquantized) never actually execute tool calls. Their responses indicate an attempt to call tools, but the XML tags are not recognized by Kilo
-
Infinite Loop with Llama 4 16×17B
- When using
llama4:16x17b, the model enters an infinite loop instead of proceeding to tool usage or task completion.
- When using
-
Task Context Dropped with Llama 3.3 70B q8
-
The
<task>payload is correctly included in the first API request but is ignored by the model (it prompts “what’s the task?”). -
In the second request, the model finally uses the tool but has already “forgotten” the original task because it wasn’t re‐sent.
-
-
Context Window Misconfiguration
-
Kilo’s local runner does not reflect the model’s true context window (e.g. 128 k tokens for DevStral). Instead, prompts are truncated to 4 096 tokens, emitting warnings like:
truncating input prompt limit=4096 prompt=21068 keep=5 new=4096 -
In contrast,
clineallows manual context window edits and correctly applies the full window.
-
-
Provider vs. Local Discrepancy
- Using DevStral through the Kilo Code provider yields proper tool usage and context handling. The failure only manifests in the local extension.
Steps to Reproduce
-
Setup
-
Install Kilo Code local extension.
-
Configure local model endpoints for DevStral, Qwen2.5 Coder 32B, Llama variants.
-
-
Attempt Tool Call
-
Switch Models
-
Repeat with
llama4:16x17b- Notice the model loops infinitely asking follow-up questions.
-
Repeat with
llama3.3:70b q8- Notice the
<task>payload is ignored, and context is dropped on subsequent requests.
- Notice the
-
-
Inspect Logs
- Observe “truncating input prompt” warnings despite configuring a 128 k token window.
-
Compare with Provider
- Run the same prompts via the Kilo Code provider for DevStral—tools execute correctly and no truncation occurs.
Expected Behavior
-
Reliable Tool Execution: Local models should parse XML‐style tool tags (or JSON if configured) and invoke the corresponding Kilo Code tools automatically.
-
Task Context Persistence: The
<task>content must be retained across API calls so the model can complete the assigned task without repetition of the prompt. -
Correct Context Window Application: Kilo should dynamically set the model’s context window based on the model’s advertised capacity (e.g., 128 k tokens for DevStral, 70B Llama).
-
Consistent Behavior Across Interfaces: Local extension behavior should match provider behavior, barring hardware constraints.
Environment
-
Kilo Code Version: 4.45.0 (cd1c089)
-
Models tested:
-
DevStral – q8
-
Qwen2.5 Coder 32B – unquantized
-
Llama 4 16×17B
-
Llama 3.3 70B q8
-
-
Host OS : Arch
-
Inference server OS : Ubuntu server
Additional Notes
-
Potential root cause could lie in the local runner’s Ollama integration (mirroring known issues in
roocode). -
Context window misreads might stem from an integer field omission in Kilo’s API wrapper compared to
cline.
Please investigate tool‐tag parsing logic, context‐window configuration, and task‐prompt retention in the local extension to align behavior with provider integration.
See the discord thread for more details
Discord thread
Metadata
Metadata
Assignees
Labels
Type
Projects
Status