Local Models Fail to Invoke Tools & Mismanage Context Across Calls

### Description

## Problem Statement

Local model integrations in Kilo Code cannot reliably invoke tools or retain task context, which leads to infinite loops, ignored tasks, and truncated prompts.
These failures occur across multiple models (DevStral, Qwen2.5 Coder, and Llama variants).

---

## Details Reported by Octopusman

1. **Tool Invocation Failures**
    
    - **DevStral (q8)** and **Qwen2.5 Coder 32B (unquantized)** never actually execute tool calls. Their responses show an attempt to call tools, but Kilo Code does not recognize the XML tags.
        
2. **Infinite Loop with Llama 4 16×17B**
    
    - When using `llama4:16x17b`, the model enters an infinite loop instead of proceeding to tool usage or task completion.
        
3. **Task Context Dropped with Llama 3.3 70B q8**
    
    - The `<task>` payload is correctly included in the first API request but is ignored by the model (it prompts “what’s the task?”).
        
    - In the second request, the model finally uses the tool but has already “forgotten” the original task because it wasn’t re‐sent.
        
4. **Context Window Misconfiguration**
    
    - Kilo’s local runner does not reflect the model’s true context window (e.g. 128 k tokens for DevStral). Instead, prompts are truncated to 4 096 tokens, emitting warnings like:
        
        ```
        truncating input prompt limit=4096 prompt=21068 keep=5 new=4096
        ```
        
    - In contrast, `cline` allows manual context window edits and correctly applies the full window.
        
5. **Provider vs. Local Discrepancy**
    
    - Using DevStral through the Kilo Code **provider** yields proper tool usage and context handling. The failure only manifests in the **local** extension.
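
To make item 1 concrete, the following is a minimal TypeScript sketch of what recognizing an XML-style tool call in a model response could look like. The `ToolCall` type, the `parseToolCall` function, and the tool names are hypothetical illustrations and are not taken from Kilo Code's source; the tag layout (a tool element containing one child element per parameter) is an assumption about the expected format.

```typescript
// Hypothetical shape for a parsed tool call; not Kilo Code's actual type.
interface ToolCall {
  name: string;
  params: Record<string, string>;
}

// Assumed tool names, for illustration only.
const KNOWN_TOOLS: string[] = ["read_file", "write_to_file", "execute_command"];

// Returns the first known tool (in list order) whose tags appear in the
// model output, or null when no known tool tag is present.
function parseToolCall(text: string, tools: string[] = KNOWN_TOOLS): ToolCall | null {
  for (const name of tools) {
    // Match <name> ... </name>, tolerating prose before and after the tags.
    const outer = new RegExp(`<${name}>([\\s\\S]*?)</${name}>`).exec(text);
    if (outer === null) continue;

    const body = outer[1] ?? "";
    const params: Record<string, string> = {};
    // Each child element <key>value</key> becomes one parameter.
    const inner = /<(\w+)>([\s\S]*?)<\/\1>/g;
    let m: RegExpExecArray | null;
    while ((m = inner.exec(body)) !== null) {
      params[m[1] ?? ""] = (m[2] ?? "").trim();
    }
    return { name, params };
  }
  return null;
}
```

Under this sketch, a response that surrounds the tags with extra prose still parses, while a response that renames the tags or emits a different format returns `null`. That is one plausible way the reported "tags not recognized" symptom could arise, though the actual cause in the local extension is not confirmed here.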
        

---

## Steps to Reproduce

1. **Setup**
    
    - Install Kilo Code local extension.
        
    - Configure local model endpoints for DevStral, Qwen2.5 Coder 32B, Llama variants.
        
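2. **Attempt Tool Call**
    
    - Start a task that requires a tool call with DevStral (q8) or Qwen2.5 Coder 32B.
        
        - Notice the response contains tool tags, but no tool is executed.
            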
3. **Switch Models**
    
    - Repeat with `llama4:16x17b`
        
        - Notice the model loops infinitely asking follow-up questions.
            
    - Repeat with `llama3.3:70b q8`
        
        - Notice the `<task>` payload is ignored, and context is dropped on subsequent requests.
            
4. **Inspect Logs**
    
    - Observe “truncating input prompt” warnings despite configuring a 128 k token window.
        
5. **Compare with Provider**
    
    - Run the same prompts via the Kilo Code provider for DevStral: tools execute correctly and no truncation occurs.
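
For step 4, a small helper can make the truncation warnings easier to spot and quantify when scanning server logs. This is a sketch only: `parseTruncationWarning` and `TruncationInfo` are hypothetical names, and it assumes that `new` in the log line is the prompt length after truncation, so that `prompt - new` approximates how many tokens were dropped.

```typescript
// Hypothetical result type for one "truncating input prompt" log line.
interface TruncationInfo {
  limit: number;   // context limit applied by the server
  prompt: number;  // prompt length before truncation
  keep: number;    // value of the keep= field
  kept: number;    // value of the new= field (assumed post-truncation length)
  dropped: number; // prompt - kept
}

// Parses a log line such as:
//   truncating input prompt limit=4096 prompt=21068 keep=5 new=4096
// Returns null for unrelated lines or when a field is missing.
function parseTruncationWarning(line: string): TruncationInfo | null {
  if (!line.includes("truncating input prompt")) return null;

  // Collect every key=number pair on the line.
  const fields = new Map<string, number>();
  const pair = /(\w+)=(\d+)/g;
  let m: RegExpExecArray | null;
  while ((m = pair.exec(line)) !== null) {
    fields.set(m[1] ?? "", Number(m[2]));
  }

  const limit = fields.get("limit");
  const prompt = fields.get("prompt");
  const keep = fields.get("keep");
  const kept = fields.get("new");
  if (limit === undefined || prompt === undefined || keep === undefined || kept === undefined) {
    return null;
  }
  return { limit, prompt, keep, kept, dropped: prompt - kept };
}
```

Applied to the warning quoted in this report, the helper yields a limit of 4096 and 16972 dropped tokens (21068 minus 4096), which shows how much of the prompt never reaches the model.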
        

---

## Expected Behavior

- **Reliable Tool Execution:** Local models should parse XML‐style tool tags (or JSON if configured) and invoke the corresponding Kilo Code tools automatically.
    
- **Task Context Persistence:** The `<task>` content must be retained across API calls so the model can complete the assigned task without repetition of the prompt.
    
- **Correct Context Window Application:** Kilo should dynamically set the context window from the model’s advertised capacity (e.g., 128 k tokens for DevStral and for the 70B Llama) instead of truncating prompts to 4 096 tokens.
    
- **Consistent Behavior Across Interfaces:** Local extension behavior should match provider behavior, barring hardware constraints.
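
The task-persistence and context-window expectations above can be sketched together as a request builder. This is an illustration under stated assumptions, not Kilo Code's implementation: it assumes the local extension talks to Ollama's chat endpoint, whose request body accepts an `options.num_ctx` field for the context size. `buildChatRequest`, the message types, the model name, and the value 131072 (128 × 1024) are placeholders chosen for the example.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Subset of an Ollama-style chat request body used in this sketch.
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  options: { num_ctx: number };
}

// Builds a request that (a) always carries the original task as the first
// message and (b) states the context window explicitly instead of relying
// on a server default.
function buildChatRequest(
  model: string,
  task: string,
  history: ChatMessage[],
  numCtx: number,
): ChatRequest {
  const taskMessage: ChatMessage = {
    role: "user",
    content: `<task>\n${task}\n</task>`,
  };
  // Prepend the task unless the history already starts with it.
  const first = history[0];
  const hasTask = first !== undefined && first.content === taskMessage.content;
  const messages = hasTask ? history : [taskMessage, ...history];
  return { model, messages, stream: false, options: { num_ctx: numCtx } };
}
```

For example, `buildChatRequest("devstral", "Fix the failing test", history, 131072)` produces a body whose first message is the `<task>` block on every call, so a second request cannot lose the task even if the first response ignored it.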
    

---

## Environment

- **Kilo Code** Version: 4.45.0 (cd1c0893)
    
- Models tested:
    
    - DevStral – q8
        
    - Qwen2.5 Coder 32B – unquantized
        
    - Llama 4 16×17B
        
    - Llama 3.3 70B q8
        
- Host OS: Arch
- Inference server OS: Ubuntu Server
    

---

## Additional Notes
  
- The root cause may lie in the local runner’s Ollama integration (mirroring known issues in `roocode`).
    
- Context window misreads might stem from an integer field omission in Kilo’s API wrapper compared to `cline`.
    

---

**Please investigate tool‐tag parsing logic, context‐window configuration, and task‐prompt retention in the local extension to align behavior with provider integration.**

See the Discord thread for more details:
[Discord thread](https://discord.com/channels/1349288496988160052/1384865704682983484)

Local Models Fail to Invoke Tools & Mismanage Context Across Calls #927
