-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Open
Labels
Description
Name and Version
version: 6661 (1fe4e38)
built with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin24.6.0
Operating systems
Mac
GGML backends
Metal
Hardware
MacBook Pro M4 Max
Models
ibm-granite/granite-4.0-h-small-GGUF:Q4_K_M
Problem description & steps to reproduce
Sometimes I switch to other chats while generation is ongoing in one chat. With the new Web UI, switching to another chat causes generation to stop part way through the response in the previous chat I was on. In the old Web UI, it would complete generation in the other chat in the background.
Reproduction steps:
- Request something that requires a long response in one chat
- While response generation is ongoing, switch to a different (existing?) chat
- Observe that response generation stopped part way through in the chat you left
First Bad Commit
No response
Relevant log output
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
srv params_from_: Chat format: Hermes 2 Pro
common_sampler_types_from_names: unable to match sampler by name 'edkypmxt'
slot get_availabl: id 1 | task 1583 | selected slot by lcs similarity, lcs_len = 2793, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id 1 | task 2749 | processing task
slot update_slots: id 1 | task 2749 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 2819
slot update_slots: id 1 | task 2749 | kv cache rm [2793, end)
slot update_slots: id 1 | task 2749 | prompt processing progress, n_past = 2819, n_tokens = 26, progress = 0.009223
slot update_slots: id 1 | task 2749 | prompt done, n_past = 2819, n_tokens = 26
srv cancel_tasks: cancel task, id_task = 2749
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv log_server_r: request: GET /props 127.0.0.1 200
slot release: id 1 | task 2749 | stop processing: n_past = 2856, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: request: GET /props 127.0.0.1 200