
Conversation

@ggerganov (Member)

fix #14695 (comment)

  • Remove slot_params.ignore_eos
  • When ignore_eos is passed in the request, all EOG tokens are ignored by adding a -INF logit bias for them to the sampler (see the toy sketch below)
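To make the effect of that bias concrete, here is a toy, self-contained sketch (not llama.cpp code; the logit values are made up) showing that a token whose logit is biased to -INFINITY ends up with exactly zero probability after softmax, so the sampler can never pick it:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // pretend index 2 is an EOG token; the other logits are arbitrary
    std::vector<float> logits = {2.0f, 1.0f, 0.5f};
    logits[2] += -INFINITY; // the bias the server adds for every EOG token

    // max-subtracted softmax: exp(-inf) == 0, so the biased token gets p = 0
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    std::vector<float> probs(logits.size());
    for (size_t i = 0; i < logits.size(); i++) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (size_t i = 0; i < logits.size(); i++) {
        std::printf("token %zu: p = %.3f\n", i, probs[i] / sum);
    }
    return 0;
}

The server-side loop reviewed below applies exactly this bias to every EOG token in the vocabulary.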

@JohannesGaessler (Collaborator) left a comment

The changes seem correct to me and I can confirm that I am now getting the expected number of tokens when setting ignore_eos.

@ggerganov merged commit 538cc77 into master on Jul 16, 2025
52 of 56 checks passed
Comment on lines +476 to +481
for (llama_token i = 0; i < llama_vocab_n_tokens(vocab); i++) {
    if (llama_vocab_is_eog(vocab, i)) {
        //SRV_DBG("%s: added %s logit bias = %f\n", __func__, common_token_to_piece(ctx, i).c_str(), -INFINITY);
        params.sampling.logit_bias.push_back({i, -INFINITY});
    }
}
Member

If this is done for every token during generation, I suspect it is going to have a significant performance impact.

@ggerganov (Member Author)

It's done once per completion request, at the beginning, when the input JSON parameters are processed.

Collaborator

If performance is a concern, we could provide a list of EoG tokens to iterate over instead of iterating over all tokens and checking whether each one is EoG. Although I think iterating over all tokens once per request is negligible compared to iterating over all tokens once per generated token, as is already done during sampling.
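A minimal sketch of that suggestion, assuming a hypothetical helper that collects the EOG token ids once per model load (collect_eog_tokens and apply_ignore_eos are illustrative names, not an existing llama.cpp API; llama_vocab_n_tokens, llama_vocab_is_eog, and the logit-bias entries follow the hunk above):

#include <cmath>
#include <vector>

#include "llama.h"

// Hypothetical: build the list of EOG token ids once, e.g. when the model is loaded.
static std::vector<llama_token> collect_eog_tokens(const llama_vocab * vocab) {
    std::vector<llama_token> eog;
    for (llama_token i = 0; i < llama_vocab_n_tokens(vocab); i++) {
        if (llama_vocab_is_eog(vocab, i)) {
            eog.push_back(i);
        }
    }
    return eog;
}

// Per request, only the handful of EOG tokens is touched instead of scanning the whole vocabulary.
static void apply_ignore_eos(std::vector<llama_logit_bias> & logit_bias,
                             const std::vector<llama_token> & eog_tokens) {
    for (llama_token t : eog_tokens) {
        logit_bias.push_back({t, -INFINITY});
    }
}

Given the ~0.3 ms per-request cost mentioned below, the practical gain is likely small.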

Member

On my system this takes about 0.3 ms for a model with a 150k-token vocabulary, so I suppose it is not that bad.

@ggerganov (Member Author)

I fixed this anyway: #14721
