StatelessExecutor and/or grammar perf? #1099

@phil-scott-78

Description

I'm working on a GBNF generation library, and the perf started bugging me. Once I started generating larger JSON outputs, the perf difference became really noticeable.

For example, when I run this command against the llama.cpp CLI, I'm seeing total time = 3374.20 ms / 462 tokens:

.\llama-cli -m B:\models\Qwen2.5-Coder-3B-Instruct-Q8_0.gguf --grammar-file B:\llama-src\LLamaSharp\LLama.Examples\Assets\json.gbnf -ngl 48 -no-cnv --prompt "give me a list of all nfl teams in the afc. include their team, city and state. group by division in json format"

Running the grammar example with the same prompt and the same model, with the parameters tweaked to match the CLI (context, GPU layers), I'm looking at about 16s to run the prompt.
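To put the two numbers side by side, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above; the 16s figure is approximate, and since the token count for the LLamaSharp run wasn't recorded, only the wall-clock ratio is computed for it. The function names are just for illustration.

```python
def tokens_per_second(total_ms: float, tokens: int) -> float:
    """Throughput from a total time in milliseconds and a token count."""
    return tokens / (total_ms / 1000.0)


def slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """How many times longer the candidate run took than the baseline."""
    return candidate_ms / baseline_ms


# Figures reported by llama-cli: "total time = 3374.20 ms / 462 tokens".
CLI_TOTAL_MS = 3374.20
CLI_TOKENS = 462

# "About 16s" for the LLamaSharp grammar example (approximate, assumed 16000 ms).
LLAMASHARP_TOTAL_MS = 16_000.0

cli_rate = tokens_per_second(CLI_TOTAL_MS, CLI_TOKENS)
ratio = slowdown(CLI_TOTAL_MS, LLAMASHARP_TOTAL_MS)

print(f"llama-cli throughput: {cli_rate:.1f} tokens/s")
print(f"LLamaSharp wall-clock slowdown: {ratio:.1f}x")
```

So the CLI is running at roughly 137 tokens/s, and the LLamaSharp run takes roughly 4.7x as long in wall-clock time, assuming both runs produce a comparable amount of output.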

I'm not 100% sure whether it's related to grammar sampling or to the StatelessExecutor, though, and I'll sheepishly admit I'm just pulling levers here trying to track down the culprit. A bit of guidance toward a likely culprit and I can get after it.
