Closed
Description
I'm working on a GBNF generation library, and the performance started bugging me. Once I started generating larger JSON outputs, the performance gap became significant.
For example, when I run this command against the llama.cpp CLI, I see total time = 3374.20 ms / 462 tokens:
.\llama-cli -m B:\models\Qwen2.5-Coder-3B-Instruct-Q8_0.gguf --grammar-file B:\llama-src\LLamaSharp\LLama.Examples\Assets\json.gbnf -ngl 48 -no-cnv --prompt "give me a list of all nfl teams in the afc. include their team, city and state. group by division in json format"
Running the grammar example with the same prompt and model, with the parameters tweaked to match the CLI (context size, GPU layers), takes about 16 s for the same prompt.
I'm not 100% sure whether it's related to grammar sampling or to the StatelessExecutor, though, and I'll sheepishly admit I'm just pulling levers here trying to track down the culprit. A bit of guidance toward the culprit and I can get after it.
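To help narrow it down, here's a rough sketch of the benchmark I'm describing: run the same prompt through the StatelessExecutor once with the grammar attached and once without, and compare timings. If only the grammar run is slow, the cost is in grammar sampling; if both are slow, the executor itself is the suspect. The model path, GPU layer count, and prompt mirror the CLI command above; the grammar wiring (`Grammar`, `DefaultSamplingPipeline.Grammar`) is an assumption based on current LLamaSharp examples and may differ between versions.

```csharp
using System.Diagnostics;
using LLama;
using LLama.Common;
using LLama.Sampling;

var parameters = new ModelParams(@"B:\models\Qwen2.5-Coder-3B-Instruct-Q8_0.gguf")
{
    ContextSize = 4096,   // tweaked to match the CLI run
    GpuLayerCount = 48
};
using var model = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(model, parameters);

var prompt = "give me a list of all nfl teams in the afc. include their team, city and state. group by division in json format";
var gbnf = File.ReadAllText(@"B:\llama-src\LLamaSharp\LLama.Examples\Assets\json.gbnf");

// Same prompt twice: with grammar-constrained sampling, then without.
foreach (var useGrammar in new[] { true, false })
{
    var pipeline = new DefaultSamplingPipeline();
    if (useGrammar)
        pipeline.Grammar = new Grammar(gbnf, "root"); // assumed API; version-dependent

    var inferenceParams = new InferenceParams
    {
        SamplingPipeline = pipeline,
        MaxTokens = 512
    };

    var sw = Stopwatch.StartNew();
    var chunks = 0;
    await foreach (var _ in executor.InferAsync(prompt, inferenceParams))
        chunks++;
    sw.Stop();

    Console.WriteLine($"grammar={useGrammar}: {sw.ElapsedMilliseconds} ms, {chunks} chunks");
}
```

If the grammar-enabled pass dominates the total, that would point at the grammar sampling path rather than the executor's prompt handling.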