
Conversation

phil-scott-78 (Contributor)

I started getting -3 values when doing batch operations, upon calling Infer on the batched executor. I was checking for NoKvSlot, but it turns out -3 corresponds to "failed to prepare ubatch" when find_slot is called. I believe these values first appeared with this commit two weeks ago, though that's based on admittedly limited research.

I'd like a better enum value, but unfortunately llama.cpp also uses -3 for GGML_STATUS_FAILED results when calling graph_compute, so the value has a double meaning. I'm definitely open to other names for the value than what I have here, but I'm at a loss as to what they could be.
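To make the collision concrete, here's roughly how I understand the raw return codes after digging through llama.cpp. This is a sketch inferred from the source rather than anything documented, and the Describe helper is purely illustrative:

```csharp
// Rough mapping of llama_decode return codes, inferred from llama.cpp
// source (not documented anywhere I could find); illustrative only.
string Describe(int code) => code switch
{
    0  => "success",
    1  => "could not find a KV slot for the batch",
    2  => "compute aborted via the abort callback",
    -1 => "invalid input batch",
    -2 => "compute allocation failed",
    // the double meaning: find_slot's "failed to prepare ubatch" and
    // graph_compute's GGML_STATUS_FAILED both surface as -3
    -3 => "decode failed (ubatch preparation or graph compute)",
    _  => "unknown",
};

Console.WriteLine(Describe(-3)); // the value this PR maps to a new enum member
```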

@martindevans (Member)

Thanks for this. It looks like the doc comments in the header with the error codes weren't updated, so I missed these changes when upgrading.

Do you have any idea how to handle these new return values? If possible I'd like to add something to the doc comments indicating what should be done when you receive this result. I just had a quick dig through llama.cpp and I can't see anything explaining it, so if you don't know either I'm happy to merge as-is.

If you're interested in contributing upstream, it would be great to submit a PR that changes one of the two -3 errors to a new value (and also updates the docs in the header here).

@phil-scott-78 (Contributor, Author)

So I'm treating the ubatch -3 like a NoKvSlot error in my batched-execution code, which is based heavily on suggestions you've made to me about handling out-of-memory situations, and which had been working great. I'm more or less adding conversations until I hit a NoKvSlot error, then rolling the last one back and requeuing it for later, treating that point as a practical maximum rather than hardcoding a batch size. The issue with this upgrade was that I was only checking for three outcomes: Error, NoKvSlot, and Ok. The -3 left me in an endless loop of calling Infer and failing to find a slot for the new batch. Treating it the same way, rollback and continue, got me moving forward. I probably ought to take a closer look at my slot settings, but at least that oversight surfaced this.

I'm just playing around right now, but here's my code with the current "solution" until I get the explicit enums to check:
https://github.com/phil-scott-78/RedPajama/blob/main/RedPajama.ConsoleTest/BatchedExecutor.cs#L75
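The relevant part boils down to something like this (a trimmed sketch of the pattern rather than the exact code from that file; RollbackAndRequeue stands in for my real rollback logic):

```csharp
// Trimmed sketch of the batching loop described above. `executor` is a
// LLamaSharp BatchedExecutor; RollbackAndRequeue is a stand-in for my
// rollback/requeue logic, not a real API.
var result = await executor.Infer();
switch (result)
{
    case DecodeResult.Ok:
        // all good: sample the next token for each conversation
        break;

    case DecodeResult.NoKvSlot:
    case DecodeResult.DecodeFailed: // new: "failed to prepare ubatch"
        // Too many conversations in flight. Roll the newest one back out
        // of the batch and requeue it for a later Infer call.
        RollbackAndRequeue();
        break;

    default:
        throw new InvalidOperationException($"Infer failed: {result}");
}
```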

martindevans requested a review from Copilot on March 30, 2025 at 14:25
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR expands the DecodeResult enum to better capture the different failure modes encountered during batch operations and decoding. It introduces new values for aborted computations, memory-allocation failures, and general decode failures (sketched after the list below).

  • Introduces ComputeAborted for canceled computations.
  • Adds AllocationFailed to denote memory-related issues.
  • Adds DecodeFailed to cover general decode errors.
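Putting those together, the shape of the change is roughly as follows. This is a sketch, not the file verbatim: the pre-existing members are elided, and the value of AllocationFailed is assumed from llama.cpp's allocation-failure code (the other two values are quoted in the comments below).

```csharp
// Sketch of the expanded enum; pre-existing members are elided and the
// AllocationFailed value is assumed (the other values are quoted below).
public enum DecodeResult
{
    // ... pre-existing members such as Ok, Error, and NoKvSlot ...

    /// <summary>Computation was aborted (e.g. via an abort callback).</summary>
    ComputeAborted = 2,

    /// <summary>Failed to allocate memory for the compute graph.</summary>
    AllocationFailed = -2,

    /// <summary>General decode failure (overloaded upstream with "failed to prepare ubatch").</summary>
    DecodeFailed = -3,
}
```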
Comments suppressed due to low confidence (2)

LLama/Native/DecodeResult.cs:26

  • [nitpick] The new enum value 'ComputeAborted' is assigned a positive value while other error indicators use negatives; consider using a negative value or renaming it (e.g., 'OperationAborted') to clearly denote an error state.
ComputeAborted = 2,

LLama/Native/DecodeResult.cs:36

  • [nitpick] The name 'DecodeFailed' is quite general; if possible, consider clarifying the context of the failure to help differentiate it from other potential decode issues.
DecodeFailed = -3,

@martindevans (Member)

Oops, I didn't mean to request that review from Copilot. That's a pretty cool new feature though :O

martindevans merged commit ef31ec6 into SciSharp:master on Mar 30, 2025
6 checks passed