Fix crash on prefill-after-preemption: conditionally serialize token_ids #103
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi there. Since this is my first PR on github—feedback is very welcome. Thanks in advance for your review!
This PR fixes a crash that occurs when a sequence is preempted during decode and later re-enters the prefill path. In that scenario worker ranks hit:
The root cause is that our cross-process
Sequence
serialization intentionally avoided sendingtoken_ids
after decode began, but prefill-after-preemption does need them.Problem Reproduction
I use the following script to make this failure mode easy to reproduce and to sanity-check multi-GPU runs. The script drives a large batch and long prompts to increase the chance of preemption and re-prefill.
And then:
Then it would crash and we would see the AttributeError above.
Following the nanovllm style, I aimed for a minimal, targeted fix; happy to iterate based on your feedback.