[Bugfix][Mamba] Fix MambaCache leak #14820
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@tlrmchlsmth is the expert on this.
Please merge in latest main. I think the bug in `async_llm_engine.py` was fixed in #13454.
Could you please take a look to see if that fixes the issue, and if so, let's apply a consistent fix to `llm_engine.py`.
@tlrmchlsmth I confirmed that this leak has been fixed in the async engine on the latest main. Let's apply the same fix to the LLM engine.
@tlrmchlsmth, I forgot to test with … Here is a test script to reproduce:
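The attached script itself is not preserved here. A minimal sketch of a reproduction along these lines (the model checkpoint, `max_model_len`, prompt, and request counts below are illustrative assumptions, not taken from the original script) would drive a Mamba model with enough long-running requests that sequences hit `max_model_len` while the Mamba cache is at capacity:

```python
# Hypothetical reproduction sketch: stress a Mamba model so that sequences
# hit max_model_len while the MambaCacheManager is full.
from vllm import LLM, SamplingParams

llm = LLM(
    model="state-spaces/mamba-130m-hf",  # illustrative Mamba checkpoint
    max_model_len=128,                   # small limit so requests hit it quickly
)

# Many prompts with a large max_tokens so generation runs into max_model_len.
prompts = ["Tell me a very long story about the ocean."] * 256
params = SamplingParams(max_tokens=512, ignore_eos=True)

outputs = llm.generate(prompts, params)
print(f"Completed {len(outputs)} requests")
```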
Signed-off-by: Sixue(Cecil) Wang <[email protected]>
Since V0 is deprecated and this PR is for the V0 engine, I'm going to close this. If this change should also be made to V1, feel free to update and re-open this PR, or create a new one for V1.
When the `MambaCacheManager` is full, the following error may occur at this line. This happens because when a `seq_group` reaches `max_model_len`, it is moved to `scheduler._async_stopped` instead of `scheduler._finished_requests_ids`, as seen in this line. As a result, the `MambaCacheManager` cannot immediately release the `seq_group` in `scheduler._async_stopped` at the current step, leading to this issue.

However, this does not cause a memory leak: `scheduler.free_finished_seq_groups` will eventually move `_async_stopped` to `_finished_requests_ids` after model execution, allowing the cache to be freed in the next step.

FIX #10693, #13129
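To make the lifecycle described above concrete, here is a small self-contained sketch. The class and method names are hypothetical stand-ins, not vLLM's actual internals; they model a fixed pool of cache slots that is released only for request ids reported as finished, while an async-stopped request is handed over to the finished list one step later:

```python
# Illustrative model of the issue (hypothetical names, not vLLM internals):
# cache slots are freed only for request ids reported as finished, and a
# request stopped via the "async stop" path is reported one step late.
class ToyCacheManager:
    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))
        self.slot_of_request: dict[str, int] = {}

    def allocate(self, request_id: str) -> int:
        if request_id not in self.slot_of_request:
            if not self.free_slots:
                raise RuntimeError("cache full")  # the error seen in the issue
            self.slot_of_request[request_id] = self.free_slots.pop()
        return self.slot_of_request[request_id]

    def release_finished(self, finished_ids: list[str]) -> None:
        for rid in finished_ids:
            slot = self.slot_of_request.pop(rid, None)
            if slot is not None:
                self.free_slots.append(slot)


class ToyScheduler:
    def __init__(self):
        self.finished_request_ids: list[str] = []
        self.async_stopped: list[str] = []

    def stop_at_max_model_len(self, request_id: str) -> None:
        # Parked here instead of finished_request_ids, so the cache
        # manager cannot free its slot during the current step.
        self.async_stopped.append(request_id)

    def get_and_reset_finished(self) -> list[str]:
        ids, self.finished_request_ids = self.finished_request_ids, []
        return ids

    def free_finished(self) -> None:
        # After model execution, async-stopped requests become finished,
        # so their slots are released on the *next* step.
        self.finished_request_ids.extend(self.async_stopped)
        self.async_stopped.clear()
```

Because the slot of an async-stopped request is released one step late, a pool that is exactly at capacity can still hit the "cache full" error, which matches the description above: a delayed release rather than a true leak.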