
Conversation

@Cecilwang (Contributor) commented Mar 14, 2025

When MambaCacheManager is full, the following error may occur at this line:

IndexError: pop from empty list

This happens because when a seq_group reaches max_model_len, it is moved to scheduler._async_stopped instead of scheduler._finished_requests_ids, as seen in this line. As a result, MambaCacheManager cannot immediately release the seq_group in scheduler._async_stopped at the current step, leading to this issue.

However, this does not cause a memory leak, as scheduler.free_finished_seq_groups will eventually move _async_stopped to _finished_requests_ids after model execution, allowing it to be freed in the next step.
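To make this concrete, here is a minimal, self-contained sketch (plain Python with made-up names such as ToyCacheManager; it is not vLLM code) of a fixed-size slot pool that is only told about a stopped request one step late:

# Toy illustration only, not vLLM code. It mimics a cache manager that hands
# out a fixed number of slots and recycles them based on a list of finished
# request ids reported by the scheduler.
class ToyCacheManager:
    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))
        self.assigned: dict[str, int] = {}

    def allocate(self, request_id: str, finished_ids: list[str]) -> int:
        # Recycle the slots of requests reported as finished at this step.
        for rid in finished_ids:
            if rid in self.assigned:
                self.free_slots.append(self.assigned.pop(rid))
        slot = self.free_slots.pop()  # IndexError: pop from empty list
        self.assigned[request_id] = slot
        return slot


manager = ToyCacheManager(num_slots=1)
manager.allocate("req-0", finished_ids=[])

# "req-0" hits max_model_len and stops, but (like a seq_group parked in
# scheduler._async_stopped) it is not yet in the finished-ids list at this
# step, so its slot has not been returned:
try:
    manager.allocate("req-1", finished_ids=[])
except IndexError as exc:
    print(exc)  # pop from empty list

# One step later the scheduler reports it (analogous to
# free_finished_seq_groups eventually moving it to _finished_requests_ids),
# the slot is recycled, and there is no permanent leak:
print(manager.allocate("req-1", finished_ids=["req-0"]))  # 0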

FIX #10693, #13129

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@youkaichao requested a review from @tlrmchlsmth on March 22, 2025 02:38
@youkaichao (Member) commented

@tlrmchlsmth is the expert on this.

@Cecilwang changed the title from "[Bugfix][Mamba] Fix IndexError When MambaCacheManager is Full" to "[Bugfix][Mamba] Fix MambaCache leak" on Apr 15, 2025
@tlrmchlsmth (Member) left a comment


Please merge in the latest main. I think the bug in async_llm_engine.py was fixed in #13454.

Could you take a look to see whether that fixes the issue? If so, let's apply a consistent fix to llm_engine.py.

@Cecilwang (Contributor Author) commented

Please merge in the latest main. I think the bug in async_llm_engine.py was fixed in #13454.

Could you take a look to see whether that fixes the issue? If so, let's apply a consistent fix to llm_engine.py.

@tlrmchlsmth I confirmed that this leak has been fixed in the async engine in the latest main. Let's apply the same fix to the LLM engine.
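The general shape of such a fix, sketched at the toy level (the class and method names below are hypothetical; the actual llm_engine.py / #13454 change may look different), is to report async-stopped requests to the cache manager in the same step they stop instead of waiting until after model execution:

# Hypothetical sketch, not the actual vLLM change: when handing the cache
# manager the ids whose slots can be recycled this step, include the
# requests that were stopped asynchronously as well.
class ToyScheduler:
    def __init__(self) -> None:
        self._finished_requests_ids: list[str] = []
        self._async_stopped_ids: list[str] = []

    def collect_freeable_request_ids(self) -> list[str]:
        # Finished requests plus async-stopped ones, so the cache manager can
        # free their slots before allocating slots for newly scheduled
        # requests in the same step.
        ids = self._finished_requests_ids + self._async_stopped_ids
        self._finished_requests_ids = []
        # The async-stopped bookkeeping itself is still cleaned up later by
        # the scheduler after model execution, as in the original flow.
        return ids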

@Cecilwang (Contributor Author) commented Apr 17, 2025

@tlrmchlsmth, I forgot to test with --enable-chunked-prefill. Issue #13129 has been fixed both with and without --enable-chunked-prefill. However, the bug described in this PR has not been fixed yet.

Here is a test script to reproduce the issue.
cmd:

python debug_mamba_cache.py --model state-spaces/mamba-130m-hf --enable-chunked-prefill --gpu-memory-utilization 0.1 --max-model-len 128 --max-num-seqs 2

code debug_mamba_cache.py:

"""Benchmark offline inference throughput."""
import argparse
from typing import Any, Optional, Union

import uvloop
from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
from vllm.entrypoints.openai.api_server import (
    build_async_engine_client_from_engine_args,
)
from vllm.inputs import TokensPrompt
from vllm.utils import FlexibleArgumentParser, merge_async_iterators


async def run_vllm_async(
    engine_args: AsyncEngineArgs,
):
    from vllm import SamplingParams

    async with build_async_engine_client_from_engine_args(
        engine_args, True
    ) as llm:
        # Add the requests to the engine.
        prompts: list[TokensPrompt] = [
            TokensPrompt(prompt_token_ids=[0 for i in range(10)]),
            TokensPrompt(prompt_token_ids=[0 for i in range(110)]),
            TokensPrompt(prompt_token_ids=[0 for i in range(10)]),
        ]
        sampling_params: SamplingParams = SamplingParams(
            n=1,
            temperature=1.0,
            top_p=1.0,
            ignore_eos=True,
            max_tokens=100,
        )

        generators = []
        for i, prompt in enumerate(prompts):
            generator = llm.generate(
                prompt, sampling_params, request_id=f"test{i}"
            )
            generators.append(generator)
        all_gens = merge_async_iterators(*generators)
        async for i, res in all_gens:
            pass


def main(args: argparse.Namespace):
    uvloop.run(
        run_vllm_async(
            AsyncEngineArgs.from_cli_args(args),
        )
    )


if __name__ == "__main__":
    parser = FlexibleArgumentParser(description="Benchmark the throughput.")
    parser = AsyncEngineArgs.add_cli_args(parser)
    args = parser.parse_args()
    main(args)

@hmellor (Member) commented Jul 16, 2025

Since V0 is deprecated and this PR is for the V0 engine, I'm going to close this.

If this change should also be made to V1, feel free to update and re-open this PR, or create a new one for V1.

@hmellor closed this on Jul 16, 2025

Successfully merging this pull request may close these issues.

[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots
