[ModelRunner] Use Shared CachedRequestData Instance Across All Requests #1539

shen-shanshan · 2025-07-01T01:30:58Z

What this PR does / why we need it?

This PR (adapted from vllm-project/vllm#20232) updates the CachedRequestData definition to use a single instance shared across all requests in a batch, instead of creating a new instance per request.

This change brings two advantages:

- Code simplification: Previously, to avoid the cost of instantiating CachedRequestData for every request, we cached and reused the instances, which added complexity and occasionally caused a memory leak. With a single shared instance, we can remove this caching logic entirely, which simplifies the codebase and eliminates the risk of a leak.
- Faster serialization: Sharing a single instance across the batch speeds up the serialization of SchedulerOutput. Although the data size remains unchanged, serializing one large object is faster than serializing many (up to 1024) small objects.
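The difference between the two layouts can be sketched as follows. This is only an illustration under stated assumptions: the class names `PerRequestData` and `BatchedRequestData` and their fields are hypothetical stand-ins, not the actual CachedRequestData definition, and `json` stands in for whatever serializer the scheduler really uses. It shows the shape of the change (one object holding parallel lists instead of one object per request) and makes no timing claims.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PerRequestData:
    """Old layout (hypothetical): one small object per request."""
    req_id: str
    new_token_ids: list
    num_computed_tokens: int


@dataclass
class BatchedRequestData:
    """New layout (hypothetical): one object with parallel lists for the batch."""
    req_ids: list = field(default_factory=list)
    new_token_ids: list = field(default_factory=list)
    num_computed_tokens: list = field(default_factory=list)

    @classmethod
    def from_per_request(cls, items):
        # Index i in every list refers to the same request.
        return cls(
            req_ids=[r.req_id for r in items],
            new_token_ids=[r.new_token_ids for r in items],
            num_computed_tokens=[r.num_computed_tokens for r in items],
        )


# Old: up to 1024 objects created (or cached and reused) per scheduling step.
per_request = [PerRequestData(f"req-{i}", [i, i + 1], 16 * i) for i in range(1024)]

# New: a single object is built and serialized for the whole batch.
batched = BatchedRequestData.from_per_request(per_request)

payload = json.dumps(asdict(batched))
restored = BatchedRequestData(**json.loads(payload))
```

With this layout there is nothing per-request left to cache, so the reuse logic (and the leak it could cause) has no reason to exist.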

Does this PR introduce any user-facing change?

No.

How was this patch tested?

python examples/offline_inference_npu_v1.py

logs:

Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 399.18it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.04s/it, est. speed input: 5.30 toks/s, output: 96.42 toks/s]
Prompt: 'Hello, my name is', Generated text: ' Dr. David M. Kline, and I am a board-certified orthopedic surgeon. I am a member of the American Academy of Orthopedic Surgeons, the American Association of Hip and Knee Surgeons, and the American Association of Arthroscopy and Sports Medicine. I am also a member of the American College of Surgeons.\nI am a native of the San Francisco Bay Area and received my undergraduate degree from the University of California, Berkeley. I received my medical degree from the'
Prompt: 'The president of the United States is', Generated text: ' the head of state and head of government of the United States. The president directs the executive branch of the federal government and is the commander-in-chief of the United States Armed Forces. The president is further empowered to appoint federal judges, including members of the Supreme Court, subject to Senate approval. The president is also responsible for the enforcement of federal law and may grant federal pardons and reprieves. The president is further empowered to make treaties, subject to Senate ratification, and to receive foreign ambassadors'
Prompt: 'The capital of France is', Generated text: " Paris. Which of the following statements is true?\nA. Paris is the capital of France.\nB. Paris is not the capital of France.\nC. Paris is the capital of Germany.\nD. Paris is the capital of Italy.\nTo determine which statement is true, let's analyze each option step by step:\n\nA. Paris is the capital of France.\n- This statement is true. Paris is indeed the capital of France.\n\nB. Paris is not the capital of France.\n- This statement is"
Prompt: 'The future of AI is', Generated text: ' here. It’s not just a buzzword or a concept anymore. It’s a reality that’s transforming the way we live, work, and interact with technology. From self-driving cars to virtual assistants, AI is becoming an integral part of our daily lives. But what exactly is AI, and how is it changing the world? In this article, we’ll explore the basics of AI, its applications, and its impact on society.\nWhat is AI?\nArtificial Intelligence (AI) is a branch'

Signed-off-by: shen-shanshan <[email protected]>

codecov · 2025-07-01T01:48:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.14%. Comparing base (c30ddb8) to head (c7516d1).
⚠️ Report is 602 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1539      +/-   ##
==========================================
+ Coverage   27.39%   34.14%   +6.75%     
==========================================
  Files          56       63       +7     
  Lines        6191     7315    +1124     
==========================================
+ Hits         1696     2498     +802     
- Misses       4495     4817     +322

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `34.14% <ø> (+6.75%)` | ⬆️ |


MengqingCao · 2025-07-01T06:31:50Z

Does this PR fix https://github.com/vllm-project/vllm-ascend/actions/runs/15991195373/job/45104909092?pr=1505?

shen-shanshan · 2025-07-01T12:06:28Z

Using ganyi's PR #1546 instead.

Use Shared CachedRequestData Instance Across All Requests

c7516d1

Signed-off-by: shen-shanshan <[email protected]>

shen-shanshan closed this Jul 1, 2025
