[misc] Add Torch profiler support #7451
Conversation
/ready
comaniac left a comment:
Otherwise LGTM. This is super useful!
comaniac left a comment:
LGTM!
Added docs.
I'd suggest using a more common path as an example, such as $HOME/traces/ or /tmp/traces.
I'd suggest covering offline batching as well. It should be even easier: just set the environment variable before creating the engine.
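As a rough illustration of that suggestion (a sketch only: the model name is arbitrary, and the start_profile()/stop_profile() hooks on LLM are assumed here rather than confirmed by this thread), offline profiling could look like:

```python
import os

# The profiler is enabled by exporting the trace directory *before* the
# engine is created; /tmp/traces follows the path suggestion above.
os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/traces"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# Assumed hooks mirroring the server-side start/stop profile behavior.
llm.start_profile()
llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
llm.stop_profile()
```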
rkooo567 left a comment:
Right now, we have to manually trigger start/stop profile.
Actually, why don't we add an env var like TORCH_PROFILER_NUM_REQUESTS or something? It would be -1 by default (meaning infinite); if it is N, we trigger stop_profile after N requests. This could be useful with the throughput benchmark (e.g., you can just profile the first N requests).
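A minimal sketch of how that proposal might look (hypothetical: TORCH_PROFILER_NUM_REQUESTS and the hook point below are only the suggestion from this comment, not something the PR implements):

```python
import os

# -1 (the default) means "never stop automatically".
_NUM_REQUESTS = int(os.getenv("TORCH_PROFILER_NUM_REQUESTS", "-1"))


class AutoStopProfiler:
    """Stops profiling once N requests have finished."""

    def __init__(self, limit: int = _NUM_REQUESTS):
        self._limit = limit
        self._finished = 0

    def on_request_finished(self, worker) -> None:
        self._finished += 1
        if self._limit >= 0 and self._finished >= self._limit:
            # Assumes a worker-side stop_profile() hook like the one
            # discussed in this thread.
            worker.stop_profile()
```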
benchmarks/backend_request_func.py (outdated)
Maybe instead of deleting it, we could add another condition for start_profile?
I feel the naming of this function should be changed instead.
This might be a late review, but do you think it is possible to move this into a plugin?
It might be a good idea. Is there a related document/resource?
The plugin system just landed in #7426. Docs will come later, but you can be an early user.
@SolitaryThinker What is your plan on this? Do you already have this set up as a plugin in a different PR? I was looking to build on this, hence checking.
I'll take a look at the plugin system and update this PR today.
comaniac left a comment:
After trying the plugin system, I feel this is not the right time to turn this feature into a plugin, because the plugin system currently doesn't support the following:
- This feature needs to extend the API server with new APIs.
- This feature needs to extend engine APIs.
With these limitations, the plugin version of this PR is more like a hacky patch, which I don't think is good practice for "plugins". Ideally we should improve the plugin design to offer various plugin APIs, such as plugin.plugin_api_server(...), to clearly define the plugin scope with proper "plugin" behaviors.
Accordingly, this PR should be good to go. cc @rkooo567 @sfc-gh-mkeralapura @youkaichao
Force-pushed 5969fd4 to 18799de with commits: format; remove print; Update vllm/worker/worker.py; comments; add docs; Update vllm/worker/worker.py; Update vllm/entrypoints/openai/api_server.py; Update benchmarks/benchmark_serving.py; Update vllm/envs.py (the "Update" commits are co-authored by Cody Yu <[email protected]>).
Force-pushed 18799de to d754eda.
Hi @SolitaryThinker, thanks for the great work. How can we start/stop profiling for a real use case (not through benchmark_serving.py)?
Here is the follow-up for CPU-only devices: #7806.
I encountered the following error, do you know why? INFO 08-27 03:15:55 api_server.py:322] Stopping profiler...
Stopping the profiler flushes all the profile trace files to the directory, which takes time: for about 100 requests' worth of data for a Llama 70B, it takes about 10 minutes to flush on an H100. Set the env variable VLLM_RPC_GET_DATA_TIMEOUT_MS to a big number before you start the server, say something like 30 minutes. This theoretically affects regular requests too, but profiling works with it.
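For example, a sketch assuming the server is launched from the same Python process; otherwise export the variable in the shell before starting the server:

```python
import os

# Give stop_profile up to 30 minutes to flush traces (value is in milliseconds).
os.environ["VLLM_RPC_GET_DATA_TIMEOUT_MS"] = str(30 * 60 * 1000)
```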
@sfc-gh-mkeralapura Thanks, super useful tip, will add it to the docs. @DamonFool You would need a client that can ping the ...
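For reference, a minimal client sketch, assuming the routes this PR adds to the OpenAI-compatible server are POST /start_profile and /stop_profile on the default localhost:8000:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed default server address

# Start the torch profiler on the server, run the real workload, then stop it
# (stopping triggers the trace flush discussed above).
requests.post(f"{BASE_URL}/start_profile").raise_for_status()
# ... send normal /v1/completions or /v1/chat/completions traffic here ...
requests.post(f"{BASE_URL}/stop_profile").raise_for_status()
```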
Utility PR to add torch profiler support to the vLLM worker, OpenAI client, and benchmark_serving.py.
Enabled and configured through an env var: VLLM_TORCH_PROFILER_DIR=/mnt/traces/
This will be useful for performance tuning and save me from always manually patching it in for #7000.
For an example of what a trace looks like, see #6854.
Traces can be viewed at https://ui.perfetto.dev/ and there is no need to untar them.
Example commands:
cc @Yard1 @comaniac