-
-
Notifications
You must be signed in to change notification settings - Fork 11.5k
[RL] Support weight update with multi ipc handles + zmq #28607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: knlnguyen1802 <[email protected]>
|
Documentation preview: https://vllm--28607.org.readthedocs.build/en/28607/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request extends functionality for fast weight updates by adding support for multiple IPC handles and non-blocking communication with ZeroMQ. The changes look good overall, introducing an asynchronous update path alongside the existing synchronous one. However, I've found two critical issues that should be addressed. First, the example script rlhf_colocate.py now contains a hardcoded local model path, which makes it non-portable and difficult for others to run. Second, rlhf_utils.py incorrectly imports Dict and Tuple from the ast module for type hinting, which is a bug. These issues should be fixed before merging.
Signed-off-by: knlnguyen1802 <[email protected]>
Signed-off-by: knlnguyen1802 <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for asynchronous weight updates using multiple IPC handles and ZeroMQ, which is a great enhancement for RLHF and multi-agent workloads. The implementation correctly uses non-blocking ZMQ patterns (DEALER/ROUTER with POLLER) for improved concurrency. The code is well-structured, with the logic split between the training actor (sender) and the worker extension (receiver). I've found one issue regarding an incorrect import that should be addressed.
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| import gc | ||
| from ast import Dict, Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@youkaichao @22quinn Please help review. Thanks. |
Signed-off-by: knlnguyen1802 <[email protected]>
|
Here is result when run facebook/opt-125M |
Signed-off-by: knlnguyen1802 <[email protected]>
Purpose
This pull request extends the functionality introduced in #24295, which added support for colocated fast weight updates using IPC handles and ZeroMQ.
In the existing implementation, only a single IPC handle is supported, and the communication pattern uses blocking operations (
zmq.REQ/zmq.REP).This PR enhances that capability by introducing:
zmq.POLLandzmq.ROUTER, improving concurrency and throughput for RLHF or multi-agent workloads.These upgrades improve performance and flexibility in colocated reinforcement learning pipelines and low-latency model update scenarios.
Implementation Overview
ROUTER/POLLIN)Test Plan
Run colocated RLHF example
cd examples/offline_inference python3 rlhf_colocate.py --num-ipc-handles {num_ipc_handles}Default value of num_ipc_handles is 1 ipc handles which still use the old code
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.cc: @youkaichao