Conversation

@knlnguyen1802 (Contributor) commented Nov 13, 2025

Purpose

This pull request extends the functionality introduced in #24295, which added support for colocated fast weight updates using IPC handles and ZeroMQ.

In the existing implementation, only a single IPC handle is supported, and the communication pattern uses blocking operations (zmq.REQ / zmq.REP).
This PR enhances that capability by introducing:

  • Support for multiple IPC handles, enabling parallel and scalable weight updates.
  • Non-blocking communication via zmq.Poller and zmq.ROUTER, improving concurrency and throughput for RLHF and multi-agent workloads.

These upgrades improve performance and flexibility in colocated reinforcement learning pipelines and low-latency model update scenarios.
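
A minimal sketch of how a flat weight buffer might be split across multiple CUDA IPC handles; the helper name and chunking scheme are illustrative assumptions, not the exact logic in this PR:

    # Sketch only: split one flat CUDA buffer into chunks and export each
    # chunk as a picklable IPC handle, so a colocated process can map and
    # copy the chunks in parallel instead of serializing on one handle.
    import torch
    from torch.multiprocessing.reductions import reduce_tensor

    def make_ipc_handles(flat_weights: torch.Tensor, num_ipc_handles: int) -> list:
        # Copy each chunk into its own contiguous CUDA allocation so every
        # handle maps an independent region.
        chunks = [
            c.contiguous().clone()
            for c in torch.chunk(flat_weights, num_ipc_handles)
        ]
        # reduce_tensor returns (rebuild_fn, args); the receiving process
        # calls rebuild_fn(*args) to map the same CUDA memory.
        return [reduce_tensor(c) for c in chunks]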


Implementation Overview

  • Switched to a non-blocking ZeroMQ pattern (ROUTER with POLLIN polling); a sketch follows this list.
  • Added a configuration option that lets users specify the number of IPC handles.
  • Updated the weight-synchronization logic for multi-handle parallelism.
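
A minimal sketch of the non-blocking receive loop; the endpoint, poll timeout, message framing, and handler name are illustrative assumptions, not the exact protocol in this PR:

    import zmq

    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind("ipc:///tmp/weight_update.sock")  # hypothetical endpoint

    poller = zmq.Poller()
    poller.register(router, zmq.POLLIN)

    def handle_payload(payload: bytes) -> None:  # hypothetical handler
        ...

    while True:
        # Poll with a timeout instead of blocking on recv(), so the loop
        # can interleave other work while waiting for update messages.
        events = dict(poller.poll(timeout=100))  # milliseconds
        if events.get(router) == zmq.POLLIN:
            # ROUTER prepends the sender's identity frame, so each DEALER
            # peer can be acknowledged individually (unlike REQ/REP).
            identity, payload = router.recv_multipart()
            handle_payload(payload)
            router.send_multipart([identity, b"ack"])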

Test Plan

  1. Run colocated RLHF example

    cd examples/offline_inference
    python3 rlhf_colocate.py  --num-ipc-handles {num_ipc_handles}
    

The default value of --num-ipc-handles is 1, which keeps the original single-handle code path.
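
For example, running python3 rlhf_colocate.py --num-ipc-handles 4 exercises the new multi-handle path; this is the setting used in the timing results below.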


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

cc: @youkaichao

@mergify (bot) commented Nov 13, 2025

Documentation preview: https://vllm--28607.org.readthedocs.build/en/28607/

mergify bot added the documentation (Improvements or additions to documentation) label on Nov 13, 2025.
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@gemini-code-assist (bot) left a comment

Code Review

This pull request extends functionality for fast weight updates by adding support for multiple IPC handles and non-blocking communication with ZeroMQ. The changes look good overall, introducing an asynchronous update path alongside the existing synchronous one. However, I've found two critical issues that should be addressed. First, the example script rlhf_colocate.py now contains a hardcoded local model path, which makes it non-portable and difficult for others to run. Second, rlhf_utils.py incorrectly imports Dict and Tuple from the ast module for type hinting, which is a bug. These issues should be fixed before merging.

@knlnguyen1802 (Contributor Author) commented:

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for asynchronous weight updates using multiple IPC handles and ZeroMQ, which is a great enhancement for RLHF and multi-agent workloads. The implementation correctly uses non-blocking ZMQ patterns (DEALER/ROUTER with POLLER) for improved concurrency. The code is well-structured, with the logic split between the training actor (sender) and the worker extension (receiver). I've found one issue regarding an incorrect import that should be addressed.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import gc
from ast import Dict, Tuple

Severity: high

The import from ast import Dict, Tuple is incorrect. The ast module is for working with Abstract Syntax Trees, not for type hinting. You should import Dict and Tuple from the typing module for type annotations.

Suggested change:

    - from ast import Dict, Tuple
    + from typing import Dict, Tuple
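
Note: on Python 3.9+, the built-in generics can also be used directly in annotations, with no typing import at all; the signature below is a hypothetical illustration, not code from this PR:

    def gather_handles(handles: dict[str, tuple[int, int]]) -> None:
        # dict[...] and tuple[...] are valid annotations on Python 3.9+.
        ...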

@heheda12345 requested a review from @22quinn on November 14, 2025.
@knlnguyen1802 (Contributor Author) commented:
@youkaichao @22quinn Please help review. Thanks.

@knlnguyen1802 (Contributor Author) commented Nov 19, 2025

Results when running facebook/opt-125M:

  • Original code: weights updated in 2.09 seconds
  • Original code with 4x the buffer size: weights updated in 1.59 seconds
  • With --num-ipc-handles=4: weights updated in 0.39 seconds

Signed-off-by: knlnguyen1802 <[email protected]>

Labels

documentation Improvements or additions to documentation
