Conversation

@knlnguyen1802 (Contributor) commented Nov 13, 2025

Purpose

This pull request extends the functionality introduced in #24295, which added support for colocated fast weight updates using IPC handles and ZeroMQ.

In the existing implementation, only a single IPC handle is supported, and the communication pattern uses blocking operations (zmq.REQ / zmq.REP).
This PR enhances that capability by introducing:

  • Support for multiple IPC handles, enabling parallel and scalable weight updates.
  • Non-blocking communication via zmq.Poller and zmq.ROUTER, improving concurrency and throughput for RLHF and multi-agent workloads.

These upgrades improve performance and flexibility in colocated reinforcement learning pipelines and low-latency model update scenarios.
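
A minimal sketch of how a flat weight buffer might be split across multiple CUDA IPC handles; the helper name and chunking scheme are illustrative assumptions, not the exact logic in this PR:

    # Sketch only: split one flat CUDA buffer into chunks and export each
    # chunk as a picklable IPC handle, so a colocated process can map and
    # copy the chunks in parallel instead of serializing on one handle.
    import torch
    from torch.multiprocessing.reductions import reduce_tensor

    def make_ipc_handles(flat_weights: torch.Tensor, num_ipc_handles: int) -> list:
        # Copy each chunk into its own contiguous CUDA allocation so every
        # handle maps an independent region.
        chunks = [
            c.contiguous().clone()
            for c in torch.chunk(flat_weights, num_ipc_handles)
        ]
        # reduce_tensor returns (rebuild_fn, args); the receiving process
        # calls rebuild_fn(*args) to map the same CUDA memory.
        return [reduce_tensor(c) for c in chunks]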


Implementation Overview

  • Switched to a non-blocking ZeroMQ pattern (ROUTER with POLLIN polling); a sketch follows this list.
  • Added a configuration option that lets users specify the number of IPC handles.
  • Updated the weight-synchronization logic for multi-handle parallelism.
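
A minimal sketch of the non-blocking receive loop; the endpoint, poll timeout, message framing, and handler name are illustrative assumptions, not the exact protocol in this PR:

    import zmq

    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind("ipc:///tmp/weight_update.sock")  # hypothetical endpoint

    poller = zmq.Poller()
    poller.register(router, zmq.POLLIN)

    def handle_payload(payload: bytes) -> None:  # hypothetical handler
        ...

    while True:
        # Poll with a timeout instead of blocking on recv(), so the loop
        # can interleave other work while waiting for update messages.
        events = dict(poller.poll(timeout=100))  # milliseconds
        if events.get(router) == zmq.POLLIN:
            # ROUTER prepends the sender's identity frame, so each DEALER
            # peer can be acknowledged individually (unlike REQ/REP).
            identity, payload = router.recv_multipart()
            handle_payload(payload)
            router.send_multipart([identity, b"ack"])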

Test Plan

  1. Run colocated RLHF example

    cd examples/offline_inference
    python3 rlhf_colocate.py  --num-ipc-handles {num_ipc_handles}
    

The default value of --num-ipc-handles is 1, which keeps the original single-handle code path.
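
For example, running python3 rlhf_colocate.py --num-ipc-handles 4 exercises the new multi-handle path; this is the setting used in the timing results below.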


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

cc: @youkaichao

@mergify (bot) commented Nov 13, 2025

Documentation preview: https://vllm--28607.org.readthedocs.build/en/28607/

mergify bot added the documentation (Improvements or additions to documentation) label on Nov 13, 2025.
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@gemini-code-assist (bot) left a comment

Code Review

This pull request extends functionality for fast weight updates by adding support for multiple IPC handles and non-blocking communication with ZeroMQ. The changes look good overall, introducing an asynchronous update path alongside the existing synchronous one. However, I've found two critical issues that should be addressed. First, the example script rlhf_colocate.py now contains a hardcoded local model path, which makes it non-portable and difficult for others to run. Second, rlhf_utils.py incorrectly imports Dict and Tuple from the ast module for type hinting, which is a bug. These issues should be fixed before merging.

@knlnguyen1802 (Contributor Author) commented:

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for asynchronous weight updates using multiple IPC handles and ZeroMQ, which is a great enhancement for RLHF and multi-agent workloads. The implementation correctly uses non-blocking ZMQ patterns (DEALER/ROUTER with POLLER) for improved concurrency. The code is well-structured, with the logic split between the training actor (sender) and the worker extension (receiver). I've found one issue regarding an incorrect import that should be addressed.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import gc
from ast import Dict, Tuple

Severity: high

The import from ast import Dict, Tuple is incorrect. The ast module is for working with Abstract Syntax Trees, not for type hinting. You should import Dict and Tuple from the typing module for type annotations.

Suggested change:

    - from ast import Dict, Tuple
    + from typing import Dict, Tuple
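
Note: on Python 3.9+, the built-in generics can also be used directly in annotations, with no typing import at all; the signature below is a hypothetical illustration, not code from this PR:

    def gather_handles(handles: dict[str, tuple[int, int]]) -> None:
        # dict[...] and tuple[...] are valid annotations on Python 3.9+.
        ...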

@heheda12345 requested a review from @22quinn on November 14, 2025.
@knlnguyen1802 (Contributor Author) commented:
@youkaichao @22quinn Please help review. Thanks.

@knlnguyen1802 (Contributor Author) commented Nov 19, 2025

Results when running facebook/opt-125M:

  • Original code: weights updated in 2.09 seconds
  • Original code with 4x the buffer size: weights updated in 1.59 seconds
  • With --num-ipc-handles=4: weights updated in 0.39 seconds

Signed-off-by: knlnguyen1802 <[email protected]>

Labels

documentation Improvements or additions to documentation
