
[RFC]: Deprecation and removal for --engine-use-ray #7045

@simon-mo

Motivation.

In the async_engine code path, we have an option to launch the engine in a separate process using Ray:

        parser.add_argument('--engine-use-ray',
                            action='store_true',
                            help='Use Ray to start the LLM engine in a '
                            'separate process as the server process.')

Originally, the option made it possible to separate the server's Python overhead from the engine's main scheduler loop.

However, a few factors have made this option rarely used:

  • Ray is an optional dependency and is typically not used in single-node environments.
  • The serialization and RPC overhead typically offsets the theoretical performance gain.
  • There are other ways to isolate the server from the engine (multiprocessing, threading, etc.).
  • We are now separating the server and the engine with lower-overhead approaches: [ Frontend ] Multiprocessing for OpenAI Server with zeromq #6883

Proposed Change.

  • Deprecate the flag, emitting a warning, for one release.
  • Remove the flag afterward, provided there is no major pushback.
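A minimal sketch of the one-release deprecation step, assuming the flag stays on an argparse parser like the one in the Motivation snippet: a custom argparse.Action that behaves like action='store_true' but also emits a warning whenever --engine-use-ray is passed. DeprecatedStoreTrue and build_parser are hypothetical names for illustration; this is not vLLM's actual implementation.

```python
import argparse
import warnings


class DeprecatedStoreTrue(argparse.Action):
    """Hypothetical action: acts like action='store_true' but warns on use."""

    def __init__(self, option_strings, dest, default=False, required=False,
                 help=None):
        # nargs=0 means the flag consumes no value, just like store_true.
        super().__init__(option_strings=option_strings, dest=dest, nargs=0,
                         const=True, default=default, required=required,
                         help=help)

    def __call__(self, parser, namespace, values, option_string=None):
        # Only runs when the flag actually appears on the command line.
        warnings.warn(
            f"{option_string} is deprecated and will be removed in a "
            "future release.",
            FutureWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, self.const)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper; a real CLI registers many more arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('--engine-use-ray',
                        action=DeprecatedStoreTrue,
                        help='(DEPRECATED) Use Ray to start the LLM engine in a '
                        'separate process as the server process.')
    return parser


# Example: record warnings so the example can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = build_parser().parse_args(['--engine-use-ray'])
print(args.engine_use_ray, caught[0].category.__name__)  # True FutureWarning
```

Passing the flag still sets engine_use_ray to True, so existing launch scripts keep working during the deprecation window; omitting it leaves the default False and emits no warning. The sketch uses FutureWarning rather than DeprecationWarning because Python's default warning filters hide DeprecationWarning unless it is triggered from __main__, while FutureWarning is intended for end users and is shown by default. Once the flag is removed, argparse rejects --engine-use-ray as an unrecognized argument.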

Feedback Period.

1 week

CC List.

No response

Any Other Things.

No response
