[RFC]: Replaceable Scheduler #7123

@NadavShmayo

Motivation.

The default scheduler works well for the basic use case of serving with maximum throughput.
However, some use cases prioritize other metrics over maximum throughput, for example maintaining fairness between different users.

Specifically, I have an application that uses vLLM and tries to maintain fairness between the requests of its different users.
By making the scheduler component more abstract and replaceable (perhaps also pluggable), we can support such use cases without having to change the core scheduler logic for each of them.

Proposed Change.

I propose two different solutions. The first may be hard to implement, but allows anyone to implement any scheduling logic they wish without changing any other core logic. The second is simple to implement but doesn't allow full control of the scheduler logic.

Solution 1 - Scheduler plugins

This solution requires defining an abstract base class for schedulers and allowing the file path of the desired scheduler implementation to be passed as a CLI argument (or an environment variable).
This idea could also serve as the basis for scheduler plugins: anyone could implement their own scheduler as a package separate from core vLLM, which allows for great extensibility and modularity.
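
To make the idea concrete, here is a minimal sketch of what Solution 1 could look like. Everything in it is an assumption for illustration: `BaseScheduler`, its two methods, `FifoScheduler`, and `load_scheduler_class` are hypothetical names, not vLLM's actual classes or API, and the real interface would need to cover much more (KV-cache budget, preemption, and so on).

```python
from __future__ import annotations

import importlib.util
from abc import ABC, abstractmethod
from collections import deque
from pathlib import Path


class BaseScheduler(ABC):
    """Hypothetical scheduler interface (illustration only, not vLLM's API)."""

    @abstractmethod
    def add_request(self, request_id: str, user_id: str) -> None:
        """Queue a new request."""

    @abstractmethod
    def schedule(self, max_batch_size: int) -> list[str]:
        """Return the IDs of the requests to run in the next step."""


class FifoScheduler(BaseScheduler):
    """Baseline policy: serve requests strictly in arrival order."""

    def __init__(self) -> None:
        self._queue = deque()

    def add_request(self, request_id: str, user_id: str) -> None:
        self._queue.append(request_id)

    def schedule(self, max_batch_size: int) -> list[str]:
        count = min(max_batch_size, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]


def load_scheduler_class(path: str, class_name: str) -> type[BaseScheduler]:
    """Import the file at `path` and return its scheduler class.

    `path` stands in for the value of the proposed CLI argument or
    environment variable; both parameters are assumptions of this sketch.
    """
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a scheduler module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin file

    cls = getattr(module, class_name, None)
    # Reject anything that does not implement the abstract interface.
    if not (isinstance(cls, type) and issubclass(cls, BaseScheduler)):
        raise TypeError(f"{class_name!r} in {path!r} is not a BaseScheduler subclass")
    return cls
```

A plugin file would then import the base class, subclass it, and be selected at startup with something like `load_scheduler_class("/path/to/my_scheduler.py", "MyScheduler")()`. Validating against the abstract base class at load time means a misconfigured path fails early rather than in the middle of serving.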

Solution 2 - Support voluntary preemption hooks

This solution is less flexible but should still support most scheduling logic.
It requires the Scheduler class to expose public methods to preempt/suspend and resume a SequenceGroup; the API can then add routes that expose these methods.
This way, applications wrapping vLLM can implement their own complex scheduling logic, for example to give each user its fair share of scheduling, or any other desired scheduling logic.
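
As a rough illustration of Solution 2, the sketch below uses a toy stand-in for the scheduler and an external fairness policy that only touches the public hooks. `HookableScheduler`, its methods, and `enforce_per_user_cap` are hypothetical names chosen for this example; requests are plain string IDs rather than real SequenceGroup objects, and the HTTP routes are left out.

```python
from collections import Counter


class HookableScheduler:
    """Toy scheduler with public suspend/resume hooks (hypothetical API)."""

    def __init__(self):
        self._running = {}    # request_id -> user_id, in arrival order
        self._suspended = {}  # request_id -> user_id

    def add_request(self, request_id, user_id):
        self._running[request_id] = user_id

    def finish(self, request_id):
        self._running.pop(request_id, None)

    def suspend(self, request_id):
        """Voluntarily preempt a running request. Returns False if unknown."""
        user_id = self._running.pop(request_id, None)
        if user_id is None:
            return False
        self._suspended[request_id] = user_id
        return True

    def resume(self, request_id):
        """Make a suspended request schedulable again. Returns False if unknown."""
        user_id = self._suspended.pop(request_id, None)
        if user_id is None:
            return False
        self._running[request_id] = user_id
        return True

    def running(self):
        return dict(self._running)  # copy, safe to iterate while mutating

    def suspended(self):
        return dict(self._suspended)

    def schedule(self):
        """Return the request IDs eligible for the next step."""
        return list(self._running)


def enforce_per_user_cap(scheduler, cap):
    """External policy: at most `cap` running requests per user."""
    running_per_user = Counter()
    # Pass 1: suspend anything beyond the cap, keeping earlier arrivals.
    for request_id, user_id in scheduler.running().items():
        if running_per_user[user_id] >= cap:
            scheduler.suspend(request_id)
        else:
            running_per_user[user_id] += 1
    # Pass 2: resume suspended requests for users with spare capacity.
    for request_id, user_id in scheduler.suspended().items():
        if running_per_user[user_id] < cap:
            scheduler.resume(request_id)
            running_per_user[user_id] += 1
```

The wrapping application would call `enforce_per_user_cap` whenever a request arrives or finishes. The policy lives entirely outside the scheduler, which is the point of this solution: the core only needs to expose `suspend` and `resume`, plus enough read access to see what is running.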

Feedback Period.

No response

CC List.

No response

Any Other Things.

Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead.

Labels: RFC, stale (over 90 days of inactivity)