
[Feature]: Consolidate performance benchmark datasets #13351

@ywang96

Description

🚀 The feature, motivation and pitch

In vLLM we have two main benchmark scripts (benchmark_throughput.py and benchmark_serving.py) for measuring its performance.

However, the dataset sampling functions are currently defined within each script itself. Since we want the flexibility to run benchmarks on different datasets, this duplication will make the scripts hard to maintain over time: every new dataset has to be added to both scripts separately.

Alternatives

Ideally, dataset sampling should be defined in a separate file (e.g., benchmark_dataset.py) containing the sampling functions for the different datasets (ShareGPT, sonnet, random, vision arena, etc.), so that the benchmark scripts themselves can simply import from benchmark_dataset depending on which dataset is specified on the command line.
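
A minimal sketch of what such a module could look like is below. The names (SampleRequest, register_dataset, get_dataset_sampler) and the whitespace-based length counting are assumptions for illustration, not the actual vLLM implementation:

```python
# benchmark_dataset.py -- a minimal sketch, not the actual vLLM module.
# SampleRequest, register_dataset, and get_dataset_sampler are assumed names;
# real token counting (via a tokenizer) is elided in favor of a whitespace
# approximation to keep the example short.
import json
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class SampleRequest:
    """One benchmark request in the shared, pre-defined format."""
    prompt: str
    prompt_len: int
    expected_output_len: int


# Registry mapping a --dataset-name value to its sampling function.
_SAMPLERS: dict[str, Callable[..., list[SampleRequest]]] = {}


def register_dataset(name: str):
    """Decorator that registers a sampling function under a dataset name."""
    def wrap(fn: Callable[..., list[SampleRequest]]):
        _SAMPLERS[name] = fn
        return fn
    return wrap


def get_dataset_sampler(name: str) -> Callable[..., list[SampleRequest]]:
    """Look up the sampler for a dataset, with a helpful error message."""
    try:
        return _SAMPLERS[name]
    except KeyError:
        raise ValueError(
            f"Unknown dataset {name!r}; available: {sorted(_SAMPLERS)}"
        ) from None


@register_dataset("random")
def sample_random(num_requests: int, input_len: int, output_len: int,
                  seed: int = 0, **kwargs) -> list[SampleRequest]:
    """Synthetic prompts of a fixed length (kwargs tolerated but unused)."""
    rng = random.Random(seed)
    return [
        SampleRequest(
            prompt=" ".join(str(rng.randint(0, 9)) for _ in range(input_len)),
            prompt_len=input_len,
            expected_output_len=output_len,
        )
        for _ in range(num_requests)
    ]


@register_dataset("sharegpt")
def sample_sharegpt(num_requests: int, dataset_path: str,
                    seed: int = 0, **kwargs) -> list[SampleRequest]:
    """Sample prompt/completion pairs from a ShareGPT-style JSON file."""
    with open(dataset_path) as f:
        records = [r for r in json.load(f) if len(r["conversations"]) >= 2]
    rng = random.Random(seed)
    return [
        SampleRequest(
            prompt=rec["conversations"][0]["value"],
            prompt_len=len(rec["conversations"][0]["value"].split()),
            expected_output_len=len(rec["conversations"][1]["value"].split()),
        )
        for rec in rng.sample(records, num_requests)
    ]
```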

This modularization brings us a number of benefits:

  • Ensures dataset sampling stays aligned between the two benchmarks, so that online serving and offline inference performance can be compared on identical inputs.
  • Makes it easier to add new types of benchmark datasets.
  • Opens up the opportunity to support user-defined custom datasets, as long as they conform to a format we pre-define (see the usage sketch after this list).
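
For illustration, under the same assumed names as the sketch above, both benchmark scripts would resolve their sampler from the one registry, and a custom dataset would only need to register a function returning the shared format:

```python
# In benchmark_throughput.py / benchmark_serving.py (sketch): both scripts
# resolve the sampler from the same registry, so sampling stays aligned.
from benchmark_dataset import SampleRequest, get_dataset_sampler, register_dataset


# A user-defined dataset plugs in by conforming to the shared format.
@register_dataset("my-custom")
def sample_my_custom(num_requests: int, **kwargs) -> list[SampleRequest]:
    return [
        SampleRequest(prompt="Hello!", prompt_len=1, expected_output_len=16)
        for _ in range(num_requests)
    ]


sampler = get_dataset_sampler("my-custom")  # e.g. taken from --dataset-name
requests = sampler(num_requests=8)
```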

Additional context

No response

