
[Feature]: Consolidate performance benchmark datasets #13351

@ywang96

Description

🚀 The feature, motivation and pitch

In vLLM we have two main benchmark scripts (benchmark_throughput.py and benchmark_serving.py) for measuring its performance.

However, the dataset sampling functions are currently defined within each script itself. Since we want the flexibility to run benchmarks on different datasets, this duplication will make the scripts hard to maintain over time: every new dataset has to be added to both scripts separately.

Alternatives

Ideally, dataset sampling should be defined in a separate file (e.g., benchmark_dataset.py) containing the sampling functions for the different datasets (ShareGPT, sonnet, random, vision arena, etc.), so that the benchmark scripts themselves can simply import from benchmark_dataset depending on which dataset is specified on the command line.
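
A minimal sketch of what such a module could look like is below. The names (SampleRequest, register_dataset, get_dataset_sampler) and the whitespace-based length counting are assumptions for illustration, not the actual vLLM implementation:

```python
# benchmark_dataset.py -- a minimal sketch, not the actual vLLM module.
# SampleRequest, register_dataset, and get_dataset_sampler are assumed names;
# real token counting (via a tokenizer) is elided in favor of a whitespace
# approximation to keep the example short.
import json
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class SampleRequest:
    """One benchmark request in the shared, pre-defined format."""
    prompt: str
    prompt_len: int
    expected_output_len: int


# Registry mapping a --dataset-name value to its sampling function.
_SAMPLERS: dict[str, Callable[..., list[SampleRequest]]] = {}


def register_dataset(name: str):
    """Decorator that registers a sampling function under a dataset name."""
    def wrap(fn: Callable[..., list[SampleRequest]]):
        _SAMPLERS[name] = fn
        return fn
    return wrap


def get_dataset_sampler(name: str) -> Callable[..., list[SampleRequest]]:
    """Look up the sampler for a dataset, with a helpful error message."""
    try:
        return _SAMPLERS[name]
    except KeyError:
        raise ValueError(
            f"Unknown dataset {name!r}; available: {sorted(_SAMPLERS)}"
        ) from None


@register_dataset("random")
def sample_random(num_requests: int, input_len: int, output_len: int,
                  seed: int = 0, **kwargs) -> list[SampleRequest]:
    """Synthetic prompts of a fixed length (kwargs tolerated but unused)."""
    rng = random.Random(seed)
    return [
        SampleRequest(
            prompt=" ".join(str(rng.randint(0, 9)) for _ in range(input_len)),
            prompt_len=input_len,
            expected_output_len=output_len,
        )
        for _ in range(num_requests)
    ]


@register_dataset("sharegpt")
def sample_sharegpt(num_requests: int, dataset_path: str,
                    seed: int = 0, **kwargs) -> list[SampleRequest]:
    """Sample prompt/completion pairs from a ShareGPT-style JSON file."""
    with open(dataset_path) as f:
        records = [r for r in json.load(f) if len(r["conversations"]) >= 2]
    rng = random.Random(seed)
    return [
        SampleRequest(
            prompt=rec["conversations"][0]["value"],
            prompt_len=len(rec["conversations"][0]["value"].split()),
            expected_output_len=len(rec["conversations"][1]["value"].split()),
        )
        for rec in rng.sample(records, num_requests)
    ]
```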

This modularization brings us a number of benefits:

  • Ensures dataset sampling stays aligned between the two benchmarks, so that online serving and offline inference performance can be compared on identical inputs.
  • Makes it easier to add new types of benchmark datasets.
  • Opens up the opportunity to support user-defined custom datasets, as long as they conform to a format we pre-define (see the usage sketch after this list).
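
For illustration, under the same assumed names as the sketch above, both benchmark scripts would resolve their sampler from the one registry, and a custom dataset would only need to register a function returning the shared format:

```python
# In benchmark_throughput.py / benchmark_serving.py (sketch): both scripts
# resolve the sampler from the same registry, so sampling stays aligned.
from benchmark_dataset import SampleRequest, get_dataset_sampler, register_dataset


# A user-defined dataset plugs in by conforming to the shared format.
@register_dataset("my-custom")
def sample_my_custom(num_requests: int, **kwargs) -> list[SampleRequest]:
    return [
        SampleRequest(prompt="Hello!", prompt_len=1, expected_output_len=16)
        for _ in range(num_requests)
    ]


sampler = get_dataset_sampler("my-custom")  # e.g. taken from --dataset-name
requests = sampler(num_requests=8)
```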

Additional context

No response

