Description
🚀 The feature, motivation and pitch
As someone trying to understand the components of latency in vLLM, I would like the logging from vLLM to distinguish between (a) the time to download a model from HuggingFace (or wherever) into the local filesystem and (b) the time to load the model weights from the local filesystem. These are two steps done in series, right?
Below is an example of the logging I got from release 0.7.2 in V1 mode; I do not see how to distinguish these two components of latency from it.
```
INFO 02-07 19:23:56 gpu_model_runner.py:867] Starting to load model ibm-granite/granite-3.0-3b-a800m-instruct...
INFO 02-07 19:23:56 cuda.py:158] Using Flash Attention backend on V1 engine.
WARNING 02-07 19:23:56 topk_topp_sampler.py:46] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 02-07 19:23:57 weight_utils.py:252] Using model weights format ['*.safetensors']
model-00002-of-00002.safetensors: 100%|█████████████████████████████████| 1.75G/1.75G [00:56<00:00, 30.8MB/s]
model-00001-of-00002.safetensors: 100%|█████████████████████████████████| 5.00G/5.00G [02:47<00:00, 29.8MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████| 25.6k/25.6k [00:00<00:00, 1.87MB/s]
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 119.74it/s]
INFO 02-07 19:26:48 gpu_model_runner.py:872] Loading model weights took 6.1506 GB
```
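In the meantime, a rough way to separate the two phases myself is to pre-fetch the checkpoint with huggingface_hub and time each step. A minimal sketch, assuming huggingface_hub and vllm are installed; note that timing `LLM()` covers the whole engine start, not only the weight load, so the measurement of (b) here is only an upper bound:

```python
import time

from huggingface_hub import snapshot_download
from vllm import LLM

model_id = "ibm-granite/granite-3.0-3b-a800m-instruct"

# (a) Download from the Hub into the local cache (a no-op if already cached).
t0 = time.perf_counter()
local_path = snapshot_download(repo_id=model_id)
print(f"download: {time.perf_counter() - t0:.1f}s -> {local_path}")

# (b) Start the engine from the local copy. This times engine startup as a
# whole, which includes but is not limited to loading the weights.
t1 = time.perf_counter()
llm = LLM(model=local_path)
print(f"load (upper bound): {time.perf_counter() - t1:.1f}s")
```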
Alternatives
I suppose that I could get nearly the same thing by running vLLM twice: once on a cold cache, which downloads the model and then loads the weights, and once on a warm cache, which only loads the weights; the difference between the two runs approximates the download time (see the sketch below). But that seems like a lot more trouble than just getting a useful log line.
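For completeness, a minimal sketch of that two-run workaround; run the same script twice, assuming the Hugging Face cache is cold on the first run and warm on the second, and that the rest of the engine startup cost is identical across runs:

```python
import time

from vllm import LLM

model_id = "ibm-granite/granite-3.0-3b-a800m-instruct"

# First run (cold cache): download + load. Second run (warm cache): load only.
# The difference between the two wall-clock times approximates the download.
start = time.perf_counter()
llm = LLM(model=model_id)
print(f"engine start: {time.perf_counter() - start:.1f}s")
```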
Additional context
No response