🚀 The feature, motivation and pitch
PR #3476 added support for loading models with Tensorizer, but it does not support loading a sharded vllm-serialized model onto multiple GPUs (see this verification check). Sharded models would also benefit from the faster loading and encryption that Tensorizer provides.
This open issue on Tensorizer suggests a couple of approaches to supporting sharding. With tensor-parallel models, the model is split across the GPUs, and the suggestion is to serialize each shard separately.
I have prototyped this approach of splitting the vllm-tensorized model into per-rank shards and am working on a PR; a sketch of the idea follows.
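
For concreteness, here is a minimal sketch of per-shard serialization. It assumes each tensor-parallel rank already holds its own `nn.Module` shard and that `torch.distributed` is initialized; the rank-templated file name is illustrative, not a fixed convention, and this is not the exact code in the PR.

```python
import torch.distributed as dist
from tensorizer import TensorSerializer


def serialize_shard(model, path_template: str = "model-rank-{rank:03d}.tensors"):
    # Each tensor-parallel rank writes only the tensors it holds,
    # so deserialization can later stream each file straight to its GPU.
    rank = dist.get_rank()
    path = path_template.format(rank=rank)  # illustrative naming scheme
    serializer = TensorSerializer(path)
    serializer.write_module(model)  # serializes this rank's shard only
    serializer.close()
```

One file per rank keeps serialization coupled to the tensor-parallel size used at save time, which is the trade-off the alternative below tries to avoid.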
Alternatives
The alternative given in the Tensorizer issue is to deserialize tensors to CPU memory and then send them to the GPUs. This would decouple the serialization of the model from the sharding configuration, but would also be less efficient; a sketch of this approach is below.
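
A minimal sketch of this alternative, assuming a single unsharded Tensorizer file and an initialized `torch.distributed` process group. `shard_for_rank` is a hypothetical helper that would slice each full tensor according to the tensor-parallel layout; it is not part of Tensorizer or vLLM.

```python
import torch
import torch.distributed as dist
from tensorizer import TensorDeserializer


def load_via_cpu(path: str):
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank}")
    # lazy_load keeps tensors unmaterialized until accessed, limiting
    # peak CPU memory while iterating over the full model.
    deserializer = TensorDeserializer(path, device="cpu", lazy_load=True)
    state_dict = {
        # shard_for_rank is hypothetical: it would extract this rank's
        # slice of the full tensor before the copy to GPU memory.
        name: shard_for_rank(tensor, name, rank).to(device)
        for name, tensor in deserializer.items()
    }
    deserializer.close()
    return state_dict
```

The extra CPU round trip and per-tensor slicing is what makes this path slower than streaming pre-sharded files directly to each GPU.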
Additional context
No response