Hi! I am trying to convert a `Transformer2DModel` to ONNX and have run into problems I cannot solve.
The model I am exporting (used as the UNet) has the following architecture:
```
Transformer2DModel(
  (pos_embed): PatchEmbed(
    (proj): Conv2d(96, 1584, kernel_size=(2, 2), stride=(2, 2))
  )
  (transformer_blocks): ModuleList(
    (0-23): 24 x BasicTransformerBlock(
      (norm1): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
      (attn1): Attention(
        (to_q): Linear(in_features=1584, out_features=1584, bias=True)
        (to_k): Linear(in_features=1584, out_features=1584, bias=True)
        (to_v): Linear(in_features=1584, out_features=1584, bias=True)
        (to_out): ModuleList(
          (0): Linear(in_features=1584, out_features=1584, bias=True)
          (1): Dropout(p=0.0, inplace=False)
        )
      )
      (norm2): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
      (ff): FeedForward(
        (net): ModuleList(
          (0): GELU(
            (proj): Linear(in_features=1584, out_features=6336, bias=True)
          )
          (1): Dropout(p=0.0, inplace=False)
          (2): Linear(in_features=6336, out_features=1584, bias=True)
        )
      )
    )
  )
  (norm_out): LayerNorm((1584,), eps=1e-06, elementwise_affine=False)
  (proj_out): Linear(in_features=1584, out_features=128, bias=True)
  (adaln_single): AdaLayerNormSingleFlow(
    (emb): PixArtAlphaCombinedFlowEmbeddings(
      (timestep_embedder): TimestepEmbedding(
        (linear_1): Linear(in_features=512, out_features=1584, bias=True)
        (act): SiLU()
        (linear_2): Linear(in_features=1584, out_features=1584, bias=True)
      )
    )
    (silu): SiLU()
    (linear): Linear(in_features=1584, out_features=9504, bias=True)
  )
)
```
As inputs to the model I use tensors with the following shapes:

- `hidden_states` -> (B, 96, Height, Width)
- `timestep` -> (B,)
- `resolution` -> (B, 2)
- `aspect_ratio` -> (B, 1)
In the Python API, `resolution` and `aspect_ratio` are passed inside the `added_cond_kwargs` dict, but since ONNX export doesn't support dict inputs, I wrote a wrapper that unpacks them:
```python
import torch
import torch.nn as nn


class Transformer2DWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(
        self,
        hidden_states: torch.Tensor,
        timestep: torch.Tensor,
        resolution: torch.Tensor,
        aspect_ratio: torch.Tensor,
    ):
        timestep = timestep.float()
        added_cond_kwargs = {
            "resolution": resolution,
            "aspect_ratio": aspect_ratio,
        }
        out = self.model(
            hidden_states=hidden_states,
            timestep=timestep,
            added_cond_kwargs=added_cond_kwargs,
            return_dict=False,
        )
        return out[0]  # sample tensor
```
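For reference, `dummy_inputs` below is constructed to match the shapes listed above, roughly like this (the concrete batch size, height/width, and timestep values are placeholders I chose for tracing):

```python
batch, height, width = 2, 64, 64

dummy_inputs = (
    torch.randn(batch, 96, height, width),   # hidden_states: (B, 96, H, W)
    torch.randint(0, 1000, (batch,)),        # timestep: (B,), cast to float in the wrapper
    torch.tensor([[512.0, 512.0]] * batch),  # resolution: (B, 2)
    torch.tensor([[1.0]] * batch),           # aspect_ratio: (B, 1)
)
```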
I export to ONNX with `torch.onnx.export`:
```python
wrapper = Transformer2DWrapper(unet_model)

torch.onnx.export(
    wrapper,
    dummy_inputs,
    "unet_converted/model.onnx",
    input_names=["hidden_states", "timestep", "resolution", "aspect_ratio"],
    output_names=["out_sample"],
    dynamic_axes={
        "hidden_states": {0: "batch", 2: "height", 3: "width"},
        "timestep": {0: "batch"},
        "resolution": {0: "batch"},
        "aspect_ratio": {0: "batch"},
        "out_sample": {0: "batch", 2: "height", 3: "width"},
    },
    opset_version=17,
)
```
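The export itself completes. I then load the model with onnxruntime, roughly like this (the provider choice is just an example, not the point of the issue):

```python
import onnxruntime as ort

# Creating the session is where the error below is raised.
session = ort.InferenceSession(
    "unet_converted/model.onnx",
    providers=["CPUExecutionProvider"],
)
```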
But loading the exported model fails with a shape-inference error on a `Squeeze` operation:

```
Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from unet_converted/model.onnx failed:Node (/model/transformer_blocks.0/If) Op (If) [TypeInferenceError] Graph attribute inferencing failed: Node (/model/transformer_blocks.0/Squeeze) Op (Squeeze) [ShapeInferenceError] Dimension of input 1 must be 1 instead of 256
```
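In case it helps debugging, the graph can also be checked offline with the `onnx` package's checker and strict shape inference (a sketch; whether this flags the same node may depend on the onnx version):

```python
import onnx
from onnx import shape_inference

# Load the exported graph and run ONNX's own validation offline.
model = onnx.load("unet_converted/model.onnx")
onnx.checker.check_model(model)

# Strict shape inference may surface the same Squeeze/If mismatch
# that onnxruntime reports at session creation time.
inferred = shape_inference.infer_shapes(model, check_type=True, strict_mode=True)
```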
Can you please help with the conversion?
Versions:

```
diffusers==0.27.2
torch==2.2.0+cu118
```