- 
                Notifications
    You must be signed in to change notification settings 
- Fork 598
Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately #1301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Amazing! Testing now :) | 
| Installed branch with: Ran into this issue when exporting: Command: Versions: Log: Am I missing something? 👀 I skimmed over the code and it should be using opset 13 | 
| Which version of transformers are you using? If >4.31, could you update to PyTorch nightly (cf my comment above)? Are you sure you checkout out in the correct branch? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variant approach seems nice and more generic.
| optional_group.add_argument( | ||
| "--variant", | ||
| type=str, | ||
| default="default", | ||
| help=("Select a variant of the model to export."), | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a set of possible choices here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a bit tricky given that the choices are dynamic (dependent on the onnx config).
| monolith: bool, | ||
| custom_onnx_configs: Dict, | ||
| custom_architecture: bool, | ||
| _variant: str, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make it a protected parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking to keep it "private" for now, and support it correctly once we move to this API fully instead of  -with-past, monolith, etc.
Co-authored-by: Michael Benayoun <[email protected]>
Co-authored-by: Michael Benayoun <[email protected]>
…optimum into sam-vision-encoder-onnx
| 
 Sorry, for some reason I forgot to mention: installed from source 
 Will try that 👍 For the testing I just downgraded to PyTorch v2.0.0. 
 I believe so, as I ran Will do more testing today. | 
| @xenova transformers from source + PyTorch 2.0 does not work for SAM ONNX export (you should get a meaningful error message actually) due to the issue detailed in my first post. Edit: something like this:  | 
| After fixing some versions, I got it exported! 🥳 a noticeable difference in validation of  Let me try quantizing for use in transformers.js 🔥 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-exported, but it looks like the config.json has changed quite a bit (see diff):
        
          
                optimum/exporters/onnx/utils.py
              
                Outdated
          
        
      | else: | ||
| # We use the model patcher to patch their forward method. | ||
| models_for_export["vision_encoder"] = model | ||
| models_for_export["prompt_mask"] = model | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason behind the naming btw?
What about "vision_encoder" and "mask_decoder"? I'd say this better aligns with the original naming and HF names:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, is prompt_mask a combination of the prompt_encoder and mask_decoder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe I should name it more explicitly prompt_encoder_mask_decoder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could help I guess (albeit a bit verbose 😅).
Also, something you've probably thought about already: is there a situation where the user may want to use the prompt encoder and mask decoder separately? I haven't needed to worry about this (yet), but just asking in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess when we want to encode a point for several different images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave as is and just rename to prompt_encoder_mask_decoder.onnx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess when we want to encode a point for several different images?
Indeed, quite a rare use case 😅
I'll leave as is and just rename to prompt_encoder_mask_decoder.onnx.
Sounds good!
| Good news: got the export working in Transformers.js! 🥳 I replicated the demo code from here to test it. I'm still trying to figure out the input and output format, but it looks like it matches the python version quite well! 🤣 | 
| Awesome, thank you for giving it a try! So it means this PR is probably good to merge. 
 Good catch! I'll fix that. | 
| @xenova This PR huggingface/transformers#25237 broke the nested config loading it seems Edit: it is not really a bug in transformers, but a behavior that I find unintuitive. Basically saved configs are not greedy due to a hard-coded  | 
| Ah, I see 😅. Well, I'll be able to work around this in transformers.js; but I thought I should bring it up. We can also just wait for the transformers team to respond to that issue before merging this PR. | 




The ONNX export of SAM will fail at the next transformers release due to huggingface/transformers#25074, that unintentionally triggers a bug in PyTorch that is since fixed pytorch/pytorch#100429.
Note as well that there is an additional not-fixed bug (see pytorch/pytorch#107591) in PyTorch for
repeat_interleavethat make it impossible to export SAM on cuda device (and thus in fp16) following huggingface/transformers#25074 (so for transformers>=4.32). In order to export on CUDA device transformers<=4.31 is required, or alternatively we need to wait from a fix in PyTorch.This PR also adds the
variantfor the ONNX export #1299. The idea is that for the same model, task, one may want to have different ONNX exported models. Currently, existing variants may bewith-past,without-past,monolithfor the encoder/decoder models. Here the motivation is that for SAM either we want to export the whole model standalone, or as separate vision encoder / prompt encoder, mask decoder.A description of the variant is displayed during the export:
If this is validated, we could move from
-with-pastand monolith configbehavior tovariants.Fixes #1078. By default exporting as
split.Potentially breaking (have to check the tests): the model patcher is used during the export validation, have to check that tests are fine.