Trickle train the Vision portion along with the text_encoder portion #2033
AbstractEyes started this conversation in Ideas
I've been doing some foundational work remerging my CLIPs into the OpenCLIP ViT-G for my CLIP_G omega, and into ViT-L for my CLIP_L.
https://huggingface.co/AbstractPhil/omega-vit-g-reformed
Some of the foundational problems are already solved here in sd-scripts, but I'd also like to experiment with actually trickle-training these vision encoders in a safe way, so that the changes are measurably similar to the differences being tuned into the text_encoders during training.
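By "trickle" training I mean giving the vision tower a much smaller learning rate than the text encoder, so it drifts only slightly per step rather than being frozen or fully trained. A minimal sketch of the idea, assuming a Hugging Face `CLIPModel` rather than sd-scripts' own wiring; the checkpoint name and learning rates are placeholders:

```python
# A minimal sketch (not sd-scripts code) of the "trickle" idea: the vision
# tower gets a much smaller learning rate than the text encoder, so its
# weights drift only slightly relative to the text-side updates.
import torch
from transformers import CLIPModel

# Placeholder checkpoint; in practice this would be the merged ViT-L/ViT-G.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

optimizer = torch.optim.AdamW(
    [
        {"params": model.text_model.parameters(), "lr": 1e-5},
        # ~100x smaller LR for the vision tower -- the "trickle".
        {"params": model.vision_model.parameters(), "lr": 1e-7},
    ],
    weight_decay=0.01,
)
```

A single optimizer step then updates both towers, with the vision side moving far more slowly than the text side.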
I've also been devising ways to test these by running clip-interrogator against only the tags used to train particular images and subsets, but I haven't made much headway on that yet.
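The evaluation idea, roughly: instead of clip-interrogator's full vocabulary, score each image only against the tags it was trained with, and check whether the tuned encoders rank those tags higher. A rough sketch assuming transformers' CLIP classes; the model name, tag list, and image path are all placeholders:

```python
# Rough sketch: score an image only against its own training tags,
# rather than clip-interrogator's full candidate vocabulary.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

training_tags = ["1girl", "red hair", "outdoors"]  # tags for this image subset
image = Image.open("sample.png")

inputs = processor(text=training_tags, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the image to each training tag.
scores = outputs.logits_per_image.softmax(dim=-1)
for tag, score in zip(training_tags, scores[0].tolist()):
    print(f"{tag}: {score:.3f}")
```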
Is there some sort of fundamental ViT vision training already built into sd-scripts, and if so, what access does it have to multi-GPU, accelerate, and the robust bucketing system currently available in sd-scripts?
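If it turns out not to be built in, one possible approach would be handing the vision tower to Accelerate alongside the other trainable modules, since sd-scripts already builds on Accelerate for its multi-GPU support. A hedged sketch, not sd-scripts code:

```python
# Sketch of wiring a vision encoder through Accelerate, the same library
# sd-scripts uses for DDP/mixed precision; actual sd-scripts wiring differs.
import torch
from accelerate import Accelerator
from transformers import CLIPVisionModel

accelerator = Accelerator()
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
optimizer = torch.optim.AdamW(vision_encoder.parameters(), lr=1e-7)

# prepare() wraps the model and optimizer for multi-GPU training.
vision_encoder, optimizer = accelerator.prepare(vision_encoder, optimizer)
```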