Trickle train the Vision portion along with the text_encoder portion #2033
AbstractEyes started this conversation in Ideas
I've been doing some foundational work remerging my CLIPs into the OpenCLIP ViT-G for my CLIP_G omega, and into ViT-L for my CLIP_L.
https://huggingface.co/AbstractPhil/omega-vit-g-reformed
Some of the foundational problems are already solved here in sd-scripts, but I'd also like to experiment with actually trickle-training these vision encoders in a safe way, so that the changes are measurably similar to the differences being tuned into the text_encoders during training.
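By "trickle" training I mean giving the vision tower a much smaller learning rate than the text encoder, so it drifts only slightly per step rather than being frozen or fully trained. A minimal sketch of the idea, assuming a Hugging Face `CLIPModel` rather than sd-scripts' own wiring; the checkpoint name and learning rates are placeholders:

```python
# A minimal sketch (not sd-scripts code) of the "trickle" idea: the vision
# tower gets a much smaller learning rate than the text encoder, so its
# weights drift only slightly relative to the text-side updates.
import torch
from transformers import CLIPModel

# Placeholder checkpoint; in practice this would be the merged ViT-L/ViT-G.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

optimizer = torch.optim.AdamW(
    [
        {"params": model.text_model.parameters(), "lr": 1e-5},
        # ~100x smaller LR for the vision tower -- the "trickle".
        {"params": model.vision_model.parameters(), "lr": 1e-7},
    ],
    weight_decay=0.01,
)
```

A single optimizer step then updates both towers, with the vision side moving far more slowly than the text side.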
I've also been devising ways to test these by running clip-interrogator against only the tags used to train particular images and subsets, but I haven't made much headway on that yet.
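The evaluation idea, roughly: instead of clip-interrogator's full vocabulary, score each image only against the tags it was trained with, and check whether the tuned encoders rank those tags higher. A rough sketch assuming transformers' CLIP classes; the model name, tag list, and image path are all placeholders:

```python
# Rough sketch: score an image only against its own training tags,
# rather than clip-interrogator's full candidate vocabulary.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

training_tags = ["1girl", "red hair", "outdoors"]  # tags for this image subset
image = Image.open("sample.png")

inputs = processor(text=training_tags, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the image to each training tag.
scores = outputs.logits_per_image.softmax(dim=-1)
for tag, score in zip(training_tags, scores[0].tolist()):
    print(f"{tag}: {score:.3f}")
```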
Is there some sort of fundamental ViT vision training already built into sd-scripts, and if so, what access does it have to multi-GPU, accelerate, and the robust bucketing system currently available in sd-scripts?
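If it turns out not to be built in, one possible approach would be handing the vision tower to Accelerate alongside the other trainable modules, since sd-scripts already builds on Accelerate for its multi-GPU support. A hedged sketch, not sd-scripts code:

```python
# Sketch of wiring a vision encoder through Accelerate, the same library
# sd-scripts uses for DDP/mixed precision; actual sd-scripts wiring differs.
import torch
from accelerate import Accelerator
from transformers import CLIPVisionModel

accelerator = Accelerator()
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
optimizer = torch.optim.AdamW(vision_encoder.parameters(), lr=1e-7)

# prepare() wraps the model and optimizer for multi-GPU training.
vision_encoder, optimizer = accelerator.prepare(vision_encoder, optimizer)
```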