Fix pi0 checkpoint state map #1415
Conversation
Co-authored-by: Copilot <[email protected]> Signed-off-by: Yushun Xiang <[email protected]>
|
Hi, can you consider making it more robust? I suppose transformers might rename them back, and then your patch would fail again. |
I will make it more robust in the future. |
|
@YushunXiang Hi, actually your code still misses some params: Missing key(s) in state_dict: "normalize_inputs.buffer_observation_state.mean", "normalize_inputs.buffer_observation_state.std", "normalize_targets.buffer_action.mean", "normalize_targets.buffer_action.std", "unnormalize_outputs.buffer_action.mean", "unnormalize_outputs.buffer_action.std", "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight". The last one, did you notice that? |
I have noticed that. But I don't think it's a mapping problem, if there's a |
|
Sure, since I still can not make it work, and you did. My question is: training with 8 GPUs does not work; the policy loss goes down to about 0.06 and then stops decreasing. |
Here are my training loss curves (batch size = 16). With this PR, the lowest loss value is about 0.002. Without this PR, the lowest loss value is about 0.012. |
|
@YushunXiang Using a single GPU, is this normal? It looks like, with multiple GPUs, the loss cannot decrease past 0.006. |
|
@YushunXiang Do you know how to set the lr for multiple GPUs? I am confused about why the args throw an error: |
You should use |
|
Hi, I tried --policy.optimizer_lr, but somehow draccus didn't parse it correctly. So confused. Also, I found that when loading the trained model back, it still throws an error: Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight". Have you tried strict=True when loading the trained model? It will still throw this error. |
lerobot/src/lerobot/configs/policies.py Lines 37 to 71 in aec1b29
does not contain … Without modifying the source code, I think it's a good idea to change the value of … |
I have tried, and the error message is the same as yours. |
Don't worry about that. When I was reading the transformers source code, I found:

```python
def tie_weights(self):
    """
    Tie the weights between the input embeddings and the output embeddings.
    If the `torchscript` flag is set in the configuration, can't handle parameter sharing so we are cloning the
    weights instead.
    """
    if getattr(self.config.get_text_config(decoder=True), "tie_word_embeddings", True):
        output_embeddings = self.get_output_embeddings()
        if output_embeddings is not None:
            self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

    if getattr(self.config, "is_encoder_decoder", False) and getattr(self.config, "tie_encoder_decoder", False):
        if hasattr(self, self.base_model_prefix):
            self = getattr(self, self.base_model_prefix)
        tied_weights = self._tie_encoder_decoder_weights(
            self.encoder, self.decoder, self.base_model_prefix, "encoder"
        )
        # Setting a dynamic variable instead of `_tied_weights_keys` because it's a class
        # attributed not an instance member, therefore modifying it will modify the entire class
        # Leading to issues on subsequent calls by different tests or subsequent calls.
        self._dynamic_tied_weights_keys = tied_weights

    for module in self.modules():
        if hasattr(module, "_tie_weights"):
            module._tie_weights()
```

```python
def get_input_embeddings(self):
    return self.model.embed_tokens

def get_output_embeddings(self):
    return self.lm_head
```

This means that the input embeddings (`model.embed_tokens`) and the output embeddings (`lm_head`) are tied, so the missing `embed_tokens.weight` key is covered by the weight tying that runs when the model is loaded. |
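To make the tying concrete, here is a minimal sketch (not code from this PR or from transformers) for checking whether a loaded model's input and output embeddings actually share storage. It assumes `model` is a transformers `PreTrainedModel` whose text config has `tie_word_embeddings=True`; `embeddings_are_tied` is a hypothetical helper name.

```python
def embeddings_are_tied(model) -> bool:
    """Return True if the input and output embeddings share one tensor.

    Assumes `model` is a transformers PreTrainedModel (e.g. the PaliGemma
    language model discussed above) with tie_word_embeddings=True.
    """
    model.tie_weights()  # re-runs the tying logic quoted above
    in_w = model.get_input_embeddings().weight    # e.g. model.embed_tokens.weight
    out_w = model.get_output_embeddings().weight  # e.g. lm_head.weight
    return in_w.data_ptr() == out_w.data_ptr()    # same storage => tied
```

If this returns True, the two parameter names refer to a single tensor, which is why the missing-key warning for the tied entry is described as harmless above.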
|
I have a question. Convert the |
Pull Request Overview
This PR fixes checkpoint state mismatches for the PI0Policy by transforming state dict keys and adds support for loading weights from safetensor files.
- Adds a key transformation method to align PaliGemma layer names.
- Introduces a safetensor loader that applies these transformations before model loading.
Comments suppressed due to low confidence (2)
src/lerobot/policies/pi0/modeling_pi0.py:262
- [nitpick] Add more specific type annotations (e.g., `Dict[str, torch.Tensor]`) for the input and return values to improve code clarity and editor support.
def _transform_state_dict_keys(cls, state_dict: dict) -> dict:
src/lerobot/policies/pi0/modeling_pi0.py:261
- There’s no test coverage for the key-transformation logic; consider adding unit tests that verify each mapping and the tied-weights handling (a hedged test sketch is shown after these review comments).
@classmethod
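Following up on the test-coverage comment above, here is a hedged sketch of such a unit test. It is not part of this PR: the import path follows the file referenced in this review (`src/lerobot/policies/pi0/modeling_pi0.py`), and it assumes `_transform_state_dict_keys` applies the prefix renames listed in the PR description, so both may need adjusting.

```python
import torch

# Assumed import path; adjust if the module lives elsewhere.
from lerobot.policies.pi0.modeling_pi0 import PI0Policy


def test_transform_state_dict_keys_renames_language_model_prefix() -> None:
    # One of the renames from the PR description:
    # ...paligemma.language_model.model -> ...paligemma.model.language_model
    old_prefix = "model.paligemma_with_expert.paligemma.language_model.model"
    new_prefix = "model.paligemma_with_expert.paligemma.model.language_model"
    state_dict: dict[str, torch.Tensor] = {
        f"{old_prefix}.layers.0.self_attn.q_proj.weight": torch.zeros(1),
    }

    transformed = PI0Policy._transform_state_dict_keys(state_dict)

    assert f"{new_prefix}.layers.0.self_attn.q_proj.weight" in transformed
    assert f"{old_prefix}.layers.0.self_attn.q_proj.weight" not in transformed
```

It also illustrates the `dict[str, torch.Tensor]` annotation suggested in the first review comment.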
michel-aractingi
left a comment
Thanks for this PR! I left a couple of comments
Co-authored-by: Michel Aractingi <[email protected]> Signed-off-by: Yushun Xiang <[email protected]>
for more information, see https://pre-commit.ci
… and unexpected keys
Co-authored-by: Michel Aractingi <[email protected]> Signed-off-by: Yushun Xiang <[email protected]>
for more information, see https://pre-commit.ci
…ing and unexpected keys
|
Thank you for this fix! It is merged now.
|
I mean, this is kinda ugly, not gonna lie... Is it possible to either:
I suppose copying the PaliGemma code over is somewhat complicated, but I do think we would benefit from not having to worry about future transformers library updates, and it would also let us look into ways to speed things up and experiment with floating point precisions, like how |
|
@branyang02 This is a temporary fix until we merge the pipeline PR #1431.
|
@branyang02 Thank you for your advice. My code is indeed not elegant enough. PR #1431 is wonderful work, and I have learned a lot from it.
Co-authored-by: Michel Aractingi <[email protected]>
Thanks for your work on LeRobot and for sharing the training configurations. However, when I try to reproduce training with the … Here are the details:

🔧 Training Configuration

```python
'dataset': {
'root': '/home/Program/lerobot_new/datasets/libero_10_no_noops_1.0.0_lerobot',
'video_backend': 'torchcodec',
'use_imagenet_stats': True,
'image_transforms': {
'enable': True,
'max_num_transforms': 3,
'random_order': True,
'tfs': {
'brightness': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'brightness': [0.8, 1.2]}},
'contrast': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'contrast': [0.8, 1.2]}},
'hue': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'hue': [-0.05, 0.05]}},
'saturation': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'saturation': [0.5, 1.5]}},
'sharpness': {'type': 'SharpnessJitter', 'weight': 1.0, 'kwargs': {'sharpness': [0.5, 1.5]}}
}
}
},
'policy': {
'type': 'pi0',
'n_obs_steps': 1,
'n_action_steps': 50,
'chunk_size': 50,
'proj_width': 1024,
'tokenizer_max_length': 48,
'freeze_vision_encoder': True,
'train_state_proj': True,
'resize_imgs_with_padding': [224, 224],
'normalization_mapping': {
'STATE': 'MEAN_STD',
'ACTION': 'MEAN_STD',
'VISUAL': 'IDENTITY'
},
'scheduler_decay_steps': 30000,
'scheduler_warmup_steps': 1000,
'scheduler_decay_lr': 2.5e-6,
'optimizer_lr': 1e-4,
'optimizer_weight_decay': 1e-10,
'optimizer_betas': [0.9, 0.95]
},
'scheduler': {
'type': 'cosine_decay_with_warmup',
'peak_lr': 1e-4,
'num_decay_steps': 30000,
'num_warmup_steps': 1000,
'decay_lr': 2.5e-6
},
'optimizer': {
'type': 'adamw',
'lr': 1e-4,
'betas': [0.9, 0.95],
'eps': 1e-8,
'weight_decay': 1e-10
},
'use_policy_training_preset': True,
'device': 'cuda',
'use_amp': False,
'steps': 30000,
'log_freq': 10
```

Transformers version: 4.53.0

📉 Loss Output Sample

Here’s a snippet of the logs: … Any insight or clarification would be appreciated! Thanks 🙏 |
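(Aside: the `cosine_decay_with_warmup` block above corresponds to a standard linear-warmup plus cosine-decay schedule. The sketch below is a generic illustration using the numbers from this config, not lerobot's exact implementation; the real scheduler may differ in detail, for example in how steps beyond `num_decay_steps` are handled.)

```python
import math


def cosine_decay_with_warmup_lr(
    step: int,
    peak_lr: float = 1e-4,
    num_warmup_steps: int = 1000,
    num_decay_steps: int = 30_000,
    decay_lr: float = 2.5e-6,
) -> float:
    """Learning rate at `step` for a linear-warmup + cosine-decay schedule."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / num_warmup_steps
    # Cosine decay from peak_lr down to decay_lr over the remaining steps.
    span = max(num_decay_steps - num_warmup_steps, 1)
    progress = min((step - num_warmup_steps) / span, 1.0)
    return decay_lr + 0.5 * (peak_lr - decay_lr) * (1.0 + math.cos(math.pi * progress))


# Example: the schedule starts near 0, peaks at 1e-4 after warmup,
# and ends near 2.5e-6 at num_decay_steps.
for s in (0, 1_000, 15_000, 30_000):
    print(s, cosine_decay_with_warmup_lr(s))
```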
|
@ymy1946676292 |
Okay, below is the training loss curve over 160,000 steps. After multiple rounds of training, the loss stabilizes at about 0.05. |
|
@ymy1946676292 I guess that #952 may have introduced this problem? |
Thank you very much for your answer. I have tried to fix it using the scheme mentioned in #952, but after multiple rounds of training the loss is still very high and the success rate is almost 0.
Co-authored-by: Michel Aractingi <[email protected]>
The issue about this bug is #1406, which is probably caused by huggingface/transformers#37033. In the v4.52.1 release of the transformers library, huggingface/transformers#37033 introduced a bug by renaming the class `PaliGemmaForConditionalGeneration(PaliGemmaPreTrainedModel, GenerationMixin)` to class `PaliGemmaModel(PaliGemmaPreTrainedModel)`.

This pull request introduces enhancements to the `PI0Policy` class in `lerobot/common/policies/pi0/modeling_pi0.py` to improve model state handling. The changes include adding a method to transform state dictionary keys and a class method to load model weights from `safetensor` files, ensuring compatibility with expected model structures. Solves #1406.

Enhancements to model state handling:

- Key transformation for state dictionaries: Added a `_transform_state_dict_keys` method to modify state dictionary keys for compatibility with the expected model structure. This includes specific transformations for PaliGemma components to ensure proper mapping of model layers.
- Support for `safetensor` file loading: Introduced a `_load_as_safetensor` class method to load model weights from `safetensor` files. This method applies the key transformations before loading the state dictionary into the model.

Apply transformations for PaliGemma components (a hedged sketch follows below):

- `model.paligemma_with_expert.paligemma.language_model.lm_head` -> `model.paligemma_with_expert.paligemma.lm_head`
- `model.paligemma_with_expert.paligemma.language_model.model` -> `model.paligemma_with_expert.paligemma.model.language_model`
- `model.paligemma_with_expert.paligemma.vision_tower` -> `model.paligemma_with_expert.paligemma.model.vision_tower`
- `model.paligemma_with_expert.paligemma.multi_modal_projector` -> `model.paligemma_with_expert.paligemma.model.multi_modal_projector`
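For illustration only, here is a minimal sketch of how such a prefix-based rename could look. This is a reconstruction from the mapping list above, not the code merged in this PR; `OLD_TO_NEW_PREFIXES` and `transform_state_dict_keys` are hypothetical names, and the merged `_transform_state_dict_keys` may handle more cases (e.g. tied weights).

```python
import torch

# Hypothetical reconstruction of the renames listed above.
OLD_TO_NEW_PREFIXES = {
    "model.paligemma_with_expert.paligemma.language_model.lm_head":
        "model.paligemma_with_expert.paligemma.lm_head",
    "model.paligemma_with_expert.paligemma.language_model.model":
        "model.paligemma_with_expert.paligemma.model.language_model",
    "model.paligemma_with_expert.paligemma.vision_tower":
        "model.paligemma_with_expert.paligemma.model.vision_tower",
    "model.paligemma_with_expert.paligemma.multi_modal_projector":
        "model.paligemma_with_expert.paligemma.model.multi_modal_projector",
}


def transform_state_dict_keys(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Rename checkpoint keys so they match the post-rename PaliGemma layout."""
    transformed: dict[str, torch.Tensor] = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in OLD_TO_NEW_PREFIXES.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        transformed[new_key] = tensor
    return transformed
```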
Environment

transformers: 4.53.0
@Cadene, @mshukor