
Conversation

@YushunXiang
Contributor

@YushunXiang YushunXiang commented Jun 30, 2025

The issue for this bug is #1406, which is probably caused by huggingface/transformers#37033. In the v4.52.1 release of the transformers library, that PR introduced a breaking change by renaming class PaliGemmaForConditionalGeneration(PaliGemmaPreTrainedModel, GenerationMixin) to class PaliGemmaModel(PaliGemmaPreTrainedModel).

This pull request introduces enhancements to the PI0Policy class in lerobot/common/policies/pi0/modeling_pi0.py to improve model state handling. The changes add a method to transform state dictionary keys and a class method to load model weights from safetensor files, ensuring compatibility with the expected model structure. Fixes #1406.

Enhancements to model state handling:

  • Key transformation for state dictionaries: Added _transform_state_dict_keys method to modify state dictionary keys for compatibility with expected model structure. This includes specific transformations for PaliGemma components to ensure proper mapping of model layers.

  • Support for safetensor file loading: Introduced _load_as_safetensor class method to load model weights from safetensor files. This method applies the key transformations before loading the state dictionary into the model.

  • Key transformations applied to PaliGemma components (a minimal sketch of both methods follows this list)

    • model.paligemma_with_expert.paligemma.language_model.lm_head -> model.paligemma_with_expert.paligemma.lm_head
    • model.paligemma_with_expert.paligemma.language_model.model -> model.paligemma_with_expert.paligemma.model.language_model
    • model.paligemma_with_expert.paligemma.vision_tower -> model.paligemma_with_expert.paligemma.model.vision_tower
    • model.paligemma_with_expert.paligemma.multi_modal_projector -> model.paligemma_with_expert.paligemma.model.multi_modal_projector
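
For illustration, here is a minimal sketch of what the two methods could look like, assuming the prefix mappings listed above; the class body, method signatures, and helper names are simplified and may differ from the actual implementation in the PR.

from safetensors.torch import load_file


class PI0Policy:  # simplified; the real class is in modeling_pi0.py and subclasses lerobot's policy base
    # Old checkpoint prefix -> new prefix expected by the refactored PaliGemma classes (mappings from the list above).
    _KEY_MAPPING = {
        "model.paligemma_with_expert.paligemma.language_model.lm_head": "model.paligemma_with_expert.paligemma.lm_head",
        "model.paligemma_with_expert.paligemma.language_model.model": "model.paligemma_with_expert.paligemma.model.language_model",
        "model.paligemma_with_expert.paligemma.vision_tower": "model.paligemma_with_expert.paligemma.model.vision_tower",
        "model.paligemma_with_expert.paligemma.multi_modal_projector": "model.paligemma_with_expert.paligemma.model.multi_modal_projector",
    }

    @classmethod
    def _transform_state_dict_keys(cls, state_dict: dict) -> dict:
        """Rename old-style checkpoint keys to the layout the new PaliGemma classes expect."""
        renamed = {}
        for key, value in state_dict.items():
            for old_prefix, new_prefix in cls._KEY_MAPPING.items():
                if key.startswith(old_prefix):
                    key = new_prefix + key[len(old_prefix):]
                    break
            renamed[key] = value
        return renamed

    @classmethod
    def _load_as_safetensor(cls, model, model_file: str, map_location: str, strict: bool):
        """Load a .safetensors file, remap its keys, then load it into the policy module."""
        state_dict = load_file(model_file, device=map_location)
        state_dict = cls._transform_state_dict_keys(state_dict)
        model.load_state_dict(state_dict, strict=strict)
        return model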

Environment

  • transformers: 4.53.0


@Cadene, @mshukor

Copilot AI review requested due to automatic review settings June 30, 2025 22:06


@OpenJarvisAI

Hi, can you consider making it more robust? I suppose transformers might rename them back, and then your patch would fail again.

@YushunXiang
Contributor Author

Hi, can you consider making it more robust? I suppose transformers might rename them back, and then your patch would fail again.

I will make it robust in the future.

@OpenJarvisAI

@YushunXiang Hi, actually your code still misses some params:

Missing key(s) in state_dict: "normalize_inputs.buffer_observation_state.mean", "normalize_inputs.buffer_observation_state.std", "normalize_targets.buffer_action.mean", "normalize_targets.buffer_action.std", "unnormalize_outputs.buffer_action.mean", "unnormalize_outputs.buffer_action.std", "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight".

The last one, did you notice it?

@YushunXiang
Contributor Author

@YushunXiang Hi, actually your code still misses some params:

Missing key(s) in state_dict: "normalize_inputs.buffer_observation_state.mean", "normalize_inputs.buffer_observation_state.std", "normalize_targets.buffer_action.mean", "normalize_targets.buffer_action.std", "unnormalize_outputs.buffer_action.mean", "unnormalize_outputs.buffer_action.std", "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight".

The last one, did you notice it?

I have noticed that, but I don't think it's a mapping problem: if there were missing keys like that, shouldn't there also be corresponding unexpected keys? That's a good question; I will look into it later. Can you help me figure out the problem together?

@OpenJarvisAI

Sure, since I still can't make it work and you did, I think I need to align with you first.

My question is: training with 8 GPUs doesn't work. The policy loss goes down to about 0.06 and then stops decreasing.

@YushunXiang
Contributor Author

YushunXiang commented Jul 7, 2025

Sure, since I still can't make it work and you did, I think I need to align with you first.

My question is: training with 8 GPUs doesn't work. The policy loss goes down to about 0.06 and then stops decreasing.

Here are my training loss curves (batch size = 16).

w/ this PR

  • Loss: [image]
  • Learning Rate: [image]
  • L2 Loss: [image]

The lowest loss value is about 0.002.

w/o this PR

  • Loss: [image]

The lowest loss value is about 0.012.

@OpenJarvisAI

@YushunXiang With a single GPU, is this normal?

2025-07-07 19:03:23.939 | INFO     | __main__:train:230 - Checkpoint policy after step 18200
2025-07-07 19:03:52.801 | INFO     | __main__:train:221 - step:18K smpl:255K ep:911 epch:2.28 loss:0.004 grdn:0.206 lr:1.0e-05 updt_s:0.524 data_s:0.000
2025-07-07 19:04:03.350 | INFO     | __main__:train:221 - step:18K smpl:255K ep:912 epch:2.28 loss:0.004 grdn:0.220 lr:1.0e-05 updt_s:0.525 data_s:0.000
2025-07-07 19:04:13.922 | INFO     | __main__:train:221 - step:18K smpl:256K ep:913 epch:2.28 loss:0.004 grdn:0.200 lr:1.0e-05 updt_s:0.526 data_s:0.000
2025-07-07 19:04:24.491 | INFO     | __main__:train:221 - step:18K smpl:256K ep:914 epch:2.28 loss:0.004 grdn:0.220 lr:1.0e-05 updt_s:0.526 data_s:0.000
2025-07-07 19:04:35.060 | INFO     | __main__:train:221 - step:18K smpl:256K ep:915 epch:2.29 loss:0.003 grdn:0.199 lr:9.9e-06 updt_s:0.526 data_s:0.000

It looks like with multiple GPUs the loss does not decrease below 0.006.

@OpenJarvisAI

@YushunXiang Do you know how to set the learning rate for multiple GPUs?

accelerate launch train_multi.py \
  --policy.path=$MODEL_PATH \
  --policy.optimizer_lr=$LR \
  --dataset.repo_id=$DATA_PATH \
  --dataset.image_transforms.enable=true \
  --dataset.image_transforms.random_order=true \
  --output_dir=outputs/$POLICY_TYPE/$EXP_NAME-$DATASET_NAME-$DATE \
  --batch_size=$GLOBAL_BATCH_SIZE \
  --steps=30000 \
  --log_freq=20 \
  --save_freq=100

I am confused why the args throw an error:

train_multi.py: error: unrecognized arguments: --optimizer_lr=0.00016
usage: train_multi.py [-h] [--config_path str] [--dataset str] [--dataset.repo_id str] [--dataset.root str] [--dataset.episodes str] [--image_transforms str]
                      [--dataset.image_transforms.enable str] [--dataset.image_transforms.max_num_transforms str] [--dataset.image_transforms.random_order str]
                      [--dataset.image_transforms.tfs str] [--dataset.revision str] [--dataset.use_imagenet_stats str] [--dataset.video_backend str] [--env str]
                      [--env.type {aloha,pusht,xarm}] [--env.task str] [--env.fps str] [--env.features str] [--env.features_map str] [--env.episode_length str]
                      [--env.obs_type str] [--env.render_mode str] [--env.visualization_width str] [--env.visualization_height str] [--policy str]
                      [--policy.type {pi0,smolvla,pi0fast}] [--policy.attention_implementation str] [--policy.num_steps str] [--policy.train_expert_only str]
                      [--policy.train_state_proj str] [--policy.optimizer_grad_clip_norm str] [--policy.vlm_model_name str] [--policy.load_vlm_weights str]
                      [--policy.add_image_special_tokens str] [--policy.attention_mode str] [--policy.prefix_length str] [--policy.pad_language_to str]
                      [--policy.num_expert_layers str] [--policy.num_vlm_layers str] [--policy.self_attn_every_n_layers str] [--policy.expert_width_multiplier str]
                      [--policy.min_period str] [--policy.max_period str] [--policy.n_obs_steps str] [--policy.normalization_mapping str] [--policy.input_features str]
                      [--policy.output_features str] [--policy.device str] [--policy.use_amp str] [--policy.gradient_accumulation_steps str] [--policy.chunk_size str]
                      [--policy.n_action_steps str] [--policy.max_state_dim str] [--policy.max_action_dim str] [--policy.resize_imgs_with_padding str]
                      [--policy.interpolate_like_pi str] [--policy.empty_cameras str] [--policy.adapt_to_pi_aloha str] [--policy.use_delta_joint_actions_aloha str]
                      [--policy.tokenizer_max_length str] [--policy.proj_width str] [--policy.max_decoding_steps str] [--policy.fast_skip_tokens str]
                      [--policy.max_input_seq_len str] [--policy.use_cache str] [--policy.freeze_vision_encoder str] [--policy.freeze_lm_head str] [--policy.optimizer_lr str]
                      [--policy.optimizer_betas str] [--policy.optimizer_eps str] [--policy.optimizer_weight_decay str] [--policy.scheduler_warmup_steps str]
                      [--policy.scheduler_decay_steps str] [--policy.scheduler_decay_lr str] [--policy.checkpoint_path str] [--policy.padding_side str] [--policy.precision str]
                      [--policy.grad_clip_norm str] [--policy.relaxed_action_decoding str] [--output_dir str] [--job_name str] [--resume str] [--seed str] [--num_workers str]
                      [--batch_size str] [--steps str] [--eval_freq str] [--log_freq str] [--save_checkpoint str] [--save_freq str] [--use_policy_training_preset str]
                      [--optimizer str] [--optimizer.type {adam,adamw,sgd}] [--optimizer.betas str] [--optimizer.eps str] [--optimizer.lr str] [--optimizer.weight_decay str]
                      [--optimizer.grad_clip_norm str] [--optimizer.momentum str] [--optimizer.dampening str] [--optimizer.nesterov str] [--scheduler str]
                      [--scheduler.type {diffuser,vqbet,cosine_decay_with_warmup}] [--scheduler.name str] [--scheduler.num_vqvae_training_steps str] [--scheduler.num_cycles str]
                      [--scheduler.num_warmup_steps str] [--scheduler.num_decay_steps str] [--scheduler.peak_lr str] [--scheduler.decay_lr str] [--eval str]
                      [--eval.n_episodes str] [--eval.batch_size str] [--eval.use_async_envs str] [--wandb str] [--wandb.enable str] [--wandb.disable_artifact str]
                      [--wandb.project str] [--wandb.entity str] [--wandb.notes str] [--wandb.run_id str] [--wandb.mode str]
train_multi.py: error: unrecognized arguments: --optimizer_lr=0.00016

@YushunXiang
Contributor Author

@YushunXiang Do you know how to set the learning rate for multiple GPUs?

accelerate launch train_multi.py \
  --policy.path=$MODEL_PATH \
  --policy.optimizer_lr=$LR \
  --dataset.repo_id=$DATA_PATH \
  --dataset.image_transforms.enable=true \
  --dataset.image_transforms.random_order=true \
  --output_dir=outputs/$POLICY_TYPE/$EXP_NAME-$DATASET_NAME-$DATE \
  --batch_size=$GLOBAL_BATCH_SIZE \
  --steps=30000 \
  --log_freq=20 \
  --save_freq=100

I am confused why the args throw an error:

train_multi.py: error: unrecognized arguments: --optimizer_lr=0.00016

You should use --policy.optimizer_lr instead of --optimizer_lr.

@OpenJarvisAI

Hi, I tried --policy.optimizer_lr, but somehow draccus didn't parse it correctly.

I'm so confused.

Also, I found that loading the trained model back still throws an error:

Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight".

Have you tried strict=True when loading the trained model? It still throws this error.

@YushunXiang
Contributor Author

YushunXiang commented Jul 8, 2025

Hi, I tried --policy.optimizer_lr, but somehow draccus didn't parse it correctly.

I'm so confused.

class PreTrainedConfig(draccus.ChoiceRegistry, HubMixin, abc.ABC):
    """
    Base configuration class for policy models.

    Args:
        n_obs_steps: Number of environment steps worth of observations to pass to the policy (takes the
            current step and additional steps going back).
        input_shapes: A dictionary defining the shapes of the input data for the policy.
        output_shapes: A dictionary defining the shapes of the output data for the policy.
        input_normalization_modes: A dictionary with key representing the modality and the value specifies the
            normalization mode to apply.
        output_normalization_modes: Similar dictionary as `input_normalization_modes`, but to unnormalize to
            the original scale.
    """

    n_obs_steps: int = 1
    normalization_mapping: dict[str, NormalizationMode] = field(default_factory=dict)
    input_features: dict[str, PolicyFeature] = field(default_factory=dict)
    output_features: dict[str, PolicyFeature] = field(default_factory=dict)
    device: str | None = None  # cuda | cpu | mp
    # `use_amp` determines whether to use Automatic Mixed Precision (AMP) for training and evaluation. With AMP,
    # automatic gradient scaling is used.
    use_amp: bool = False
    push_to_hub: bool = True
    repo_id: str | None = None
    # Upload on private repository on the Hugging Face hub.
    private: bool | None = None
    # Add tags to your policy on the hub.
    tags: list[str] | None = None
    # Add tags to your policy on the hub.
    license: str | None = None

The PreTrainedConfig class shown above does not contain optimizer_lr, which causes this error.

Without modifying the source code, I think it's a good idea to change the value of optimizer_lr in lerobot/pi0/config.json instead.
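
For reference, a minimal sketch of that workaround; the config path is the one mentioned above, and the actual location of the downloaded checkpoint's config.json may differ on your machine.

import json
from pathlib import Path

# Path taken from the comment above; point this at your checkpoint's config.json.
config_path = Path("lerobot/pi0/config.json")
config = json.loads(config_path.read_text())
config["optimizer_lr"] = 1.6e-4  # the value you tried to pass as --optimizer_lr=0.00016
config_path.write_text(json.dumps(config, indent=2))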

Also, I found that loading the trained model back still throws an error:

Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight".

Have you tried strict=True when loading the trained model? It still throws this error.

I have tried it, and the error message is the same as yours.

@YushunXiang
Contributor Author

YushunXiang commented Jul 8, 2025

@YushunXiang Hi, actually your code still misses some params:

Missing key(s) in state_dict: "normalize_inputs.buffer_observation_state.mean", "normalize_inputs.buffer_observation_state.std", "normalize_targets.buffer_action.mean", "normalize_targets.buffer_action.std", "unnormalize_outputs.buffer_action.mean", "unnormalize_outputs.buffer_action.std", "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight".

The last one, did you notice it?

Don't worry about that. The model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight is the same as model.paligemma_with_expert.paligemma.lm_head.weight. You can check torch.Tensor.untyped_storage().data_ptr() and torch.Tensor.untyped_storage().nbytes(): the memory address and memory byte size are identical, which proves that the two tensors share the same underlying memory.
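
For example, a quick check along those lines; this is a sketch assuming policy is an already-loaded PI0Policy and that the attribute path follows the key names above.

# `policy` is assumed to be a loaded PI0Policy, e.g. obtained via PI0Policy.from_pretrained(...).
paligemma = policy.model.paligemma_with_expert.paligemma
lm_head_w = paligemma.lm_head.weight
embed_w = paligemma.model.language_model.embed_tokens.weight

# Tied weights share one storage: same data pointer and same byte size.
print(lm_head_w.untyped_storage().data_ptr() == embed_w.untyped_storage().data_ptr())  # True
print(lm_head_w.untyped_storage().nbytes() == embed_w.untyped_storage().nbytes())      # True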

When I was reading the source code of transformers, I found that one of the components of pi0, GemmaForCausalLM (which is model.paligemma_with_expert.paligemma.model.language_model), has tie_word_embeddings=True set in its config file. When a model defines both get_input_embeddings() and get_output_embeddings(), the transformers framework automatically ties them together in the tie_weights() method.

    def tie_weights(self):
        """
        Tie the weights between the input embeddings and the output embeddings.

        If the `torchscript` flag is set in the configuration, can't handle parameter sharing so we are cloning the
        weights instead.
        """
        if getattr(self.config.get_text_config(decoder=True), "tie_word_embeddings", True):
            output_embeddings = self.get_output_embeddings()
            if output_embeddings is not None:
                self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

        if getattr(self.config, "is_encoder_decoder", False) and getattr(self.config, "tie_encoder_decoder", False):
            if hasattr(self, self.base_model_prefix):
                self = getattr(self, self.base_model_prefix)
            tied_weights = self._tie_encoder_decoder_weights(
                self.encoder, self.decoder, self.base_model_prefix, "encoder"
            )
            # Setting a dynamic variable instead of `_tied_weights_keys` because it's a class
            # attributed not an instance member, therefore modifying it will modify the entire class
            # Leading to issues on subsequent calls by different tests or subsequent calls.
            self._dynamic_tied_weights_keys = tied_weights

        for module in self.modules():
            if hasattr(module, "_tie_weights"):
                module._tie_weights()

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def get_output_embeddings(self):
        return self.lm_head

This means that lm_head.weight and embed_tokens.weight are equivalent.
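
If the missing-key warning is a nuisance, one possible workaround, sketched under the assumption that the tying described above holds and that state_dict and model come from your loading code, is to materialize the tied entry before a strict load:

# Because the two parameters are tied, a checkpoint usually stores only one of them;
# copy whichever is present onto the other key before load_state_dict(strict=True).
LM_HEAD_KEY = "model.paligemma_with_expert.paligemma.lm_head.weight"
EMBED_KEY = "model.paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight"

if EMBED_KEY not in state_dict and LM_HEAD_KEY in state_dict:
    state_dict[EMBED_KEY] = state_dict[LM_HEAD_KEY]
elif LM_HEAD_KEY not in state_dict and EMBED_KEY in state_dict:
    state_dict[LM_HEAD_KEY] = state_dict[EMBED_KEY]

model.load_state_dict(state_dict, strict=True)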

@OpenJarvisAI

@YushunXiang
Contributor Author

I have a question: would it be more elegant to convert the model's own state_dict() key names at load time to match the checkpoint, instead of converting the state_dict loaded from the file?

@YushunXiang YushunXiang requested a review from Copilot July 8, 2025 20:09
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes checkpoint state mismatches for the PI0Policy by transforming state dict keys and adds support for loading weights from safetensor files.

  • Adds a key transformation method to align PaliGemma layer names.
  • Introduces a safetensor loader that applies these transformations before model loading.
Comments suppressed due to low confidence (2)

src/lerobot/policies/pi0/modeling_pi0.py:262

  • [nitpick] Add more specific type annotations (e.g., Dict[str, torch.Tensor]) for the input and return values to improve code clarity and editor support.
    def _transform_state_dict_keys(cls, state_dict: dict) -> dict:

src/lerobot/policies/pi0/modeling_pi0.py:261

  • There’s no test coverage for the key-transformation logic; consider adding unit tests that verify each mapping and the tied-weights handling.
    @classmethod

@pkooij pkooij requested a review from michel-aractingi July 20, 2025 08:42
@pkooij pkooij added the policies Items related to robot policies label Jul 20, 2025
Collaborator

@michel-aractingi michel-aractingi left a comment


Thanks for this PR! I left a couple of comments.

@michel-aractingi michel-aractingi merged commit 71eff18 into huggingface:main Jul 30, 2025
12 checks passed
@michel-aractingi
Collaborator

Thank you for this fix! It is merged now.

@branyang02

I mean this is kinda ugly not gonna lie...

Is it possible to either:

  1. cap the transformers library version.
  2. copy all PaliGemma and related code over so we have a fixed implementation?

I suppose copying the PaliGemma code over is somewhat complicated, but I do think we would benefit from not having to worry about future transformers library updates, and it would also let us look into ways to speed things up and experiment with floating-point precision the way openpi does.

@michel-aractingi
Collaborator

michel-aractingi commented Jul 31, 2025

@branyang02 This is a temporary fix until we merge the pipeline PR #1431, which will do exactly what you suggested: bump transformers and change the model keys directly.

@YushunXiang
Contributor Author

@branyang02 Thank you for your advice. My code is indeed not elegant enough.

PR #1431 is wonderful work, and I have learned a lot from it.

@YushunXiang YushunXiang deleted the fix-pi0 branch July 31, 2025 11:46
Maelic pushed a commit to Maelic/lerobot that referenced this pull request Aug 4, 2025
@ymy1946676292

ymy1946676292 commented Aug 5, 2025

Sure, since I still can't make it work and you did, I think I need to align with you first.
My question is: training with 8 GPUs doesn't work. The policy loss goes down to about 0.06 and then stops decreasing.

Here are my training loss curves (batch size = 16).

w/ this PR

  • Loss: [image]
  • Learning Rate: [image]
  • L2 Loss: [image]

The lowest loss value is about 0.002.

w/o this PR

  • Loss: [image]

The lowest loss value is about 0.012.

Thanks for your work on LeRobot and for sharing the training configurations. However, when I try to reproduce training with the pi0 policy on the Libero dataset using the same config, I notice that the training loss remains consistently high in the early stage even on a single GPU setup.

Here are the details:

🔧 Training Configuration

'dataset': {
  'root': '/home/Program/lerobot_new/datasets/libero_10_no_noops_1.0.0_lerobot',
  'video_backend': 'torchcodec',
  'use_imagenet_stats': True,
  'image_transforms': {
    'enable': True,
    'max_num_transforms': 3,
    'random_order': True,
    'tfs': {
      'brightness': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'brightness': [0.8, 1.2]}},
      'contrast':   {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'contrast': [0.8, 1.2]}},
      'hue':        {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'hue': [-0.05, 0.05]}},
      'saturation': {'type': 'ColorJitter', 'weight': 1.0, 'kwargs': {'saturation': [0.5, 1.5]}},
      'sharpness':  {'type': 'SharpnessJitter', 'weight': 1.0, 'kwargs': {'sharpness': [0.5, 1.5]}}
    }
  }
},
'policy': {
  'type': 'pi0',
  'n_obs_steps': 1,
  'n_action_steps': 50,
  'chunk_size': 50,
  'proj_width': 1024,
  'tokenizer_max_length': 48,
  'freeze_vision_encoder': True,
  'train_state_proj': True,
  'resize_imgs_with_padding': [224, 224],
  'normalization_mapping': {
    'STATE': 'MEAN_STD',
    'ACTION': 'MEAN_STD',
    'VISUAL': 'IDENTITY'
  },
  'scheduler_decay_steps': 30000,
  'scheduler_warmup_steps': 1000,
  'scheduler_decay_lr': 2.5e-6,
  'optimizer_lr': 1e-4,
  'optimizer_weight_decay': 1e-10,
  'optimizer_betas': [0.9, 0.95]
},
'scheduler': {
  'type': 'cosine_decay_with_warmup',
  'peak_lr': 1e-4,
  'num_decay_steps': 30000,
  'num_warmup_steps': 1000,
  'decay_lr': 2.5e-6
},
'optimizer': {
  'type': 'adamw',
  'lr': 1e-4,
  'betas': [0.9, 0.95],
  'eps': 1e-8,
  'weight_decay': 1e-10
},
'use_policy_training_preset': True,
'device': 'cuda',
'use_amp': False,
'steps': 30000,
'log_freq': 10

Transformers version: 4.53.0

📉 Loss Output Sample

Here’s a snippet of the logs:

step:10  loss:0.160
step:20  loss:0.176
step:50  loss:0.176
step:100 loss:0.126
step:150 loss:0.127
step:200 loss:0.108
step:250 loss:0.098
step:300 loss:0.095
step:320 loss:0.101
step:1K loss:0.108
step:2K loss:0.090

Any insight or clarification would be appreciated! Thanks 🙏

@YushunXiang
Contributor Author

@ymy1946676292
Could you give me more log steps or loss curve graphs (both w/ PR and w/o PR) to determine whether it has converged?

@ymy1946676292

@ymy1946676292 Could you give me more log steps or loss curve graphs (both w/ PR and w/o PR) to determine whether it has converged?

Okay, below is the training loss curve over 160,000 steps. After multiple rounds of training, the loss stabilizes at about 0.05.
w/o PR
[image]
w/ PR
Currently, the training has reached around 20k steps, and the loss has decreased to about 0.05.
According to the training configuration provided in [Issue #952](#952), the model is expected to exhibit a relatively low loss at the initial stage. However, even with exactly the same configuration, the loss I obtained during training remained very high and did not decrease significantly even after prolonged training.

@YushunXiang
Contributor Author

@ymy1946676292 I guess that #952 may have introduced this problem?

@ymy1946676292

@ymy1946676292 I guess that #952 may have introduced this problem?

Thank you very much for your answer. I have tried to fix it using the scheme mentioned in #952, but after multiple rounds of training, the loss is still very high and the success rate is almost 0.

AdilZouitine pushed a commit that referenced this pull request Aug 10, 2025
milong26 pushed a commit to milong26/lerobot_diy that referenced this pull request Aug 26, 2025
Ricci084 pushed a commit to JeffWang987/lerobot that referenced this pull request Sep 5, 2025
BillmanH pushed a commit to BillmanH/lerobot that referenced this pull request Sep 7, 2025
fracapuano pushed a commit that referenced this pull request Sep 12, 2025