test_pt_flax_equivalence and test_encoder_decoder_model_standalone fail running on device (cuda or xpu) #33517

@dvrogozh

Description

With:

Issue seen on NVidia A10 and Intel PVC.

test_pt_flax_equivalence and test_encoder_decoder_model_standalone are failing across multiple models because models or tensors are not placed on the device. Specifically, there are 3 types of issues (see the sketch after the list):

  1. The model is not moved to the device (model.to(cuda) is missing)
  2. The input is not moved to the device (input.to(cuda) is missing)
  3. torch.Tensor.numpy() is called on a tensor that is still on the device (the tensor must first be moved to the CPU, see https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html)

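For illustration, here is a minimal self-contained sketch of all three fixes using a hypothetical toy model (the real tests would move the actual model and its inputs, typically to torch_device from transformers.testing_utils):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-ins for the model and inputs used by the failing tests.
model = torch.nn.Embedding(10, 4)
input_ids = torch.tensor([[1, 2, 3]])

model = model.to(device)          # (1) move the model to the device
input_ids = input_ids.to(device)  # (2) move the inputs to the same device

with torch.no_grad():
    out = model(input_ids)

np_out = out.cpu().numpy()        # (3) move to CPU before calling .numpy()
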
Proposed fix:

CC: @sanchit-gandhi, @amyeroberts

See the following log for the repro command line and the list of errors (the log was captured on an NVidia A10; the XPU log is similar):

$ python3 -m pytest --tb=short \
tests/models/informer/test_modeling_informer.py::InformerModelTest::test_encoder_decoder_model_standalone \
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxGPT2EncoderDecoderModelTest::test_pt_flax_equivalence \
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxBartEncoderDecoderModelTest::test_pt_flax_equivalence \
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxBertEncoderDecoderModelTest::test_pt_flax_equivalence \
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py::ViTBertModelTest::test_pt_flax_equivalence \
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py::CLIPVisionBertModelTest::test_pt_flax_equivalence \
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2GPT2ModelTest::test_pt_flax_equivalence \
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2BartModelTest::test_pt_flax_equivalence \
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2BertModelTest::test_pt_flax_equivalence \
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py::FlaxViTBertModelTest::test_pt_flax_equivalence \
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py::FlaxCLIPVisionBertModelTest::test_pt_flax_equivalence \
tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py::FlaxViT2GPT2EncoderDecoderModelTest::test_pt_flax_equivalence
========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/dvrogozh/git/huggingface/transformers
configfile: pyproject.toml
plugins: hypothesis-6.111.1, subtests-0.13.1, rich-0.1.1, dash-2.17.1, xdist-3.6.1, pspec-0.0.4, timeout-2.3.1
collected 12 items

tests/models/informer/test_modeling_informer.py F                                                                                                                                               [  8%]
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py FFF                                                                                                                          [ 33%]
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py FF                                                                                                              [ 50%]
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py FFF                                                                                                            [ 75%]
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py FF                                                                                                         [ 91%]
tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py F                                                                                                              [100%]

============================================================================================== FAILURES ===============================================================================================
_______________________________________________________________________ InformerModelTest.test_encoder_decoder_model_standalone _______________________________________________________________________
tests/models/informer/test_modeling_informer.py:226: in test_encoder_decoder_model_standalone
    self.model_tester.check_encoder_decoder_model_standalone(*config_and_inputs)
tests/models/informer/test_modeling_informer.py:174: in check_encoder_decoder_model_standalone
    self.parent.assertTrue(torch.equal(model.encoder.embed_positions.weight, embed_positions.weight))
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument other in method wrapper_CUDA__equal)
______________________________________________________________________ FlaxGPT2EncoderDecoderModelTest.test_pt_flax_equivalence _______________________________________________________________________
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:413: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:344: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:303: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/encoder_decoder/modeling_encoder_decoder.py:597: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:1077: in forward
    embedding_output = self.embeddings(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:210: in forward
    inputs_embeds = self.word_embeddings(input_ids)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/sparse.py:190: in forward
    return F.embedding(
../../pytorch/pytorch/torch/nn/functional.py:2551: in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
______________________________________________________________________ FlaxBartEncoderDecoderModelTest.test_pt_flax_equivalence _______________________________________________________________________
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:413: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:344: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:303: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/encoder_decoder/modeling_encoder_decoder.py:597: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:1077: in forward
    embedding_output = self.embeddings(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:210: in forward
    inputs_embeds = self.word_embeddings(input_ids)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/sparse.py:190: in forward
    return F.embedding(
../../pytorch/pytorch/torch/nn/functional.py:2551: in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
---------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------
Config of the decoder: <class 'transformers.models.bart.modeling_bart.BartForCausalLM'> is overwritten by shared decoder config: BartConfig {
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_cross_attention": true,
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 32,
  "decoder_attention_heads": 4,
  "decoder_ffn_dim": 4,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 2,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 4,
  "encoder_ffn_dim": 4,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 2,
  "eos_token_id": 2,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "initializer_range": 0.02,
  "is_decoder": true,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "max_position_embeddings": 32,
  "model_type": "bart",
  "num_hidden_layers": 2,
  "pad_token_id": 1,
  "scale_embedding": false,
  "transformers_version": "4.45.0.dev0",
  "use_cache": false,
  "vocab_size": 99
}

______________________________________________________________________ FlaxBertEncoderDecoderModelTest.test_pt_flax_equivalence _______________________________________________________________________
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:413: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:344: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py:303: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/encoder_decoder/modeling_encoder_decoder.py:597: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:1077: in forward
    embedding_output = self.embeddings(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/bert/modeling_bert.py:210: in forward
    inputs_embeds = self.word_embeddings(input_ids)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/sparse.py:190: in forward
    return F.embedding(
../../pytorch/pytorch/torch/nn/functional.py:2551: in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
______________________________________________________________________________ ViTBertModelTest.test_pt_flax_equivalence ______________________________________________________________________________
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:266: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(vision_config, text_config, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:226: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, **inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:182: in check_pt_flax_equivalence
    flax_inputs = {k: v.numpy() for k, v in pt_inputs.items()}
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:182: in <dictcomp>
    flax_inputs = {k: v.numpy() for k, v in pt_inputs.items()}
E   TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
__________________________________________________________________________ CLIPVisionBertModelTest.test_pt_flax_equivalence ___________________________________________________________________________
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:266: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(vision_config, text_config, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:226: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, **inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:182: in check_pt_flax_equivalence
    flax_inputs = {k: v.numpy() for k, v in pt_inputs.items()}
tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py:182: in <dictcomp>
    flax_inputs = {k: v.numpy() for k, v in pt_inputs.items()}
E   TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
_________________________________________________________________________ FlaxWav2Vec2GPT2ModelTest.test_pt_flax_equivalence __________________________________________________________________________
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:532: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:459: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:418: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py:501: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:1809: in forward
    extract_features = self.feature_extractor(input_values)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:463: in forward
    hidden_states = conv_layer(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:332: in forward
    hidden_states = self.conv(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:375: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:370: in _conv_forward
    return F.conv1d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
_________________________________________________________________________ FlaxWav2Vec2BartModelTest.test_pt_flax_equivalence __________________________________________________________________________
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:532: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:459: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:418: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py:501: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:1809: in forward
    extract_features = self.feature_extractor(input_values)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:463: in forward
    hidden_states = conv_layer(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:332: in forward
    hidden_states = self.conv(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:375: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:370: in _conv_forward
    return F.conv1d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
---------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------
Config of the decoder: <class 'transformers.models.bart.modeling_bart.BartForCausalLM'> is overwritten by shared decoder config: BartConfig {
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_cross_attention": true,
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 24,
  "decoder_attention_heads": 4,
  "decoder_ffn_dim": 4,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 2,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 4,
  "encoder_ffn_dim": 4,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 2,
  "eos_token_id": 2,
  "forced_eos_token_id": 2,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "initializer_range": 0.02,
  "is_decoder": true,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "max_position_embeddings": 32,
  "model_type": "bart",
  "num_hidden_layers": 2,
  "pad_token_id": 1,
  "scale_embedding": false,
  "transformers_version": "4.45.0.dev0",
  "use_cache": false,
  "vocab_size": 99
}

_________________________________________________________________________ FlaxWav2Vec2BertModelTest.test_pt_flax_equivalence __________________________________________________________________________
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:532: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:459: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py:418: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py:501: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:1809: in forward
    extract_features = self.feature_extractor(input_values)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:463: in forward
    hidden_states = conv_layer(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/wav2vec2/modeling_wav2vec2.py:332: in forward
    hidden_states = self.conv(hidden_states)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:375: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:370: in _conv_forward
    return F.conv1d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
____________________________________________________________________________ FlaxViTBertModelTest.test_pt_flax_equivalence ____________________________________________________________________________
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:243: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(vision_config, text_config, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:207: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:166: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py:358: in forward
    vision_outputs = self.vision_model(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:619: in forward
    embedding_output = self.embeddings(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:124: in forward
    embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:183: in forward
    embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:554: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:549: in _conv_forward
    return F.conv2d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
________________________________________________________________________ FlaxCLIPVisionBertModelTest.test_pt_flax_equivalence _________________________________________________________________________
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:243: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(vision_config, text_config, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:207: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py:166: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vision_text_dual_encoder/modeling_vision_text_dual_encoder.py:358: in forward
    vision_outputs = self.vision_model(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/clip/modeling_clip.py:1116: in forward
    return self.vision_model(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/clip/modeling_clip.py:1040: in forward
    hidden_states = self.embeddings(pixel_values)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/clip/modeling_clip.py:202: in forward
    patch_embeds = self.patch_embedding(pixel_values.to(dtype=target_dtype))  # shape = [*, width, grid, grid]
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:554: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:549: in _conv_forward
    return F.conv2d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
____________________________________________________________________ FlaxViT2GPT2EncoderDecoderModelTest.test_pt_flax_equivalence _____________________________________________________________________
tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py:352: in test_pt_flax_equivalence
    self.check_equivalence_pt_to_flax(config, decoder_config, inputs_dict)
tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py:288: in check_equivalence_pt_to_flax
    self.check_pt_flax_equivalence(pt_model, fx_model, inputs_dict)
tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py:247: in check_pt_flax_equivalence
    pt_outputs = pt_model(**pt_inputs).to_tuple()
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py:587: in forward
    encoder_outputs = self.encoder(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:619: in forward
    embedding_output = self.embeddings(
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:124: in forward
    embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/vit/modeling_vit.py:183: in forward
    embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
../../pytorch/pytorch/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../pytorch/pytorch/torch/nn/modules/conv.py:554: in forward
    return self._conv_forward(input, self.weight, self.bias)
../../pytorch/pytorch/torch/nn/modules/conv.py:549: in _conv_forward
    return F.conv2d(
E   RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
========================================================================================== warnings summary ===========================================================================================
../../../pytorch.cuda/lib/python3.10/site-packages/tensorflow/__init__.py:30
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/tensorflow/__init__.py:30: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
    import distutils as _distutils

src/transformers/deepspeed.py:24
  /home/dvrogozh/git/huggingface/transformers/src/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
    warnings.warn(

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:46
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:46: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    v: jnp.DeviceArray,

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:48
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:48: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    inputs: jnp.DeviceArray,

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:49
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:49: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    targets: jnp.DeviceArray,

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:50
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:50: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    ) -> jnp.DeviceArray:

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:72
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:72: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    inputs: jnp.DeviceArray,

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:73
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:73: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    targets: jnp.DeviceArray,

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:74
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:74: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    ) -> jnp.DeviceArray:

../../../pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:97
  /home/dvrogozh/pytorch.cuda/lib/python3.10/site-packages/optax/_src/second_order.py:97: DeprecationWarning: jax.numpy.DeviceArray is deprecated. Use jax.Array.
    ) -> jnp.DeviceArray:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================================= short test summary info =======================================================================================
FAILED tests/models/informer/test_modeling_informer.py::InformerModelTest::test_encoder_decoder_model_standalone - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument other in method wrapper_CUDA__equal)
FAILED tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxGPT2EncoderDecoderModelTest::test_pt_flax_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
FAILED tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxBartEncoderDecoderModelTest::test_pt_flax_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
FAILED tests/models/encoder_decoder/test_modeling_flax_encoder_decoder.py::FlaxBertEncoderDecoderModelTest::test_pt_flax_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
FAILED tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py::ViTBertModelTest::test_pt_flax_equivalence - TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
FAILED tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py::CLIPVisionBertModelTest::test_pt_flax_equivalence - TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
FAILED tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2GPT2ModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
FAILED tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2BartModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
FAILED tests/models/speech_encoder_decoder/test_modeling_flax_speech_encoder_decoder.py::FlaxWav2Vec2BertModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
FAILED tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py::FlaxViTBertModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
FAILED tests/models/vision_text_dual_encoder/test_modeling_flax_vision_text_dual_encoder.py::FlaxCLIPVisionBertModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
FAILED tests/models/vision_encoder_decoder/test_modeling_flax_vision_encoder_decoder.py::FlaxViT2GPT2EncoderDecoderModelTest::test_pt_flax_equivalence - RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
================================================================================== 12 failed, 10 warnings in 19.19s ===================================================================================
