
RWKV - loss.backward() failed #23653

@LetianLee

Description

System Info

  • transformers version: 4.29.2
  • Platform: Linux-5.15.107+-x86_64-with-glibc2.31
  • Python version: 3.10.11
  • Huggingface_hub version: 0.14.1
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.6.9 (gpu)
  • Jax version: 0.4.8
  • JaxLib version: 0.4.7
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Below is the code from the official example: https://huggingface.co/docs/transformers/main/en/model_doc/rwkv#transformers.RwkvForCausalLM
import torch
from transformers import AutoTokenizer, RwkvForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss
  2. I added a single line, loss.backward(), and it failed:
import torch
from transformers import AutoTokenizer, RwkvForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss
loss.backward()
  3. Error message:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-ffc1d58be8b3> in <cell line: 10>()
      8 outputs = model(**inputs, labels=inputs["input_ids"])
      9 loss = outputs.loss
---> 10 loss.backward()

1 frames
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    485                 inputs=inputs,
    486             )
--> 487         torch.autograd.backward(
    488             self, gradient, retain_graph, create_graph, inputs=inputs
    489         )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    198     # some Python versions print out the first line of a multi-line function
    199     # calls in the traceback and some print out the last line
--> 200     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 768]], which is output 0 of AsStridedBackward0, is at version 12; expected version 11 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
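
Following the hint at the end of the traceback, enabling anomaly detection makes PyTorch record forward-pass stack traces, so the backward error should also point at the operation that produced the tensor later modified in place. A minimal sketch, using the same model and inputs as above:

import torch
from transformers import AutoTokenizer, RwkvForCausalLM

# Record forward-pass stack traces; the subsequent backward error will
# additionally print the trace of the op whose output was modified in place.
torch.autograd.set_detect_anomaly(True)

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")
model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()  # still fails, but now names the offending forward op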

Expected behavior

loss.backward() should complete without error and leave gradients on the model parameters.
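
For context, this error is not specific to RWKV: autograd raises it whenever a tensor it saved for the backward pass is mutated in place afterwards, tracked by the per-tensor version counter quoted in the message ("is at version 12; expected version 11"). A minimal, model-independent sketch that triggers the same RuntimeError:

import torch

a = torch.randn(3, requires_grad=True)
b = a.sigmoid()      # sigmoid saves its output b for the backward pass
b.add_(1)            # in-place edit bumps b's version counter
b.sum().backward()   # RuntimeError: ... modified by an inplace operation

This suggests the RWKV forward pass modifies, in place, a tensor that autograd saved for backward (plausibly per-layer state, given the [1, 768] shape in the error), rather than anything being wrong in the calling code.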
