Incorrect segmentation results on float input in 4.31.0 #25195

### System Info

python-3.9.10
transformers-4.31.0
pytorch-2.0.1

### Who can help?

@amyeroberts

### Information

- [X] The official example scripts
- [X] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

The following example (based on the examples from the docs) gives consistent results with transformers-4.27.2 regardless of whether `image` is kept as `uint8` or converted to `float32`. With 4.31.0, however, the result is wrong when the `float32` input is used:

```python
import torch
import numpy as np

from transformers import AutoImageProcessor, UperNetForSemanticSegmentation
from PIL import Image
from huggingface_hub import hf_hub_download

image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-tiny")
model = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-tiny")

filepath = hf_hub_download(
    repo_id="hf-internal-testing/fixtures_ade20k", filename="ADE_val_00000001.jpg", repo_type="dataset"
)

image = Image.open(filepath).convert("RGB")
image = np.array(image)

# Comment out the line below to get the correct result in 4.31.0
image = image.astype(np.float32)/255.0

inputs = image_processor(images=image, return_tensors="pt").pixel_values
outputs = model(inputs)
sizes = [np.array(image).shape[:2]]
seg = torch.stack(image_processor.post_process_semantic_segmentation(outputs, target_sizes=sizes))
print(torch.unique(seg))  # class indices present in the predicted segmentation map
```



### Expected behavior

Expected result (observed behaviour in 4.27.2, regardless of whether the float conversion is commented out):
```
tensor([ 0,  1,  2,  4,  6,  9, 17, 25, 52, 53])
```

Actual result in 4.31.0 (unless the float conversion is commented out and the input image is kept as `uint8`):
```
tensor([2])
```
(i.e., the whole image is classified as a single class)
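
Since the `uint8` input still produces the expected classes in 4.31.0, one possible stopgap is to convert a float image back to `uint8` before passing it to the image processor. The sketch below is only an illustration under that assumption: `float01_to_uint8` is a hypothetical helper name (not part of transformers), and it assumes the float image holds values in the range [0, 1], as in the reproduction above.

```python
import numpy as np


def float01_to_uint8(image: np.ndarray) -> np.ndarray:
    """Convert a float image with values in [0, 1] to uint8 in [0, 255].

    Hypothetical helper for illustration; assumes the [0, 1] value range.
    """
    if image.dtype == np.uint8:
        return image  # already in the format that works in 4.31.0
    # Clip to guard against values slightly outside [0, 1], then round
    # to the nearest integer level before casting.
    scaled = np.rint(np.clip(image, 0.0, 1.0) * 255.0)
    return scaled.astype(np.uint8)


# Round-trip check on a small synthetic image (a stand-in for the ADE20k sample).
original = np.arange(256, dtype=np.uint8).reshape(16, 16, 1).repeat(3, axis=2)
as_float = original.astype(np.float32) / 255.0
restored = float01_to_uint8(as_float)
assert np.array_equal(restored, original)
```

In the reproduction script, this would amount to calling `image = float01_to_uint8(image)` just before `image_processor(images=image, return_tensors="pt")`. It sidesteps the changed behaviour rather than explaining it.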
