GPU OOM during prediction #145

@mitchellxh

Description

Calling TreeOfLifeClassifier.predict keeps every batch's probability tensors in GPU memory until the whole dataset has been processed, so VRAM usage grows with dataset size. With >10,000 images, this exhausts the 32 GB on my RTX 5000 at about 65% of the dataset predicted:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB...

Profiling shows that the create_probabilities_for_images helper keeps probability tensors on the device until the entire dataset finishes.

I fixed this by offloading each batch's probabilities to the CPU with a tensor.detach().cpu() step before proceeding to the next batch. Submitted this as #144.
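
For reference, here is a minimal sketch of the pattern behind the fix. The function signature and variable names are illustrative, not pybioclip's actual implementation:

```python
import torch

def create_probabilities_for_images(model, dataloader, device="cuda"):
    # Illustrative sketch only -- not the library's real helper.
    all_probs = []
    with torch.no_grad():
        for images in dataloader:
            logits = model(images.to(device))
            probs = torch.softmax(logits, dim=-1)
            # Move each batch's probabilities off the GPU immediately.
            # Without this, every batch stays resident in VRAM until the
            # whole dataset finishes, so memory grows linearly with
            # dataset size.
            all_probs.append(probs.detach().cpu())
    return torch.cat(all_probs)
```

Peak VRAM then stays bounded by a single batch's activations rather than the full dataset's probabilities.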
