This repository was archived by the owner on Aug 15, 2025. It is now read-only.


@juliagmt-google (Contributor) commented on Apr 4, 2024

Add a file to print a statement.
strategy:
  matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
container:
  image: ghcr.io/pytorch/pytorch:2.2.2-cuda${{ matrix.cuda }}-cudnn${{ matrix.cudnn_version }}-${{ matrix.image_type }}
Contributor

Please change image to matrix.docker, which should be available now that this PR has been merged: pytorch/test-infra#5081
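A minimal sketch of the reviewer's suggestion, assuming each entry of the generated matrix carries a full image reference under a docker key. The generate-matrix job here is a hypothetical stand-in that writes a fixed one-entry JSON matrix (its image reference is the nightly tag that shows up in the pull log later in this thread); the real matrix is produced elsewhere. The job name print-statement is also hypothetical.

```yaml
jobs:
  # Hypothetical stand-in for the real matrix generator: writes a fixed
  # JSON matrix to the job output named "matrix".
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - id: set-matrix
        run: |
          echo 'matrix={"include":[{"docker":"ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel"}]}' >> "$GITHUB_OUTPUT"

  # Hypothetical job name; runs once per matrix entry inside that entry's image.
  print-statement:
    needs: generate-matrix
    runs-on: ubuntu-latest
    strategy:
      # fromJson turns the string output back into a matrix object.
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    container:
      # Take the full image reference from the matrix instead of assembling
      # a tag from cuda, cudnn_version and image_type in this job.
      image: ${{ matrix.docker }}
    steps:
      - run: echo "Hello from ${{ matrix.docker }}"
```

Reading the whole reference from the matrix keeps the tag naming scheme in one place (the generator), so consuming jobs do not need to change when the scheme does.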

Contributor Author

Updated; the workflow partially succeeded: https://github.com/juliagmt-google/builder/actions/runs/8635045506/job/23672552152

The failed job ran out of disk space while pulling the image:
failed to register layer: write /opt/conda/lib/python3.10/test/support/__init__.py: no space left on device
Warning: Docker pull failed with exit code 1, back off 3.699 seconds before retry.
/usr/bin/docker --config /home/runner/work/_temp/.docker_ff254e09-5ee1-4d53-9f39-52c9c2a6a945 pull ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel
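One possible workaround, sketched under assumptions. A job-level container: image is pulled while the job initializes its containers, before any step runs, so a step in the same job cannot free space ahead of that pull. The fragment below (a job under jobs:) instead frees space in a step and then pulls and runs the image explicitly. The removed directories are large preinstalled toolchains commonly found on GitHub-hosted Ubuntu runners; treat that list as an assumption and confirm with df -h on your runner. The torch import is a hypothetical smoke test, and matrix.docker assumes the matrix shape suggested in review above.

```yaml
run-cpu-tests:
  needs: generate-matrix
  runs-on: ubuntu-latest
  strategy:
    matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
  # No job-level container: here, so the pull happens after the cleanup step.
  steps:
    - name: Free disk space before pulling the image
      run: |
        # Assumed cleanup targets: large preinstalled SDKs on hosted Ubuntu runners.
        sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
        df -h /
    - name: Pull and smoke-test the image
      run: |
        docker pull "${{ matrix.docker }}"
        docker run --rm "${{ matrix.docker }}" python -c "import torch; print(torch.__version__)"
```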

Contributor Author

@juliagmt-google left a comment

Updated the code and triggered the workflow run.

@juliagmt-google
Contributor Author

Added run-cpu-tests and run-gpu-tests jobs to validate the Docker images; tested in https://github.com/juliagmt-google/builder/actions/runs/8636731822:

  • run-cpu-tests: 3/4 passed; 1/4 failed in the local test run due to insufficient disk space

  • run-gpu-tests: 4/4 failed because the local test run has no permission to use the linux.g5.4xlarge.nvidia.gpu runner;
    error: Called workflows cannot be queued onto self-hosted runners across organizations/enterprises. Failed to queue this job. Labels: 'linux.g5.4xlarge.nvidia.gpu'.
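A sketch of one way to keep the GPU job from failing when the workflow runs outside the pytorch organization, assuming the linux.g5.4xlarge.nvidia.gpu self-hosted runners are only reachable from pytorch-owned repositories. The real job appears to go through a called (reusable) workflow, which is what produces the cross-organization error; for brevity this fragment targets the runner label directly with runs-on, and the same job-level if condition can be placed on a job that uses a reusable workflow. The CUDA check is a hypothetical smoke test, and options: --gpus all assumes the runner has the NVIDIA container toolkit installed.

```yaml
run-gpu-tests:
  needs: generate-matrix
  # Self-hosted runners in another organization cannot be used from a
  # personal fork, so only run this job in pytorch-owned repositories.
  if: github.repository_owner == 'pytorch'
  runs-on: linux.g5.4xlarge.nvidia.gpu
  strategy:
    matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
  container:
    image: ${{ matrix.docker }}
    # Assumes the NVIDIA container toolkit is available on the runner.
    options: --gpus all
  steps:
    - name: Check that CUDA is visible inside the image
      run: python -c "import torch; assert torch.cuda.is_available()"
```

When the condition is false the job is reported as skipped rather than failed, so a fork's run can stay green while runs in the pytorch repository still exercise the GPU images.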

@atalman merged commit e7948ec into pytorch:main on Apr 10, 2024
