@Madeeks Madeeks commented Oct 3, 2025

Support for zstd layers was introduced in commit 4d843dc. However, the current implementation assumes that all layers in the image have the same media type as the first one (.layers[0].mediaType).

When importing images with mixed layer types (e.g. some layers compressed with gzip and others with zstd), this assumption causes layer extraction to fail, because layers whose type differs from the first one are not pre-processed properly.

This PR addresses the issue by passing the individual media type for each layer when calling the docker::_download_extract function.

Media types are extracted from the image manifest in the same way as layers' digests.
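
A minimal sketch of what choosing the decompressor per layer could look like, rather than once from .layers[0].mediaType. The function name `layer_decompressor` and the commands it prints are assumptions made for illustration; they are not enroot's actual code.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not enroot's actual code): print the command that
# decompresses one layer to stdout, chosen from that layer's own media type.
layer_decompressor() {
    case "$1" in
        *tar+gzip|*tar.gzip) echo "gzip -d -c" ;;  # OCI and Docker gzip layers
        *tar+zstd)           echo "zstd -d -c" ;;  # OCI zstd layers
        *.tar)               echo "cat" ;;         # uncompressed tar layers
        *) echo "unsupported media type: $1" >&2; return 1 ;;
    esac
}

# Demo: a manifest whose base layers are gzip and whose top layer is zstd.
media_types=(
    "application/vnd.oci.image.layer.v1.tar+gzip"
    "application/vnd.oci.image.layer.v1.tar+gzip"
    "application/vnd.oci.image.layer.v1.tar+zstd"
)
for media_type in "${media_types[@]}"; do
    printf '%s -> %s\n' "${media_type}" "$(layer_decompressor "${media_type}")"
done
```

Resolving the command inside the loop means a zstd layer stacked on gzip base layers is handled by its own type.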

@Madeeks Madeeks force-pushed the fix_mixed_media_types branch from 13ab29f to 27bb2bf on October 3, 2025 16:43

flx42 commented Oct 3, 2025

Do you happen to have an example of a Docker Hub image that has mixed layer types? I couldn't find one when I added the feature.

Madeeks commented Oct 3, 2025

I have some images I created with Podman (set to push layers with zstd compression) on top of Docker Hub CUDA images, which have gzip-compressed layers.
For example, an image providing OpenMPI: quay.io/ethcscs/ompi:5.0.8-ofi1.22-cuda12.8

skopeo inspect --raw docker://quay.io/ethcscs/ompi:5.0.8-ofi1.22-cuda12.8 | jq '.layers[].mediaType'
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+gzip"
"application/vnd.oci.image.layer.v1.tar+zstd"
"application/vnd.oci.image.layer.v1.tar+zstd"
"application/vnd.oci.image.layer.v1.tar+zstd"
"application/vnd.oci.image.layer.v1.tar+zstd"
"application/vnd.oci.image.layer.v1.tar+zstd"

Madeeks commented Oct 3, 2025

If you think it could be useful, I could create a much simpler image with a zstd layer on top of Docker Hub's Ubuntu image (which is made of a gzip layer).

flx42 commented Oct 3, 2025 via email

flx42 commented Oct 7, 2025

So far I'm not able to create a mixed image with podman (compiled from source today). It seems podman always recompresses all layers, even if the base gzip layers already exist in the target registry. Is there a particular trick you had to use during the build/push?

flx42 commented Oct 7, 2025

Ah, I got it: I had to add --force-compression=false to podman push. I can reproduce the problem now!

flx42 commented Oct 7, 2025

I remember now why I assumed all layers have the same type :) It was because I couldn't find mixed images, and also because handling them is a bit more complex than that.

Just above the call to _download_extract, there is the following:

    # Check which digests are already cached.
    printf "%s\n" "${layers[@]}" \
      | sort -u \
      | sort - <(ls "${ENROOT_CACHE_PATH}") \
      | uniq -d \
      | paste -sd '|' - \
      | common::read -r cached_digests

    if [ -n "${cached_digests}" ]; then
        printf "%s\n" "${layers[@]}" \
          | { grep -Ev "${cached_digests}" || :; } \
          | readarray -t missing_digests
    fi

So it filters the list of digests to download, but the media_types array is not filtered accordingly.
You can test this by importing an image, manually removing one of the base layers from the enroot cache, and then importing the image again. The import will succeed, but if you inspect the cache you will see that the layer was re-downloaded as gzip, because it was downloaded with the wrong media type argument.
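
One way to keep the two arrays aligned, sketched under assumptions: the helper `select_missing_layers` and the `missing_media_types` array are hypothetical names, and the cache is assumed to hold one file per layer named after its digest (as the `ls "${ENROOT_CACHE_PATH}"` above suggests). It filters digests and media types in the same pass, so index i always refers to the same layer in both arrays.

```shell
#!/usr/bin/env bash

# Hypothetical helper, not enroot's actual code. It reads the global arrays
# `layers` and `media_types` (same length, same order) and fills
# `missing_digests` and `missing_media_types` with the entries whose digest
# has no file in the cache directory given as $1.
select_missing_layers() {
    local cache_dir="$1" i
    missing_digests=()
    missing_media_types=()
    for i in "${!layers[@]}"; do
        if [ ! -e "${cache_dir}/${layers[i]}" ]; then
            missing_digests+=("${layers[i]}")
            missing_media_types+=("${media_types[i]}")
        fi
    done
}

# Demo with placeholder digests: only "bbb" is already in the cache.
cache_dir="$(mktemp -d)"
layers=(aaa bbb ccc)
media_types=(
    "application/vnd.oci.image.layer.v1.tar+gzip"
    "application/vnd.oci.image.layer.v1.tar+gzip"
    "application/vnd.oci.image.layer.v1.tar+zstd"
)
touch "${cache_dir}/bbb"
select_missing_layers "${cache_dir}"
for i in "${!missing_digests[@]}"; do
    printf '%s %s\n' "${missing_digests[i]}" "${missing_media_types[i]}"
done
rm -rf "${cache_dir}"
```

The download step could then iterate over both filtered arrays by index instead of pairing filtered digests with the unfiltered media type list.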

Madeeks commented Oct 8, 2025

I double-checked the Podman installation I used and, unless I'm missing something, I did not resort to any particular trick beyond setting compression_format=zstd in the [engine] table of the containers.conf file:

compression_format="gzip"

Specifies the compression format to use when pushing an image. Supported values are: gzip, zstd and zstd:chunked. This field is ignored when pushing images to the docker-daemon and docker-archive formats. It is also ignored when the manifest format is set to v2s2. zstd:chunked is incompatible with encrypting images, and will be treated as zstd with a warning in that case.

Regarding accounting for the media types of missing layers from the cache, you are right, I missed that detail.
I have a slightly different implementation on a private fork where this is accounted for implicitly, but with a slightly less elegant approach:

  1. store the image manifest in a bash variable
  2. pass the whole manifest into _download_extract
  3. inside _download_extract, parse the manifest again to extract the media type of the layer to download, so that everything matches.

When I saw your implementation reading both media types and layer digests in a single parsing of the manifest (without carrying the latter around), I liked the approach and tried to adapt to it, but missed the additional implication of resolving media types for non-cached layers.
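
A middle ground between the two approaches, sketched under assumptions: parse the manifest once into a digest-to-media-type map and look each layer up by digest, so filtering the digest list cannot put the two out of step. The names `media_type_of` and `load_media_types` are hypothetical, and the "digest media_type" input lines are assumed to come from a single manifest query (for example a jq filter over .layers[]); the demo uses placeholder digests instead.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.

# Hypothetical map from layer digest to media type (not enroot's actual code).
declare -A media_type_of

# Read "digest media_type" lines from stdin into media_type_of.
# Call it with a redirection, not at the end of a pipeline, so the
# assignments happen in the current shell.
load_media_types() {
    local digest media_type
    while read -r digest media_type; do
        if [ -n "${digest}" ]; then
            media_type_of["${digest}"]="${media_type}"
        fi
    done
}

# Demo with placeholder digests standing in for the manifest entries.
load_media_types <<'EOF'
sha256:aaa application/vnd.oci.image.layer.v1.tar+gzip
sha256:bbb application/vnd.oci.image.layer.v1.tar+gzip
sha256:ccc application/vnd.oci.image.layer.v1.tar+zstd
EOF

# Any subset of digests (e.g. only the non-cached ones) resolves correctly.
for digest in sha256:ccc sha256:aaa; do
    printf '%s %s\n' "${digest}" "${media_type_of[$digest]}"
done
```

The trade-off is carrying one extra variable around, but the lookup no longer depends on array positions.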

flx42 commented Oct 8, 2025

So what do you want to do? You have convinced me that we need to support mixed layer types, but do you want to do it and suggest another patch? Or I can pick that up at a later time.
