
Conversation

@lgritz (Collaborator) commented Sep 29, 2025

The regular CI workflow uses ccache when compiling, plus the GHA cache action to save and then reseed the .ccache directory from run to run, which dramatically speeds up compilation for repeated submissions along a branch or for subsequent pushes to a PR in progress.
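For reference, the pattern looks roughly like this (a simplified sketch; the step names and cache key layout are illustrative, not copied from ci.yml):

```yaml
- name: Restore ccache
  uses: actions/cache@v4
  with:
    path: ./.ccache
    key: ${{ github.job }}-${{ github.sha }}
    restore-keys: ${{ github.job }}-
- name: Build
  env:
    CCACHE_DIR: ${{ github.workspace }}/.ccache
  run: |
    cmake -S . -B build -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
    cmake --build build
```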

The wheel workflow doesn't do this, and is annoyingly slow, but it looks like it would benefit greatly from the same kind of caching. Note that even within a single run, each platform does several builds that differ only by which Python version they use, so roughly 95% of the compilation units are unaffected by the change of Python.

This PR tries to prototype adding GHA caching (just to the Intel Linux jobs, to test it), but dammit, for the life of me, I cannot seem to get ccache itself installed in the container where the wheel is built.

I'm hoping that by submitting this as a draft for everyone to see, somebody will be able to tell me what to do to fix it.

The key is the setting of the CIBW_BEFORE_BUILD env variable, which gives commands that run inside the container before the build. The bottom line is that yum install -y ccache says that no ccache package is available, and I can't imagine why. Does anybody know?
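Concretely, the relevant fragment of wheel.yml amounts to something like this (simplified; the pipx invocation is just a stand-in for however the workflow actually launches cibuildwheel):

```yaml
- name: Build wheels
  run: pipx run cibuildwheel
  env:
    # Runs inside the build container before each wheel is built.
    # This is the command that reports no ccache package being available:
    CIBW_BEFORE_BUILD: yum install -y ccache
```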

@lgritz (Collaborator, Author) commented Sep 29, 2025

@zachlewis You set the wheel building up. Do you have any insight here?

I'll also note that there are a few places in the wheel.yml workflow where you set environment variables, but as far as I understand now, they will NOT be seen inside the container unless they are one of the CIBW ones. I believe (having learned this the hard way while doing this work) that any arbitrary env variables you need to get all the way to the build have to be listed in the CIBW_ENVIRONMENT_PASS variables, or be set directly in the pyproject.toml file.
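Something along these lines, as far as I understand it (the variable name is just an example, and I believe the env-var spelling for the Linux containers is CIBW_ENVIRONMENT_PASS_LINUX):

```yaml
env:
  # Visible on the host runner, but NOT inside the build container by default:
  CMAKE_BUILD_PARALLEL_LEVEL: "4"
  # Either forward it into the Linux container by name ...
  CIBW_ENVIRONMENT_PASS_LINUX: CMAKE_BUILD_PARALLEL_LEVEL
  # ... or set it explicitly for the container:
  CIBW_ENVIRONMENT: CMAKE_BUILD_PARALLEL_LEVEL=4
```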

@lgritz (Collaborator, Author) commented Sep 29, 2025

See this log for the messages about how it can't find ccache.

@zachlewis (Collaborator) commented

Hey Larry! I apologize, I've been very distracted these past several weeks.

I'm really not too certain about what's going wrong with the ccache stuff -- I can look into it.

That said, there might be an easier way. CIBW can incrementally build wheels for each cpython (and pypy) interpreter available for the platform on a single runner / in a single job -- in fact, that's CIBW's default behavior. I've found that the trick to getting CIBW to use previously "autobuilt" dependencies instead of re-auto-building dependencies not found at the system level is setting the env var $SKBUILD_BUILD_DIR to a persistent directory (e.g., /tmp/oiio -- not sure what that might be on Windows, maybe something under wherever $PWD might be). Does this sound like a suitable workaround for ccache?
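Roughly like this, if that helps (an untested sketch; /tmp/oiio is just the example path from above):

```yaml
env:
  # Persistent scikit-build-core build tree inside the container, so the deps
  # auto-built for the first wheel get reused by the later ones.
  CIBW_ENVIRONMENT: SKBUILD_BUILD_DIR=/tmp/oiio
```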

Two things to note:

  • Our dependency-autobuildy stuff attempts to move any built dynamic libraries to a temporary build-specific location, which confuses subsequent incremental builds (because previously-built CMake configs and find modules and whathaveyou will still exist under $SKBUILD_BUILD_DIR/deps/dist even though the actual libraries no longer do, and CMake will understandably fail to realize what happened). Not too much of an issue for us, because we're currently statically building all dependencies, but this will be a problem when we start linking + bundling GPL-licensed dynamic libraries...
  • This incremental building is what elicits the DEFLATE::DEFLATE not found errors I've been seeing with trying to link a previously-autobuilt libtiff -- future builds don't seem to have any awareness of what "DEFLATE" means. My workaround is to force libtiff to always autobuild...

> I'll also note that there are a few places in the wheel.yml workflow where you set environment variables, but as far as I understand now, they will NOT be seen inside the container unless they are one of the CIBW ones. I believe (having learned this the hard way while doing this work) that any arbitrary env variables you need to get all the way to the build have to be listed in the CIBW_ENVIRONMENT_PASS variables, or be set directly in the pyproject.toml file.

Hmm... that's interesting. Maybe environment variables prefixed with CIBW_ still pass through, because setting the CIBW_BUILD and CIBW_ARCHS environment variables like we do is the only reason each cibuildwheel task builds for a single architecture / interpreter at a time. As I said, the default CIBW behavior is to build for all Python interpreters and interpreter versions available to the host platform. That said, I had previously set USE_Libheif to OFF for the Intel mac runners to prevent them from linking the system libheif (which links the system libpng...). IIRC, the logs indicated that libheif was correctly being disabled, but maybe I'm missing something. I'll see if I can get some clarity here...
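For context, the per-job restriction is along these lines (values are illustrative); dropping it would restore CIBW's default of building every supported interpreter in one job:

```yaml
env:
  # With these set, each job builds one interpreter/arch combination;
  # left unset, cibuildwheel builds every supported CPython in a single job.
  CIBW_BUILD: cp312-manylinux_x86_64
  CIBW_ARCHS: x86_64
```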

@lgritz (Collaborator, Author) commented Sep 29, 2025

> That said, there might be an easier way. CIBW can incrementally build wheels for each cpython (and pypy) interpreter available for the platform on a single runner / in a single job -- in fact, that's CIBW's default behavior.

Oh, that's fantastic news. I had no idea!

It's not just the auto-built dependencies that can be amortized. The vast majority of OIIO itself has nothing to do with which Python version is used, and will be virtually free to compile (after the first variant) if ccache is operating. It's really only the few cpp files that constitute the python bindings that are different for each wheel of a given platform.

Ideally, we would have only one "job" for each platform, which within the job would separately and sequentially build the variants for each python version. As long as ccache is installed, they should absolutely FLY.
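Something like this is what I have in mind (a rough sketch with illustrative Python versions, assuming we can get ccache installed and its cache persisted -- which are exactly the open questions):

```yaml
env:
  # One Linux x86_64 job building all the CPython variants back to back;
  # after the first variant, ccache should satisfy nearly every compile
  # outside the Python binding sources.
  CIBW_BUILD: "cp39-manylinux_x86_64 cp310-manylinux_x86_64 cp311-manylinux_x86_64 cp312-manylinux_x86_64"
  CIBW_ENVIRONMENT: CMAKE_CXX_COMPILER_LAUNCHER=ccache
```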

@lgritz (Collaborator, Author) commented Sep 29, 2025

> Does this sound like a suitable workaround for ccache?

Not at all! For any of the scenarios we're discussing, ccache is going to be responsible for achieving the majority of the savings.

Making huge improvements to wheel building speed with very little work is currently held up only by the following question:

  • Why is yum install ccache in the container not installing ccache, when it seems to work fine on a bare runner?

@lgritz (Collaborator, Author) commented Sep 29, 2025

I might download and build ccache from source, if I have to, just to prove out how much savings we would get. But I had really hoped to simply install a pre-built version via the package manager.

@zachlewis (Collaborator) commented

Understood! I'll see what I can dig up.
Maybe the linux CIBW images use dnf instead of yum or something?

@lgritz (Collaborator, Author) commented Sep 29, 2025

> Understood! I'll see what I can dig up. Maybe the linux CIBW images use dnf instead of yum or something?

Tried that already.

@lgritz (Collaborator, Author) commented Sep 29, 2025

I suspect that the problem is something like an unusually limited set of remote repos being known to yum as set up in the container, with ccache not being in any of them, and some other repo needing to be enabled before ccache can be found. That's the kind of thing I'm expecting it to turn out to be.
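One guess (untested): if the image is EL-based, ccache may live in the EPEL repo rather than the base repos, in which case something along these lines in CIBW_BEFORE_BUILD might do it:

```yaml
env:
  # Guess: ccache is in EPEL on the EL-based manylinux images, so that repo
  # has to be enabled first (dnf on newer images, yum as a fallback on older ones).
  CIBW_BEFORE_BUILD: >
    (dnf install -y epel-release && dnf install -y ccache) ||
    (yum install -y epel-release && yum install -y ccache)
```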

@zachlewis (Collaborator) commented

Hmm. Yeah, that's very strange.
Maybe for now, we can just download the binary archive from https://github.com/ccache/ccache/releases/download/v4.12/ccache-4.12-linux-x86_64.tar.xz (or ccache-4.12-linux_aarch64.tar.xz, etc.) and dump it into /usr/bin, if that's easier.
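Something like this, maybe (untested; it assumes the archive's top-level directory matches its name and that the image's tar can handle .xz):

```yaml
env:
  # Drop the prebuilt ccache release into the container's PATH
  # (x86_64 shown; the aarch64 job would use the other tarball).
  CIBW_BEFORE_BUILD: >
    curl -fsSL -o /tmp/ccache.tar.xz
    https://github.com/ccache/ccache/releases/download/v4.12/ccache-4.12-linux-x86_64.tar.xz &&
    tar -xJf /tmp/ccache.tar.xz -C /tmp &&
    cp /tmp/ccache-4.12-linux-x86_64/ccache /usr/bin/ccache
```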

@lgritz (Collaborator, Author) commented Sep 29, 2025

I will try that!

@lgritz (Collaborator, Author) commented Sep 30, 2025

OK, I tried both downloading the ccache binaries and building ccache from source. I can make either of those "work" in the sense of getting ccache into the container and then using it for the compilation.

BUT

It's deceptively difficult to make this work in practice!

For our usual CI run, the entire CI job, including each individual Actions step, runs in the container image you've chosen (for example, one of the aswf-docker images). In other words, you're doing the compilation (using the ccache cache) in the same container as the steps that save the .ccache directory to, or restore it from, GHA's cache.

But in the wheel workflow, the GHA steps run on the bare Ubuntu runner, while the build itself happens inside a container set up by the single "cibuildwheel" action.

So the long and short of it is, I haven't identified any directory for the ccache files that exists "outside the build container" and thus would leave them visible to the subsequent cache-save step!
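One direction I may try, assuming (as I recall from the cibuildwheel docs, though I still need to verify it) that the host filesystem is bind-mounted at /host inside the Linux build container, so the container-side CCACHE_DIR can point back at a host path that the cache-save step can see:

```yaml
- name: Restore ccache
  uses: actions/cache@v4
  with:
    path: ${{ github.workspace }}/.ccache
    key: wheels-linux-${{ github.sha }}
    restore-keys: wheels-linux-
- name: Build wheels
  run: pipx run cibuildwheel
  env:
    # Assumption: the host filesystem is visible at /host inside the Linux
    # build container, so ccache writes straight back to the host workspace,
    # where the post-job cache-save step can see it.
    CIBW_ENVIRONMENT: >
      CCACHE_DIR=/host${{ github.workspace }}/.ccache
      CMAKE_CXX_COMPILER_LAUNCHER=ccache
```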

Still plugging away at it...

@zachlewis (Collaborator) commented

FWIW, I think cibuildwheel ultimately copies anything under the container's /output directory to the host's $PWD/wheelhouse directory, where $PWD is the directory in which the cibuildwheel command is executed (e.g., the git repo root). This is the mechanism by which the various *.whl files pooped out by the cibuildwheel process end up on the host's storage.

@lgritz (Collaborator, Author) commented Sep 30, 2025

Ah, ok, maybe there is a way...
