reeselevine
Collaborator

This PR adds a few improvements to the host-side setup for WebGPU, which should make it easier to add more operations/improve performance:

  • Creates a pool of parameter buffers which operations can pull from, allowing more in-flight operations.
  • Batches command submissions
  • Some refactoring of the WebGPU setup code to avoid repeating a bunch of code for new operations
  • Uses clang-format to format the WebGPU file, since this wasn't done before (sorry).
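
To illustrate the first two items, here is a minimal host-side sketch of a blocking parameter-buffer pool combined with batched submission. It is not the code from this PR and makes no real WebGPU or Dawn calls: `ParamBuf`, `ParamBufPool`, and `BatchedSubmitter` are hypothetical names, a counter stands in for the queue submission, and buffers are released immediately at flush time, whereas a real backend would presumably release them from a completion callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a GPU parameter buffer (not a real WebGPU type).
struct ParamBuf {
    int id;
};

// Fixed-size pool: acquire() blocks until a buffer is free, release() returns buffers.
class ParamBufPool {
  public:
    explicit ParamBufPool(std::size_t n) : capacity_(n) {
        for (std::size_t i = 0; i < n; ++i) {
            free_.push_back(ParamBuf{ static_cast<int>(i) });
        }
    }

    ParamBuf acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        ParamBuf buf = free_.back();
        free_.pop_back();
        return buf;
    }

    void release(const std::vector<ParamBuf> & bufs) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.insert(free_.end(), bufs.begin(), bufs.end());
        }
        cv_.notify_all();
    }

    std::size_t available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

    std::size_t capacity() const { return capacity_; }

  private:
    std::size_t             capacity_;
    std::vector<ParamBuf>   free_;
    std::mutex              mutex_;
    std::condition_variable cv_;
};

// Stages operations and hands them to the (simulated) queue one batch at a time.
class BatchedSubmitter {
  public:
    BatchedSubmitter(ParamBufPool & pool, std::size_t batch_size) : pool_(pool), batch_size_(batch_size) {
        // A batch larger than the pool would block forever in a single thread.
        assert(batch_size > 0 && batch_size <= pool.capacity());
    }

    // Stage one operation; flush automatically once the batch is full.
    void enqueue(int op_id) {
        staged_bufs_.push_back(pool_.acquire());
        staged_ops_.push_back(op_id);
        if (staged_ops_.size() >= batch_size_) {
            flush();
        }
    }

    // Submit everything staged so far as a single batch.
    void flush() {
        if (staged_ops_.empty()) {
            return;
        }
        ++submit_count_;  // stands in for one queue submission
        ops_submitted_ += staged_ops_.size();
        pool_.release(staged_bufs_);  // simplified: released immediately
        staged_ops_.clear();
        staged_bufs_.clear();
    }

    std::size_t submit_count() const { return submit_count_; }
    std::size_t ops_submitted() const { return ops_submitted_; }

  private:
    ParamBufPool &        pool_;
    std::size_t           batch_size_;
    std::vector<int>      staged_ops_;
    std::vector<ParamBuf> staged_bufs_;
    std::size_t           submit_count_  = 0;
    std::size_t           ops_submitted_ = 0;
};
```

With a pool of four buffers and a batch size of four, enqueueing ten operations triggers two automatic submissions and leaves two operations staged until the next `flush()`. The pool size bounds how many operations can be in flight at once, which is why a larger pool allows more of them.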

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jul 30, 2025
@reeselevine
Collaborator Author

Looks like one test is failing; I'll investigate.

@github-actions github-actions bot added the devops improvements to build systems and github actions label Aug 1, 2025
@reeselevine
Collaborator Author

A couple updates:

  • The CI in Dawn's upstream repository (https://github.com/google/dawn/actions/workflows/ci.yml) has been failing for a few weeks now, which causes issues for the CI here since it relied on pulling their artifacts. So I created my own release of Dawn with the binaries from the last successful build (https://github.com/reeselevine/dawn/releases/tag/v1.0.0) and am now using that in the WebGPU CI here.
  • I'm not 100% sure why the WebGPU Ubuntu CI test was failing in earlier commits on this PR, as I could not reproduce it locally. My best guess is that something was broken in the interaction between Dawn's thread-safe implementation (which they call ImplicitDeviceSynchronization) and the simulated LLVMpipe Vulkan backend used by the CI here. Right now, combining the last successful Dawn build with the updated code in this PR seems stable: the CI runs successfully locally and has succeeded in a few successive commits.

Going forward, I'll roll my own releases of Dawn when necessary, and switch back to the upstream artifacts once the Dawn folks are able to get their CI working again. Otherwise, the code here is ready for review/merging.

@CISC @ggerganov

Member

@ggerganov ggerganov left a comment

Going forward, I'll roll my own releases of Dawn when necessary, and once the Dawn folks are able to get their CI working again.

Sounds good 👍

@reeselevine reeselevine merged commit 587d011 into ggml-org:master Aug 4, 2025
47 checks passed
@CISC
Collaborator

CISC commented Aug 4, 2025

@reeselevine It looks like the CI is stalling after a crash due to missing SET_ROWS support.

@reeselevine
Collaborator Author

@CISC yeah, I noticed that after it was merged 😞. The simplest solution is to disable SET_ROWS for now, and I have a commit ready: reeselevine@ae8edbf

I'm also working on adding support for SET_ROWS. If I get that done today, I'll open a PR with support for it; otherwise, I'll open a quick PR with the fix that disables it.
