Commit faf6b76

Add comparison with SHA2-256 and a spicy TLDR
1 parent 1d6225d commit faf6b76

src/app/blog/hashing-multiple-blobs-with-BLAKE3/page.mdx

Lines changed: 37 additions & 13 deletions
@@ -35,7 +35,9 @@ But what if you have a situation where you don't have enough chunks to work with

 For this exploration, we are going to assume that all blobs have the same size, and that this size is known at compile time.

-So the signature of the function we want to implement is
+<Note>TLDR: This post demonstrates BLAKE3 can be silly fast, even for small blobs</Note>
+
+The signature of the function we want to implement is

 ```rust
 fn hash_many<const N: usize>(slices: &[[u8;N]]) -> Vec<Hash>
@@ -127,9 +129,9 @@ The hazmat API gives you the ability to use the `Hasher` to compute the intermed

 But the API still focuses around the `Hasher`, so it still works only for computing data for *individual* blobs.

-## Extending the public API
+## Using the internal platform API

-So it looks like we have no choice but to dig deeper and see if we can extend the public API.
+So it looks like we have no choice but to dig deeper and see if we can implement this using existing internals.

 What we definitely don't want to touch for this small exploration is the hand-optimized SIMD code. So let's look at the entry point to the SIMD code and check if we can repurpose it to work with multiple blobs.

@@ -295,16 +297,31 @@ hash_many_simd_rayon 1024 bytes, 1048576 blobs: 75.162083ms

 The result is pretty good. We get a factor 17 speedup over the reference implementation, and still a factor 2.1 speedup over just using rayon.

-# A public API?
+Comparing with SHA2-256, we get an improvement of ~2.5 when hashing both sequentially, an improvement of 2.6 if we hash both using rayon, and an improvement of 5.4 if we use SIMD+rayon for BLAKE3 and just rayon for SHA2.
+
+```
+Speedups over SHA2-256:
+sequential: 2.5049265097570244
+rayon: 2.6302891590885866
+rayon+simd: 5.3943491646477755
+```
+
+The improvement will vary a lot between architectures and depending on the chosen small blob size.
+
+# What would a public API look like?

 The fn we have implemented for the benchmarks is very limited. The number of blobs to hash must be a multiple of the platform specific `MAX_SIMD_DEGREE`, the blobs to be hashed must all be the same size, and the size must be a multiple of the `BLOCK_LEN` of 64 bytes.

-We can relax most of these constraints with some extra effort. But having *different sized* small blobs would be a can of worms.
+We can relax most of these constraints with some extra effort.
+
+But having *different sized* small blobs would be a can of worms. It would require changes to the SIMD implementation itself, such as the ability to set the offset per block instead of just having the option to increment or not.

 In addition, at present the API only supports hashing an array of slices in memory. There might be situations where you have an iterator of slices but don't want to collect them into a vec for hashing.

 Also, if you have blobs that are more than 1 chunk but less than simd_degree chunks in size, currently there is no way to hash those using `Platform::hash_many`, so you would have to fall back to sequential hashing.

+Last but not least, requiring the blob size to be known at compile time is limiting.
+
 So I am not sure what a public API for hashing multiple blobs would look like.

 # Try it out
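The constraints described above can be captured in a small precondition check. A sketch with illustrative constants: `BLOCK_LEN = 64` matches the post, while `MAX_SIMD_DEGREE` is platform specific, so the value 4 here is just an example.

```rust
// Preconditions of the benchmark fn, as described in the post.
// MAX_SIMD_DEGREE is platform specific; 4 is an illustrative value.
const MAX_SIMD_DEGREE: usize = 4;
const BLOCK_LEN: usize = 64;

fn is_supported<const N: usize>(slices: &[[u8; N]]) -> bool {
    // The blob count must be a multiple of the SIMD degree, and the
    // (compile-time) blob size a multiple of the 64-byte block length.
    slices.len() % MAX_SIMD_DEGREE == 0 && N > 0 && N % BLOCK_LEN == 0
}

fn main() {
    assert!(is_supported(&vec![[0u8; 1024]; 8])); // 8 blobs of 1 KiB: ok
    assert!(!is_supported(&vec![[0u8; 1024]; 7])); // 7 % 4 != 0
    assert!(!is_supported(&vec![[0u8; 100]; 8])); // 100 % 64 != 0
    println!("ok");
}
```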
@@ -316,18 +333,25 @@ So I am not sure what a public API for hashing multiple blobs would look like.

 Platform: NEON
 rayon threads: 10
-hash_many_baseline 1024 bytes, 1048576 blobs: 1.309129958s
-hash_many_rayon_simple 1024 bytes, 1048576 blobs: 153.760791ms
-hash_many_simd 1024 bytes, 1048576 blobs: 549.289042ms
-hash_many_simd_rayon 1024 bytes, 1048576 blobs: 74.79275ms
+hash_many_baseline 1024 bytes, 1048576 blobs: 1.254154625s
+hash_many_rayon_simple 1024 bytes, 1048576 blobs: 152.511417ms
+hash_many_simd 1024 bytes, 1048576 blobs: 563.662083ms
+hash_many_simd_rayon 1024 bytes, 1048576 blobs: 79.925208ms
+sha2_hash_many_baseline 1024 bytes, 1048576 blobs: 3.270222791s
+sha2_hash_many_rayon 1024 bytes, 1048576 blobs: 403.399834ms

 Speedups over baseline:
-rayon: 8.514068830460165
-simd: 2.383317084268359
-simd+rayon: 17.50343392909072
+rayon: 8.223349108349048
+simd: 2.2250115145673193
+simd+rayon: 15.691602892043772

 Speedups over rayon:
-simd+rayon: 2.055824809222819
+simd+rayon: 1.908176666865853
+
+Speedups over SHA2-256:
+sequential: 2.6075116463410564
+rayon: 2.6450467901691583
+rayon+simd: 5.047216567769207
 ```

 I would be curious what the ratio is on different architectures. Try it out and let me know on X (@klaehnr) or bluesky (@rklaehn.bsky.social).
