Skip to content

Conversation

@sjakobi
Copy link
Member

@sjakobi sjakobi commented Oct 21, 2025

This reduces the core size for the inner loop of two by 10 terms.

This reduces the core size for the inner loop of two by 10 terms.
@sjakobi
Copy link
Member Author

sjakobi commented Oct 21, 2025

To my surprise, this actually seems to make fromList (which uses unsafeInsert) slower!

Before:

$ fine-grained --stdev 0.6 -p fromList.Int -p '$NF == "1000" || $NF == "10000"'
All
  HashMap.Strict
    fromList
      Int
        1000:  OK
          37.5 μs ± 226 ns
        10000: OK
          540  μs ± 5.5 μs
  HashSet
    fromList
      Int
        1000:  OK
          37.7 μs ± 206 ns
        10000: OK
          546  μs ± 3.5 μs

After:

$ fine-grained --stdev 0.6 -p fromList.Int -p '$NF == "1000" || $NF == "10000"'
All
  HashMap.Strict
    fromList
      Int
        1000:  OK
          38.3 μs ± 426 ns
        10000: OK
          551  μs ± 4.0 μs
  HashSet
    fromList
      Int
        1000:  OK
          39.1 μs ± 325 ns
        10000: OK
          560  μs ± 5.6 μs

(This is with GHC-9.12.2 on a ThinkPad with an Intel i5-8265U CPU)

I don't understand why, yet.

@sjakobi
Copy link
Member Author

sjakobi commented Oct 21, 2025

Core, STG, Cmm and ASM all look slightly improved. I will run the benchmarks on a different machine too.

@treeowl
Copy link
Collaborator

treeowl commented Oct 21, 2025

Interesting. Could you explain how this works? Also, what's the Core look like? I wonder if the new one made branch prediction less reliable or if it's just alignment nonsense.

@sjakobi
Copy link
Member Author

sjakobi commented Oct 22, 2025

So mask h s is just unsafeShiftL 1 (index h s). Therefore, when i1 > i2, we also have bp1 > bp2 and we can avoid recomputing the index.

idx2 is 0 or 1 depending on the comparison, so we can use the result of the comparison directly instead of branching on it.

Here's the Core diff for two.go

 Rec {
--- RHS size: {terms: 105, types: 165, coercions: 0, joins: 0/2}
+-- RHS size: {terms: 95, types: 165, coercions: 0, joins: 0/2}
 $wgo
   :: forall k v s.
      Int#
@@ -11131,18 +11131,18 @@ $wgo
       (t2 :: HashMap k v)
       (eta [OS=OneShot] :: State# s) ->
       let {
-        bp2 :: Word#
+        bp2# :: Word#
         [LclId]
-        bp2
+        bp2#
           = uncheckedShiftL#
               1## (word2Int# (and# (uncheckedShiftRL# ww2 ww) 31##)) } in
       let {
-        bp1 :: Word#
+        bp1# :: Word#
         [LclId]
-        bp1
+        bp1#
           = uncheckedShiftL#
               1## (word2Int# (and# (uncheckedShiftRL# ww1 ww) 31##)) } in
-      case eqWord# bp1 bp2 of {
+      case eqWord# bp1# bp2# of {
         __DEFAULT ->
           case k1 of conrep { __DEFAULT ->
           case newSmallArray#
@@ -11150,20 +11150,12 @@ $wgo
           of
           { (# ipv, ipv1 #) ->
           case writeSmallArray#
-                 @Lifted
-                 @s
-                 @(HashMap k v)
-                 ipv1
-                 (<#
-                    (word2Int# (and# (uncheckedShiftRL# ww1 ww) 31##))
-                    (word2Int# (and# (uncheckedShiftRL# ww2 ww) 31##)))
-                 t2
-                 ipv
+                 @Lifted @s @(HashMap k v) ipv1 (ltWord# bp1# bp2#) t2 ipv
           of s'
           { __DEFAULT ->
           case unsafeFreezeSmallArray# @Lifted @s @(HashMap k v) ipv1 s' of
           { (# ipv2, ipv3 #) ->
-          (# ipv2, or# bp1 bp2, ipv3 #)
+          (# ipv2, or# bp1# bp2#, ipv3 #)
           }
           }
           }
@@ -11177,7 +11169,7 @@ $wgo
           { (# ipv, ipv1 #) ->
           case unsafeFreezeSmallArray# @Lifted @s @(HashMap k v) ipv1 ipv of
           { (# ipv2, ipv3 #) ->
-          (# ipv2, bp1, ipv3 #)
+          (# ipv2, bp1#, ipv3 #)
           }
           }
           }

I wouldn't expect much of a speedup from this change, since the operations avoided are very cheap, but I didn't expect a slow-down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants