Conversation

jonathanc-n
Contributor

Which issue does this PR close?

Rationale for this change

We can use u32 indices instead of u64 indices when building the hash map if the build side has fewer than u32::MAX rows. This acts as a memory optimization.
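As a rough illustration of the saving, the sketch below computes the bytes needed for one index slot per build-side row at each width. The helper name `index_bytes` is hypothetical and not part of the PR; it only counts the index storage and ignores the hash table itself.

```rust
use std::mem::size_of;

/// Bytes needed to store one index slot of type `T` per build-side row.
/// Hypothetical helper for illustration; it ignores allocator overhead
/// and the hash table that sits in front of the index storage.
pub fn index_bytes<T>(num_rows: usize) -> usize {
    num_rows * size_of::<T>()
}

/// Bytes saved by storing `u32` slots instead of `u64` slots.
pub fn bytes_saved(num_rows: usize) -> usize {
    index_bytes::<u64>(num_rows) - index_bytes::<u32>(num_rows)
}
```

For example, `bytes_saved(10_000_000)` is 40,000,000 bytes, since each slot shrinks from 8 bytes to 4.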

What changes are included in this PR?

In HashJoinExec, we construct the JoinLeftData with a Box<dyn JoinHashMapType>, choosing between a JoinHashMap with u32 indices and one with u64 indices.

I changed JoinHashMapType to hold update_from_iter, get_matched_indices, and get_matched_indices_with_limit_offset, and split JoinHashMap into JoinHashMapU32 and JoinHashMapU64.

I deliberately did not expose a generic parameter on the trait or on the JoinHashMap struct. Doing so would force every calling function to be generic as well, which is not possible because we determine the JoinHashMapType at runtime.
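A minimal sketch of that design choice follows. It is not the PR's code: `IndexChain`, `ChainU32`, `ChainU64`, and `new_chain` are hypothetical stand-ins for `JoinHashMapType`, `JoinHashMapU32`, `JoinHashMapU64`, and the construction logic in HashJoinExec. The point it shows is that a non-generic trait lets the index width be picked from the row count at runtime, while callers only ever see a boxed trait object.

```rust
use std::mem::size_of;

/// Hypothetical, simplified stand-in for the PR's `JoinHashMapType` trait.
/// The trait has no type parameter, so callers never need to be generic.
pub trait IndexChain: Send + Sync {
    /// Stores `row` in slot `slot`.
    fn set(&mut self, slot: usize, row: u64);
    /// Reads the value stored in slot `slot`, widened to `u64`.
    fn get(&self, slot: usize) -> u64;
    /// Width in bytes of one stored index.
    fn index_width_bytes(&self) -> usize;
}

/// Variant that stores 32-bit indices.
pub struct ChainU32 {
    next: Vec<u32>,
}

/// Variant that stores 64-bit indices.
pub struct ChainU64 {
    next: Vec<u64>,
}

impl IndexChain for ChainU32 {
    fn set(&mut self, slot: usize, row: u64) {
        // Panics if the caller picked the narrow variant for too many rows.
        self.next[slot] = u32::try_from(row).expect("row index must fit in u32");
    }
    fn get(&self, slot: usize) -> u64 {
        u64::from(self.next[slot])
    }
    fn index_width_bytes(&self) -> usize {
        size_of::<u32>()
    }
}

impl IndexChain for ChainU64 {
    fn set(&mut self, slot: usize, row: u64) {
        self.next[slot] = row;
    }
    fn get(&self, slot: usize) -> u64 {
        self.next[slot]
    }
    fn index_width_bytes(&self) -> usize {
        size_of::<u64>()
    }
}

/// Picks the index width at runtime from the build-side row count.
/// The threshold mirrors the PR description (fewer than `u32::MAX` rows).
pub fn new_chain(num_rows: usize) -> Box<dyn IndexChain> {
    if num_rows < u32::MAX as usize {
        Box::new(ChainU32 { next: vec![0; num_rows] })
    } else {
        Box::new(ChainU64 { next: vec![0; num_rows] })
    }
}
```

With this shape, `new_chain(8)` returns the 32-bit variant, and code that holds the `Box<dyn IndexChain>` calls `set` and `get` without knowing which width was chosen. The cost is one dynamic dispatch per call, which is the trade-off the description above accepts.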

Are these changes tested?

Yes, I added a test that checks the hash map created using u32 indices.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 18, 2025
@jonathanc-n
Contributor Author

cc @Dandandan

@alamb
Contributor

alamb commented Jun 18, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing support-u32-hashmap (732fa21) to 056f546 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 18, 2025

🤖: Benchmark completed

Details

Comparing HEAD and support-u32-hashmap
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  1891.38 ms │          1886.58 ms │     no change │
│ QQuery 1     │   697.56 ms │           704.74 ms │     no change │
│ QQuery 2     │  1381.91 ms │          1361.41 ms │     no change │
│ QQuery 3     │   669.87 ms │           634.92 ms │ +1.06x faster │
│ QQuery 4     │  1327.15 ms │          1339.32 ms │     no change │
│ QQuery 5     │ 14912.55 ms │         14887.37 ms │     no change │
│ QQuery 6     │  2044.07 ms │          2059.57 ms │     no change │
│ QQuery 7     │  1784.47 ms │          1873.16 ms │     no change │
│ QQuery 8     │   797.99 ms │           802.46 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 25506.95ms │
│ Total Time (support-u32-hashmap)   │ 25549.53ms │
│ Average Time (HEAD)                │  2834.11ms │
│ Average Time (support-u32-hashmap) │  2838.84ms │
│ Queries Faster                     │          1 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │          8 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │    15.40 ms │            15.56 ms │    no change │
│ QQuery 1     │    33.85 ms │            33.67 ms │    no change │
│ QQuery 2     │    81.80 ms │            80.27 ms │    no change │
│ QQuery 3     │    94.51 ms │            98.91 ms │    no change │
│ QQuery 4     │   580.34 ms │           587.97 ms │    no change │
│ QQuery 5     │   818.51 ms │           852.27 ms │    no change │
│ QQuery 6     │    22.18 ms │            23.65 ms │ 1.07x slower │
│ QQuery 7     │    36.35 ms │            36.60 ms │    no change │
│ QQuery 8     │   840.07 ms │           846.70 ms │    no change │
│ QQuery 9     │  1163.06 ms │          1126.59 ms │    no change │
│ QQuery 10    │   256.44 ms │           253.13 ms │    no change │
│ QQuery 11    │   284.27 ms │           275.93 ms │    no change │
│ QQuery 12    │   855.55 ms │           892.86 ms │    no change │
│ QQuery 13    │  1240.05 ms │          1256.85 ms │    no change │
│ QQuery 14    │   788.52 ms │           799.55 ms │    no change │
│ QQuery 15    │   756.94 ms │           759.38 ms │    no change │
│ QQuery 16    │  1588.12 ms │          1573.61 ms │    no change │
│ QQuery 17    │  1596.56 ms │          1595.53 ms │    no change │
│ QQuery 18    │  2844.62 ms │          2904.45 ms │    no change │
│ QQuery 19    │    82.55 ms │            85.96 ms │    no change │
│ QQuery 20    │  1133.70 ms │          1187.01 ms │    no change │
│ QQuery 21    │  1273.35 ms │          1310.45 ms │    no change │
│ QQuery 22    │  2112.94 ms │          2171.74 ms │    no change │
│ QQuery 23    │  7356.55 ms │          7416.88 ms │    no change │
│ QQuery 24    │   427.69 ms │           444.19 ms │    no change │
│ QQuery 25    │   299.39 ms │           302.95 ms │    no change │
│ QQuery 26    │   433.22 ms │           449.51 ms │    no change │
│ QQuery 27    │  1555.11 ms │          1553.18 ms │    no change │
│ QQuery 28    │ 11668.95 ms │         11887.97 ms │    no change │
│ QQuery 29    │   519.95 ms │           508.95 ms │    no change │
│ QQuery 30    │   768.44 ms │           778.51 ms │    no change │
│ QQuery 31    │   799.04 ms │           818.27 ms │    no change │
│ QQuery 32    │  2409.37 ms │          2374.14 ms │    no change │
│ QQuery 33    │  3138.93 ms │          3153.14 ms │    no change │
│ QQuery 34    │  3162.98 ms │          3160.19 ms │    no change │
│ QQuery 35    │  1212.81 ms │          1231.29 ms │    no change │
│ QQuery 36    │   123.25 ms │           125.52 ms │    no change │
│ QQuery 37    │    57.19 ms │            57.98 ms │    no change │
│ QQuery 38    │   127.02 ms │           124.89 ms │    no change │
│ QQuery 39    │   195.28 ms │           197.94 ms │    no change │
│ QQuery 40    │    47.18 ms │            48.68 ms │    no change │
│ QQuery 41    │    45.15 ms │            42.97 ms │    no change │
│ QQuery 42    │    38.79 ms │            39.72 ms │    no change │
└──────────────┴─────────────┴─────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 52886.01ms │
│ Total Time (support-u32-hashmap)   │ 53485.49ms │
│ Average Time (HEAD)                │  1229.91ms │
│ Average Time (support-u32-hashmap) │  1243.85ms │
│ Queries Faster                     │          0 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         42 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ support-u32-hashmap ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 100.68 ms │            99.40 ms │ no change │
│ QQuery 2     │  21.39 ms │            21.69 ms │ no change │
│ QQuery 3     │  32.64 ms │            32.28 ms │ no change │
│ QQuery 4     │  18.13 ms │            18.58 ms │ no change │
│ QQuery 5     │  48.89 ms │            49.42 ms │ no change │
│ QQuery 6     │  11.87 ms │            11.86 ms │ no change │
│ QQuery 7     │  87.37 ms │            83.45 ms │ no change │
│ QQuery 8     │  23.84 ms │            23.90 ms │ no change │
│ QQuery 9     │  53.64 ms │            53.53 ms │ no change │
│ QQuery 10    │  42.80 ms │            42.86 ms │ no change │
│ QQuery 11    │  11.21 ms │            11.22 ms │ no change │
│ QQuery 12    │  34.76 ms │            34.72 ms │ no change │
│ QQuery 13    │  26.15 ms │            26.24 ms │ no change │
│ QQuery 14    │   9.82 ms │             9.88 ms │ no change │
│ QQuery 15    │  19.41 ms │            19.59 ms │ no change │
│ QQuery 16    │  18.81 ms │            18.85 ms │ no change │
│ QQuery 17    │  96.08 ms │            94.52 ms │ no change │
│ QQuery 18    │ 193.90 ms │           189.03 ms │ no change │
│ QQuery 19    │  25.54 ms │            25.12 ms │ no change │
│ QQuery 20    │  33.31 ms │            31.69 ms │ no change │
│ QQuery 21    │ 148.25 ms │           146.30 ms │ no change │
│ QQuery 22    │  15.18 ms │            15.31 ms │ no change │
└──────────────┴───────────┴─────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1073.66ms │
│ Total Time (support-u32-hashmap)   │ 1059.45ms │
│ Average Time (HEAD)                │   48.80ms │
│ Average Time (support-u32-hashmap) │   48.16ms │
│ Queries Faster                     │         0 │
│ Queries Slower                     │         0 │
│ Queries with No Change             │        22 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@jonathanc-n
Contributor Author

Those benchmarks make sense; the change just saves memory.

@alamb alamb requested review from Dandandan and Copilot June 19, 2025 11:32
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds support for using 32-bit row indices in the hash join executor when the build side has fewer than u32::MAX rows to save memory.

  • Introduce JoinHashMapType trait with JoinHashMapU32 and JoinHashMapU64 implementations
  • Change all callers and constructors of the old JoinHashMap to use Box<dyn JoinHashMapType>
  • Update memory estimation to pick u32 or u64 variant based on row count and add corresponding tests

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| joins/utils.rs | Remove obsolete JoinHashMap export, keep only JoinHashMapType |
| joins/symmetric_hash_join.rs | Update get_matched_indices calls to accept boxed iterator |
| joins/stream_join_utils.rs | Implement JoinHashMapType for PruningJoinHashMap using new helpers |
| joins/join_hash_map.rs | Define JoinHashMapType, JoinHashMapU32, JoinHashMapU64, and generic helpers |
| joins/hash_join.rs | Construct boxed hash map variant, update memory estimation, and adjust tests |

Comments suppressed due to low confidence (3)

datafusion/physical-plan/src/joins/join_hash_map.rs:38

  • [nitpick] Clarify this comment to indicate that the chained list uses either Vec<u32> or Vec<u64> rather than both. E.g., “stored as either Vec<u32> or Vec<u64> based on size requirements.”
/// The indices (values) are stored in a separate chained list stored as `Vec<u32>` `Vec<u64>`.
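
For readers unfamiliar with the "separate chained list" that comment refers to, here is a simplified sketch of the idea with u32 slots. It is an assumption-laden illustration, not the code in join_hash_map.rs: `ChainedMap` and its methods are hypothetical names, and it uses a plain `std::collections::HashMap` keyed by hash value. Each slot holds `row index + 1`, with 0 marking the end of a chain.

```rust
use std::collections::HashMap;

/// Simplified chained index list (hypothetical, for illustration only).
/// Stored values are `row index + 1`; 0 marks the end of a chain.
pub struct ChainedMap {
    /// hash value -> (most recently inserted row index + 1)
    heads: HashMap<u64, u32>,
    /// next[row] = (previous row with the same hash) + 1, or 0 if none
    next: Vec<u32>,
}

impl ChainedMap {
    pub fn new(num_rows: usize) -> Self {
        Self {
            heads: HashMap::new(),
            next: vec![0; num_rows],
        }
    }

    /// Registers `row` under `hash`. `row` must be below `u32::MAX`
    /// so that `row + 1` does not overflow.
    pub fn insert(&mut self, hash: u64, row: u32) {
        // The new row becomes the head; the old head (if any) becomes its successor.
        let prev = self.heads.insert(hash, row + 1).unwrap_or(0);
        self.next[row as usize] = prev;
    }

    /// Returns all rows registered under `hash`, newest first.
    pub fn rows_for(&self, hash: u64) -> Vec<u32> {
        let mut out = Vec::new();
        let mut cur = self.heads.get(&hash).copied().unwrap_or(0);
        while cur != 0 {
            let row = cur - 1;
            out.push(row);
            cur = self.next[row as usize];
        }
        out
    }
}
```

Because the chain needs exactly one slot per build-side row, halving the slot width halves the size of `next`, which is where the memory saving discussed in this PR comes from.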

datafusion/physical-plan/src/joins/hash_join.rs:3581

  • [nitpick] The test name now differs from the u32 variant (which uses collisions plural). For consistency, consider renaming both tests to match a common pattern, e.g., join_with_hash_collision_u32 and join_with_hash_collision_u64.
    fn join_with_hash_collisions_u64() -> Result<()> {

datafusion/physical-plan/src/joins/join_hash_map.rs:93

  • [nitpick] Add a doc comment to JoinHashMapType explaining its purpose and when each method should be used; this will help maintainers understand the runtime‐selected index strategy.
pub trait JoinHashMapType: Send + Sync {

@jonathanc-n
Contributor Author

@Dandandan Would you be able to take a look? thanks!

Contributor

@Dandandan Dandandan left a comment

LGTM. @jonathanc-n

I am wondering if we get some improvements on larger joins (i.e. when the indices don't fit in the CPU cache).

@jonathanc-n
Contributor Author

@Dandandan I can see if I can run some benchmarks.

@alamb this should be good to go. I'll see if I can implement some much-needed hash join spilling after this gets merged 🚀

@alamb
Contributor

alamb commented Jul 8, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing support-u32-hashmap (6c1543d) to ebb8e95 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 8, 2025

🤖: Benchmark completed

Details

Comparing HEAD and support-u32-hashmap
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2104.73 ms │          1846.85 ms │ +1.14x faster │
│ QQuery 1     │   694.10 ms │           813.41 ms │  1.17x slower │
│ QQuery 2     │  1312.83 ms │          1507.45 ms │  1.15x slower │
│ QQuery 3     │   691.45 ms │           678.94 ms │     no change │
│ QQuery 4     │  1373.17 ms │          1348.65 ms │     no change │
│ QQuery 5     │ 15124.71 ms │         15473.54 ms │     no change │
│ QQuery 6     │  2047.34 ms │          2060.53 ms │     no change │
│ QQuery 7     │  1873.73 ms │          1838.61 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 25222.05ms │
│ Total Time (support-u32-hashmap)   │ 25567.98ms │
│ Average Time (HEAD)                │  3152.76ms │
│ Average Time (support-u32-hashmap) │  3196.00ms │
│ Queries Faster                     │          1 │
│ Queries Slower                     │          2 │
│ Queries with No Change             │          5 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.43 ms │             2.18 ms │ +1.11x faster │
│ QQuery 1     │    35.82 ms │            34.78 ms │     no change │
│ QQuery 2     │    82.71 ms │            82.00 ms │     no change │
│ QQuery 3     │    97.18 ms │            97.71 ms │     no change │
│ QQuery 4     │   649.57 ms │           589.40 ms │ +1.10x faster │
│ QQuery 5     │   880.40 ms │           885.32 ms │     no change │
│ QQuery 6     │     2.31 ms │             2.19 ms │ +1.06x faster │
│ QQuery 7     │    38.57 ms │            39.24 ms │     no change │
│ QQuery 8     │   875.15 ms │           871.19 ms │     no change │
│ QQuery 9     │  1204.58 ms │          1197.94 ms │     no change │
│ QQuery 10    │   256.41 ms │           264.73 ms │     no change │
│ QQuery 11    │   295.15 ms │           294.16 ms │     no change │
│ QQuery 12    │   876.87 ms │           898.49 ms │     no change │
│ QQuery 13    │  1253.67 ms │          1260.00 ms │     no change │
│ QQuery 14    │   805.60 ms │           822.48 ms │     no change │
│ QQuery 15    │   775.34 ms │           792.68 ms │     no change │
│ QQuery 16    │  1582.94 ms │          1617.44 ms │     no change │
│ QQuery 17    │  1575.57 ms │          1620.19 ms │     no change │
│ QQuery 18    │  2812.98 ms │          2877.62 ms │     no change │
│ QQuery 19    │    86.50 ms │            86.36 ms │     no change │
│ QQuery 20    │  1147.33 ms │          1142.38 ms │     no change │
│ QQuery 21    │  1304.30 ms │          1299.10 ms │     no change │
│ QQuery 22    │  2109.41 ms │          2165.23 ms │     no change │
│ QQuery 23    │  7382.16 ms │          7578.04 ms │     no change │
│ QQuery 24    │   433.78 ms │           449.56 ms │     no change │
│ QQuery 25    │   305.40 ms │           308.23 ms │     no change │
│ QQuery 26    │   428.78 ms │           444.90 ms │     no change │
│ QQuery 27    │  1537.21 ms │          1565.57 ms │     no change │
│ QQuery 28    │ 12883.98 ms │         11966.19 ms │ +1.08x faster │
│ QQuery 29    │   532.77 ms │           513.35 ms │     no change │
│ QQuery 30    │   765.17 ms │           783.38 ms │     no change │
│ QQuery 31    │   770.79 ms │           808.37 ms │     no change │
│ QQuery 32    │  2367.00 ms │          2361.17 ms │     no change │
│ QQuery 33    │  3106.32 ms │          3186.17 ms │     no change │
│ QQuery 34    │  3283.80 ms │          3176.69 ms │     no change │
│ QQuery 35    │  1228.46 ms │          1254.86 ms │     no change │
│ QQuery 36    │   122.51 ms │           122.45 ms │     no change │
│ QQuery 37    │    54.42 ms │            52.77 ms │     no change │
│ QQuery 38    │   118.02 ms │           118.88 ms │     no change │
│ QQuery 39    │   196.50 ms │           194.35 ms │     no change │
│ QQuery 40    │    40.97 ms │            40.46 ms │     no change │
│ QQuery 41    │    39.08 ms │            38.28 ms │     no change │
│ QQuery 42    │    33.07 ms │            33.30 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 54381.00ms │
│ Total Time (support-u32-hashmap)   │ 53939.79ms │
│ Average Time (HEAD)                │  1264.67ms │
│ Average Time (support-u32-hashmap) │  1254.41ms │
│ Queries Faster                     │          4 │
│ Queries Slower                     │          0 │
│ Queries with No Change             │         39 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  98.54 ms │            97.07 ms │     no change │
│ QQuery 2     │  20.04 ms │            20.71 ms │     no change │
│ QQuery 3     │  32.35 ms │            31.90 ms │     no change │
│ QQuery 4     │  18.71 ms │            18.54 ms │     no change │
│ QQuery 5     │  50.28 ms │            50.68 ms │     no change │
│ QQuery 6     │  11.77 ms │            11.75 ms │     no change │
│ QQuery 7     │  93.46 ms │            88.13 ms │ +1.06x faster │
│ QQuery 8     │  25.49 ms │            25.07 ms │     no change │
│ QQuery 9     │  54.54 ms │            53.28 ms │     no change │
│ QQuery 10    │  43.48 ms │            42.65 ms │     no change │
│ QQuery 11    │  11.37 ms │            11.35 ms │     no change │
│ QQuery 12    │  36.02 ms │            34.57 ms │     no change │
│ QQuery 13    │  26.86 ms │            26.67 ms │     no change │
│ QQuery 14    │   9.77 ms │             9.57 ms │     no change │
│ QQuery 15    │  19.01 ms │            18.98 ms │     no change │
│ QQuery 16    │  18.38 ms │            18.23 ms │     no change │
│ QQuery 17    │  96.47 ms │            97.63 ms │     no change │
│ QQuery 18    │ 185.86 ms │           201.01 ms │  1.08x slower │
│ QQuery 19    │  24.92 ms │            24.58 ms │     no change │
│ QQuery 20    │  31.25 ms │            32.14 ms │     no change │
│ QQuery 21    │ 146.24 ms │           143.79 ms │     no change │
│ QQuery 22    │  14.79 ms │            14.93 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1069.58ms │
│ Total Time (support-u32-hashmap)   │ 1073.22ms │
│ Average Time (HEAD)                │   48.62ms │
│ Average Time (support-u32-hashmap) │   48.78ms │
│ Queries Faster                     │         1 │
│ Queries Slower                     │         1 │
│ Queries with No Change             │        20 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Jul 8, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing support-u32-hashmap (6c1543d) to ebb8e95 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 8, 2025

🤖: Benchmark completed

Details

Comparing HEAD and support-u32-hashmap
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1930.70 ms │          1957.94 ms │    no change │
│ QQuery 1     │   677.21 ms │           798.33 ms │ 1.18x slower │
│ QQuery 2     │  1357.45 ms │          1481.82 ms │ 1.09x slower │
│ QQuery 3     │   681.52 ms │           678.49 ms │    no change │
│ QQuery 4     │  1382.40 ms │          1383.83 ms │    no change │
│ QQuery 5     │ 15028.67 ms │         15023.36 ms │    no change │
│ QQuery 6     │  2054.14 ms │          2068.44 ms │    no change │
│ QQuery 7     │  1933.85 ms │          1881.38 ms │    no change │
└──────────────┴─────────────┴─────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 25045.95ms │
│ Total Time (support-u32-hashmap)   │ 25273.59ms │
│ Average Time (HEAD)                │  3130.74ms │
│ Average Time (support-u32-hashmap) │  3159.20ms │
│ Queries Faster                     │          0 │
│ Queries Slower                     │          2 │
│ Queries with No Change             │          6 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.21 ms │             2.71 ms │  1.22x slower │
│ QQuery 1     │    33.32 ms │            34.20 ms │     no change │
│ QQuery 2     │    79.77 ms │            80.70 ms │     no change │
│ QQuery 3     │    99.32 ms │            98.22 ms │     no change │
│ QQuery 4     │   592.98 ms │           588.77 ms │     no change │
│ QQuery 5     │   827.02 ms │           838.44 ms │     no change │
│ QQuery 6     │     2.24 ms │             2.25 ms │     no change │
│ QQuery 7     │    38.09 ms │            38.69 ms │     no change │
│ QQuery 8     │   853.29 ms │           853.08 ms │     no change │
│ QQuery 9     │  1136.14 ms │          1152.72 ms │     no change │
│ QQuery 10    │   261.54 ms │           261.42 ms │     no change │
│ QQuery 11    │   285.91 ms │           293.48 ms │     no change │
│ QQuery 12    │   843.87 ms │           862.33 ms │     no change │
│ QQuery 13    │  1263.17 ms │          1227.42 ms │     no change │
│ QQuery 14    │   794.84 ms │           811.95 ms │     no change │
│ QQuery 15    │   758.54 ms │           773.98 ms │     no change │
│ QQuery 16    │  1625.44 ms │          1593.33 ms │     no change │
│ QQuery 17    │  1583.75 ms │          1604.20 ms │     no change │
│ QQuery 18    │  2820.29 ms │          2840.30 ms │     no change │
│ QQuery 19    │    86.91 ms │            84.85 ms │     no change │
│ QQuery 20    │  1149.21 ms │          1149.69 ms │     no change │
│ QQuery 21    │  1284.54 ms │          1308.73 ms │     no change │
│ QQuery 22    │  2113.86 ms │          2149.07 ms │     no change │
│ QQuery 23    │  7351.12 ms │          7555.84 ms │     no change │
│ QQuery 24    │   441.28 ms │           445.38 ms │     no change │
│ QQuery 25    │   298.98 ms │           310.10 ms │     no change │
│ QQuery 26    │   443.05 ms │           434.69 ms │     no change │
│ QQuery 27    │  1540.43 ms │          1535.79 ms │     no change │
│ QQuery 28    │ 12749.97 ms │         11922.38 ms │ +1.07x faster │
│ QQuery 29    │   539.74 ms │           515.15 ms │     no change │
│ QQuery 30    │   772.49 ms │           781.73 ms │     no change │
│ QQuery 31    │   788.43 ms │           797.62 ms │     no change │
│ QQuery 32    │  2360.44 ms │          2395.64 ms │     no change │
│ QQuery 33    │  3139.22 ms │          3142.80 ms │     no change │
│ QQuery 34    │  3171.83 ms │          3192.95 ms │     no change │
│ QQuery 35    │  1223.89 ms │          1247.48 ms │     no change │
│ QQuery 36    │   119.12 ms │           123.31 ms │     no change │
│ QQuery 37    │    51.06 ms │            50.90 ms │     no change │
│ QQuery 38    │   117.58 ms │           121.28 ms │     no change │
│ QQuery 39    │   192.58 ms │           197.01 ms │     no change │
│ QQuery 40    │    43.25 ms │            41.98 ms │     no change │
│ QQuery 41    │    39.19 ms │            38.33 ms │     no change │
│ QQuery 42    │    32.36 ms │            32.12 ms │     no change │
└──────────────┴─────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 53952.27ms │
│ Total Time (support-u32-hashmap)   │ 53533.03ms │
│ Average Time (HEAD)                │  1254.70ms │
│ Average Time (support-u32-hashmap) │  1244.95ms │
│ Queries Faster                     │          1 │
│ Queries Slower                     │          1 │
│ Queries with No Change             │         41 │
│ Queries with Failure               │          0 │
└────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ support-u32-hashmap ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  99.36 ms │            97.70 ms │     no change │
│ QQuery 2     │  21.37 ms │            20.73 ms │     no change │
│ QQuery 3     │  32.39 ms │            32.45 ms │     no change │
│ QQuery 4     │  18.84 ms │            18.05 ms │     no change │
│ QQuery 5     │  50.41 ms │            50.26 ms │     no change │
│ QQuery 6     │  11.88 ms │            11.61 ms │     no change │
│ QQuery 7     │  89.55 ms │            84.48 ms │ +1.06x faster │
│ QQuery 8     │  25.58 ms │            23.78 ms │ +1.08x faster │
│ QQuery 9     │  54.98 ms │            53.07 ms │     no change │
│ QQuery 10    │  42.29 ms │            42.47 ms │     no change │
│ QQuery 11    │  11.34 ms │            11.28 ms │     no change │
│ QQuery 12    │  35.44 ms │            35.15 ms │     no change │
│ QQuery 13    │  26.09 ms │            25.54 ms │     no change │
│ QQuery 14    │   9.81 ms │             9.69 ms │     no change │
│ QQuery 15    │  19.41 ms │            18.82 ms │     no change │
│ QQuery 16    │  17.99 ms │            17.77 ms │     no change │
│ QQuery 17    │  96.76 ms │            93.99 ms │     no change │
│ QQuery 18    │ 190.99 ms │           189.44 ms │     no change │
│ QQuery 19    │  24.77 ms │            24.99 ms │     no change │
│ QQuery 20    │  31.07 ms │            31.07 ms │     no change │
│ QQuery 21    │ 145.13 ms │           145.34 ms │     no change │
│ QQuery 22    │  14.74 ms │            14.71 ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                  │ 1070.19ms │
│ Total Time (support-u32-hashmap)   │ 1052.40ms │
│ Average Time (HEAD)                │   48.65ms │
│ Average Time (support-u32-hashmap) │   47.84ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         0 │
│ Queries with No Change             │        20 │
│ Queries with Failure               │         0 │
└────────────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Jul 8, 2025

TL;DR: this branch looks good from a performance perspective. Thank you @jonathanc-n and @Dandandan

@alamb alamb merged commit 985eb49 into apache:main Jul 8, 2025
27 checks passed
@jonathanc-n jonathanc-n deleted the support-u32-hashmap branch October 10, 2025 15:14
Labels

physical-plan Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Support u32 indices in HashJoinExec

3 participants