Skip to content

Conversation

@akurmustafa
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Vec<Arc<dynPhysicalExpr>>s inside EquivalenceClass is conceptually set. Previously, since Arc<dyn PhysicalExpr> didnot support Eq, we couldn't use it insideHashSet (or IndexSet). With recent changes in Datafusion, we can do so now. This PR converts Vec<Arc<dynPhysicalExpr>> into IndexSet<Arc<dynPhysicalExpr>> to make implementation simpler.

What changes are included in this PR?

Are these changes tested?

Existing tests

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates common Related to common crate labels Dec 1, 2024
@github-actions github-actions bot removed the common Related to common crate label Dec 1, 2024
Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement! Thanks @akurmustafa 👍

/// The expressions in this equivalence class. The order doesn't
/// matter for equivalence purposes
///
/// TODO: use a HashSet for this instead of a Vec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, missed it

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome -- thanks @akurmustafa and @Weijun-H -- I am also running some benchmarks to see if it improves performance for planning

Self { exprs }
pub fn new(exprs: Vec<Arc<dyn PhysicalExpr>>) -> Self {
Self {
exprs: exprs.into_iter().collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@Weijun-H Weijun-H merged commit fdb221f into apache:main Dec 3, 2024
25 checks passed
@Weijun-H
Copy link
Member

Weijun-H commented Dec 3, 2024

Thanks @akurmustafa and @alamb

@alamb
Copy link
Contributor

alamb commented Dec 3, 2024

Hmm, I seem to see a slight slow down in planning 🤔 maybe worth some profiling

++ critcmp main refactor_hashset_exprs
group                                         main                                   refactor_hashset_exprs
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1471.3±32.58µs        ? ?/sec    1.01  1484.4±20.25µs        ? ?/sec
logical_select_all_from_1000                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00  1177.1±16.70µs        ? ?/sec    1.00  1172.2±16.21µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1137.3±16.03µs        ? ?/sec    1.01  1144.0±13.67µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1122.4±16.65µs        ? ?/sec    1.02  1144.8±25.55µs        ? ?/sec
physical_intersection                         1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_join_consider_sort                   1.00      3.2±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_join_distinct                        1.01  1122.7±16.06µs        ? ?/sec    1.00  1117.1±17.36µs        ? ?/sec
physical_many_self_joins                      1.00     17.0±0.09ms        ? ?/sec    1.00     17.0±0.11ms        ? ?/sec
physical_plan_clickbench_all                  1.00    225.4±1.82ms        ? ?/sec    1.01    227.7±1.99ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      3.3±0.04ms        ? ?/sec    1.01      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      4.3±0.05ms        ? ?/sec    1.01      4.4±0.06ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      4.5±0.06ms        ? ?/sec    1.01      4.5±0.05ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.06ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      4.2±0.05ms        ? ?/sec    1.01      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      4.4±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      4.3±0.06ms        ? ?/sec    1.03      4.4±0.17ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.04ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.8±0.06ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      4.5±0.05ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      3.3±0.05ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.07ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      5.0±0.06ms        ? ?/sec    1.01      5.0±0.06ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.8±0.09ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      3.9±0.05ms        ? ?/sec    1.01      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      3.6±0.06ms        ? ?/sec    1.02      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.05ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      4.7±0.07ms        ? ?/sec    1.02      4.8±0.11ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.9±0.10ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      3.5±0.05ms        ? ?/sec    1.02      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     16.6±0.21ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
...

@Weijun-H
Copy link
Member

Weijun-H commented Dec 4, 2024

Hmm, I seem to see a slight slow down in planning 🤔 maybe worth some profiling

++ critcmp main refactor_hashset_exprs
group                                         main                                   refactor_hashset_exprs
-----                                         ----                                   ----------------------
logical_aggregate_with_join                   1.00  1471.3±32.58µs        ? ?/sec    1.01  1484.4±20.25µs        ? ?/sec
logical_select_all_from_1000                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00  1177.1±16.70µs        ? ?/sec    1.00  1172.2±16.21µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1137.3±16.03µs        ? ?/sec    1.01  1144.0±13.67µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1122.4±16.65µs        ? ?/sec    1.02  1144.8±25.55µs        ? ?/sec
physical_intersection                         1.00      2.4±0.04ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_join_consider_sort                   1.00      3.2±0.02ms        ? ?/sec    1.00      3.3±0.02ms        ? ?/sec
physical_join_distinct                        1.01  1122.7±16.06µs        ? ?/sec    1.00  1117.1±17.36µs        ? ?/sec
physical_many_self_joins                      1.00     17.0±0.09ms        ? ?/sec    1.00     17.0±0.11ms        ? ?/sec
physical_plan_clickbench_all                  1.00    225.4±1.82ms        ? ?/sec    1.01    227.7±1.99ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      3.3±0.04ms        ? ?/sec    1.01      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      4.3±0.05ms        ? ?/sec    1.01      4.4±0.06ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      4.5±0.06ms        ? ?/sec    1.01      4.5±0.05ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.06ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      4.2±0.05ms        ? ?/sec    1.01      4.2±0.06ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      4.4±0.07ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      4.3±0.06ms        ? ?/sec    1.03      4.4±0.17ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.04ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      3.8±0.06ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      3.6±0.04ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      4.5±0.05ms        ? ?/sec    1.01      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      3.3±0.05ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.6±0.05ms        ? ?/sec    1.01      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      4.6±0.08ms        ? ?/sec    1.01      4.6±0.07ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      5.0±0.06ms        ? ?/sec    1.01      5.0±0.06ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.8±0.09ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      3.9±0.05ms        ? ?/sec    1.01      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      3.6±0.06ms        ? ?/sec    1.02      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.05ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      4.7±0.07ms        ? ?/sec    1.02      4.8±0.11ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      5.8±0.07ms        ? ?/sec    1.01      5.9±0.10ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      3.5±0.05ms        ? ?/sec    1.02      3.6±0.04ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     16.6±0.21ms        ? ?/sec    1.00     16.6±0.18ms        ? ?/sec
...

track #13638

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants