-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Minor: Simplify + document EliminateCrossJoin better
#10427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| pub fn check_all_columns_from_schema( | ||
| columns: &HashSet<Column>, | ||
| schema: DFSchemaRef, | ||
| schema: &DFSchema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes the schema by reference rather than value now.
Since DFSchemaRef is an Arc I don't expect this to make any measurable performance improvement, but I think it makes it clearer what is going on and avoids a bunch of clones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| &left, | ||
| &mut all_inputs, | ||
| &mut possible_join_keys, | ||
| &possible_join_keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_inner_join does not modify possible_join_keys so update the signature to make that clear
| Ok(true) | ||
| } | ||
|
|
||
| /// Finds the next to join with the left input plan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time studying this code so wanted to document what it is doing
|
@jackwener I wonder if you have time to review this PR? |
| right_key: &Expr, | ||
| left_schema: DFSchemaRef, | ||
| right_schema: DFSchemaRef, | ||
| left_schema: &DFSchema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats interesting, should we a reference instead of Arc whenever possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general we should use a reference when the underlying callsite doesn't actually need a owned copy
So in this case, the callsite just needs a &DFSchema so there is no need to go through the overhead of creating the Arc
I think the runtime overhead of creating another Arc is likely to be unmeasurable in all but the most extreme circumstances. However I think the code is often easier to reason about
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb nice clean up
jackwener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice job to me, thanks @alamb
|
Thanks for the reviews @jackwener and @comphead ❤️ -- this now opens up a few more PRs I have queued up to make this code faster (by reducing cloning) #10430 |
Which issue does this PR close?
Part of #10287
Rationale for this change
I am trying to avoid copying as much in
EliminateCrossJoinand I noticed a fewthings that can be simpler. While this avoids some copies I don't think it will really impact performance much.
Instead this is a step towards a larger refactor to avoid the actual plan copies
What changes are included in this PR?
&mutto make it clear what is being changedDFSchemaRefAre these changes tested?
By existing CI
Are there any user-facing changes?
No