-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Materialize dictionaries in group keys #8291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Materialize dictionaries in group keys #8291
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @qrilka !
Can we please also add a test for this change too (via an explain plan that shows the schema?)
Maybe we could add a way to show schema in DisplayableExecutionPlan 🤔
| /// returns schema with dictionary group keys materialized as their value types | ||
| /// The actual convertion happens in `RowConverter` and we don't do unnecessary | ||
| /// conversion back into dictionaries | ||
| fn materialize_dict_group_keys(schema: &Schema, group_count: usize) -> Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GroupValues is a trait and there are two implementations of it -- GroupValueRows and GroupValuesPrimitive
This conversion only applies to GroupValueRows (though GroupValuesPrimitive won't be used for input dictionary columns anyways)
I wonder if there is some way to move the conversion into GroupValueRows so it is clearer from the context when this is needed or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I do understand your point and it saw this when creating my PR but I don't quite see a good way around this: the problem is that GroupValues get's created only during execution in AggregateExec::execute_typed but the correct ("materialized") schema looks to be needed in AggregateExec itself, i.e. already at the planning stage (as a part of impl ExecutionPlan for AggregateExec).
Maybe I miss some option here?
|
@alamb I think the following test looks to be enough: Adding schemas into |
Sorry for the late reply @qrilka -- what I was thinking is "if someone undid this code change, would any of the tests fail to know we broke it" I think the answer is no. So that is what I was thinking about the test. I am not sure if you are proposing a new test or saying an existing test is sufficient |
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
2b07114 to
2194cca
Compare
|
@alamb I was proposing a new test (which is what is in my quote) now I added it into the PR. Please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @qrilka
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
|
This PR seems to have caused a regression: #8738 |
Which issue does this PR close?
Closes #7647.
Rationale for this change
From the ticket:
Given that group keys inherently have few repeated values, especially when grouping on a single column, the use of dictionary encoding is unlikely to be yielding significant returns
What changes are included in this PR?
Now
AggregateExecincludesoriginal_schemawhich allows us to differentiate between original schema with dictionaries in place andschemawhere we materialize them as their value types.Are these changes tested?
Tested locally with
cargo testAre there any user-facing changes?
The main user-facing change will be in different type returned for group keys using dictionaries.