@Weijun-H
Member

@Weijun-H Weijun-H commented Dec 7, 2023

Which issue does this PR close?

Follow #1841
Closes #1109

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Dec 7, 2023
@Weijun-H Weijun-H changed the title refactor: support bitmap for u8/16 and i8/16 in approx_distinct refactor: support bitmap for u8/16 and i8/16 in approx_distinct Dec 7, 2023
}

fn size(&self) -> usize {
    self.bitmap.serialized_size()
Member Author

Not sure this is a proper way to measure the size of a roaring bitmap

@korowa
Contributor

korowa commented Dec 27, 2023

Thank you @Weijun-H!

This PR, though, looks like an implementation of a CountDistinct accumulator, and the code doesn't seem to perform any approximation -- wouldn't it be better to use these changes as the default CountDistinct implementation for the specified data types?
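
To illustrate why a bitmap over u8/u16/i8/i16 gives an exact rather than approximate count, here is a minimal sketch using only the standard library. It is not the accumulator from this PR (which uses a roaring bitmap); the type `U16DistinctBitmap` and the helper `i16_key` are hypothetical names introduced for illustration only.

```rust
/// Exact distinct counter for `u16` values backed by a fixed-size bitmap.
/// Hypothetical illustration type; not the accumulator from this PR.
pub struct U16DistinctBitmap {
    // 65,536 possible values / 64 bits per word = 1,024 words (8 KiB).
    words: Box<[u64; 1024]>,
}

impl U16DistinctBitmap {
    pub fn new() -> Self {
        Self { words: Box::new([0u64; 1024]) }
    }

    /// Marks `v` as seen; inserting the same value again changes nothing.
    pub fn insert(&mut self, v: u16) {
        let idx = (v >> 6) as usize; // which 64-bit word
        let bit = v & 63; // which bit inside that word
        self.words[idx] |= 1u64 << bit;
    }

    /// Combines two partial states (e.g. from different partitions) with bitwise OR.
    pub fn merge(&mut self, other: &Self) {
        for (a, b) in self.words.iter_mut().zip(other.words.iter()) {
            *a |= *b;
        }
    }

    /// Number of distinct values seen: the count of set bits. No estimation involved.
    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}

/// Maps an `i16` onto a `u16` key by reinterpreting its bits (a bijection),
/// so signed inputs can share the same bitmap layout.
pub fn i16_key(v: i16) -> u16 {
    v as u16
}
```

Because every possible value has its own bit, the result is always exact, which is the point raised above: the structure behaves like a CountDistinct state, not an approximation.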

@alamb
Contributor

alamb commented Jan 27, 2024

@korowa
Contributor

korowa commented Jan 28, 2024

I wonder what we should do with this PR now we have

While I'm still not sure this PR fits the ApproxDistinct functionality, I suppose it might be a viable replacement for HashSets in regular CountDistinct -- so it is at least worth checking / benchmarking within #1823 (paying special attention to the memory consumption of the bitmap-based accumulator)
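
As a rough starting point for the memory comparison mentioned above, the following sketch contrasts the fixed cost of a dense bitmap with a lower bound on what a `HashSet<u16>` holds. It is an assumption-laden back-of-the-envelope helper, not a benchmark: `dense_bitmap_bytes` and `hashset_min_bytes` are hypothetical names, the hash set figure ignores table overhead, and a roaring bitmap (as used in the PR) would compress sparse inputs differently from the dense layout assumed here.

```rust
use std::collections::HashSet;
use std::mem::size_of;

/// Bytes needed for a dense bitmap with one bit per possible value
/// of a `bits`-wide integer type (8 for u8/i8, 16 for u16/i16).
pub const fn dense_bitmap_bytes(bits: u32) -> usize {
    (1usize << bits) / 8
}

/// Lower bound on the heap bytes held by a `HashSet<u16>`:
/// element slots only, ignoring control bytes and other table overhead.
pub fn hashset_min_bytes(set: &HashSet<u16>) -> usize {
    set.capacity() * size_of::<u16>()
}
```

Under these assumptions a dense bitmap costs 32 bytes for 8-bit types and 8 KiB for 16-bit types regardless of cardinality, while the hash set grows with the number of distinct values, so the bitmap is most attractive at high cardinality and least attractive for groups with only a few distinct values.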

@alamb
Contributor

alamb commented Jan 29, 2024

Thanks @Weijun-H and @korowa -- I'll mark this PR as a draft now and if someone finds time to do the benchmarks we can reopen it with that in consideration

@alamb alamb marked this pull request as draft January 29, 2024 11:57
@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 17, 2024
@github-actions github-actions bot closed this Apr 24, 2024

Development

Successfully merging this pull request may close these issues.

approx_distinct should be leveraging bitmap for counting u8/16 and i8/16
