Support merge for Distribution #15290

@xudong963

Is your feature request related to a problem or challenge?

I'm working on the ticket: #10316.

Given that we plan to replace all uses of Precision with Distribution (synnada-ai#63), I'll presumably use Distribution for statistics in the design for #10316.

There is one place where I need to merge statistics, and that merge will have to carry over to Distribution.

The specific case is computing partition-level statistics. Files are grouped into file groups, each file group is treated as a partition, and different partitions are processed in parallel. The partition-level statistics therefore come from merging the statistics of the files in a file group.
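To make the file-group case concrete, here is a minimal sketch of folding per-file statistics into one partition-level value. FileStats, merge, and partition_stats are hypothetical names for illustration only; they are not DataFusion's Statistics or Precision types, and the "unknown input makes the result unknown" rule is an assumption.

```rust
/// Hypothetical, simplified per-file statistics (not DataFusion's `Statistics`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct FileStats {
    num_rows: Option<usize>, // `None` means the value is unknown
    min: Option<i64>,
    max: Option<i64>,
}

/// Merge two sets of statistics. Assumption: if either side is unknown,
/// the merged field is unknown as well.
fn merge(a: FileStats, b: FileStats) -> FileStats {
    FileStats {
        num_rows: a.num_rows.zip(b.num_rows).map(|(x, y)| x + y),
        min: a.min.zip(b.min).map(|(x, y)| x.min(y)),
        max: a.max.zip(b.max).map(|(x, y)| x.max(y)),
    }
}

/// Statistics of one partition: the merge of all files in its file group.
/// Returns `None` for an empty file group.
fn partition_stats(file_group: &[FileStats]) -> Option<FileStats> {
    file_group.iter().copied().reduce(merge)
}
```

Each file group can be reduced independently, which matches partitions being processed in parallel.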

Describe the solution you'd like

Create a function that takes two distributions and combines their statistical properties into a new distribution. The most appropriate approach is to create a GenericDistribution that approximates the mixture of the two input distributions.

pub fn merge_distributions(a: &Distribution, b: &Distribution) -> Result<Distribution> {
    ...
}
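As a rough illustration of what approximating the mixture could mean, the sketch below merges two summaries using the standard mixture-moment formulas. It is not the proposed implementation: Summary and merge_summaries are hypothetical stand-ins rather than DataFusion's Distribution API, and weighting by row counts is an assumption that the signature above does not include.

```rust
/// Hypothetical summary of a distribution; not DataFusion's `Distribution` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Summary {
    mean: f64,
    variance: f64,
    min: f64,
    max: f64,
}

/// Approximate the mixture of `a` and `b`, weighted by their row counts
/// (an assumed weighting). Returns `None` when both row counts are zero.
fn merge_summaries(a: Summary, rows_a: u64, b: Summary, rows_b: u64) -> Option<Summary> {
    let total = rows_a + rows_b;
    if total == 0 {
        return None;
    }
    let wa = rows_a as f64 / total as f64;
    let wb = rows_b as f64 / total as f64;

    // Mixture mean is the weighted average of the component means.
    let mean = wa * a.mean + wb * b.mean;
    // E[X^2] of the mixture is the weighted sum of each component's E[X^2].
    let second_moment =
        wa * (a.variance + a.mean * a.mean) + wb * (b.variance + b.mean * b.mean);

    Some(Summary {
        mean,
        // Clamp at zero to guard against tiny negative values from rounding.
        variance: (second_moment - mean * mean).max(0.0),
        // The merged range is the union of the two input ranges.
        min: a.min.min(b.min),
        max: a.max.max(b.max),
    })
}
```

Note that the mixture variance includes the spread between the two means, so it can be much larger than either input variance.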

I'll open a PR, and we can discuss further there.

Describe alternatives you've considered

No

Additional context

No

Labels

enhancement: New feature or request
