-
Couldn't load subscription status.
- Fork 1.7k
Add Aggregation fuzzer framework #12667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
😍 😍 😍 😍 |
d1f5aea to
6ad27f5
Compare
|
Rest works:
|
b059373 to
cf82eb7
Compare
|
@alamb I found the new fuzz cases(based on aggregation fuzzer) failed after I rebase main. The failed new case: I am more sure now, it may be due to the process of null, when I switch the Updated |
|
I found the reason and have submitted a pr to fix it #12758 |
Testing for the win! |
|
The tests' run time after modifying |
|
@alamb I have added tests to cover more aggregations:
I make it through the simple implementation about randomly select sql memtioned in #12667 (comment)
Now the testcase may be like: // Build fuzzer
let fuzzer = builder
.data_gen_config(data_gen_config)
.data_gen_rounds(16)
.add_sql("SELECT sum(a) FROM fuzz_table")
.add_sql("SELECT sum(distinct a) FROM fuzz_table")
.add_sql("SELECT max(a) FROM fuzz_table")
.add_sql("SELECT min(a) FROM fuzz_table")
.add_sql("SELECT count(a) FROM fuzz_table")
.add_sql("SELECT count(distinct a) FROM fuzz_table")
.add_sql("SELECT avg(a) FROM fuzz_table")
.table_name("fuzz_table")
.build();And unforunately, seems a bug about |
Yes, let's handle that as a separate PR / issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again @Rachelint -- I think this is great and I'll plan to merge it tomorrow unless anyone else would like time to review
Then we can work on follow on items in other PRs
| } | ||
|
|
||
| /// Fuzz test for group by `sting + int64` | ||
| #[tokio::test(flavor = "multi_thread")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant like
#[tokio::test]
async fn test_basic_string_aggr_group_by_mixed_string_int64_1() {
...
}
#[tokio::test]
async fn test_basic_string_aggr_group_by_mixed_string_int64_2() {
...
}
Rather than a single test that was multi-threaded
|
Thanks again @Rachelint -- this is so great |
|
I am working on filing the error that this found for min/max |
|
I also have some ideas on how to improve the error reporting here |
Nice, the error reporting is indeed not so readable currently |
Maybe #5131 will help? The error seems similar. |
|
#12832 has a proposal for improved error reporting |
| let fuzzer = builder | ||
| .data_gen_config(data_gen_config) | ||
| .data_gen_rounds(8) | ||
| // FIXME: Encounter error in min/max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed fix in #12834
|
BTW this found another bug in #12792 See https://github.com/apache/datafusion/actions/runs/11272685143/job/31348663515?pr=12792 ❤️ |
Which issue does this PR close?
Part of #12114
Rationale for this change
The aggregation fuzz tests maybe not enough, becasue there are so many optimizations in aggregation in datafusion.
For improving the testing, the first thing to do may be building a framework to make creating testcase easier.
This pr aimed to introduce such a framework.
What changes are included in this PR?
Design an
AggregationFuzzerto make creating testcases for aggregation easier.Two key components in
AggregationFuzzerareDatasetGeneratorandSessionContextGenerator.In
AggregationFuzzer(with a deinfedqueryrepresented by sql currently):Multiple
datasets will be generated byDatasetGeneratorfirstly.And then for a specific
dataset, multipleSessionContextwith the random params about aggreagtion, and a baselineSessionContextwill be generated.Still for the specific
dataset, each randomly generatedSessionContext+basline result(generated fromdataset+ baslineSessionContext) will be used to buildAggregationFuzzTestTask.In
AggregationFuzzTestTask, we will run thequerywithdataset+ relatedSessionContext, then we compare the result withbasline result. If inconsistency found, report it and panic.Finally, all
AggregationFuzzTestTasks will be run in parallel inAggregationFuzzerAre these changes tested?
Test by uts.
Are there any user-facing changes?
No,