Description
Is your feature request related to a problem or challenge?
JOB (Join Order Benchmark) was proposed by a research team from TUM in the paper "How Good Are Query Optimizers, Really?".
It is used by HyPer, DuckDB, and CedarDB, is part of DuckDB's regression test suite, and is a good benchmark for testing join ordering and join operators.
I think adding this benchmark suite would also help with improvements like those discussed in #7955.
Describe the solution you'd like
JOB uses the IMDB datasets. These datasets are provided in `csv.gz` format and contain real-world data, which makes them well suited for testing DataFusion.
Tasks:
- Convert the dataset from `csv.gz` format to Parquet.
- Add the IMDB license to the LICENSE file.
- Add the benchmark queries.
- Integrate the benchmark suite into `dfbench`.
Once everything is set up, we will be able to run the benchmark with a command like the following (arguments after `--` are passed to `dfbench` rather than to Cargo):
```shell
cargo run --bin dfbench -- imdb --query 5
```
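JOB's 113 queries are named by group and variant (`1a` through `33c`), so a numeric query option such as `5` would presumably select every variant in group 5 (`5a`, `5b`, `5c`). A minimal sketch of that selection, assuming this interpretation of the option; `queries_for_group` is a hypothetical helper, not part of `dfbench`:

```rust
/// Hypothetical helper (not part of dfbench): select the JOB query variants
/// (e.g. "5a", "5b", "5c") that belong to a numeric query group such as 5.
fn queries_for_group<'a>(available: &[&'a str], group: u32) -> Vec<&'a str> {
    let prefix = group.to_string();
    let mut selected: Vec<&'a str> = available
        .iter()
        .copied()
        .filter(|name| match name.strip_prefix(prefix.as_str()) {
            // After the group number, only a single lowercase variant letter may
            // remain, so group 1 does not match "15a" and group 5 does not match "55a".
            Some(rest) => rest.len() == 1 && rest.chars().all(|c| c.is_ascii_lowercase()),
            None => false,
        })
        .collect();
    selected.sort();
    selected
}

fn main() {
    // A small, illustrative subset of JOB query names.
    let available = ["1a", "1b", "5a", "5b", "5c", "15a", "15b"];
    let selected = queries_for_group(&available, 5);
    println!("{:?}", selected); // prints ["5a", "5b", "5c"]
}
```

Matching on the whole variant suffix, rather than a plain prefix check, keeps group numbers like 1 and 15 from colliding.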
I would like to work on this!
Can someone help me understand the usual process for adding a third-party license in an Apache project?
Describe alternatives you've considered
No response
Additional context
No response