-
Notifications
You must be signed in to change notification settings - Fork 253
feat: Use unified allocator for execution iterators #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8123627
9f15b91
748ca02
32e2f50
37f58cd
6b8361d
0e22296
49995dd
92fcd32
4a4e9c1
5cda687
acdefa5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,10 +27,6 @@ import org.apache.comet.CometConf | |
|
|
||
| class CometTPCDSQuerySuite | ||
| extends { | ||
| override val excludedTpcdsQueries: Set[String] = Set() | ||
|
|
||
| // This is private in `TPCDSBase` and `excludedTpcdsQueries` is private too. | ||
| // So we cannot override `excludedTpcdsQueries` to exclude the queries. | ||
| val tpcdsAllQueries: Seq[String] = Seq( | ||
| "q1", | ||
| "q2", | ||
|
|
@@ -112,7 +108,9 @@ class CometTPCDSQuerySuite | |
| "q69", | ||
| "q70", | ||
| "q71", | ||
| "q72", | ||
| // TODO: unknown failure (seems memory usage over Github action runner) in CI with q72 in | ||
| // https://github.com/apache/datafusion-comet/pull/613. | ||
| // "q72", | ||
| "q73", | ||
| "q74", | ||
| "q75", | ||
|
|
@@ -141,9 +139,45 @@ class CometTPCDSQuerySuite | |
| "q98", | ||
| "q99") | ||
|
|
||
| // TODO: enable the 3 queries after fixing the issues #1358. | ||
| override val tpcdsQueries: Seq[String] = | ||
| tpcdsAllQueries.filterNot(excludedTpcdsQueries.contains) | ||
| val tpcdsAllQueriesV2_7_0: Seq[String] = Seq( | ||
| "q5a", | ||
| "q6", | ||
| "q10a", | ||
| "q11", | ||
| "q12", | ||
| "q14", | ||
| "q14a", | ||
| "q18a", | ||
| "q20", | ||
| "q22", | ||
| "q22a", | ||
| "q24", | ||
| "q27a", | ||
| "q34", | ||
| "q35", | ||
| "q35a", | ||
| "q36a", | ||
| "q47", | ||
| "q49", | ||
| "q51a", | ||
| "q57", | ||
| "q64", | ||
| "q67a", | ||
| "q70a", | ||
| // TODO: unknown failure (seems memory usage over Github action runner) in CI with q72-v2.7 | ||
| // in https://github.com/apache/datafusion-comet/pull/613. | ||
| // "q72", | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In latest run, I saw
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I found a few particular queries (q72, q16) seems to use more memory than others. q72 cannot be run through sort merge join config now in the CI runner due to its resource limit, but I can run it locally.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will investigate the two queries further but they seem not related to the changes here. |
||
| "q74", | ||
| "q75", | ||
| "q77a", | ||
| "q78", | ||
| "q80a", | ||
| "q86a", | ||
| "q98") | ||
|
|
||
| override val tpcdsQueries: Seq[String] = tpcdsAllQueries | ||
|
|
||
| override val tpcdsQueriesV2_7_0: Seq[String] = tpcdsAllQueriesV2_7_0 | ||
| } | ||
| with CometTPCDSQueryTestSuite | ||
| with ShimCometTPCDSQuerySuite { | ||
|
|
@@ -157,9 +191,11 @@ class CometTPCDSQuerySuite | |
| conf.set(CometConf.COMET_EXEC_ENABLED.key, "true") | ||
| conf.set(CometConf.COMET_EXEC_ALL_OPERATOR_ENABLED.key, "true") | ||
| conf.set(CometConf.COMET_EXEC_SHUFFLE_ENABLED.key, "true") | ||
| conf.set(CometConf.COMET_MEMORY_OVERHEAD.key, "20g") | ||
| conf.set(CometConf.COMET_MEMORY_OVERHEAD.key, "15g") | ||
| conf.set(CometConf.COMET_SHUFFLE_ENFORCE_MODE_ENABLED.key, "true") | ||
| conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true") | ||
|
Comment on lines
+195
to
+196
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before we can close #387 we should either change the default for
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can be a separate PR but we should not close the issue when we merge this PR
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let me create another issue for this PR.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Created #648 for this PR. |
||
| conf.set(MEMORY_OFFHEAP_ENABLED.key, "true") | ||
| conf.set(MEMORY_OFFHEAP_SIZE.key, "20g") | ||
| conf.set(MEMORY_OFFHEAP_SIZE.key, "15g") | ||
| conf | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am +1 on skipping running the official q72 query by default (because it is so ridiculous), especially in CI. However, maybe we should consider running an optimized version where the join order is sensible, which makes it at least 10x faster and uses far less memory. I will file a follow on issue to discuss this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of q72 is to test vendors join reordering rules, and that isn't really very relevant to Spark or Comet since Spark queries typically don't have access to statistics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the version I have been using locally. Since we are not aiming to run the official TPC-DS benchmarks, but just our derived benchmarks, and also given that we are comparing Spark to Comet for the same queries, I think this would be fine to use by default as long it is well documented in our benchmarking guide.
I do think we should still test with the original q72 as a separate exercise though, because if Spark can run it then Comet should be able to as well (with the same memory configuration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, Spark has the capacity to do join reordering if statistics are available but it relies on enabling CBO features which are disabled by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea. As I mentioned earlier, I will investigate q72 further to see why it requires extra memory in Comet. Just disable it to unblock this PR.