[pull] master from apache:master #59
Merged
Conversation
### What changes were proposed in this pull request?

This PR makes Bloom filter join use a larger number of bits to build the Bloom filter when the row count is available.

### Why are the changes needed?

To fix the issue that Bloom filter join cannot filter out more data when CBO is enabled. For example, TPC-DS q64:

CBO is enabled | CBO is disabled
-- | --
<img width="282" height="600" alt="image" src="https://user-images.githubusercontent.com/5399861/187076753-2e9ccc72-0289-4537-a6d9-3a01a37bf6cd.png"> | <img width="373" height="600" alt="image" src="https://user-images.githubusercontent.com/5399861/187076786-c982e711-52e2-4199-ba42-e1100f57287b.png">
<img width="532" height="400" alt="image" src="https://user-images.githubusercontent.com/5399861/187075553-bd6956b7-8f1f-4df5-82b7-d010defb6d21.png"> | <img width="622" height="400" alt="image" src="https://user-images.githubusercontent.com/5399861/187075588-254c3246-b9af-403c-8df7-d8344fd1d2a4.png">

After this PR:

Build bloom filter | Filter data
-- | --
<img width="262" height="600" alt="image" src="https://user-images.githubusercontent.com/5399861/187075676-85b2afae-03a0-4430-9c4e-2679c6ef62f7.png"> | <img width="509" height="600" alt="image" src="https://user-images.githubusercontent.com/5399861/187075713-41173dc1-d01d-476a-b218-5c67be823e1b.png">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #37697 from wangyum/SPARK-40248.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
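As a rough illustration of the sizing idea (not the planner change made in this PR), the sketch below sizes a Bloom filter from an estimated row count using Spark's `org.apache.spark.util.sketch.BloomFilter`; the `buildFilter` helper, its default item count, and the 3% false-positive target are hypothetical.

```
import org.apache.spark.util.sketch.BloomFilter

// Hypothetical sizing helper: when a row count estimate is available (e.g. from
// CBO statistics), use it as the expected item count so the filter is allocated
// proportionally more bits; otherwise fall back to a fixed default.
def buildFilter(rowCountEstimate: Option[Long], defaultItems: Long = 1000000L): BloomFilter = {
  val expectedItems = rowCountEstimate.getOrElse(defaultItems)
  // More expected items at the same false-positive rate means more bits.
  BloomFilter.create(expectedItems, 0.03)
}

// A filter sized for 100 million rows reserves far more bits than the fallback.
val sized = buildFilter(Some(100000000L))
val fallback = buildFilter(None)
println(s"${sized.bitSize()} bits vs ${fallback.bitSize()} bits")
```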
…error classes

### What changes were proposed in this pull request?

This PR replaces TypeCheckFailure with DataTypeMismatch in the type checks of the complex type creator expressions, including:

1. CreateMap (3): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214
2. CreateNamedStruct (3): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457
3. UpdateFields (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673

### Why are the changes needed?

Migration onto error classes unifies Spark SQL error messages.

### Does this PR introduce _any_ user-facing change?

Yes. The PR changes user-facing error messages.

### How was this patch tested?

1. Add new UTs.
2. Update existing UTs.
3. Pass GA.

Closes #38463 from panbingkun/SPARK-40374.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
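For context, a hedged sketch of what swapping `TypeCheckFailure` for `DataTypeMismatch` looks like in a type check. It assumes the spark-catalyst classes are on the classpath; the error sub-class, parameter names, and the `checkArity` helper are illustrative rather than the exact code in this PR.

```
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{DataTypeMismatch, TypeCheckSuccess}

// Before: a free-form failure string.
//   TypeCheckResult.TypeCheckFailure(s"$prettyName expects a positive even number of arguments.")
//
// After: a structured error class carrying machine-readable parameters.
def checkArity(prettyName: String, numArgs: Int): TypeCheckResult = {
  if (numArgs < 2 || numArgs % 2 != 0) {
    DataTypeMismatch(
      errorSubClass = "WRONG_NUM_ARGS",
      messageParameters = Map(
        "functionName" -> prettyName,
        "expectedNum" -> "2n (n > 0)",
        "actualNum" -> numArgs.toString))
  } else {
    TypeCheckSuccess
  }
}
```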
### What changes were proposed in this pull request?

Every time entries in the offset log or commit log need to be accessed, we read from disk, which is slow. We can add a cache of recent entries to speed up reads. There is already an existing implementation of a caching mechanism in OffsetSeqLog. Let's replace it with an implementation in HDFSMetadataLog (the parent class) so that we can support reading from an in-memory cache for both the offset log and the commit log.

### Why are the changes needed?

Improve read speeds for entries in the offset log and commit log.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests should suffice.

Closes #38430 from jerrypeng/SPARK-40957.

Authored-by: Jerry Peng <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
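A standalone sketch of the caching idea described above, assuming nothing about HDFSMetadataLog's actual internals: a bounded in-memory map is consulted first, and only misses fall back to a disk read. `CachedMetadataLog`, `loadFromDisk`, and the eviction policy are hypothetical.

```
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// Hypothetical cache wrapper: recent batch entries are kept in memory so that
// repeated lookups of the same batch id do not hit the file system.
class CachedMetadataLog[T](maxCachedEntries: Int)(loadFromDisk: Long => Option[T]) {
  private val cache = new ConcurrentHashMap[Long, T]()

  def add(batchId: Long, metadata: T): Unit = {
    cache.put(batchId, metadata)
    // Keep only the newest `maxCachedEntries` batch ids in memory.
    if (cache.size() > maxCachedEntries) {
      val evict = cache.keySet().asScala.toSeq.sorted.dropRight(maxCachedEntries)
      evict.foreach(cache.remove(_))
    }
  }

  def get(batchId: Long): Option[T] =
    Option(cache.get(batchId)).orElse(loadFromDisk(batchId))
}

// Usage sketch: `readBatchFileFromDisk` stands in for the existing on-disk read path.
// val log = new CachedMetadataLog[String](maxCachedEntries = 10)(readBatchFileFromDisk)
```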
### What changes were proposed in this pull request?

This PR aims to upgrade RoaringBitmap to 0.9.35.

### Why are the changes needed?

This version brings some bug fixes:

- RoaringBitmap/RoaringBitmap#587
- RoaringBitmap/RoaringBitmap#588

The full list of changes is here: RoaringBitmap/RoaringBitmap@0.9.32...0.9.35

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GitHub Actions.

Closes #38465 from LuciferYang/rbitmap-0935.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
pull bot pushed a commit that referenced this pull request on Dec 21, 2023
…HAVING
### What changes were proposed in this pull request?
This PR enhanced the analyzer to handle the following pattern properly.
```
Sort
- Filter
- Aggregate
```
### Why are the changes needed?
```
spark-sql (default)> CREATE TABLE t1 (flag BOOLEAN, dt STRING);
spark-sql (default)> SELECT LENGTH(dt),
> COUNT(t1.flag)
> FROM t1
> GROUP BY LENGTH(dt)
> HAVING COUNT(t1.flag) > 1
> ORDER BY LENGTH(dt);
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `dt` cannot be resolved. Did you mean one of the following? [`length(dt)`, `count(flag)`].; line 6 pos 16;
'Sort ['LENGTH('dt) ASC NULLS FIRST], true
+- Filter (count(flag)#60L > cast(1 as bigint))
+- Aggregate [length(dt#9)], [length(dt#9) AS length(dt)#59, count(flag#8) AS count(flag)#60L]
+- SubqueryAlias spark_catalog.default.t1
+- Relation spark_catalog.default.t1[flag#8,dt#9] parquet
```
The above example demonstrates the failure case: the query fails during the analysis phase when both `HAVING` and `ORDER BY` clauses are present, but succeeds if only one of them is present.
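A minimal reproduction sketch of the behavior described above, written against a `SparkSession` named `spark` (e.g. in spark-shell):

```
// Assumes a SparkSession named `spark`.
spark.sql("CREATE TABLE t1 (flag BOOLEAN, dt STRING)")

// Before the fix this fails in analysis: the Sort added for ORDER BY sits above
// the HAVING Filter and cannot resolve `dt` against the Aggregate's output.
spark.sql("""
  SELECT LENGTH(dt), COUNT(t1.flag)
  FROM t1
  GROUP BY LENGTH(dt)
  HAVING COUNT(t1.flag) > 1
  ORDER BY LENGTH(dt)
""").show()

// Either clause on its own analyzes fine.
spark.sql("SELECT LENGTH(dt), COUNT(flag) FROM t1 GROUP BY LENGTH(dt) ORDER BY LENGTH(dt)").show()
spark.sql("SELECT LENGTH(dt), COUNT(flag) FROM t1 GROUP BY LENGTH(dt) HAVING COUNT(flag) > 1").show()
```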
### Does this PR introduce _any_ user-facing change?
Yes, maybe we can call it a bugfix.
### How was this patch tested?
New UTs are added
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#44352 from pan3793/SPARK-28386.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
pull bot pushed a commit that referenced this pull request on Nov 22, 2024
…ead pool

### What changes were proposed in this pull request?
This PR aims to use a meaningful class name prefix for the REST Submission API thread pool instead of the default value of Jetty QueuedThreadPool, `"qtp"+super.hashCode()`.

https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64

### Why are the changes needed?
This is helpful during JVM investigation.

**BEFORE (4.0.0-preview2)**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable [0x00000003153e6000]
"qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable [0x00000003155f2000]
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition [0x00000003157fe000]
```

**AFTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable [0x000000032de22000]
"StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition [0x000000032e23a000]
```

### Does this PR introduce _any_ user-facing change?
No, the thread names are only seen during debugging.

### How was this patch tested?
Manual review.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48924 from dongjoon-hyun/SPARK-50385.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: panbingkun <[email protected]>
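A hedged sketch of the underlying Jetty mechanism (not the exact Spark change): `QueuedThreadPool.setName` controls the thread-name prefix that `jstack` reports, which is what produces the `StandaloneRestServer-*` names in the AFTER output above. The max-threads value below is arbitrary.

```
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Naming the pool changes the thread-name prefix reported by jstack.
val threadPool = new QueuedThreadPool(200)   // arbitrary max-threads value
threadPool.setName("StandaloneRestServer")   // default would be "qtp" + hashCode
threadPool.setDaemon(true)

// Handing the named pool to the Server constructor makes all connector threads use it.
val server = new Server(threadPool)
```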
See Commits and Changes for more details.
Created by
pull[bot]