[pull] master from apache:master #848
Merged
Conversation
…the `yarn` module fail

### What changes were proposed in this pull request?
This PR adds functionality to upload the contents of the `target/test/data/` directory when the `yarn` module tests fail in GitHub Actions. This directory stores information related to Yarn applications, such as logs and job configurations, so uploading it facilitates troubleshooting when `yarn` module tests fail.

### Why are the changes needed?
This facilitates troubleshooting when tests for the `yarn` module fail.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- It has been confirmed that the expected files can be downloaded when test failures occur in the `yarn` module: https://github.com/LuciferYang/spark/actions/runs/15019569409 After unzipping, the contents of the folder are as follows:

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50891 from LuciferYang/upload-yarn-app-log.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…ipeline Status`

### What changes were proposed in this pull request?
This PR fixes the heading level of `Daily Build Pipeline Status`.

### Why are the changes needed?
Fix the heading level.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual check

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50899 from LuciferYang/SPARK-52080-FOLLOWUP-2.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…nect35` and `build_coverage`

### What changes were proposed in this pull request?
1. Add the missing workflows `build_python_connect35` and `build_coverage`.
2. Reorder the Python workflows.
3. Update the workflow names.

### Why are the changes needed?
To monitor the status of these workflows.

### Does this PR introduce _any_ user-facing change?
No, only for contributors.

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50900 from zhengruifeng/infra_daily.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade `arrow-java` from 18.2.0 to 18.3.0.

### Why are the changes needed?
The new version brings some bug fixes, such as:

- apache/arrow-java#627
- apache/arrow-java#654
- apache/arrow-java#656
- apache/arrow-java#693
- apache/arrow-java#705
- apache/arrow-java#707
- apache/arrow-java#722

In addition, the new version introduces a cascading upgrade of flatbuffers-java (from 24.3.25 to 25.1.24, apache/arrow-java#600). The full release notes are as follows:

- https://github.com/apache/arrow-java/releases/tag/v18.3.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50892 from LuciferYang/arrow-java-18.3.0.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
We found some unreasonable benchmark results while upgrading zstd-jni from 1.5.6-10 to 1.5.7-x in #50057, and the author suggests using real-world data for the zstd compression benchmark.

### Why are the changes needed?
Add a new benchmark for zstd with more reasonable data.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested on a local machine, Ubuntu 24.04, Intel(R) Core(TM) i5-9500 CPU 3.00GHz.

zstd-jni:1.5.6-10

```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Benchmark ZStandardCompressionCodec:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool            2737           2742           6         0.0   684299199.3       1.0X
Compression 4 times at level 2 without buffer pool            4217           4218           2         0.0  1054165072.5       0.6X
Compression 4 times at level 3 without buffer pool            5660           5661           2         0.0  1414928809.8       0.5X
Compression 4 times at level 1 with buffer pool               2739           2743           6         0.0   684719746.2       1.0X
Compression 4 times at level 2 with buffer pool               4186           4191           8         0.0  1046477235.5       0.7X
Compression 4 times at level 3 with buffer pool               5663           5667           5         0.0  1415762083.2       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Benchmark ZStandardCompressionCodec:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool             943            950          10         0.0   235749387.0       1.0X
Decompression 4 times from level 2 without buffer pool            1239           1244           6         0.0   309753079.0       0.8X
Decompression 4 times from level 3 without buffer pool            1468           1484          23         0.0   366946390.8       0.6X
Decompression 4 times from level 1 with buffer pool                933            942           9         0.0   233286880.8       1.0X
Decompression 4 times from level 2 with buffer pool               1142           1171          40         0.0   285605190.0       0.8X
Decompression 4 times from level 3 with buffer pool               1394           1404          13         0.0   348546518.3       0.7X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers               1889           1899          14         0.0   472156817.0       1.0X
Parallel Compression with 1 workers               1715           1717           2         0.0   428826617.0       1.1X
Parallel Compression with 2 workers                904            906           2         0.0   225890052.0       2.1X
Parallel Compression with 4 workers                539            548           8         0.0   134735732.5       3.5X
Parallel Compression with 8 workers                540            548           9         0.0   134889447.5       3.5X
Parallel Compression with 16 workers               577            589          23         0.0   144182540.7       3.3X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers               9555           9567          18         0.0  2388642623.3       1.0X
Parallel Compression with 1 workers               7973           8006          47         0.0  1993145509.0       1.2X
Parallel Compression with 2 workers               5070           5071           1         0.0  1267405763.3       1.9X
Parallel Compression with 4 workers               4420           4421           1         0.0  1104977620.3       2.2X
Parallel Compression with 8 workers               4790           4800          15         0.0  1197417939.0       2.0X
Parallel Compression with 16 workers              5000           5003           5         0.0  1249965510.5       1.9X
```

zstd-jni:1.5.7-3

```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Benchmark ZStandardCompressionCodec:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool            2700           2709          13         0.0   674967564.0       1.0X
Compression 4 times at level 2 without buffer pool            4148           4149           0         0.0  1037124857.0       0.7X
Compression 4 times at level 3 without buffer pool            5660           5682          31         0.0  1414968620.0       0.5X
Compression 4 times at level 1 with buffer pool               2718           2728          14         0.0   679514554.3       1.0X
Compression 4 times at level 2 with buffer pool               4130           4131           2         0.0  1032476406.2       0.7X
Compression 4 times at level 3 with buffer pool               5571           5576           6         0.0  1392871057.5       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Benchmark ZStandardCompressionCodec:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool             942            951           9         0.0   235523684.5       1.0X
Decompression 4 times from level 2 without buffer pool            1248           1270          31         0.0   311906360.5       0.8X
Decompression 4 times from level 3 without buffer pool            1472           1475           4         0.0   368071680.5       0.6X
Decompression 4 times from level 1 with buffer pool                939            956          18         0.0   234631810.0       1.0X
Decompression 4 times from level 2 with buffer pool               1249           1261          16         0.0   312318610.5       0.8X
Decompression 4 times from level 3 with buffer pool               1475           1475           0         0.0   368765939.3       0.6X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers               1865           1873          11         0.0   466278397.5       1.0X
Parallel Compression with 1 workers               1785           1793          10         0.0   446359936.8       1.0X
Parallel Compression with 2 workers                945            953          10         0.0   236142005.8       2.0X
Parallel Compression with 4 workers                559            577          29         0.0   139754505.5       3.3X
Parallel Compression with 8 workers                537            555          13         0.0   134328778.3       3.5X
Parallel Compression with 16 workers               587            614          23         0.0   146784965.5       3.2X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU 3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers               9365           9375          14         0.0  2341247379.0       1.0X
Parallel Compression with 1 workers               8022           8022           0         0.0  2005448255.8       1.2X
Parallel Compression with 2 workers               5054           5069          22         0.0  1263445148.8       1.9X
Parallel Compression with 4 workers               4372           4394          31         0.0  1092926980.8       2.1X
Parallel Compression with 8 workers               4785           4805          28         0.0  1196282275.0       2.0X
Parallel Compression with 16 workers              5012           5028          23         0.0  1252925049.5       1.9X
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50857 from pan3793/SPARK-52078.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
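The tables above vary the compression level, buffer pooling, and the number of zstd worker threads. As an illustration only, here is a rough Python sketch of the same measurement shape using the `zstandard` package; this is not Spark's Scala `ZStandardBenchmark` (which drives zstd-jni), and the input path is simply whatever real-world file you want to feed it.

```python
import sys
import time

import zstandard as zstd  # pip install zstandard


def best_time(fn, repeat=4):
    """Best wall-clock time in seconds over `repeat` runs, like the Best Time column."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def main(path: str) -> None:
    data = open(path, "rb").read()  # real-world data, as the PR suggests

    # Single-threaded compression at levels 1-3.
    for level in (1, 2, 3):
        cctx = zstd.ZstdCompressor(level=level)
        print(f"compress level {level}: {best_time(lambda: cctx.compress(data)):.3f}s")

    # Parallel compression at level 3; threads=0 keeps zstd single-threaded.
    for workers in (0, 1, 2, 4, 8, 16):
        cctx = zstd.ZstdCompressor(level=3, threads=workers)
        print(f"compress level 3, {workers} workers: {best_time(lambda: cctx.compress(data)):.3f}s")

    # Decompression of a level-3 frame (content size is embedded by default).
    frame = zstd.ZstdCompressor(level=3).compress(data)
    dctx = zstd.ZstdDecompressor()
    print(f"decompress level 3: {best_time(lambda: dctx.decompress(frame)):.3f}s")


if __name__ == "__main__":
    main(sys.argv[1])
```

This only makes the shape of the benchmark concrete; the numbers quoted above come from the Scala benchmark in Spark's core module, not from this script.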
…UES / ITERATOR for transformWithState in PySpark

### What changes were proposed in this pull request?
This PR proposes to squeeze the protocol of MapState KEYS / VALUES / ITERATOR for transformWithState in PySpark, which helps a lot when dealing with small maps in MapState. Here are the changes:

* MapState.keys(), MapState.values(), and MapState.iterator() no longer require an additional request to learn that there is no further data to read.
* We inline the data into the proto message, which makes it easy to determine whether the iterator has been fully consumed.

This is the same mechanism we applied to ListState.get(). We saw a performance improvement in that case, and this change also proves helpful in our internal benchmark.

### Why are the changes needed?
To further optimize MapState operations.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50885 from HeartSaVioR/SPARK-52127.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
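For context, below is a minimal sketch of the user-facing MapState API whose read paths this protocol change speeds up, written against the Spark 4.0 `transformWithStateInPandas` API. The processor class, state name, schemas, and column names are hypothetical, and the import path and exact method signatures are assumptions to check against the PySpark docs for your version.

```python
from typing import Iterator

import pandas as pd
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle


class PerValueCounter(StatefulProcessor):
    """Keeps a per-grouping-key map of value -> running count in MapState."""

    def init(self, handle: StatefulProcessorHandle) -> None:
        # Hypothetical state: user key is a single string column, value is a count.
        self.counts = handle.getMapState("counts", "value STRING", "count BIGINT")

    def handleInputRows(self, key, rows, timerValues) -> Iterator[pd.DataFrame]:
        for pdf in rows:
            for v in pdf["value"]:
                current = self.counts.getValue((v,))[0] if self.counts.containsKey((v,)) else 0
                self.counts.updateValue((v,), (current + 1,))

        # keys() / values() / iterator() are the calls whose wire protocol this
        # PR squeezes: small maps now come back inlined in the proto response,
        # without an extra "no more data" round trip.
        out = [(key[0], k[0], v[0]) for k, v in self.counts.iterator()]
        yield pd.DataFrame(out, columns=["id", "value", "count"])

    def close(self) -> None:
        pass
```

Such a processor would be attached to a grouped streaming DataFrame via `transformWithStateInPandas`; the protocol optimization in this PR is transparent to that user code.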
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )