@pull pull bot commented May 15, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

LuciferYang and others added 6 commits May 15, 2025 10:20
…the `yarn` module fail

### What changes were proposed in this pull request?
This PR uploads the contents of the `target/test/data/` directory when the `yarn` module tests fail in GitHub Actions. This directory stores information related to Yarn Applications, such as logs and job configurations, so having it available as a downloadable artifact makes failures in the `yarn` module tests easier to troubleshoot.

### Why are the changes needed?
This facilitates troubleshooting when tests for the `yarn` module fail.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- It has been confirmed that the expected files can be downloaded when test failures occur in the `yarn` module.
https://github.com/LuciferYang/spark/actions/runs/15019569409

![image](https://github.com/user-attachments/assets/fd22f7d9-64fd-4d6f-a0d8-5e13c70de272)

After unzipping, the contents of the folder are as follows:

![image](https://github.com/user-attachments/assets/a2dce30f-add2-4671-9f64-2c26a25a8bab)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50891 from LuciferYang/upload-yarn-app-log.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…ipeline Status`

### What changes were proposed in this pull request?
This PR fixes the heading level of `Daily Build Pipeline Status`.

### Why are the changes needed?
Fix the heading level

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual check

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50899 from LuciferYang/SPARK-52080-FOLLOWUP-2.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…nect35` and `build_coverage`

### What changes were proposed in this pull request?
1. Add the missing workflows `build_python_connect35` and `build_coverage`.
2. Reorder the Python workflows.
3. Update the workflow names.

### Why are the changes needed?
To monitor the status of these workflows.

### Does this PR introduce _any_ user-facing change?
No, this only affects contributors.

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #50900 from zhengruifeng/infra_daily.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
This PR upgrades `arrow-java` from 18.2.0 to 18.3.0.

### Why are the changes needed?
The new version brings several bug fixes, including:

- apache/arrow-java#627
- apache/arrow-java#654
- apache/arrow-java#656
- apache/arrow-java#693
- apache/arrow-java#705
- apache/arrow-java#707
- apache/arrow-java#722

In addition, the new version brings a cascading upgrade of flatbuffers-java ([from 24.3.25 to 25.1.24](apache/arrow-java#600)).

The full release notes are available at:
- https://github.com/apache/arrow-java/releases/tag/v18.3.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50892 from LuciferYang/arrow-java-18.3.0.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?

We found some unreasonable benchmark results while upgrading zstd-jni from 1.5.6-10 to 1.5.7-x in #50057, and the author suggested using real-world data for the zstd compression benchmark.

### Why are the changes needed?

Add a new benchmark for zstd with more reasonable data.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested on a local machine, Ubuntu 24.04, Intel(R) Core(TM) i5-9500 CPU  3.00GHz

zstd-jni:1.5.6-10
```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool           2737           2742           6          0.0   684299199.3       1.0X
Compression 4 times at level 2 without buffer pool           4217           4218           2          0.0  1054165072.5       0.6X
Compression 4 times at level 3 without buffer pool           5660           5661           2          0.0  1414928809.8       0.5X
Compression 4 times at level 1 with buffer pool              2739           2743           6          0.0   684719746.2       1.0X
Compression 4 times at level 2 with buffer pool              4186           4191           8          0.0  1046477235.5       0.7X
Compression 4 times at level 3 with buffer pool              5663           5667           5          0.0  1415762083.2       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool            943            950          10          0.0   235749387.0       1.0X
Decompression 4 times from level 2 without buffer pool           1239           1244           6          0.0   309753079.0       0.8X
Decompression 4 times from level 3 without buffer pool           1468           1484          23          0.0   366946390.8       0.6X
Decompression 4 times from level 1 with buffer pool               933            942           9          0.0   233286880.8       1.0X
Decompression 4 times from level 2 with buffer pool              1142           1171          40          0.0   285605190.0       0.8X
Decompression 4 times from level 3 with buffer pool              1394           1404          13          0.0   348546518.3       0.7X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                1889           1899          14          0.0   472156817.0       1.0X
Parallel Compression with 1 workers                1715           1717           2          0.0   428826617.0       1.1X
Parallel Compression with 2 workers                 904            906           2          0.0   225890052.0       2.1X
Parallel Compression with 4 workers                 539            548           8          0.0   134735732.5       3.5X
Parallel Compression with 8 workers                 540            548           9          0.0   134889447.5       3.5X
Parallel Compression with 16 workers                577            589          23          0.0   144182540.7       3.3X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                9555           9567          18          0.0  2388642623.3       1.0X
Parallel Compression with 1 workers                7973           8006          47          0.0  1993145509.0       1.2X
Parallel Compression with 2 workers                5070           5071           1          0.0  1267405763.3       1.9X
Parallel Compression with 4 workers                4420           4421           1          0.0  1104977620.3       2.2X
Parallel Compression with 8 workers                4790           4800          15          0.0  1197417939.0       2.0X
Parallel Compression with 16 workers               5000           5003           5          0.0  1249965510.5       1.9X
```

zstd-jni:1.5.7-3
```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool           2700           2709          13          0.0   674967564.0       1.0X
Compression 4 times at level 2 without buffer pool           4148           4149           0          0.0  1037124857.0       0.7X
Compression 4 times at level 3 without buffer pool           5660           5682          31          0.0  1414968620.0       0.5X
Compression 4 times at level 1 with buffer pool              2718           2728          14          0.0   679514554.3       1.0X
Compression 4 times at level 2 with buffer pool              4130           4131           2          0.0  1032476406.2       0.7X
Compression 4 times at level 3 with buffer pool              5571           5576           6          0.0  1392871057.5       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool            942            951           9          0.0   235523684.5       1.0X
Decompression 4 times from level 2 without buffer pool           1248           1270          31          0.0   311906360.5       0.8X
Decompression 4 times from level 3 without buffer pool           1472           1475           4          0.0   368071680.5       0.6X
Decompression 4 times from level 1 with buffer pool               939            956          18          0.0   234631810.0       1.0X
Decompression 4 times from level 2 with buffer pool              1249           1261          16          0.0   312318610.5       0.8X
Decompression 4 times from level 3 with buffer pool              1475           1475           0          0.0   368765939.3       0.6X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                1865           1873          11          0.0   466278397.5       1.0X
Parallel Compression with 1 workers                1785           1793          10          0.0   446359936.8       1.0X
Parallel Compression with 2 workers                 945            953          10          0.0   236142005.8       2.0X
Parallel Compression with 4 workers                 559            577          29          0.0   139754505.5       3.3X
Parallel Compression with 8 workers                 537            555          13          0.0   134328778.3       3.5X
Parallel Compression with 16 workers                587            614          23          0.0   146784965.5       3.2X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU  3.00GHz
Parallel Compression at level 9:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                9365           9375          14          0.0  2341247379.0       1.0X
Parallel Compression with 1 workers                8022           8022           0          0.0  2005448255.8       1.2X
Parallel Compression with 2 workers                5054           5069          22          0.0  1263445148.8       1.9X
Parallel Compression with 4 workers                4372           4394          31          0.0  1092926980.8       2.1X
Parallel Compression with 8 workers                4785           4805          28          0.0  1196282275.0       2.0X
Parallel Compression with 16 workers               5012           5028          23          0.0  1252925049.5       1.9X
```
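As a reading aid for the tables above, the following sketch recomputes the `Relative` column for the "Parallel Compression at level 3" rows and compares best times across the two zstd-jni versions. It assumes `Relative` is the baseline (0 workers) best time divided by the row's best time; that formula is an assumption, though it reproduces the printed values. The names `BEST_MS`, `relative`, and `percent_change` are illustrative only and are not part of the benchmark code.

```python
# Best Time (ms) for "Parallel Compression at level 3", keyed by worker count,
# copied from the two result blocks above.
BEST_MS = {
    "1.5.6-10": {0: 1889, 1: 1715, 2: 904, 4: 539, 8: 540, 16: 577},
    "1.5.7-3": {0: 1865, 1: 1785, 2: 945, 4: 559, 8: 537, 16: 587},
}


def relative(best_ms):
    """Assumed formula: baseline (0 workers) best time / this row's best time."""
    baseline = best_ms[0]
    return {workers: round(baseline / ms, 1) for workers, ms in best_ms.items()}


def percent_change(old, new):
    """Signed change in best time between versions; negative means faster."""
    return {w: (new[w] - old[w]) / old[w] * 100.0 for w in old}


if __name__ == "__main__":
    for version, best_ms in BEST_MS.items():
        print(version, relative(best_ms))
    delta = percent_change(BEST_MS["1.5.6-10"], BEST_MS["1.5.7-3"])
    for workers, pct in delta.items():
        print(f"{workers:>2} workers: {pct:+.1f}%")
```

On these numbers, the best times of the two versions differ by less than 5% at every worker count.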

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50857 from pan3793/SPARK-52078.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…UES / ITERATOR for transformWithState in PySpark

### What changes were proposed in this pull request?

This PR proposes to streamline the protocol used by MapState KEYS / VALUES / ITERATOR for transformWithState in PySpark, which helps considerably when a MapState holds a small map.

Here are the changes:

* MapState.keys(), MapState.values(), and MapState.iterator() no longer require an additional request to learn that there is no further data to read.
  * We inline the data into the proto message, which makes it easy to determine whether the iterator has been fully consumed.

This change uses the same mechanism we applied to ListState.get(). That earlier change improved performance, and our internal benchmark shows that this one helps as well.
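The round-trip saving can be illustrated with a toy model. This is not Spark's actual wire protocol: `ToyServer`, `Response`, `iterate_legacy`, `iterate_inlined`, the `require_next_fetch` flag, and `BATCH_SIZE` are all hypothetical names, used only to count requests under the assumption that the old flow needed one extra request to discover the end of the data.

```python
from dataclasses import dataclass
from typing import List, Tuple

BATCH_SIZE = 100  # hypothetical batch size, chosen for illustration


@dataclass
class Response:
    rows: List[Tuple[str, int]]
    require_next_fetch: bool  # hypothetical flag: True while more rows remain


class ToyServer:
    """Stands in for the state server and counts the requests it receives."""

    def __init__(self, items):
        self.items = list(items)
        self.requests = 0

    def fetch(self, offset):
        self.requests += 1
        rows = self.items[offset:offset + BATCH_SIZE]
        more = offset + len(rows) < len(self.items)
        return Response(rows, more)


def iterate_legacy(server):
    """Keep fetching until an empty batch arrives (one extra request at the end)."""
    offset = 0
    while True:
        resp = server.fetch(offset)
        if not resp.rows:
            return
        yield from resp.rows
        offset += len(resp.rows)


def iterate_inlined(server):
    """Stop as soon as the response itself says nothing is left."""
    offset = 0
    while True:
        resp = server.fetch(offset)
        yield from resp.rows
        offset += len(resp.rows)
        if not resp.require_next_fetch:
            return


if __name__ == "__main__":
    small_map = [("a", 1), ("b", 2), ("c", 3)]
    for name, iterate in (("legacy", iterate_legacy), ("inlined", iterate_inlined)):
        server = ToyServer(small_map)
        rows = list(iterate(server))
        print(f"{name}: {len(rows)} rows in {server.requests} request(s)")
```

In this model a map that fits in one batch costs two requests in the legacy flow and one in the inlined flow, so the relative saving is largest for small maps.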

### Why are the changes needed?

To further optimize MapState operations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New UT.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50885 from HeartSaVioR/SPARK-52127.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>