
Conversation


@pull pull bot commented Oct 11, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

amaliujia and others added 8 commits October 11, 2022 17:31
### What changes were proposed in this pull request?

Support Column Alias in the Connect DSL (thus in Connect proto).

### Why are the changes needed?

Column alias is part of the DataFrame API, and we also need it to support APIs such as `withColumn`.
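
For context, a minimal spark-shell sketch of the DataFrame-side aliasing that the Connect proto now needs to express (this is the plain DataFrame API, not the Connect DSL itself):

```scala
import org.apache.spark.sql.functions.col

// Alias a column, then reference it by the new name in withColumn.
val df = spark.range(3).select(col("id").alias("renamed"))
val doubled = df.withColumn("doubled", col("renamed") * 2)
```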

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38174 from amaliujia/alias.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…t `min_count`

### What changes were proposed in this pull request?
Make `_reduce_for_stat_function` in `groupby` accept `min_count`

### Why are the changes needed?
to simplify the implementations

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UTs

Closes #38201 from zhengruifeng/ps_groupby_mc.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…eric type

### What changes were proposed in this pull request?
This pr aims to fix following Java compilation warnings related to generic type:

```
2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
2022-10-08T01:43:33.6487456Z     return new HashMap();
2022-10-08T01:43:33.6487682Z                ^
2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap<K,V>
2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap

2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
2022-10-08T01:50:21.5999993Z       createPartitions(new InternalRow[]{ident}, new Map[]{properties});
2022-10-08T01:50:21.6000343Z                                                      ^
2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map<K,V>
2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map

2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
2022-10-08T01:50:21.6007395Z                                 ^
2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal<T>
2022-10-08T01:50:21.6008032Z   where T is a type-variable:
2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal

2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
2022-10-08T01:50:21.6009503Z                                        ^
2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable<T>
2022-10-08T01:50:21.6010137Z   where T is a type-variable:
2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable
2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
2022-10-08T01:50:21.6011714Z                       ^
2022-10-08T01:50:21.6012050Z   required: List<T>
2022-10-08T01:50:21.6012296Z   found: ArrayList<Coord>
2022-10-08T01:50:21.6012604Z   where T is a type-variable:
2022-10-08T01:50:21.6012926Z     T extends Comparable<? super T> declared in method <T>sort(List<T>)

2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
2022-10-08T02:13:38.0770645Z     ^
2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender<M>
2022-10-08T02:13:38.0771330Z   where M is a type-variable:
2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender

2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
2022-10-08T02:13:38.0775173Z         ^
2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout<T>
2022-10-08T02:13:38.0775849Z   where T is a type-variable:
2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface Layout

2022-10-08T02:19:55.0035795Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:17:  [rawtypes] found raw type: SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0037287Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:13:  [unchecked] unchecked call to SparkAvroKeyRecordWriter(Schema,GenericData,CodecFactory,OutputStream,int,Map<String,String>) as a member of the raw type SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0038442Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:31:  [rawtypes] found raw type: DataFileWriter
2022-10-08T02:19:55.0039370Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:27:  [unchecked] unchecked call to DataFileWriter(DatumWriter<D>) as a member of the raw type DataFileWriter

```

### Why are the changes needed?
Fix Java compilation warnings.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions.

Closes #38198 from LuciferYang/fix-java-warn.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…pts to make code more portable

### What changes were proposed in this pull request?
Consistently invoke bash with /usr/bin/env bash in scripts to make code more portable

### Why are the changes needed?
Some scripts still use `#!/bin/bash` directly, which is less portable than `#!/usr/bin/env bash`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No tests needed.

Closes #38191 from huangxiaopingRD/script.

Authored-by: huangxiaoping <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
…CY_ERROR_TEMP_2076-2100

### What changes were proposed in this pull request?

This PR proposes to migrate 25 execution errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_2076` to `_LEGACY_ERROR_TEMP_2100`.

The `_LEGACY_ERROR_TEMP_` prefix indicates dev-facing error messages that won't be exposed to end users.

### Why are the changes needed?

To speed-up the error class migration.

Migrating onto temporary error classes allows us to analyze the errors, so we can detect the most popular error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
$ build/sbt "test:testOnly *SQLQuerySuite"
```

Closes #38122 from itholic/SPARK-40540-2076-2100.

Lead-authored-by: itholic <[email protected]>
Co-authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Closes #38165 from verhovsky/error-grammar.

Authored-by: Boris Verkhovskiy <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
### What changes were proposed in this pull request?
Reduce the shuffle size of ALS by using `Array[V]` instead of `BoundedPriorityQueue[V]` in ser/deser
This is the counterpart of #37918 on the `.mllib` side.

### Why are the changes needed?
Reduce the shuffle size of ALS

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UT

Closes #38203 from zhengruifeng/ml_topbykey.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
… in ANSI mode

### What changes were proposed in this pull request?

#38022 introduces an optional feature for supporting double-quoted identifiers. The feature is controlled by a flag `spark.sql.ansi.double_quoted_identifiers` which is independent from the flag `spark.sql.ansi.enabled`.
This is inconsistent with another ANSI SQL feature "Enforce ANSI reserved keywords": https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#sql-keywords-optional-disabled-by-default, which is only available when `spark.sql.ansi.enabled` is true.

Thus, to make the ANSI flags consistent, I suggest making double-quoted identifiers only available under ANSI SQL mode.
In addition, this PR renames the flag from `spark.sql.ansi.double_quoted_identifiers` to `spark.sql.ansi.doubleQuotedIdentifiers`.
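
For illustration, a hedged spark-shell sketch of the resulting behavior (the config names come from the description above):

```scala
// Double-quoted identifiers are only honored when ANSI mode itself is enabled.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.ansi.doubleQuotedIdentifiers", "true")

// With both flags set, "id" parses as an identifier rather than a string literal.
spark.sql("""SELECT "id" FROM range(3)""").show()
```
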
### Why are the changes needed?

To make the ANSI SQL related features consistent.

### Does this PR introduce _any_ user-facing change?

No, the feature is not released yet.

### How was this patch tested?

New SQL test input file under ANSI mode.

Closes #38147 from gengliangwang/doubleQuoteFlag.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
### What changes were proposed in this pull request?

Handle the case where the PR body is empty, when merging a PR with the merge script.

### Why are the changes needed?

The script fails otherwise.
Although we should not have empty PR descriptions, it should at least not break the script.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #38207 from srowen/DevMergePrBody.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
aokolnychyi and others added 11 commits October 11, 2022 13:41
…l commands

### What changes were proposed in this pull request?

This PR adds runtime group filtering for group-based row-level operations.

### Why are the changes needed?

These changes are needed to avoid rewriting unnecessary groups: data skipping during job planning is limited and can still report false-positive groups to rewrite.
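
Purely as an illustration, the kind of statement that benefits (the catalog and table names are hypothetical, and the table must support group-based row-level operations):

```scala
// With runtime group filtering, only the groups (e.g. files) that actually contain
// matching rows are rewritten, instead of every group that static data skipping
// could not rule out at planning time.
spark.sql("DELETE FROM testcat.db.events WHERE event_date = DATE'2022-10-11'")
```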

### Does this PR introduce _any_ user-facing change?

This PR leverages existing APIs.

### How was this patch tested?

This PR comes with tests.

Closes #36304 from aokolnychyi/spark-38959.

Lead-authored-by: Anton Okolnychyi <[email protected]>
Co-authored-by: aokolnychyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…protobuf

From SandishKumarHN(sanysandishgmail.com) and Mohan Parthasarathy(mposdev21gmail.com)

# Introduction

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is widely used in Kafka-based data pipelines. Unlike Avro, Spark does not have native support for protobuf. This PR provides two new functions from_protobuf/to_protobuf to read and write Protobuf data within a data frame.

The implementation is closely modeled after Avro implementation so that it is easy to understand and review the changes.

Following is an example of typical usage.

```scala
// `from_protobuf` requires the absolute path of the Protobuf schema file
// and the protobuf message within the file
val userProtoFile = "./examples/src/main/resources/user.desc"
val userProtoMsg = "User"

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "proto-topic-in")
  .load()

// 1. Decode the Protobuf data into a struct;
// 2. Filter by column `favorite_color`;
// 3. Encode the column `name` in Protobuf format.
val output = df
  .select(from_protobuf('value, userProtoFile, userProtoMsg) as 'user)
  .where("user.favorite_color == \"red\"")
  .select(to_protobuf($"user.name", userProtoFile, userProtoMsg) as 'value)

val query = output
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("topic", "proto-topic-out")
  .start()
```

The new functions are very similar to Avro

- from_protobuf requires the proto descriptor file and the message type within that file, which is similar to from_avro requiring the JSON schema.
- to_protobuf is similar to to_avro and does not require the proto descriptor file, as it can build the schema (protobuf descriptor) from the catalyst types. Like to_avro, to_protobuf can also take in the descriptor file for describing the schema.

## What is supported

- Protobuf format proto3 is supported (even though proto2 and proto3 are interoperable, we have explicitly tested only with proto3)
- Google protobuf supported types
    - Scalar value types
    - Enumerations
    - Message types as field types
    - Nested Messages
    - Maps
    - Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. The original version of proto3 did not surface these when there were parsing problems. This feature is needed to detect schemas that do not match the message type, and to support FAILFAST and PERMISSIVE modes. It is available in proto3 from version 3.5 onwards.

## What is not supported

- Any requires the knowledge of the underlying object type when deserializing the message and is generally not considered type safe
- OneOf requires the knowledge of the object type that was encoded when deserializing the message
- Custom Options is an advanced feature within protobuf where users can define their own options
- Catalyst types that are not natively supported in protobuf. This happens normally during serialization, and an exception will be thrown when the following types are encountered:
    - DecimalType
    - DateType
    - TimestampType

## Test cases covered

Tests have been written at the following levels:

- from_protobuf / to_protobuf (ProtoFunctionSuite)
- ProtoToCatalyst / CatalystToProto (ProtoCatalystDataConversionSuite)
- ProtoDeserializer / ProtoSerializer (ProtoSerdeSuite)

### ProtoFunctionSuite

A bunch of roundtrip tests go through to_protobuf(from_proto) or from_protobuf(to_proto) and compare the results. It also repeats some of the tests where to_protobuf is called without a descriptor file where the protobuf descriptor is built from the catalyst types.

- roundtrip in to_protobuf and from_protobuf for struct for protobuf scalar types
- roundtrip in to_protobuf(without descriptor params) and from_proto - struct for protobuf scalar types
- roundtrip in from_protobuf and to_protobuf - Repeated protobuf types
- roundtrip in from_protobuf and to_protobuf - Repeated Message Once
- roundtrip in from_protobuf and to_protobuf - Repeated Message Twice
- roundtrip in from_protobuf and to_protobuf - Map
- roundtrip in from_protobuf and to_protobuf - Enum
- roundtrip in from_protobuf and to_protobuf - Multiple Message
- roundtrip in to_protobuf and from_protobuf - with null

### ProtoSerdeSuite

- Test basic conversion - serialize(deserialize(message)) == message
- Fail to convert with field type mismatch - Make sure the right exception is thrown for incompatible schema for serializer and deserializer
- Fail to convert with missing nested Protobuf fields
- Fail to convert with deeply nested field type mismatch
- Fail to convert with missing Catalyst fields

### ProtoCatalystDataConversionSuite

- ProtoToCatalyst(to_protobuf(basic_catalyst_types)): Boolean, Integer, Double, Float, Binary, String, Byte, Short
- Handle unsupported input of Message type: Serialize a message first and deserialize using a bad schema. Test with FAILFAST to get an exception and PERMISSIVE to get a null row
- filter push-down to proto deserializer: Filtering the rows based on the filter during proto deserialization
- Test ProtoDeserializer with binary message type

### Cluster Testing
Recent (10-04-2022) changes have been tested with the configurations listed below.
Job: Kafka + Spark Structured Streaming
2 executors, each with 2048m and 2 cores
150-200 events/second, each event having 100 fields (basic types, message, map type, enum)

Closes #37972 from SandishKumarHN/SPARK_PROTO_1.

Lead-authored-by: SandishKumarHN <[email protected]>
Co-authored-by: Sandish Kumar Hebbani Naga <[email protected]>
Co-authored-by: Mohan Parthasarathy <[email protected]>
Co-authored-by: sandishkumarhn <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
…ground

### What changes were proposed in this pull request?
Append the jline option "-Djline.terminal=jline.UnsupportedTerminal" to enable the Beeline process to run in the background.

### Why are the changes needed?
Currently, if we execute Spark Beeline in the background, the Beeline process stops immediately.
<img width="1350" alt="image" src="https://user-images.githubusercontent.com/88070094/194742935-8235b1ba-386e-4470-b182-873ef185e19f.png">

### Does this PR introduce _any_ user-facing change?
Users will be able to execute Spark Beeline in the background.

### How was this patch tested?

1. Start Spark ThriftServer
2. Execute command `./bin/beeline -u "jdbc:hive2://localhost:10000" -e "select 1;" &`
3. Verify Beeline process output in console:
<img width="1407" alt="image" src="https://user-images.githubusercontent.com/88070094/194743153-ff3f1d19-ac23-443b-97a6-f024719008cd.png">

### Note

Beeline works fine on Windows when backgrounded:
![image](https://user-images.githubusercontent.com/88070094/194743797-7dc4fc21-dec6-4056-8b13-21fc96f1476e.png)

Closes #38172 from zhouyifan279/SPARK-8731.

Authored-by: zhouyifan279 <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?
This PR is a follow-up for SPARK-40416. It updates all subquery tests that throw exceptions to also check the correct error classes and query contexts.

### Why are the changes needed?

To improve the test coverage for subquery error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #38210 from allisonwang-db/spark-40416-update-tests.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…CY_ERROR_TEMP_2101-2125

### What changes were proposed in this pull request?

This PR proposes to migrate 25 execution errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_2101` to `_LEGACY_ERROR_TEMP_2125`.

The `_LEGACY_ERROR_TEMP_` prefix indicates dev-facing error messages that won't be exposed to end users.

### Why are the changes needed?

To speed-up the error class migration.

Migrating onto temporary error classes allows us to analyze the errors, so we can detect the most popular error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
$ build/sbt "test:testOnly *SQLQuerySuite"
```

Closes #38123 from itholic/SPARK-40540-2101-2125.

Authored-by: itholic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…files

### What changes were proposed in this pull request?

The current char/varchar feature relies on the data source to take care of all the write paths and ensure the char/varchar semantics (length check, string padding). This is good for read performance, but has problems if some write paths did not follow the char/varchar semantics; e.g. a Parquet table can be written by old Spark versions that do not have the char/varchar type, or by other systems that do not recognize Spark's char/varchar type.

This PR adds read-side string padding for the char type, so that we can still guarantee the char type semantics if the underlying data is valid (not over length). The char type is rarely used (mostly for legacy reasons) and its performance doesn't matter that much; correctness is more important here. People can still disable read-side padding via a config if they are sure the data was written properly, for example in benchmarks.

Note, we don't add a read-side length check, as the varchar type is widely used and we don't want to introduce a perf regression for the common case. Another reason is that it's better to avoid invalid data on the write side, and a read-side check won't help much.
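
A minimal sketch of the behavior this guarantees (the table name is hypothetical):

```scala
// Values read back from a CHAR(5) column are padded to 5 characters, even if the
// underlying files were written without padding by an older writer.
spark.sql("CREATE TABLE chars_demo (c CHAR(5)) USING parquet")
spark.sql("INSERT INTO chars_demo VALUES ('ab')")
// Expected: 'ab   ' with length 5, thanks to read-side padding.
spark.sql("SELECT c, length(c) FROM chars_demo").show()
```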

### Why are the changes needed?

to better enforce char type semantic

### Does this PR introduce _any_ user-facing change?

Yes. Now Spark can still return padding char type values correctly even if the data source writer wrote the char type value without padding.

### How was this patch tested?

updated tests

Closes #38151 from cloud-fan/char.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…or Java 18+

### What changes were proposed in this pull request?
This PR adds `-Djdk.reflect.useDirectMethodHandle=false` to `JavaModuleOptions` and to the `maven`/`sbt` `extraJavaTestArgs`, so that Spark uses `UnsafeFieldAccessor` by default with Java 18/19 and avoids the bad case described in SPARK-40729.

### Why are the changes needed?
After [JEP 416: Reimplement Core Reflection with Method Handles](https://openjdk.org/jeps/416), `MethodHandleAccessor` is the default reflection implementation in Java, but in Spark it triggers the bad case mentioned in SPARK-40729, so `-Djdk.reflect.useDirectMethodHandle=false` is added as a workaround for Java 18/19.

### Does this PR introduce _any_ user-facing change?
No. The new option does not affect Java versions below 18.

### How was this patch tested?

- Pass GitHub Actions
- Manual test:

1.  run `repl` module test with Java 18/19

**Before**

```
- broadcast vars *** FAILED ***
  isContain was true Interpreter output contained 'Exception':
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
        /_/

  Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 19)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  scala> array: Array[Int] = Array(0, 0, 0, 0, 0)

  scala> broadcastArray: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)

  scala> java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$3029/0x0000000801d80a30.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
    at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
    at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
    at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
    at java.base/java.lang.reflect.Field.set(Field.java:820)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
    at org.apache.spark.rdd.RDD.map(RDD.scala:412)
    ... 93 elided
  Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$3029/0x0000000801d80a30.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
    at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
    ... 105 more

  scala>
  scala> java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$3061/0x0000000801e01000.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
    at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:145)
    at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
    at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
    at java.base/java.lang.reflect.Field.set(Field.java:820)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
    at org.apache.spark.rdd.RDD.map(RDD.scala:412)
    ... 93 elided
  Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$3061/0x0000000801e01000.arg$1/putField, from class java.lang.Object (module java.base)
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3511)
    at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3502)
    at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1630)
    at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:145)
    ... 105 more

  scala>      |
  scala> :quit (ReplSuite.scala:83)
```

**After**

```
Run completed in 1 minute, 12 seconds.
Total number of tests run: 44
Suites: completed 7, aborted 0
Tests: succeeded 44, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

2. test spark-shell with Java 18/19:

**Before**

```
bin/spark-shell --master local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/11 19:13:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/11 19:13:08 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665486788733).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
      /_/

Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 18.0.2.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()

// Exiting paste mode, now interpreting.

java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda$2396/0x00000008015b2e70.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:167)
  at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:176)
  at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1184)
  at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1153)
  at java.base/java.lang.reflect.Field.set(Field.java:820)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
  at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:413)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
  at org.apache.spark.rdd.RDD.map(RDD.scala:412)
  ... 43 elided
Caused by: java.lang.IllegalAccessException: final field has no write access: $Lambda$2396/0x00000008015b2e70.arg$1/putField, from class java.lang.Object (module java.base)
  at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectField(MethodHandles.java:3494)
  at java.base/java.lang.invoke.MethodHandles$Lookup.unreflectSetter(MethodHandles.java:3485)
  at java.base/java.lang.invoke.MethodHandleImpl$1.unreflectField(MethodHandleImpl.java:1637)
  at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.jav
```

**After**

```
bin/spark-shell --master local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/11 19:11:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/11 19:11:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://localhost:4041
Spark context available as 'sc' (master = local, app id = local-1665486681920).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
      /_/

Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 18.0.2.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

var array = new Array[Int](5)
val broadcastArray = sc.broadcast(array)
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()
array(0) = 5
sc.parallelize(0 to 4).map(x => broadcastArray.value(x)).collect()

// Exiting paste mode, now interpreting.

array: Array[Int] = Array(5, 0, 0, 0, 0)
broadcastArray: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
res0: Array[Int] = Array(5, 0, 0, 0, 0)
```

Closes #38190 from LuciferYang/repl-19.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…classes

### What changes were proposed in this pull request?

In the PR, I propose to use error classes in the case of type check failure in arithmetic expressions.

### Why are the changes needed?

Migration onto error classes unifies Spark SQL error messages.
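
For illustration, a hedged sketch of the kind of type-check failure these expressions report (the exact error class emitted is not shown here):

```scala
import org.apache.spark.sql.functions.{array, col, lit}

// '+' between a numeric column and an array is a type-check failure; it now surfaces as
// an AnalysisException carrying an error class and message parameters rather than a
// hand-crafted message string.
spark.range(1).select(col("id") + array(lit(1)))
```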

### Does this PR introduce _any_ user-facing change?

Yes. The PR changes user-facing error messages.

### How was this patch tested?

```
build/sbt "sql/testOnly *SQLQueryTestSuite"
build/sbt "test:testOnly org.apache.spark.SparkThrowableSuite"
build/sbt "test:testOnly *ExpressionTypeCheckingSuite"
build/sbt "test:testOnly *ArithmeticExpressionSuite"
```

Closes #38208 from lvshaokang/SPARK-40361.

Authored-by: lvshaokang <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?
In the PR, I propose to use `checkError()` to check the error class, message parameters and the query context. After the PR #37916, all parsing exceptions should have an error class.
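
A hedged sketch of the `checkError()` pattern (the error class and parameters below are illustrative assumptions, not values copied from this PR):

```scala
// Inside a suite that provides sql() and checkError(), e.g. a QueryTest subclass.
val e = intercept[org.apache.spark.sql.catalyst.parser.ParseException] {
  sql("SELECT * FROM")  // deliberately malformed statement
}
checkError(
  exception = e,
  errorClass = "PARSE_SYNTAX_ERROR",  // assumed error class, for illustration only
  parameters = Map("error" -> "end of input", "hint" -> ""))
```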

### Why are the changes needed?
1. Checking the error classes plus the query context improves the test coverage.
2. Eliminating the dependency on exact error message text, so tech editors will be able to modify `error-classes.json` without breaking the test suite.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test suite:
```
$ build/sbt "test:testOnly *ErrorParserSuite"
```

Closes #38204 from MaxGekk/intercept-parsing-error-class.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…CY_ERROR_TEMP_2126-2150

### What changes were proposed in this pull request?

This PR proposes to migrate 25 execution errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_2126` to `_LEGACY_ERROR_TEMP_2150`.

The `_LEGACY_ERROR_TEMP_` prefix indicates dev-facing error messages that won't be exposed to end users.

### Why are the changes needed?

To speed-up the error class migration.

Migrating onto temporary error classes allows us to analyze the errors, so we can detect the most popular error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
$ build/sbt "test:testOnly *SQLQuerySuite"
$ build/sbt -Phive-thriftserver "hive-thriftserver/testOnly org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite"
```

Closes #38127 from itholic/SPARK-40540-2126-2150.

Authored-by: itholic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…onFactor` to support `Double` values

### What changes were proposed in this pull request?

This PR aims to improve `spark.sql.adaptive.skewJoin.skewedPartitionFactor` to support floating-point values by converting it from `intConf` to `doubleConf`.

### Why are the changes needed?

Like `spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor`, this allows users to use the configuration more flexibly.
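
For example, a fractional factor can now be set directly (a minimal sketch):

```scala
// Previously only integer values were accepted; with doubleConf, fractional
// factors such as 2.5 are valid as well.
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "2.5")
```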

### Does this PR introduce _any_ user-facing change?

Yes, but it will accept all previous Integer configuration values.

### How was this patch tested?

Pass the CIs with the changed default value, `5.0`.

Closes #38225 from dongjoon-hyun/SPARK-40772.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@github-actions github-actions bot added the DOCS label Oct 12, 2022
LuciferYang and others added 5 commits October 12, 2022 18:39
### What changes were proposed in this pull request?
This PR adds `dist` to the `fileset` of `maven-clean-plugin` so that `mvn clean` can delete the `dist` directory created by `dev/make-distribution.sh`.

### Why are the changes needed?
`dev/make-distribution.sh` creates a `dist` directory, but nothing cleaned it up until now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local test to confirm that `dist` dir can be cleaned

Closes #38215 from LuciferYang/clean-dist.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
… consistent with Pandas

### What changes were proposed in this pull request?
In `Series.autocorr`, rename the `periods` parameter to `lag`.

### Why are the changes needed?
When implementing `Series.autocorr` in my first pandas-on-Spark PR #36048, I wrongly followed the parameter name `min_periods` from `Series.corr`; it should be `lag` to match [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.Series.autocorr.html).

### Does this PR introduce _any_ user-facing change?
no, since 3.4 is not released

### How was this patch tested?
existing UTs

Closes #38216 from zhengruifeng/ps_ser_autocorr_rename_parameter.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
Use `inline` instead of `explode` in `corrwith`

### Why are the changes needed?
The temporary column that `explode` required is no longer needed; see the sketch below.
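
A hedged spark-shell sketch of the difference (the column names are arbitrary):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1, 2.0)).toDF("a", "b")

// With explode, the array of structs becomes one temporary struct column that has to be
// unpacked in a second select.
df.select(explode(array(struct($"a", $"b"))).as("tmp")).select($"tmp.a", $"tmp.b")

// With inline, the struct fields come out as top-level columns directly.
df.selectExpr("inline(array(struct(a, b)))")
```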

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UTs

Closes #38221 from zhengruifeng/ps_df_corrwith_inline.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…by `checkstyle` to `31.0.1-jre`

### What changes were proposed in this pull request?
SPARK-40071 upgraded checkstyle to 9.3, and checkstyle 9.3 uses guava 31.0.1-jre:

https://github.com/checkstyle/checkstyle/blob/5c1903792f8432243cc8ae5cd79a03a004d3c09c/pom.xml#L250-L253

<img width="455" alt="image" src="https://user-images.githubusercontent.com/1475305/195262034-95036047-f37d-46a2-84e2-8975be3ab261.png">

So this PR upgrades the guava version used by checkstyle accordingly.

### Why are the changes needed?
Upgrade the matching guava dependency for `checkstyle`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #38217 from LuciferYang/SPARK-40766.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…alculateSingleLocationSize#getPathSize` method

### What changes were proposed in this pull request?
This PR changes the second input parameter from `Path` to `FileStatus` to avoid a redundant `fs.getFileStatus(path)` call in each recursive call.

### Why are the changes needed?
It saves one DFS operation per recursive call.
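
A hedged sketch of the shape of the change (not the actual method body):

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem}

// Passing the FileStatus instead of the Path lets each recursive step reuse the
// caller's listStatus() result, so no extra fs.getFileStatus(path) call is needed.
def getPathSize(fs: FileSystem, status: FileStatus): Long = {
  if (status.isDirectory) {
    fs.listStatus(status.getPath).map(getPathSize(fs, _)).sum
  } else {
    status.getLen
  }
}
```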

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

Closes #38214 from LuciferYang/opt-getPathSize.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@wangyum wangyum merged commit 7ba70f0 into wangyum:master Oct 13, 2022
wangyum pushed a commit that referenced this pull request Jan 9, 2023
### What changes were proposed in this pull request?
Currently, Spark DS V2 aggregate push-down doesn't support a project with aliases.

Refer to https://github.com/apache/spark/blob/c91c2e9afec0d5d5bbbd2e155057fe409c5bb928/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L96

This PR makes it work well with aliases.

**The first example:**
the original plan is shown below:
```
Aggregate [DEPT#0], [DEPT#0, sum(mySalary#8) AS total#14]
+- Project [DEPT#0, SALARY#2 AS mySalary#8]
   +- ScanBuilderHolder [DEPT#0, NAME#1, SALARY#2, BONUS#3], RelationV2[DEPT#0, NAME#1, SALARY#2, BONUS#3] test.employee, JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession77978658,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions5f8da82)
```
If we can complete push down the aggregate, then the plan will be:
```
Project [DEPT#0, SUM(SALARY)#18 AS sum(SALARY#2)#13 AS total#14]
+- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
```
If we can partial push down the aggregate, then the plan will be:
```
Aggregate [DEPT#0], [DEPT#0, sum(cast(SUM(SALARY)#18 as decimal(20,2))) AS total#14]
+- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
```

**The second example:**
the original plan is shown below:
```
Aggregate [myDept#33], [myDept#33, sum(mySalary#34) AS total#40]
+- Project [DEPT#25 AS myDept#33, SALARY#27 AS mySalary#34]
   +- ScanBuilderHolder [DEPT#25, NAME#26, SALARY#27, BONUS#28], RelationV2[DEPT#25, NAME#26, SALARY#27, BONUS#28] test.employee, JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession25c4f621,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions345d641e)
```
If we can complete push down the aggregate, then the plan will be:
```
Project [DEPT#25 AS myDept#33, SUM(SALARY)#44 AS sum(SALARY#27)#39 AS total#40]
+- RelationV2[DEPT#25, SUM(SALARY)#44] test.employee
```
If we can partial push down the aggregate, then the plan will be:
```
Aggregate [myDept#33], [DEPT#25 AS myDept#33, sum(cast(SUM(SALARY)#56 as decimal(20,2))) AS total#52]
+- RelationV2[DEPT#25, SUM(SALARY)#56] test.employee
```
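
A hedged reconstruction of the kind of query behind the first pair of plans above (the table identifier is illustrative; the column names come from the plans):

```scala
import org.apache.spark.sql.functions.{col, sum}

val df = spark.table("h2.test.employee")
  .select(col("DEPT"), col("SALARY").as("mySalary"))
  .groupBy(col("DEPT"))
  .agg(sum(col("mySalary")).as("total"))
df.explain()
```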

### Why are the changes needed?
Aliases are common in real queries, so supporting them makes aggregate push-down more widely applicable.

### Does this PR introduce _any_ user-facing change?
Yes. Users will see that DS V2 aggregate push-down supports projects with aliases.

### How was this patch tested?
New tests.

Closes apache#35932 from beliefer/SPARK-38533_new.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit f327dad)
Signed-off-by: Wenchen Fan <[email protected]>
pull bot pushed a commit that referenced this pull request May 1, 2024
… spark docker image

### What changes were proposed in this pull request?
This PR aims to update the names of the packages removed when building the Spark docker image.

### Why are the changes needed?
When our default base image was switched from `ubuntu 20.04` to `ubuntu 22.04`, the set of unused packages in the base image changed. To eliminate some warnings when building images and to free disk space more accurately, we need to correct the package list.

Before:
```
#35 [29/31] RUN apt-get remove --purge -y     '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*'     snapd google-chrome-stable microsoft-edge-stable firefox     azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true
#35 0.489 Reading package lists...
#35 0.505 Building dependency tree...
#35 0.507 Reading state information...
#35 0.511 E: Unable to locate package ^aspnet.*
#35 0.511 E: Couldn't find any package by glob '^aspnet.*'
#35 0.511 E: Couldn't find any package by regex '^aspnet.*'
#35 0.511 E: Unable to locate package ^dotnet-.*
#35 0.511 E: Couldn't find any package by glob '^dotnet-.*'
#35 0.511 E: Couldn't find any package by regex '^dotnet-.*'
#35 0.511 E: Unable to locate package ^llvm-.*
#35 0.511 E: Couldn't find any package by glob '^llvm-.*'
#35 0.511 E: Couldn't find any package by regex '^llvm-.*'
#35 0.511 E: Unable to locate package ^mongodb-.*
#35 0.511 E: Couldn't find any package by glob '^mongodb-.*'
#35 0.511 EPackage 'php-crypt-gpg' is not installed, so not removed
#35 0.511 Package 'php' is not installed, so not removed
#35 0.511 : Couldn't find any package by regex '^mongodb-.*'
#35 0.511 E: Unable to locate package snapd
#35 0.511 E: Unable to locate package google-chrome-stable
#35 0.511 E: Unable to locate package microsoft-edge-stable
#35 0.511 E: Unable to locate package firefox
#35 0.511 E: Unable to locate package azure-cli
#35 0.511 E: Unable to locate package google-cloud-sdk
#35 0.511 E: Unable to locate package mono-devel
#35 0.511 E: Unable to locate package powershell
#35 DONE 0.5s

#36 [30/31] RUN apt-get autoremove --purge -y
#36 0.063 Reading package lists...
#36 0.079 Building dependency tree...
#36 0.082 Reading state information...
#36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
#36 DONE 0.4s
```

After:
```
#38 [32/36] RUN apt-get remove --purge -y     'gfortran-11' 'humanity-icon-theme' 'nodejs-doc' || true
#38 0.066 Reading package lists...
#38 0.087 Building dependency tree...
#38 0.089 Reading state information...
#38 0.094 The following packages were automatically installed and are no longer required:
#38 0.094   at-spi2-core bzip2-doc dbus-user-session dconf-gsettings-backend
#38 0.095   dconf-service gsettings-desktop-schemas gtk-update-icon-cache
#38 0.095   hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data
#38 0.095   libatspi2.0-0 libbz2-dev libcairo-gobject2 libcolord2 libdconf1 libepoxy0
#38 0.095   libgfortran-11-dev libgtk-3-common libjs-highlight.js libllvm11
#38 0.095   libncurses-dev libncurses5-dev libphobos2-ldc-shared98 libreadline-dev
#38 0.095   librsvg2-2 librsvg2-common libvte-2.91-common libwayland-client0
#38 0.095   libwayland-cursor0 libwayland-egl1 libxdamage1 libxkbcommon0
#38 0.095   session-migration tilix-common xkb-data
#38 0.095 Use 'apt autoremove' to remove them.
#38 0.096 The following packages will be REMOVED:
#38 0.096   adwaita-icon-theme* gfortran* gfortran-11* humanity-icon-theme* libgtk-3-0*
#38 0.096   libgtk-3-bin* libgtkd-3-0* libvte-2.91-0* libvted-3-0* nodejs-doc*
#38 0.096   r-base-dev* tilix* ubuntu-mono*
#38 0.248 0 upgraded, 0 newly installed, 13 to remove and 0 not upgraded.
#38 0.248 After this operation, 99.6 MB disk space will be freed.
...
(Reading database ... 70597 files and directories currently installed.)
#38 0.304 Removing r-base-dev (4.1.2-1ubuntu2) ...
#38 0.319 Removing gfortran (4:11.2.0-1ubuntu1) ...
#38 0.340 Removing gfortran-11 (11.4.0-1ubuntu1~22.04) ...
#38 0.356 Removing tilix (1.9.4-2build1) ...
#38 0.377 Removing libvted-3-0:amd64 (3.10.0-1ubuntu1) ...
#38 0.392 Removing libvte-2.91-0:amd64 (0.68.0-1) ...
#38 0.407 Removing libgtk-3-bin (3.24.33-1ubuntu2) ...
#38 0.422 Removing libgtkd-3-0:amd64 (3.10.0-1ubuntu1) ...
#38 0.436 Removing nodejs-doc (12.22.9~dfsg-1ubuntu3.4) ...
#38 0.457 Removing libgtk-3-0:amd64 (3.24.33-1ubuntu2) ...
#38 0.488 Removing ubuntu-mono (20.10-0ubuntu2) ...
#38 0.754 Removing humanity-icon-theme (0.6.16) ...
#38 1.362 Removing adwaita-icon-theme (41.0-1ubuntu1) ...
#38 1.537 Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
#38 1.566 Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
#38 1.577 Processing triggers for libglib2.0-0:amd64 (2.72.4-0ubuntu2.2) ...
(Reading database ... 56946 files and directories currently installed.)
#38 1.645 Purging configuration files for libgtk-3-0:amd64 (3.24.33-1ubuntu2) ...
#38 1.657 Purging configuration files for ubuntu-mono (20.10-0ubuntu2) ...
#38 1.670 Purging configuration files for humanity-icon-theme (0.6.16) ...
#38 1.682 Purging configuration files for adwaita-icon-theme (41.0-1ubuntu1) ...
#38 DONE 1.7s

#39 [33/36] RUN apt-get autoremove --purge -y
#39 0.061 Reading package lists...
#39 0.075 Building dependency tree...
#39 0.077 Reading state information...
#39 0.083 The following packages will be REMOVED:
#39 0.083   at-spi2-core* bzip2-doc* dbus-user-session* dconf-gsettings-backend*
#39 0.083   dconf-service* gsettings-desktop-schemas* gtk-update-icon-cache*
#39 0.083   hicolor-icon-theme* libatk-bridge2.0-0* libatk1.0-0* libatk1.0-data*
#39 0.083   libatspi2.0-0* libbz2-dev* libcairo-gobject2* libcolord2* libdconf1*
#39 0.083   libepoxy0* libgfortran-11-dev* libgtk-3-common* libjs-highlight.js*
#39 0.083   libllvm11* libncurses-dev* libncurses5-dev* libphobos2-ldc-shared98*
#39 0.083   libreadline-dev* librsvg2-2* librsvg2-common* libvte-2.91-common*
#39 0.083   libwayland-client0* libwayland-cursor0* libwayland-egl1* libxdamage1*
#39 0.083   libxkbcommon0* session-migration* tilix-common* xkb-data*
#39 0.231 0 upgraded, 0 newly installed, 36 to remove and 0 not upgraded.
#39 0.231 After this operation, 124 MB disk space will be freed.
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46258 from panbingkun/remove_packages_on_ubuntu.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pull bot pushed a commit that referenced this pull request Jul 21, 2025
…ingBuilder`

### What changes were proposed in this pull request?

This PR aims to improve `toString` by relying on JEP-280 string concatenation instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression.

### Why are the changes needed?

Since Java 9, `String Concatenation` has been handled better by default.

| ID | DESCRIPTION |
| - | - |
| JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) |

For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly.

**CODE CHANGE**
```java

- return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-   .append("appId", appId)
-   .append("execId", execId)
-   .append("blockIds", Arrays.toString(blockIds))
-   .toString();
+ return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
+     Arrays.toString(blockIds) + "]";
```

**BEFORE**
```
  public java.lang.String toString();
    Code:
       0: new           #39                 // class org/apache/commons/lang3/builder/ToStringBuilder
       3: dup
       4: aload_0
       5: getstatic     #41                 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle;
       8: invokespecial #47                 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V
      11: ldc           #50                 // String appId
      13: aload_0
      14: getfield      #7                  // Field appId:Ljava/lang/String;
      17: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      20: ldc           #55                 // String execId
      22: aload_0
      23: getfield      #13                 // Field execId:Ljava/lang/String;
      26: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      29: ldc           #56                 // String blockIds
      31: aload_0
      32: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      35: invokestatic  #57                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      38: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      41: invokevirtual #61                 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String;
      44: areturn
```

**AFTER**
```
  public java.lang.String toString();
    Code:
       0: aload_0
       1: getfield      #7                  // Field appId:Ljava/lang/String;
       4: aload_0
       5: getfield      #13                 // Field execId:Ljava/lang/String;
       8: aload_0
       9: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      12: invokestatic  #39                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      15: invokedynamic #43,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      20: areturn
```

### Does this PR introduce _any_ user-facing change?

No. This is a `toString` implementation improvement.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51572 from dongjoon-hyun/SPARK-52880.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>