Commit f72d217

rdblue and HyukjinKwon authored and committed
[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix.
## What changes were proposed in this pull request?

Update to Parquet Java 1.10.1.

## How was this patch tested?

Added a test from HyukjinKwon that validates the notEq case from SPARK-26677.

Closes #23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug.

Lead-authored-by: Ryan Blue <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Ryan Blue <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent a5427a0 commit f72d217
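The fix matters for queries like the one in the new ParquetQuerySuite test: a negated null-safe equality comparison (`NOT (value <=> 'A')`) reaches Parquet as the `notEq` predicate named in the title, and a row group that contains matching null rows must not be skipped. Below is a minimal sketch of that scenario, assuming a local SparkSession; the session setup and the output path `/tmp/spark-26677` are illustrative assumptions, not part of this commit.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the SPARK-26677 scenario; the session setup and output
// path are illustrative assumptions, not part of this commit.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-26677 sketch")
  .getOrCreate()
import spark.implicits._

// Repeated values so the column is dictionary encoded, plus one null row.
Seq(Some("A"), Some("A"), None).toDF("value")
  .repartition(1)
  .write.mode("overwrite").parquet("/tmp/spark-26677")

val df = spark.read.parquet("/tmp/spark-26677")

// The negated null-safe equality predicate keeps only the null row. With the
// Parquet 1.10.0 dictionary filter, the pushed-down notEq predicate could drop
// the whole row group and the query came back empty; with 1.10.1 the row group
// is kept and the null row is returned.
df.where("NOT (value <=> 'A')").show()
```

The test added in this commit wraps the same check in `stripSparkFilter`, which removes Spark's own Filter operator from the plan so the assertion exercises only what the pushed-down Parquet filter lets through.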

4 files changed, +26 −11 lines changed

dev/deps/spark-deps-hadoop-2.7

Lines changed: 5 additions & 5 deletions
@@ -161,13 +161,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.8.1.jar
 pyrolite-4.13.jar

dev/deps/spark-deps-hadoop-3.1

Lines changed: 5 additions & 5 deletions
@@ -178,13 +178,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.8.1.jar
 pyrolite-4.13.jar

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@
     <!-- note that this should be compatible with Kafka brokers version 0.10 and up -->
     <kafka.version>2.1.0</kafka.version>
     <derby.version>10.12.1.1</derby.version>
-    <parquet.version>1.10.0</parquet.version>
+    <parquet.version>1.10.1</parquet.version>
     <orc.version>1.5.4</orc.version>
     <orc.classifier>nohive</orc.classifier>
     <hive.parquet.version>1.6.0</hive.parquet.version>

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala

Lines changed: 15 additions & 0 deletions
@@ -890,6 +890,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       }
     }
   }
+
+  test("SPARK-26677: negated null-safe equality comparison should not filter matched row groups") {
+    (true :: false :: Nil).foreach { vectorized =>
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+        withTempPath { path =>
+          // Repeated values for dictionary encoding.
+          Seq(Some("A"), Some("A"), None).toDF.repartition(1)
+            .write.parquet(path.getAbsolutePath)
+          val df = spark.read.parquet(path.getAbsolutePath)
+          checkAnswer(stripSparkFilter(df.where("NOT (value <=> 'A')")), df)
+        }
+      }
+    }
+  }
+
 }
 
 object TestingUDT {
