[SPARK-24167][SQL] ParquetFilters should not access SQLConf at executor side #21224

cloud-fan · 2018-05-03T05:26:34Z

What changes were proposed in this pull request?

This PR is extracted from #21190 , to make it easier to backport.

ParquetFilters is used in the file scan function, which is executed on the executor side, so we can't call conf.parquetFilterPushDownDate there.
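The pattern described above, reading the flag once on the driver and passing the plain value into the per-file function, can be sketched roughly as follows. This is a minimal illustration that does not depend on Spark: `DemoConf`, `DemoFilters`, and `buildReader` are hypothetical stand-ins, not the actual classes touched by this PR.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
object DriverSideConfSketch {
  // Stand-in for a session conf that should only be read on the driver.
  final case class DemoConf(parquetFilterPushDownDate: Boolean)

  // Stand-in for ParquetFilters: the flag arrives as a constructor argument
  // instead of being looked up from a conf when a filter is created.
  final class DemoFilters(pushDownDate: Boolean) {
    def createFilter(columnType: String): Option[String] = columnType match {
      case "date" if pushDownDate => Some("date-filter")
      case "date"                 => None
      case other                  => Some(s"$other-filter")
    }
  }

  // Runs on the "driver": reads the conf once, then returns the per-file
  // function that would later run on executors.
  def buildReader(conf: DemoConf): String => Option[String] = {
    // Captured as a plain Boolean, so the closure never touches the conf.
    val pushDownDate = conf.parquetFilterPushDownDate
    (columnType: String) => new DemoFilters(pushDownDate).createFilter(columnType)
  }

  def main(args: Array[String]): Unit = {
    val reader = buildReader(DemoConf(parquetFilterPushDownDate = false))
    println(reader("date"))
    println(reader("int"))
  }
}
```

The returned closure holds only a `Boolean`, so it can be serialized and run elsewhere without any conf lookup.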

How was this patch tested?

it's tested in #21190

cloud-fan · 2018-05-03T05:27:11Z

cc @gatorsmile

SparkQA · 2018-05-03T05:34:52Z

Test build #90096 has finished for PR 21224 at commit c58baad.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-05-03T11:10:59Z

Test build #90110 has finished for PR 21224 at commit d7dc8a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-05-03T13:09:40Z

@cloud-fan, the change seems fine, but is there any clever trick to test this? It seems we could very easily make a similar mistake again.

jiangxb1987

LGTM

HyukjinKwon

Ah, I missed the PR description. So, this will be tested with TaskContext and thread local. Sure, we can talk about it there.

LGTM too.

dongjoon-hyun · 2018-05-03T17:21:53Z

...re/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala

      sparkSession.sessionState.conf.parquetFilterPushDown
    // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
    val returningBatch = supportBatch(sparkSession, resultSchema)
+    val pushDownDate = sqlConf.parquetFilterPushDownDate


Can we pass pushed instead of declaring new pushDownDate?
The following can be handled at line 345 here, not inside (file: PartitionedFile) => {}

    // Try to push down filters when filter push-down is enabled.
    val pushed = if (enableParquetFilterPushDown) {
      filters
        // Collects all converted Parquet filter predicates. Notice that not all predicates can be
        // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
        // is used here.
        .flatMap(new ParquetFilters(pushDownDate).createFilter(requiredSchema, _))
        .reduceOption(FilterApi.and)
    } else {
      None
    }

no we can't, see #21086

Ah, I see. Thank you, @cloud-fan !

cloud-fan · 2018-05-04T01:28:53Z

thanks, merging to master!

HyukjinKwon · 2018-05-04T01:30:26Z

And branch-2.3 too?

cloud-fan · 2018-05-04T04:53:06Z

I realized #21086 is only in master, so this bug doesn't exist in 2.3

… on the driver

## What changes were proposed in this pull request?

This is a followup of apache#20136. apache#20136 didn't really work because the test uses the local backend, which shares the driver-side `SparkEnv`, so `SparkEnv.get.executorId == SparkContext.DRIVER_IDENTIFIER` doesn't work.

This PR changes the check to `TaskContext.get != null`, moves the check to `SQLConf.get`, and fixes all the places that violate this check:

* `InMemoryTableScanExec#createAndDecompressColumn` is executed inside `rdd.map`, so we can't access `conf.offHeapColumnVectorEnabled` there. apache#21223 merged
* `DataType#sameType` may be executed on the executor side, for things like JSON schema inference, so we can't call `conf.caseSensitiveAnalysis` there. This accounts for most of the code changes, as we need to add a `caseSensitive` parameter to a lot of methods.
* `ParquetFilters` is used in the file scan function, which is executed on the executor side, so we can't call `conf.parquetFilterPushDownDate` there. apache#21224 merged
* `WindowExec#createBoundOrdering` is called on the executor side, so we can't use `conf.sessionLocalTimezone` there. apache#21225 merged
* `JsonToStructs` can be serialized to executors and evaluated there, so we should not call `SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA)` in the body. apache#21226 merged

## How was this patch tested?

existing test

Author: Wenchen Fan

Closes apache#21190 from cloud-fan/minor.
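The idea of checking `TaskContext.get != null` inside the conf getter can be sketched without Spark as below. This is only an illustration under assumptions: `currentTask`, `runAsTask`, `DemoConf`, and `getConf` are hypothetical names, and the real check in Spark may be written differently. The only detail taken from the commit message is that a non-null task context marks executor-side code.

```scala
// Hypothetical sketch; these names are not Spark APIs.
object ConfGuardSketch {
  // Stand-in for TaskContext: set only while a "task" runs on this thread.
  private val currentTask = new ThreadLocal[String]()

  // Runs `body` as if it were inside a task, clearing the marker afterwards.
  def runAsTask[T](taskName: String)(body: => T): T = {
    currentTask.set(taskName)
    try body finally currentTask.remove()
  }

  final case class DemoConf(caseSensitive: Boolean)
  private val driverConf = DemoConf(caseSensitive = false)

  // Stand-in for a conf getter with the check built in: it refuses to
  // return the conf when a task context is present on the calling thread.
  def getConf: DemoConf = {
    val task = currentTask.get()
    if (task != null) {
      throw new IllegalStateException(s"conf accessed inside task '$task'")
    }
    driverConf
  }

  def main(args: Array[String]): Unit = {
    println(getConf) // allowed: no task context on this thread
    val attempt = scala.util.Try(runAsTask("demo-task")(getConf))
    println(attempt.isFailure) // the guarded access fails inside the task
  }
}
```

Because the marker is thread-local, the guard still distinguishes task code from driver code when both run in the same JVM, which is the situation the local backend creates.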

cloud-fan mentioned this pull request May 3, 2018

[SPARK-22938][SQL][followup] Assert that SQLConf.get is accessed only on the driver #21190

Closed

ParquetFilters should not access SQLConf at executor side

d7dc8a8

cloud-fan force-pushed the minor2 branch from c58baad to d7dc8a8 Compare May 3, 2018 07:21

jiangxb1987 approved these changes May 3, 2018


HyukjinKwon approved these changes May 3, 2018


dongjoon-hyun reviewed May 3, 2018


asfgit closed this in 0c23e25 May 4, 2018


5 participants
