[SPARK-7449][SQL]: RDD Schema mismatch fix #5986

zhzhan · 2015-05-07T20:05:33Z

There may be a mismatch between the RDD schema and the relation schema. I think the conversion should use the RDD schema.

AmplabJenkins · 2015-05-07T20:07:11Z

Merged build triggered.

AmplabJenkins · 2015-05-07T20:07:18Z

Merged build started.

SparkQA · 2015-05-07T20:08:05Z

Test build #32145 has started for PR 5986 at commit c16babb.

marmbrus · 2015-05-07T20:10:58Z

test case?

zhzhan · 2015-05-07T20:42:46Z

@marmbrus I will find a test case for it.

I hit this issue when, inside the relation, I didn't override needConversion with the following:
override def needConversion = false

createPhysicalRDD then tries to use rowToRowRdd to convert the rows to Catalyst types. For example, if we run
select b, c from table where a > 1, the output is b, c, a, but the table schema is a, b, c, so the types conflict.
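
To make the conflict concrete, here is a minimal sketch in plain Scala with no Spark dependency. Everything in it (`Field`, `convert`, `outputSchema`, the column types, and the sample row) is a hypothetical stand-in chosen for illustration, not Spark's actual API or conversion code. It only shows why a positional conversion against the relation schema (a, b, c) rejects a row laid out as b, c, a, while a schema derived from the scan output accepts it.

```scala
// Toy model only: none of these names are Spark APIs.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
case object DoubleType extends DataType

final case class Field(name: String, dataType: DataType)

object SchemaMismatchSketch {
  // Relation schema in declaration order: a, b, c (types are assumed).
  val relationSchema: Seq[Field] = Seq(
    Field("a", IntType), Field("b", StringType), Field("c", DoubleType))

  // Check one value against the type expected for its field.
  def matches(value: Any, dataType: DataType): Boolean = (value, dataType) match {
    case (_: Int, IntType)       => true
    case (_: String, StringType) => true
    case (_: Double, DoubleType) => true
    case _                       => false
  }

  // Positional conversion: returns Left if any value disagrees with the
  // field type at the same position, otherwise Right with the row.
  def convert(row: Seq[Any], schema: Seq[Field]): Either[String, Seq[Any]] = {
    val firstProblem = row.zip(schema).collectFirst {
      case (v, f) if !matches(v, f.dataType) =>
        s"column ${f.name}: unexpected value $v"
    }
    firstProblem.toLeft(row)
  }

  // Schema of the scan output: the requested columns in the requested order.
  // Throws on an unknown column name, which is acceptable for a sketch.
  def outputSchema(columns: Seq[String]): Seq[Field] =
    columns.map(name => relationSchema.find(_.name == name).get)

  def main(args: Array[String]): Unit = {
    // Row produced for `select b, c from table where a > 1`: columns b, c, a.
    val scannedRow: Seq[Any] = Seq("x", 2.5, 7)

    // Relation schema as the target: position 0 expects an Int but holds b.
    println(convert(scannedRow, relationSchema))

    // Output schema as the target: every position lines up.
    println(convert(scannedRow, outputSchema(Seq("b", "c", "a"))))
  }
}
```

The first call yields a `Left` and the second a `Right`, which mirrors the argument that the conversion should follow the schema of the rows actually produced by the scan.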

SparkQA · 2015-05-07T22:02:59Z

Test build #32145 has finished for PR 5986 at commit c16babb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-07T22:03:03Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-07T22:03:04Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32145/

…ource input conversion

In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows. However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.

This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:

> In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all.
>
> This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.

Closes #5986.

Author: Josh Rosen <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Cheng Lian <[email protected]>

Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:

e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
8ba195c [Cheng Lian] Merge 9968fba into 6166473
9968fba [Cheng Lian] Tests the data type conversion code paths

(cherry picked from commit 0c33c7b)
Signed-off-by: Yin Huai <[email protected]>

SPARK-7449: Schema fix

c16babb

zhzhan changed the title ~~[SPARK-7449][SQL]: Schema fix~~ [SPARK-7449][SQL]: RDD Schema mismatch fix May 7, 2015

JoshRosen mentioned this pull request May 26, 2015

[SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion #6400

Closed

asfgit closed this in 0c33c7b May 27, 2015
