[SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion #6400
Changes from all commits: 9968fba, 8ba195c, 5a00e66, 6cd7366, 2169a0f, 56b13e5, e71c866
```diff
@@ -76,6 +76,12 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
       df.filter('a > 1 && 'p1 < 2).select('b, 'p1),
       for (i <- 2 to 3; _ <- Seq("foo", "bar")) yield Row(s"val_$i", 1))
```
Review comment: @JoshRosen Found that I made a mistake in this:

```scala
// Simple projection and partition pruning
checkAnswer(
  df.filter('a > 1 && 'p1 < 2).select('b, 'p1),
  for (i <- 2 to 3; j <- Seq("foo", "bar")) yield Row(s"val_$i", j))
```

Review comment: Are you sure? E.g., at the top of the suite we have:

```scala
val partitionedTestDF1 = (for {
  i <- 1 to 3
  p2 <- Seq("foo", "bar")
} yield (i, s"val_$i", 1, p2)).toDF("a", "b", "p1", "p2")

val partitionedTestDF2 = (for {
  i <- 1 to 3
  p2 <- Seq("foo", "bar")
} yield (i, s"val_$i", 2, p2)).toDF("a", "b", "p1", "p2")
```

Review comment: Yeah, I think the type of […] cc @liancheng

Review comment: Ah, sorry, my bad.
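To make the type question in this thread concrete: in `partitionedTestDF1`, `p1` is always the `Int` value 1, while `p2` ranges over the `String`s "foo" and "bar". The query selects `'b` and `'p1`, so each expected row pairs a `String` with an `Int`; binding the `Seq("foo", "bar")` generator to a variable `j` and yielding `Row(s"val_$i", j)` would put a `String` where `p1`'s `Int` belongs. A minimal sketch (an illustration, not the test itself):

```scala
// The Seq("foo", "bar") generator merely duplicates each row once per p2
// value; the selected p1 column stays the Int 1 throughout.
val expected: Seq[(String, Int)] =
  for (i <- 2 to 3; _ <- Seq("foo", "bar")) yield (s"val_$i", 1)
// expected == Seq(("val_2", 1), ("val_2", 1), ("val_3", 1), ("val_3", 1))
```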
```scala
// Project many copies of columns with different types (reproduction for SPARK-7858)
checkAnswer(
  df.filter('a > 1 && 'p1 < 2).select('b, 'b, 'b, 'b, 'p1, 'p1, 'p1, 'p1),
  for (i <- 2 to 3; _ <- Seq("foo", "bar"))
    yield Row(s"val_$i", s"val_$i", s"val_$i", s"val_$i", 1, 1, 1, 1))
```
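A note on why this projection reproduces SPARK-7858 (a reading of the fix based on the PR title, not text from the review): the relation schema has one field per source column (here roughly `a: Int, b: String, p1: Int, p2: String`), while the output schema of this projection has eight columns, with `b` and `p1` each repeated four times. Converters built against the relation schema cannot line up with the projected rows, which is why the fix derives them from the output schema. A toy sketch of the mismatch:

```scala
// Toy illustration, not Spark code: schemas modeled as (name, type) pairs.
val relationSchema = Seq("a" -> "Int", "b" -> "String", "p1" -> "Int", "p2" -> "String")
val outputSchema   = Seq.fill(4)("b" -> "String") ++ Seq.fill(4)("p1" -> "Int")
val projectedRow   = Seq("val_2", "val_2", "val_2", "val_2", 1, 1, 1, 1)
// Converters derived from relationSchema would apply an Int converter to the
// String "val_2" at position 0; only outputSchema matches the row shape.
require(projectedRow.length == outputSchema.length)
```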
```scala
// Self-join
df.registerTempTable("t")
withTempTable("t") {
```
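The body of this `withTempTable` block is elided in this excerpt. Purely as an illustrative sketch (not the actual test code), a self-join over the registered table reads the data source on both sides, sending each side's rows through the input conversion this PR fixes:

```scala
// Illustrative only — the real test's query and assertions are not shown here.
// sqlContext is assumed to be in scope, as in the surrounding test suite.
val joined = sqlContext.sql("SELECT l.b, r.b FROM t l JOIN t r ON l.a = r.a")
joined.collect()
```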
Review comment: Maybe a while loop would be better?
Review comment: This is actually the return value of the `mapPartitions` call, which must be a Scala iterator. Note that we do use a while loop to iterate over the columns.
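For context, a standalone sketch of the shape being discussed (names like `convertRows` and `converters` are illustrative, not Spark's internals): the function handed to `mapPartitions` must return a Scala `Iterator`, and `rows.map` lazily keeps it one, while the per-row traversal of columns is an imperative while loop.

```scala
// Standalone sketch, not the actual Spark code. Rows are modeled as
// Array[Any]; converters holds one function per *output* column.
def convertRows(
    rows: Iterator[Array[Any]],
    converters: Array[Any => Any]): Iterator[Array[Any]] = {
  rows.map { row =>                 // lazily returns a Scala Iterator
    val converted = new Array[Any](converters.length)
    var i = 0
    while (i < converters.length) { // while loop over the columns
      converted(i) = converters(i)(row(i))
      i += 1
    }
    converted
  }
}
```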
Review comment: Actually, I am fine with the current version. We are not slower than the previous version.
Review comment: This is probably faster than the older version since it doesn't use an unnecessary `bufferedIterator` :)
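For reference, `Iterator.buffered` wraps an iterator in a `scala.collection.BufferedIterator`, which adds one-element lookahead (`head`) at the cost of an extra layer of indirection; the comment above suggests the older code paid that cost without needing the lookahead (an inference from the comment, since the old code is not shown in this excerpt).

```scala
val it = Iterator(1, 2, 3)
val buffered = it.buffered // BufferedIterator[Int]: adds lookahead support
buffered.head              // peek at 1 without consuming it
buffered.next()            // now actually consume the 1
```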
Review comment: Yeah, I also realized that after I made my comment.