-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-27271][SQL] Migrate Text to File Data Source V2 #24207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #103922 has finished for PR 24207 at commit
|
|
Test build #103943 has finished for PR 24207 at commit
|
|
retest this please |
|
Test build #103962 has finished for PR 24207 at commit
|
|
Test build #104004 has finished for PR 24207 at commit
|
MaxGekk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| val maxSplitBytes = FilePartition.maxSplitBytes(sparkSession, selectedPartitions) | ||
| val splitFiles = selectedPartitions.flatMap { partition => | ||
| partition.files.flatMap { file => | ||
| partition.files.filter(_.getLen > 0).flatMap { file => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@cloud-fan @HyukjinKwon @dongjoon-hyun This is ready for review. |
|
Test build #104025 has finished for PR 24207 at commit
|
|
Looks good too. Let me take a closer look if that's not merged soon before getting it in. |
|
@cloud-fan @HyukjinKwon @dongjoon-hyun This is ready for review again. |
|
Thank you for updating, @gengliangwang . I'll take a look today. |
|
Test build #104141 has finished for PR 24207 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOutputWriter.scala
Outdated
Show resolved
Hide resolved
...ore/src/main/scala/org/apache/spark/sql/execution/datasources/v2/text/TextDataSourceV2.scala
Outdated
Show resolved
Hide resolved
...in/scala/org/apache/spark/sql/execution/datasources/v2/text/TextPartitionReaderFactory.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/text/TextTable.scala
Outdated
Show resolved
Hide resolved
|
@gengliangwang . The PR looks correct to me. I left only a few minor comments. |
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too
|
I am working on another PR about file source V2 framework. I will revisit this one later. |
|
Test build #104381 has finished for PR 24207 at commit
|
|
@dongjoon-hyun @HyukjinKwon @cloud-fan This PR is ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
Thank you, @gengliangwang , @HyukjinKwon , @MaxGekk .
…FileSuite ### What changes were proposed in this pull request? Add V1/V2 tests for TextSuite and WholeTextFileSuite ### Why are the changes needed? This is missing part since #24207. We should have these tests for test coverage. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Unit tests. Closes #28335 from gengliangwang/testV2Suite. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…FileSuite ### What changes were proposed in this pull request? Add V1/V2 tests for TextSuite and WholeTextFileSuite ### Why are the changes needed? This is missing part since #24207. We should have these tests for test coverage. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Unit tests. Closes #28335 from gengliangwang/testV2Suite. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 16b9615) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Migrate Text source to File Data Source V2
How was this patch tested?
Unit test