
Conversation

@xuanyuanking
Member

What changes were proposed in this pull request?

As discussed in #16481 and #18975 (comment):
Currently, the BaseRelation returned by dataSource.writeAndRead is only used in CreateDataSourceTableAsSelectCommand, and planForWriting and writeAndRead share some common code paths.
In this patch I removed the writeAndRead function and added a getRelation function, which is only used in CreateDataSourceTableAsSelectCommand when saving data to a non-existing table.
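
For illustration only, here is a minimal sketch of the relation-only lookup described above. The standalone helper, its parameters, and its error handling are assumptions made for this example rather than the code in the patch; it only shows the intent of resolving a BaseRelation through RelationProvider.createRelation without writing any data.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

    // Sketch only: this standalone helper and its parameters are illustrative
    // assumptions, not the getRelation added by the patch.
    def getRelation(
        provider: AnyRef,
        sqlContext: SQLContext,
        options: Map[String, String]): BaseRelation = provider match {
      case r: RelationProvider =>
        // RelationProvider.createRelation only builds the relation; it writes no data.
        r.createRelation(sqlContext, options)
      case other =>
        throw new UnsupportedOperationException(
          s"${other.getClass.getName} does not support reading back the written data")
    }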

How was this patch tested?

Existing UT

@SparkQA

SparkQA commented Dec 11, 2017

Test build #84707 has finished for PR 19941 at commit 1481163.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 11, 2017

Test build #84730 has finished for PR 19941 at commit 1481163.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

Maybe something is wrong with the Jenkins env? I found that builds 84707 through 84733 failed for the same reason:

[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/R/run-tests.sh ; received return code 255


providingClass.newInstance() match {
case dataSource: CreatableRelationProvider =>
case dataSource: RelationProvider =>
Member Author

To the reviewers: the current implementation requires the data source to inherit both RelationProvider and CreatableRelationProvider (like in the UT), because here we only want to get the relation via RelationProvider.createRelation rather than save a DataFrame to a destination via CreatableRelationProvider.createRelation. I need more advice on how to do this better.
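
For illustration, a minimal sketch of a source that mixes in both traits, in the spirit of the comment above; the class name, schema, and no-op write are assumptions for this example, not the UT from the patch. RelationProvider.createRelation only resolves a relation, while CreatableRelationProvider.createRelation also receives the DataFrame to save, which is why both are needed here.

    import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
    import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}
    import org.apache.spark.sql.types.StructType

    // Hypothetical data source implementing both traits.
    class BothProvidersSource extends RelationProvider with CreatableRelationProvider {

      // RelationProvider: only resolve a relation; nothing is written here.
      override def createRelation(
          ctx: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        new BaseRelation {
          override val sqlContext: SQLContext = ctx
          override val schema: StructType = new StructType().add("a", "int")
        }

      // CreatableRelationProvider: save `data` according to `mode`, then return a
      // relation describing what was written (the write itself is elided in this sketch).
      override def createRelation(
          ctx: SQLContext,
          mode: SaveMode,
          parameters: Map[String, String],
          data: DataFrame): BaseRelation =
        createRelation(ctx, parameters)
    }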

Member

They are different public traits. We are unable to change this.

Member Author

If we use CreatableRelationProvider.createRelation here, will it write to the destination one more time?

Member

Yes, it will write to the destination, but InsertIntoDataSourceDirCommand only supports FileFormat.

Member Author

Ah, I see.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented Dec 12, 2017

Test build #84754 has finished for PR 19941 at commit 1481163.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented Dec 12, 2017

Test build #84759 has finished for PR 19941 at commit 1481163.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
try {
sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
dataSource.writeAndRead(saveMode, query)
Member

How about changing it to

      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query)).toRdd

Member Author

I implemented it like this at first, but after checking this patch (https://github.com/apache/spark/pull/18064/files) I changed to the current implementation. Is the wrapping of the execution id unnecessary here?

Member

I think it is not needed. You can try it.
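
For illustration, the quoted snippet with the suggested change applied; this is a sketch of the idea under discussion, with the surrounding command's error handling omitted, not the merged code.

    import org.apache.spark.sql.SaveMode

    // Sketch only: `overwrite`, `dataSource`, `query` and `sparkSession` come from
    // the surrounding command shown in the quoted diff above.
    val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
    // Plan the write and force it to run via .toRdd; per the discussion above, no extra
    // SQLExecution.withNewExecutionId wrapper is added around it.
    sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query)).toRdd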

Member Author

You're right. I tested the query "INSERT OVERWRITE DIRECTORY '/home/liyuanjian/tmp' USING json SELECT 1 AS a, 'c' as b;".

Member

Could we revert all the changes except this?

Member Author

Revert done.
Sorry, maybe I misunderstood your suggestion to 'get rid of dataSource.writeAndRead'. Following your discussion with Wenchen in #16481, shouldn't we make writeAndRead just return a BaseRelation without writing to the destination? Thank you for your patient reply.

@SparkQA

SparkQA commented Dec 14, 2017

Test build #84916 has finished for PR 19941 at commit 41a2c15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM pending Jenkins

@SparkQA

SparkQA commented Dec 15, 2017

Test build #84939 has finished for PR 19941 at commit 5221c7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 3775dd3 Dec 15, 2017
@xuanyuanking xuanyuanking deleted the SPARK-22753 branch December 15, 2017 07:48