Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented May 8, 2018

What changes were proposed in this pull request?

This pr added benchmark code DataSourceReadBenchmark for orc, paruqet, csv, and json based on the existing ParquetReadBenchmark and OrcReadBenchmark.

How was this patch tested?

N/A

@maropu
Copy link
Member Author

maropu commented May 8, 2018

I'll add benchmark results just after #21070 merged.

@maropu
Copy link
Member Author

maropu commented May 8, 2018

Also, I'll make a follow-up pr for pushdown benchmarks;
master...maropu:UpdateParquetBenchmark

@maropu
Copy link
Member Author

maropu commented May 8, 2018

@gatorsmile @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented May 8, 2018

Test build #90359 has finished for PR 21266 at commit 09b3920.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu . Since you are merging ParquetReadBenchmark and OrcReadBenchmark benchmarks, let's remove OrcReadBenchmark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we still need to OrcReadBenchmark to compare native orc with Hive built-in orc?

Copy link
Member

@dongjoon-hyun dongjoon-hyun May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Never mind. I deleted the previous comment. The scope is only testing native orc here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks!

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90392 has finished for PR 21266 at commit 2813706.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the DataSourceReadBenchmark branch from 8aedbf0 to 171e89a Compare May 10, 2018 04:23
@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90437 has finished for PR 21266 at commit 8aedbf0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the DataSourceReadBenchmark branch from 171e89a to 1d93d99 Compare May 10, 2018 04:26
@SparkQA
Copy link

SparkQA commented May 10, 2018

Test build #90438 has finished for PR 21266 at commit 1d93d99.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the DataSourceReadBenchmark branch from 1d93d99 to fc96adb Compare May 14, 2018 08:35
@SparkQA
Copy link

SparkQA commented May 14, 2018

Test build #90572 has finished for PR 21266 at commit fc96adb.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 14, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 14, 2018

Test build #90579 has finished for PR 21266 at commit fc96adb.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this type of comments should not be included by scala file in this directory.
If you want to put this comment, this scala file should be put into sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ where files are not translated to doc.

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/DataSourceReadBenchmark.java:5: error: unknown tag: this
[error]  *  spark-submit --class <this class> <spark sql test jar>
[error]                          ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/DataSourceReadBenchmark.java:5: error: unknown tag: spark
[error]  *  spark-submit --class <this class> <spark sql test jar>
[error]     

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@maropu maropu force-pushed the DataSourceReadBenchmark branch from 3b6f541 to fad31b7 Compare May 21, 2018 04:03
@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90875 has finished for PR 21266 at commit 3b6f541.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the DataSourceReadBenchmark branch from fad31b7 to d8c308f Compare May 21, 2018 04:15
@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90877 has finished for PR 21266 at commit fad31b7.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90879 has finished for PR 21266 at commit d8c308f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 21, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 21, 2018

Test build #90884 has finished for PR 21266 at commit d8c308f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us explicitly set these confs? Here, we are expecting the perf number when ORC_COPY_BATCH_TO_SPARK is set to false. Please also double check the other benchmarks and add the related confs too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that ORC_COPY_BATCH_TO_SPARK=false in other tests (I didn't find performance differences after explicitly setting false in line 50.
https://github.com/apache/spark/pull/21266/files#diff-ae11b49db05c9e6829cad071b112a742R50

@gatorsmile
Copy link
Member

@maropu Great work! Thanks for helping this!

@maropu maropu force-pushed the DataSourceReadBenchmark branch from 274aa8a to 5eab1a5 Compare May 23, 2018 02:37
@maropu
Copy link
Member Author

maropu commented May 23, 2018

I'll update the benchmark results soon.

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91012 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91017 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91036 has finished for PR 21266 at commit 5eab1a5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented May 23, 2018

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91049 has finished for PR 21266 at commit 5eab1a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

@asfgit asfgit closed this in 84557bc May 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants