peter-toth
Contributor

What changes were proposed in this pull request?

  1. Refactor DataSourceReadBenchmark

How was this patch tested?

Manually tested and regenerated results.

SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark"

Change-Id: Icfd0484c8e0fef2ed0b184e09e52db9432e0a250
* spark-submit --class <this class> <spark sql test jar>
* To run this benchmark:
* {{{
* 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
Member

Hi, @peter-toth. Could you actually run this command?

Contributor Author

bin/spark-submit --class org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark --jars core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

works for me, but I checked FilterPushdownBenchmark and it seems we don't mention the other required jars there.
Shall I modify the command?

Member

Yes. We noticed that the required jars were introduced during the refactoring, so we have started fixing this guide in recent PRs, like this:

* spark-submit --class <this class> <spark sql test jar>
* To run this benchmark:
* {{{
* 1. without sbt: bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>
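
As a rough illustration of how the placeholders in the updated guide map to concrete paths, the sketch below assembles the command from the jar names quoted earlier in this thread (Scala 2.11, 3.0.0-SNAPSHOT). The variable names and the `build_cmd` function are hypothetical helpers, not part of Spark, and the artifact names will differ for other builds.

```shell
# Assumption: Scala binary version and Spark version as quoted in this thread.
SCALA_BINARY="2.11"
SPARK_VERSION="3.0.0-SNAPSHOT"
BENCHMARK_CLASS="org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark"

# <spark core test jar>, <spark catalyst test jar>, <spark sql test jar>
CORE_TEST_JAR="core/target/spark-core_${SCALA_BINARY}-${SPARK_VERSION}-tests.jar"
CATALYST_TEST_JAR="sql/catalyst/target/spark-catalyst_${SCALA_BINARY}-${SPARK_VERSION}-tests.jar"
SQL_TEST_JAR="sql/core/target/spark-sql_${SCALA_BINARY}-${SPARK_VERSION}-tests.jar"

# Hypothetical helper: print the command instead of running it,
# so the jar list can be checked before submitting.
build_cmd() {
  echo "bin/spark-submit --class ${BENCHMARK_CLASS} --jars ${CORE_TEST_JAR},${CATALYST_TEST_JAR} ${SQL_TEST_JAR}"
}

build_cmd
```

Printing the command first makes it easy to confirm that the core and catalyst test jars go in `--jars` while the SQL test jar is the application jar.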
Member

Could you run dev/scalastyle and fix this in your branch?

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Oct 10, 2018

Test build #97227 has finished for PR 22664 at commit cf61f1c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @peter-toth.
Could you review and merge peter-toth#1, which contains the results on EC2 r3.xlarge?

@SparkQA

SparkQA commented Oct 11, 2018

Test build #97236 has finished for PR 22664 at commit 5bccfc6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Change-Id: If4fcfc27eb808c08246a8f7779fbe38a437a41a4
…o SPARK-25662

Change-Id: Ie5b0a3fa70b605c1655e3328f2c92ff179805f7d
@dongjoon-hyun
Member

Could you add [SQL] before [TEST], too?

@peter-toth changed the title from "[SPARK-25662][TEST] Refactor DataSourceReadBenchmark to use main method" to "[SPARK-25662][SQL][TEST] Refactor DataSourceReadBenchmark to use main method" on Oct 11, 2018
@SparkQA

SparkQA commented Oct 11, 2018

Test build #97237 has finished for PR 22664 at commit 7cef8db.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Hi, @dbtsai.
Since this is DataSourceReadBenchmark, could you review and merge this?

@SparkQA

SparkQA commented Oct 11, 2018

Test build #97241 has finished for PR 22664 at commit 7cef8db.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@dbtsai
Member

dbtsai commented Oct 11, 2018

Thanks, @dongjoon-hyun, for pinging me. LGTM too. We're working on some Parquet reader improvements, and this will be useful.

@SparkQA

SparkQA commented Oct 11, 2018

Test build #97272 has finished for PR 22664 at commit 7cef8db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit closed this in 8115e6b on Oct 11, 2018
@dongjoon-hyun
Member

Thank you, @dbtsai and @peter-toth.

@peter-toth
Contributor Author

Thanks for the review, @dongjoon-hyun and @dbtsai.
I have one question, though: I still don't see https://issues.apache.org/jira/browse/SPARK-25662 assigned to me. Could you please look into it?

@dbtsai
Member

dbtsai commented Oct 12, 2018

@peter-toth I assigned it to you. Thanks for your contribution.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… method

## What changes were proposed in this pull request?

1. Refactor DataSourceReadBenchmark

## How was this patch tested?

Manually tested and regenerated results.
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark"
```

Closes apache#22664 from peter-toth/SPARK-25662.

Lead-authored-by: Peter Toth
Co-authored-by: Dongjoon Hyun
Signed-off-by: DB Tsai