Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Sep 21, 2018

What changes were proposed in this pull request?

We need create a new trait to replace BenchmarkWithCodegen as BenchmarkWithCodegen extends from SparkFunSuite.

For example. when doing AggregateBenchmark refactor.
Before this change:

object AggregateBenchmark extends BenchmarkBase {

  lazy val sparkSession = SparkSession.builder
    .master("local[1]")
    .appName(this.getClass.getSimpleName)
    .config("spark.sql.shuffle.partitions", 1)
    .config("spark.sql.autoBroadcastJoinThreshold", 1)
    .getOrCreate()

  /** Runs function `f` with whole stage codegen on and off. */
  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...

After this change:

object AggregateBenchmark extends SqlBasedBenchmark {

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...

All these benchmarks will use this trait:

AggregateBenchmark
BenchmarkWideTable
JoinBenchmark
MiscBenchmark
ObjectHashAggregateExecBenchmark
SortBenchmark
UnsafeArrayDataBenchmark

How was this patch tested?

manual tests

@wangyum
Copy link
Member Author

wangyum commented Sep 21, 2018

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96452 has finished for PR 22522 at commit 275cc6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Common base trait for micro benchmarks that are supposed to run standalone (i.e. not together
* with other benchmarks).
*/
private[benchmark] trait RunBenchmarkWithCodegen {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall this extend BenchmarkBase?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove private[benchmark] to be consistent with other benchmark classes.

Copy link
Member Author

@wangyum wangyum Sep 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extends BenchmarkBase and add getSparkSession function thus subclass can build their own SparkSession:

trait RunBenchmarkWithCodegen extends BenchmarkBase {
val spark: SparkSession = getSparkSession
/** Subclass can override this function to build their own SparkSession */
def getSparkSession: SparkSession = {
SparkSession.builder()
.master("local[1]")
.appName(this.getClass.getCanonicalName)
.config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
.config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
.getOrCreate()
}
/** Runs function `f` with whole stage codegen on and off. */
def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
val benchmark = new Benchmark(name, cardinality, output = output)
benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = false)
f
}
benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = true)
f
}
benchmark.run()
}
}

@cloud-fan
Copy link
Contributor

I think this change is necessary, but I'd like to migrate one benchmark to use this new trait as an example. We can migrate others in follow up PRs.

@wangyum
Copy link
Member Author

wangyum commented Sep 22, 2018

Thanks @cloud-fan I have migrate AggregateBenchmark to use new trait.

@gengliangwang
Copy link
Member

@wangyum I have left my comment in #22484 .
Also, should we close this one and move to #22484 ?

@wangyum wangyum closed this Sep 22, 2018
@wangyum wangyum reopened this Sep 29, 2018
@wangyum
Copy link
Member Author

wangyum commented Sep 29, 2018

cc @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Sep 29, 2018

Test build #96783 has finished for PR 22522 at commit 20668ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper

@wangyum wangyum closed this Oct 1, 2018
@wangyum wangyum deleted the SPARK-25510 branch October 1, 2018 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants