
Conversation

@seancxmao
Contributor

What changes were proposed in this pull request?

#20560/SPARK-23375 introduced an optimizer rule that eliminates redundant Sort nodes. In the test case named "Sort metrics" in SQLMetricsSuite, the input produced by range is already sorted, so the Sort is removed by RemoveRedundantSorts, which makes the test case meaningless.

This PR modifies the query used to test Sort metrics and asserts that a Sort node exists in the plan; a sketch of the reworked test follows.
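
For context, a minimal sketch of the reworked test is shown below. It follows the discussion in this thread (SortExec imported from org.apache.spark.sql.execution, the suite's testImplicits in scope); the merged patch may differ in details.

    // Sketch only: Seq(...).toDF gives an unsorted input, so RemoveRedundantSorts
    // cannot eliminate the Sort node whose metrics we want to observe.
    test("Sort metrics") {
      val df = Seq(1, 3, 2).toDF("id").sort('id)
      df.collect()
      // Guard against the optimizer removing the node under test.
      assert(df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isDefined)
    }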

How was this patch tested?

Modify the existing test case.

@maropu
Member

maropu commented Dec 8, 2018

cc: @mgaido91

Member

nit: unfold?

Contributor Author

Ok, unfolded.

Member

assert(df.queryExecution.executedPlan.find(_.isInstanceOf[SortExec]).isDefined)?

Contributor Author

OK, I think this is more readable. Fixed.

Member

Do we need to use range here? How about just writing Seq(1, 3, 2, ...).toDF("id")?

Member

Either is fine with me, since we now add an assert to make sure the Sort node exists.

Contributor Author

Using Seq instead of range makes things simpler and more readable. I'll change it to use Seq.

@SparkQA

SparkQA commented Dec 8, 2018

Test build #99856 has finished for PR 23258 at commit 6e36336.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 8, 2018

Test build #99852 has finished for PR 23258 at commit 408ccf8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 8, 2018

Test build #99864 has finished for PR 23258 at commit 6e36336.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Thanks for pinging me @maropu. What is the point of checking that LocalTableScan contains no metrics?

I checked the original PR that introduced this UT, by @sameeragarwal, who can maybe help us state the goal of the test here (unless someone else can answer, because I have not understood it). It doesn't even seem related to the Sort operator to me. Maybe I am missing something.

Contributor Author

@mgaido91 This case tries to check the metrics of Sort (nodeId=0), rather than LocalTableScan. The second parameter (2) of testSparkPlanMetrics(df, 2, Map.empty) means expectedNumOfJobs, not nodeId. The third parameter, expectedMetrics, passes each nodeId together with its expected metrics. Because the metrics of the Sort node (sortTime, peakMemory, spillSize) may change from one execution to the next, unlike metrics such as numOutputRows, we have no way to check their exact values.
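
To illustrate the call shape described above (the Filter DataFrame and its metric value below are hypothetical, shown only to contrast with the Sort case):

    // testSparkPlanMetrics(df, expectedNumOfJobs, expectedMetrics), where
    // expectedMetrics maps nodeId -> (operatorName, metricName -> expectedValue).
    // A deterministic metric such as "number of output rows" can be checked by value:
    testSparkPlanMetrics(filterDf, 1, Map(
      0L -> (("Filter", Map("number of output rows" -> 2L)))))
    // Sort's metrics (sortTime, peakMemory, spillSize) vary per run, so the original
    // test passed Map.empty and effectively verified nothing about the Sort node.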

Contributor

can we check the metrics of SortExec here?

Contributor Author

@cloud-fan This case tries to check the metrics of SortExec; however, these metrics (sortTime, peakMemory, spillSize) change each time the query is executed, so they are not fixed. So far what I did is check whether SortExec exists. Do you mean we should further check that these metric names exist, even though we can't know their values beforehand?

Contributor

can we just check something like sortTime > 0?

Contributor

+1 for @cloud-fan's suggestion. I mean, if we cannot check their exact values, we should at least check that they exist and have reasonable values. Otherwise this UT is useless.

Contributor Author

This makes sense! Let me try.

Member

@seancxmao what do you think about updating this?

Contributor Author

@srowen Thanks for reminding me. I'll update this later this week. So busy these days...

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100431 has finished for PR 23258 at commit 4ee2c8d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100430 has finished for PR 23258 at commit 0f514fd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100432 has finished for PR 23258 at commit 4ee2c8d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 25, 2018

Test build #100436 has finished for PR 23258 at commit 4ee2c8d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

private def stringToDuration(str: String): (Float, String) = {
  val matcher = Pattern.compile("([0-9]+(\\.[0-9]+)?) (ms|s|m|h)").matcher(str)
Contributor

we can compile the pattern only once here and in the other cases

Member

@seancxmao can you address this comment? Thanks!

Contributor Author

OK, I'll fix it.

Contributor Author

Fixed with a new commit. I also extracted Pattern.compile("([0-9]+(\\.[0-9]+)?) (ms|s|m|h)") so the pattern is compiled only once.
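
One possible shape of that extraction, as a hedged sketch (the constant name and the helper body are my assumptions; the thread only says the compiled pattern was hoisted out):

    import java.util.regex.Pattern

    // Compile the duration pattern once and reuse it across calls.
    private val durationPattern = Pattern.compile("([0-9]+(\\.[0-9]+)?) (ms|s|m|h)")

    private def stringToDuration(str: String): (Float, String) = {
      val matcher = durationPattern.matcher(str)
      assert(matcher.find(), s"not a duration string: $str")
      (matcher.group(1).toFloat, matcher.group(3))  // (value, unit), e.g. (2.0f, "ms")
    }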

assert(operatorName == "Sort")
// Check metrics values
val sortTimeStr = sortMetrics._2.get("sort time total (min, med, max)").get.toString
timingMetricStats(sortTimeStr).foreach { case (sortTime, _) => assert(sortTime >= 0) }
Contributor

I think we can just check that the sum is (strictly) greater than 0, rather than checking that everything is >= 0. That would also simplify the code and avoid adding too many methods.

Contributor Author

According to SortExec, the sort time may be 0 because it is converted from nanoseconds to milliseconds.

sortTime += sorter.getSortTimeNanos / 1000000

I tried running with "sortTime > 0", and sometimes it failed.
(screenshot of the local test failure attached)
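
A quick illustration of why the reported value can be zero: the nanosecond-to-millisecond conversion uses integer division, so a sub-millisecond sort truncates to 0 (the values below are made up).

    // Hypothetical values: a 0.35 ms sort truncates to 0 after the conversion.
    val sortTimeNanos = 350000L
    val sortTimeMillis = sortTimeNanos / 1000000
    assert(sortTimeMillis == 0L)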

Contributor

I see, but here you are also checking the min, IIUC. Can the sum be 0 too?

Contributor Author

Yes, I'm checking all 4 stats: sum, min, med, and max. And the sum can also be 0. I ran the same query with spark-shell locally; below is the screenshot.

(screenshot of the spark-shell output attached)

Member

nit: assert(timingMetricStats(sortTimeStr).forall { case (sortTime, _) => sortTime >= 0 })?

Contributor Author

Yes, it's better.

Contributor Author

Fixed with a new commit.

Contributor

I see. Can we add a comment explaining why it can be 0?

Contributor Author

OK, I'll do it.

Contributor Author

@mgaido91 I have added a comment explaining the reason in a new commit.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 26, 2018

Test build #100445 has finished for PR 23258 at commit 4ee2c8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// so Project here is not collapsed into LocalTableScan.
val df = Seq(1, 3, 2).toDF("id").sort('id)
val metrics = getSparkPlanMetrics(df, 2, Set(0))
val sortMetrics = metrics.get.get(0).get
Member

It might be better to add assert(metrics.isDefined) as a safeguard.

Contributor Author

Yes, I'll fix it.

Contributor Author

Fixed with a new commit.

*/
protected def timingMetricStats(metricStr: String): Seq[(Float, String)] = {
  metricStats(metricStr).map(stringToDuration)
}
Member

Do we need to put these helper functions here? These functions are only used by test("Sort metrics") now...

Contributor Author

Yes, currently these functions are only used by test("Sort metrics"). What SQLMetricsSuite has been checking so far are almost all integer metrics (e.g. "number of output rows", "records read", ...). However, we should also check non-integer metrics, such as timing and size metrics. These metrics all share the "total (min, med, max)" format, and the helper functions could be used to check any of them (a sketch follows below). Please see the screenshot I posted above for more timing and size metric examples (shuffle write, shuffle read, ...).
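
To make the format concrete, here is a hedged sketch of how such helpers could parse a rendered metric like "\n2.0 ms (1.0 ms, 1.0 ms, 1.0 ms)" into its four stats (total, min, med, max). The names follow this thread, the bodies are my assumption rather than the committed code, and it builds on the stringToDuration sketch earlier in the thread.

    /** Split a rendered "total (min, med, max)" metric into its four stat substrings. */
    private def metricStats(metricStr: String): Seq[String] = {
      // "\n2.0 ms (1.0 ms, 1.0 ms, 1.0 ms)" -> Seq("2.0 ms", "1.0 ms", "1.0 ms", "1.0 ms")
      val Array(total, rest) = metricStr.trim.split("\\(", 2)
      total.trim +: rest.stripSuffix(")").split(",").map(_.trim).toSeq
    }

    /** Parse each stat of a timing metric into (value, unit), e.g. (2.0f, "ms"). */
    protected def timingMetricStats(metricStr: String): Seq[(Float, String)] = {
      metricStats(metricStr).map(stringToDuration)
    }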

Contributor

I think we can actually remove all of them for now. We can just check that the metrics are defined, since we are not really checking their values (the only one for which we are ensuring something is the peak memory...). I'd propose defining a testSparkPlanMetricsPattern, which is basically the same as testSparkPlanMetrics, but instead of providing a value for each metric we pass a pattern. What do you think?

Contributor Author

@seancxmao commented Dec 28, 2018

It's a great idea to add a method similar to testSparkPlanMetrics. Let me try. I'd like to slightly change the method name to testSparkPlanMetricsWithPredicates, since we are actually passing in predicates.

Contributor Author

As for checking metrics, checking ">= 0" is better than just checking whether a metric is defined, because a size or timing SQLMetric can be initialized to a non-zero value, e.g. -1.

def createSizeMetric(sc: SparkContext, name: String): SQLMetric = {
  // The final result of this metric in physical operator UI may look like:
  // data size total (min, med, max):
  // 100GB (100MB, 1GB, 10GB)
  val acc = new SQLMetric(SIZE_METRIC, -1)
  acc.register(sc, name = Some(s"$name total (min, med, max)"), countFailedValues = false)
  acc
}

def createTimingMetric(sc: SparkContext, name: String): SQLMetric = {
  // The final result of this metric in physical operator UI may look like:
  // duration(min, med, max):
  // 5s (800ms, 1s, 2s)
  val acc = new SQLMetric(TIMING_METRIC, -1)
  acc.register(sc, name = Some(s"$name total (min, med, max)"), countFailedValues = false)
  acc
}

Contributor Author

@seancxmao commented Dec 28, 2018

In a new commit, I have added SQLMetricsTestUtils#testSparkPlanMetricsWithPredicates. This way, we simply provide a test spec in test("Sort metrics"), making the test case declarative rather than procedural.

To simplify timing and size metric testing, I added two common predicates, timingMetricAllStatsShould and sizeMetricAllStatsShould. These can be reused for any other timing or size metrics.

I also modified the original testSparkPlanMetrics to be a special case of testSparkPlanMetricsWithPredicates, where each expected metric value is converted into an equality predicate. This eliminates duplicate code, since the two methods are almost the same; a sketch of this shape follows.
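
A hedged sketch of the shape described here (signatures inferred from this thread; the actual commit may differ):

    // nodeId -> (operatorName, metricName -> predicate over the rendered metric value)
    protected def testSparkPlanMetricsWithPredicates(
        df: DataFrame,
        expectedNumOfJobs: Int,
        expectedMetricsPredicates: Map[Long, (String, Map[String, Any => Boolean])]): Unit = {
      val optActualMetrics =
        getSparkPlanMetrics(df, expectedNumOfJobs, expectedMetricsPredicates.keySet)
      optActualMetrics.foreach { actualMetrics =>
        for ((nodeId, (expectedNodeName, predicates)) <- expectedMetricsPredicates) {
          val (actualNodeName, actualMetricsMap) = actualMetrics(nodeId)
          assert(expectedNodeName === actualNodeName)
          for ((metricName, predicate) <- predicates) {
            assert(predicate(actualMetricsMap(metricName)))
          }
        }
      }
    }

    // The original helper becomes a special case: each expected value is turned
    // into an equality predicate on the rendered metric string.
    protected def testSparkPlanMetrics(
        df: DataFrame,
        expectedNumOfJobs: Int,
        expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = {
      val predicates = expectedMetrics.map { case (nodeId, (nodeName, metrics)) =>
        nodeId -> ((nodeName, metrics.map { case (name, expected) =>
          name -> ((actual: Any) => actual.toString == expected.toString)
        }))
      }
      testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, predicates)
    }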

@SparkQA

SparkQA commented Dec 27, 2018

Test build #100466 has finished for PR 23258 at commit 1e55f31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 27, 2018

Test build #100468 has finished for PR 23258 at commit 1e55f31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@mgaido91 left a comment

@seancxmao I am quite against this idea of checking a predicate for sum, min, max, med... they can be different, and I'd argue that most likely we are interested only in a specific one. I think the current approach with the new method is fine, but we can use the regex to test that the values are valid, rather than parsing the strings and then applying predicates. That way we can also remove all the newly added methods (we just need testSparkPlanMetricsWithPredicates).

* Call `df.collect()` and verify if the collected metrics satisfy the specified predicates.
* @param df `DataFrame` to run
* @param expectedNumOfJobs number of jobs that will run
* @param expectedMetricsPredicates the expected metrics predicates. The format is
Contributor

nit: go up to 100 chars, and the next line has bad indentation

Contributor Author

@seancxmao commented Dec 28, 2018

Metric values are usually numbers, so for metric values, predicates may be more natural than regular expressions, which are better suited to text matching. For simple metric values, helper functions are not needed. However, timing and size metric values are a little more complex:

  • size metric value example: "\n96.2 MB (32.1 MB, 32.1 MB, 32.1 MB)"
  • timing metric value example: "\n2.0 ms (1.0 ms, 1.0 ms, 1.0 ms)"

With the helper functions, we extract the stats (via the timingMetricStats or sizeMetricStats methods) and can then apply predicates to check any of them (all stats or any single one). timingMetricAllStatsShould and sizeMetricAllStatsShould are not strictly required; they are syntactic sugar to eliminate boilerplate, since timing and size metrics are used frequently. If we want to check a single value (e.g. sum >= 0), we can provide a predicate like the one below:

timingMetricStats(_)(0)._1 >= 0

BTW, maybe timing and size metric values should be stored in a more structured way, rather than as pure text (even with "\n" in the values).

Contributor Author

Yes, the indentation was not right. I have fixed it in the new commit.

Contributor

My point is: as of now, pattern matching is enough for what we need to check, and we don't have a use case where we actually need to parse the exact values. Doing that simplifies this PR and considerably reduces the size of this change, so I think we should go this way. If in the future we need something like what you propose here because we want to check the actual values, we can introduce these methods then. But as of now this can be skipped, IMO.

Member

This does look like a lot of additional code that I think duplicates some existing code in Utils? Is it really necessary just to make some basic assertions about metric values?

Contributor Author

@mgaido91 I agree. Thanks for your detailed and clear explanation. Checking metric values does make things unnecessarily complex.

@srowen As @mgaido91 said, currently it is not necessary to check metric values; pattern matching is enough, and we can eliminate these methods. As for code duplication, the methods here do not duplicate the code in Utils. Utils provides a bunch of methods to convert between strings and bytes, where the bytes are of type Long. However, the bytes in metric values are of type Float, e.g. 96.2 MB.

Contributor Author

Hi, I have switched to pattern matching and also removed unnecessary helper methods in the new commit.

@SparkQA

SparkQA commented Dec 28, 2018

Test build #100495 has finished for PR 23258 at commit c3336d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 28, 2018

Test build #100505 has finished for PR 23258 at commit 75d0c08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 29, 2018

Test build #100525 has finished for PR 23258 at commit 386a7e5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

retest this please.

@SparkQA

SparkQA commented Dec 29, 2018

Test build #100528 has finished for PR 23258 at commit 386a7e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Because of SPARK-25267, ConvertToLocalRelation is disabled in the test cases of sql/core,
// so Project here is not collapsed into LocalTableScan.
val df = Seq(1, 3, 2).toDF("id").sort('id)
testSparkPlanMetrics(df, 2, Map(
Contributor

I think we can go back to the previous idea of having another method for testing a predicate, and use the patterns there to validate it. I am not a fan of introducing an enum for this.

Contributor Author

Because expectedMetrics could provide several different types of information (values, compiled pattern objects, pattern strings, or even predicate functions in the future), we would need to provide a method for each type. I'm OK with adding new methods for different types rather than using a single method with different flags; both have pros and cons. I just want to understand your idea better.

Contributor

Let me clarify. I'd introduce a testSparkPlanMetricsWithPredicates as you did before (you can take it from your previous commit). And here in the predicate we can use the patterns you introduced for checking them, so something like:

   val timingMetricPattern =
     Pattern.compile(s"\\n$duration \\($duration, $duration, $duration\\)")
   val sizeMetricPattern = Pattern.compile(s"\\n$bytes \\($bytes, $bytes, $bytes\\)")
   def checkPattern(pattern: Pattern): (Any => Boolean) = {
     (in: Any) => pattern.matcher(in.toString).matches()
   }
   testSparkPlanMetricsWithPredicates(df, 2, Map(
     0L -> (("Sort", Map(
       "sort time total (min, med, max)" -> checkPattern(timingMetricPattern),
       "peak memory total (min, med, max)" -> checkPattern(sizeMetricPattern),
       "spill size total (min, med, max)" -> checkPattern(sizeMetricPattern))))))

Contributor Author

Got it. Thanks for your explanation and example.

Contributor Author

I have changed to use testSparkPlanMetricsWithPredicates and checkPattern together.

protected val timingMetricPattern =
  Pattern.compile(s"\\n$duration \\($duration, $duration, $duration\\)")

/** Generate a function to check the specified pattern.
Contributor

nit: in the next line


/** Generate a function to check the specified pattern.
*
* @param pattern a pattern
Contributor

not very useful, we can remove it

Contributor Author

removed checkPattern method.

for (metricName <- expectedMetricsMap.keySet) {
  assert(expectedMetricsMap(metricName).toString === actualMetricsMap(metricName))
}
}
Contributor

We can update this to avoid code duplication and reuse the new method from testSparkPlanMetrics.

Contributor Author

I have updated testSparkPlanMetrics to invoke testSparkPlanMetricsWithPredicates to avoid code duplication.

* @param df `DataFrame` to run
* @param expectedNumOfJobs number of jobs that will run
* @param expectedMetricsPredicates the expected metrics predicates. The format is
* `nodeId -> (operatorName, metric name -> metric value predicate)`.
Contributor

nit: indent

Contributor Author

Fixed indentation.


protected def statusStore: SQLAppStatusStore = spark.sharedState.statusStore

protected val bytes = "([0-9]+(\\.[0-9]+)?) (EiB|PiB|TiB|GiB|MiB|KiB|B)"
Contributor

nit: private? And maybe move it closer to where it is used?

Contributor Author

I have inlined this in an initializer as @srowen suggested.


protected val bytes = "([0-9]+(\\.[0-9]+)?) (EiB|PiB|TiB|GiB|MiB|KiB|B)"

protected val duration = "([0-9]+(\\.[0-9]+)?) (ms|s|m|h)"
Contributor

ditto

Contributor Author

I have inlined this in an initializer as @srowen suggested.


protected val duration = "([0-9]+(\\.[0-9]+)?) (ms|s|m|h)"

// "\n96.2 MiB (32.1 MiB, 32.1 MiB, 32.1 MiB)"
Contributor

Shall we say something more here? A line that explains what this is, followed by "e.g." and your example, would be fine IMHO.

Contributor Author

I have added more comments.

protected val duration = "([0-9]+(\\.[0-9]+)?) (ms|s|m|h)"

// "\n96.2 MiB (32.1 MiB, 32.1 MiB, 32.1 MiB)"
protected val sizeMetricPattern = Pattern.compile(s"\\n$bytes \\($bytes, $bytes, $bytes\\)")
Member

Add .r to the end of these strings to make them a scala.util.matching.Regex automatically. That's more idiomatic for Scala. No need to import and use Java's Pattern.


protected val bytes = "([0-9]+(\\.[0-9]+)?) (EiB|PiB|TiB|GiB|MiB|KiB|B)"

protected val duration = "([0-9]+(\\.[0-9]+)?) (ms|s|m|h)"
Member

private? or you can inline this in an initializer below:

protected val sizeMetricPattern = {
  val bytes = ...
  s"\\n$bytes...".r
}

getSparkPlanMetrics(df, expectedNumOfJobs, expectedMetricsPredicates.keySet)
optActualMetrics.foreach { actualMetrics =>
  assert(expectedMetricsPredicates.keySet === actualMetrics.keySet)
  for (nodeId <- expectedMetricsPredicates.keySet) {
Member

It might be a little cleaner to iterate over (key, value) pairs here and below rather than iterate over keys then get values:

for ((nodeId, (expectedNodeName, expectedMetricsPredicatesMap)) <- expectedMetricsPredicates) {

* @param pattern a pattern
* @return a function to check the specified pattern
*/
protected def checkPattern(pattern: Pattern): (Any => Boolean) = {
Member

Is this method really needed? The only place it's used is the very specific method for testing metrics, and that always provides a regex. Just provide a map to regexes that you check against, rather than whole predicates?

Or, consider not compiling the regexes above and keeping them as string patterns. Then the predicate you pass is just something like _.toString.matches(sizeMetricPattern). It means compiling the regex on every check, but in this test context that's no big deal.

That would help limit the complexity of all this.

Contributor Author

I'd like to take the 2nd option.
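
Putting the pieces together, the approach converged on here could look roughly like the following sketch, assembled from the snippets in this thread (names, metric labels, and pattern strings are taken from the discussion above; the merged patch may differ in details, and the suite's testImplicits are assumed in scope).

    // Plain string patterns with inlined initializers; no java.util.regex.Pattern needed.
    protected val sizeMetricPattern = {
      val bytes = "([0-9]+(\\.[0-9]+)?) (EiB|PiB|TiB|GiB|MiB|KiB|B)"
      s"\\n$bytes \\($bytes, $bytes, $bytes\\)"
    }

    protected val timingMetricPattern = {
      val duration = "([0-9]+(\\.[0-9]+)?) (ms|s|m|h)"
      s"\\n$duration \\($duration, $duration, $duration\\)"
    }

    // Each predicate recompiles the regex on every check, which is fine in tests.
    val df = Seq(1, 3, 2).toDF("id").sort('id)
    testSparkPlanMetricsWithPredicates(df, 2, Map(
      0L -> (("Sort", Map(
        "sort time total (min, med, max)" ->
          ((v: Any) => v.toString.matches(timingMetricPattern)),
        "peak memory total (min, med, max)" ->
          ((v: Any) => v.toString.matches(sizeMetricPattern)),
        "spill size total (min, med, max)" ->
          ((v: Any) => v.toString.matches(sizeMetricPattern)))))))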

@SparkQA

SparkQA commented Dec 30, 2018

Test build #100557 has finished for PR 23258 at commit c496c54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

@mgaido91 @srowen I have made changes according to your comments.

Member

@srowen left a comment

OK, this is looking reasonable in scope for what it's doing

assert(expectedNodeName === actualNodeName)
for (metricName <- expectedMetricsMap.keySet) {
assert(expectedMetricsMap(metricName).toString === actualMetricsMap(metricName))
for (metricName <- expectedMetricsPredicatesMap.keySet) {
Member

You can use a similar iteration over the map here to avoid the keySet and get.

Contributor Author

Changed in the new commit.

Contributor

@mgaido91 left a comment

LGTM apart from one comment

import org.apache.spark.sql.catalyst.expressions.aggregate.{Final, Partial}
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
import org.apache.spark.sql.execution.{FilterExec, RangeExec, SparkPlan, WholeStageCodegenExec}
import org.apache.spark.sql.execution.{FilterExec, RangeExec, SortExec, SparkPlan, WholeStageCodegenExec}
Contributor

unneeded change

Contributor Author

Removed in the new commit.

@SparkQA

SparkQA commented Dec 30, 2018

Test build #100566 has finished for PR 23258 at commit 3ce0e03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@seancxmao
Contributor Author

@srowen @mgaido91 I have made changes according to your comments in the new commit.

@mgaido91
Contributor

LGTM, thanks for your work on this @seancxmao!

Member

@srowen left a comment

Looks good, yeah, thanks for cleaning up the existing code a little along the way

@seancxmao
Contributor Author

Many thanks for your advice and time; it has been really helpful. This code could be used for #23224 (in progress); pipelineTime of WholeStageCodegen is just another timing metric.

@SparkQA

SparkQA commented Dec 31, 2018

Test build #100574 has finished for PR 23258 at commit 5e94a3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 31, 2018

Merged to master

@srowen closed this in 0996b7c Dec 31, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ssing

Closes apache#23258 from seancxmao/sort-metrics.

Authored-by: seancxmao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>