@cloud-fan
Contributor

This is a follow-up to #5154: we can speed up Scala UDF evaluation by creating the type converter in advance.
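The idea can be illustrated with a minimal, self-contained sketch. It does not use Spark's real classes, and the patch itself is not shown in this thread: `SketchType`, `createConverter`, `PerCallUdf`, and `PrecomputedUdf` are hypothetical names invented for illustration. The assumption is that choosing a converter requires a dispatch on the data type, which only needs to happen once per UDF rather than once per row.

```scala
// Hypothetical stand-ins for Catalyst data types; these are NOT Spark's classes.
sealed trait SketchType
case object SketchInt extends SketchType
case object SketchString extends SketchType

object ConverterSketch {
  // Counts how often a converter is built, to make the difference observable.
  var lookups = 0

  // Hypothetical converter factory: the dispatch on the data type happens here.
  def createConverter(dt: SketchType): Any => Any = {
    lookups += 1
    dt match {
      case SketchInt    => (v: Any) => v
      case SketchString => (v: Any) => if (v == null) null else v.toString
    }
  }

  // Per-call variant: resolves the converter again on every evaluation.
  final class PerCallUdf(f: Int => Any, dt: SketchType) {
    def eval(input: Int): Any = createConverter(dt)(f(input))
  }

  // Precomputed variant: resolves the converter once, at construction time.
  final class PrecomputedUdf(f: Int => Any, dt: SketchType) {
    private val converter: Any => Any = createConverter(dt)
    def eval(input: Int): Any = converter(f(input))
  }
}
```

With this sketch, evaluating a `PrecomputedUdf` a million times calls `createConverter` once, while a `PerCallUdf` calls it a million times. Both return the same values.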

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@cloud-fan
Contributor Author

Using the same benchmark:

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

case class Floor(child: Expression) extends UnaryExpression with Predicate {
  override def foldable = child.foldable
  def nullable = child.nullable
  override def toString = s"Floor $child"

  override def eval(input: Row): Any = {
    child.eval(input) match {
      case null => null
      case ts: Int => ts - ts % 300
    }
  }
}

object T {
  def benchmark(count: Int, expr: Expression): Unit = {
    var i = 0
    val row = new GenericRow(Array[Any](123, 21, 42))
    val s = System.currentTimeMillis()
    while (i < count) {
      expr.eval(row)
      i += 1
    }
    val e = System.currentTimeMillis()

    println(s"${expr.getClass.getSimpleName}  -- ${e - s} ms")
  }
  def main(args: Array[String]): Unit = {
    def func(ts: Int) = ts - ts % 300
    val udf0 = ScalaUdf(func _, IntegerType, BoundReference(0, IntegerType, true) :: Nil)
    val udf1 = Floor(BoundReference(0, IntegerType, true))

    benchmark(1000000, udf0)
    benchmark(1000000, udf0)
    benchmark(1000000, udf0)

    benchmark(1000000, udf1)
    benchmark(1000000, udf1)
    benchmark(1000000, udf1)
  }
}

before:
ScalaUdf -- 151 ms
ScalaUdf -- 127 ms
ScalaUdf -- 128 ms
Floor -- 23 ms
Floor -- 4 ms
Floor -- 5 ms

after:
ScalaUdf -- 28 ms
ScalaUdf -- 12 ms
ScalaUdf -- 8 ms
Floor -- 22 ms
Floor -- 4 ms
Floor -- 4 ms
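As a rough reading of these numbers, the per-run speedup for ScalaUdf can be computed directly from the timings reported above. The snippet below is only a sketch: `Speedup` is a hypothetical helper name, and the inputs are single wall-clock measurements, so the ratios should be treated as approximate.

```scala
object Speedup {
  // ScalaUdf timings in milliseconds, copied from the runs reported above.
  val before: Seq[Int] = Seq(151, 127, 128)
  val after: Seq[Int] = Seq(28, 12, 8)

  // Ratio of before to after for each run, in the order the runs were reported.
  def ratios: Seq[Double] =
    before.zip(after).map { case (b, a) => b.toDouble / a }
}
```

The first run of each series is slower than the later ones, so the later ratios are probably the more representative ones.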

@SparkQA

SparkQA commented May 15, 2015

Test build #32813 has started for PR 6182 at commit 241cfe9.

@SparkQA

SparkQA commented May 15, 2015

Test build #32813 has finished for PR 6182 at commit 241cfe9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32813/

@JoshRosen
Contributor

LGTM; seems like a pretty straightforward optimization. I'm a bit new to Spark SQL, though, so I'll wait for a committer with SQL experience to do the final sign-off on this.

@yhuai
Contributor

yhuai commented May 17, 2015

LGTM

@yhuai
Contributor

yhuai commented May 17, 2015

I am merging it to master and branch 1.4.

asfgit pushed a commit that referenced this pull request May 17, 2015
It's a follow-up of #5154, we can speed up scala udf evaluation by create type converter in advance.

Author: Wenchen Fan

Closes #6182 from cloud-fan/tmp and squashes the following commits:

241cfe9 [Wenchen Fan] use converter in ScalaUdf

(cherry picked from commit 2f22424)
Signed-off-by: Yin Huai
@asfgit asfgit closed this in 2f22424 May 17, 2015
@cloud-fan cloud-fan deleted the tmp branch May 18, 2015 02:05