Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented May 23, 2018

What changes were proposed in this pull request?

Current code block manipulation API is immature and hacky. We need a formal API to manipulate code blocks.

The basic idea is making JavaCode as TreeNode. So we can use familiar transform API to manipulate code blocks and expressions in code blocks.

For example, we can replace SimpleExprValue in a code block like this:

code.transformExprValues {
  case SimpleExprValue("1 + 1", _) => aliasedParam
}

The example use case is splitting code to methods.

For example, we have an ExprCode containing generated code. But it is too long and we need to split it as method. Because statement-based expressions can't be directly passed into. We need to transform them as variables first:

def getExprValues(block: Block): Set[ExprValue] = block match {
  case c: CodeBlock =>
    c.blockInputs.collect {
      case e: ExprValue => e
    }.toSet
  case _ => Set.empty
}

def currentCodegenInputs(ctx: CodegenContext): Set[ExprValue] = {
  // Collects current variables in ctx.currentVars and ctx.INPUT_ROW.
  // It looks roughly like...
  ctx.currentVars.flatMap { v =>
    getExprValues(v.code) ++ Set(v.value, v.isNull)
  }.toSet + ctx.INPUT_ROW
}

// A code block of an expression contains too long code, making it as method
if (eval.code.length > 1024) {
  val setIsNull = if (!eval.isNull.isInstanceOf[LiteralValue]) {
    ...
  } else {
    ""
  }

  // Pick up variables and statements necessary to pass in.
  val currentVars = currentCodegenInputs(ctx)
  val varsPassIn = getExprValues(eval.code).intersect(currentVars)
  val aliasedExprs = HashMap.empty[SimpleExprValue, VariableValue]

  // Replace statement-based expressions which can't be directly passed in the method.
  val newCode = eval.code.transform {
    case block =>
      block.transformExprValues {
        case s: SimpleExprValue(_, javaType) if varsPassIn.contains(s) =>
          if (aliasedExprs.contains(s)) {
            aliasedExprs(s)
          } else {
            val aliasedVariable = JavaCode.variable(ctx.freshName("aliasedVar"), javaType)
            aliasedExprs += s -> aliasedVariable
            varsPassIn += aliasedVariable
            aliasedVariable
          }
      }
  }

  val params = varsPassIn.filter(!_.isInstanceOf[SimpleExprValue])).map { variable =>
    s"${variable.javaType.getName} ${variable.variableName}"
  }.mkString(", ")

  val funcName = ctx.freshName("nodeName")
  val javaType = CodeGenerator.javaType(dataType)
  val newValue = JavaCode.variable(ctx.freshName("value"), dataType)
  val funcFullName = ctx.addNewFunction(funcName,
    s"""
      |private $javaType $funcName($params) {
      |  $newCode
      |  $setIsNull
      |  return ${eval.value};
      |}
    """.stripMargin))

  eval.value = newValue
  val args = varsPassIn.filter(!_.isInstanceOf[SimpleExprValue])).map { variable =>
    s"${variable.variableName}"
  }

  // Create a code block to assign statements to aliased variables.
  val createVariables = aliasedExprs.foldLeft(EmptyBlock) { (block, (statement, variable)) =>
    block + code"${statement.javaType.getName} $variable = $statement;"
  }
  eval.code = createVariables + code"$javaType $newValue = $funcFullName($args);"
}

How was this patch tested?

Added unite tests.

@viirya
Copy link
Member Author

viirya commented May 23, 2018

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91014 has finished for PR 21405 at commit 2d77332.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 23, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91021 has finished for PR 21405 at commit 2d77332.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 23, 2018

retest this please.


// We want to replace all occurrences of `expr` with the variable `aliasedParam`.
val aliasedCode = code.transformExprValues {
case SimpleExprValue("1 + 1", _) => aliasedParam
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I know the current code works correctly. How about replacing _ with CodeGenerator.javaClass(IntegerType)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@mgaido91
Copy link
Contributor

mgaido91 commented May 23, 2018

@viirya sorry do you already have some use-cases, goals in mind for this API? May you please explain me them? Thanks.

@viirya
Copy link
Member Author

viirya commented May 23, 2018

@mgaido91 Yes, as @rednaxelafx said in previous RP, #19813 will be the first use case. I'd like to use this API to transform statement-based expressions in code blocks.

For example, code contains Java code of an expression. We find it is too long and we need to split it out as function. Because statement-based expressions can't be directly passed into. We need to transform them as variables first. I think the usage roughly would looks like:

val collectedStatements = HashMap[ExprValue, ExprValue]
val transformed = code.transformExprValues {
  case s @ SimpleExprValue(_, javaType) =>
    if (collectedStatements.contains(s)) {
      collectedStatements(s)
    } else {
      val aliasedVariable = JavaCode.variable(ctx.freshName("var"), javaType)
      collectedStatements += s -> aliasedVariable
      aliasedVariable
    }
}
val createVariables = collectedStatements.foldLeft(EmptyBlock) { (block, (statement, variable)) =>
  block + code"${statement.javaType.getName} $variable = $statement;"
}
ev.copy(code = createVariables + transformed)

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91031 has finished for PR 21405 at commit 2d77332.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

can we give a concrete use case in the PR description? I think splitting code into methods is a good one, but we need to make it completed, e.g. how the method is generated.

@viirya
Copy link
Member Author

viirya commented May 24, 2018

@cloud-fan Thanks. I give a use case of splitting code into method in the PR description. I think it can show the basic idea and it is what I hope to make.

* Apply a map function to each java expression codes present in this java code, and return a new
* java code based on the mapped java expression codes.
*/
def transformExprValues(f: PartialFunction[ExprValue, ExprValue]): this.type = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I wrong or we are not updating the exprValues here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformExprValues will create a new instance Block because a block is immutable. So once there are change, new block should take new exprValues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, can we add some tests for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Will add them in next commit. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test for it.

@viirya
Copy link
Member Author

viirya commented May 25, 2018

@cloud-fan I found that the way to collect method parameters in the use case can be simplified. Updated in the PR description. Please take a look when you have time. Thanks.

@viirya
Copy link
Member Author

viirya commented May 29, 2018

@cloud-fan @hvanhovell Do you have any comment/suggestion on this change? Thanks.

@SparkQA
Copy link

SparkQA commented May 29, 2018

Test build #91251 has finished for PR 21405 at commit e30be7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jun 4, 2018

ping @cloud-fan @hvanhovell @kiszk

* Trait representing an opaque fragments of java code.
*/
trait JavaCode {
trait JavaCode extends TreeNode[JavaCode] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we only make Block extends TreeNode?

Copy link
Member Author

@viirya viirya Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Currently ExprValue doesn't have to be TreeNode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update in next commit.

override def hashCode(): Int = value.hashCode() * 31 + javaType.hashCode()
}

case class LiteralExpr(override val value: String, override val javaType: Class[_])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this change?


// The expressions to be evaluated inside this block.
// All expressions to be evaluated inside this block and underlying blocks.
def exprValues: Set[ExprValue]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove it, as discussed in other PR previously.


test ("transform expr in nested blocks") {
val expr = JavaCode.expression("1 + 1", IntegerType)
val isNull = JavaCode.isNullVariable("expr1_isNull")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, shall we remove isNullVariable and always use expression(..., BooleanType)? Or we can add a bunch of intExpression(...), booleanExpression(...) etc.

Copy link
Member Author

@viirya viirya Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine and maybe more clear. I was thinking the name of isNullVariable is a bit confusing at the first look actually.


val block = code"${subBlocks(0)}\n${subBlocks(1)}\n${subBlocks(2)}"
val transformedBlock = block.transform {
case b: Block => b.transformExprValues {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, I'd image transformExprValues will also transform child blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the this should be

block.transformExprValues {
  case SimpleExprValue ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I may call it transformAllExprValues.

@SparkQA
Copy link

SparkQA commented Jul 5, 2018

Test build #92637 has finished for PR 21405 at commit ce8d9ed.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait JavaCode
  • trait Block extends TreeNode[Block] with JavaCode

@cloud-fan
Copy link
Contributor

LGTM, can you update the demo in your PR description? thanks!

@SparkQA
Copy link

SparkQA commented Jul 5, 2018

Test build #92640 has finished for PR 21405 at commit 707dd6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 5, 2018

Test build #92642 has finished for PR 21405 at commit c9dee44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class LiteralValue(val value: String, val javaType: Class[_]) extends ExprValue with Serializable

@viirya
Copy link
Member Author

viirya commented Jul 5, 2018

@cloud-fan Updated. Do you think I should address the comment #21405 (comment) to let transformExprValues also transforms child blocks?

@cloud-fan
Copy link
Contributor

we can do it later, when we do need transformAllExprValues

@cloud-fan
Copy link
Contributor

thanks, merging to master!

Looking forward to seeing how we can solve the method splitting problem with this infra :)

@asfgit asfgit closed this in 32cfd3e Jul 5, 2018
@viirya viirya deleted the codeblock-api branch December 27, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants