Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Apr 30, 2018

What changes were proposed in this pull request?

This patch tries to implement this proposal to add an API for handling expression code generation. It should allow us to manipulate how to generate codes for expressions.

In details, this adds an new abstraction CodeBlock to JavaCode. CodeBlock holds the code snippet and inputs for generating actual java code.

For example, in following java code:

  int ${variable} = 1;
  boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)};

variable, isNull are two VariableValue and CodeGenerator.defaultValue(BooleanType) is a string. They are all inputs to this code block and held by CodeBlock representing this code.

For codegen, we provide a specified string interpolator code, so you can define a code like this:

  val codeBlock =
    code"""
         |int ${variable} = 1;
         |boolean ${isNull} = ${CodeGenerator.defaultValue(BooleanType)};
        """.stripMargin
  // Generates actual java code.
  codeBlock.toString

Because those inputs are held separately in CodeBlock before generating code, we can safely manipulate them, e.g., replacing statements to aliased variables, etc..

How was this patch tested?

Added tests.

@viirya
Copy link
Member Author

viirya commented Apr 30, 2018

@hvanhovell
Copy link
Contributor

@viirya I just glanced over this. We currently use a lot of mutable code for in code generation with - TBH - way too complex side effects. Your proposal adds another layer of side effects.

Why don't we add a bit more structure to the JavaCode abstraction, and figure out that we need splitting from that? It will probably be more involved than this because you probably cannot fix the names upfront.

@viirya
Copy link
Member Author

viirya commented May 1, 2018

retest this please.

@viirya viirya changed the title [SPARK-24121][SQL] Add API for handling expression code generation [SPARK-24121][SQL][WIP] Add API for handling expression code generation May 1, 2018
@viirya
Copy link
Member Author

viirya commented May 1, 2018

@hvanhovell Thanks for your comment!

I tried to add new JavaCode abstraction Block which holds the code to generate. I hope this approach can be less side effect. Please comment if you have time to take a look. I'll be in vacation in next few days, but I will try to follow your comment if any. Thanks!

@viirya
Copy link
Member Author

viirya commented May 1, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 1, 2018

Test build #90004 has finished for PR 21193 at commit 369242a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

@SparkQA
Copy link

SparkQA commented May 1, 2018

Test build #90009 has finished for PR 21193 at commit 8e2db8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

@SparkQA
Copy link

SparkQA commented May 2, 2018

Test build #90059 has finished for PR 21193 at commit 5fe425c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

@viirya
Copy link
Member Author

viirya commented May 2, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90083 has finished for PR 21193 at commit 5fe425c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

@maropu
Copy link
Member

maropu commented May 3, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90092 has finished for PR 21193 at commit 5fe425c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

@viirya
Copy link
Member Author

viirya commented May 3, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90104 has finished for PR 21193 at commit 5fe425c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExprCode(var code: Block, var isNull: ExprValue, var value: ExprValue)
  • trait Block extends JavaCode
  • implicit class BlockHelper(val sc: StringContext) extends AnyVal
  • case class CodeBlock(codeParts: Seq[String], exprValues: Seq[Any]) extends Block
  • case class Blocks(blocks: Seq[Block]) extends Block
  • trait ExprValue extends JavaCode

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work @viirya. I do like this approach.


object ExprValue {
implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString
implicit def exprValueToString(exprValue: ExprValue): String = exprValue.code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use toString?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because toString will call exprValueToString. We should have only one place to convert an ExprValue to string.

EmptyBlock
} else {
args.foreach {
case _: ExprValue => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why true? Can't we just do nothing after the =>?

implicit def blocksToBlock(blocks: Seq[Block]): Block = Blocks(blocks)

implicit class BlockHelper(val sc: StringContext) extends AnyVal {
def code(args: Any*): Block = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chat about adding checkLengths(args)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

val expressions = exprValues.iterator
var buf = new StringBuffer(strings.next)
while (strings.hasNext) {
if (expressions.hasNext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed (especially if we add the checkLengths), because if it is not true we are in a bad situation

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90120 has finished for PR 21193 at commit 00bef6b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Block

@viirya
Copy link
Member Author

viirya commented May 3, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90135 has finished for PR 21193 at commit 00bef6b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[Any]) extends Block

@SparkQA
Copy link

SparkQA commented May 3, 2018

Test build #90139 has finished for PR 21193 at commit 162deb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-24121][SQL][WIP] Add API for handling expression code generation [SPARK-24121][SQL] Add API for handling expression code generation May 4, 2018
@viirya
Copy link
Member Author

viirya commented May 4, 2018

I think this can be reviewed for now. Thanks. cc @cloud-fan @hvanhovell @kiszk @maropu

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some minor comments

// as a function before. In that case, we just re-use it.
ExprCode(ctx.registerComment(this.toString), subExprState.isNull, subExprState.value)
ExprCode(ctx.registerComment(this.toString), subExprState.isNull,
subExprState.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary change

buf append inputs.next
buf append StringContext.treatEscapes(strings.next)
}
buf.toString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add an assert that assert(!inputs.hasNext)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not needed because we have checkLengths at line 159. This is the same approach which is used by the StringContext in the scala library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see

ev.copy(code = s"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
s"$className.getInputFilePath();", isNull = FalseLiteral)
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
code"$className.getInputFilePath();", isNull = FalseLiteral)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Blocks.code is defined as override def code: String = blocks.map(_.toString).mkString("\n"), is it OK here?

maybe we can do

val typeDef = s"final ${CodeGenerator.javaType(dataType)}"
ev.code(code = code"$typeDef ${ev.value} = $className.getInputFilePath();", ...)

val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
ev.copy(code = s"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
s"$className.getStartOffset();", isNull = FalseLiteral)
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
ev.copy(code = s"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
s"$className.getLength();", isNull = FalseLiteral)
ev.copy(code = code"final ${CodeGenerator.javaType(dataType)} ${ev.value} = " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}

test("Throws exception when interpolating unexcepted object in code block") {
val obj = TestClass(100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can simply use a Tuple2 here, like 1 -> 1

case _: SimpleExprValue => aliasedParam
case other => other
}
val aliasedCode = CodeBlock(code.asInstanceOf[CodeBlock].codeParts, aliasedInputs).stripMargin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the follow up, we may need a better API to do this. The current way is a little hacky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @cloud-fan 's comment above. The current demonstrated way of manipulating code is less than ideal. But at least this PR does provide a good carrier for us to build the future code manipulation API so it's good enough for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agreed. In the follow up, we definitely should have a better API for manipulating code. I think here it is used to show what we can do for CodeBlock right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just think about this and have initial idea. I'd prototype it locally and make follow up for your review after this.

val str = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
val code = s"${ctx.registerComment(str)}\n" + (if (nullable) {
s"""
val code = code"${ctx.registerComment(str)}\n" + (if (nullable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the \n is not needed now, since we always add \n between code blocks.


case class Blocks(blocks: Seq[Block]) extends Block {
override lazy val exprValues: Set[ExprValue] = blocks.flatMap(_.exprValues).toSet
override def code: String = blocks.map(_.toString).mkString("\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also make it lazy val?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the goals of this is letting us changing the string representation of an expression, so making it a lazy val may be fine for now, but would impede us from doing that. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it depends on how we would implement code rewriting later. If we wanna keep JavaCode immutable, then lazy val is OK here, because once we rewrite the code, we will have a new Blocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so. Ideally I think JavaCode should be immutable. Once we change code in it, we should have a new object instead of chancing current one.

protected def evaluateVariables(variables: Seq[ExprCode]): String = {
val evaluate = variables.filter(_.code != "").map(_.code.trim).mkString("\n")
variables.foreach(_.code = "")
val evaluate = variables.filter(_.code.toString != "").map(_.code.toString).mkString("\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: _.code.toString.nonEmpty

Copy link
Contributor

@rednaxelafx rednaxelafx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love having more structure in the codegen framework. Thanks for tackling this, @viirya !

Left some nitpick comments inline. Overall I'd say this does a good job in laying down the foundations, but there's still a lot to be done for building up a nice API for the actual code manipulation.

@viirya do you have concrete follow-up ideas / tasks that would use this API? If so, it'd be nice to see how you're planning to use the API. I imagine the original intent of #19813 would be the main use case?

case other => throw new IllegalArgumentException(
s"Can not interpolate ${other.getClass.getName} into code block.")
}
CodeBlock(sc.parts, args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering: does it make sense to eagerly fold the literal args into the parts, instead of carrying them over into CodeBlock? That's similar to what you're already doing with CodeBlock.exprValues, but just done a bit more eagerly.

Basically, the only things that are safe to manipulate are the structured ones like ExprValue and Block; none of the others are safe to change because we don't know what's in there. So no need to even carry them as separate argument into CodeBlock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea. We should not change literal args. Keeping them in block inputs seems meaningless.

}

test("replace expr values in code block") {
val statement = JavaCode.expression("1 + 1", IntegerType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we rename this statement variable to something like expr or someExpr or something? It's an expression not a statement.

case _: SimpleExprValue => aliasedParam
case other => other
}
val aliasedCode = CodeBlock(code.asInstanceOf[CodeBlock].codeParts, aliasedInputs).stripMargin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @cloud-fan 's comment above. The current demonstrated way of manipulating code is less than ideal. But at least this PR does provide a good carrier for us to build the future code manipulation API so it's good enough for now.

@viirya
Copy link
Member Author

viirya commented May 18, 2018

@rednaxelafx Thanks for your comments! Yes, #19813 is should be the first use case. However, I'd like to first have follow up for some points we discussed so far, e.g., preventing string interpolation, manipulating code API, etc.. So this API can be more easily to be used.

@SparkQA
Copy link

SparkQA commented May 19, 2018

Test build #90833 has finished for PR 21193 at commit 2ca9741.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 19, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 19, 2018

Test build #90835 has finished for PR 21193 at commit 2ca9741.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 20, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 20, 2018

Test build #90858 has finished for PR 21193 at commit 2ca9741.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 21, 2018

@cloud-fan @rednaxelafx Your last comments are addressed. Please check if you have other comments. Thanks for review.

// Folds eagerly the literal args into the code parts.
private def foldLiteralArgs(parts: Seq[String], args: Seq[Any]): (Seq[String], Seq[Any]) = {
val codeParts = ArrayBuffer.empty[String]
val blockInputs = ArrayBuffer.empty[Any]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make the type JavaCode instead of Any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. JavaCode is better.

val inputs = args.iterator
val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH)

buf append strings.next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use java style here? buf.append(strings.next).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Contributor

@rednaxelafx rednaxelafx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments! LGTM now, no more comments from my side.
Just anxiously waiting to play with it more and also the polished code manipulation API ;-)

assert(code.toString == "boolean expr1_isNull = false;")
}

test("Literals are folded into string code parts instead of block inputs") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I like it!

val buf = new StringBuilder(Block.CODE_BLOCK_BUFFER_LENGTH)

buf append strings.next
while (strings.hasNext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90931 has finished for PR 21193 at commit 96c594a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends Block

@viirya
Copy link
Member Author

viirya commented May 22, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90941 has finished for PR 21193 at commit 96c594a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CodeBlock(codeParts: Seq[String], blockInputs: Seq[JavaCode]) extends Block

@cloud-fan
Copy link
Contributor

LGTM, can we fix the conflict?

@viirya
Copy link
Member Author

viirya commented May 22, 2018

@cloud-fan Thanks. Fixed.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #90964 has finished for PR 21193 at commit 00cc564.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class TaskKilled(
  • trait HasValidationIndicatorCol extends Params
  • trait DivModLike extends BinaryArithmetic
  • case class Divide(left: Expression, right: Expression) extends DivModLike
  • case class Remainder(left: Expression, right: Expression) extends DivModLike
  • case class MapEntries(child: Expression) extends UnaryExpression with ExpectsInputTypes
  • class ReadOnlySQLConf(context: TaskContext) extends SQLConf
  • class TaskContextConfigProvider(context: TaskContext) extends ConfigProvider
  • case class ContinuousShuffleReadPartition(index: Int, queueSize: Int) extends Partition
  • class ContinuousShuffleReadRDD(
  • trait ContinuousShuffleReader

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in f9f055a May 22, 2018
@viirya
Copy link
Member Author

viirya commented May 22, 2018

Thanks for your review @hvanhovell @cloud-fan @kiszk @maropu @mgaido91 @ueshin @rednaxelafx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants