-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-23951][SQL] Use actual java class instead of string representation. #21026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #89128 has finished for PR 21026 at commit
|
| } | ||
|
|
||
| if (input.isNull == "false") { | ||
| if (input.isNull == FalseLiteral) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes an existing bug :)
# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
|
Test build #89129 has finished for PR 21026 at commit
|
|
Test build #89134 has finished for PR 21026 at commit
|
|
Test build #89136 has finished for PR 21026 at commit
|
|
Test build #89143 has finished for PR 21026 at commit
|
| case _ => s"$value = $initCode;" | ||
| } | ||
| ExprCode(code, FalseLiteral, GlobalValue(value, javaType(dataType))) | ||
| ExprCode.forNonNullValue(JavaCode.global(value, dataType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. I left some comments and let's see if we should change them.
| // `rowWriter` is declared as a class field, so we can access it directly in methods. | ||
| ExprCode(code, FalseLiteral, StatementValue(s"$rowWriter.getRow()", "UnsafeRow", | ||
| canDirectAccess = true)) | ||
| ExprCode(code, FalseLiteral, SimpleExprValue(s"$rowWriter.getRow()", classOf[UnsafeRow])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a JavaCode.expression(expr: String, javaClass: Class[_]) and use it here?
| """ | ||
|
|
||
| ExprCode(code, FalseLiteral, VariableValue(output, "MapData")) | ||
| ExprCode(code, FalseLiteral, VariableValue(output, classOf[MapData])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use JavaCode.variable(name, javaClass) here?
| """ | ||
|
|
||
| ExprCode(code, FalseLiteral, VariableValue(output, "ArrayData")) | ||
| ExprCode(code, FalseLiteral, VariableValue(output, classOf[ArrayData])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use JavaCode.variable(name, javaClass) here?
| """.stripMargin | ||
|
|
||
| ExprCode(code, FalseLiteral, VariableValue(output, "InternalRow")) | ||
| ExprCode(code, FalseLiteral, VariableValue(output, classOf[InternalRow])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use JavaCode.variable(name, javaClass) here?
| */ | ||
| trait ExprValue extends JavaCode { | ||
| def javaType: Class[_] | ||
| def isPrimitive: Boolean = javaType.isPrimitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR got rid of the canDirectAccess property from @viirya 's base change. Should we add it back?
(and implement that property in concrete classes with an override def instead of an override val)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya where are you planning to use this? I removed it because it did not seem to have any benefit (yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a few lines in this file, easy to add it back if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell it was introduced for the whole stage codegen when it is needed to know whether a variable can be directly accessed in the scope of a new method or if it should be passed as an argument. I think we should keep it/replace with a def
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is not being used at the moment right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet,as far as I know @viirya was waiting for this method to be available to go on with his work: #19813 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for replying late. Yeah, my plan is to use canDirectAccess to quickly determine whether an ExprValue can be used a method parameter. I will add it back if I need in my work later.
| val value = eval.isNull match { | ||
| case TrueLiteral => FalseLiteral | ||
| case FalseLiteral => TrueLiteral | ||
| case v => JavaCode.isNullExpression(s"!($v)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the parens here. SimpleExprValue guarantees this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Test build #89157 has finished for PR 21026 at commit
|
|
Test build #89159 has finished for PR 21026 at commit
|
|
Test build #89160 has finished for PR 21026 at commit
|
|
LGTM, merging to master! |
| /** | ||
| * Create a local isNull variable. | ||
| */ | ||
| def isNullVariable(name: String): VariableValue = variable(name, BooleanType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this is a booleanVariable, rather than a isNullVariable, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used to create is null variables. That it currently creates a boolean variable is actually not that important. I just want to create a narrow waist for this specific case, so we can swap it out for a more sophisticated implementation later on (something that advertises that it a null check for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I see the goal of this, but I can't understand why it is useful. The isNull variable is a boolean as all the other boolean variables... It is the more frequent case, but I can't see any differentiation between them and other booleans. Do you have in mind a case for which this differentiation can be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal but I'm ok if we change isNullVariable(xxx) to variable(xxx, BooleanType).
| */ | ||
| trait ExprValue extends JavaCode { | ||
| def javaType: Class[_] | ||
| def isPrimitive: Boolean = javaType.isPrimitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell it was introduced for the whole stage codegen when it is needed to know whether a variable can be directly accessed in the scope of a new method or if it should be passed as an argument. I think we should keep it/replace with a def
| /** | ||
| * Create a global isNull variable. | ||
| */ | ||
| def isNullGlobal(name: String): GlobalValue = global(name, BooleanType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as well, I think this should be a booleanGlobal...
| /** | ||
| * Create a isNull expression fragment. | ||
| */ | ||
| def isNullExpression(code: String): SimpleExprValue = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| /** | ||
| * Utility functions for creating [[JavaCode]] fragments. | ||
| */ | ||
| object JavaCode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
despite I really like the introduction of the JavaCode trait, all these methods seems more related to ExprValue (especially the expression one, which represents a SimpleExprValue is very specific to it, and not to JavaCode which can represent also complex expressions instead): shall we move these methods to the ExprValue object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to create a narrow waist for the creation JavaCode/ExprValue objects so we can swap out stuff later on. Feel free to move these methods (and remove the JavaCode trait) if you think that is more appropriate.
| * -1 for other primitive types. | ||
| */ | ||
| def defaultLiteral(dataType: DataType): LiteralValue = { | ||
| new LiteralValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just call literal(CodeGenerator.defaultValue(dataType, typedNull = true), dataType)? In current way, defaultLiteral(BooleanType) won't return a FalseLiteral.
What changes were proposed in this pull request?
This PR slightly refactors the newly added
ExprValueAPI by quite a bit. The following changes are introduced:ExprValuenow uses the actual class instead of the class name as its type. This should give some more flexibility with generating code in the future.StatementValuetoSimpleExprValue. The statement concept is broader then an expression (untyped and it cannot be on the right hand side of an assignment), and this was not really what we were using it for. I have added a top levelJavaCodetrait that can be used in the future to reinstate (no pun intended) a statement a-like code fragment.JavaCodecompanion object to make it slightly less verbose to createJavaCode/ExprValueobjects. This is also what makes the diff quite large.ExprCodeto make it easier to create code-less expressions.How was this patch tested?
Existing tests.