-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-10724] [SQL] SQL's floor() returns DOUBLE #8893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider parameterizing UnaryMathExpression with the return type as opposed to using inheritance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Tried my best (I'm new to scala) but can be different with what you expected.
|
Test build #42949 has finished for PR 8893 at commit
|
|
Test build #42965 has finished for PR 8893 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no Scala expert either but I think what you want is something along the lines of:
// Add to imports
import scala.reflect.runtime.universe.TypeTag
abstract class UnaryMathExpression[T <: NumericType](
f: Double => Any, name: String)(implicit ttag: TypeTag[T], longttag: TypeTag[LongType])
extends UnaryExpression with Serializable with ImplicitCastInputTypes {
override def inputTypes: Seq[DataType] = Seq(DoubleType)
override def dataType: NumericType = if (ttag == longttag) LongType else DoubleType
override def nullable: Boolean = true
override def toString: String = s"$name($child)"
protected override def nullSafeEval(input: Any): Any = {
f(input.asInstanceOf[Double])
}
// name of function in java.lang.Math
def funcName: String = name.toLowerCase
override final def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
defineCodeGen(ctx, ev, codeBuilder(ctx, ev))
}
protected def codeBuilder(ctx: CodeGenContext, ev: GeneratedExpressionCode): (String) => String =
c => s"(${dataType.typeName})java.lang.Math.$funcName($c)"
}Notes:
- Eliminates the need for creating a subclass just for the purpose of
LongTypereturn specialization. - Uses tighter constraint:
NumericTypeinstead ofDataType, which makes sense intuitively for math functions. - Sets up type tags to handle the universe of cases:
LongTypevs.DoubleTypereturns. Since we no longer subclass, that's an OK approach. It is better to use automatic companion object discovery but that would require modifying all data type case objects, which may not be a bad idea but belongs in a separate PR. See this for more information. - Uses
dataType.typeNameinstead of hard-coding Java casts.
This would require changing all references of UnaryMathExpression to UnaryMathExpression[DoubleType] and then using UnaryMathExpression[LongType] for ceil and floor.
Hope this helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's cool! I've leaned a lot.
|
Test build #43003 has finished for PR 8893 at commit
|
|
It looks like @chenghao-intel is farther along towards a simpler fix in https://github.com/apache/spark/pull/8933/files Do you mind closing this PR? |
This is a change in behavior from 1.4.1 where {{floor}} returns a BIGINT.
{code}
scala> sql("select floor(1)").printSchema
root
|-- _c0: double (nullable = true)
{code}
In the [Hive Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] {{floor}} is defined to return BIGINT.
This is a significant issue because it changes the DataFrame schema.
I wonder what caused this and whether other SQL functions are affected.