Skip to content

Conversation

@navis
Copy link
Contributor

@navis navis commented Sep 24, 2015

This is a change in behavior from 1.4.1 where {{floor}} returns a BIGINT.

{code}
scala> sql("select floor(1)").printSchema
root
|-- _c0: double (nullable = true)
{code}

In the [Hive Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] {{floor}} is defined to return BIGINT.

This is a significant issue because it changes the DataFrame schema.

I wonder what caused this and whether other SQL functions are affected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider parameterizing UnaryMathExpression with the return type as opposed to using inheritance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Tried my best (I'm new to scala) but can be different with what you expected.

@SparkQA
Copy link

SparkQA commented Sep 24, 2015

Test build #42949 has finished for PR 8893 at commit d753294.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2015

Test build #42965 has finished for PR 8893 at commit b116def.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryMathExpression[T <: DataType](
    • abstract class UnaryMathExpressionWithBigIntRet(f: Double => Double, name: String)
    • case class Ceil(child: Expression) extends UnaryMathExpressionWithBigIntRet(math.ceil, "CEIL")
    • case class Floor(child: Expression) extends UnaryMathExpressionWithBigIntRet(math.floor, "FLOOR")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no Scala expert either but I think what you want is something along the lines of:

// Add to imports
import scala.reflect.runtime.universe.TypeTag

abstract class UnaryMathExpression[T <: NumericType](
  f: Double => Any, name: String)(implicit ttag: TypeTag[T], longttag: TypeTag[LongType])
  extends UnaryExpression with Serializable with ImplicitCastInputTypes {

  override def inputTypes: Seq[DataType] = Seq(DoubleType)
  override def dataType: NumericType = if (ttag == longttag) LongType else DoubleType
  override def nullable: Boolean = true
  override def toString: String = s"$name($child)"

  protected override def nullSafeEval(input: Any): Any = {
    f(input.asInstanceOf[Double])
  }

  // name of function in java.lang.Math
  def funcName: String = name.toLowerCase

  override final def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    defineCodeGen(ctx, ev, codeBuilder(ctx, ev))
  }

  protected def codeBuilder(ctx: CodeGenContext, ev: GeneratedExpressionCode): (String) => String =
    c => s"(${dataType.typeName})java.lang.Math.$funcName($c)"
}

Notes:

  • Eliminates the need for creating a subclass just for the purpose of LongType return specialization.
  • Uses tighter constraint: NumericType instead of DataType, which makes sense intuitively for math functions.
  • Sets up type tags to handle the universe of cases: LongType vs. DoubleType returns. Since we no longer subclass, that's an OK approach. It is better to use automatic companion object discovery but that would require modifying all data type case objects, which may not be a bad idea but belongs in a separate PR. See this for more information.
  • Uses dataType.typeName instead of hard-coding Java casts.

This would require changing all references of UnaryMathExpression to UnaryMathExpression[DoubleType] and then using UnaryMathExpression[LongType] for ceil and floor.

Hope this helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's cool! I've leaned a lot.

@SparkQA
Copy link

SparkQA commented Sep 25, 2015

Test build #43003 has finished for PR 8893 at commit e06167e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryMathExpression[T <: NumericType](
    • case class Acos(child: Expression) extends UnaryMathExpression[DoubleType](math.acos, "ACOS")
    • case class Asin(child: Expression) extends UnaryMathExpression[DoubleType](math.asin, "ASIN")
    • case class Atan(child: Expression) extends UnaryMathExpression[DoubleType](math.atan, "ATAN")
    • case class Cbrt(child: Expression) extends UnaryMathExpression[DoubleType](math.cbrt, "CBRT")
    • case class Ceil(child: Expression) extends UnaryMathExpression[LongType](math.ceil, "CEIL")
    • case class Cos(child: Expression) extends UnaryMathExpression[DoubleType](math.cos, "COS")
    • case class Cosh(child: Expression) extends UnaryMathExpression[DoubleType](math.cosh, "COSH")
    • case class Exp(child: Expression) extends UnaryMathExpression[DoubleType](math.exp, "EXP")
    • case class Expm1(child: Expression) extends UnaryMathExpression[DoubleType](math.expm1, "EXPM1")
    • case class Floor(child: Expression) extends UnaryMathExpression[LongType](math.floor, "FLOOR")
    • case class Rint(child: Expression) extends UnaryMathExpression[DoubleType](math.rint, "ROUND")
    • case class Signum(child: Expression) extends UnaryMathExpression[DoubleType](math.signum, "SIGNUM")
    • case class Sin(child: Expression) extends UnaryMathExpression[DoubleType](math.sin, "SIN")
    • case class Sinh(child: Expression) extends UnaryMathExpression[DoubleType](math.sinh, "SINH")
    • case class Sqrt(child: Expression) extends UnaryMathExpression[DoubleType](math.sqrt, "SQRT")
    • case class Tan(child: Expression) extends UnaryMathExpression[DoubleType](math.tan, "TAN")
    • case class Tanh(child: Expression) extends UnaryMathExpression[DoubleType](math.tanh, "TANH")
    • case class ToDegrees(child: Expression)
    • case class ToRadians(child: Expression)

@srowen
Copy link
Member

srowen commented Oct 1, 2015

It looks like @chenghao-intel is farther along towards a simpler fix in https://github.com/apache/spark/pull/8933/files Do you mind closing this PR?

@davies
Copy link
Contributor

davies commented Oct 1, 2015

@navis Since #8933 is merged, do you mind to close this one?

@navis navis closed this Oct 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants