yaooqinn (Member) commented Jun 12, 2024

What changes were proposed in this pull request?

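This pull request optimizes the Hex.hex(num: Long) method. It computes the number of hex digits up front from the count of leading zero bits, so the output array is allocated at its exact length and no longer has to be copied afterward to drop the unused leading positions.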
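A minimal stand-alone sketch of the idea follows. It assumes a plain `String` result instead of Spark's `UTF8String`, uses a `while` loop instead of the patch's `do`/`while`, and `HexSketch` is a hypothetical name, so it is not the merged code. The digit count is derived from `java.lang.Long.numberOfLeadingZeros`, and an array of exactly that length is filled from the right.

```scala
import java.lang.{Long => JLong}
import java.nio.charset.StandardCharsets

// Hypothetical stand-alone object; Spark's Hex.hex returns a UTF8String instead.
object HexSketch {
  private val hexDigits: Array[Byte] =
    Array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
          'A', 'B', 'C', 'D', 'E', 'F').map(_.toByte)

  def hex(num: Long): String = {
    val zeros = JLong.numberOfLeadingZeros(num)
    // 0 has no significant bits; emit a single "0".
    if (zeros == JLong.SIZE) return "0"
    // One hex digit per 4 significant bits, rounded up.
    val len = (JLong.SIZE - zeros + 3) / 4
    val bytes = new Array[Byte](len)
    var numBuf = num
    var i = len - 1
    // Fill from the right. >>> is a logical shift, so a negative input is
    // rendered as its 64-bit two's-complement bit pattern.
    while (i >= 0) {
      bytes(i) = hexDigits((numBuf & 0xF).toInt)
      numBuf >>>= 4
      i -= 1
    }
    new String(bytes, StandardCharsets.US_ASCII)
  }
}
```

For example, `HexSketch.hex(255L)` yields `"FF"` and `HexSketch.hex(-1L)` yields sixteen `F`s, matching `java.lang.Long.toHexString` up to letter case. Because the array already has the right size, no `Arrays.copyOfRange` step is needed.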

Why are the changes needed?

  • Unit tests added
  • Benchmarked locally (30~50% speedup)
```
Hex Long Tests:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Legacy                                             1062           1094          16          9.4         106.2       1.0X
New                                                 739            807          26         13.5          73.9       1.4X
```
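The following is a worked reading of the table above and adds no new measurements. The Relative column is the ratio of best times. The benchmark code passes `N` as the values per iteration, but each iteration calls hex 12 × N times (`range = 1 to 12`), so each Per Row figure covers 12 calls:

$$
\frac{1062\ \text{ms}}{739\ \text{ms}} \approx 1.44\times, \qquad 1 - \frac{739}{1062} \approx 0.30
$$

$$
\frac{106.2\ \text{ns}}{12} \approx 8.85\ \text{ns per call (Legacy)}, \qquad \frac{73.9\ \text{ns}}{12} \approx 6.16\ \text{ns per call (New)}
$$

This run therefore shows about 30% less time per iteration, or about 44% higher throughput, which falls within the stated 30~50% range.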
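```scala
import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.sql.catalyst.expressions.Hex
import org.apache.spark.unsafe.types.UTF8String

object HexBenchmark extends BenchmarkBase {
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val N = 10_000_000
    runBenchmark("Hex") {
      val benchmark = new Benchmark("Hex Long Tests", N, 10, output = output)
      // Each iteration makes 12 * N calls; x - y is negative for small x,
      // so negative inputs are exercised as well.
      val range = 1 to 12
      benchmark.addCase("Legacy") { _ =>
        (1 to N).foreach(x => range.foreach(y => hexLegacy(x - y)))
      }

      benchmark.addCase("New") { _ =>
        (1 to N).foreach(x => range.foreach(y => Hex.hex(x - y)))
      }
      benchmark.run()
    }
  }

  // Legacy (pre-patch) implementation, kept for comparison.
  def hexLegacy(num: Long): UTF8String = {
    // Extract the hex digits of num into value[] from right to left
    val value = new Array[Byte](16)
    var numBuf = num
    var len = 0
    do {
      len += 1
      // Hex.hexDigits must be accessible (not private) for this to compile
      value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt)
      numBuf >>>= 4
    } while (numBuf != 0)
    // Copy the filled tail of the 16-byte buffer into a right-sized array
    UTF8String.fromBytes(java.util.Arrays.copyOfRange(value, value.length - len, value.length))
  }
}
```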

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Jun 12, 2024
```scala
val value = new Array[Byte](16)
val zeros = jl.Long.numberOfLeadingZeros(num)
if (zeros == jl.Long.SIZE) return ZERO_UTF8
val len = (jl.Long.SIZE - zeros + 3) / 4
```
Contributor

Could you add comments to explain the arithmetic expression?

Member Author

I don't think such a common expression needs a comment.
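Spelled out, `64 - zeros` is the number of significant bits. Adding 3 before the integer division by 4 rounds up, so `len` is the ceiling of bits / 4, which gives one hex digit per 4-bit nibble. For `num == 0`, the expression evaluates to 3 / 4 = 0, which is why the zero case returns `ZERO_UTF8` early. The helper below (`HexLenSketch.hexLen`, a hypothetical name) isolates the expression and compares it against the length of `java.lang.Long.toHexString`:

```scala
import java.lang.{Long => JLong}

// Hypothetical helper isolating the length expression from the patch.
object HexLenSketch {
  // 64 - numberOfLeadingZeros = number of significant bits;
  // (bits + 3) / 4 = ceil(bits / 4) = number of hex digits.
  def hexLen(num: Long): Int = {
    val zeros = JLong.numberOfLeadingZeros(num)
    (JLong.SIZE - zeros + 3) / 4
  }

  def main(args: Array[String]): Unit = {
    // Compare against the JDK's unsigned hex rendering for a few inputs.
    for (n <- Seq(1L, 15L, 16L, 255L, 256L, -1L)) {
      val expected = JLong.toHexString(n).length
      println(s"$n: hexLen = ${hexLen(n)}, toHexString length = $expected")
    }
  }
}
```

For instance, 255 has 8 significant bits, so (8 + 3) / 4 = 2 digits. 256 has 9 bits, so (9 + 3) / 4 = 3 digits.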

```scala
object Hex {
  val hexDigits = Array[Char](
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
  ).map(_.toByte)
```
Contributor

Good catch!

```scala
do {
len += 1
value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt)
val value = new Array[Byte](len)
```
Contributor

value -> bytes

Member Author

`bytes` might be better, but the current name `value` is consistent with another variant.

yaooqinn (Member Author)

cc @cloud-fan @dongjoon-hyun @LuciferYang thanks

LuciferYang (Contributor) left a comment

+1, LGTM

yaooqinn closed this in b5e1b79 Jun 12, 2024
yaooqinn deleted the SPARK-48596 branch June 12, 2024 12:23

yaooqinn (Member Author)

Merged to master. Thank you @LuciferYang, and thanks also for the offline suggestion.
