Commit 206030b

JasonMWhite authored and cloud-fan committed
[SPARK-19561][SQL] add int case handling for TimestampType
## What changes were proposed in this pull request?

Add handling of input of type `Int` for dataType `TimestampType` to `EvaluatePython.scala`. Py4J serializes ints smaller than MIN_INT or larger than MAX_INT to Long, and those are already handled correctly, but values between MIN_INT and MAX_INT are serialized to Int. These range limits correspond to roughly half an hour on either side of the epoch. As a result, PySpark doesn't allow TimestampType values to be created in this range.

Alternative attempted: patching the `TimestampType.toInternal` function to cast return values to `long`, so Py4J would always serialize them to Scala Long. Python 3 does not have a `long` type, so this approach failed there.

## How was this patch tested?

Added a new PySpark-side test that fails without the change.

The contribution is my original work and I license the work to the project under the project's open source license.

Resubmission of #16896. The original PR didn't go through Jenkins and broke the build. davies dongjoon-hyun cloud-fan, could you kick off a Jenkins run for me? It passed everything locally, but it's possible something has changed in the last few weeks.

Author: Jason White <[email protected]>

Closes #17200 from JasonMWhite/SPARK-19561.
1 parent 274973d commit 206030b
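
For context, here is a minimal reproduction sketch of the bug this commit fixes, distilled from the regression test added below. It assumes an active `SparkSession` bound to `spark`, as in a PySpark shell:

```python
import datetime

from pyspark.sql import Row

# 1970-01-01 00:00:00 (local time): its internal TimestampType value is a
# small microsecond count, so Py4J ships it to the JVM as an Int, not a Long.
epoch = datetime.datetime.fromtimestamp(0)

# Before this patch, EvaluatePython had no (Int, TimestampType) case, so
# TimestampType values this close to the epoch could not be created.
df = spark.createDataFrame([Row(ts=epoch)])
print(df.first().ts)  # 1970-01-01 00:00:00 after the fix
```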

2 files changed: +10 -0


python/pyspark/sql/tests.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -1555,6 +1555,14 @@ def test_time_with_timezone(self):
         self.assertEqual(now, now1)
         self.assertEqual(now, utcnow1)
 
+    # regression test for SPARK-19561
+    def test_datetime_at_epoch(self):
+        epoch = datetime.datetime.fromtimestamp(0)
+        df = self.spark.createDataFrame([Row(date=epoch)])
+        first = df.select('date', lit(epoch).alias('lit_date')).first()
+        self.assertEqual(first['date'], epoch)
+        self.assertEqual(first['lit_date'], epoch)
+
     def test_decimal(self):
         from decimal import Decimal
         schema = StructType([StructField("decimal", DecimalType(10, 5))])
```

sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala

Lines changed: 2 additions & 0 deletions
```diff
@@ -112,6 +112,8 @@ object EvaluatePython {
     case (c: Int, DateType) => c
 
     case (c: Long, TimestampType) => c
+    // Py4J serializes values between MIN_INT and MAX_INT as Ints, not Longs
+    case (c: Int, TimestampType) => c.toLong
 
     case (c, StringType) => UTF8String.fromString(c.toString)
```
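
As a back-of-envelope check on the "roughly half an hour" figure from the commit message (this is illustrative arithmetic, not code from the patch): `TimestampType`'s internal representation is microseconds since the epoch, so the Int-serialized window is bounded by the capacity of a signed 32-bit int:

```python
# A signed 32-bit int holds at most 2**31 - 1 microseconds, which bounds
# the window of timestamps that Py4J serializes as Int rather than Long.
MAX_INT_MICROS = 2**31 - 1                 # 2,147,483,647 microseconds
minutes = MAX_INT_MICROS / 1_000_000 / 60  # ~35.8 minutes
print(f"~{minutes:.1f} minutes on either side of the epoch")
```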
