
Conversation

@hvanhovell
Contributor

The current parser turns a decimal literal, for example 12.1, into a Double. The problem with this approach is that we convert an exact literal into a non-exact Double. This PR changes that behavior: a decimal literal is now converted into an exact BigDecimal.

The behavior for scientific decimals, for example 12.1e01, is unchanged. This will be converted into a Double.

This PR also replaces the BigDecimal literal with a Double literal, because BigDecimal is now the default. You can still get a double literal by appending a 'D' to the value, for instance: 3.141527D
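To illustrate (a hedged sketch; the comments describe the intended types under this change):

```sql
SELECT 12.1;       -- now parsed as an exact decimal (previously a double)
SELECT 12.1e01;    -- scientific notation: unchanged, still a double
SELECT 3.141527D;  -- explicit 'D' postfix: a double
```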

cc @davies @rxin

@hvanhovell
Contributor Author

retest this please

Contributor

Just curious why we need to change this?

Contributor Author

This failed on the HiveCompatibilitySuite.udf_if test for the following query: SELECT IF(FALSE, 1, 1.1). It would not cast the trueCase to a decimal. CASE WHEN ... ELSE did work, so I used similar code for `if`.
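For reference, a quick sketch of the two forms being compared (the expected results are assumptions based on the description above):

```sql
-- IF should coerce both branches to a common decimal type,
-- just as CASE WHEN ... ELSE already does:
SELECT IF(FALSE, 1, 1.1);                    -- expected: 1.1
SELECT CASE WHEN FALSE THEN 1 ELSE 1.1 END;  -- expected: 1.1
```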

@davies
Contributor

davies commented Jan 17, 2016

LGTM, just one question.

@SparkQA

SparkQA commented Jan 17, 2016

Test build #49557 has finished for PR 10796 at commit f5ecab6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 17, 2016

Test build #49559 has finished for PR 10796 at commit 5c72b58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

The build fails on BucketedWriteSuite. This test passes on my local machine, and I am pretty sure I am not touching that part of Hive.

Something funny seems to be going on with the build. It cannot create the following path on Jenkins: file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/warehouse--bea7c96d-5447-4320-aa0f-985f7c807bca/**db2.db/**bucketed_table

The highlighted part shouldn't be there. @cloud-fan / @JoshRosen any idea what is happening?

@hvanhovell
Contributor Author

retest this please

@cloud-fan
Contributor

LGTM. I'm not sure what's going on with the failing test; let's see if it passes this time.

@SparkQA

SparkQA commented Jan 18, 2016

Test build #49564 has finished for PR 10796 at commit 5c72b58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Hmm, looks like it might be a legitimate failure. I don't think that this suite is known to be flaky; https://spark-tests.appspot.com/tests/org.apache.spark.sql.sources.BucketedWriteSuite looks all green to me.

@rxin
Contributor

rxin commented Jan 18, 2016

I checked out your source code and ran this locally. The tests passed:

22:39:53.378 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Persisting partitioned data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): 
file:/scratch/rxin/spark/target/tmp/warehouse--56673338-b401-4032-854f-34cf4197194b/bucketed_table
22:40:03.856 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
22:40:12.559 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Persisting partitioned data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): 
file:/scratch/rxin/spark/target/tmp/warehouse--56673338-b401-4032-854f-34cf4197194b/bucketed_table

Not 100% sure, but I suspect there is a bug with the test or the implementation itself and some changes here exposed that.

@rxin
Contributor

rxin commented Jan 18, 2016

BTW, you can also change those tests to `ignore` for now, and we can fix them later if we can't figure this out at the moment.

@hvanhovell
Contributor Author

I have added some debugging code to the bucketed write test. It looks like something is wrong with the constructed file path.

@SparkQA

SparkQA commented Jan 18, 2016

Test build #49618 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #2404 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #2403 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #49667 has finished for PR 10796 at commit df8e237.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Is `\\d+(\\.\\d*)` correct, or should it be `\\d+(\\.\\d+)`?
Do we allow `123.` to be parsed as a decimal?

Contributor Author

We currently do. See this lexer rule:

Number
    :
    ((Digit+ (DOT Digit*)?) | (DOT Digit+)) Exponent?
    ;
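So a bare trailing dot is accepted. A quick, hedged way to confirm at a SQL prompt:

```sql
-- Digit+ (DOT Digit*)? matches a trailing dot with no fraction digits,
-- so this should parse as the decimal 123:
SELECT 123.;
```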

@viirya
Member

viirya commented Jan 19, 2016

LGTM

@hvanhovell
Contributor Author

Another-Attempt-at-Fixing-This©

The problem is caused by the way paths are derived. When we write the table, we use the name of the current Hive database in the path, but we don't account for this when reading the bucketed table.
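Roughly the shape of the problem (a sketch; the database and table names are illustrative, not the test's actual code):

```sql
CREATE DATABASE IF NOT EXISTS db2;
USE db2;
CREATE TABLE bucketed_table (i INT);
-- The files are written under the current database's directory:
--   .../warehouse/db2.db/bucketed_table
-- but the read side derived the path without the database segment:
--   .../warehouse/bucketed_table
```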

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49713 has finished for PR 10796 at commit 1b0508f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #2414 has finished for PR 10796 at commit 1b0508f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 20, 2016

@hvanhovell when you have a chance, can you also update `CatalystQl` to add some comments to `Token` and explain what it actually does? e.g. what it actually returns in `Some[(String, List[ASTNode])]`.

Thanks.

@rxin
Contributor

rxin commented Jan 20, 2016

Actually, if possible, it would be great to document what each function does in `CatalystQl.scala` (even for private functions), so somebody new to that part of the code can pick it up easily.

Feel free to do it as a separate pull request though.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49767 has finished for PR 10796 at commit 8bf2a27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

I'll retest this a few more times.

@hvanhovell
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49780 has finished for PR 10796 at commit 8bf2a27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49784 has finished for PR 10796 at commit 9afd729.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

retest this please

Contributor Author

@cloud-fan could you take a look at this?

Contributor

This looks reasonable; I'm just curious why the previous one stopped working after your PR...

Contributor Author

I have no idea. Nothing in the PR seems to touch this; it seems a bit random. The only thing I am doing here is making absolutely sure the paths are the same; the difference in paths is caused by the use of the Hive current database in the path name.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49785 has finished for PR 10796 at commit 9afd729.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 20, 2016

Going to merge this. Thanks!

@asfgit asfgit closed this in 1017327 Jan 20, 2016
HyukjinKwon pushed a commit that referenced this pull request Feb 5, 2024
### What changes were proposed in this pull request?

Explain that a number like `123.456`, without a postfix but with a decimal point, is a decimal literal.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class.

In Spark, it's the other way around. I found this surprising and couldn't find documentation about it.

I discovered this after reading SPARK-45786. I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a bit of documentation.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45003 from nchammas/decimal-literal.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
yaooqinn pushed a commit that referenced this pull request Feb 6, 2024
### What changes were proposed in this pull request?

Provide examples showing what type literals like `123.456` and `123.456E0` have in SQL.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class. In Spark, it's the other way around. I found this surprising and couldn't find documentation about it.

I discovered this after reading [SPARK-45786](https://issues.apache.org/jira/browse/SPARK-45786). I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals.

There is an additional wrinkle I discovered in #45003. If the fractional literal has an exponent, then it's a double, not a decimal.

The existing syntax documents this, but that alone is not user-friendly. The new examples make this clearer.
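For instance, examples along these lines (a sketch of the kind of examples added; `typeof` assumes Spark 3.0+, and the exact precision shown is an assumption):

```sql
SELECT typeof(123.456);    -- decimal(6,3): a plain fractional literal is a decimal
SELECT typeof(123.456E0);  -- double: an exponent makes it a double
SELECT typeof(123.456D);   -- double: the 'D' postfix forces a double
```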

### Does this PR introduce _any_ user-facing change?

Yes, it clarifies the user-facing documentation about fractional numeric literals.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45027 from nchammas/fractional-literal-take2.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Kent Yao <[email protected]>