
Conversation

@hvanhovell
Contributor

The current parser turns a decimal literal, for example 12.1, into a Double. The problem with this approach is that we convert an exact literal into a non-exact Double. This PR changes that behavior: a decimal literal is now converted into an exact BigDecimal.

The behavior for scientific decimals, for example 12.1e01, is unchanged. This will be converted into a Double.

This PR also replaces the BigDecimal literal with a Double literal, because BigDecimal is now the default. You can still get a double literal by appending a 'D' to the value, for instance: 3.141527D
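To illustrate (a hedged sketch; the comments describe the intended types under this change):

```sql
SELECT 12.1;       -- now parsed as an exact decimal (previously a double)
SELECT 12.1e01;    -- scientific notation: unchanged, still a double
SELECT 3.141527D;  -- explicit 'D' postfix: a double
```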

cc @davies @rxin

@hvanhovell
Contributor Author

retest this please

Contributor

Just curious why we need to change this?

Contributor Author

This failed on the HiveCompatibilitySuite.udf_if test for the following query: SELECT IF(FALSE, 1, 1.1). It would not cast the trueCase to a decimal. CASE WHEN ... ELSE did work, so I used similar code for `if`.
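For reference, a quick sketch of the two forms being compared (the expected results are assumptions based on the description above):

```sql
-- IF should coerce both branches to a common decimal type,
-- just as CASE WHEN ... ELSE already does:
SELECT IF(FALSE, 1, 1.1);                    -- expected: 1.1
SELECT CASE WHEN FALSE THEN 1 ELSE 1.1 END;  -- expected: 1.1
```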

@davies
Contributor

davies commented Jan 17, 2016

LGTM, just one question.

@SparkQA

SparkQA commented Jan 17, 2016

Test build #49557 has finished for PR 10796 at commit f5ecab6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 17, 2016

Test build #49559 has finished for PR 10796 at commit 5c72b58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

The build fails on BucketedWriteSuite. This test passes on my local machine, and I am pretty sure I am not touching that part of Hive.

Something funny seems to be going on with the build. It cannot create the following path on Jenkins: file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/warehouse--bea7c96d-5447-4320-aa0f-985f7c807bca/**db2.db/**bucketed_table

The highlighted part shouldn't be there. @cloud-fan / @JoshRosen any idea what is happening?

@hvanhovell
Contributor Author

retest this please

@cloud-fan
Contributor

LGTM. I'm not sure what's going on with the failing test; let's see if it passes this time.

@SparkQA

SparkQA commented Jan 18, 2016

Test build #49564 has finished for PR 10796 at commit 5c72b58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Hmm, looks like it might be a legitimate failure. I don't think that this suite is known to be flaky; https://spark-tests.appspot.com/tests/org.apache.spark.sql.sources.BucketedWriteSuite looks all green to me.

@rxin
Contributor

rxin commented Jan 18, 2016

I checked out your source code and ran this locally. The tests passed:

22:39:53.378 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Persisting partitioned data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): 
file:/scratch/rxin/spark/target/tmp/warehouse--56673338-b401-4032-854f-34cf4197194b/bucketed_table
22:40:03.856 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
22:40:12.559 WARN org.apache.spark.sql.hive.HiveContext$$anon$2: Persisting partitioned data source relation `bucketed_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): 
file:/scratch/rxin/spark/target/tmp/warehouse--56673338-b401-4032-854f-34cf4197194b/bucketed_table

Not 100% sure, but I suspect there is a bug with the test or the implementation itself and some changes here exposed that.

@rxin
Contributor

rxin commented Jan 18, 2016

BTW, you can also change those tests to `ignore` for now, and we can fix them later if we can't figure this out at the moment.

@hvanhovell
Contributor Author

I have added some debugging code to the bucketed write test. It looks like something is wrong with the constructed file path.

@SparkQA

SparkQA commented Jan 18, 2016

Test build #49618 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #2404 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #2403 has finished for PR 10796 at commit 0ac091a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2016

Test build #49667 has finished for PR 10796 at commit df8e237.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Is `\\d+(\\.\\d*)` correct, or should it be `\\d+(\\.\\d+)`?
Do we allow `123.` to be parsed as a decimal?

Contributor Author

We currently do. See this lexer rule:

Number
    :
    ((Digit+ (DOT Digit*)?) | (DOT Digit+)) Exponent?
    ;
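So a bare trailing dot is accepted. A quick, hedged way to confirm at a SQL prompt:

```sql
-- Digit+ (DOT Digit*)? matches a trailing dot with no fraction digits,
-- so this should parse as the decimal 123:
SELECT 123.;
```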

@viirya
Member

viirya commented Jan 19, 2016

LGTM

@hvanhovell
Contributor Author

Another-Attempt-at-Fixing-This©

The problem is caused by the way paths are derived. When we write the table, we use the name of the current Hive database in the path, but we don't account for this when reading the bucketed table.
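Roughly the shape of the problem (a sketch; the database and table names are illustrative, not the test's actual code):

```sql
CREATE DATABASE IF NOT EXISTS db2;
USE db2;
CREATE TABLE bucketed_table (i INT);
-- The files are written under the current database's directory:
--   .../warehouse/db2.db/bucketed_table
-- but the read side derived the path without the database segment:
--   .../warehouse/bucketed_table
```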

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49713 has finished for PR 10796 at commit 1b0508f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #2414 has finished for PR 10796 at commit 1b0508f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 20, 2016

@hvanhovell when you have a chance, can you also update `CatalystQl` to add some comments to `Token` and explain what it actually does? e.g. what it actually returns in `Some[(String, List[ASTNode])]`.

Thanks.

@rxin
Contributor

rxin commented Jan 20, 2016

Actually, if possible, it would be great to document what each function does in `CatalystQl.scala` (even for private functions), so somebody new to that part of the code can pick it up easily.

Feel free to do it as a separate pull request though.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49767 has finished for PR 10796 at commit 8bf2a27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

I'll retest this a few more times.

@hvanhovell
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49780 has finished for PR 10796 at commit 8bf2a27.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49784 has finished for PR 10796 at commit 9afd729.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor Author

retest this please

Contributor Author

@cloud-fan could you take a look at this?

Contributor

This looks reasonable; I'm just curious why the previous one stopped working after your PR...

Contributor Author

I have no idea. Nothing in the PR seems to touch this; it seems a bit random. The only thing I am doing here is making absolutely sure the paths are the same; the difference in paths is caused by the use of the Hive current database in the path name.

@SparkQA

SparkQA commented Jan 20, 2016

Test build #49785 has finished for PR 10796 at commit 9afd729.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 20, 2016

Going to merge this. Thanks!

@asfgit asfgit closed this in 1017327 Jan 20, 2016
HyukjinKwon pushed a commit that referenced this pull request Feb 5, 2024
### What changes were proposed in this pull request?

Explain that a number like `123.456`, without a postfix but with a decimal point, is a decimal literal.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class.

In Spark, it's the other way around. I found this surprising and couldn't find documentation about it.

I discovered this after reading SPARK-45786. I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a bit of documentation.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45003 from nchammas/decimal-literal.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
yaooqinn pushed a commit that referenced this pull request Feb 6, 2024
### What changes were proposed in this pull request?

Provide examples showing what type literals like `123.456` and `123.456E0` have in SQL.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class. In Spark, it's the other way around. I found this surprising and couldn't find documentation about it.

I discovered this after reading [SPARK-45786](https://issues.apache.org/jira/browse/SPARK-45786). I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals.

There is an additional wrinkle I discovered in #45003. If the fractional literal has an exponent, then it's a double, not a decimal.

The existing syntax documents this, but that alone is not user-friendly. The new examples make this clearer.
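For instance, examples along these lines (a sketch of the kind of examples added; `typeof` assumes Spark 3.0+, and the exact precision shown is an assumption):

```sql
SELECT typeof(123.456);    -- decimal(6,3): a plain fractional literal is a decimal
SELECT typeof(123.456E0);  -- double: an exponent makes it a double
SELECT typeof(123.456D);   -- double: the 'D' postfix forces a double
```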

### Does this PR introduce _any_ user-facing change?

Yes, it clarifies the user-facing documentation about fractional numeric literals.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45027 from nchammas/fractional-literal-take2.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Kent Yao <[email protected]>