[SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal #10796
Conversation
retest this please
Just curious why we need to change this?
This failed on the HiveCompatibilitySuite.udf_if test on the following query: SELECT IF(FALSE, 1, 1.1). It would not cast the trueCase to a decimal. CASE WHEN ... ELSE did work, and I used similar code for `if`.
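The coercion issue above can be sketched outside Spark. This is a hypothetical Python illustration (not Spark's actual type-coercion code): when the two branches of `IF(cond, 1, 1.1)` have different types, the integer branch must be widened so both branches share one exact result type.

```python
from decimal import Decimal

def coerce_if_branches(true_case, false_case):
    # Hypothetical sketch: widen an int branch to Decimal when the other
    # branch is already a Decimal, mirroring the cast Spark must insert
    # for IF(FALSE, 1, 1.1) once decimal literals are exact.
    if isinstance(true_case, int) and isinstance(false_case, Decimal):
        true_case = Decimal(true_case)
    elif isinstance(false_case, int) and isinstance(true_case, Decimal):
        false_case = Decimal(false_case)
    return true_case, false_case

t, f = coerce_if_branches(1, Decimal("1.1"))
print(type(t).__name__, type(f).__name__)  # Decimal Decimal
```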
LGTM, just one question.
Test build #49557 has finished for PR 10796 at commit
Test build #49559 has finished for PR 10796 at commit
The build fails on  It seems something funny is up with the build. It cannot create the following path on Jenkins: `file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/warehouse--bea7c96d-5447-4320-aa0f-985f7c807bca/**db2.db/**bucketed_table` The highlighted (`**…**`) part shouldn't be there. @cloud-fan / @JoshRosen any idea what is happening?
retest this please
LGTM. I'm not sure what's going on in the failing test; let's see if it passes this time.
Test build #49564 has finished for PR 10796 at commit
Hmm, looks like it might be a legitimate failure. I don't think that this suite is known to be flaky; https://spark-tests.appspot.com/tests/org.apache.spark.sql.sources.BucketedWriteSuite looks all green to me.
I checked out your source code and ran this locally; the tests passed. Not 100% sure, but I suspect there is a bug in the test or the implementation itself, and some changes here exposed it.
BTW you can also change those tests to `ignore` for now, and we can fix them later if we can't figure this out at the moment.
I have added some debugging code to the bucketed write test. It looks like something is wrong with the constructed file path.
Test build #49618 has finished for PR 10796 at commit
Test build #2404 has finished for PR 10796 at commit
Test build #2403 has finished for PR 10796 at commit
Test build #49667 has finished for PR 10796 at commit
Is `\\d+(\\.\\d*)` correct, or should it be `\\d+(\\.\\d+)`? Do we allow `123.` to be parsed as a decimal?
We currently do. See this lexer rule:
```
Number
    :
    ((Digit+ (DOT Digit*)?) | (DOT Digit+)) Exponent?
    ;
```
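The lexer rule above can be checked with a quick sketch. The regex below is my own translation of the grammar rule, for illustration only; note that `DOT Digit*` allows zero digits after the dot, so `123.` is accepted.

```python
import re

# Assumed regex translation of the lexer rule:
# ((Digit+ (DOT Digit*)?) | (DOT Digit+)) Exponent?
NUMBER = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\Z")

for s in ["123", "123.", "123.45", ".45", "12.1e01"]:
    print(s, bool(NUMBER.match(s)))  # all True, including "123."
```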
LGTM
Another-Attempt-at-Fixing-This© The problem is caused by the way paths are derived. When we write the table, we use the name of the current Hive database in the path, but we don't account for this while reading the bucketed table.
Test build #49713 has finished for PR 10796 at commit
Test build #2414 has finished for PR 10796 at commit
@hvanhovell when you have a chance, can you also update `CatalystQl` to add some comment to `Token` and explain what it actually does? e.g. what it actually returns in `Some[(String, List[ASTNode])]`. Thanks.
Actually if possible, it would be great to document what each function does in `CatalystQl.scala` (even for private functions), so somebody else new to that part of the code can pick it up easily. Feel free to do it as a separate pull request though.
Test build #49767 has finished for PR 10796 at commit
I retested this a few more times.
retest this please
Test build #49780 has finished for PR 10796 at commit
Test build #49784 has finished for PR 10796 at commit
retest this please
@cloud-fan could you take a look at this?
This looks reasonable; just curious why the previous one was not working after your PR....
I have no idea. Nothing in the PR seems to touch this; it seems a bit random. The only thing I am doing here is to make absolutely sure the paths are the same; the difference in paths is caused by the use of the current Hive database in the path name.
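The path mismatch described above can be sketched abstractly. Everything here is hypothetical (the warehouse root and function are illustrations, not Spark code): if the writer embeds the current database in the path while the reader does not, the two sides derive different locations for the same table.

```python
import os

WAREHOUSE = "/tmp/warehouse"  # hypothetical warehouse root

def table_path(table, current_db=None):
    # Hypothetical sketch: the writer passes current_db (e.g. "db2"),
    # producing ".../db2.db/<table>", while the reader omits it,
    # producing ".../<table>" -- the two paths diverge.
    if current_db:
        return os.path.join(WAREHOUSE, f"{current_db}.db", table)
    return os.path.join(WAREHOUSE, table)

write_path = table_path("bucketed_table", current_db="db2")
read_path = table_path("bucketed_table")
print(write_path == read_path)  # False
```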
Test build #49785 has finished for PR 10796 at commit
Going to merge this. Thanks!
### What changes were proposed in this pull request?

Explain that a number like `123.456`, without a postfix but with a decimal point, is a decimal literal.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class. In Spark, it's the other way around. I found this surprising and couldn't find documentation about it. I discovered this after reading SPARK-45786. I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a bit of documentation.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45003 from nchammas/decimal-literal.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?

Provide examples showing what type literals like `123.456` and `123.456E0` have in SQL.

### Why are the changes needed?

In Python (and I think Java too) fractional numeric literals are typically floats. To get decimals, you need to provide an explicit postfix or use an explicit class. In Spark, it's the other way around. I found this surprising and couldn't find documentation about it. I discovered this after reading [SPARK-45786](https://issues.apache.org/jira/browse/SPARK-45786). I did a little searching and came across #10796, which shows that we used to default to floats as the fractional numeric literal, but then switched to decimals. There is an additional wrinkle I discovered in #45003. If the fractional literal has an exponent, then it's a double, not a decimal. The existing syntax documents this, but that alone is not user-friendly. The new examples make this clearer.

### Does this PR introduce _any_ user-facing change?

Yes, it clarifies the user-facing documentation about fractional numeric literals.

### How was this patch tested?

No testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45027 from nchammas/fractional-literal-take2.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
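The typing rule described above (exponent means double, no exponent means decimal) can be sketched as follows. The `parse_fractional_literal` helper is hypothetical, for illustration only; it is not Spark's parser.

```python
from decimal import Decimal

def parse_fractional_literal(text):
    # Hypothetical sketch of the rule: a fractional literal with an
    # exponent (e.g. 123.456E0) is typed as a double/float, while one
    # without (e.g. 123.456) is typed as an exact decimal.
    if "e" in text.lower():
        return float(text)
    return Decimal(text)

print(type(parse_fractional_literal("123.456")).__name__)    # Decimal
print(type(parse_fractional_literal("123.456E0")).__name__)  # float
```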
The current parser turns a decimal literal, for example `12.1`, into a `Double`. The problem with this approach is that we convert an exact literal into a non-exact `Double`. This PR changes that behavior: a decimal literal is now converted into an exact `BigDecimal`. The behavior for scientific decimals, for example `12.1e01`, is unchanged; these are still converted into a `Double`. This PR replaces the `BigDecimal` literal by a `Double` literal, because the `BigDecimal` is the default now. You can use the double literal by appending a 'D' to the value, for instance: `3.141527D`.

cc @davies @rxin
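The exact-versus-non-exact distinction above is easy to demonstrate with Python's `decimal` module, as an analogue (not Spark code): `12.1` has no exact binary floating-point representation, so a double literal silently loses precision, while an exact decimal stores the written value.

```python
from decimal import Decimal

as_double = 12.1              # analogue of the old Double literal
as_decimal = Decimal("12.1")  # analogue of the new exact decimal literal

# Converting the float exposes its inexact binary expansion,
# whereas the decimal round-trips to exactly "12.1".
print(Decimal(as_double))
print(as_decimal)
```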