[SPARK-5009][SQL][Bug FIx] allCaseVersions leads to stackoverflow. #3909
Can one of the admins verify this patch?
@chenghao-intel
toLowerCase probably causes some other issue, can you add a unit test for this?
@chenghao-intel
Since Keyword is a constant whose only use is matching during parsing, we don't need it once parsing has succeeded. So lower-casing only the Keyword here doesn't matter and will not cause other issues.
I can confirm this is a bug when the keyword is too long. However, this fix seems a little hacky to me. Sorry, @OopsOutOfMemory, I need more time to investigate this issue. Could you also add a unit test for it, so people can reproduce the exception easily?
@chenghao-intel @marmbrus
@OopsOutOfMemory It seems some other parsers have the same bug. I've created #3924 to refactor the code first, and will create another PR for the bug fix; we can probably discuss the code then.
@OopsOutOfMemory, I've updated the code to fix the long-keyword issue in #3926. Can you review that for me?
@chenghao-intel Thanks for working on this :)
This bug will be fixed in #3926.
Currently, we use the `allCaseVersions` function to match all possible case versions of a `Keyword` that a user passes in a SQL query, so `SelecT * From SRc` is also allowed in query syntax.

A stack overflow exception appears when the `Keyword` is too long, since `allCaseVersions` generates 2^`Keyword.length` case versions. For example, `Keyword("SERDEPROPERTIES")` generates 2^15 = 32768 possible case versions. This makes the implicit function `asParser` throw the stack overflow exception.

I think it is unnecessary to generate every case version: it causes a stack overflow when the keyword is too long, and it does extra computation to generate all case versions of a given keyword.
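
To make the growth concrete, here is a minimal sketch that enumerates every upper/lower-case spelling of a keyword and counts them. It is not the Spark source: the object name `CaseVersionsSketch` and this iterator-based `allCaseVersions` are assumptions made for illustration, and the sketch only counts spellings rather than building a parser from them.

```scala
object CaseVersionsSketch {
  // Hypothetical re-creation of the idea, not the Spark implementation:
  // for each character, extend every prefix with its lower- and upper-case form.
  def allCaseVersions(s: String): Iterator[String] =
    s.foldLeft(Iterator("")) { (prefixes, c) =>
      prefixes.flatMap(p => Iterator(p + c.toLower, p + c.toUpper))
    }

  def main(args: Array[String]): Unit = {
    // A two-letter keyword already has 2^2 = 4 spellings.
    println(allCaseVersions("as").toList)             // List(as, aS, As, AS)
    // A 15-letter keyword has 2^15 spellings.
    println(allCaseVersions("SERDEPROPERTIES").size)  // 32768
  }
}
```

Each extra letter doubles the number of spellings, so combining one alternative per spelling into a single parser grows exponentially with keyword length.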
So I'd like to replace the `allCaseVersions` matching of `Keyword` with a simpler approach, which also prevents the stack overflow exception and speeds up parsing.

The issue description is here: https://issues.apache.org/jira/browse/SPARK-5009
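
One way to picture the simpler matching is to normalize the keyword once and compare it with the lower-cased input token, which costs time linear in the keyword length. The sketch below is an assumption for illustration only; it is not the code from this PR or from #3926, and `KeywordMatchSketch`, the simplified `Keyword` class, and `matches` are hypothetical names.

```scala
import java.util.Locale

object KeywordMatchSketch {
  // Hypothetical stand-in for the parser's Keyword; it keeps only the string.
  final case class Keyword(str: String) {
    // Normalize once, so each match is a single string comparison.
    private val normalized: String = str.toLowerCase(Locale.ROOT)

    // Case-insensitive match without enumerating any case versions.
    def matches(token: String): Boolean =
      token.toLowerCase(Locale.ROOT) == normalized
  }

  def main(args: Array[String]): Unit = {
    val serdeProps = Keyword("SERDEPROPERTIES")
    println(serdeProps.matches("SerdeProperties")) // true
    println(serdeProps.matches("SELECT"))          // false
  }
}
```

Using `Locale.ROOT` keeps the comparison independent of the JVM's default locale, which matters for letters such as `I` under some locales.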