Skip to content

Conversation

@OopsOutOfMemory
Copy link
Contributor

Currently, we use allCaseVersion function to match all possible case versions of Keyword that user passing into to sql query, like SelecT * From SRc is also allowed in query syntax.

A stackoverflow exception appears when Keyword is too long since allCaseVersion will generate 2^Keyword.length case versions. i.e. Keyword("SERDEPROPERTIES") will generate 2^15 = 32768 possible case version. This make implicit function asParser throws the SO exception.

I think it is unnecessary to generate all kinds of case versions, this will cause SO when keyword is too long and also do extra computing to generate all case versions of a given keyword.

So I'd like to replace the allCaseVersions matching Keyword with a more simpler way, and this also can prevent SO exception, speed up parsing.

issues description is here: https://issues.apache.org/jira/browse/SPARK-5009

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@OopsOutOfMemory
Copy link
Contributor Author

@chenghao-intel
Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toLowerCase probably causes some other issue, can you add a unit test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenghao-intel
Since Keyword is a constant and it's usage is to parsing matching and identify others, we don't need them after parsing correctly, so here I only make Keyword lower case is doesn't matter and will not cause other issues.

@chenghao-intel
Copy link
Contributor

I can confirm this is a bug when the keyword is too long, however, this fixing seems a little hack to me, Sorry, @OopsOutOfMemory , I need more time in investigating on this issue. But can you also add a unit test for this? So people can reproduce the exception easily.

@OopsOutOfMemory
Copy link
Contributor Author

@chenghao-intel @marmbrus
I add a test suit to reproduce this exception. Could u have a look at it : )
Also I think this is not a hack since Keyword is always a constant and will not affect the user passed data, And the allCaseVersions implementation has a bad performance when keyword is too long while converting it to a parser with | operator.

@chenghao-intel
Copy link
Contributor

@OopsOutOfMemory seems some other Parsers have the same bug, I've created the #3924 to refactor the code first. And will create another PR for the bug fixing, probably we can discuss the code then.

@chenghao-intel
Copy link
Contributor

@OopsOutOfMemory , I've updated the code to fix the long keyword issue at #3926, can you review that for me?

@OopsOutOfMemory
Copy link
Contributor Author

@chenghao-intel Thanks for working on this :)

@OopsOutOfMemory
Copy link
Contributor Author

This bug will be fixed in #3926

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants