Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Aug 16, 2017

What changes were proposed in this pull request?

Current CodeGeneraor.splitExpressions splits statements into methods if the total length of statements is more than 1024 characters. The length may include comments or empty line.

This PR excludes comment or empty line from the length to reduce the number of generated methods in a class, by using CodeFormatter.stripExtraNewLinesAndComments() method.

How was this patch tested?

Existing tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an internal SQLConf for it and make it adjustable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@SparkQA
Copy link

SparkQA commented Aug 16, 2017

Test build #80749 has finished for PR 18966 at commit 5ef1f53.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 17, 2017

Test build #80776 has finished for PR 18966 at commit 51b6253.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Aug 20, 2017

ping @gatorsmile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding, this is not the number of characters? This is the length of source codes, right?

Copy link
Member Author

@kiszk kiszk Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the length of source code does not mean the number of lines of the source code, you are right. This is because we check the sum of String.length.
If it is correct, I will update them based on the length of source codes.

Here are two examples.

abc -> 3

ab
c

-> 4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, using the number of characters looks pretty fragile. It is even worse than using the number of lines.

Copy link
Member Author

@kiszk kiszk Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR, do we change this parameter to use number of lines instead of number of characters, too? It is possible technically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Aug 24, 2017

Test build #81100 has finished for PR 18966 at commit e59168c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this threshold impact this test case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Sorry, I updated the result. This threshold impacts this test case.

@SparkQA
Copy link

SparkQA commented Aug 25, 2017

Test build #81109 has finished for PR 18966 at commit ea9fea4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag does not take an effect at runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Could you elaborate this review comment?

Copy link
Member

@gatorsmile gatorsmile Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the conf to a new value at runtime, can you get the value from SQLConf.get.maxCodegenLinesPerFunction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I had another interpretation that "this value may not change performance".
Let me check this while I did it like other flags.

Copy link
Member Author

@kiszk kiszk Aug 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Depends on calling context, it may take an effect or not. Should we pass SQLConf to this method? I am thinking whether we can pass SQLConf to a constructor or not.
Is there any other thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this option effective by executing SparkEnv.get.conf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Since to make it configurable takes long time, can we do it using hard-coded parameter?
Even in this case, this PR makes better since the estimation does not include characters for comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk You know, I am just afraid new regression could be introduced due to this change. Sorry for the delay. I really do not have a better solution. I kind of agree on your original solution. Just exclude the characters for comment. At least, it becomes better and take a less risk to hit a regression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile I understand your concern about the possibility of new performance regression. I will use the original threshold (max characters) as hard-coded value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splited -> split

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to emphasize this is not for whole-stage codegen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is not correct. Explain it for expressions only? rename it to

spark.sql.codegen.expressions.maxCodegenLinesPerFunction

@SparkQA
Copy link

SparkQA commented Aug 26, 2017

Test build #81154 has finished for PR 18966 at commit 77f01e4.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Aug 26, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Aug 27, 2017

Test build #81160 has finished for PR 18966 at commit 77f01e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Aug 27, 2017

ping @gatorsmile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not following what we are doing for the other SQLConf. I am also thinking if we should just put it into StaticSQLConf. Let me check it with others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile Is there any progress?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple related PRs. Maybe we can wait until the reviews are finished there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use bytecode size?

Copy link
Member Author

@kiszk kiszk Oct 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we do not know precise bytecode size. Will we use estimated bytecode size based on characters per line? Or, other ideas to get precise bytecode size before compiling a method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gentle ping @gatorsmile

@SparkQA
Copy link

SparkQA commented Oct 4, 2017

Test build #82455 has finished for PR 18966 at commit a489938.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2017

Test build #82518 has finished for PR 18966 at commit b04c09c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"""([ |\t]*?\/\/[\s\S]*?\n)""").r // strip //comment
val codeWithoutComment = commentReg.replaceAllIn(input, "")
codeWithoutComment.replaceAll("""\n\s*\n""", "\n") // strip ExtraNewLines
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add back the test case of this function?

@gatorsmile
Copy link
Member

LGTM pending Jenkins

@SparkQA
Copy link

SparkQA commented Oct 10, 2017

Test build #82596 has finished for PR 18966 at commit 4c47802.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2017

Test build #82598 has finished for PR 18966 at commit 516a72a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants