[SPARK-24215][PySpark][Follow Up] Implement eager evaluation for DataFrame APIs in PySpark #21553
Conversation
Test build #91766 has finished for PR 21553 at commit
retest this please
Test build #91778 has finished for PR 21553 at commit
docs/configuration.md
For SQLConf, we do not need to hard code the conf description here.
Got it, thanks, done in afada2b.
Can you rewrite these descriptions based on the description I posted in the original PR?
Thanks, done in afada2b.
Could you address the comments in the original PR?
Thanks, I want to take this. Maybe it should be done in another JIRA and PR, where I fix all the hard-coded configs in PySpark?
Test build #91901 has finished for PR 21553 at commit
python/pyspark/sql/conf.py
@gatorsmile I moved all the core configs used in PySpark into conf.py here. Please have a look when you have time.
#21370 (comment)
Thank you for fixing this! Let us do it in a separate PR.
Yep, done in #21648
@gatorsmile I addressed the comments in the last commit, but maybe this should be done in an independent PR and JIRA?
Test build #92344 has finished for PR 21553 at commit
docs/configuration.md
We are removing documentation?
SQL Confs are not part of the documentation.
This should be in sql-programming-guide.md, right?
Following the SQL configuration convention, all the descriptions can be shown by `spark.sql("SET -v").show(numRows = 200, truncate = false)`. See https://spark.apache.org/docs/latest/configuration.html#spark-sql
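For reference, the PySpark equivalent of that command (a minimal sketch; assumes a running SparkSession) would be:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "SET -v" returns a DataFrame listing every SQL config key with its default
# value and description, so docs stay in sync with SQLConf instead of being
# hard-coded in Markdown.
spark.sql("SET -v").show(n=200, truncate=False)
```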
python/pyspark/sql/conf.py
I think this PySpark SQLConf stuff should be done in a separate Jira/PR.
Yea, it should be separate.
Yeah, agree, done in #21648.
python/pyspark/sql/conf.py
This duplicates the key. I think the current approach duplicates a lot of code from the Scala side.
Yep, I'm also puzzled by this, because we also do the registration on the Scala side. How about just calling buildConf on the Scala side for these keys that are used only in PySpark? Let's discuss it in #21648.
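To make the duplication concrete, here is a hypothetical sketch (not the actual PR code; the key name is real, but the helper is invented for illustration):

```python
# Hypothetical illustration of the duplication under discussion: the Python
# side re-declares a key string and default that Scala's SQLConf already
# registers, so the two definitions can silently drift apart.
REPL_EAGER_EVAL_ENABLED = "spark.sql.repl.eagerEval.enabled"

def is_eager_eval_enabled(spark):
    # spark.conf.get goes through the JVM session state; only the duplicated
    # key string and the fallback default live on the Python side.
    return spark.conf.get(REPL_EAGER_EVAL_ENABLED, "false").lower() == "true"
```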
Force-pushed from d719dfb to 00ae164.
In the last commit I reverted the changes to SQLConf and created a new PR, #21648. Could this follow-up PR be merged first? Thanks.
Test build #92373 has finished for PR 21553 at commit
retest this please
Test build #92377 has finished for PR 21553 at commit
LGTM. Thanks! Merged to master.
What changes were proposed in this pull request?
Address comments in #21370 and add more tests; a usage sketch of the feature follows.
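For context, the eager evaluation feature from the original PR is driven by SQL confs; a minimal usage sketch (conf names as introduced by SPARK-24215) might look like:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.repl.eagerEval.enabled", "true")   # turn on eager evaluation
         .config("spark.sql.repl.eagerEval.maxNumRows", "20")  # rows rendered per DataFrame
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
# With eager evaluation enabled, evaluating `df` in a REPL or notebook
# renders its rows instead of the usual DataFrame[...] repr.
df
```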
How was this patch tested?
Enhanced the tests in pyspark/sql/test.py and DataFrameSuite.