[SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation #44300
Conversation
@HyukjinKwon @cloud-fan @gatorsmile - What do you think? Tagging you because you last reviewed the documentation machinery I am modifying here.
docs/sql-performance-tuning.md (outdated diff):

```diff
-</tr>
-</table>
+{% include_relative generated-sql-config-table-caching-data.html %}
```
This diff demonstrates the main benefit of this PR. Instead of having to copy and maintain the full HTML table for a set of configs, we tag the ones we want to group together in SQLConf.scala and then reference that group's generated table here.
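To make the tagging idea concrete, here is a minimal sketch in Python (the real code is Scala in `SQLConf.scala`; the class, method, and config names below are illustrative, not the actual Spark API) of a builder that accumulates documentation-group tags:

```python
# Hypothetical, simplified sketch of the builder idea proposed in this PR:
# each config can be tagged with one or more documentation groups, and the
# doc generator later emits one HTML table per group.
class ConfigBuilder:
    def __init__(self, key):
        self.key = key
        self.doc_text = ""
        self.tags = set()

    def doc(self, text):
        self.doc_text = text
        return self

    def with_tag(self, tag):
        # Attach a documentation-group tag (analogous to withTag in the PR).
        self.tags.add(tag)
        return self

compressed = (
    ConfigBuilder("spark.sql.inMemoryColumnarStorage.compressed")
    .doc("When true, Spark SQL selects a compression codec for cached data.")
    .with_tag("sql-tuning-caching-data")
)
```

A generator can then collect every config whose tag set contains `sql-tuning-caching-data` and render them as one table, which is what the `include_relative` line above pulls in.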
```diff
     this
   }

+  def withTag(tag: String): ConfigBuilder = {
```
I am on the fence regarding this name. Maybe something more explicit like `withDocumentationGroup` would be better?
We could also do away with custom group names and instead select certain prefixes for use as documentation groups, like `spark.sql.cbo`, `spark.sql.statistics`, etc.
However, this won't work for groupings that don't align with config name prefixes, like the breakdown of runtime vs. static configurations that was added in #28274.
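The prefix-based alternative can be sketched as follows (a hedged illustration in Python; the config names are real Spark configs, but the grouping logic is hypothetical). Note that a scheme like this can only express groups that mirror the config-name hierarchy, which is exactly the limitation described above for cross-cutting groupings such as runtime vs. static configs:

```python
# Sketch: derive documentation groups from config-name prefixes instead of
# explicit tags. Works for prefix-aligned groups like spark.sql.cbo, but
# cannot express groups that cut across the naming hierarchy.
from collections import defaultdict

configs = [
    "spark.sql.cbo.enabled",
    "spark.sql.cbo.joinReorder.enabled",
    "spark.sql.statistics.histogram.enabled",
]

groups = defaultdict(list)
for name in configs:
    prefix = ".".join(name.split(".")[:3])  # e.g. "spark.sql.cbo"
    groups[prefix].append(name)
```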
There are about 55 HTML tables of Spark configs (both SQL and non-SQL) across our documentation. They probably span a few thousand lines of HTML. I believe they can all be replaced with automatically generated HTML tables. I intend to do that if this PR gets buy-in from committers.
(will try to take a look around next week)
Thank you, Hyukjin. (I hope when I edited this comment it didn't re-ping you. That wasn't my intention.)
We need it until the generated function tables get moved, too.
Btw @HyukjinKwon, if there is anything I can do to make this PR easier to review, I'm all ears. The key change is the introduction of …
@HyukjinKwon - How are you feeling about this PR? Shall I recruit another committer to take a look instead? No worries if you need more time, or if you disapprove of the approach. I just want to check in since I have the time and motivation to migrate all of our config documentation to use this new system. As I noted in a previous comment, there are around 55 config tables across our documentation that span several thousand lines of manually maintained HTML. They can all be eliminated with this approach. I think it would be a big win for the maintainability of our documentation.
Silence on a PR usually means there is something wrong with it, so I am proactively trying to figure out how to move this idea forward. The possibilities I am considering are:

1. The idea itself is bad.
2. The idea is fine, but the approach of modifying Spark core is not.
3. The approach is fine, but the PR is too large to review comfortably.
4. Reviewers are simply busy and haven't gotten to it yet.

In the case of 1, I need some feedback on why this is a bad idea so I can drop this effort. As you can see, I believe it's a good idea and am eager to push it forward. In the case of 2, I have proposed a completely different approach in #44756 that does not touch Spark core. In the case of 3, I pushed #44755, which is a stripped-down version of this PR that should be easier to review. In the case of 4, please forgive me for repeatedly pinging this PR and being annoying. 😅
What changes were proposed in this pull request?
Summary:
This change enables us to assign the various SQL configs to groups that can then be referenced automatically in our documentation. This will replace the large number of manually maintained HTML tables with references to generated tables.
For example, the SQL performance tuning page has a section on data caching that references two configs. They are in an HTML table that has to be manually maintained.
With this new system, we group the configs in `SQLConf.scala` according to how we'd like to display them in our documentation. For example, we can assign some configs to the `sql-tuning-caching-data` group. Then we can reference this `sql-tuning-caching-data` config group from within `sql-performance-tuning.md` as follows:

`{% include_api_gen _generated/config_tables/sql-tuning-caching-data.html %}`

This pulls in an automatically generated HTML table that includes all the configs tagged `sql-tuning-caching-data`.

In addition to eliminating the need to manually maintain HTML tables of configs, this PR also fixes some problems with the current documentation:

- `spark.sql.files.openCostInBytes` is an internal config, but it is currently documented publicly in the SQL tuning guide.
- The documentation for `spark.sql.autoBroadcastJoinThreshold` is missing a line that is present in the config's description in the code.

This PR also adds anchors to each config in the generated HTML tables, so that people can link directly to specific configurations in the documentation.
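To illustrate the anchors mentioned above, here is a hedged sketch in Python of what a generated table row with a per-config anchor could look like (the function and the exact HTML produced are hypothetical; the actual generator in this PR may differ):

```python
# Sketch: render a config table where each row carries an id equal to the
# config key, and the key links to its own anchor, so docs can link to
# e.g. configuration.html#spark.sql.inMemoryColumnarStorage.compressed.
def render_config_table(entries):
    rows = []
    for key, meaning in entries:
        rows.append(
            f'<tr id="{key}"><td><a href="#{key}"><code>{key}</code></a></td>'
            f"<td>{meaning}</td></tr>"
        )
    header = "<tr><th>Property Name</th><th>Meaning</th></tr>"
    return "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>"

html = render_config_table(
    [("spark.sql.inMemoryColumnarStorage.compressed",
      "Whether to compress cached columnar data.")]
)
```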
This PR builds on the work done in #27459, and is related to the work done in #28274.
Why are the changes needed?
This makes Spark configuration documentation much easier to maintain and more useful to users.
Does this PR introduce any user-facing change?
Yes, it alters some of the user-facing documentation.
How was this patch tested?
Manually built the documentation and viewed it in my browser.
Here's a screenshot of the generated docs (`sql-performance-tuning.html`): [screenshot attached]
I also reviewed the generated documentation for `configuration.html` but didn't screenshot it because it's too long.

Was this patch authored or co-authored using generative AI tooling?
No.