# [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation #28274
Changes from all commits: `9324ba9`, `991a7ec`, `284ab70`, `9621155`, `b61b46c`, `4fca3fd`, `5a42b0d`, `1de696e`, `81e043f`
**`docs/configuration.md`**

```diff
@@ -2622,11 +2622,32 @@ Please refer to the [Security](security.html) page for available options on how
 Spark subsystems.
 
-{% for static_file in site.static_files %}
-{% if static_file.name == 'generated-sql-configuration-table.html' %}
 ### Spark SQL
-{% include_relative generated-sql-configuration-table.html %}
+
+#### Runtime SQL Configuration
+
+Runtime SQL configurations are per-session, mutable Spark SQL configurations. They can be set with initial values by the config file
+and command-line options with `--conf/-c` prefixed, or by setting `SparkConf` that are used to create `SparkSession`.
+Also, they can be set and queried by SET commands and reset to their initial values by the RESET command,
+or by `SparkSession.conf`'s setter and getter methods at runtime.
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-runtime-sql-config-table.html' %}
+{% include_relative generated-runtime-sql-config-table.html %}
 {% break %}
 {% endif %}
 {% endfor %}
+
+#### Static SQL Configuration
+
+Static SQL configurations are cross-session, immutable Spark SQL configurations. They can be set with final values by the config file
+and command-line options with `--conf/-c` prefixed, or by setting `SparkConf` that are used to create `SparkSession`.
+
+External users can query the static SQL config values via `SparkSession.conf` or via the SET command, e.g. `SET spark.sql.extensions;`, but cannot set/unset them.
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-static-sql-config-table.html' %}
+{% include_relative generated-static-sql-config-table.html %}
+{% break %}
+{% endif %}
+{% endfor %}
```
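The runtime/static distinction described by the new doc text can be sanity-checked interactively. Below is a minimal, hypothetical sketch (not part of this PR), assuming a local PySpark installation:

```python
# Hypothetical illustration of the runtime vs. static split described above;
# not part of this PR. Assumes a local PySpark installation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Runtime SQL configs are per-session and mutable.
spark.conf.set("spark.sql.shuffle.partitions", "10")
assert spark.conf.get("spark.sql.shuffle.partitions") == "10"
spark.sql("RESET")  # restores runtime configs to their initial values

# Static SQL configs can be queried but not modified at runtime.
spark.sql("SET spark.sql.extensions").show(truncate=False)
try:
    spark.conf.set("spark.sql.warehouse.dir", "/tmp/elsewhere")  # static config
except Exception as e:
    # Spark rejects this with "Cannot modify the value of a static config".
    print(e)
```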
**`PythonSQLUtilsSuite.scala`** (new file, +64 lines)

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.api.python

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}

class PythonSQLUtilsSuite extends SparkFunSuite {

  test("listing sql configurations contains runtime ones only") {
    val configs = PythonSQLUtils.listSQLConfigs()

    // static sql configurations
    assert(!configs.exists(entry => entry._1 == StaticSQLConf.SPARK_SESSION_EXTENSIONS.key),
      "listSQLConfigs should not contain public static sql configuration")
    assert(!configs.exists(entry => entry._1 == StaticSQLConf.DEBUG_MODE.key),
      "listSQLConfigs should not contain internal static sql configuration")

    // dynamic sql configurations
    assert(configs.exists(entry => entry._1 == SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key),
      "listSQLConfigs should contain public dynamic sql configuration")
    assert(!configs.exists(entry => entry._1 == SQLConf.ANALYZER_MAX_ITERATIONS.key),
      "listSQLConfigs should not contain internal dynamic sql configuration")

    // spark core configurations
    assert(!configs.exists(entry => entry._1 == "spark.master"),
      "listSQLConfigs should not contain core configuration")
  }

  test("listing static sql configurations contains public static ones only") {
    val configs = PythonSQLUtils.listStaticSQLConfigs()

    // static sql configurations
    assert(configs.exists(entry => entry._1 == StaticSQLConf.SPARK_SESSION_EXTENSIONS.key),
      "listStaticSQLConfigs should contain public static sql configuration")
    assert(!configs.exists(entry => entry._1 == StaticSQLConf.DEBUG_MODE.key),
      "listStaticSQLConfigs should not contain internal static sql configuration")

    // dynamic sql configurations
    assert(!configs.exists(entry => entry._1 == SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key),
      "listStaticSQLConfigs should not contain dynamic sql configuration")
    assert(!configs.exists(entry => entry._1 == SQLConf.ANALYZER_MAX_ITERATIONS.key),
      "listStaticSQLConfigs should not contain internal dynamic sql configuration")

    // spark core configurations
    assert(!configs.exists(entry => entry._1 == "spark.master"),
      "listStaticSQLConfigs should not contain core configuration")
  }
}
```
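For context on how these JVM-side helpers are consumed, each returned entry is a Scala tuple that Python reads through Py4J accessors. A minimal sketch (not part of this PR), assuming a local Spark build where `pyspark.java_gateway` is importable:

```python
# Hypothetical Python-side peek at the JVM helpers exercised by the suite
# above; assumes a local Spark build so that launch_gateway() works.
from pyspark.java_gateway import launch_gateway

jvm = launch_gateway().jvm
utils = jvm.org.apache.spark.sql.api.python.PythonSQLUtils

# Each entry is a Scala Tuple4: (name, default value, description, version).
for entry in utils.listStaticSQLConfigs():
    print(entry._1(), "->", entry._2())
```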
**`sql/gen-sql-config-docs.py`**

```diff
@@ -17,6 +17,7 @@
 
 import os
 import re
+import sys
 from collections import namedtuple
 from textwrap import dedent
 
@@ -30,15 +31,19 @@
     "SQLConfEntry", ["name", "default", "description", "version"])
 
 
-def get_public_sql_configs(jvm):
+def get_public_sql_configs(jvm, group):
+    if group == "static":
+        config_set = jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listStaticSQLConfigs()
+    else:
+        config_set = jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listSQLConfigs()
     sql_configs = [
         SQLConfEntry(
             name=_sql_config._1(),
             default=_sql_config._2(),
             description=_sql_config._3(),
             version=_sql_config._4()
         )
-        for _sql_config in jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listSQLConfigs()
+        for _sql_config in config_set
     ]
     return sql_configs
 
@@ -114,11 +119,17 @@ def generate_sql_configs_table_html(sql_configs, path):
 
 
 if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: ./bin/spark-submit sql/gen-sql-config-docs.py <static|runtime>")
+        sys.exit(-1)
+    else:
+        group = sys.argv[1]
+
     jvm = launch_gateway().jvm
-    sql_configs = get_public_sql_configs(jvm)
+    sql_configs = get_public_sql_configs(jvm, group)
 
     spark_root_dir = os.path.dirname(os.path.dirname(__file__))
-    sql_configs_table_path = os.path.join(
-        spark_root_dir, "docs/generated-sql-configuration-table.html")
+    sql_configs_table_path = os.path\
+        .join(spark_root_dir, "docs", "generated-" + group + "-sql-config-table.html")
 
     generate_sql_configs_table_html(sql_configs, path=sql_configs_table_path)
```

Review comments on the new command-line argument:

> **Member:** You could just call the function twice instead of adding an argument, considering that SQL configuration generation is pretty cheap. Let's fix it next time we happen to touch this code.
>
> **Author:** Got it.
Review comments on the Liquid template changes in `docs/configuration.md`:

> I intentionally removed the leading spaces here because they are treated as actual whitespace. Newer Liquid syntax supports trimming this whitespace (the `{%- ... -%}` form), but I didn't use it in case an old Jekyll version is in use.

> Seems like it could make the HTML format malformed in some cases, given my rough testing. Let's remove these leading whitespaces next time.

> Might the problem be that the `### Spark SQL` content was inside the for-loop before?

> No, we already use the markdown and Liquid syntax together at https://github.com/apache/spark/blob/master/docs/sql-ref-functions-builtin.md