Conversation

@dnskr (Contributor) commented Apr 26, 2023

Why are the changes needed?

The PR is needed to configure the default properties used by Apache Spark as the query engine.

The PR also changes the values.yaml file structure:

# APIs for connectivity and interoperation between supported clients and Kyuubi server
api:
  # Thrift Binary protocol (HiveServer2 compatible)
  thriftBinary:
    ...

# Kyuubi server configuration
server:
  replicas: 2
  ...

# Query engines
engine:
  # Apache Spark default configuration
  spark:
    ...
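
For illustration, a sketch of how values under the engine.spark block might be rendered into spark-defaults.conf by a chart template; the ConfigMap name and the image.repository/image.tag keys are assumptions for this sketch, not necessarily part of the PR:

# Hypothetical ConfigMap template rendering engine.spark values into spark-defaults.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-spark-defaults
data:
  spark-defaults.conf: |
    # image.repository and image.tag are assumed value keys for this sketch
    spark.kubernetes.container.image={{ .Values.engine.spark.image.repository }}:{{ .Values.engine.spark.image.tag }}
    spark.kubernetes.container.image.pullPolicy={{ .Values.engine.spark.image.pullPolicy }}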

How was this patch tested?

  • Add test cases that check the changes thoroughly, including negative and positive cases where possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

@codecov-commenter

Codecov Report

Merging #4776 (be227cd) into master (b7012aa) will decrease coverage by 0.03%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #4776      +/-   ##
============================================
- Coverage     57.99%   57.96%   -0.03%     
  Complexity       13       13              
============================================
  Files           581      581              
  Lines         32431    32431              
  Branches       4309     4309              
============================================
- Hits          18807    18799       -8     
- Misses        11820    11827       +7     
- Partials       1804     1805       +1     

see 6 files with indirect coverage changes


@pan3793 (Member) commented Apr 27, 2023

Seems we had such an idea about the structure of values.yaml, but decided to reject it.

In practice, if Spark uses HDFS as storage and HMS as the metastore, the user typically should provide hive-site.xml, core-site.xml, hdfs-site.xml, etc. under HADOOP_CONF_DIR, which would be shared by both the Kyuubi server and the Spark engine (other engines may require it too).
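
For example, a minimal sketch of how such files could be shared with the server pod via a ConfigMap; the ConfigMap name, mount path, and value keys here are illustrative assumptions, not part of this chart:

# Hypothetical values.yaml entry
hadoopConf:
  configMap: hadoop-conf   # ConfigMap holding hive-site.xml, core-site.xml, hdfs-site.xml
  dir: /opt/hadoop/conf    # mount point, exported as HADOOP_CONF_DIR

# Hypothetical pod spec fragment rendered by the chart
env:
  - name: HADOOP_CONF_DIR
    value: /opt/hadoop/conf
volumeMounts:
  - name: hadoop-conf
    mountPath: /opt/hadoop/conf
volumes:
  - name: hadoop-conf
    configMap:
      name: hadoop-conf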

The review comments below are attached to these diff lines:

spark.kubernetes.container.image.pullPolicy={{ .Values.engine.spark.image.pullPolicy }}
spark.kubernetes.container.image.pullSecrets={{ range .Values.imagePullSecrets }}{{ print .name "," }}{{ end }}

### Driver resources
@pan3793 (Member) commented Apr 27, 2023

IMO this is kind of over-engineered.

One advantage of Kyuubi is that it almost transparently supports all Spark features, so users who are familiar with Spark should find it easy to understand how Kyuubi works and how to configure the Spark engine.

Member:

Seems we can put everything from spark-defaults.conf into a sparkDefaults block in values.yaml, using Spark's native configuration format.
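
A minimal sketch of what that could look like, assuming a single sparkDefaults text block in values.yaml that the chart writes verbatim into spark-defaults.conf (the keys and property values are illustrative):

# Hypothetical values.yaml fragment
sparkDefaults: |
  spark.kubernetes.container.image=apache/spark:3.4.1
  spark.kubernetes.container.image.pullPolicy=IfNotPresent
  spark.executor.instances=2
  spark.executor.memory=2g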

@dnskr (Contributor, Author):

You are right, it is an over-engineered implementation, and it might confuse users about what to configure and where.
I would like to find a balance between convenient basic configuration and flexibility, but obviously this is not the best implementation.

@dnskr (Contributor, Author) commented May 10, 2023

Thanks for the comments!
These are experimental changes and they are not fully working, so I created the PR as a draft. Apologies for the confusion and for the delayed response.

Seems we had such an idea about the structure of values.yaml, but decided to reject it.

Right, we discussed it here. I'll continue with the flat structure in a separate PR. As I mentioned above, this is more of an experimental PR to track my attempts and demo a different approach.

In practice, if Spark uses HDFS as storage and HMS as the metastore, the user typically should provide hive-site.xml, core-site.xml, hdfs-site.xml, etc. under HADOOP_CONF_DIR, which would be shared by both the Kyuubi server and the Spark engine (other engines may require it too).

Got it! I'll add these files as well. Am I right that there is no default HADOOP_CONF_DIR path in the Kyuubi server or the Kyuubi Docker image? If not, could you please suggest how to set it in the chart (add an env variable, a property, etc.)?

@dnskr (Contributor, Author) commented Jan 9, 2024

Closed in favor of #5934

@dnskr closed this Jan 9, 2024
