
Conversation

@Xuanwo
Member

@Xuanwo Xuanwo commented May 9, 2022

Signed-off-by: Xuanwo [email protected]

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This PR allows databend-query to use HDFS as a storage backend.

Part of #5215

Changelog

  • New Feature

@vercel

vercel bot commented May 9, 2022

The latest updates on your projects.

1 Ignored Deployment
Name       Status       Updated
databend   ⬜️ Ignored   May 9, 2022 at 4:25AM (UTC)

@mergify
Contributor

mergify bot commented May 9, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label May 9, 2022
@Xuanwo
Member Author

Xuanwo commented May 9, 2022

@dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?

@BohuTANG BohuTANG requested a review from dantengsky May 9, 2022 04:09
@BohuTANG
Member

BohuTANG commented May 9, 2022

This PR only makes HDFS a normal storage backend like AWS S3; it's not related to @dantengsky's work on Hive, right?

@Xuanwo
Member Author

Xuanwo commented May 9, 2022

> This PR only makes HDFS a normal storage backend like AWS S3; it's not related to @dantengsky's work on Hive, right?

Yes. Hive's integration should happen in another PR.

Signed-off-by: Xuanwo <[email protected]>

// hdfs storage backend config
#[clap(flatten)]
pub hdfs: HdfsConfig,
Member

If we don't add the HdfsConfig item to *.toml, won't the config deserialization crash?

Member

@BohuTANG BohuTANG May 9, 2022

That's a separate problem: we don't want unused config items listed explicitly:

[storage]
# fs|s3
type = "s3"

[storage.fs]  # <- unused when type = "s3"

[storage.s3]
bucket = "databend"
endpoint_url = "https://s3.amazonaws.com"
access_key_id = "<your-key-id>"
secret_access_key = "<your-access-key>"

[storage.azblob]  # <- unused when type = "s3"

Is it possible to configure only the item we actually use, like [storage.s3] here?
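A minimal config along the lines BohuTANG is asking for might look like the sketch below. This is illustrative only; the values are copied from the example above and only the active backend keeps a section:

```toml
# Only the backend selected by `type` needs a section; the unused
# [storage.fs] / [storage.azblob] / [storage.hdfs] sections are
# simply omitted.
[storage]
type = "s3"

[storage.s3]
bucket = "databend"
endpoint_url = "https://s3.amazonaws.com"
access_key_id = "<your-key-id>"
secret_access_key = "<your-access-key>"
```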

Member Author

> If we don't add the HdfsConfig item to *.toml, won't the config deserialization crash?

If HdfsConfig is not added in *.toml, we will use the default value instead.

I tested this behavior locally: databend-query is able to start without adding any hdfs-related stuff.
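The fallback behavior described above can be sketched roughly as follows. This is a standalone illustration, not Databend's actual config code; the `name_node` field and the `resolve` helper are made up for the example:

```rust
// Hypothetical sketch of "missing config section falls back to defaults".
// Databend's real HdfsConfig has different fields; `name_node` is invented.
#[derive(Debug, Default, PartialEq)]
struct HdfsConfig {
    name_node: String,
}

// If the [storage.hdfs] section is absent from the TOML, deserialization
// yields no value, and we substitute Default instead of crashing.
fn resolve(section: Option<HdfsConfig>) -> HdfsConfig {
    section.unwrap_or_default()
}

fn main() {
    // No [storage.hdfs] section present: startup proceeds with defaults.
    let cfg = resolve(None);
    assert_eq!(cfg, HdfsConfig::default());
    println!("fallback config: {:?}", cfg);
}
```

The key design point is that every backend section derives `Default`, so only the section selected by `type` needs to appear in the file.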

Member

That's great, that resolves my concern, thank you.
I will remove all the unused config items from the documentation.

@dantengsky
Member

> @dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?

Yeah, currently the hive PR's ut/it are not integrated with the GitHub workflows yet; it's just a local hadoop + hive cluster for testing. There are no build-time dependencies on the JDK/jar files, so a docker image seems able to cover it.

But for this PR, a docker image may not be enough. Hope I got it right:

  • to enable the storage-hdfs feature at compile time,
    we need a JDK (for libjvm.so and some JVM header files)
  • to enable the storage-hdfs feature at runtime (or for ut/it),
    an HDFS cluster (of the specified version) is needed

@Xuanwo
Member Author

Xuanwo commented May 9, 2022

> > @dantengsky Maybe I need to borrow some work from your PR to get Java set up correctly?
>
> Yeah, currently the hive PR's ut/it are not integrated with the GitHub workflows yet; it's just a local hadoop + hive cluster for testing. There are no build-time dependencies on the JDK/jar files, so a docker image seems able to cover it.
>
> But for this PR, a docker image may not be enough. Hope I got it right:
>
> • to enable the storage-hdfs feature at compile time,
>   we need a JDK (for libjvm.so and some JVM header files)
> • to enable the storage-hdfs feature at runtime (or for ut/it),
>   an HDFS cluster (of the specified version) is needed

Thanks for the advice! Maybe we can make databend-query compilable in this PR and test it in the next PR.

Member

@BohuTANG BohuTANG left a comment

👍

@BohuTANG BohuTANG merged commit 4fd1b64 into databendlabs:main May 9, 2022
@Xuanwo Xuanwo deleted the hdfs branch May 9, 2022 05:57
Labels

pr-feature this PR introduces a new feature to the codebase

4 participants