feat: Allow COPY FROM/INTO different storage services #6573
Signed-off-by: Xuanwo <[email protected]>
|
Stateful tests need an update, I will fix them later. |
|
We need a stateful test for the new COPY style :) |
|
The |
|
@BohuTANG raised some security concerns, here are my test results:

What happens if I start

Copy via

    MySQL [(none)]> copy into x from 's3://testbucket/' connection = (endpoint_url='http://127.0.0.1:9900' access_key_id='minioadmin' secret_access_key='minioadmin') PATTERN = 'ontime_200.csv$' FILE_FORMAT = (type = CSV field_delimiter = ',' record_delimiter = '');
    ERROR 1105 (HY000): Code: 3903, displayText = copy from insecure storage is not allowed.

Copy via

    MySQL [(none)]> copy into x from 's3://testbucket/' connection = (endpoint_url='https://127.0.0.1:9900' access_key_id='xxxx' secret_access_key='yyyy') PATTERN = 'ontime_200.csv$' FILE_FORMAT = (type = CSV field_delimiter = ',' record_delimiter = '');
    ERROR 1105 (HY000): Code: 4000, displayText = other error (backend error: (context: {"bucket": "testbucket"}, source: sending request: https://127.0.0.1:9900/testbucket: hyper::Error(Connect, Ssl(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 336130315, library: "SSL routines", function: "ssl3_get_record", reason: "wrong version number", file: "ssl/record/ssl3_record.c", line: 331 }]))) }, X509VerifyResult { code: 0, error: "ok" })))).

Improvements that could be done in the future:
|
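For readers following the test results in this thread: the rejection of an `http://` endpoint with "copy from insecure storage is not allowed" is what a simple scheme check would produce. The following is a minimal Python sketch of such a check, not Databend's actual (Rust) implementation. The function name `check_endpoint` and the exception type `InsecureStorageError` are hypothetical; only the `allow_insecure` setting name and the error text come from this PR.

```python
from urllib.parse import urlparse


class InsecureStorageError(Exception):
    """Hypothetical error type for endpoints rejected by the policy."""


def check_endpoint(endpoint_url: str, allow_insecure: bool = False) -> str:
    """Return endpoint_url if the policy accepts it, otherwise raise.

    Assumed policy: https:// is always accepted; http:// is accepted
    only when allow_insecure is True; any other scheme is rejected.
    """
    scheme = urlparse(endpoint_url).scheme.lower()
    if scheme == "https":
        return endpoint_url
    if scheme == "http" and allow_insecure:
        return endpoint_url
    if scheme == "http":
        # Same wording as the error shown in the test results.
        raise InsecureStorageError("copy from insecure storage is not allowed")
    raise ValueError(f"unsupported endpoint scheme: {scheme!r}")
```

Note that a check like this only inspects the URL scheme. As the second test result shows, an `https://` endpoint pointing at a server that does not speak TLS passes the check and fails later at connection time.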
|
This is the enhanced version of |
|
I have a few questions about this new feature:
|
I replied in #6620 |
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
This PR will close #6103 #6104
This PR adds endpoint URL support for COPY. Users can specify the endpoint URL as follows:

    COPY INTO mytable
    FROM 's3://mybucket/data.csv'
    CONNECTION = ( ENDPOINT_URL = 'http://127.0.0.1:9900' )
    FILE_FORMAT = ( type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 )
    size_limit=10;

This PR also introduces support for all storage backends in databend. We can now copy data from azblob, hdfs, fs, and so on.

From azblob:

    COPY INTO mytable
    FROM 'azblob://mybucket/data.csv'
    CONNECTION = ( ENDPOINT_URL = 'http://127.0.0.1:9900' )
    FILE_FORMAT = ( type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 )
    size_limit=10;

To prevent users from accessing the internal network, we added a config option for databend:
Users can only use an endpoint that starts with https:// unless allow_insecure has been enabled during deployment.

Also, in this PR, we unify all connection-related options into CONNECTION:

    COPY INTO mytable
    FROM 'azblob://mybucket/data.csv'
    CONNECTION = (
        ENDPOINT_URL = 'http://127.0.0.1:9900'
        ACCESS_KEY_ID = 'access_key_id'
        SECRET_ACCESS_KEY = 'secret_access_key'
    )

CREDENTIALS and ENCRYPTION are still supported for backward compatibility.

Remaining Work

CREATE STAGE is not updated yet; we will adapt it in follow-up PRs (addressed in Refactor CREATE STAGE with the new common-storage parse_uri_location #6580)
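
The location strings used in this summary, such as 's3://mybucket/data.csv' and 'azblob://mybucket/data.csv', combine a storage scheme, a bucket or container name, and an object path. The Python sketch below shows one way such a string could be split into those parts. It is an illustration only and does not reproduce the parse_uri_location function in common-storage; the `Location` type and `split_location` function are hypothetical names, and treating the URI authority as the bucket or container name is an assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Location:
    scheme: str  # storage service, e.g. "s3" or "azblob"
    name: str    # bucket or container name (assumed to be the URI authority)
    path: str    # object path or prefix, always starting with "/"


def split_location(location: str) -> Location:
    """Split a location such as 's3://mybucket/data.csv' into its parts."""
    parsed = urlparse(location)
    if not parsed.scheme:
        raise ValueError(f"location has no scheme: {location!r}")
    # An empty path (e.g. 's3://mybucket') is normalized to the root prefix.
    return Location(parsed.scheme.lower(), parsed.netloc, parsed.path or "/")
```

For example, `split_location('s3://testbucket/')` yields the scheme `s3`, the name `testbucket`, and the path `/`. A backend-specific builder could then dispatch on the scheme and apply the options given in CONNECTION.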