HADOOP-19654. Upgrade AWS SDK to 2.32.23 #7882
Conversation
I found …
@pan3793 maybe. What is unrelated: out of the box, the SDK doesn't do bulk delete with third-party stores which support it (Dell ECS).
@pan3793 no, it's lifecycle related. The test needs to set up that minicluster before the test cases, and that's somehow not happening.
regressions everywhere. No logging. Instead we get …

more on this once I've looked at it. If it is an SDK issue, it's a major regression, though it may be something needing changes in the AAL library.

s3 express assumption: now that the store has lifecycle rules, you don't get prefix listings when there's an in-progress upload. Fix: change the test, but also add a path capability warning of inconsistency. This is good.

Operation costs/auditing count an extra HTTP request, so cost tests fail. I suspect it is always calling CreateSession, but without logging I can't be sure.
💔 -1 overall
Thanks @steveloughran, PR looks good overall. Are the failures in …
  // disable create session so there's no need to
  // add a role policy for it.
- disableCreateSession(conf);
+ //disableCreateSession(conf);
nit: can just cut this instead of commenting it out, since we're skipping these tests if S3 Express is enabled
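Where a test is skipped on S3 Express, a guard along these lines keeps the dead code out entirely. This is only a sketch; `isS3ExpressStorage()` stands in for whatever probe the test suite actually uses:

```java
import static org.junit.jupiter.api.Assumptions.assumeFalse;

// Sketch of the suggested cleanup: skip outright on S3 Express stores
// instead of carrying a commented-out disableCreateSession() call.
// isS3ExpressStorage() is a hypothetical probe, not the suite's real helper.
@Test
public void testWithCreateSessionPolicy() throws Throwable {
  assumeFalse(isS3ExpressStorage(getFileSystem()),
      "requires a store where CreateSession can be disabled");
  // ... test body ...
}
```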
  // close the stream, should throw RemoteFileChangedException
  RemoteFileChangedException exception = intercept(RemoteFileChangedException.class, stream::close);
- assertS3ExceptionStatusCode(SC_412_PRECONDITION_FAILED, exception);
+ verifyS3ExceptionStatusCode(SC_412_PRECONDITION_FAILED, exception);
do you know what the difference is with the other tests here?
As in, why with S3 Express is it OK to assert that we'll get a 412, whereas the other tests will get a 200?
Hey, it's your server code. Go see.
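For reference, the new-style assertion captures the exception from `close()` via Hadoop's `LambdaTestUtils.intercept`. A minimal sketch of what a status-code verifier could look like (the PR's actual `verifyS3ExceptionStatusCode` may differ) follows:

```java
import software.amazon.awssdk.awscore.exception.AwsServiceException;

// Hypothetical shape of the helper: unwrap the SDK service exception
// and compare its HTTP status against the expected one (412 here).
static void verifyS3ExceptionStatusCode(int expected, Exception e) {
  Throwable cause = e.getCause();
  if (!(cause instanceof AwsServiceException)) {
    throw new AssertionError("no AwsServiceException under " + e, e);
  }
  int actual = ((AwsServiceException) cause).statusCode();
  if (actual != expected) {
    throw new AssertionError(
        "expected HTTP " + expected + " but got " + actual, e);
  }
}
```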
💔 -1 overall
I've attached a log of a test run against an s3 express bucket where the test fails. The relevant stuff is at line 564, where a HEAD request fails because the stream is broken. The second request always works. Either the request is being rejected (why?) or the connection has gone stale. But why should it happen at exactly the same place on every single test run?

org.apache.hadoop.fs.s3a.statistics.ITestAWSStatisticCollection-output.txt
💔 -1 overall
@steveloughran discovered completely by accident, but it's something to do with the checksumming code. If you comment out these lines, the test will pass. Could be something to do with S3 Express not supporting MD5; will look into it.
Specifically, it's this line: …

Comment that out, or change it to …

My guess is it's something to do with S3 Express not supporting MD5, but for operations where …

Have asked the SDK team.
ok, so maybe for s3express stores we don't do the legacy MD5 plugin stuff and all is good?

While on the topic of S3 Express: is it now the case that, because there are lifecycle rules for cleanup, LIST calls don't return prefixes of paths with incomplete uploads? If so I will need to change production code and the test, with a separate JIRA for that for completeness.
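For context, under AWS SDK 2.30+ the "when required" downgrade discussed here is expressed on the client builder. A sketch, assuming the SDK's checksum enums; the actual S3A wiring, and the legacy MD5 plugin registration that is the suspect line above, are elided:

```java
import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
import software.amazon.awssdk.core.checksums.ResponseChecksumValidation;
import software.amazon.awssdk.services.s3.S3Client;

// Restore pre-2.30 behaviour: only calculate/validate checksums when an
// operation requires them, rather than CRC32-by-default. Under the
// proposal above, the legacy MD5 plugin registration (not shown) would
// be skipped for S3 Express stores.
S3Client s3 = S3Client.builder()
    .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
    .responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED)
    .build();
```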
@steveloughran confirming with the SDK team; since the MD5 plugin is supposed to restore previous behaviour, the server rejecting the first request seems wrong. Let's see what they have to say.

Will check with the S3 Express team on this.
thanks. I don't see it in tests against S3 with the 2.29.52 release, so something is changing in the requests made with the new SDK + MD5 stuff.
@steveloughran not able to narrow this error down just yet; it looks like it's a combination of S3A's configuration of the S3 client + these new MD5 changes. I see the failure when I use the S3A client, and don't see it when I use a newly created client. So it's not just because of …

Looking into it some more. The S3 Express team said there have been no changes in LIST behaviour.
able to reproduce the issue outside of S3A. Basically did what would happen when you run a test in S3A:

The head fails, but if you comment out … no idea what's going on, but I have shared this local reproduction with the SDK team. And it rules out that it's something in the S3A code.
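The rough shape of such a standalone reproduction, with placeholder bucket/key names and the failure-triggering client customisation deliberately elided (it is the MD5/checksum line referenced above):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// PUT an object, then HEAD it, mirroring what an S3A test does.
// The S3A-style client customisation that triggers the failure is
// omitted here; with a plain default client the sequence succeeds.
S3Client s3 = S3Client.builder().build();

s3.putObject(PutObjectRequest.builder()
        .bucket("example-bucket")   // placeholder names
        .key("test/object")
        .build(),
    RequestBody.fromString("hello"));

String etag = s3.headObject(HeadObjectRequest.builder()
    .bucket("example-bucket")
    .key("test/object")
    .build()).eTag();
```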
good explanation. Though I would have expected a bit broader test coverage of your own stores; something to look for on the next library update.

Can I also get improvements in error translation too: we need the error string including request IDs. Relying on the stack entry below to print it isn't enough, as deep exception nesting (hive, spark) can lose that.
one more thing here: make sure you can handle … You won't be able to detect overwrites, but we can just document having a short TTL here.
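One way to reason about that case, as a purely illustrative sketch (none of these names are from the analytics-accelerator API): without an etag, overwrites are undetectable, so a short TTL becomes the only freshness check.

```java
// Illustrative only: cache entries whose store returned no etag can be
// trusted solely until the TTL expires; entries with an etag can also
// be revalidated against the current one.
final class CachedObjectMetadata {
  private final String etag;        // may be null on some stores
  private final long expiryMillis;

  CachedObjectMetadata(String etag, long ttlMillis) {
    this.etag = etag;
    this.expiryMillis = System.currentTimeMillis() + ttlMillis;
  }

  boolean isUsable(String currentEtag) {
    if (System.currentTimeMillis() > expiryMillis) {
      return false;                 // TTL always wins
    }
    // no etag: nothing to compare, rely on the (short) TTL alone
    return etag == null || etag.equals(currentEtag);
  }
}
```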
@steveloughran updated exception handling: awslabs/analytics-accelerator-s3#361; the next release will include the requestIDs in the message, eg: …

The null as …
This is just the library declaration change.
* create legacy MD5 headers
* downgrade request checksum calculation to "when required"
* set the (existing) checksum validation flag the same way so the builder library is happy
* add "none" as a valid checksum algorithm
* document that and the existing "unknown_to_sdk_version" option, with troubleshooting
* ITestS3AChecksum modified to handle these checksum changes.
… of multiparts is 200 + error
* Recognise and wrap with RemoteFileChangedException
* Fix up assertions in test.
Add relevant statements for s3 access and skip all tests which expect partial access to paths on a bucket.
...so adds cost to the assertion
…3 express. Revert test to original assert (no special treatment of s3 express), and log "Initiating GET request" for ease of log scanning. Add to the log4j.properties file the settings needed to turn on wire logging, commented out by default.
…MD5 header
fs.s3a.checksum.calculation.enabled (default: true)
fs.s3a.md5.header.enabled (default: false)
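A minimal sketch of toggling these from code, using the option names and defaults from the commit message above (semantics inferred from the names):

```java
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// default: true — set false to skip request checksum calculation
conf.setBoolean("fs.s3a.checksum.calculation.enabled", true);
// default: false — set true to send the legacy MD5 header
conf.setBoolean("fs.s3a.md5.header.enabled", false);
```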
The problem here is not the signing: it's that the test couldn't cope with path-style access, as it used the hostname of the HTTP request as the bucket name. Fixed by adding a static field to note when path access is in use and to map the name to "bucket" in such a situation.
Third-party stores may remove an ongoing MPU after it is committed, so a retry will fail. On AWS S3 it is simply ignored. New option fs.s3a.mpu.commit.consumes.upload.id can be set to true to change assertions in tests which would otherwise fail.
remove mention of CRC32 as checksum algorithm; don't assert that it is null if the requested version is unknown.
New command "etag". Retrieves and prints etags. Updated release version HADOOP-19654. Update third party documentation, especially on null etags Input stream MUST be set to classic unless/until support there changes.
This includes one production-side change: FS automatically declares that magic committers are disabled if MPU is disabled.
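Clients can discover this through the existing path capability probe rather than hard-coding store behaviour; a sketch ("example-bucket" is a placeholder):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// On a store with MPU disabled this should now report false.
boolean magicCommitter = fs.hasPathCapability(
    new Path("s3a://example-bucket/"),
    "fs.s3a.capability.magic.committer");
```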
Final(?) wrap-up of tests which fail on a store without multipart upload or bulk delete.
latest iteration works with third-party stores without MPU (so no magic committer or use of memory for upload buffering) or bulk delete. Tested Google GCS; the only failures were underfull buffers, which can be ignored.
@ahmarsuhail I think Apache Ozone is the one. I just added an …

Dell ECS and Google both supply etags. We don't retrieve them for directory markers anyway, which isn't an issue.
💔 -1 overall
Address style, trailing space and javadoc. Not addressed: the new spotbugs errors. These are not from this patch.
💔 -1 overall
fun test run today, against s3 london. Most of the multipart upload/commit tests were failing "missing part", from cli or IDE. Testing with S3 express was happy.

This has to be some transient issue with my s3 london bucket, as if in-progress upload parts were not being retained. Never seen this before; the expiry time is set to 24h.

When these uploads fail we do leave incomplete uploads in progress: …

Most interesting here is … The attempt to complete failed, yet the uploads list afterwards finds it. I have to conclude that the list of pending uploads was briefly offline/inconsistent.

This is presumably so, so rare that there's almost no point retrying here. But with no retries, every active write/job would have failed, even though the system had recovered within a minute. Maybe we should retry here? I remember a long long time ago the v1 sdk didn't retry on failures of the final POST to commit an upload, and how that sporadically caused problems. Retrying on MPU failures will allow for recovery in the presence of a transient failure here, at the cost of "deletion of all pending uploads will take longer to fail all active uploads".
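A sketch of what a bounded retry around the final complete call could look like. Attempt count and backoff are illustrative; a real implementation would go through S3A's retry policy and retry only retriable errors:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadResponse;
import software.amazon.awssdk.services.s3.model.S3Exception;

// Retry the final POST of a multipart upload a few times, so a
// transient "missing part" style failure does not fail the whole job.
static CompleteMultipartUploadResponse completeWithRetry(
    S3Client s3, CompleteMultipartUploadRequest request)
    throws InterruptedException {
  S3Exception last = null;
  for (int attempt = 1; attempt <= 3; attempt++) {
    try {
      return s3.completeMultipartUpload(request);
    } catch (S3Exception e) {
      last = e;
      Thread.sleep(1000L * attempt);  // simple linear backoff
    }
  }
  throw last;
}
```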
💔 -1 overall
How was this patch tested?
Testing in progress; still trying to get the ITests working.
JUnit5 update complicates things here, as it highlights that minicluster tests aren't working.
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?