
Commit 018e034

HADOOP-19057. Landsat bucket deleted
Moves to a new test file/bucket.

Adopts the test path s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz
This is actually quite an interesting path, as it has a space in it and breaks s3guard
tool URI parsing. Fix: those tests just take the root schema/host and not the rest.

Renames all methods to refer to "ExternalFile" rather than "CSV file", as we no longer
expect it to be CSV. Leaves the test key name alone: fs.s3a.scale.test.csvfile

This is a .gz file (needed for codec testing) on a store with anonymous access supported.

All references to "landsat" in the code and docs have been stripped:

* "external file" used instead of "csv file"
* "external bucket" used instead of "landsat bucket"
* All examples updated.
* Unit tests which used it as an arbitrary S3 bucket now use the constant
  UNIT_TEST_EXAMPLE_PATH = "s3a://example/data/"
* References, including variable names, where it was a "csv file" now say "external file".

ITestS3APrefetchingCacheFiles fixes:

* don't remove bucket overrides
* use a smaller block size
* use an isolated buffer dir
* make teardown resilient to startup failures

This stuff isn't going to be backportable to older releases with
ITestS3APrefetchingCacheFiles; we will just have to expect failures there, as the new
test file is too small for the seek logic.

Change-Id: Ifcdfa20d753b0ab2b35577291bed1db8aea41f54
1 parent d278b34 commit 018e034

30 files changed: +241 -286 lines changed
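As background for the changes below, here is a minimal, illustrative sketch of how a test might resolve the external test file from configuration, falling back to the new default path adopted by this commit. Only the configuration key `fs.s3a.scale.test.csvfile` and the default path come from the commit message; the class and method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper showing how a test could resolve the external test file. */
public final class ExternalTestFileExample {

  /** Test key name is deliberately left unchanged by this commit. */
  public static final String KEY_EXTERNAL_FILE = "fs.s3a.scale.test.csvfile";

  /** New default: a public, anonymously readable .gz object. */
  public static final String DEFAULT_EXTERNAL_FILE =
      "s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz";

  private ExternalTestFileExample() {
  }

  /** Return the configured external file, or the default when unset. */
  public static Path resolveExternalTestFile(Configuration conf) {
    return new Path(conf.getTrimmed(KEY_EXTERNAL_FILE, DEFAULT_EXTERNAL_FILE));
  }
}
```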

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ public CompressionCodecFactory(Configuration conf) {
    * Find the relevant compression codec for the given file based on its
    * filename suffix.
    * @param file the filename to check
-   * @return the codec object
+   * @return the codec object or null if no matching codec is found.
    */
   public CompressionCodec getCodec(Path file) {
     CompressionCodec result = null;
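The Javadoc fix above documents behaviour callers must handle: `getCodec()` returns `null` when no codec matches the filename suffix. A short usage sketch (the path is just the new test object; any filename works):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecLookupExample {
  public static void main(String[] args) {
    CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
    // A ".gz" suffix resolves to the gzip codec; an unknown suffix returns null.
    CompressionCodec codec = factory.getCodec(
        new Path("s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz"));
    if (codec == null) {
      System.out.println("No codec registered for this suffix; read the stream as-is");
    } else {
      System.out.println("Codec: " + codec.getClass().getName());
    }
  }
}
```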

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/assumed_roles.md

Lines changed: 1 addition & 1 deletion
@@ -585,7 +585,7 @@ If an operation fails with an `AccessDeniedException`, then the role does not ha
 the permission for the S3 Operation invoked during the call.
 
 ```
-> hadoop fs -touch s3a://landsat-pds/a
+> hadoop fs -touch s3a://noaa-isd-pds/a
 
 java.nio.file.AccessDeniedException: a: Writing Object on a:
 software.amazon.awssdk.services.s3.model.S3Exception: Access Denied

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md

Lines changed: 4 additions & 4 deletions
@@ -111,9 +111,9 @@ Specific buckets can have auditing disabled, even when it is enabled globally.
 
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.audit.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.enabled</name>
   <value>false</value>
-  <description>Do not audit landsat bucket operations</description>
+  <description>Do not audit bucket operations</description>
 </property>
 ```
 
@@ -342,9 +342,9 @@ either globally or for specific buckets:
 </property>
 
 <property>
-  <name>fs.s3a.bucket.landsat-pds.audit.referrer.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.referrer.enabled</name>
   <value>false</value>
-  <description>Do not add the referrer header to landsat operations</description>
+  <description>Do not add the referrer header to operations</description>
 </property>
 ```
 

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md

Lines changed: 3 additions & 30 deletions
@@ -747,7 +747,7 @@ For example, for any job executed through Hadoop MapReduce, the Job ID can be us
 ### `Filesystem does not have support for 'magic' committer`
 
 ```
-org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://landsat-pds': Filesystem does not have support for 'magic' committer enabled
+org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://noaa-isd-pds': Filesystem does not have support for 'magic' committer enabled
 in configuration option fs.s3a.committer.magic.enabled
 ```
 
@@ -760,42 +760,15 @@ Remove all global/per-bucket declarations of `fs.s3a.bucket.magic.enabled` or se
 
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.committer.magic.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.committer.magic.enabled</name>
   <value>true</value>
 </property>
 ```
 
 Tip: you can verify that a bucket supports the magic committer through the
-`hadoop s3guard bucket-info` command:
+`hadoop s3guard bucket-info` command.
 
 
-```
-> hadoop s3guard bucket-info -magic s3a://landsat-pds/
-Location: us-west-2
-
-S3A Client
-Signing Algorithm: fs.s3a.signing-algorithm=(unset)
-Endpoint: fs.s3a.endpoint=s3.amazonaws.com
-Encryption: fs.s3a.encryption.algorithm=none
-Input seek policy: fs.s3a.experimental.input.fadvise=normal
-Change Detection Source: fs.s3a.change.detection.source=etag
-Change Detection Mode: fs.s3a.change.detection.mode=server
-
-S3A Committers
-The "magic" committer is supported in the filesystem
-S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
-S3A Committer name: fs.s3a.committer.name=magic
-Store magic committer integration: fs.s3a.committer.magic.enabled=true
-
-Security
-Delegation token support is disabled
-
-Directory Markers
-The directory marker policy is "keep"
-Available Policies: delete, keep, authoritative
-Authoritative paths: fs.s3a.authoritative.path=```
-```
-
 ### Error message: "File being created has a magic path, but the filesystem has magic file support disabled"
 
 A file is being written to a path which is used for "magic" files,

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md

Lines changed: 3 additions & 4 deletions
@@ -289,9 +289,8 @@ for buckets in the central and EU/Ireland endpoints.
 
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
+  <name>fs.s3a.bucket.us2w-dataset.endpoint.region</name>
   <value>us-west-2</value>
-  <description>The region for s3a://landsat-pds URLs</description>
 </property>
 
 <property>
@@ -354,9 +353,9 @@ The boolean option `fs.s3a.endpoint.fips` (default `false`) switches the S3A con
 For a single bucket:
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.fips</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.endpoint.fips</name>
   <value>true</value>
-  <description>Use the FIPS endpoint for the landsat dataset</description>
+  <description>Use the FIPS endpoint for the NOAA dataset</description>
 </property>
 ```
 

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ If it was deployed unbonded, the DT Binding is asked to create a new DT.
 
 It is up to the binding what it includes in the token identifier, and how it obtains them.
 This new token identifier is included in a token which has a "canonical service name" of
-the URI of the filesystem (e.g "s3a://landsat-pds").
+the URI of the filesystem (e.g "s3a://noaa-isd-pds").
 
 The issued/reissued token identifier can be marshalled and reused.
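As a hedged illustration of the "canonical service name" mentioned above, the standard `FileSystem` API exposes it directly; the exact value depends on whether delegation tokens are enabled for the bucket:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CanonicalServiceNameExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("s3a://noaa-isd-pds/"), new Configuration());
    // With delegation tokens enabled this is expected to be the filesystem URI,
    // e.g. "s3a://noaa-isd-pds"; without them it may be null.
    System.out.println("Canonical service name: " + fs.getCanonicalServiceName());
  }
}
```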

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md

Lines changed: 7 additions & 7 deletions
@@ -481,8 +481,8 @@ This will fetch the token and save it to the named file (here, `tokens.bin`),
 even if Kerberos is disabled.
 
 ```bash
-# Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
-$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
+# Fetch a token for the AWS noaa-isd-pds bucket and save it to tokens.bin
+$ hdfs fetchdt --webservice s3a://noaa-isd-pds/ tokens.bin
 ```
 
 If the command fails with `ERROR: Failed to fetch token` it means the
@@ -498,11 +498,11 @@ host on which it was created.
 ```bash
 $ bin/hdfs fetchdt --print tokens.bin
 
-Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
+Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://noaa-isd-pds;
 timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
 Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
 Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
-for s3a://landsat-pds
+for s3a://noaa-isd-pds
 ```
 The "(valid)" annotation means that the AWS credentials are considered "valid":
 there is both a username and a secret.
@@ -513,11 +513,11 @@ If delegation support is enabled, it also prints the current
 hadoop security level.
 
 ```bash
-$ hadoop s3guard bucket-info s3a://landsat-pds/
+$ hadoop s3guard bucket-info s3a://noaa-isd-pds/
 
-Filesystem s3a://landsat-pds
+Filesystem s3a://noaa-isd-pds
 Location: us-west-2
-Filesystem s3a://landsat-pds is not using S3Guard
+Filesystem s3a://noaa-isd-pds is not using S3Guard
 The "magic" committer is not supported
 
 S3A Client
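The `hdfs fetchdt` example above also has a programmatic equivalent; a minimal sketch using the generic `FileSystem.getDelegationToken()` call (assuming delegation tokens are enabled for the bucket; the renewer name is illustrative):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.token.Token;

public class FetchS3ADelegationTokenExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("s3a://noaa-isd-pds/"), new Configuration());
    // Returns null if the filesystem does not issue delegation tokens.
    Token<?> token = fs.getDelegationToken("yarn");
    System.out.println(token == null
        ? "No delegation token issued"
        : "Issued token for service " + token.getService());
  }
}
```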

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md

Lines changed: 14 additions & 16 deletions
@@ -314,9 +314,8 @@ All releases of Hadoop which have been updated to be marker aware will support t
 Example: `s3guard bucket-info -markers aware` on a compatible release.
 
 ```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
 
 ...
 
@@ -326,13 +325,14 @@ Directory Markers
 Authoritative paths: fs.s3a.authoritative.path=
 The S3A connector is compatible with buckets where directory markers are not deleted
 
+...
 ```
 
 The same command will fail on older releases, because the `-markers` option
 is unknown
 
 ```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
 Illegal option -markers
 Usage: hadoop bucket-info [OPTIONS] s3a://BUCKET
 provide/check information about a specific bucket
@@ -354,9 +354,8 @@ Generic options supported are:
 A specific policy check verifies that the connector is configured as desired
 
 ```
-> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers keep s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
 
 ...
 
@@ -371,9 +370,8 @@ When probing for a specific policy, the error code "46" is returned if the activ
 does not match that requested:
 
 ```
-> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers delete s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
 
 S3A Client
 Signing Algorithm: fs.s3a.signing-algorithm=(unset)
@@ -398,7 +396,7 @@ Directory Markers
 Authoritative paths: fs.s3a.authoritative.path=
 
 2021-11-22 16:03:59,175 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210))
--Exiting with status 46: 46: Bucket s3a://landsat-pds: required marker polic is
+-Exiting with status 46: 46: Bucket s3a://noaa-isd-pds: required marker polic is
 "keep" but actual policy is "delete"
 
 ```
@@ -450,10 +448,10 @@ Audit the path and fail if any markers were found.
 
 
 ```
-> hadoop s3guard markers -limit 8000 -audit s3a://landsat-pds/
+> hadoop s3guard markers -limit 8000 -audit s3a://noaa-isd-pds/
 
-The directory marker policy of s3a://landsat-pds is "Keep"
-2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://landsat-pds/
+The directory marker policy of s3a://noaa-isd-pds is "Keep"
+2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://noaa-isd-pds/
 Scanned 1,000 objects
 Scanned 2,000 objects
 Scanned 3,000 objects
@@ -463,8 +461,8 @@ Scanned 6,000 objects
 Scanned 7,000 objects
 Scanned 8,000 objects
 Limit of scan reached - 8,000 objects
-2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://landsat-pds/: duration 0:05.107s
-No surplus directory markers were found under s3a://landsat-pds/
+2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://noaa-isd-pds/: duration 0:05.107s
+No surplus directory markers were found under s3a://noaa-isd-pds/
 Listing limit reached before completing the scan
 2020-08-05 13:43:01,187 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 3:
 ```

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md

Lines changed: 7 additions & 8 deletions
@@ -616,15 +616,14 @@ header.x-amz-version-id="KcDOVmznIagWx3gP1HlDqcZvm1mFWZ2a"
 A file with no-encryption (on a bucket without versioning but with intelligent tiering):
 
 ```
-bin/hadoop fs -getfattr -d s3a://landsat-pds/scene_list.gz
+bin/hadoop fs -getfattr -d s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
 
-# file: s3a://landsat-pds/scene_list.gz
-header.Content-Length="45603307"
-header.Content-Type="application/octet-stream"
-header.ETag="39c34d489777a595b36d0af5726007db"
-header.Last-Modified="Wed Aug 29 01:45:15 BST 2018"
-header.x-amz-storage-class="INTELLIGENT_TIERING"
-header.x-amz-version-id="null"
+# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
+header.Content-Length="524671"
+header.Content-Type="binary/octet-stream"
+header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
+header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
+header.x-amz-server-side-encryption="AES256"
 ```
 
 ###<a name="changing-encryption"></a> Use `rename()` to encrypt files with new keys
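The `header.*` attributes shown by `-getfattr` above are surfaced as filesystem XAttrs, so they can also be read programmatically. A hedged sketch against the same public object (assumes the bucket is readable with the configured credentials, e.g. anonymous access):

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ObjectHeaderXAttrExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("s3a://noaa-cors-pds/"), new Configuration());
    Map<String, byte[]> attrs =
        fs.getXAttrs(new Path("/raw/2024/001/akse/AKSE001x.24_.gz"));
    // Keys mirror the "header." entries printed by: hadoop fs -getfattr -d
    attrs.forEach((name, value) ->
        System.out.println(name + "=" + new String(value, StandardCharsets.UTF_8)));
  }
}
```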

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md

Lines changed: 3 additions & 3 deletions
@@ -503,7 +503,7 @@ explicitly opened up for broader access.
 ```bash
 hadoop fs -ls \
  -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
- s3a://landsat-pds/
+ s3a://noaa-isd-pds/
 ```
 
 1. Allowing anonymous access to an S3 bucket compromises
@@ -1606,11 +1606,11 @@ a session key:
 </property>
 ```
 
-Finally, the public `s3a://landsat-pds/` bucket can be accessed anonymously:
+Finally, the public `s3a://noaa-isd-pds/` bucket can be accessed anonymously:
 
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.aws.credentials.provider</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.aws.credentials.provider</name>
   <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
 </property>
 ```
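The per-bucket anonymous-access setting in the diff above can equally be applied in code; a minimal sketch that lists the public bucket using the same configuration key and provider class:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AnonymousBucketListingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same per-bucket override as the XML snippet in index.md.
    conf.set("fs.s3a.bucket.noaa-isd-pds.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider");
    FileSystem fs = FileSystem.get(new URI("s3a://noaa-isd-pds/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}
```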
