Commit f76842b

Revise README for Azure Blob and Data Lake Storage (#514)
1 parent 9b9437f commit f76842b

File tree: 2 files changed (+11, -24 lines)


README.md

Lines changed: 10 additions & 23 deletions
@@ -1,4 +1,4 @@
-Filesystem interface to Azure-Datalake Gen1 and Gen2 Storage
+Filesystem interface to Azure Blob and Data Lake Storage (Gen2)
 ------------------------------------------------------------
 
 
@@ -16,20 +16,9 @@ or
 
 `conda install -c conda-forge adlfs`
 
-The `adl://` and `abfs://` protocols are included in fsspec's known_implementations registry
-in fsspec > 0.6.1, otherwise users must explicitly inform fsspec about the supported adlfs protocols.
+The `az://` and `abfs://` protocols are included in fsspec's known_implementations registry.
 
-To use the Gen1 filesystem:
-
-```python
-import dask.dataframe as dd
-
-storage_options={'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}
-
-dd.read_csv('adl://{STORE_NAME}/{FOLDER}/*.csv', storage_options=storage_options)
-```
-
-To use the Gen2 filesystem you can use the protocol `abfs` or `az`:
+To connect to Azure Blob Storage or Azure Data Lake Storage (ADLS) Gen2 filesystem you can use the protocol `abfs` or `az`:
 
 ```python
 import dask.dataframe as dd
@@ -41,6 +30,7 @@ ddf = dd.read_parquet('az://{CONTAINER}/folder.parquet', storage_options=storage
 
 Accepted protocol / uri formats include:
 'PROTOCOL://container/path-part/file'
+'PROTOCOL://[email protected]/path-part/file'
 'PROTOCOL://[email protected]/path-part/file'
 
 or optionally, if AZURE_STORAGE_ACCOUNT_NAME and an AZURE_STORAGE_<CREDENTIAL> is
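The accepted URI shapes listed in this hunk can be illustrated with a small local parser. This is a sketch only: the helper name `split_adlfs_uri` and its return tuple are made up for illustration and are not part of the adlfs API, which does this parsing internally.

```python
# Illustrative only: split adlfs-style URIs of the forms shown above.
# split_adlfs_uri and its (protocol, container, account, path) return
# shape are hypothetical, not part of adlfs.
from urllib.parse import urlparse

def split_adlfs_uri(uri):
    """Return (protocol, container, account_or_None, path) for an az/abfs URI."""
    parsed = urlparse(uri)
    netloc = parsed.netloc
    if "@" in netloc:
        # 'container@account.dfs.core.windows.net' form
        container, host = netloc.split("@", 1)
        account = host.split(".", 1)[0]
    else:
        container, account = netloc, None
    return parsed.scheme, container, account, parsed.path.lstrip("/")

print(split_adlfs_uri("az://mycontainer/folder/file.parquet"))
# → ('az', 'mycontainer', None, 'folder/file.parquet')
```

With the account-qualified form, the account name is taken from the host part before the first dot (e.g. `abfs://data@myaccount.dfs.core.windows.net/raw/x.csv` yields account `myaccount`).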
@@ -58,15 +48,9 @@ ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet', stora
 
 Details
 -------
-The package includes pythonic filesystem implementations for both
-Azure Datalake Gen1 and Azure Datalake Gen2, that facilitate
-interactions between both Azure Datalake implementations and Dask. This is done leveraging the
-[intake/filesystem_spec](https://github.com/intake/filesystem_spec/tree/master/fsspec) base class and Azure Python SDKs.
+The package includes pythonic filesystem implementations for both [Azure Blobs](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-overview) and [Azure Datalake Gen2 (ADLS)](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction), that facilitate interactions between these implementations and Dask. This is done leveraging the [fsspec/filesystem_spec](https://github.com/fsspec/filesystem_spec) base class and Azure Python SDKs.
 
-Operations against both Gen1 Datalake currently only work with an Azure ServicePrincipal
-with suitable credentials to perform operations on the resources of choice.
-
-Operations against the Gen2 Datalake are implemented by leveraging [Azure Blob Storage Python SDK](https://github.com/Azure/azure-sdk-for-python).
+Operations against Azure Blobs and ADLS Gen2 are implemented by leveraging [Azure Blob Storage Python SDK](https://github.com/Azure/azure-sdk-for-python).
 
 ### Setting credentials
 The `storage_options` can be instantiated with a variety of keyword arguments depending on the filesystem. The most commonly used arguments are:
@@ -81,7 +65,7 @@ The `storage_options` can be instantiated with a variety of keyword arguments de
 anonymous access will not be attempted. Otherwise the value for `anon` resolves to True.
 - `location_mode`: valid values are "primary" or "secondary" and apply to RA-GRS accounts
 
-For more argument details see all arguments for [`AzureBlobFileSystem` here](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/adlfs/spec.py#L328) and [`AzureDatalakeFileSystem` here](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/adlfs/spec.py#L69).
+For more argument details see all arguments for [`AzureBlobFileSystem` here](https://fsspec.github.io/adlfs/api/#adlfs.AzureBlobFileSystem)
 
 The following environmental variables can also be set and picked up for authentication:
 - "AZURE_STORAGE_CONNECTION_STRING"
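The environment variables in this hunk map onto `AzureBlobFileSystem` keyword arguments. adlfs picks these variables up on its own, but a minimal sketch of assembling a `storage_options` dict from them might look like the following; the helper `storage_options_from_env` and the exact variable-to-kwarg mapping shown are assumptions for illustration, while `connection_string`, `account_name`, and `account_key` are real `AzureBlobFileSystem` arguments:

```python
# Sketch only: build a storage_options dict from the environment variables
# mentioned in the README. The helper name and the mapping chosen here are
# illustrative; adlfs reads these variables itself when none are passed.
import os

def storage_options_from_env():
    env_to_kwarg = {
        "AZURE_STORAGE_CONNECTION_STRING": "connection_string",
        "AZURE_STORAGE_ACCOUNT_NAME": "account_name",
        "AZURE_STORAGE_ACCOUNT_KEY": "account_key",
    }
    # Keep only the variables that are actually set.
    return {
        kwarg: os.environ[var]
        for var, kwarg in env_to_kwarg.items()
        if var in os.environ
    }
```

The resulting dict could then be passed as `storage_options=...` to dask readers, or unpacked into `adlfs.AzureBlobFileSystem(**opts)`.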
@@ -102,3 +86,6 @@ The filesystem can be instantiated for different use cases based on a variety of
 The `AzureBlobFileSystem` accepts [all of the Async BlobServiceClient arguments](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).
 
 By default, write operations create BlockBlobs in Azure, which, once written can not be appended. It is possible to create an AppendBlob using `mode="ab"` when creating and operating on blobs. Currently, AppendBlobs are not available if hierarchical namespaces are enabled.
+
+### Older versions
+ADLS Gen1 filesystem has officially been [retired](https://learn.microsoft.com/en-us/lifecycle/products/azure-data-lake-storage-gen1). Hence the adl:// method, which was designed to connect to ADLS Gen1 is obsolete.
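Given the Gen1 retirement note above, code paths that still carry `adl://` URIs can fail fast with a clear message instead of hitting a dead service. The guard below is hypothetical and not part of the adlfs API:

```python
# Hypothetical guard, not part of adlfs: reject URIs that still use the
# retired ADLS Gen1 adl:// scheme and accept only the Gen2 protocols.
def require_gen2_protocol(uri):
    if uri.startswith("adl://"):
        raise ValueError(
            "adl:// targets the retired ADLS Gen1 service; "
            "rewrite the URI to use abfs:// or az:// against Blob/ADLS Gen2"
        )
    if not uri.startswith(("abfs://", "az://")):
        raise ValueError(f"unsupported protocol in {uri!r}")
    return uri

print(require_gen2_protocol("az://container/data.csv"))
# → az://container/data.csv
```

A guard like this would run at the boundary where URIs enter an application, so stale Gen1 references surface immediately rather than as opaque connection errors.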

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ write_to = "adlfs/_version.py"
 
 [project]
 name = "adlfs"
-description = "Access Azure Datalake Gen1 with fsspec and dask"
+description = "Access Azure Blobs and Data Lake Storage (ADLS) Gen2 with fsspec and dask"
 readme = "README.md"
 license = {text = "BSD"}
 maintainers = [{ name = "Greg Hayes", email = "[email protected]"}]
