Commit
Add docs on accessing Azure blob storage through fsspec (#836)
Summary:

### Changes

Adds an example of DataPipe usage with Azure Blob storage via `fsspec`, similar to #812. The example is placed in a new section of `docs/source/tutorial.rst`.

Here is a screenshot showing that the code snippets in the tutorial work as expected:

<img width="1569" alt="Screenshot 2022-10-18 at 19 33 49" src="https://user-images.githubusercontent.com/23200558/196503562-034162c0-6dde-4749-adc7-5e081ff2c19f.png">

#### Minor note

Technically, `fsspec` [accepts both path prefixes, `abfs://` and `az://`](https://github.com/fsspec/adlfs/blob/f15c37a43afd87a04f01b61cd90294dd57181e1d/README.md?plain=1#L33), as synonyms for Azure Blob storage Gen2. However, only `abfs://` works for us, for the following reason:

- If a path starts with `az`, the variable `fs.protocol` [here](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L82) is still `abfs`.
- So the condition `root.startswith(protocol)` is false, and `is_local` is true.
- As a result, the path "doubles" in [this line](https://github.com/pytorch/data/blob/768ecdae8b56af640a78e29f82864dc4f65df371/torchdata/datapipes/iter/load/fsspec.py#L95), as in this screenshot:

<img width="754" alt="Screenshot 2022-10-18 at 19 50 56" src="https://user-images.githubusercontent.com/23200558/196506965-697eb2d7-8f84-4536-972b-7081e55e1ff5.png">

This won't affect users, however, as long as they use the `abfs://` prefix recommended in the tutorial.

Pull Request resolved: #836

Reviewed By: NivekT

Differential Revision: D40483505

Pulled By: sgrigory

fbshipit-source-id: f03373aa4b376af8ea2ac3480fc133067caaa0ce
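
To make the path "doubling" from the minor note concrete, here is a minimal, standalone sketch of that logic. It is not the actual code in `torchdata/datapipes/iter/load/fsspec.py`; the helper `resolve_listing`, the container name, and the file name are hypothetical, and the sketch assumes that the filesystem listing returns entries that already include the container and directory.

```python
import posixpath


def resolve_listing(root: str, protocol: str, listed: str) -> str:
    """Hypothetical helper: rebuild a full path for one listed entry.

    root     -- the URL the user passed in, e.g. "abfs://container/data"
    protocol -- the protocol reported by the filesystem object ("abfs" here)
    listed   -- one entry from the listing (assumed to include the container)
    """
    # Strip the scheme, e.g. "az://container/data" -> "container/data".
    path = root.split("://", 1)[1]

    # The root is treated as local when it does not start with the
    # filesystem's own protocol, which is what happens for "az://".
    is_local = protocol == "file" or not root.startswith(protocol)

    if is_local:
        # Joining the root path with an entry that already contains it
        # produces the "doubled" path described in the note.
        return posixpath.join(path, listed)
    return f"{protocol}://{listed}"


entry = "container/data/file.csv"
good = resolve_listing("abfs://container/data", "abfs", entry)
doubled = resolve_listing("az://container/data", "abfs", entry)
print(good)     # abfs://container/data/file.csv
print(doubled)  # container/data/container/data/file.csv
```

With the `abfs://` prefix the root matches the reported protocol, so the entry is simply re-prefixed; with `az://` the mismatch sends the path through the local branch and the directory appears twice.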