Adding reader_options kwargs to open_virtual_dataset. #67

Merged
merged 37 commits into from
May 14, 2024
Commits
4c6cb63
adding reader_options kwargs to open_virtual_dataset
norlandrhagen Mar 29, 2024
adf311a
Merge branch 'main' into reader_options
TomNicholas Apr 30, 2024
ba5ac6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 30, 2024
ea30914
fix typing
TomNicholas Apr 30, 2024
448800b
modifies _automatically_determine_filetype to open file with fsspec t…
norlandrhagen May 1, 2024
8c5dff7
using UPath to get file protocol and open with fsspec
norlandrhagen May 1, 2024
6cd77ce
tests passing locally. Reading over s3/local w+w/o indexes & guessing…
norlandrhagen May 2, 2024
f0daafe
merge w/ main
norlandrhagen May 2, 2024
ed3d0f4
add s3fs to test
norlandrhagen May 2, 2024
beec724
typing school 101
norlandrhagen May 2, 2024
e669841
anon
norlandrhagen May 2, 2024
09f89a6
tying
norlandrhagen May 2, 2024
e4db860
test_anon update
norlandrhagen May 2, 2024
ba8b1e3
anon failing
norlandrhagen May 2, 2024
b12d32c
double down on storage_options
norlandrhagen May 2, 2024
f9478b9
fsspec nit
norlandrhagen May 3, 2024
6958b59
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 3, 2024
aefa22d
seting s3 defaults as empty to try to appease the cruel boto3 gods
norlandrhagen May 3, 2024
464ffd3
merge
norlandrhagen May 3, 2024
d108978
added fpath to SingleHDF5ToZarr
norlandrhagen May 3, 2024
5cc5ecd
hardcode in empty storage opts for s3
norlandrhagen May 3, 2024
3509a1f
hardcode default + unpack test
norlandrhagen May 3, 2024
80cf22b
changed reader_options defaults
norlandrhagen May 3, 2024
a3fc72e
Merge branch 'main' into reader_options
norlandrhagen May 3, 2024
0235f51
updated docs install
norlandrhagen May 3, 2024
1e9e2fe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 3, 2024
55031f9
changed docstring type in utils to numpy style
norlandrhagen May 6, 2024
6a3d7be
added TYPE_CHECKING for fsspec and s3fs mypy type hints
norlandrhagen May 8, 2024
5aec9db
merged w/ main and lint
norlandrhagen May 8, 2024
83b3c4b
fixed TYPE_CHECKING import
norlandrhagen May 8, 2024
a143cf4
pinned xarray to latest commit on github
norlandrhagen May 9, 2024
9d124ef
merged w/ main to pin xarray and kerchunk
norlandrhagen May 13, 2024
3a29b41
re-add upath
norlandrhagen May 13, 2024
b9c056a
Merge branch 'main' into reader_options
norlandrhagen May 13, 2024
13fc295
merged w/ main
norlandrhagen May 14, 2024
4f766d9
adds section to usage
norlandrhagen May 14, 2024
e6f047f
Minor formatting nit of code example in docs
TomNicholas May 14, 2024
12 changes: 12 additions & 0 deletions docs/usage.md
@@ -27,6 +27,7 @@ vds = open_virtual_dataset('air.nc')

(Notice we did not have to explicitly indicate the file format, as {py:func}`open_virtual_dataset <virtualizarr.xarray.open_virtual_dataset>` will attempt to automatically infer it.)


```{note}
In future we would like for it to be possible to just use `xr.open_dataset`, e.g.

@@ -61,6 +62,17 @@ Attributes:

These {py:class}`ManifestArray <virtualizarr.manifests.ManifestArray>` objects are each a virtual reference to some data in the `air.nc` netCDF file, with the references stored in the form of "Chunk Manifests".

### Opening remote files

To open remote files as virtual datasets pass the `reader_options` options, e.g.

```python

aws_credentials = {"key": "", "secret": ""}
vds = open_virtual_dataset("s3://fake-bucket/file.nc", reader_options={'storage_options':aws_credentials})

```
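
As an illustration only (not part of this PR), the `reader_options` dictionary shown above could be assembled by a small helper that inspects the path's protocol. The helper name `build_reader_options` is hypothetical, and it is an assumption that local files need no storage options. The `key`, `secret` and `anon` entries are s3fs options, assumed here to be forwarded through `storage_options`.

```python
from urllib.parse import urlsplit


def build_reader_options(path, key=None, secret=None):
    """Hypothetical helper: build a reader_options dict for a given file path."""
    # urlsplit gives "" for a plain relative path and "s3" for an s3:// URL.
    protocol = urlsplit(path).scheme
    if protocol != "s3":
        # Assumption: non-S3 paths are opened without any storage options.
        return {}
    if key is None or secret is None:
        # No credentials supplied: ask s3fs for anonymous access.
        storage_options = {"anon": True}
    else:
        storage_options = {"key": key, "secret": secret}
    return {"storage_options": storage_options}


reader_options = build_reader_options("s3://fake-bucket/file.nc", key="", secret="")
# The result would then be passed on, e.g.:
# vds = open_virtual_dataset("s3://fake-bucket/file.nc", reader_options=reader_options)
```

Keeping the credentials in one dictionary means the same call site can serve both local and remote paths.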

## Chunk Manifests

In the Zarr model N-dimensional arrays are stored as a series of compressed chunks, each labelled by a chunk key which indicates its position in the array. Whilst conventionally each of these Zarr chunks are a separate compressed binary file stored within a Zarr Store, there is no reason why these chunks could not actually already exist as part of another file (e.g. a netCDF file), and be loaded by reading a specific byte range from this pre-existing file.
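
The byte-range idea in the paragraph above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `read_chunk` function and the manifest layout (a `path`, `offset` and `length` per chunk key) are assumptions made for the example, and the chunks are left uncompressed for simplicity.

```python
import os
import tempfile


def read_chunk(manifest, chunk_key):
    """Load one chunk's raw bytes by reading a byte range from an existing file."""
    entry = manifest[chunk_key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])          # jump to the start of the chunk
        return f.read(entry["length"])   # read exactly the chunk's bytes


# Create a small stand-in for a pre-existing file: a 6-byte header
# followed by two 7-byte chunks.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"HEADERchunk-0chunk-1")
    path = tmp.name

# Hypothetical manifest: each chunk key maps to a byte range in that file.
manifest = {
    "0": {"path": path, "offset": 6, "length": 7},
    "1": {"path": path, "offset": 13, "length": 7},
}

first = read_chunk(manifest, "0")
second = read_chunk(manifest, "1")
os.remove(path)  # clean up the temporary file
```

Nothing is copied out of the original file in advance. The manifest records where each chunk lives, and the bytes are fetched only when a chunk is requested.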