Paths as URIs #243
What should we do with the paths in kerchunk references? Are they always meant as absolute? I guess we should assume they are absolute, unless they have
They are always meant "as interpreted by the target filesystem". The nature of that filesystem might be implied by the protocol of a path alone, but commonly additional arguments are also required. This means that relative paths do work if the target happens to be the local filesystem (file://), but of the other filesystems, I think only ssh supports this concept at all. I would not expect this to be meaningful in basically any practical case. Note that the dir:// filesystem adds prefixes to URLs for any filesystem, if that's useful at all.
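A minimal sketch of the point about relative paths, using only the Python standard library rather than fsspec itself: a path with a protocol names its filesystem, while a bare relative path only gains a meaning once something supplies a base directory, which on the local filesystem is the current working directory. The names `bucket` and `test.nc` are placeholders.

```python
from pathlib import Path
from urllib.parse import urlsplit

# A path with an explicit protocol identifies its target filesystem on its own.
remote = "s3://bucket/test.nc"
remote_protocol = urlsplit(remote).scheme

# A bare relative path carries no protocol at all.
relative = "test.nc"
relative_protocol = urlsplit(relative).scheme

# Locally it can still be resolved, but only against the current working
# directory, so the result depends on where the process happens to run.
local_uri = (Path.cwd() / relative).as_uri()

print(remote_protocol, repr(relative_protocol), local_uri)
```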
(I am happy to require absolute paths even if it makes some tests slightly more verbose)
Thanks @martindurant !
But is the nature of the filesystem explicitly recorded in the kerchunk references format anywhere? Obviously if the prefix is explicit (e.g.
Would this approach work then?
This might be helpful if the above approach doesn't work.
No. The original intention was to have these in the "templates", but in practice, the remote_protocol, remote_options and fss arguments to ReferenceFileSystem are used (and often encoded in Intake prescriptions) in cases of ambiguity.
Okay important question: Are we trying to support manifests with http URLs in them? Right now we have some tests which create virtual datasets containing
But these tests (cc @scottyhq ) don't actually try to read the data back as loadable xarray variables. AFAIK Icechunk could not read data from a manifest containing http URLs (cc @mpiannucci ), but fsspec presumably could? Handling this case in the manifest validation is possible but a bit annoying, as cloudpathlib doesn't support http paths (drivendataorg/cloudpathlib#455 - at least not yet, see drivendataorg/cloudpathlib#468), and pathlib will incorrectly conclude that the http URL is a relative posix path.
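The pathlib behaviour mentioned here can be reproduced with a short standard-library sketch. The URL below is a placeholder, and `urlsplit` is shown only as one possible way to recognise the scheme first, not as what the validation code actually does.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "http://example.com/data/test.nc"  # placeholder URL

# pathlib has no notion of URL schemes: it treats "http:" as an ordinary
# first path component, so the URL looks like a relative posix path.
as_path = PurePosixPath(url)
looks_absolute = as_path.is_absolute()

# It also collapses the "//" after the scheme, so the URL does not round-trip.
mangled = str(as_path)

# Splitting the string as a URL keeps scheme, host and path separate.
parts = urlsplit(url)

print(looks_absolute, mangled, parts.scheme, parts.netloc, parts.path)
```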
It's planned for Icechunk to support HTTP apparently.
I added support for HTTP in fefab90
If I'm following correctly, I'd say yes! There are lots of datasets out there that are not in cloud buckets, but are on servers that support http range requests. Agreed that it would be nice if cloudpathlib handled http:// paths.
This PR closes #242 at the data model level - all paths are coerced to absolute URIs (i.e. file:///directory/test.nc or s3://bucket/test.nc) as they go into the Manifest.
As this forbids constructing manifests using relative paths, it requires minor changes to many tests (e.g. test.nc -> /test.nc). It will also require slightly more invasive changes to any tests that involve kerchunk references.
docs/releases.rst
api.rst
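As a rough illustration of the coercion described above, here is a minimal standard-library sketch. The helper `coerce_to_absolute_uri` and the `KNOWN_SCHEMES` set are hypothetical names invented for this example and are not the PR's actual implementation; the sketch also skips percent-encoding of special characters.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumption: the protocols treated as already being absolute URIs.
KNOWN_SCHEMES = {"file", "s3", "gs", "http", "https"}


def coerce_to_absolute_uri(path: str) -> str:
    """Hypothetical helper: return an absolute URI, or raise for relative paths."""
    scheme = urlsplit(path).scheme
    if scheme in KNOWN_SCHEMES:
        # Already a URI with an explicit protocol: keep it unchanged.
        return path
    if scheme == "" and PurePosixPath(path).is_absolute():
        # Absolute local posix path: prefix the file protocol.
        return "file://" + path
    raise ValueError(f"expected an absolute path or URI, got {path!r}")


print(coerce_to_absolute_uri("/directory/test.nc"))
print(coerce_to_absolute_uri("s3://bucket/test.nc"))
```

Under these assumptions a bare `test.nc` raises `ValueError`, which matches the need to change such test paths to `/test.nc`.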
Sub-tasks:
- .rename_paths method automatically
- fs_root option internally
- reader_kwargs to all backends (see Add reader_kwargs argument to open_virtual_dataset #315)
- reader_options (rename to fsspec_kwargs?) (also see Add reader_kwargs argument to open_virtual_dataset #315)
- fs_root for kerchunk and dmrpp readers as an option to reader_kwargs (requires Add reader_kwargs argument to open_virtual_dataset #315)
- fs_root
- kerchunk reader (requires Refactor kerchunk reader tests to call open_virtual_dataset #317)
- dmrpp reader
- fs_root for kerchunk and dmrpp readers
- fs_root automatically in other readers