Draft of FsspecUrlOperations #215

Closed · wants to merge 28 commits into from

Conversation

mih (Member) commented Jan 13, 2023

For now this is just covering downloads and has an implementation for authentication against S3 endpoints.

The included code enables downloads of selected archived content, without having to
download the entire archive first.

See datalad/datalad#373

For ZIP/TAR archives and GitHub projects, this is hooked into AnyUrlOperations and
thereby accessible via the datalad download command.

Demo:

```sh
❯ datalad download 'zip://datalad-datalad-cddbe22/requirements-devel.txt::https://zenodo.org/record/7497306/files/datalad/datalad-0.18.0.zip?download=1 -'
 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
 # Since we use requirements.txt ATM only for development IMHO it is ok but
 # we need to figure out/complaint to pip folks
 -e .[devel]

❯ datalad download 'tar://datalad-0.18.0/requirements-devel.txt::https://files.pythonhosted.org/packages/dd/5e/9be11886ef4c3c64e78a8cdc3f9ac3f27d2dac403a6337d5685cd5686770/datalad-0.18.0.tar.gz -'
 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
 # Since we use requirements.txt ATM only for development IMHO it is ok but
 # we need to figure out/complaint to pip folks
 -e .[devel]
```

As demoed in the code, depending on the capabilities of the
particular filesystem abstraction, custom handling of the actual
download process is needed after open() was called.
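
To make this a bit more concrete, here is a minimal, illustrative sketch (not the PR's actual code) of what such handling can boil down to for a filesystem that supports plain byte reads; the `fetch` helper, its parameters, and the example URL are hypothetical:

```python
# Illustrative sketch only; names and chunk size are arbitrary choices.
import fsspec

def fetch(url: str, dest: str, chunk_size: int = 1024 * 1024) -> None:
    # fsspec.open() also accepts chained URLs such as
    # 'zip://member.txt::https://example.com/archive.zip' (hypothetical URL)
    with fsspec.open(url, mode='rb') as src, open(dest, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
```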

Closes #210
Closes #179
Closes #217

TODO

  • Come up with some meaningful suites of extra requirements
  • Under some circumstances boto wants to authenticate even if a bucket is public. Investigate and possibly switch to starting with anon=True. However, that would be expensive, because it would prevent automatically using a session token provided via the environment. If that is needed, we should add additional logic.
  • Check S3 versioned access. datalad download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz?versionId=+WMIWSpgtnESd8J2k.BfgJ3Xo7qpQ1Kjm demo.nii.gz' works, but it would be good to have a test case for accessing a non-recent version of a file. Setting version_aware=False prevents handling of URLs with version tags. What needs testing is:
    • whether not turning it on specifically would make version tags in URLs get ignored (leading to wrong outcomes)
    • whether not turning it off specifically leads to needless requests
      For an openneuro URL a version tag is reported regardless of the version_aware setting and regardless of whether the requesting URL had one.
  • verify that anon-access can be turned off when explicitly provisioning a credential
  • Investigate speed differences: download-url from S3 seems to be 3x faster than download at times.
    It turns out to be mostly a question of how data are read from the "file" pointer. The default was/is to simply iterate over the file object. This leaves the decision making to fsspec. However, not all filesystem implementations support this iteration anyway. Switching to reading chunks directly (of a size declared explicitly from the outside) removes the slowdown entirely, and actually makes the implementation about 15% faster than the default behavior of the downloader in datalad-core. See the sketch below this list.
  • Add error handling in download() for the case where stat'ing has already shown that a remote URL does not exist
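
For illustration, here is a sketch of the two read strategies discussed in the TODO item above (illustrative code, not taken from the PR; `fobj` stands for an already-opened fsspec file object, `dst` for a local file handle):

```python
# 1) Iterate over the file object: chunking is left to fsspec, and not every
#    filesystem implementation supports iteration in the first place.
def copy_by_iteration(fobj, dst):
    for block in fobj:
        dst.write(block)

# 2) Read explicitly sized chunks: the caller decides the chunk size, which
#    the exploration above found to remove the slowdown.
def copy_by_chunks(fobj, dst, chunk_size=1024 * 1024):
    while chunk := fobj.read(chunk_size):
        dst.write(chunk)
```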

codecov bot commented Jan 17, 2023

Codecov Report

Patch coverage: 90.22% and project coverage change: +1.07 🎉

Comparison is base (0cb44b0) 90.70% compared to head (9a68f85) 91.77%.

❗ Current head 9a68f85 differs from pull request most recent head d0e3689. Consider uploading reports for the commit d0e3689 to get more accurate results

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #215      +/-   ##
==========================================
+ Coverage   90.70%   91.77%   +1.07%     
==========================================
  Files         122       87      -35     
  Lines        9100     7664    -1436     
==========================================
- Hits         8254     7034    -1220     
+ Misses        846      630     -216     
```
| Impacted Files | Coverage Δ |
|---|---|
| datalad_next/url_operations/file.py | 92.92% <0.00%> (+5.17%) ⬆️ |
| datalad_next/url_operations/fsspec_s3.py | 80.00% <80.00%> (ø) |
| datalad_next/url_operations/fsspec.py | 90.90% <90.90%> (ø) |
| datalad_next/url_operations/tests/test_fsspec.py | 98.30% <98.30%> (ø) |
| datalad_next/url_operations/any.py | 73.62% <100.00%> (-12.90%) ⬇️ |

... and 82 files with indirect coverage changes


@mih force-pushed the fsspec branch 2 times, most recently from f22fb06 to ad69071 on January 18, 2023 at 10:42
mih (Member, Author) commented Jan 18, 2023

Using the new config-based handler definitions, I took a look at performance and which levers can impact it. While performance can be impacted, the underlying mechanisms remain elusive to me.

The examples below use features introduced with 42148e0 -- see the commit message for details.

All examples download a 17.3 MB file from S3 in North America to my laptop in Germany.

Here is the reference: download-url from core.

```sh
❯ rm demo.nii.gz ; time datalad download-url s3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz -O demo.nii.gz --nosave
-> datalad download-url  -O demo.nii.gz --nosave  0.36s user 0.12s system 4% cpu 9.991 total
```

Smooth progress reporting, 10s.

Now download via fsspec:

```sh
❯ rm demo.nii.gz; time datalad \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.class=datalad_next.url_operations.fsspec.FsspecUrlOperations' \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.kwargs={"fs_kwargs": {"s3": {"anon": true}}}' \
  download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz demo.nii.gz'
datalad -c  -c  download   0.68s user 0.17s system 4% cpu 19.657 total
```

All on default settings, with the same anonymous access that download-url does. Choppy progress reporting (one update every 5 MB, which seems to match fsspec's default cache block size for S3), 20s.

Make it download everything at once by increasing the cache size beyond the file size:

```sh
❯ rm demo.nii.gz; time datalad \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.class=datalad_next.url_operations.fsspec.FsspecUrlOperations' \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.kwargs={"fs_kwargs": {"s3": {"anon": true, "default_block_size": 20000000, "default_cache_type": "readahead"}}}' \
  download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz demo.nii.gz'
datalad -c  -c  download   0.62s user 0.13s system 8% cpu 8.456 total
```

No meaningful progress reporting, less than 10s.

Now turning off fsspec's readahead cache (which should not do anything for a complete download anyway), and instead using the same 20 MB block size to read everything at once:

```sh
❯ rm demo.nii.gz; time datalad \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.class=datalad_next.url_operations.fsspec.FsspecUrlOperations' \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.kwargs={"block_size": 20000000, "fs_kwargs": {"s3": {"anon": true, "default_cache_type": "none"}}}' \
  download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz demo.nii.gz'
datalad -c  -c  download   0.61s user 0.15s system 3% cpu 23.602 total
```

Same progress behavior as before, more than twice the runtime.

Again no caching, but a 0.5 MB chunk size for meaningful progress reporting (an update every half MB):

```sh
❯ rm demo.nii.gz; time datalad \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.class=datalad_next.url_operations.fsspec.FsspecUrlOperations' \
  -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.kwargs={"block_size": 500000, "fs_kwargs": {"s3": {"anon": true, "default_cache_type": "none"}}}' \
  download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz demo.nii.gz'
datalad -c  -c  download   0.62s user 0.07s system 2% cpu 25.003 total
```

Proper progress reporting, and more or less the same runtime of ~25s. So the per-chunk processing in the handler does not add much, but it still takes 2.5x longer than with the downloader from datalad-core.
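
For reference, the Python-level equivalent of the config-based handler definitions above would look roughly like this (a sketch; the keyword arguments mirror the -c options exercised in this PR, but their exact spelling may still change):

```python
from datalad_next.url_operations.fsspec import FsspecUrlOperations

# mirrors: {"block_size": 500000, "fs_kwargs": {"s3": {"anon": true, "default_cache_type": "none"}}}
ops = FsspecUrlOperations(
    block_size=500_000,
    fs_kwargs={'s3': {'anon': True, 'default_cache_type': 'none'}},
)
ops.download(
    's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz',
    'demo.nii.gz',
    hash=['md5'],
)
```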

For now just downloads via unauthenticated connections. The included
code enables downloads of selected archived content, without having to
download the entire archive first.

See datalad/datalad#373

For ZIP and TAR archives this is hooked into `AnyUrlOperations` and
thereby accessible via the `datalad download` command.

Demo:

```sh
❯ datalad download 'zip://datalad-datalad-cddbe22/requirements-devel.txt::https://zenodo.org/record/7497306/files/datalad/datalad-0.18.0.zip?download=1 -'
 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
 # Since we use requirements.txt ATM only for development IMHO it is ok but
 # we need to figure out/complaint to pip folks
 -e .[devel]

❯ datalad download 'tar://datalad-0.18.0/requirements-devel.txt::https://files.pythonhosted.org/packages/dd/5e/9be11886ef4c3c64e78a8cdc3f9ac3f27d2dac403a6337d5685cd5686770/datalad-0.18.0.tar.gz -'
 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
 # Since we use requirements.txt ATM only for development IMHO it is ok but
 # we need to figure out/complaint to pip folks
 -e .[devel]
```

As demoed in the code, depending on the capabilities of the
particular filesystem abstraction, custom handling of the actual
download process is needed after `open()` was called.
This includes, but is not limited to, being able to specify
whether or not anonymous access should be attempted (first).

This change paves the way for endpoint customizations and
anything else that FSSPEC exposes.

Moreover, when an explicit `credential` identifier is given,
the boto-based attempt to locate credentials or try
anonymous access first is skipped, and a credential is looked
up and provisioned immediately.
This facilitates exploitation of the numerous filesystem-specific
features provided by FSSPEC.
This change demos the utility of this feature for FSSPEC-based
access to S3 resources. By default anonymous access
is attempted, but additional handlers could change this behavior
(for particular S3 targets).
Requires particular permissions and comes with a potential
performance penalty.
They are mostly needed for the tests (and more were needed), and we want
to keep the actual core dependencies small.
See docs inside. At present it is unclear how relevant this will be
in practice. However, different filesystem caching settings
impact performance substantially (empirical observation). If caching
is turned off entirely, this parameter is the only way to specify
chunk sizes (for reading). Hence I consider it a useful thing to have
in general -- and it is cheap.

This change also flips the default download method from iteration
over a file pointer to reading chunks of a specific size. This has been
found to be more performant in some cases.
Such an explicit option conflicts with the processing of S3 URLs that
specify a version explicitly.
```
@@ -36,6 +36,9 @@ devel =
httpsupport =
requests
requests_toolbelt
remotefs =
fsspec
requests
```

I discovered in #223 that a dependency missing on my system was aiohttp in addition to those listed here

When trying S3 URLs, I was missing the s3fs dependency. However, the reporting for this was very nice and user-friendly:

```sh
datalad -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.class=datalad_next.url_operations.fsspec.FsspecUrlOperations' -c 'datalad.url-handler.(^|.*::)s3://openneuro.org.kwargs={"fs_kwargs": {"s3": {"anon": true}}}' download 's3://openneuro.org/ds004393/sub-26/func/sub-26_task-ImAcq_acq-task_run-05_bold.nii.gz demo.nii.gz'

download(error): demo.nii.gz [download failure] [Install s3fs to access S3]
```

This error comes from fsspec directly: https://github.com/fsspec/filesystem_spec/blob/e180aa859ef081215882b2d1b67d4bc33c040330/fsspec/registry.py#L114.

Interestingly, this mapping of protocols to dependencies and errors also exists for HTTP (https://github.com/fsspec/filesystem_spec/blob/e180aa859ef081215882b2d1b67d4bc33c040330/fsspec/registry.py#L70), and I did see it in #223 - but only in the special remote debug output, not bubbled up. It would be nice to find out what makes the s3 code handle this error better, and mirror it in the generic fsspec code.
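
One way to mirror that behavior in the generic code path could be to catch the ImportError that fsspec's registry raises when a protocol's package is missing, and re-raise it with the URL for context. This is only a sketch of the idea, not the PR's implementation; `open_fs` is a hypothetical helper:

```python
from fsspec.core import url_to_fs

def open_fs(url, **kwargs):
    try:
        fs, urlpath = url_to_fs(url, **kwargs)
    except ImportError as e:
        # fsspec raises ImportError with a hint such as
        # "Install s3fs to access S3" when the protocol package is absent;
        # re-raise with context so the hint is not swallowed by higher layers
        raise RuntimeError(f'Cannot access {url!r}: {e}') from e
    return fs, urlpath
```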

Ah, it might have been because #223 concerned a plain git-annex call. Potentially there is already stuff in place that bubbles it up when wrapped in a datalad call...

adswa (Member) commented May 3, 2023

I was looking into the open questions about versioned s3 URLs by writing a unit test based on @mslw's bucket and code snippets - thanks much!

I found that handling versioned URLs works in general. However, in the case of S3, there is a problem if we configure a URL handler to be not version-aware (ops = FsspecUrlOperations(fs_kwargs={'version_aware': False})) but provide it with a versioned URL (such as s3://mslw-datalad-test0-versioned/3versions-allversioned.txt?versionId=Tro_UjqVFJfr32v5tuPfjwtOzeqYCxi2):

The outcome is an error that reads like this:

E               datalad_next.url_operations.UrlOperationsRemoteError: UrlOperationsRemoteError for 's3://mslw-datalad-test0-versioned/3versions-allversioned.txt?versionId=Tro_UjqVFJfr32v5tuPfjwtOzeqYCxi2'

Internally, it is a dictionary update in fsspec that fails:

```
path = 's3://mslw-datalad-test0-versioned/3versions-allversioned.txt?versionId=Tro_UjqVFJfr32v5tuPfjwtOzeqYCxi2'
kwargs = {'version_aware': False}

    def _un_chain(path, kwargs):
        x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
        bits = (
            [p if "://" in p or x.match(p) else p + "://" for p in path.split("::")]
            if "::" in path
            else [path]
        )
        # [[url, protocol, kwargs], ...]
        out = []
        previous_bit = None
        kwargs = kwargs.copy()
        for bit in reversed(bits):
            protocol = kwargs.pop("protocol", None) or split_protocol(bit)[0] or "file"
            cls = get_filesystem_class(protocol)
            extra_kwargs = cls._get_kwargs_from_urls(bit)
            print(extra_kwargs)
            kws = kwargs.pop(protocol, {})
            if bit is bits[0]:
                kws.update(kwargs)
>           kw = dict(**extra_kwargs, **kws)
E           TypeError: dict() got multiple values for keyword argument 'version_aware'

../../env/next/lib/python3.11/site-packages/fsspec/core.py:331: TypeError
```

This error results from the fact that s3fs's S3 filesystem class has a method _get_kwargs_from_urls which parses S3 URLs for version strings, and if it finds one, it tries to tell fsspec to become version-aware:

```python
    def _get_kwargs_from_urls(urlpath):
        """
        When we have a urlpath that contains a ?versionId=

        Assume that we want to use version_aware mode for
        the filesystem.
        """
        url_storage_opts = infer_storage_options(urlpath)
        url_query = url_storage_opts.get("url_query")
        out = {}
        if url_query is not None:
            from urllib.parse import parse_qs

            parsed = parse_qs(url_query)
            if "versionId" in parsed:
                out["version_aware"] = True
        return out
```

Since we already set this key, fsspec crashes. So it seems that the case "configure version awareness to be off, but supply versioned URLs nevertheless" can't be supported. I'm wondering if this is something that needs documentation, or if we can do some clever handling - but parsing URLs pre-emptively for version strings seems a bit costly, IMO.
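
For what it's worth, the "clever handling" could be as small as dropping a user-supplied version_aware=False when the URL itself carries a versionId, so it no longer clashes with s3fs's own detection. This is only an illustration of that idea; `reconcile_version_awareness` is a hypothetical helper, not part of the PR:

```python
from urllib.parse import parse_qs, urlparse

def reconcile_version_awareness(url: str, s3_kwargs: dict) -> dict:
    query = parse_qs(urlparse(url).query)
    if 'versionId' in query and s3_kwargs.get('version_aware') is False:
        # s3fs will force version_aware=True for this URL anyway; also passing
        # False makes fsspec's kwarg merge fail with the TypeError shown above
        s3_kwargs = {k: v for k, v in s3_kwargs.items() if k != 'version_aware'}
    return s3_kwargs
```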

adswa added 2 commits May 3, 2023 13:24
When working with S3 URLs, s3fs employs internal URL parsing to detect
whether the S3 file has versioning enabled. Based on the outcome of this
evaluation, it sets the version_aware value itself.
If a user sets this key's value to False in fs_kwargs of the URL handler,
but then supplies a versioned URL, s3fs's attempt to set this key to True
is sabotaged, and the download crashes:

datalad_next.url_operations.UrlOperationsRemoteError: UrlOperationsRemoteError for 's3://mslw-datalad-test0-versioned/3versions-allversioned.txt?versionId=Tro_UjqVFJfr32v5tuPfjwtOzeqYCxi2'

To leave a trace of this, even if only temporary, this commit adds a short
paragraph to the URL handler's docstring to warn about setting this key.
adswa (Member) commented May 3, 2023

I have for now settled on the following:

  • there is a unit test that confirms that version awareness is able to retrieve the file versions specified in URLs. It uses @mslw's S3 bucket, so it's probably temporary, but maybe we have means to create a similar bucket under a datalad account?
  • I have found that setting the 'version_aware' key is generally a bad idea, as it interferes with s3fs's autodetection and setting of this key. I have added a docstring amendment to warn users about this - I think it isn't optimally phrased, and maybe also in the wrong place; please advise on improvements.

adswa (Member) commented May 3, 2023

Pretty weird failure in crippledFS tests:


```
=================================== FAILURES ===================================
_____________________________ test_fsspec_download _____________________________

tmp_path = PosixPath('/crippledfs/pytest-of-runner/pytest-0/test_fsspec_download0')

    def test_fsspec_download(tmp_path):
        # test a bunch of different (chained) URLs that point to the same content
        # on different persistent storage locations
        ops = FsspecUrlOperations()
        for url in (
            # included in a ZIP archive
            'zip://datalad-datalad-cddbe22/requirements-devel.txt::https://zenodo.org/record/7497306/files/datalad/datalad-0.18.0.zip?download=1',
            # included in a TAR archive
            'tar://datalad-0.18.0/requirements-devel.txt::https://files.pythonhosted.org/packages/dd/5e/9be11886ef4c3c64e78a8cdc3f9ac3f27d2dac403a6337d5685cd5686770/datalad-0.18.0.tar.gz',
            # pushed to github
            '***0.18.0/requirements-devel.txt',
        ):
>           props = ops.download(url, tmp_path / 'dummy', hash=['md5'])

../../../../.local/lib/python3.9/site-packages/datalad_next/url_operations/tests/test_fsspec.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../.local/lib/python3.9/site-packages/datalad_next/url_operations/fsspec.py:215: in download
    fs, urlpath, props = self._get_fs(from_url, credential=credential)
../../../../.local/lib/python3.9/site-packages/datalad_next/url_operations/fsspec.py:363: in _get_fs
    fs, urlpath, props = get_fs(
../../../../.local/lib/python3.9/site-packages/datalad_next/url_operations/fsspec.py:63: in get_fs_generic
    fs, urlpath = url_to_fs(url, **kwargs)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/core.py:375: in url_to_fs
    fs = filesystem(protocol, **inkwargs)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/registry.py:257: in filesystem
    return cls(**storage_options)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/spec.py:76: in __call__
    obj = super().__call__(*args, **kwargs)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/implementations/zip.py:54: in __init__
    fo = fsspec.open(
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/core.py:439: in open
    return open_files(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <List of 0 OpenFile instances>, item = 0

    def __getitem__(self, item):
>       out = super().__getitem__(item)
E       IndexError: list index out of range

/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/fsspec/core.py:194: IndexError
```

mih added 2 commits May 4, 2023 08:15
Mostly to minimize the diff for conflict avoidance, after a lot of typos
were fixed via datalad#315
mih (Member, Author) commented May 4, 2023

Thanks @adswa for the S3 access test. I think this is spot on!

adswa (Member) commented May 4, 2023

The crippledFS failure also shows up in #223, but I can't reproduce it locally. After a bit of digging, it looks like it is a cross-platform incompatibility between Windows and Unix systems in fsspec's zip implementation. I have filed an issue to find out more: fsspec/filesystem_spec#1256

mih and others added 6 commits May 4, 2023 12:00
Get the updates to -core for the CI
If the props variable does not get populated during _get_fs(), access to the URL has
failed for some reason, and further processing of props in download() will lead to crashes.
This change adds error handling to raise early with a more informative error.
@adswa force-pushed the fsspec branch 3 times, most recently from a471406 to 80038f6 on May 5, 2023 at 13:45
jsheunis (Member) commented May 9, 2023

Some initial usage notes:

@mih mentioned this pull request May 29, 2023
mih added a commit to mih/datalad-next that referenced this pull request May 30, 2023
This is a replacement for the implementation of the `datalad-archives`
remote. Compared to its predecessor, it reduces the storage overhead
from 200% to 100% by doing partial extraction from fully downloaded
archives.

Ultimately, it will support sparse/partial access to remote archives,
avoiding any storage overhead, and the requirement to unconditionally
download full archives (see datalad#215).

This implementation is trying to be efficient by:

- trying to fulfill requests using locally present archives
- trying to download smaller archives before larger ones

This implementation is aiming to be extensible by:

- using `ArchiveOperations` as a framework to uniformly implement
  (partial) access to local (or remote) archives.

Support for a number of corner cases is implemented (e.g., registered
availability from an archive that actually does not contain a given
key), but there are presently no tests for this.
mih (Member, Author) commented Jul 27, 2023

I think we have learned a lot here, but nobody was able to figure out the speed issues. This will need to be picked up later, or replaced by something else entirely.

Thanks to everyone who tried.

adswa (Member) commented Sep 13, 2023

FTR, I'm linking two relevant comments to this PR:

@christian-monch found, I believe similarly to what @mih found in #215 (comment), that the block size makes a difference for download speeds, in this particular case for partial 7z archives: datalad/datalad-ria#50 (comment).

In @christian-monch's explorations, a block size of 1 resulted in a general speed-up in different use cases.

I tried to test this in this PR, too, rediscovered @mih's initial comment about this, and found no such general improvements for s3 downloads: datalad/datalad-ria#50 (comment)
