Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix s3fs region access, s3 url styles #6679

Merged
merged 4 commits into from
Dec 7, 2022
Merged

Fix s3fs region access, s3 url styles #6679

merged 4 commits into from
Dec 7, 2022

Conversation

fm3
Copy link
Member

@fm3 fm3 commented Dec 5, 2022

Two fixes for s3fs

  • Set withForceGlobalBucketAccessEnabled to true, since we do not know in advance in which s3 region the relevant buckets will be stored.
  • Fix improper handling of non-standard URI formats. Now all three variants can be accepted:
s3://webknossos-zarr/demodata/l4_sample_zarr/color.zarr/1/
s3://webknossos-zarr.s3.amazonaws.com/demodata/l4_sample_zarr/color.zarr/1/
s3://s3.amazonaws.com/webknossos-zarr/demodata/l4_sample_zarr/color.zarr/1/

(Note that these examples need credentials, talk to us for testing)

Note on Implementation: The FileSystem itself now holds bucket information IF the bucket is in the base uri (as is the case in the first two examples above). When getPath is called on that fileSystem, the bucket is prepended to the path arguments.

Note that non-standard endpoints are no longer supported due to this change. The endpoint is now always s3.amazon.com (with default region us-east-1 but globalBucketAccess). We never encountered non-standard endpoints so far anyway and this case was not properly tested before either. When we encounter them, we can still try to re-add it without breaking this existing logic.

I did not clean up the s3fs java code. We could consider converting it to scala, rewriting it with our style, and remove the parts we do not use. However, I’d suggest we wait with this since there have been discussions about a new s3fs fork with active support, which may replace our package in the future.

Steps to test:

  • Test importing, and loading data from, the above sample zarr dataset URIs

@fm3 fm3 self-assigned this Dec 5, 2022
@fm3 fm3 marked this pull request as ready for review December 6, 2022 10:11
@fm3 fm3 requested review from jstriebel and frcroth December 6, 2022 10:12
Copy link
Member

@frcroth frcroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I haven't looked too much at the Java code but it seems to work

@fm3 fm3 enabled auto-merge (squash) December 7, 2022 13:01
@fm3 fm3 merged commit c0d1477 into master Dec 7, 2022
@fm3 fm3 deleted the s3fs-fixes branch December 7, 2022 13:19
hotzenklotz added a commit that referenced this pull request Dec 12, 2022
…cing

* 'master' of github.com:scalableminds/webknossos:
  fixes infinite loop in getLineCount (#6689)
  Catch Malformed Json Exceptions (#6691)
  Implement folder-specific search for datasets (#6677)
  Fix s3fs region access, s3 url styles (#6679)
  Swagger annotation for shortLinkByKey (#6682)
  Improve layout of dashboard
  Provide valid JSON schema (#6642)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants