feature: allow proxying multiple S3 endpoints #3

Open
pvbouwel opened this issue Nov 10, 2024 · 1 comment

Comments

@pvbouwel
Contributor

Currently the proxy always targets a single endpoint.

One use case of fakes3pp is to provide access to a backend without the need to expose credentials for the S3 backend.
Being able to proxy only a single endpoint limits this use case.

It is possible to generalize the design a bit and specify multiple targets instead of a single one. There are several ways to arrive at such a design, and one important aspect is backend selection. Possible options:

  • Expand the S3 API with another header/query argument to allow backend selection
    • Causes issues with existing SDKs and tooling
  • Allow configuring a backend based on bucket name (or bucket name regex)
    • Requires a lot of proxy-side configuration
    • No way to deal with duplicate bucket names (bucket names are 'globally' unique, but that only holds within a single S3 stack)
  • Use an existing parameter of the S3 API to control which endpoint to use
    • A parameter with low cardinality and a 1-to-1 relationship with the endpoint is preferred

At the time of writing, the best option seems to be to use the region parameter for this.
Generally, different S3 providers use different region names. Some examples:

  • AWS
    • eu-west-1
    • us-west-2
    • ap-northeast-1
    • ...
  • CloudFerro Cloud
    • WAW3-1
    • WAW3-2
    • WAW4-1
    • FRA1-2
    • CF2
  • OpenTelecom Cloud
    • eu-nl
    • eu-de
  • OVH cloud
    • gra
    • rbx
    • sbg
    • de
    • uk
    • waw
    • bhs
    • ca-east-tor
    • sgp
    • ap-southeast-syd
    • ap-south-mum

So just by using the region name it would be possible to choose which backend to use. Of course this list of cloud providers is not exhaustive, and some providers may have conflicting region names (even in the list above some region names are already close together), so this won't be a silver bullet. If such a conflict arises, it should be possible to handle it by supporting mappings of region names. That would make the design slightly more complex but would avoid any collision issues.
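
To make the idea concrete, here is a rough sketch of region-based backend selection, including the optional region-name mapping for the conflict case. It is written in Python purely for illustration and is not how fakes3pp implements anything; the `Backend` type, the `BACKENDS` table, `select_backend`, the provider names and the endpoint URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Backend:
    endpoint: str  # where the proxy would forward the request (placeholder URL)
    region: str    # region name the backend itself expects


# Hypothetical table keyed by the region name the client sends to the proxy.
# Without conflicts the key equals the backend's own region name. The last two
# entries show the mapping case: two providers that both use "waw" are exposed
# under distinct proxy-side names.
BACKENDS: Dict[str, Backend] = {
    "eu-west-1": Backend("https://s3.provider-a.example", "eu-west-1"),
    "eu-nl": Backend("https://s3.provider-b.example", "eu-nl"),
    "provider-c-waw": Backend("https://s3.provider-c.example", "waw"),
    "provider-d-waw": Backend("https://s3.provider-d.example", "waw"),
}


def select_backend(region: str) -> Optional[Backend]:
    """Return the backend registered for a client-side region, or None."""
    return BACKENDS.get(region)
```

With a mapping like this the client keeps using standard SDK settings; only the region value it configures decides which backend the proxy talks to.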

The scope of this issue is providing support for multiple backends that do not have conflicting region names.
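
Since conflicting region names are out of scope, the proxy could refuse a configuration that contains them. A minimal sketch of such a check, assuming a hypothetical configuration shape (backend name mapped to the list of regions it serves) and a hypothetical helper `find_region_conflicts`:

```python
from collections import defaultdict
from typing import Dict, List


def find_region_conflicts(backend_regions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return every region name that is claimed by more than one backend."""
    owners = defaultdict(set)
    for backend, regions in backend_regions.items():
        for region in regions:
            owners[region].add(backend)
    # Keep only regions with several owners; sort for stable output.
    return {r: sorted(b) for r, b in owners.items() if len(b) > 1}


# Hypothetical configuration: provider-b and provider-c both claim "waw".
config = {
    "provider-a": ["eu-nl", "eu-de"],
    "provider-b": ["gra", "waw"],
    "provider-c": ["waw"],
}
conflicts = find_region_conflicts(config)
```

An empty result means region names alone are enough to pick a backend.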

@pvbouwel
Contributor Author

One caveat of the above approach is presigned hmacv1 query URLs, because they do not specify a region. For a sigv4 query URL the X-Amz-Credential parameter still conveys region information, but for hmacv1 queries there is no such parameter. While this is unfortunate, having sigv4 query URLs available is likely a good enough workaround.
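
The difference between the two URL styles can be illustrated with a small sketch. It assumes the usual sigv4 credential scope layout (access key, date, region, service, `aws4_request`, separated by slashes); the function name `region_from_presigned_url` is hypothetical and the snippet is Python for illustration only.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def region_from_presigned_url(url: str) -> Optional[str]:
    """Return the region of a sigv4 presigned URL, or None if it has none."""
    query = parse_qs(urlsplit(url).query)
    credential = query.get("X-Amz-Credential")
    if not credential:
        # hmacv1 query URLs only carry AWSAccessKeyId, Expires and Signature,
        # so there is nothing to derive a region from.
        return None
    # Split from the right so the access key is never mistaken for a scope part.
    parts = credential[0].rsplit("/", 4)
    if len(parts) != 5 or parts[4] != "aws4_request":
        return None
    return parts[2]
```

A `None` result is exactly the hmacv1 case described above: the proxy has no region to select a backend with.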

Theoretically the proxy could look for the bucket in each backend, but that would trigger additional API calls and introduce extra latency, which does not seem acceptable. Given that support for multiple S3 backends is an optional feature, it could make sense to still support presigned hmacv1 query URLs.

One way to do this is to define a default backend. That way, if there is only 1 S3 backend, everything keeps working as expected. When multiple S3 backends are available, hmacv1 URLs can break if the default backend is not the one for which the presigned URL was generated.
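
The default-backend fallback could look roughly like the sketch below. The names `resolve_endpoint`, `endpoints` and `default_region` are hypothetical, the endpoint URLs are placeholders, and rejecting unknown regions with an error (instead of also falling back to the default) is an assumption rather than something decided in this issue.

```python
from typing import Dict, Optional


def resolve_endpoint(
    region: Optional[str], endpoints: Dict[str, str], default_region: str
) -> str:
    """Pick the endpoint for a request, using the default when no region is known."""
    if region is None:
        # e.g. a presigned hmacv1 query URL, which carries no region
        region = default_region
    try:
        return endpoints[region]
    except KeyError:
        # Assumption: an unknown region is an error, not a reason to fall back.
        raise LookupError(f"no backend configured for region {region!r}") from None


# Hypothetical setup with two backends; "eu-west-1" is the default.
endpoints = {
    "eu-west-1": "https://s3.provider-a.example",
    "eu-nl": "https://s3.provider-b.example",
}
regionless = resolve_endpoint(None, endpoints, "eu-west-1")
explicit = resolve_endpoint("eu-nl", endpoints, "eu-west-1")
```

With a single configured backend every request resolves to it, so existing single-endpoint deployments behave as before; with several backends a region-less request always lands on the default, which is the broken-URL risk mentioned above.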
