
Is the AWS S3 library S3-compatible? #50

Closed
w0rd-driven opened this issue Jun 13, 2023 · 10 comments

@w0rd-driven

I was looking to load test S3-compatible APIs, but I think the library may be locked to AWS proper, as I can't seem to configure the endpoint correctly.

A configuration like the following:

const awsConfig = new AWSConfig({
  region: __ENV.REGION,
  endpoint: `https://${__ENV.REGION}.digitaloceanspaces.com`,
  accessKeyId: __ENV.ACCESS_KEY_ID,
  secretAccessKey: __ENV.SECRET_ACCESS_KEY,
});

Produces output like:

WARN[0004] Request Failed                                error="Get \"https://[bucket].s3.sfo2.https://sfo2.digitaloceanspaces.com/?list-type=2&prefix=files%2FFirst_Look%2Fpremium%2Fcurriculum%2FAugust%202023\": lookup [bucket].s3.sfo2.https: no such host"

I can change the endpoint to just .digitaloceanspaces.com, but that produces a URL like https://[bucket].s3.sfo2.digitaloceanspaces.com/, which is not correct. It produces an SSL certificate error, since the wildcard cert doesn't apply at that extra subdomain level, I believe (it would need to cover *.*.sfo2).

Other S3-compatible stores like MinIO wouldn't necessarily have the s3 prefix, but if I controlled the domain I could make the resolution work. I can't control DigitalOcean's domains; something like a CNAME would work in theory, but it would introduce an SSL problem that puts me back at square one.

My biggest hurdle to figuring this out myself is that the library is bundled, so I can't easily dig into the source to see whether I could propose a change or find the incantation needed to alter the outbound URL. Nothing jumps out from the official documentation.

It's also possible this isn't, or wouldn't be, supported for other technical reasons that I can't see.

@mstoykov mstoykov transferred this issue from grafana/jslib.k6.io Jun 14, 2023
@mstoykov
Contributor

Hi @w0rd-driven, I am not the main maintainer, so I can't really give you a 100% certain answer, but:

  1. it seems the s3 prefix in the hostname is expected
  2. the endpoint definitely shouldn't include the scheme

Having gone through the code and done some experiments, it seems that after you get your s3 client instance

const s3 = new S3Client(awsConfig);

you can do

s3.host = "whatever you want"

and that seems to work in my basic test. But I can't really test it fully, as I would need an S3-compliant setup.
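
For reference, a minimal end-to-end sketch of that workaround (assuming the 0.7.2 bundle and its synchronous API; the Spaces hostname is just an example):

import { AWSConfig, S3Client } from 'https://jslib.k6.io/aws/0.7.2/aws.js'

const awsConfig = new AWSConfig({
    region: __ENV.REGION,
    accessKeyId: __ENV.ACCESS_KEY_ID,
    secretAccessKey: __ENV.SECRET_ACCESS_KEY,
})

const s3 = new S3Client(awsConfig)

// Override the host after construction so requests go to Spaces
// instead of the default amazonaws.com hostname.
s3.host = `${__ENV.REGION}.digitaloceanspaces.com`

export default function () {
    const buckets = s3.listBuckets()
    console.log(JSON.stringify(buckets))
}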

Hope this helps you for now

@w0rd-driven
Author

I just realized this is now on the source repo instead of https://github.com/grafana/jslib.k6.io, where the code is only available minified 🤦‍♂️. That was the problem I was having while debugging: my focus was there, and the issue somehow finally made it here.

Thank you for that. The host override gets me further, but I'm still seeing an S3ServiceError:

ERRO[0001] S3ServiceError
        at value (webpack://k6-jslib-aws/./src/internal/s3.ts:297:16(71))
        at value (webpack://k6-jslib-aws/./src/internal/s3.ts:62:27(40))

I find it odd that line 297 is for createMultipartUpload when I'm just calling s3.listBuckets(). I understand line 62 is the signed http.get call, which is what I expect, but that extra part of the trace seems odd.

I have been playing with signing requests manually to see if I can piece together something tangible, as every variant of s3.(command) produces similar results.

When I build the signature myself I get a mix of results, but the most consistent is NotImplemented or 501, which is the generic response from Spaces according to the docs: "API requests for S3 functionality that is not currently supported by Spaces will receive an S3-compatible, XML-formatted NotImplemented error response" (https://docs.digitalocean.com/reference/api/spaces-api/).

One wrench in this: I'm using Insomnia, and the plugin https://www.npmjs.com/package/insomnia-plugin-aws-iam-v4 works with the same credentials hitting the same signedRequest.url I lifted from console.log. When I compare the headers of both, they're identical save for the Signature=[signature] part, which is to be expected. If the signature were the problem, I would get 403/SignatureDoesNotMatch, which is what I do get when I set applyChecksum: false.

All of this makes me feel really close. I'll go back through everything from zero, as it could easily be something obvious in hindsight. I have verified my credentials come from __ENV in both the s3 client and the signature, so that isn't something that jumps out at me.
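
For context, the manual signing looks roughly like this (simplified; the exact path and header set vary per request):

import http from 'k6/http'
import { AWSConfig, SignatureV4 } from 'https://jslib.k6.io/aws/0.8.0/aws.js'

const awsConfig = new AWSConfig({
    region: __ENV.REGION,
    accessKeyId: __ENV.ACCESS_KEY_ID,
    secretAccessKey: __ENV.SECRET_ACCESS_KEY,
})

export default function () {
    const signer = new SignatureV4({
        service: 's3',
        region: awsConfig.region,
        credentials: {
            accessKeyId: awsConfig.accessKeyId,
            secretAccessKey: awsConfig.secretAccessKey,
        },
        uriEscapePath: false,
        applyChecksum: true, // false yields 403/SignatureDoesNotMatch, as noted above
    })

    const signedRequest = signer.sign({
        method: 'GET',
        protocol: 'https',
        hostname: `${__ENV.REGION}.digitaloceanspaces.com`,
        path: '/',
        headers: {},
    })

    const res = http.get(signedRequest.url, { headers: signedRequest.headers })
    console.log(res.status)
}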

I know the runtimes aren't 1:1 here, and I know this is difficult for anyone else to replicate without DO Spaces access. If their API keys didn't grant access to all buckets (another 🤦), I wouldn't mind sharing mine temporarily for debugging; I planned on revoking access after the tests were done anyway.

@mstoykov
Contributor

I find it odd that line 297 is for createMultipartUpload when I'm just calling s3.listBuckets(). I understand line 62 is the signed http.get call, which is what I expect, but that extra part of the trace seems odd.

That is because you are looking at the current master, which has a lot of changes on top of the actually released v0.7.2.
In v0.7.2, line 297 is _handleError:

throw new S3ServiceError(awsError.message, awsError.code || 'unknown', operation)

and line 62 is the call to it:

this._handle_error('ListBuckets', res)

I might actually recommend trying the version from master, as the issue may already have been fixed there.

Unfortunately I am not the main maintainer, I don't know why some of those changes haven't been put in a release, and I would prefer not to do that on my own ;)

@oleiade
Member

oleiade commented Jun 26, 2023

Hey @w0rd-driven 👋🏻

Thanks for reporting the issue 🙇🏻

I want to start with a disclaimer: we do not officially support anything other than the official AWS platform and, to a certain extent, Localstack (only because we use it for testing purposes).

That being said, we have had issues raised in the past by users of either Ceph or MinIO. In the case of S3 support, most of them were linked to URL generation in one way or another. I would be keen to make customizing the target more straightforward than it is now; as @mstoykov pointed out, we only support controlling the host at this point.

I was planning to timebox a quick test run with MinIO in the near future, but no guarantees we will reach anything conclusive, and we will probably never target officially supporting it either (we just don't have the resources to ensure that at the moment).

If you'd be keen to look into it yourself, it should be possible to put together a way to pass the S3 URL scheme in the form of a closure or something along those lines, as sketched below. If you were to come up with a Pull Request, I would for sure allocate some time to review it 👍🏻
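
Purely as an illustration of the idea (this is a hypothetical API, not something the library exposes today), that could look like:

const awsConfig = new AWSConfig({
    region: __ENV.REGION,
    accessKeyId: __ENV.ACCESS_KEY_ID,
    secretAccessKey: __ENV.SECRET_ACCESS_KEY,
    // hypothetical option: let the caller control how the request URL is built
    endpointResolver: (bucket, region) => `https://${bucket}.${region}.digitaloceanspaces.com`,
})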

@w0rd-driven
Author

Regarding the disclaimer: that's totally understood, and I titled this before I really understood how this all fits together. Thank you tremendously for your help, @mstoykov.

I think I figured it out through futzing with the body in Insomnia.

There is a line that sends the request with an empty string as the fallback body:

const res = http.request(method, signedRequest.url, signedRequest.body || '', {
    headers: signedRequest.headers,
})

If I duplicate listBuckets and change signedRequest.body || '' to signedRequest.body || null, it works as expected. I didn't include the parsing logic, but in the XML response I see the bucket names I expect. I verified this also works in my own version of listObjects.
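
That is, the only change from the snippet above is the fallback value:

const res = http.request(method, signedRequest.url, signedRequest.body || null, {
    headers: signedRequest.headers,
})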

I do see that the code assigns an empty string to avoid an undefined body, whereas a null value is not undefined. I wouldn't mind making a PR to change that, but I don't know whether it would break Localstack or AWS.

If anyone has tips for how I could inject the body otherwise, I'm open to that approach, but I don't think it's possible. Maybe I could get by with duplicating SignatureV4 and setting request.body = null instead of '', but I don't believe that would work, because signedRequest.body || '' treats null as falsy and sets it to '' everywhere. I'm trying to avoid fully duplicating listBuckets and listObjects if I can help it, but those are all I'm trying to test at the moment.

Thanks again for the help.

@mstoykov
Contributor

A quick look through the code shows that we use || '' only in the s3 code ... but even there we don't use it everywhere, for example not in putObject.

Looking at the sign code that returns signedRequest: the body is already set to '' in cases where it isn't there.
So it seems to me like this particular code is not ... needed.

A quick look at https://docs.digitalocean.com/products/spaces/reference/s3-sdk-examples/ suggests the only "strange" setting is forcePathStyle, with some code that uses it here. This doesn't seem to be our problem, though.
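
For reference, those DO examples configure the official AWS SDK roughly like this (AWS SDK for JavaScript v3, not jslib-aws; the region value is a placeholder that Spaces ignores):

import { S3 } from '@aws-sdk/client-s3'

const s3Client = new S3({
    forcePathStyle: false, // Spaces uses virtual-hosted-style URLs
    endpoint: 'https://sfo2.digitaloceanspaces.com',
    region: 'us-east-1', // required by the SDK, not used by Spaces
    credentials: {
        accessKeyId: process.env.SPACES_KEY,
        secretAccessKey: process.env.SPACES_SECRET,
    },
})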

I couldn't figure out in 5 minutes where listObject is called in either the Go or the JS code, so I am abandoning that attempt.

@w0rd-driven are you certain that is the only thing you've changed? I would recommend opening a PR as well, and potentially making a script that reproduces the issue. Even if we need to provide our own DigitalOcean credentials, that will be really helpful.

@oleiade
Member

oleiade commented Jun 28, 2023

@mstoykov Indeed, the signedRequest.body || '' fallback is legacy, which I intend to address at some point 👍🏻 I'm also not convinced this is the cause of the error.

@w0rd-driven, I concur. It would be helpful if you could put together an example, or a set of examples, that demonstrates the issue, to make sure we're all on the same level of understanding and to help us reproduce it (I must admit I'm not 100% sure what the actual problem you're experiencing is, or where, at this point 😸). Could you confirm the exact setup you experience the issue against? I got a bit confused along the way. Thanks a lot 🙇🏻

@w0rd-driven
Author

I agree that it makes very little sense; I honestly found it by fluke while trying Insomnia. I realized sending an empty string would return a similar 503 response, which led me down that rabbit hole.

I've rebuilt the script from the ground up starting with 0.8.0 and verified the default implementation still doesn't work, just to make certain nothing was playing tricks on me. I pulled in just listBuckets, where I previously included _handle_error, to get a deeper understanding of the problem. I changed things to initialize dependencies inside the function, because the s3 client isn't passed through, but I don't believe that invalidates the correction. That example can be found at https://gist.github.com/w0rd-driven/4e982d154a9c423997b73b714ff7e30d. It expects REGION, ACCESS_KEY_ID, and SECRET_ACCESS_KEY as environment variables.

We have multiple regions, and I can verify that both of them require this change to work with this example. I would normally supply credentials, but unfortunately these keys grant access to all buckets in all regions, so I have no easy temporary sandbox for you to play in.

I don't know what other examples I could provide beyond listObjects, and I'm honestly taking the same approach there. Both the bucket.region.host and region.host/bucket URL schemes work, so there's nothing extra that really has to be done. I can make a PR from here, but I want to make sure we're all on the same page first.

@oleiade
Member

oleiade commented Aug 15, 2023

Hey @w0rd-driven 👋🏻

I finally took the time to look into this in more depth, and I was able to interact with a DO Space using the S3 module.

It took me a while to get to the bottom of the issue, but my experiments showed that the main problem was indeed that signedRequest.body || '' should have been signedRequest.body || null. I haven't analyzed why in much depth, but with #60 this should at least make a DO Spaces based workflow possible 🤞🏻

I expect the fix to land in the next version of jslib-aws (0.10.0), soon-ish 🤞🏻

I might also take a stab at #57 and provide an endpoint property on the AWSConfig to make the whole setup easier. I'll let you know if that's the case 🙇🏻
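
If that materializes, the configuration from the top of this thread would ideally work as written (illustrative; the exact shape is up to #57):

const awsConfig = new AWSConfig({
    region: __ENV.REGION,
    endpoint: `https://${__ENV.REGION}.digitaloceanspaces.com`,
    accessKeyId: __ENV.ACCESS_KEY_ID,
    secretAccessKey: __ENV.SECRET_ACCESS_KEY,
})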

@oleiade
Member

oleiade commented Sep 5, 2023

I will close this, as I believe it is now addressed and will land in 0.10.0 (which is blocked by #55). The main branch already has the change if you wish to take it out for a spin.

Feel free to reopen in case you still encounter issues 🙇🏻

@oleiade oleiade closed this as completed Sep 5, 2023