Providing a virtual hosted style s3 bucket endpoint causes the healthcheck to fail #17351

Closed
bkaznowski opened this issue May 9, 2023 · 18 comments
Labels
  • sink: aws_s3 (Anything `aws_s3` sink related)
  • type: enhancement (A value-adding code change that enhances its existing functionality.)

Comments

@bkaznowski

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When an aws_s3 sink is given both the bucket name and the more modern virtual-hosted-style bucket endpoint (https://bucket-name.s3.region-code.amazonaws.com), the healthcheck fails, reporting that the bucket cannot be found:

ERROR vector::topology::builder: Healthcheck: Failed Reason. error=Unknown bucket: "redacted-bucket-name"

The bucket name is a required option. However, combining it with a virtual-hosted-style endpoint results in the following (incorrect) URL being generated for listing the bucket: https://bucket-name.s3.region-code.amazonaws.com/bucket-name.

Vector can still put objects in this bucket, but they are all placed inside a directory named the same as the bucket, because the upload URL is generated as https://bucket-name.s3.region-code.amazonaws.com/bucket-name/key-prefix/key.

Given that path-style URLs are being deprecated, it would be good to support the new virtual-hosted-style URLs.
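
To make the failure mode concrete, here is a minimal, standard-library-only Rust sketch of how the two addressing styles assemble a request URL. The function names `path_style_url` and `virtual_hosted_url` are hypothetical and exist only for illustration; this is not Vector's or the AWS SDK's actual code, it just reproduces the URL shapes quoted above.

```rust
/// Path-style addressing: the bucket name becomes the first path segment
/// after the endpoint. (Hypothetical helper, for illustration only.)
fn path_style_url(endpoint: &str, bucket: &str, key: &str) -> String {
    format!("{}/{}/{}", endpoint.trim_end_matches('/'), bucket, key)
}

/// Virtual-hosted-style addressing: the bucket is already part of the host
/// name, so only the object key is appended. (Hypothetical helper.)
fn virtual_hosted_url(endpoint: &str, key: &str) -> String {
    format!("{}/{}", endpoint.trim_end_matches('/'), key)
}
```

With the endpoint `https://my-logs.s3.eu-west-1.amazonaws.com`, calling `path_style_url(endpoint, "my-logs", "key-prefix/key")` yields `https://my-logs.s3.eu-west-1.amazonaws.com/my-logs/key-prefix/key`, which matches the doubled bucket name described in this report, while `virtual_hosted_url(endpoint, "key-prefix/key")` yields the intended `https://my-logs.s3.eu-west-1.amazonaws.com/key-prefix/key`.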

Configuration

[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"

Version

vector 0.11.1 (v0.11.1 x86_64-unknown-linux-musl 2020-12-17)

Debug Output

No response

Example Data

No response

Additional Context

Virtual-hosted-style URLs are useful when you want to restrict egress traffic to specific domains.

References

No response

@bkaznowski bkaznowski added the type: bug A code related bug. label May 9, 2023
@zamazan4ik
Contributor

> vector 0.11.1 (v0.11.1 x86_64-unknown-linux-musl 2020-12-17)

I definitely recommend updating to the latest Vector version (0.29.1 right now) and testing your case on that version. If the issue remains, please let us know. Unfortunately, versions this old are not supported.

@bkaznowski
Author

Thanks, I will update and report back 👍

@bkaznowski
Author

Sorry for the delay. I have just tried this on vector 0.29.1 (x86_64-unknown-linux-musl 74ae15e 2023-04-20 14:50:42.739094536) and I am getting the same behaviour as originally described. I had to specify the bucket region, because otherwise the region header didn't match the bucket region, so the config now looks like this:

[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
region = "eu-west-1"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"

@jszwedko jszwedko added the sink: aws_s3 Anything `aws_s3` sink related label May 16, 2023
@bkaznowski
Author

It looks like support for virtual-hosted-style URLs was added to the AWS Rust SDK a few months back, so it might just be a question of updating the SDK? https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13

@spencergilbert
Contributor

> It looks like support for virtual-hosted-style URLs was added to the AWS Rust SDK a few months back, so it might just be a question of updating the SDK? https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13

I believe that's the case. Unfortunately, we've been blocked from upgrading by a regression that was introduced a few versions ago. Working through that is on my to-do list for the next few weeks, IIRC.

@bkaznowski
Author

Excellent, thank you! I will wait for this to be completed then.

@bkaznowski
Author

Do you happen to have a link to the issue so we can track it? 👀

@spencergilbert
Contributor

> Do you happen to have a link to the issue so we can track it? 👀

I didn't see an issue so I opened #17728

@bkaznowski
Author

bkaznowski commented Aug 14, 2023

Just tested this again on 0.31.0 and I am still seeing the same problem. That's odd, because 0.31.0 supposedly contains this change:
#17731
When I look at the forked AWS SDK that the hash points to, it appears to contain the change that should have fixed this:
https://github.com/vectordotdev/aws-sdk-rust/blob/3d6aefb7fcfced5fc2a7e761a87e4ddbda1ee670/CHANGELOG.md#january-13th-2023
So it seems that something else is going on here.

@bkaznowski
Author

Just tested on 0.33.0. The issue is still present.

@pront pront added type: enhancement A value-adding code change that enhances its existing functionality. and removed type: bug A code related bug. labels Nov 30, 2023
@ashrayjain

I believe this is happening because of the force_path_style(true) configuration here: https://github.com/vectordotdev/vector/blob/master/src/common/s3.rs#L11

@Babbadger

Babbadger commented Oct 17, 2024

Virtual-hosted-style bucket endpoints are still not supported in v0.41.1.

Has anyone tried flipping force_path_style(true) to false and building a custom image? Or are more changes required for it to work? The AWS S3 sink was updated around v0.31, so it should theoretically be possible to make it work.

@sam6258
Contributor

sam6258 commented Nov 17, 2024

+1 for this request. I have an object storage service that only supports vhost-based access, so in its current state the s3 sink is unusable for me.

@sam6258
Contributor

sam6258 commented Nov 18, 2024

FWIW I built a custom vector binary (0.39.0) with the force_path_style(true) call removed. It switched over to DNS-style bucket names seamlessly. However, it did not respect the following setting and still used DNS style (I might be missing some configuration, but my guess is that the Rust SDK doesn't pick this up from the config file):

# cat ~/.aws/config 
[default]
s3 =
    addressing_style = path

So it seems there is also a little bit of work needed to add an option to go back to path-based addressing if needed.

@Babbadger

> FWIW I built a custom vector binary (0.39.0) with the force_path_style(true) call removed. It switched over to DNS-style bucket names seamlessly. However, it did not respect the following setting and still used DNS style (I might be missing some configuration, but my guess is that the Rust SDK doesn't pick this up from the config file):
>
> # cat ~/.aws/config
> [default]
> s3 =
>     addressing_style = path
>
> So it seems there is also a little bit of work needed to add an option to go back to path-based addressing if needed.

Did you just flip true to false in the following line to build it?

    let config = config::Builder::from(config).force_path_style(true).build();

I'm getting errors when building, but maybe these are just compilation problems on my side.

@sam6258
Contributor

sam6258 commented Nov 20, 2024

I removed the call to force_path_style(true) so that it just uses the default.
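
The earlier point about needing a way to go back to path-based addressing could be sketched as follows. This is a standard-library-only illustration, not the change that eventually landed and not the AWS SDK's API: the `AddressingStyle` enum and the `resolve_style` and `request_url` functions are hypothetical names, and defaulting to path style when nothing is set is an assumption made here so that existing configurations would keep their current behaviour.

```rust
/// Hypothetical addressing-style choice for an S3-compatible sink.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AddressingStyle {
    Path,
    VirtualHosted,
}

/// Resolve an optional, user-supplied boolean setting.
/// Assumption: an unset value keeps path style, as before.
fn resolve_style(force_path_style: Option<bool>) -> AddressingStyle {
    match force_path_style {
        Some(false) => AddressingStyle::VirtualHosted,
        Some(true) | None => AddressingStyle::Path,
    }
}

/// Build the request URL for a regional AWS endpoint under each style.
fn request_url(style: AddressingStyle, region: &str, bucket: &str, key: &str) -> String {
    match style {
        // Bucket in the path: https://s3.<region>.amazonaws.com/<bucket>/<key>
        AddressingStyle::Path => {
            format!("https://s3.{}.amazonaws.com/{}/{}", region, bucket, key)
        }
        // Bucket in the host: https://<bucket>.s3.<region>.amazonaws.com/<key>
        AddressingStyle::VirtualHosted => {
            format!("https://{}.s3.{}.amazonaws.com/{}", bucket, region, key)
        }
    }
}
```

Under these assumptions, `resolve_style(None)` stays on path style, while `resolve_style(Some(false))` opts into virtual-hosted style, so a service that only supports vhost-based access and one that only supports path-based access can both be served by the same build.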

@sam6258
Contributor

sam6258 commented Dec 10, 2024

#21999

@jszwedko
Member

Closed by #21999


8 participants