Fluentbit S3 output causes prometheus metrics endpoint to be briefly unavailable. #4165
Comments
cc: @PettitWesley |
@bharatnc if possible can you do two things to help me:
|
Also wait, you can use my S3 output to send to Google?? @JeffLuoo @qingling128 Curious if this is something you folks recommend? 😬 |
That's news to me. Does it actually work? If so, I guess that means the s3 output plugin is implemented in a generic way? I do see a related feature request here: #1032 |
Yes, it works quite well (though I haven't tested it thoroughly). I guess the s3 output plugin is generic enough that all I had to do was generate HMAC keys on GCS and use them for access-key and access-secret. |
Thank you @PettitWesley. Re 1: Filed a new issue under https://github.com/aws/aws-for-fluent-bit |
Interesting... this doesn't really make any sense to me... does GCS have an option to not use auth? AWS has its own auth algorithm called SigV4 (for which Eduardo and I had to write a custom module in Fluent Bit). No one else uses that; I think Google uses OAuth 2.0. If you take someone else's secret and put it into the AWS SigV4 algorithm... the output shouldn't be something GCS would accept if it's checking auth headers... |
I don't know many details about the auth algorithm, but this doc: https://cloud.google.com/storage/docs/migrating#migration-simple provides a generic example of how to use the AWS Go SDK with HMAC keys (for interoperability). I happened to try this with Fluent Bit and the AWS S3 output plugin, assuming that it was written in an S3-generic way. Guess what? It looks like it was written that way, and it started working with just the above settings with |
@bharatnc interesting! I didn't know that GCP had built compatibility with S3 and with AWS auth into GCS. |
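As background for the exchange above, here is a minimal sketch of the SigV4 signing-key derivation, written in Python for illustration only (it is not the Fluent Bit C implementation). The derivation is a chain of HMAC-SHA256 operations over the secret key, so any service that holds the same secret and implements the same scheme can verify the resulting signature. That is consistent with the report above that GCS HMAC keys work with the s3 output. The function name and every argument value here are hypothetical placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "s3") -> bytes:
    """Derive a SigV4 signing key (hypothetical helper name).

    date_stamp is the request date as YYYYMMDD. The key is scoped to
    one date, region, and service by chaining four HMAC steps.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

Nothing in this derivation is specific to AWS-issued credentials: the secret is just an input string, which is why a secret issued by another provider can be used as long as the receiving side validates signatures the same way.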
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This functionality is provided by the embedded http server which is implemented in This http server is implemented through monkey which in this case means it runs in its own thread so when the call to What I'm wondering is if it would be possible for the S3 to be causing this in |
@leonardo-albertovich Yea we do try to fetch credentials in cb_s3_init which can lead to synchronous http requests being made. The credential code is here: https://github.com/fluent/fluent-bit/tree/master/src/aws But all our http requests are made using this: https://github.com/fluent/fluent-bit/blob/master/src/aws/flb_aws_util.c#L151 What are you looking for? How could this code cause the issue? |
I was trying to make sense of the symptom which is "the http server takes longer than expected to start when the s3 output plugin is enabled". That's why I mentioned the initialization order. I wouldn't expect the credentials request mechanism to be super long, it wouldn't really make sense for the process to be too convoluted or for those services to actually take long to answer but that is one of the possible reasons considering that blocking connections are used there. I think the safest way to determine it is printing the timestamp in key lines in |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
This issue was closed because it has been stalled for 5 days with no activity. |
Bug Report
Describe the bug
The Prometheus metrics endpoint /api/v1/metrics/prometheus is not available immediately when the S3 output is configured; it takes a long time for the interface to become available.
To Reproduce
Add an output to S3:
Note: For credentials, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables are used. These are exported and present in the systemd environment when running fluentbit.
curl the metrics endpoint:
Expected behavior
I should be able to curl the endpoint immediately, without seeing a connection refused error. Instead, curl starts working only after a long time. The delay varies between test attempts, is generally 5-10 minutes, and happens to coincide with the first successful upload after buffering 1M of data (according to the settings I use above).
Screenshots
NA
Your Environment
Additional context
This will cause a delay in metrics reporting, as the metrics interface will not be available for Prometheus scrapes. I am observing a varying amount of time before this interface is accessible via curl; I'm not sure what's going on or whether I am missing some settings.
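To put a number on the varying delay described here, one option is to poll the endpoint from the moment fluentbit is started and record how long it takes to answer. The following is a minimal sketch, not something taken from this report: the helper name `wait_for_endpoint` and its parameters are hypothetical, and the `opener`, `clock`, and `sleep` arguments exist only so the loop can be exercised without a real server.

```python
import time
import urllib.request


def wait_for_endpoint(url, timeout_s=900.0, interval_s=1.0,
                      opener=urllib.request.urlopen,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll url until it answers with HTTP 200.

    Returns the elapsed seconds on success, or None if timeout_s
    passes first. Connection refused and HTTP error responses are
    treated as "not ready yet" and retried.
    """
    start = clock()
    while clock() - start < timeout_s:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return clock() - start
        except OSError:
            # urllib's URLError and HTTPError both derive from OSError,
            # as does ConnectionRefusedError.
            pass
        sleep(interval_s)
    return None
```

For example, right after starting the service one could call `wait_for_endpoint("http://127.0.0.1:2020/api/v1/metrics/prometheus")` and print the result. The host and port in that URL are assumptions and need to match the HTTP server settings in the actual configuration.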