Fluent bit can't refresh token (Auth2) after an hour with output bigquery #7361

Closed
Dimoniss opened this issue May 11, 2023 · 5 comments · Fixed by #7387

Comments

@Dimoniss

Bug Report

Describe the bug
Fluent Bit can't refresh the OAuth2 token after an hour when using the bigquery output.

To Reproduce

  • Example log message:
[2023/05/11 07:26:33] [error] [output:bigquery:bigquery.1] error creating RSA context
[2023/05/11 07:26:33] [error] [output:bigquery:bigquery.1] JWT signature generation failed
[2023/05/11 07:26:33] [error] [output:bigquery:bigquery.1] cannot retrieve oauth2 token
[2023/05/11 07:26:33] [debug] [upstream] KA connection #63 to www.googleapis.com:443 is now available
[2023/05/11 07:26:33] [debug] [out flush] cb_destroy coro_id=3
[2023/05/11 07:26:33] [debug] [task] task_id=0 reached retry-attempts limit 1/1
[2023/05/11 07:26:33] [error] [engine] chunk '1-1683789975.588212479.flb' cannot be retried: task_id=0, input=tail.0 > output=bigquery.1
[2023/05/11 07:26:33] [debug] [task] destroy task=0x7fe49d4bb380 (task_id=0)
  • Steps to reproduce the problem:
    Fluent Bit obtains the necessary token at startup, but when the token expires an hour later, it is not refreshed.

Expected behavior
Fluent Bit should refresh the token when it expires.

Your Environment

  • Version used:
    Fluent Bit v2.1.2
  • Configuration:
[SERVICE]
    Flush                   10
    Log_Level               Debug

[INPUT]
    Name                        tail
    Tag                         ntripcaster-logs
    Path                        /logs/ntripcaster-*.log
    Read_from_Head              True
    Ignore_Older                24h
    Skip_Empty_Lines            True
    Mem_Buf_Limit               50MB
    Refresh_Interval            10

[FILTER]
    Name                        grep
    Match                       ntripcaster-logs
    Regex                       log .*\[Source died\].*

[OUTPUT]
    Name                        bigquery
    Match                       ntripcaster-logs
    Google_Service_Credentials  /var/secrets/key.json
    Google_Service_Account      [email protected]
    Project_Id                  some-progect
    Dataset_Id                  ntripcaster_logs
    Table_Id                    logs
  • Environment name and version (e.g. Kubernetes? What version?):
    kubernetes
    Client Version: v1.25.4
    Kustomize Version: v4.5.7
    Server Version: v1.25.8-gke.500
@leonardo-albertovich
Collaborator

It seems like there's an issue with the JWT signature creation. Which Linux distribution (and version) is this running on? I'd like to replicate the setup locally if possible.

I wonder if it's one of those "openssl 3 vs openssl 1" kind of scenarios, so any information you can give us about your setup would be really helpful (i.e., did you build fluent-bit yourself? Which package did you install?).

@Dimoniss
Author

Hi @leonardo-albertovich, to run fluent-bit I used the prebuilt image from Docker Hub with the tag 'latest', deployed declaratively with a manifest to a Kubernetes cluster in GCP. I also have fluent-bit configured to send logs to stackdriver, and I use the same service account key for both outputs. The stackdriver output works fine without failures, but when sending to bigquery this error occurs an hour after the container starts.

I hope this helps.

@leonardo-albertovich
Collaborator

Help? You just solved the issue! Those two plugins share a lot of code, and it seems that a long time ago, when I refactored the crypto layer, I missed one line in out_bigquery.

The call chain in line 149 that starts with flb_crypto_sign_simple requires the caller to pass a pointer to a variable that holds the signature buffer size on entry and is updated to contain the signature data length on return.

The issue is that in out_bigquery I didn't properly initialize the variable like I did in out_stackdriver.

Obviously, in some cases sig_len could contain garbage at this point that accidentally makes the call work (without security implications, because the buffer is large enough for the data), but it seems like this is not the case for you.

Btw, the lower-level function that actually performs the signature is OpenSSL's EVP_PKEY_sign.
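
To illustrate the in/out length convention described above, here is a minimal, self-contained sketch. It does not use the Fluent Bit or OpenSSL APIs: mock_sign, sign_initialized, sign_with_stale_len and MOCK_SIG_SIZE are hypothetical names made up for this example, and the 256-byte size simply matches an RSA-2048 signature. The mock follows the same contract OpenSSL documents for EVP_PKEY_sign: on entry the length variable must hold the buffer capacity, and on success it is overwritten with the number of bytes written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_SIG_SIZE 256  /* bytes in an RSA-2048 signature */

/*
 * Hypothetical stand-in for a signing call (NOT a real library function).
 * Contract: on entry *sig_len is the capacity of sig; on success it is
 * overwritten with the number of signature bytes written.
 */
int mock_sign(const unsigned char *data, size_t data_len,
              unsigned char *sig, size_t *sig_len)
{
    assert(sig != NULL && sig_len != NULL);
    (void) data;
    (void) data_len;

    if (*sig_len < MOCK_SIG_SIZE) {
        return -1;  /* caller reported a buffer that is too small */
    }

    memset(sig, 0xAB, MOCK_SIG_SIZE);  /* placeholder "signature" bytes */
    *sig_len = MOCK_SIG_SIZE;
    return 0;
}

/* Correct caller: sig_len is set to the buffer capacity before the call. */
int sign_initialized(size_t *out_len)
{
    unsigned char sig[MOCK_SIG_SIZE];
    size_t sig_len = sizeof(sig);
    int ret;

    ret = mock_sign((const unsigned char *) "payload", 7, sig, &sig_len);
    if (ret == 0) {
        *out_len = sig_len;
    }
    return ret;
}

/*
 * Simulated buggy caller. A real uninitialized variable has an
 * indeterminate value, so this sketch passes the stale value explicitly
 * to keep the behavior well defined.
 */
int sign_with_stale_len(size_t stale_len)
{
    unsigned char sig[MOCK_SIG_SIZE];
    size_t sig_len = stale_len;  /* stands in for "garbage" */

    return mock_sign((const unsigned char *) "payload", 7, sig, &sig_len);
}
```

With this sketch, sign_initialized succeeds and reports 256 bytes, sign_with_stale_len(0) fails, and sign_with_stale_len(4096) happens to succeed because the stale value is larger than the required size while the real buffer is still big enough. That mirrors why the missing initialization can go unnoticed in some runs and fail in others.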

Do you want to send a PR with the fix or would you rather have someone from the team do it?

@Dimoniss
Author

It's great that we caught this error so quickly. I'm not strong enough in C to do a PR, so I'd rather leave it to the professionals )) Could you do it?

@leonardo-albertovich
Collaborator

Sure, I was just about to ping you. Don't worry, someone from the team will take over.

Thanks a lot for taking action, it really makes a difference.
