filter_kubernetes: add option kube_token_ttl (#4352) #4487

Merged
merged 3 commits into from
May 23, 2022

novegit
Contributor

@novegit novegit commented Dec 20, 2021

This option sets how often the token is re-read, both for the default method (reading the service account token file) and for the Kube_Token_Command option. The default is 600 seconds.
Before this change, the token was reloaded only when Kube_Token_Command was set, at a fixed interval of 600 seconds.

Signed-off-by: Michael Voelker [email protected]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

The option sets the re-read frequency of the token for the
default method and for option Kube_Token_Command. Default is 600
seconds.

Signed-off-by: Michael Voelker <[email protected]>
@novegit
Contributor Author

novegit commented Dec 20, 2021

Tested with config:

[SERVICE]
    flush        5
    daemon       Off
    log_level    debug

    parsers_file /fluent-bit/my/parsers.conf
    plugins_file plugins.conf

    # HTTP Server
    # ===========
    # Enable/Disable the built-in HTTP Server for metrics
    http_server  Off
    http_listen  0.0.0.0
    http_port    2020

[INPUT]
    name cpu
    tag  generic_${HOSTNAME}_logging_logging-mux-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
    interval_sec 15

[FILTER]
    Name                kubernetes
    Match               *
    Kube_URL            https://kubernetes.default.svc:443
    Merge_Log           Off
    Keep_Log            Off
    K8S-Logging.Parser  Off
    Kube_Tag_Prefix     generic
    Kube_Meta_Cache_TTL 10
    Kube_Token_TTL      14
    #Kube_Token_Command  /usr/local/bin/cat /var/run/secrets/kubernetes.io/serviceaccount/token


[OUTPUT]
    name  stdout
    match *

Log output:

/fluent-bit/bin/fluent-bit -c /fluent-bit/my/fluent-bit.conf
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/20 14:23:11] [Warning] [config] I cannot open /fluent-bit/my/..2021_12_20_14_22_01.776367954/plugins.conf file
[2021/12/20 14:23:11] [ info] Configuration:
[2021/12/20 14:23:11] [ info]  flush time     | 5.000000 seconds
[2021/12/20 14:23:11] [ info]  grace          | 5 seconds
[2021/12/20 14:23:11] [ info]  daemon         | 0
[2021/12/20 14:23:11] [ info] ___________
[2021/12/20 14:23:11] [ info]  inputs:
[2021/12/20 14:23:11] [ info]      cpu
[2021/12/20 14:23:11] [ info] ___________
[2021/12/20 14:23:11] [ info]  filters:
[2021/12/20 14:23:11] [ info]      kubernetes.0
[2021/12/20 14:23:11] [ info] ___________
[2021/12/20 14:23:11] [ info]  outputs:
[2021/12/20 14:23:11] [ info]      stdout.0
[2021/12/20 14:23:11] [ info] ___________
[2021/12/20 14:23:11] [ info]  collectors:
[2021/12/20 14:23:11] [ info] [engine] started (pid=17)
[2021/12/20 14:23:11] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/12/20 14:23:11] [debug] [storage] [cio stream] new stream registered: cpu.0
[2021/12/20 14:23:11] [ info] [storage] version=1.1.5, initializing...
[2021/12/20 14:23:11] [ info] [storage] in-memory
[2021/12/20 14:23:11] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/20 14:23:11] [ info] [cmetrics] version=0.2.2
[2021/12/20 14:23:11] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2021/12/20 14:23:11] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:23:11] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2021/12/20 14:23:11] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2021/12/20 14:23:11] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2021/12/20 14:23:11] [debug] [http_client] not using http_proxy for header
[2021/12/20 14:23:11] [debug] [http_client] server kubernetes.default.svc:443 will close connection #19
[2021/12/20 14:23:11] [debug] [filter:kubernetes:kubernetes.0] Request (ns=logging, pod=logging-mux-6986665c64-2t989) http_do=0, HTTP Status: 200
[2021/12/20 14:23:11] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2021/12/20 14:23:11] [debug] [stdout:stdout.0] created event channels: read=19 write=20
[2021/12/20 14:23:11] [debug] [router] match rule cpu.0:stdout.0
[2021/12/20 14:23:11] [ info] [sp] stream processor started
[2021/12/20 14:23:11] [debug] [socket] could not validate socket status for #19 (don't worry)
[2021/12/20 14:23:25] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2021/12/20 14:23:25] [debug] [http_client] not using http_proxy for header
[2021/12/20 14:23:25] [debug] [http_client] server kubernetes.default.svc:443 will close connection #25
[2021/12/20 14:23:25] [debug] [filter:kubernetes:kubernetes.0] Request (ns=logging, pod=logging-mux-6986665c64-2t989) http_do=0, HTTP Status: 200
[2021/12/20 14:23:25] [debug] [socket] could not validate socket status for #25 (don't worry)
[2021/12/20 14:23:30] [debug] [task] created task=0x7f37f7445150 id=0 OK
[0] generic_logging-mux-6986665c64-2t989_logging_logging-mux-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log: [1640010205.734955700, {"cpu_p"=>3.411111, "user_p"=>1.166667, "system_p"=>2.244444, "cpu0.p_cpu"=>3.066667, "cpu0.p_user"=>0.800000, "cpu0.p_system"=>2.266667, "cpu1.p_cpu"=>4.066667, "cpu1.p_user"=>1.533333, "cpu1.p_system"=>2.533333, "cpu2.p_cpu"=>3.266667, "cpu2.p_user"=>1.000000, "cpu2.p_system"=>2.266667, "cpu3.p_cpu"=>3.733333, "cpu3.p_user"=>1.466667, "cpu3.p_system"=>2.266667, "cpu4.p_cpu"=>2.933333, "cpu4.p_user"=>0.733333, "cpu4.p_system"=>2.200000, "cpu5.p_cpu"=>3.600000, "cpu5.p_user"=>1.533333, "cpu5.p_system"=>2.066667, "kubernetes"=>{"pod_name"=>"logging-mux-6986665c64-2t989", "namespace_name"=>"logging", "pod_id"=>"ddfa1232-7591-40b7-91fc-f2305424c982", "labels"=>{"app"=>"fluentbit", "component"=>"mux", "logging-infra"=>"mux", "pod-template-hash"=>"6986665c64", "provider"=>"openshift", "service"=>"mux"}, "host"=>"docker-desktop", "container_name"=>"logging-mux", "docker_id"=>"aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6"}}]
[2021/12/20 14:23:30] [debug] [out flush] cb_destroy coro_id=0
[2021/12/20 14:23:30] [debug] [task] destroy task=0x7f37f7445150 (task_id=0)
[2021/12/20 14:23:40] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2021/12/20 14:23:40] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:23:40] [debug] [http_client] not using http_proxy for header
[2021/12/20 14:23:40] [debug] [http_client] server kubernetes.default.svc:443 will close connection #25
[2021/12/20 14:23:40] [debug] [filter:kubernetes:kubernetes.0] Request (ns=logging, pod=logging-mux-6986665c64-2t989) http_do=0, HTTP Status: 200
[2021/12/20 14:23:40] [debug] [task] created task=0x7f37f7445380 id=0 OK
[0] generic_logging-mux-6986665c64-2t989_logging_logging-mux-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log: [1640010220.701014500, {"cpu_p"=>3.466667, "user_p"=>1.355556, "system_p"=>2.111111, "cpu0.p_cpu"=>3.266667, "cpu0.p_user"=>1.133333, "cpu0.p_system"=>2.133333, "cpu1.p_cpu"=>3.333333, "cpu1.p_user"=>1.200000, "cpu1.p_system"=>2.133333, "cpu2.p_cpu"=>3.200000, "cpu2.p_user"=>1.066667, "cpu2.p_system"=>2.133333, "cpu3.p_cpu"=>3.400000, "cpu3.p_user"=>1.466667, "cpu3.p_system"=>1.933333, "cpu4.p_cpu"=>3.600000, "cpu4.p_user"=>1.266667, "cpu4.p_system"=>2.333333, "cpu5.p_cpu"=>4.000000, "cpu5.p_user"=>2.000000, "cpu5.p_system"=>2.000000, "kubernetes"=>{"pod_name"=>"logging-mux-6986665c64-2t989", "namespace_name"=>"logging", "pod_id"=>"ddfa1232-7591-40b7-91fc-f2305424c982", "labels"=>{"app"=>"fluentbit", "component"=>"mux", "logging-infra"=>"mux", "pod-template-hash"=>"6986665c64", "provider"=>"openshift", "service"=>"mux"}, "host"=>"docker-desktop", "container_name"=>"logging-mux", "docker_id"=>"aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6"}}]
[2021/12/20 14:23:40] [debug] [socket] could not validate socket status for #25 (don't worry)

grep "token updated"  logoutput
[2021/12/20 14:23:55] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:24:10] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:24:25] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:24:40] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:24:55] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:25:10] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2021/12/20 14:25:25] [ info] [filter:kubernetes:kubernetes.0]  token updated

Valgrind

==38==  Address 0x5e541a0 is in a rw- anonymous segment
==38==
==38==
==38== HEAP SUMMARY:
==38==     in use at exit: 195,972 bytes in 4,324 blocks
==38==   total heap usage: 13,007 allocs, 8,684 frees, 8,132,802 bytes allocated
==38==
==38== LEAK SUMMARY:
==38==    definitely lost: 0 bytes in 0 blocks
==38==    indirectly lost: 0 bytes in 0 blocks
==38==      possibly lost: 0 bytes in 0 blocks
==38==    still reachable: 195,972 bytes in 4,324 blocks
==38==         suppressed: 0 bytes in 0 blocks
==38== Reachable blocks (those to which a pointer was found) are not shown.
==38== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==38==

@patrick-stephens
Contributor

Ignore the ActionLint failure, this is resolved on master now.

@nokute78
Collaborator

@novegit As I understand it, this patch changes the following points.

  1. Add option kube_token_ttl to replace the fixed FLB_KUBE_TOKEN_TTL
  2. Update the token regularly even if Kube_Token_Command is not set, since ctx->kube_token_create will be greater than 0.

Is point 2 a breaking change?

This may be a pointless question, since I'm not familiar with k8s.

@novegit
Contributor Author

novegit commented Dec 22, 2021

When 'Kube_Token_Command' is used to fetch the authorization token, nothing changes: the token was already reloaded every 600s, but the interval can now be configured to other values.
For the built-in default method of fetching the token (reading it inside the pod from /var/run/secrets/kubernetes.io/serviceaccount/token), the behavior changes: the token will now also be re-read every 600s. The reason is that in recent Kubernetes versions the lifetime of the auth token inside the pod can also be limited (as described in #4352), so the best option is to read the auth token from that file regularly, not only once at startup.
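
The re-read behavior described in this comment could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `struct token_state`, `token_expired` and `token_refresh_if_needed` are hypothetical names, and the strict "greater than" comparison against the TTL is an assumption.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical cached-token state; the names are illustrative only. */
struct token_state {
    char   token[4096];
    time_t created;   /* time of the last successful read; 0 = never read */
    int    ttl;       /* re-read interval in seconds */
};

/* Returns 1 if the cached token should be re-read at time 'now'.
 * Assumption: the token counts as stale only once MORE than 'ttl'
 * seconds have passed since it was read. */
static int token_expired(const struct token_state *st, time_t now)
{
    return st->created == 0 || now > st->created + st->ttl;
}

/* Re-reads the token file if the TTL has elapsed.
 * Returns 1 if the token was refreshed, 0 if the cached copy is still
 * considered fresh, and -1 if the file could not be opened. */
static int token_refresh_if_needed(struct token_state *st,
                                   const char *path, time_t now)
{
    FILE *fp;
    size_t n;

    if (!token_expired(st, now)) {
        return 0;
    }

    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    n = fread(st->token, 1, sizeof(st->token) - 1, fp);
    fclose(fp);

    st->token[n] = '\0';
    st->created = now;
    return 1;
}
```

In this sketch the check runs lazily, whenever the caller is about to contact the API server, rather than on a timer. Under that assumption, a TTL of 14 seconds with a request every 15 seconds would refresh on nearly every request, which would be consistent with the "token updated" lines in the test log above.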

@patrick-stephens
Contributor

For the built-in default method of fetching the token (reading it inside the pod from /var/run/secrets/kubernetes.io/serviceaccount/token), the behavior changes: the token will now also be re-read every 600s. The reason is that in recent Kubernetes versions the lifetime of the auth token inside the pod can also be limited (as described in #4352), so the best option is to read the auth token from that file regularly, not only once at startup.

Yeah I think this is perfectly acceptable - presumably you could set a longer interval as well. We must be assuming the best practices are followed for regular credential rotation.

@edsiper
Member

edsiper commented Jan 12, 2022

@novegit FYI: we are going to do the cut shortly, there are some pending changes

@novegit
Contributor Author

novegit commented Jan 12, 2022

@edsiper what's missing?
The docs PR has been created.
patrick-stephens said to ignore the ActionLint failure.
I just changed the default value for kube_token_ttl from 60s to 600s (the 60s value was only intended for my tests).

@patrick-stephens
Contributor

@edsiper looks like this got missed

@jyotimahapatra

@edsiper This change is required for fluentbit running in EKS 1.21 clusters. I'm interested in getting this in. Please do let us know what else the PR requires to get the change in.

{
FLB_CONFIG_MAP_INT, "kube_token_ttl", "600",
0, FLB_TRUE, offsetof(struct flb_kube, kube_token_ttl),
"kubelet token ttl"
Contributor

This description should state the unit: is this seconds, minutes, or hours?

PettitWesley
PettitWesley previously approved these changes Apr 28, 2022
@PettitWesley
Contributor

@patrick-stephens @edsiper Can we get this one merged and released? This is needed by AWS EKS customers. CC @lubingfeng

@patrick-stephens
Contributor

I'm afraid I have no super powers for this one: CI is signed off though so I'll check with @edsiper

@@ -847,6 +847,11 @@ static struct flb_config_map config_map[] = {
0, FLB_TRUE, offsetof(struct flb_kube, kubelet_port),
"kubelet port to connect with when using kubelet"
},
{
FLB_CONFIG_MAP_INT, "kube_token_ttl", "60",
Member

time based properties must use FLB_CONFIG_MAP_TIME type

if the original timeout was 10 minutes, now you are defaulting to 1 minute (60 seconds). That's a breaking change, it should keep the old defaults

Member

Ping

@@ -847,6 +847,11 @@ static struct flb_config_map config_map[] = {
0, FLB_TRUE, offsetof(struct flb_kube, kubelet_port),
"kubelet port to connect with when using kubelet"
},
{
FLB_CONFIG_MAP_INT, "kube_token_ttl", "600",
Member

this should be FLB_CONFIG_MAP_TIME, then you can use 10m as the default

Contributor

@PettitWesley PettitWesley May 11, 2022

If we do this, the field on the struct should be uint64_t ?

Contributor

Never mind, flb_utils_time_to_seconds returns an int. (which I think means the time config map type won't work well on systems where an integer is smaller than 32 bits, since a 16 bit integer can only hold enough seconds for about 1 day if I did my math right)
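
As a quick check of the estimate in this comment, the arithmetic can be sketched as below. The helper names `whole_hours` and `whole_days` are hypothetical and only illustrate the calculation. By this count a signed 16-bit integer holds 32767 seconds (a little over 9 hours) and an unsigned one 65535 seconds (a little over 18 hours), so both fall somewhat short of a full day, while a signed 32-bit integer covers roughly 68 years.

```c
#include <stdint.h>

#define SECONDS_PER_HOUR 3600L
#define SECONDS_PER_DAY  86400L

/* Whole hours that fit into 'seconds' (integer division, rounds down). */
static long whole_hours(long seconds)
{
    return seconds / SECONDS_PER_HOUR;
}

/* Whole days that fit into 'seconds' (integer division, rounds down). */
static long whole_days(long seconds)
{
    return seconds / SECONDS_PER_DAY;
}
```

For example, `whole_hours(INT16_MAX)` evaluates to 9 and `whole_days(INT32_MAX)` to 24855.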

@PettitWesley
Contributor

@novegit If you make the simple change as shown in this commit we can merge this PR: PettitWesley@0527e1a

@novegit
Contributor Author

novegit commented May 12, 2022

Sorry, it's not really clear to me what to use now. The last commit had FLB_CONFIG_MAP_INT / "600" (i.e. 10m), so I have now changed it to FLB_CONFIG_MAP_TIME / "10m".
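
To illustrate why "600" and "10m" can mean the same default, here is a small sketch of a duration parser. It is not Fluent Bit's implementation: `parse_duration_seconds` is a hypothetical helper, the set of accepted suffixes (s, m, h, d) is an assumption, and overflow handling is left out for brevity.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: converts strings such as "600", "10m" or "1h"
 * into a number of seconds. Returns -1 for input it does not understand.
 * The suffix set is an assumption made for this sketch. */
static long parse_duration_seconds(const char *s)
{
    char *end;
    long value;
    long mult;

    if (s == NULL || !isdigit((unsigned char) *s)) {
        return -1;
    }

    value = strtol(s, &end, 10);

    switch (*end) {
    case '\0':
        return value;        /* bare number: already in seconds */
    case 's':
        mult = 1;
        break;
    case 'm':
        mult = 60;
        break;
    case 'h':
        mult = 3600;
        break;
    case 'd':
        mult = 86400;
        break;
    default:
        return -1;           /* unknown unit */
    }

    if (end[1] != '\0') {
        return -1;           /* trailing characters after the unit */
    }
    return value * mult;
}
```

With this sketch, `parse_duration_seconds("10m")` and `parse_duration_seconds("600")` both yield 600, so switching the default from "600" to "10m" keeps the old 10-minute interval.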

@PettitWesley
Contributor

PettitWesley commented May 12, 2022

@novegit This looks correct/as requested to me now. Thanks!

@stevehipwell

@edsiper is there any chance of getting this merged ASAP?

@edsiper
Member

edsiper commented May 23, 2022

Thanks

@stevehipwell

Should this PR have closed #5445?

@PettitWesley
Contributor

PettitWesley commented May 24, 2022

@stevehipwell Maybe, yes, but it's also in many ways ideal not to close the issue until it's released, so customers have something to search for and find. Of course, GitHub makes it easier to auto-close on merge, so in general I think this repo does that most of the time.

@lecaros lecaros added this to the Fluent Bit v1.9.4 milestone May 26, 2022
mgeriesa pushed a commit to mgeriesa/fluent-bit that referenced this pull request Oct 25, 2022
* filter_kubernetes: add option kube_token_ttl

The option sets the re-read frequency of the token for the
default method and for option Kube_Token_Command. Default is 600
seconds.

Signed-off-by: Michael Voelker <[email protected]>

* filter_kubernetes: set kube_token_ttl default to 600s

Signed-off-by: Michael Voelker <[email protected]>

* filter_kubernetes: use FLB_CONFIG_MAP_TIME for kube_token_ttl config

Signed-off-by: Michael Voelker <[email protected]>
Signed-off-by: Manal Geries <[email protected]>
demonccc pushed a commit to demonccc/fluent-bit that referenced this pull request Feb 14, 2023
* filter_kubernetes: add option kube_token_ttl

The option sets the re-read frequency of the token for the
default method and for option Kube_Token_Command. Default is 600
seconds.

Signed-off-by: Michael Voelker <[email protected]>

* filter_kubernetes: set kube_token_ttl default to 600s

Signed-off-by: Michael Voelker <[email protected]>

* filter_kubernetes: use FLB_CONFIG_MAP_TIME for kube_token_ttl config

Signed-off-by: Michael Voelker <[email protected]>
Signed-off-by: a445943 <[email protected]>