Enable load shedding for kubernetes_log source #18784

Open
mdbenjam opened this issue Oct 5, 2023 · 3 comments
Labels
  • source: kubernetes_logs (anything `kubernetes_logs` source related)
  • type: feature (a value-adding code addition that introduces new functionality)

Comments

@mdbenjam

mdbenjam commented Oct 5, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

When running Vector as a Kubernetes DaemonSet, a single pod that writes large quantities of logs can degrade performance for the logs from the other pods on a node, and eventually can lead to pods being evicted from the node due to disk pressure.

Vector holds on to the file descriptors of log files that it has not finished processing. If a pod generates more logs per second than Vector can parse, Vector falls further behind and keeps those descriptors open, which prevents rotated log files from actually being removed from disk. Over time this can exhaust the disk space on the node and cause pods to be evicted.
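The underlying mechanism is general POSIX behavior rather than anything specific to Vector: deleting a file only removes its directory entry, and the data stays allocated until the last open descriptor is closed. A minimal Python sketch of that behavior, assuming a Linux or other POSIX system (the function name is made up for illustration):

```python
import os
import tempfile


def unlinked_file_still_readable() -> tuple[int, bytes]:
    """Delete a file while a descriptor is open; return (link count, data read)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"log line\n")
        os.unlink(path)                # roughly what cleanup of a rotated log does
        links = os.fstat(fd).st_nlink  # no directory entry points at the file now
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 1024)       # the contents are still allocated and readable
        return links, data
    finally:
        os.close(fd)                   # only after this can the space be reclaimed
```

As long as a reader in this position never closes the descriptor, the space used by the "deleted" file is not returned to the filesystem.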

To prevent this, Vector needs some way to shed load, ideally in an equitable way that sheds load from noisy pods first.

Attempted Solutions

No response

Proposal

One way to address this issue is to add a new max_open_rotated_files_per_pod configuration to the kubernetes_logs source. This would allow users to define the maximum number of files Vector could track for a given pod.

Example:

Given:

  • max_open_rotated_files_per_pod=2 and oldest_first=true
  • Pod foo outputs logs faster than Vector can process them
    • After an hour, Vector has fallen behind: the logs have been rotated twice, but Vector is still reading from the oldest file.
      log_file    <--- current file, pod `foo` is writing to this
      log_file_1
      log_file_2  <--- oldest file, Vector is currently reading from this

Vector is now tracking 3 files for pod foo, but max_open_rotated_files_per_pod is set to 2, so Vector will stop tracking the oldest file, which allows the system to remove it.
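The behavior in this example could be sketched roughly as follows. This is an illustrative Python model, not Vector's implementation: the `PodFileTracker` class and its `track` method are hypothetical names, and the limit is assumed to count every tracked file for a pod, as in the example above.

```python
from collections import defaultdict, deque


class PodFileTracker:
    """Hypothetical model of a per-pod cap on tracked log files."""

    def __init__(self, max_open_rotated_files_per_pod: int):
        self.limit = max_open_rotated_files_per_pod
        self.files = defaultdict(deque)  # pod name -> tracked files, oldest first

    def track(self, pod: str, path: str) -> list[str]:
        """Start tracking a new file for `pod`; return the files that were shed."""
        tracked = self.files[pod]
        tracked.append(path)
        shed = []
        while len(tracked) > self.limit:
            # A real implementation would close the file descriptor here so the
            # rotated file can actually be removed from disk.
            shed.append(tracked.popleft())
        return shed
```

With a limit of 2, tracking a third file for pod `foo` sheds the oldest one, and other pods are unaffected because each pod has its own queue.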

Caveats

This setting will lead to log loss, which should be called out in the documentation. If the setting is added, a corresponding metric should be added so that users can see how many log files are left unread.
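One possible shape for such a metric, sketched in Python purely for illustration. The `ShedMetrics` class and its counter names are hypothetical and do not correspond to any existing Vector metric; tracking unread bytes alongside unread files is an extra assumption beyond what is proposed here.

```python
from collections import Counter


class ShedMetrics:
    """Hypothetical per-pod counters for log data dropped by load shedding."""

    def __init__(self):
        self.files_unread = Counter()  # pod name -> files abandoned before EOF
        self.bytes_unread = Counter()  # pod name -> bytes never read

    def record_shed(self, pod: str, file_size: int, read_offset: int) -> None:
        """Record that a file of `file_size` bytes was dropped at `read_offset`."""
        self.files_unread[pod] += 1
        self.bytes_unread[pod] += max(file_size - read_offset, 0)
```

Keying the counters by pod would let users see which workloads are losing logs rather than only a node-wide total.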

References

Version

vector 0.33.0

@mdbenjam mdbenjam added the type: feature A value-adding code addition that introduce new functionality. label Oct 5, 2023
@jcantrill

@syedriko please push an upstream patch for ViaQ#154 to jump start the discussion and move into the upstream

@neuronull neuronull added the source: kubernetes_logs Anything `kubernetes_logs` source related label Oct 18, 2023
@syedriko
Contributor

> @syedriko please push an upstream patch for ViaQ#154 to jump start the discussion and move into the upstream

Here it is: #18904

@benjaminhuo

cc @wanjunlei
