Proposal: Cluster Autoscaler Debugging Snapshot #4346
Comments
I think this sounds like an interesting idea and I could certainly see using it occasionally. I do have some concerns about the size of the data and how it gets returned to the user. A couple of questions:
Cluster Autoscaler Debugging Snapshot [Toil Reduction]
Author:
Introduction
With the growing number of large, autoscaled clusters we are increasingly dealing with complex cases that are very hard to debug. One major difficulty is that we log information about what decision Cluster Autoscaler (CA) took as well as the results of various intermediate steps that lead to this decision, but we don't log the data this decision was based on. The reason for this is that CA is internally simulating the behavior of the entire cluster and to fully understand any given decision we need to know the exact state of relevant nodes and pods (possibly all nodes and pods in the cluster) as well as certain k8s objects. The volume of logging required to capture all that data would be prohibitive.
This document proposes introducing a new "snapshot" feature to Cluster Autoscaler. This feature would allow an engineer debugging an autoscaler issue to manually trigger CA to dump its internal state.
Proposal
The snapshot tool will use a manual trigger mechanism based on an HTTP endpoint: the trigger causes CA to collect the debugging information during its following run cycle and return it as a JSON-parsable HTTP response.
Trigger
The debugging snapshot is captured only when a trigger is received from the user. A manual trigger is used instead of automated data collection because capturing the snapshot can take a long time in a large cluster, affecting the performance of CA and increasing its latency. An HTTP request is used as the trigger: a new HTTP endpoint in CA receives the trigger as an API call. This allows parameters to be passed to extend the trigger later, and makes it easy to return an error code if the request fails.
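To make the flow concrete, here is a minimal sketch of how such an endpoint could be wired, assuming hypothetical names (SnapshotTrigger, /snapshotz, which are not part of the proposal): the handler hands the request to the CA loop over a channel and blocks until the next run cycle delivers the JSON-encoded snapshot.

```go
// Hypothetical sketch; names and the endpoint path are illustrative only.
package debuggingsnapshot

import "net/http"

// SnapshotRequest is handed to the CA loop; the next run cycle writes the
// JSON-encoded snapshot into the response channel.
type SnapshotRequest struct {
	Response chan []byte
}

// SnapshotTrigger is the HTTP handler that receives the manual trigger as an
// API call. Query parameters could later be used to extend the trigger.
type SnapshotTrigger struct {
	Requests chan SnapshotRequest // consumed by the CA run loop
}

func (t *SnapshotTrigger) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	req := SnapshotRequest{Response: make(chan []byte, 1)}
	select {
	case t.Requests <- req: // picked up at the start of the next run cycle
	default:
		// Returning an error code on failure is straightforward over HTTP.
		http.Error(w, "a snapshot request is already pending", http.StatusTooManyRequests)
		return
	}
	select {
	case data := <-req.Response:
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(data)
	case <-r.Context().Done():
		http.Error(w, "request cancelled", http.StatusRequestTimeout)
	}
}

// The trigger would be registered from CA's HTTP server setup, for example:
//   http.Handle("/snapshotz", &SnapshotTrigger{Requests: make(chan SnapshotRequest, 1)})
```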
Method of Formatting
The data in the snapshot will be very large, and not all of the fields are always relevant. This poses the problem of choosing between how much data we include and how readable that data will be. We want to avoid a situation where a debugging snapshot is missing the fields relevant to an investigation, but that may not always be possible.
The proposed way is to make the snapshot parsable JSON.
All the relevant data collected will be encapsulated as JSON fields. The snapshot will contain all the elements that can be captured.
This file may not be easily human readable, but some additional tooling (e.g. jq) can extract a readable format from this “full” snapshot. This should bridge the gap between how much data can be pushed into the snapshot and its readability. It also gives a quick turnaround time to create a “new readable” view, which won’t require any changes to the code or long waiting times for Kubernetes releases. It also gives the ability to regenerate a new readable view from older data.
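As a rough illustration of what such a JSON snapshot could look like (none of the type or field names below are final, and the per-node data is heavily simplified), every captured element sits under its own named field, so tools like jq can pull out readable subsets after the fact:

```go
// Hypothetical, simplified snapshot layout; the real structure would carry
// full node and pod objects rather than the placeholder fields shown here.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// NodeSummary is a stand-in for the full per-node data (resources, labels,
// taints, scheduled pods, ...) described above.
type NodeSummary struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
	Taints []string          `json:"taints,omitempty"`
	Pods   []string          `json:"pods,omitempty"`
}

// DebuggingSnapshot groups every captured element under a named JSON field.
type DebuggingSnapshot struct {
	NodeList                      []NodeSummary `json:"nodeList"`
	UnscheduledPodsCanBeScheduled []string      `json:"unscheduledPodsCanBeScheduled"`
	StartTimestamp                time.Time     `json:"startTimestamp"`
	EndTimestamp                  time.Time     `json:"endTimestamp"`
}

func main() {
	s := DebuggingSnapshot{
		NodeList:       []NodeSummary{{Name: "node-1", Labels: map[string]string{"zone": "a"}}},
		StartTimestamp: time.Now(),
		EndTimestamp:   time.Now(),
	}
	out, err := json.MarshalIndent(&s, "", "  ")
	if err != nil {
		panic(err)
	}
	// The full snapshot would be much larger; a readable subset can be pulled
	// out afterwards, e.g. with jq, without any CA code changes.
	fmt.Println(string(out))
}
```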
Data Collection and Workflow
There are two data points proposed (so far) to be collected:
List of all NodeInfo. This will be collected in static_autoscaler:RunOnce(), after all the upcoming nodes have been added. It will contain all the nodes, including properties such as resources, labels, and taints. It will also have all pods scheduled on each of the nodes and all the properties related to each pod.
List of all UnschedulablePodsCanBeScheduled. These are collected in filter_out_schedulable:Process(). This is the list of Pods that CA considers will fit on existing / upcoming node(s), and therefore does not consider as part of the scale up.
The final list of data points and the location where they are set may change based on implementation details.
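As a rough sketch of this workflow (the function and interface names below are hypothetical; the in-core interface itself is discussed in the next paragraphs), the two collection hooks could look like this:

```go
// Hypothetical hook points; the snapshotter only records data when a snapshot
// has been triggered for the current iteration.
package core

import (
	apiv1 "k8s.io/api/core/v1"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// snapshotter is a placeholder for the in-core interface discussed below.
type snapshotter interface {
	SetNodeInfo(nodeInfos []*schedulerframework.NodeInfo)
	SetUnscheduledPodsCanBeScheduled(pods []*apiv1.Pod)
}

// Called from static_autoscaler RunOnce() after the upcoming nodes are added:
// the NodeInfo list already carries every node, its properties (resources,
// labels, taints, ...) and the pods scheduled on it.
func captureNodeInfos(s snapshotter, nodeInfos []*schedulerframework.NodeInfo) {
	s.SetNodeInfo(nodeInfos)
}

// Called from filter_out_schedulable Process(): pods that CA expects to fit on
// existing or upcoming nodes, and therefore excludes from the scale up.
func captureSchedulablePods(s snapshotter, pods []*apiv1.Pod) {
	s.SetUnscheduledPodsCanBeScheduled(pods)
}
```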
There are also design decisions about how the interface will be used. We need an adaptive interface that can capture all the data fields and be extensible for each cloud provider.
There will be one in-core interface which cloud providers will have to extend to add extra values from the cloud-provider code. For each data field added, a new function is created in the cloud-provider interface with the correct data type for that field.
This actively avoids using a generic function that takes interface{} as an argument, such as func AddExtraData(d interface{}) {}. This is done to increase the readability of the code and to keep the functions strongly typed.
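A minimal sketch of that shape, assuming hypothetical interface and method names, with one strongly typed setter per data field and an embedded extension interface for cloud-provider-specific fields:

```go
// Hypothetical interface names; shown only to illustrate the typed-setter design.
package debuggingsnapshot

import (
	apiv1 "k8s.io/api/core/v1"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// DebuggingSnapshotter is the in-core interface: one function per data field,
// each with the concrete type of that field, instead of a single generic
// AddExtraData(d interface{}).
type DebuggingSnapshotter interface {
	SetNodeInfo(nodeInfos []*schedulerframework.NodeInfo)
	SetUnscheduledPodsCanBeScheduled(pods []*apiv1.Pod)
}

// CloudProviderSnapshotter is a hypothetical extension point: a cloud provider
// embeds the in-core interface and adds typed setters for its own fields.
type CloudProviderSnapshotter interface {
	DebuggingSnapshotter
	// Example provider-specific field; the name and type are illustrative only.
	SetNodeGroupTargetSizes(sizes map[string]int)
}
```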