Standardize how our CI creates debug logs #125

Open
ca-scribner opened this issue Feb 1, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@ca-scribner
Contributor

Context

The way charm repo CI generates debug logs is not standardized across the charm repos: some repos use the dump-charm-debug-artifacts action, others define their own logging, and so on. This should be standardized.

The cost here is that, for most repos, CI failures do not come with enough logging to debug the issue. This means engineers burn time having to reproduce CI errors locally, slowing down how quickly they can triage and fix issues.

However this problem is solved, the solution must be general enough to apply to every repo. No (or at worst, minimal) repo-level configuration should be necessary. This way we cannot forget to add a particular application, check a particular log, etc., in a specific repo. We should have a cookiecutter way to add CI logging to any charm repo from a template.

What needs to get done

  1. define a consistent way all charms should print debug logs (either with dump-charm-debug-artifacts or something else)
  2. deploy this across all charmed kubeflow repos

Definition of Done

  1. all charmed kubeflow repos have a common debug log mechanism

@ca-scribner added the enhancement label on Feb 1, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5290.

This message was autogenerated

@orfeas-k
Contributor

Referencing this commit that adds logging to the CI: canonical/oidc-gatekeeper-operator@b7821c0

@NohaIhab
Contributor

NohaIhab commented Sep 26, 2024

Expanding on the context provided by @ca-scribner, we also need a standardized way of providing workload logs when the CI fails. There is the Dump logs action, but it has its limitations.

Dump logs action limitations

  1. It is a huge, cluttered dump of:
  • juju debug logs
  • kubernetes pods descriptions in all namespaces
  • kubernetes pods logs in all namespaces for all containers
  • kubernetes deployments description in all namespaces
  • kubernetes replicaset description in all namespaces
  • kubernetes nodes information
  • charmcraft logs

The above contains many logs that are almost never relevant to the failure (for example, those from the kube-system namespace). This makes the action's output very difficult to parse, and the root cause of the failure gets lost.
It also means that the charm container logs are duplicated: they appear in both the Juju debug logs and the Kubernetes logs.
It would be nicer to have the different logs separated from each other in the GitHub CI, i.e. split into categories (Kubernetes, Juju, Charmcraft), so we can immediately access what we need.
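
As an illustration of the categorization idea above, here is a minimal sketch in Python. It is not how the existing action works. The function names, the ignored-namespace list, and the output file names are all hypothetical. It only relies on `kubectl describe` and `juju debug-log --replay --no-tail`. Charmcraft logs are left out because they are files to copy rather than command output.

```python
import subprocess
from pathlib import Path

# Assumption: namespaces that are rarely relevant to a charm test failure.
IGNORED_NAMESPACES = {"kube-system", "kube-public", "kube-node-lease"}


def relevant_namespaces(all_namespaces, ignored=IGNORED_NAMESPACES):
    """Drop ignored namespaces while preserving the original order."""
    return [ns for ns in all_namespaces if ns not in ignored]


def build_commands(model, namespaces):
    """Map each log category to a list of (output file name, command) pairs."""
    kubernetes = []
    for ns in namespaces:
        kubernetes.append(
            (f"{ns}-pods.txt", ["kubectl", "describe", "pods", "-n", ns])
        )
        kubernetes.append(
            (f"{ns}-deployments.txt", ["kubectl", "describe", "deployments", "-n", ns])
        )
    juju = [
        ("debug-log.txt", ["juju", "debug-log", "-m", model, "--replay", "--no-tail"])
    ]
    return {"juju": juju, "kubernetes": kubernetes}


def collect(commands, out_dir):
    """Run every command and write its output under out_dir/<category>/."""
    for category, entries in commands.items():
        category_dir = Path(out_dir) / category
        category_dir.mkdir(parents=True, exist_ok=True)
        for file_name, cmd in entries:
            # check=False: a failing command should not abort the whole dump.
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            (category_dir / file_name).write_text(result.stdout + result.stderr)
```

A CI step could then call `collect(build_commands("kubeflow", relevant_namespaces(names)), "debug-logs")` and upload one directory per category, where `names` is the namespace list obtained from the cluster.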

  2. It often misses the logs we actually need. Let's take kserve-controller as an example. If the integration tests fail, what I would like to see highlighted is:
  • the logs of kserve-controller pod, for both charm AND workload container
  • the details of the ConfigMap created by the controller
  • the details of the InferenceServices resources in the cluster
  • the logs from the InferenceService Pods created in the tests
  • the relation data between kserve-controller and resource-dispatcher charm
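
The wish list above could translate into a handful of targeted commands. The sketch below is only an illustration under several assumptions: the function name is hypothetical, the charm pod is assumed to carry the `app.kubernetes.io/name=<app>` label, and InferenceService pods are assumed to carry the `serving.kserve.io/inferenceservice` label. Both labels should be verified against the actual deployment before relying on them.

```python
def kserve_debug_commands(
    namespace,
    model,
    unit="kserve-controller/0",
    isvc_label="serving.kserve.io/inferenceservice",  # assumption, verify in cluster
):
    """Return a mapping of log name -> command for a failed kserve-controller test."""
    app = unit.split("/")[0]
    return {
        # Charm AND workload containers of the controller pod.
        "controller-logs": [
            "kubectl", "logs", "-n", namespace,
            "-l", f"app.kubernetes.io/name={app}",  # assumption: label set on the charm pod
            "--all-containers", "--prefix", "--tail=-1",
        ],
        # ConfigMaps in the namespace, including the one created by the controller.
        "configmaps": ["kubectl", "get", "configmaps", "-n", namespace, "-o", "yaml"],
        # InferenceService resources across the cluster.
        "inferenceservices": [
            "kubectl", "get", "inferenceservices", "--all-namespaces", "-o", "yaml",
        ],
        # Logs of the pods backing any InferenceService.
        "inferenceservice-pod-logs": [
            "kubectl", "logs", "--all-namespaces" if False else "-n", namespace,
            "-l", isvc_label, "--all-containers", "--prefix", "--tail=-1",
        ][0:2] + ["-n", namespace, "-l", isvc_label, "--all-containers", "--prefix", "--tail=-1"],
        # Relation data, e.g. with the resource-dispatcher charm.
        "relation-data": ["juju", "show-unit", unit, "-m", model],
    }
```

Because these commands must run before the test resources are cleaned up, they would need to be invoked from a test teardown hook or a failure handler rather than from a later CI step.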

It can be challenging to optimize the logs of each repo this way, because every charm applies different manifests and creates different CRDs. Potentially, we could use something like this to view all resources of all kinds, rather than only what kubectl get all returns, and filter the result to the namespaces we are interested in (test namespace + user namespace if applicable).
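
One commonly used way to list resources of every kind, which may or may not match the approach referenced above, is to ask the API server for all listable namespaced kinds and pass them to `kubectl get`. The sketch below assumes that approach, and its helper names are hypothetical. It only builds the commands, so it can be reviewed without a cluster.

```python
# Lists every namespaced resource kind that supports the "list" verb,
# including CRD-backed kinds, one name per line.
LIST_KINDS_CMD = ["kubectl", "api-resources", "--verbs=list", "--namespaced", "-o", "name"]


def parse_api_resources(output):
    """Turn the output of LIST_KINDS_CMD into a list of resource names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_all_commands(kinds, namespaces):
    """Build one `kubectl get` command per namespace covering every kind."""
    joined = ",".join(kinds)
    return [
        ["kubectl", "get", joined, "-n", ns, "--ignore-not-found", "--show-kind"]
        for ns in namespaces
    ]
```

Restricting `namespaces` to the test namespace (plus the user namespace where applicable) keeps the output small while still covering custom resources that `kubectl get all` leaves out.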

The dump-charm-debug-artifacts workflow solves some of these issues, but not all of them. We need to evaluate whether we can enhance it or would rather build a completely new tool.

@NohaIhab
Contributor

NohaIhab commented Sep 26, 2024

  • The dump-charm-debug-artifacts action is currently not working as expected: the dump artifact gets overwritten by each job. There needs to be a separate artifact for each job that calls the dump action.
  • The logs are not collected in real time while the tests run, but only after they finish, so the Events are missing from the dump
  • The resources created from the CRDs are often cleaned up before the tests finish, so they are never visible in the logs
  • The action does not collect the workload logs
  • We would like to properly structure and organize:
    • The logs being dumped in the CI
    • The file structure of the logs saved in the artifact
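
To make the per-job artifact and file-structure points concrete, here is a small sketch. The naming scheme, the category list, and both function names are assumptions for illustration, not an agreed convention. The idea is that a name derived from the job and its matrix values is unique per job, so one job's upload cannot overwrite another's.

```python
import re


def artifact_name(job, matrix_values=(), run_attempt=1):
    """Build a per-job artifact name from the job id and its matrix values."""
    parts = ["debug-logs", job, *map(str, matrix_values), f"attempt-{run_attempt}"]
    # Replace anything outside a conservative character set with a dash.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", "-".join(parts)).strip("-")


def artifact_layout(root, categories=("juju", "kubernetes", "charmcraft", "workload")):
    """Return the directories that make up one artifact, one per log category."""
    return [f"{root}/{category}" for category in categories]
```

For example, `artifact_name("integration", ["kserve-controller", "1.29/stable"])` yields a different name than the same job run for another charm, and `artifact_layout("debug-logs")` gives the directories a collector would fill before upload.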
