[self-hosted] 2022.08.0 support bundle does not contain pod logs #13095

Closed
adrienthebo opened this issue Sep 19, 2022 · 11 comments · Fixed by #13233
Assignees
Labels
team: delivery Issue belongs to the self-hosted team type: bug Something isn't working

Comments

@adrienthebo
Contributor

adrienthebo commented Sep 19, 2022

Bug description

In Gitpod 2022.07.x, support bundles contained logs and log-collector directories containing logs of running pods and prior pods, respectively. Recent troubleshooting has shown that support bundles created from Gitpod 2022.08.0 don't contain these directories.

Steps to reproduce

Create a support bundle from a Gitpod 2022.08.0 installation and confirm that it is missing the expected log directories.

Workspace affected

No response

Expected behavior

No response

Example repository

No response

Anything else?

No response


@adrienthebo adrienthebo added type: bug Something isn't working team: delivery Issue belongs to the self-hosted team labels Sep 19, 2022
@mrsimonemms mrsimonemms self-assigned this Sep 20, 2022
@mrsimonemms
Contributor

mrsimonemms commented Sep 20, 2022

My anecdotal evidence is that the Fluent Bit logs are missing on some installations and not on others, regardless of the version of Gitpod. When I developed it, I believe I tried it on k3s, GKE and AKS but I know of at least one GKE user that's not got these appearing in their support bundles in 2022.7.x.

Will investigate

@adrienthebo
Contributor Author

adrienthebo commented Sep 20, 2022

I performed some preliminary testing yesterday using a dev license on AKS; the support bundle did contain logs as expected. It might be that this is a limitation in how we're sending logs via replicated for production licenses or a mis-firing redactor.

Edit: Good catch regarding the absent logs in 2022.7.x; I suspected this was a regression, but if the logs are also missing on 2022.7.x then that scenario is ruled out.

@adrienthebo
Contributor Author

I've rooted around the support-bundle log collector and the kotsadm interface to support-bundle, looking for obvious pitfalls or places where more troubleshooting information might lie. The log collection code is a little bit strange and warrants a closer look; if logs can't be collected then the errors should be stored in the support bundle, but I'm not certain all of the error cases are checked.

One potential concern is that the kotsadm service account might not have permission to read logs from the Gitpod pods, but that's conjecture.
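One way to test the permissions conjecture is to ask the API server directly with `kubectl auth can-i`, impersonating the service account. The following is a minimal sketch, not something taken from the installer: the namespace `gitpod` and the service account name `kotsadm` are assumptions based on the names used in this thread, and the helper function names are hypothetical.

```python
import subprocess


def can_read_pod_logs_cmd(namespace="gitpod", service_account="kotsadm"):
    """Build a kubectl command asking whether a service account may read pod logs.

    The default namespace and service account name are assumptions; adjust
    them to match the installation being debugged.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return [
        "kubectl", "auth", "can-i", "get", "pods",
        "--subresource=log",
        "--namespace", namespace,
        "--as", subject,
    ]


def can_read_pod_logs(namespace="gitpod", service_account="kotsadm"):
    """Run the check; kubectl prints "yes" or "no" on stdout."""
    result = subprocess.run(
        can_read_pod_logs_cmd(namespace, service_account),
        capture_output=True,
        text=True,
        check=False,  # kubectl exits non-zero when the answer is "no"
    )
    return result.stdout.strip() == "yes"
```

Calling `can_read_pod_logs()` requires `kubectl` on the `PATH` and a kubeconfig whose user is allowed to impersonate service accounts.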

Speaking more broadly, this has now affected multiple customers and is more than a one-off error.

@jimmybrancaccio

So I just got a support bundle from someone and they're using release-2022.08.0.10. It included both logs and log-collector directories with logs in each. (Front ticket).

@mrsimonemms
Contributor

The support bundle in the first message has a log-collector directory, but the files are called log-collector-errors.json and the contents are about how it cannot create the /gitpod directory on the node.

I can see that it's on GKE. There's definitely at least one other customer who's having issues on GKE, so I'm going to focus efforts on that
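For triaging further bundles, a small script can report whether the `logs` and `log-collector` directories are present and whether any `log-collector-errors.json` files were written instead of real logs. This is a rough sketch that assumes the bundle has already been extracted to a local directory; the directory and file names come from this thread, but the rest of the layout is an assumption and `inspect_bundle` is a hypothetical helper.

```python
from pathlib import Path

# Names taken from this thread; everything else about the layout is assumed.
EXPECTED_DIRS = ("logs", "log-collector")
ERROR_FILE = "log-collector-errors.json"


def inspect_bundle(bundle_root):
    """Summarise the log-related content of an extracted support bundle."""
    root = Path(bundle_root)
    report = {"missing_dirs": [], "empty_dirs": [], "error_files": []}

    for name in EXPECTED_DIRS:
        directory = root / name
        if not directory.is_dir():
            report["missing_dirs"].append(name)
        elif not any(p.is_file() for p in directory.rglob("*")):
            # The directory exists but holds no files at any depth.
            report["empty_dirs"].append(name)

    # Error files indicate that a collector ran but could not gather logs.
    report["error_files"] = sorted(
        p.relative_to(root).as_posix() for p in root.rglob(ERROR_FILE)
    )
    return report
```

A bundle like the one described above would show `log-collector` as present but list one or more entries under `error_files`.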

@mrsimonemms
Contributor

Having done some investigation, it looks like the cause is that the /gitpod/log-collector directory is not always present on the nodes. I've put up a fix in #13233 which ensures that the directory is always present, which stops the support bundles from filling up with log-collector-errors.json files.

Repository owner moved this from 🕶In Review / Measuring to ✨Done in 🚚 Security, Infrastructure, and Delivery Team (SID) Sep 23, 2022
@adrienthebo adrienthebo reopened this Sep 23, 2022
Repository owner moved this from ✨Done to 📓Scheduled in 🚚 Security, Infrastructure, and Delivery Team (SID) Sep 23, 2022
@adrienthebo
Contributor Author

adrienthebo commented Sep 23, 2022

We've seen issues with both the KOTS-provided log collector and the fluent-bit log collector; the fix for this issue will alleviate the latter, but we haven't located the root cause of the KOTS log collector failure.

As a side note, analyzing a support bundle that was missing the log-collector directory also indicated that fluent-bit was not running, explaining the absence of those logs.

@adrienthebo
Contributor Author

We generated and uploaded a support bundle from a customer that was affected by the absent /logs directory; this time around the logs were collected as expected by KOTS. This lends support to the theory that it could be an upstream issue; we'll start pushing on that angle.

The customer also reported that during the Gitpod install, they observed the fluent-bit daemonset being created, pods being scheduled, and then the daemonset and pods being deleted. They're looking into audit logs because this behavior raises the question of some other system coming in and automatically wiping the fluent-bit installation.

Edit: After checking their audit logs it appears that the gitpod installer itself is removing fluent-bit. This might be the reinstallation logic triggering the wipe but we'll need to look more closely.

@mrsimonemms
Contributor

@adrienthebo I had a look at the Slack thread and I'm very confused. I don't even know why the system:serviceaccount:gitpod:installer would be deleting the FluentBit resources - AFAIK, the installer service account shouldn't be affecting anything with regards to the base KOTS manifests.

One thing to note is that we have removed the installer service account as part of the September release (see #13168) and switched to the kotsadm service account. My hope is that, if there is some conflict in the user's system between the two service accounts, it will be resolved by this removal.

@mrsimonemms
Contributor

It seems that the cause of the problem is this line, which was added to 2022.8.0 to avoid conflicts from the customisation patch label change (the reason why it was a required release).

As part of the September release, we should test that:

Given running Gitpod 2022.8.0
And there is a customization_patch deployed
And FluentBit pods are not running
When I update to Gitpod 2022.9.0
Then FluentBit pods should be running
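The final `Then` step of the scenario could be automated along these lines. This is only a sketch: it assumes the pod list is the parsed JSON from `kubectl get pods -n gitpod -o json`, and that FluentBit pods can be recognised by `fluent-bit` appearing in the pod name, which is an assumption rather than something confirmed in this thread. The function name is hypothetical.

```python
def fluent_bit_pods_running(pod_list, name_fragment="fluent-bit"):
    """Return True if at least one FluentBit pod exists and all are Running.

    pod_list is the parsed output of `kubectl get pods -o json`.
    name_fragment is an assumed naming convention for FluentBit pods.
    """
    pods = [
        pod for pod in pod_list.get("items", [])
        if name_fragment in pod.get("metadata", {}).get("name", "")
    ]
    if not pods:
        # No FluentBit pods at all, e.g. the daemonset was deleted.
        return False
    return all(
        pod.get("status", {}).get("phase") == "Running" for pod in pods
    )
```

Returning `False` when no matching pods exist is deliberate, since the failure described above was the daemonset and its pods being deleted outright.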

@adrienthebo
Contributor Author

This issue was resolved in 2022.09.0 as part of the Golang rewrite of the installer.

Repository owner moved this from 📓Scheduled to ✨Done in 🚚 Security, Infrastructure, and Delivery Team (SID) Sep 29, 2022
3 participants