
encryptedFile:gcs | interval pull secrets from bucket #5996

Closed
vijay-sapient opened this issue Aug 24, 2020 · 13 comments

vijay-sapient commented Aug 24, 2020

Once I enabled "encryptedFile:gcs" to store my kubeconfig files, Clouddriver started pulling the kubeconfig files from the bucket every second. Is there a way I can control this interval, or tell Clouddriver to pull them only once?
INFO 1 --- [igurationSource] c.n.s.k.secrets.engines.GcsSecretEngine : Getting contents of object .hal/default/credentials/kubeconfig-****
This seems to be overloading Clouddriver, and I see timeouts during Deploy (Manifest) stages.

This used to work fine when I had the kubeconfig files in Clouddriver itself.
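
For context, this is roughly how the kubeconfig files are referenced in my hal config; a minimal sketch, where the bucket name, object path, and account name are placeholders. Every read of the credential resolves the `encryptedFile:gcs` reference against the bucket, which is what produces the repeated `Getting contents of object` lines:

```yaml
# ~/.hal/config (excerpt) -- bucket, object path, and account name are placeholders
providers:
  kubernetes:
    enabled: true
    accounts:
      - name: some-namespace-account
        # encryptedFile:gcs!b:<bucket>!f:<object-path> tells Clouddriver to fetch the
        # kubeconfig from the GCS bucket whenever the credential is read
        kubeconfigFile: encryptedFile:gcs!b:my-secrets-bucket!f:.hal/default/credentials/kubeconfig-some-namespace
```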

@ezimanyi ezimanyi added the sig/ops Operations SIG label Aug 24, 2020
vijay-sapient commented Aug 26, 2020

Looking into this a little more, I see Clouddriver using 50 GB of memory and it may be failing (I don't see any errors in the logs, but I think this is definitely causing problems). This is also why some of the deployments fail. I have around 600 accounts, with one account per namespace.

1. What would be an optimal configuration (CPU and memory limits) in the hal config?
2. Does Clouddriver support this many accounts?
3. Can we stop pulling the kubeconfig files every second and pull them only during startup?

devorbitus commented Aug 27, 2020

@vijay-sapient Are you using dynamic accounts, like this?

@vijay-sapient

Hi @devorbitus, not using them yet, but it looks like I should. Does it support GCS as well?
We are only allowed to keep kubeconfigs in a bucket.

@devorbitus

The reason I asked is that some people have seen significantly increased memory usage when using git as the dynamic accounts backend if it is not properly configured. Dynamic accounts work well enough for Kubernetes (or Cloud Foundry) accounts only, but they are not an ideal solution, which is why there was a recent RFC to implement a different approach to external account management; that is not expected until the 1.23.x release at the earliest.
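
If you do try dynamic accounts, the rough shape of the configuration is a Spring Cloud Config backend declared in ~/.hal/default/profiles/spinnakerconfig.yml; treat the sketch below as approximate (exact keys can vary by version, and the repo URI is a placeholder):

```yaml
# ~/.hal/default/profiles/spinnakerconfig.yml (sketch -- repo URI is a placeholder)
spring:
  profiles:
    include: git           # use the git-backed Spring Cloud Config server
  cloud:
    config:
      server:
        git:
          uri: https://github.com/example-org/spinnaker-dynamic-accounts   # repo holding the external clouddriver.yml account definitions
          refreshRate: 60  # seconds between polls of the backing repo (property availability may depend on the bundled Spring Cloud Config version)
```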

@vijay-sapient

In my case I have not enabled dynamic accounts. I have around 647 kubeconfig files, which are in a GCS bucket. (I moved all kubeconfigs to the GCS bucket because of the limitation that hal would otherwise create a Kubernetes secret larger than 1 MB, which Kubernetes does not allow.)

Now when I look at the Clouddriver logs I see all the replicas fetching the kubeconfig files every second; I only see entries like the one below in my logs.
INFO 1 --- [igurationSource] c.n.s.k.secrets.engines.GcsSecretEngine : Getting contents of object .hal/default/credentials/kubeconfig-****
Isn't there any configuration that can stop it from fetching the configs every second?

I think for me it would be good to create an admin account per cluster and remove write access for users.

Any suggestions or other solutions I can try?

@devorbitus

The Kubernetes caching agents run roughly every minute by default. Spinnaker uses those kubeconfig files each time it polls the Kubernetes accounts to see whether anything changed, and with that many accounts it has to do that a lot. There is a lot of work going on right now to address some of the performance issues with the Kubernetes V2 provider. There are also some proprietary commercial solutions that companies are working on to help with this, but nothing at the OSS level that I know of until some of the above changes land.

From an ops perspective:

To answer question 1: this is highly dependent on so many factors in your Spinnaker deployment environment that the simplest answer is to give it plenty of resources and periodically check utilization with something like Goldilocks, so you get a handle on what you actually use during normal workloads as well as during spike workloads.
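
If it helps, requests, limits, and replica count can be set through Halyard's custom component sizing; a sketch along these lines, where the numbers are purely illustrative and should be tuned against what Goldilocks reports:

```yaml
# ~/.hal/config (excerpt) -- numbers are illustrative only
deploymentConfigurations:
  - name: default
    deploymentEnvironment:
      customSizing:
        clouddriver:        # sizes the clouddriver container (spin-clouddriver targets the pod, if I remember correctly)
          replicas: 3
          requests:
            cpu: 4
            memory: 16Gi
          limits:
            cpu: 8
            memory: 32Gi
```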

To answer question 2: several companies run hundreds (if not a thousand or more) accounts, and each of them has to monitor the memory utilization of their Clouddriver replicas carefully and add more replicas when needed. I am assuming you are already using Clouddriver SQL instead of Redis?
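
In case you are not on SQL yet, the switch is done in a clouddriver-local.yml profile; a rough sketch assuming MySQL, with hostnames and users as placeholders (see the Spinnaker SQL setup docs for the full set of options):

```yaml
# ~/.hal/default/profiles/clouddriver-local.yml (sketch -- hostnames and users are placeholders)
sql:
  enabled: true
  taskRepository:
    enabled: true
  cache:
    enabled: true
  scheduler:
    enabled: true
  connectionPools:
    default:
      default: true
      jdbcUrl: jdbc:mysql://your-db-host:3306/clouddriver
      user: clouddriver_service
  migration:
    user: clouddriver_migrate
    jdbcUrl: jdbc:mysql://your-db-host:3306/clouddriver

# turn off the Redis-backed equivalents so the two stores don't compete
redis:
  enabled: false
  cache:
    enabled: false
  scheduler:
    enabled: false
  taskRepository:
    enabled: false
```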

To answer question 3: the short answer is no; Clouddriver needs to pull down the kubeconfig file for each account, and that happens on each run of the caching agents. As long as things are operating properly, there shouldn't be much to be concerned about.

@vijay-sapient

Thanks @devorbitus, that helps us decide what we can do. Do you have any specific suggestions about which GC would be best?

@devorbitus

GC?

@vijay-sapient

Sorry, I meant garbage collection.
I am using "-XX:+UnlockExperimentalVMOptions -XX:MaxRAMFraction=2 -XX:+UseZGC".
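
(For reference, these flags can be passed to Clouddriver through a Halyard service-settings file, roughly like the sketch below; the values are only examples, and on newer JDKs -XX:MaxRAMFraction is deprecated in favour of -XX:MaxRAMPercentage.)

```yaml
# ~/.hal/default/service-settings/clouddriver.yml (sketch -- flag values are examples, not a recommendation)
env:
  JAVA_OPTS: >-
    -XX:MaxRAMPercentage=50.0
    -XX:+UnlockExperimentalVMOptions
    -XX:+UseZGC
```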

erancx commented Aug 31, 2020

I'm seeing the same behavior once I set up encryption using GCS.

$ kubectl logs spin-clouddriver-ro-7b47b9878-6tfrg clouddriver-ro | grep "Getting contents" | wc -l
   22051

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@github-actions

This issue hasn't been updated in 45 days so we are tagging it as 'to-be-closed'. It will be closed in 45 days. Add some activity and the label will be removed within several hours.

@spinnakerbot

This issue is tagged as 'to-be-closed' and hasn't been updated in 45 days, so we are closing it. You can always reopen this issue if needed.
