How to easily keep updating agones to the latest version #1742

Open
axot opened this issue Aug 7, 2020 · 13 comments
Labels
area/operations: Installation, updating, metrics etc.
awaiting-maintainer: Block issues from being stale/obsolete/closed
kind/design: Proposal discussing new features / fixes and how they should be implemented
kind/feature: New features for Agones

Comments

@axot

axot commented Aug 7, 2020

Is your feature request related to a problem? Please describe.
We are thinking about how to introduce Agones to more and more of our projects.
The problem we're facing is that k8s ships 4 minor releases in a single year.
This means we have to switch our Agones k8s cluster to a new one each time.
If 5 projects are using Agones, this becomes 5 × 4 = 20 cluster switches a year.
Each switch of all real-time traffic to a new cluster is risky and takes a lot of operational time.
It would be an almost impossible mission for our limited number of k8s engineers.

Describe alternatives you've considered
Another thing to consider is that Agones relies heavily on CRDs and a custom controller,
which means we have to track down every change at the source-code level
and run integration tests for each version.

Describe the solution you'd like
We would like to discuss together how to find a safer and less costly way to operate.

axot added the kind/feature label on Aug 7, 2020
@markmandel
Member

@axot - super interesting questions.

I'm curious: does https://agones.dev/site/docs/installation/upgrading/ help at all?

Also, I would love to know, how automated is your infrastructure setup and testing? Are you doing it all by hand, or automating with something like Terraform and having a CI/CD pipeline to push out and test new versions of your game?

@markmandel markmandel added kind/design Proposal discussing new features / fixes and how they should be implemented area/operations Installation, updating, metrics etc labels Aug 7, 2020
@axot
Author

axot commented Aug 7, 2020

Hi,

"I'm curious does https://agones.dev/site/docs/installation/upgrading/ help at all?"

Yes, we are planning to use the Multiple Clusters strategy, since it allows a fast rollback and reduces the business impact.

As for infrastructure automation:
yes, we use Terraform to build GKE-related resources, and GitLab CI and Spinnaker for CI/CD.
However, some parts are still manual.

For example,

  1. Registering each new GKE cluster in Spinnaker, updating the related pipelines, and so on.
  2. Helm charts and k8s manifests are installed by hand. Argo CD is being tested, but it needs more time before it is production ready.
  3. The team using Agones does not maintain the k8s side; my team is responsible for building all resources except the application code.
  4. We have to write an upgrade manual and perform a live GKE cluster switchover.

@markmandel
Member

Curious: are there specific blocking issues that stop those manual steps from being automated, or at least automated behind a manual keypress?

Also what are we looking for in a solution here? A best practices document? Changes to Agones? (if so, what changes?) Something else?

@axot
Author

axot commented Aug 7, 2020

Let me talk with the other team members next week to confirm the remaining work needed to achieve an automated release process!

@markmandel
Member

Thanks! These are super good questions, and a good topic to get stuck into.

As an aside, I've been wanting a "Solutions" section in the documentation for a while, with some opinionated best practices for specific scenarios, so if that's where we're headed, then 👍

@roberthbailey
Member

Another alternative to consider, which I believe @steven-supersolid has said his team uses, is to pick a k8s+Agones pair each time you roll out a new release.

So instead of worrying about how many k8s / Agones upgrades you need to do per year, you say that for each release (every week / month / quarter / etc.) you pick the versions that you want to qualify and support (upgrading when you choose) and roll them out to production. Then on the next release you are free to pick other versions.

This requires using new clusters for each release, but it also allows you to skip Agones / k8s releases you don't need, rather than worrying about always keeping up with the latest release.

@axot
Author

axot commented Aug 11, 2020

The k8s+Agones pair option is easier to apply to our current environment, with less effort.

In our case, we are using managed k8s (GKE). One difficulty is that we have to contact GCP to disable the auto-upgrade feature for these specific clusters. If any security patch needs to be applied to GKE, the nodes, or Agones, we then plan to upgrade to the latest k8s+Agones pair.

This is a viable option for us. Internally, we are continuing to discuss how to make infrastructure operations more automated.

Thanks!

@roberthbailey
Member

"One difficult thing is we have to contact GCP side to disable auto upgrade feature for these specific clusters."

You cannot disable auto upgrade for patch releases (often security fixes). And if you wait long enough, you will eventually be forced to upgrade to a new minor version as well.

The idea of picking the k8s+Agones pair is that if you do it frequently enough (say, more often than GKE upgrades minor releases), then you can avoid having clusters upgrade in place and instead replace aging clusters with new ones at the new version as part of your normal rollout.

Security patches for k8s are generally backported 3 or so minor versions, so even if you don't upgrade to the latest k8s on GKE you should still be getting patches as they come out. To date, Agones has only applied patches to the latest release (and backporting fixes is a lot of work that we haven't yet seen the need for).

So what we would recommend is picking up the latest Agones release, pairing it with a supported k8s version on GKE, and rolling those out once you have qualified them in your environment.
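A minimal sketch of that pairing rule, in Python, for teams that want to script the version choice in their release pipeline. Everything here is hypothetical: the `SUPPORTED_K8S` table and its version numbers are made-up placeholders (take the real supported Kubernetes versions for each release from the Agones installation documentation), and `pick_pair` / `needs_new_cluster` are illustrative helpers, not part of Agones or any GKE tooling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# HYPOTHETICAL compatibility table for illustration only: each Agones release
# maps to the Kubernetes minor versions it supports. These numbers are
# placeholders, not real Agones data.
SUPPORTED_K8S: Dict[str, List[str]] = {
    "2.0.0": ["1.20", "1.21"],
    "2.1.0": ["1.21", "1.22"],
    "2.2.0": ["1.23"],
}


@dataclass(frozen=True)
class ReleasePair:
    """A qualified k8s+Agones pair that one game release is rolled out on."""
    agones: str
    kubernetes: str


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "1.22" or "2.1.0" into a tuple so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_pair(supported: Dict[str, List[str]],
              available_on_gke: Iterable[str]) -> Optional[ReleasePair]:
    """Pick the newest Agones release that supports a k8s version currently
    offered by GKE, paired with the newest such k8s version."""
    available = set(available_on_gke)
    for agones in sorted(supported, key=version_key, reverse=True):
        candidates = [k for k in supported[agones] if k in available]
        if candidates:
            return ReleasePair(agones, max(candidates, key=version_key))
    return None  # no overlap: nothing can be qualified yet


def needs_new_cluster(current: ReleasePair, target: ReleasePair) -> bool:
    """With the pairing approach, any version change means building a fresh
    cluster for the release instead of upgrading the old one in place."""
    return current != target


if __name__ == "__main__":
    target = pick_pair(SUPPORTED_K8S, ["1.21", "1.22"])
    print(target)
    print(needs_new_cluster(ReleasePair("2.0.0", "1.21"), target))
```

With the placeholder data, if the newest Agones release needs a k8s version GKE does not offer yet, the sketch falls back to the previous Agones release; the qualification step itself (testing the pair in your environment) still happens outside this code.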

@axot
Author

axot commented Aug 11, 2020

Thanks for clarifying the details.

If my understanding is correct, the key to making the k8s+Agones pair approach work smoothly is automating the release process.
I will share feedback ASAP once our internal discussion is done.

@steven-supersolid
Collaborator

We are not fully automated yet, so one thing we do to save time is maintain a dormant set of clusters per project. This way the k8s and Agones upgrades can happen in place, and we save the time it would take to recreate clusters.

We've found that the GKE support schedule for k8s versions does not force us to upgrade. E.g. k8s 1.14 is still available.

@axot
Copy link
Author

axot commented Aug 12, 2020

Within the team, we've identified two main factors that are holding us back.

One is more organizational (which team is responsible for maintenance), so I'll skip that part.

The other, technical factor is that we use Spinnaker for continuous deployment, but Spinnaker is not good at dynamically updating configuration such as cluster information and pipelines. Using GitLab CI instead, for example, would make this a bit easier to achieve.

@roberthbailey
Member

From @aimuz in #2843:

After reading the upgrade documentation, I was frustrated to find that it doesn't support smooth upgrades, which I think is a big flaw. I think we should support non-stop upgrades.

When using k8s, we may want to deploy several different services inside one cluster.
If we have to migrate the whole cluster to upgrade Agones, I think that is a drawback: to use Agones, we would have to build a cluster dedicated to it, so that we can avoid migrating other applications and services whenever we upgrade Agones.

The cluster where Agones runs also has a lot of other things set up, such as monitoring, persistence, and host optimization, all of which would need to be done again in a new cluster. There are quite a few tools to simplify this work, but I still think it is unnecessary. That's why we should support smooth upgrades and avoid migrating clusters.

Perhaps we could refer to Istio's approach to upgrades.

@github-actions

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale', please add the 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

github-actions bot added the stale label on Jul 15, 2023
roberthbailey added the awaiting-maintainer label and removed the stale label on Jul 17, 2023

4 participants