
Automate updates for k8s.io redirector service #176

Closed
ixdy opened this issue Feb 6, 2019 · 28 comments
Labels
area/infra Infrastructure management, infrastructure design, code in infra/
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@ixdy
Member

ixdy commented Feb 6, 2019

Currently, changes to the k8s.io redirector service (i.e. any changes to the configs under the k8s.io subdirectory) require @ixdy or @thockin to manually update the cluster, roughly following this process:

cd k8s.io/
kubectl -n k8s-io-canary apply -f configmap-nginx.yaml
# pick up new configs by forcing nginx to restart
kubectl -n k8s-io-canary scale deployment k8s-io --replicas=0
kubectl -n k8s-io-canary scale deployment k8s-io --replicas=1
TARGET_IP=[canary namespace service IP] make test

# if tests pass, deploy to production
kubectl -n k8s-io-prod apply -f configmap-nginx.yaml
# pick up new configs by forcing nginx to restart
kubectl -n k8s-io-prod scale deployment k8s-io --replicas=0
# note we scale back to 2, not 1
kubectl -n k8s-io-prod scale deployment k8s-io --replicas=2
# verify everything on prod
make test

There are lots of steps to automate here:

  • restarting nginx manually
  • testing on the canary namespace before updating the prod namespace
    • ideally we'd test even before merge, which is currently a manual process
  • requiring a human to deploy changes (rather than automatically updating on merge)
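
As a rough illustration, here is a minimal sketch of the same canary-then-prod flow scripted end to end, assuming kubectl 1.15+ (for kubectl rollout restart) and assuming the canary Service is also named k8s-io (both are assumptions, not confirmed by the manifests):

cd k8s.io/

# canary first; a rolling restart replaces the scale-to-0/scale-up dance
kubectl -n k8s-io-canary apply -f configmap-nginx.yaml
kubectl -n k8s-io-canary rollout restart deployment/k8s-io
kubectl -n k8s-io-canary rollout status deployment/k8s-io
TARGET_IP="$(kubectl -n k8s-io-canary get svc k8s-io -o jsonpath='{.spec.clusterIP}')" make test

# then production; the rolling restart preserves the 2 replicas
kubectl -n k8s-io-prod apply -f configmap-nginx.yaml
kubectl -n k8s-io-prod rollout restart deployment/k8s-io
kubectl -n k8s-io-prod rollout status deployment/k8s-io
make test
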
@bartsmykla
Contributor

Hi @ixdy, what do you think about me helping with that? Can you give me some hints, or point me to people who could give me more information? :-)

@ixdy
Member Author

ixdy commented Feb 26, 2019

Take one of the steps I listed above and figure out some solution? e.g. figure out how to automatically update nginx when pushing a new configmap. (There are several approaches with different tradeoffs, and nobody's taken the time to figure out what makes the most sense here.)

@bartsmykla
Contributor

bartsmykla commented Feb 27, 2019

In one of my past projects we used the approach of creating a sidecar container that watched for changes under specified paths; when they appeared, it read the ConfigMap, parsed it, and sent it to a specified endpoint. Here it would be a bit different, though.

One of the approaches would be to use a feature that is currently in beta: Share Process Namespace between Containers in a Pod. We would create a sidecar container that watches for ConfigMap changes and, when they appear, sends a HUP signal to nginx.

It's good because the logic for watching changes and sending the appropriate signal lives in a separate place, and we don't touch the base nginx image. A problem appears if we don't want to rely on a beta feature, or if our Kubernetes cluster is older than 1.13.
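
A rough sketch of what the watcher loop in such a sidecar could look like, assuming the pod sets shareProcessNamespace: true, the ConfigMap is mounted at /etc/nginx/conf.d, and inotify-tools is available in the sidecar image (illustrative only, not an actual implementation):

#!/bin/sh
# Watch the mounted ConfigMap directory; ConfigMap updates arrive as a
# symlink swap, so create/move events in the directory are the useful trigger.
while inotifywait -e create -e modify -e moved_to -e delete /etc/nginx/conf.d; do
  sleep 1  # crude debounce: one ConfigMap update produces several events
  # With a shared process namespace the nginx master is visible here;
  # -o picks the oldest matching process, i.e. the master.
  pkill -HUP -o nginx || echo "nginx master process not found" >&2
done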

Another approach would be to run two binaries/scripts inside one container: one for nginx and a second for our watcher, which would send a HUP signal to the nginx process when changes appear.
Cons:

I actually succeeded in creating a solution using https://github.com/ochinchina/supervisord and https://github.com/fsnotify/fsnotify, so I can clean it up and push it somewhere to test.

If it were possible, I think I would choose the first option, but I don't have experience with that feature, so I may be missing some downsides.

@bartsmykla
Contributor

@ixdy I have created a simple Go app using https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/ which can be added as a sidecar container. I still have to figure out how to test it, but we can play with it here: https://github.com/bartsmykla/nginx-reloader
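
If it helps with testing, one way to try it out in the canary namespace could be a strategic-merge patch that turns on the shared process namespace and adds the reloader as an extra container; the image and the volume name below are placeholders and would have to match the real manifest:

# sketch only: image tag and volume name are hypothetical
kubectl -n k8s-io-canary patch deployment k8s-io --patch '
spec:
  template:
    spec:
      shareProcessNamespace: true   # lets the sidecar signal the nginx process
      containers:
      - name: nginx-reloader        # appended by name via strategic merge
        image: example.invalid/nginx-reloader:dev   # placeholder image
        volumeMounts:
        - name: nginx-config        # placeholder: must match the existing ConfigMap volume
          mountPath: /etc/nginx/conf.d
          readOnly: true
'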

@bartsmykla
Contributor

Hi @ixdy, have you maybe had some time to look at my suggestions?

@dims
Member

dims commented Mar 12, 2019

@ixdy WDYT? ^

@ixdy
Member Author

ixdy commented Mar 15, 2019

@bartsmykla I also think I prefer the sidecar option, since that seems like a cleaner, more generalizable pattern, though I don't think GKE supports kubernetes 1.13 yet.

Do you have access to a kubernetes cluster for testing? You could try updating the manifests in the k8s.io directory of this repo to try out your approach.

@ixdy
Member Author

ixdy commented Mar 22, 2019

Though, a counterpoint: it'd be nice to take advantage of the fact that we're using a Deployment right now, so we should perform a rolling update of the config (in case a bad config causes nginx to crash).

I've seen patterns where the ConfigMap containing the nginx config is somehow munged to work with a Deployment rolling update - maybe something like this?

(I think I've seen similar but different patterns elsewhere, but I'm not immediately finding them now.)
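
For concreteness, one widely used variant of that pattern (popularized by Helm charts) is to hash the config and stamp it onto the pod template as an annotation, so that a config change rolls the Deployment through its normal update strategy; the annotation key here is illustrative:

# compute a hash of the config and record it on the pod template
CONFIG_SHA="$(sha256sum configmap-nginx.yaml | cut -d' ' -f1)"
kubectl -n k8s-io-canary apply -f configmap-nginx.yaml
kubectl -n k8s-io-canary patch deployment k8s-io --patch \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"checksum/nginx-config\":\"${CONFIG_SHA}\"}}}}}"
# the changed annotation triggers a rolling update, so a broken config surfaces
# as new pods failing rather than every replica restarting at once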

@bartsmykla
Contributor

Let me dig into it a little bit more

@spiffxp spiffxp added this to the migrate-low-risk milestone Apr 30, 2019
@spiffxp spiffxp added the area/infra Infrastructure management, infrastructure design, code in infra/ label May 1, 2019
@thockin thockin added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jul 8, 2019
@dims
Member

dims commented Jul 24, 2019

pending getting a cluster up

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2019
@stp-ip
Member

stp-ip commented Oct 22, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2019
@stp-ip
Member

stp-ip commented Jan 8, 2020

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2020
@stp-ip
Member

stp-ip commented Apr 8, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2020
@stp-ip
Member

stp-ip commented Jul 7, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2020
@stp-ip
Member

stp-ip commented Oct 5, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2021
@spiffxp spiffxp removed this from the migrate-low-risk milestone Jan 13, 2021
@spiffxp
Member

spiffxp commented Jan 22, 2021

/remove-priority backlog
/priority important-longterm
/milestone v1.21

Since we're going to want to make changes to dl.k8s.io as part of #1569, now might be the right time to re-examine how this is deployed/managed

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jan 22, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 22, 2021
@spiffxp
Member

spiffxp commented Jan 22, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2021
@spiffxp
Member

spiffxp commented Mar 16, 2021

FYI @ameukam @nikhita

@spiffxp
Member

spiffxp commented Apr 15, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.21, v1.22 Apr 15, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2021
@spiffxp
Member

spiffxp commented Jul 15, 2021

/remove-lifecycle stale
Still relevant. We are close with the deploy.sh script, but it's not yet as simple as running it once with no parameters. It's not that far off from something an appropriately privileged prowjob could run, though.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2021
@spiffxp
Member

spiffxp commented Jul 30, 2021

/close
This was accomplished by kubernetes/test-infra#22970 as part of #2151

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
This was accomplished by kubernetes/test-infra#22970 as part of #2151

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
