Set all custom charts image.pullPolicy to IfNotPresent #258
Conversation
I'm testing this right now and it appears that all cluster functionality is intact. So, that's good.
Beyond that, I want to confirm when we see a speedup from this change, and to what extent. (For testing, it's probably easiest to look at creation of Tensorflow-serving pods, since those use the largest Docker image of any pods we scale up and down.)
To this end, I think there are three independent cases to test:
- What speedup do we see when we try to create a second TF-serving pod on a node that already has one? Does the second pod download the Docker image, or does it use one already on the node? (With our standard node types, this situation would never occur, since we only have one GPU per node, so I don't expect to actually test it. It's listed here only for completeness.)
- What speedup do we see when a second concurrent node tries to start a TF-serving pod? If there's already one running node with TF-serving, does a second running node starting a TF-serving pod download its image from Docker Hub, or does it download it from the other node in the cluster?
- What speedup do we see when the cluster previously had a running TF-serving pod, but the node that hosted that pod has shut down, and now a new node is starting up and trying to run a TF-serving pod? Does the new node download the TF-serving image from Docker Hub, or can it copy it from some Kubernetes-level "image pool" within the cluster?
Tbh, it'd probably be sufficient to just rtfd to generate hypotheses for each of these cases first. Then, we could just test the ones that should show a performance improvement.
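For context on these cases, the behavior in question is set per container via imagePullPolicy in the rendered pod spec. Here's a minimal sketch, assuming placeholder names and image tags rather than the actual chart output:

```yaml
# Hypothetical fragment of a rendered TF-serving pod spec; the pod name,
# container name, and image tag are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: tf-serving-example
spec:
  containers:
    - name: tf-serving
      image: example/tf-serving:0.1.0
      # Always       -> the kubelet contacts the registry on every pod start
      # IfNotPresent -> the kubelet reuses an image already cached on that node
      # Either way, the image cache is per node; nodes do not serve images to
      # each other, so a brand-new node still pulls from the registry.
      imagePullPolicy: IfNotPresent
```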
It looks like each new node will need to re-pull the […]. However, when a […]
Yeah, I bet you're right about that. I'm actually comfortable working off the assumption that every node maintains its own pool of downloaded images, so the second time a node needs a given image, it'll be faster. Otherwise, though, I don't think we'll see any speedup, since I'm guessing nodes can't share images directly, so new nodes will always download images from Docker Hub or the Google Container Registry or somewhere else outside the cluster. I really don't feel the need for testing this, tbh.
This all looks good. We'll see speedups in certain situations and I can't think of any situation where this would have a downside.
* Feature/cicd (#268)
* Set all custom charts image.pullPolicy to IfNotPresent (#258)
* setting TRANSLATE_COLON_NOTATION=false by default (#289)
* Update Getting Started (#287)
* Update PULL_REQUEST.md for grammar (#292)
* Use gomplate to template patches/hpa.yaml. (#293)
* default account has 100 firewalls, not 200. (#297)
* Update all documentation and links to reference kiosk-console instead of kiosk (#295)
* Use yq and helmfile build to dynamically deploy helm charts based on release name. (#300)
* Upgrade the openvpn chart to latest 4.2.1. (#301)
* Change CLUSTER in Makefile to kiosk-console to fix binary name issue. (#302)
* update raw.gif and tracked.gif with new nearly perfect gif (#303)
* Update default values for tf-serving (#306)
* Update Redis to the latest helm chart before they migrate to bitnami (#307)
* Update autoscaler to 0.4.1 (#308)
* Update redis-janitor to 0.3.1 (#309)
* Update frontend to 0.4.1. (#310)
* Update OpenVPN command for version 4.2.1 (#313)
* Upgrade consumers to 0.5.1 and update models to DeepWatershed. (#311)
* Set no-appendfsync-on-rewrite=yes to prevent Redis latency issues during AOF fsync (#316)
* Install yq in install_script.sh (#319)
* Use 4 random digits for cluster names. (#318)
* update to latest version of the frontend (#322)
* Change default consumer machine type to n1-standard-2 (#323)
* Upgrade benchmarking to 0.2.4 and fix for Deep Watershed models (#324)
* Use GRAFANA_PASSWORD env var to override the default grafana password. (#325)
* Update Getting Started docs with new user feedback (#321)
* Add basic unit tests (#326)
* Use the docker container to run integration tests. (#327)
* Warn users if bucket's region and cluster's region do not match (#329)
* Bump benchmarking to latest 0.2.5 release (#331)
* Add Logo Banner and Update README (#332)
* Add new menu option for default settings with 4 GPUs (#333)
* Update HPA target to 2 keys per zip consumer pod. (#334)
* Bump consumers to version 0.5.2 (#336)
* Update consumer and benchmarking versions (#337)
* Bump redis-janitor to 0.3.2 to fix empty key bug. (#339)
* bump benchmarking to 0.3.1 to fix No route to host bug. (#341)
* Allow users to select which zone(s) to deploy the cluster (#340)
* Pin KUBERNETES_VERSION to 1.14. (#346)
* Fix bug indexing into last array element of valid_zones. (#348)
* Fix logs to indicate finality and be less redundant. (#351)
* If KUBERNETES_VERSION is 1.14, warn user of potential future version removal (#352)

Co-authored-by: dylanbannon <[email protected]>
Co-authored-by: MekWarrior <[email protected]>
* remove chart quotes and set all image.pullPolicy to IfNotPresent
* remove pullPolicy from helmfile, no need to override by default
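For illustration, the second commit drops per-release overrides along these lines from the helmfile; the release name and chart path below are assumptions, not the repository's actual entries:

```yaml
# Sketch of a helmfile.yaml release entry (release name and chart path are
# placeholders). With the chart's own values.yaml now defaulting
# pullPolicy to IfNotPresent, this inline override is redundant.
releases:
  - name: tf-serving
    chart: ./charts/tf-serving
    values:
      - image:
          pullPolicy: IfNotPresent
```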
There is no need to pull an image for a container if that image already exists on the node. This should reduce the deployment time of new pods.
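In chart terms, the change amounts to something like the following in each custom chart's values file (a sketch only; the path, repository, and tag shown are placeholders):

```yaml
# charts/<some-chart>/values.yaml (placeholder path, repository, and tag)
image:
  repository: example/frontend
  tag: 0.4.1
  # IfNotPresent lets the kubelet reuse an image already cached on the node
  # instead of contacting the registry on every pod start.
  pullPolicy: IfNotPresent
```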