Make entrypoint script fail if any step fails, remove unnecessary command substitution #839
Issue #, if available: None
Description of changes:
The entrypoint script added in #735 is not run with `set -e`, so if any of the commands in it fails (such as installing the CNI), the script will silently succeed. This is a regression, since the previous code would catch this and error out.

I ran into this when I noticed some nodes on our cluster running 1.6rc6 were failing to start pods with CNI errors, despite having an apparently healthy CNI pod running. Upon looking at the pod's logs, I saw the following:
The issue turned out to be on our end: an overly aggressive memory limit (which worked fine on rc4, but the shell script approach increases the peak memory required by quite a bit). Nevertheless, `cp` failed and the script blindly charged on, when it should probably have crashed (which would have drawn my attention to it much more quickly).

The main change I made to the script was to add `set -e`, which will make the script fail if any command it invokes fails (unless the command runs in the context of an `if`, `||`, or similar). While I was at it, I made two additional changes that don't fix active issues but are probably good ideas:

- Added `set -u`, which will make the script abort on undefined variable access (helps with typos).
- Removed uses of `$(...)` (command substitution) where the script doesn't intend to execute the output of the command inside. One instance was completely unnecessary; the other I replaced with `{ }` to group the commands.

I'm happy to revert either or both of these extra changes, but they jumped out at me while I was making the first one so I figured I'd submit them.
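For reviewers less familiar with these shell options, here is a minimal standalone sketch of the behaviors described above. It is not the actual entrypoint script: the echoed strings and the variable name `HYPOTHETICAL_UNSET_VAR` are made up for illustration, and each case runs in its own `sh -c` child so that an abort does not take down the calling shell.

```shell
# Without set -e: the failing command is ignored and the script exits 0.
no_e_status=0
sh -c 'false; echo "kept going"' >/dev/null || no_e_status=$?

# With set -e: the script stops at the first failing command.
with_e_status=0
sh -c 'set -e; false; echo "kept going"' >/dev/null || with_e_status=$?

# set -e does not trigger for commands tested by `if` or followed by `||`.
guarded_output=$(sh -c 'set -e; if false; then :; fi; false || echo "handled"')

# With set -u: expanding an unset variable aborts instead of yielding "".
# HYPOTHETICAL_UNSET_VAR is assumed not to exist in the environment.
with_u_status=0
sh -c 'set -u; echo "$HYPOTHETICAL_UNSET_VAR"' >/dev/null 2>&1 || with_u_status=$?

# Command substitution used as a statement executes the *output* of the
# inner command: here the word "true" printed by echo is run as a command.
$(echo true)
subst_status=$?

# Brace grouping runs the commands in place and only groups them,
# for example so that they share a single redirection.
tmpfile=$(mktemp)
{ echo "step 1"; echo "step 2"; } > "$tmpfile"
grouped_output=$(cat "$tmpfile")
rm -f "$tmpfile"
```

The `$(echo true)` line only works by accident, because the inner command happens to print a valid command name; that is why `{ }` is the safer choice when the goal is just to group commands.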
I've tested this change on our development cluster and it doesn't seem to have any issues. Additionally, I simulated an issue where it couldn't write to the `aws-cni` binary (by creating a bogus symlink), and confirmed that the pod failed to start with the following log:

This is the expected outcome with this change.
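A failure of this kind can be approximated outside a cluster. The sketch below is an assumption-laden stand-in, not the steps used in the test above: the scratch directory, the `source-binary` file, and the "install finished" message are all hypothetical. It points an `aws-cni` symlink at a directory that does not exist, so `cp` cannot write through it, and checks that a `set -e` script stops there.

```shell
# Hypothetical layout: a scratch directory stands in for the CNI bin directory.
workdir=$(mktemp -d)
printf 'fake binary\n' > "$workdir/source-binary"

# A symlink whose target lives in a directory that does not exist,
# so writing through it cannot succeed.
ln -s "$workdir/missing-dir/aws-cni" "$workdir/aws-cni"

# Run a two-step "install" in a child shell with set -e: the failed cp
# should stop the script before the final echo runs.
install_status=0
install_output=$(
    sh -c 'set -e; cp "$1/source-binary" "$1/aws-cni"; echo "install finished"' \
        sh "$workdir" 2>/dev/null
) || install_status=$?

rm -rf "$workdir"
```

After this runs, `install_status` should be non-zero and `install_output` empty; dropping `set -e` from the child script would instead let it print "install finished" and exit 0 despite the failed copy.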
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.