
Upgrade manager fails to actually upgrade nodes because of AWS token issues #30

Closed
shrinandj opened this issue Dec 3, 2019 · 0 comments


@shrinandj
Collaborator

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
Upgrade manager fails because it cannot get AWS credentials. It consistently fails with the following errors:

time="2019-12-03T05:09:43Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 2.898737352s"
time="2019-12-03T05:09:49Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 3.287156464s"
time="2019-12-03T05:09:51Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 3.542636997s"
time="2019-12-03T05:09:51Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 4.523316716s"
time="2019-12-03T05:09:52Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 4.209153918s"
time="2019-12-03T05:09:57Z" level=info msg="retryable: RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers) -- ec2metadata/GetToken, will retry after 4.761618292s"

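This is most likely caused by aws/aws-sdk-go#2972: the SDK's request for an IMDSv2 session token (PUT /latest/api/token) times out, as the logs above show.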

What you expected to happen:
There should be no errors and upgrade-manager should actually perform the upgrade.

How to reproduce it (as minimally and precisely as possible):
Update the launch configuration of an instance group (IG) and run upgrade-manager.

Anything else we need to know?:
Manually pinning aws-sdk-go to v1.25.0 (or another version earlier than v1.25.38) works around the problem.
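In a Go module the pin lives in go.mod (for example `require github.com/aws/aws-sdk-go v1.25.0`, or via `go get github.com/aws/aws-sdk-go@v1.25.0`). Because a later dependency bump could quietly lift the pin past the problem release, a CI step could compare the pinned version against v1.25.38, the boundary this report identifies. The following is a minimal, standard-library-only sketch of such a check; the function names are hypothetical and not part of upgrade-manager, and pre-release or build suffixes are deliberately not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.25.38" into [1 25 38]. Pre-release and build
// suffixes (e.g. "-rc1", "+meta") are rejected as malformed.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("want MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %v", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// lessThan compares numerically, so v1.9.x correctly sorts before v1.25.x.
func lessThan(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// firstReportedBad is the SDK release this issue reports as the boundary.
const firstReportedBad = "v1.25.38"

// predatesReportedBad (hypothetical guard) reports whether pinned sorts
// strictly before firstReportedBad.
func predatesReportedBad(pinned string) (bool, error) {
	p, err := parseVersion(pinned)
	if err != nil {
		return false, err
	}
	bad, err := parseVersion(firstReportedBad)
	if err != nil {
		return false, err
	}
	return lessThan(p, bad), nil
}

func main() {
	for _, v := range []string{"v1.25.0", "v1.25.38", "v1.26.0"} {
		ok, err := predatesReportedBad(v)
		fmt.Println(v, ok, err)
	}
}
```

Comparing components as integers rather than strings matters here: a plain string comparison would put "v1.9.0" after "v1.25.38".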

Environment:

  • rolling-upgrade-controller version
  • Kubernetes version:
$ kubectl version -o yaml

Other debugging information (if applicable):

  • RollingUpgrade status:
$ kubectl describe rollingupgrade <rollingupgrade-name>
  • controller logs:
$ kubectl logs <rolling-upgrade-controller pod>
shrinandj added a commit to shrinandj/upgrade-manager that referenced this issue Dec 3, 2019
Testing Done:

- Verified that the new docker image can be built with aws-sdk v1.25.0
- Verified that rolling upgrade actually completed.
kianjones4 added a commit to kianjones4/upgrade-manager that referenced this issue Dec 9, 2019
* check error message instead of code for instance not found

* fix test

* Fixes for autoscaling API changes (keikoproj#26)

* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* Release v0.2 (keikoproj#28)

* Bump version to 0.3-dev. (keikoproj#29)

* Fix keikoproj#30 and release v0.3 (keikoproj#31)

Testing Done:

- Verified that the new docker image can be built with aws-sdk v1.25.0
- Verified that rolling upgrade actually completed.

* Bump version to 0.4-dev. (keikoproj#32)

* Added uniformAcrossAzUpdate strategy (keikoproj#27)

* Modified ClusterState to capture AZ details of a node

* - Added UniformAcrossAzUpdate strategy
- Added Unit tests (More to come)

* Added Unit tests

* Updated validation

* Added uniformAcrossAzUpdate update strategy sample

* Updated README

* Externalize node selectors behind interface

* Removed unnecessary comments

* Log maxUnavailable value
shrinandj pushed a commit that referenced this issue Dec 19, 2019
* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* Add badge (#9)

* check error message instead of code for instance not found

* fix test

* Fixes for autoscaling API changes (#26)

* Add semaphore.yml

* build

* build

* squashed commits

* remove badge

* readme

* test

* pr

* fix kafka issue

* update to fix potential throttling issue

* Mod fix (#7)

* revert go sum

* fix mod build

* add go mod files for logger

* Fix terminate error (#8)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* Release v0.2 (#28)

* Bump version to 0.3-dev. (#29)

* Fix #30 and release v0.3 (#31)

Testing Done:

- Verified that the new docker image can be built with aws-sdk v1.25.0
- Verified that rolling upgrade actually completed.

* Bump version to 0.4-dev. (#32)

* Added uniformAcrossAzUpdate strategy (#27)

* Modified ClusterState to capture AZ details of a node

* - Added UniformAcrossAzUpdate strategy
- Added Unit tests (More to come)

* Added Unit tests

* Updated validation

* Added uniformAcrossAzUpdate update strategy sample

* Updated README

* Externalize node selectors behind interface

* Removed unnecessary comments

* Log maxUnavailable value

* Add badge (#10)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* update readme

* readme

* Add badge (#11)

* check error message instead of code for instance not found

* fix test

* add build badge

* fix image

* update readme

* fix mod