Allow short name matching #857

Closed
dcarrion87 opened this issue Jun 28, 2023 · 3 comments
Labels: Pending-Release, stale, Type: Enhancement

Comments

dcarrion87 commented Jun 28, 2023

When using SQS mode, an incoming event is matched against the node's FQDN only:

E.g.

2023/06/28 05:10:57 WRN Unable to list Nodes w/ label, falling back to direct Get lookup of node
2023/06/28 05:10:57 ERR Unable to fetch node labels for node 'ip-10-X-X-X.ap-southeast-2.compute.internal'  error="nodes \"ip-10-X-X-X.ap-southeast-2.compute.internal\" not found"

But the kubernetes.io/hostname label assigned by K3s servers is the short name, since we're running on Ubuntu, e.g. just ip-10-X-X-X.

I don't see any way to instruct the node termination handler to do a short-name match. Any ideas or thoughts are welcome.

For now I have overridden the hostname to the FQDN.
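
To make the request concrete, here is a minimal sketch of what a short-name fallback could look like: try the FQDN first, then the part before the first dot. `candidateNodeNames` is a hypothetical helper written for illustration. It is not part of the node termination handler, and the real lookup would still need to query the Kubernetes API with each candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNodeNames returns the names to try when looking up a node, in
// order: the name as given, then its short form (the text before the first
// dot). Hypothetical helper; not taken from the node termination handler.
func candidateNodeNames(name string) []string {
	candidates := []string{name}
	if short, _, found := strings.Cut(name, "."); found && short != "" {
		candidates = append(candidates, short)
	}
	return candidates
}

func main() {
	// An FQDN yields two candidates: the FQDN and the short name.
	fmt.Println(candidateNodeNames("ip-10-X-X-X.ap-southeast-2.compute.internal"))
	// A name without dots yields only itself.
	fmt.Println(candidateNodeNames("ip-10-X-X-X"))
}
```

Trying the FQDN first would keep the current behaviour for clusters where the node name is the FQDN, and only fall back to the short name when that lookup fails.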

ssoriche commented Jul 6, 2023

We're experiencing the same issue with Kubernetes 1.24 running under kOps. The node names are now the instance IDs, and the node termination handler is trying to apply the taint using the hostname, not the node name.

Everywhere else node_name appears in the logs, the value is correct. The log line for the taint application is:

ERR Unable to taint node with taint aws-node-termination-handler/asg-lifecycle-termination:asg-lifecycle-term- error="Unable to fetch kubernetes node from API: nodes "i-123456789abcdef.us-west-2.compute.internal" not found

However, the next line is:

INF Pods on node node_name=i-123456789abcdef pod_names=...

which has the correct node name.

I do see this in the logs though:

INF Requesting instance drain event-id=asg-lifecycle-term- instance-id=i-123456789abcdef kind=ASG_LIFECYCLE node-name=i-123456789abcdef.us-west-2.compute.internal provider-id=aws:///us-west-2b/i-123456789abcdef

which indicates the node name is wrong, while the instance-id and provider-id are correct.

I've enabled the flag --use-provider-id=true
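
Since the provider-id in the log line above is correct, the instance ID (which is the node name in this setup) can be recovered from it. The sketch below shows the idea. `instanceIDFromProviderID` is a hypothetical helper, and it assumes the provider ID always has the form `aws:///<zone>/<instance-id>` seen in the log; it is not a description of what the handler does internally.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromProviderID extracts the EC2 instance ID from a provider ID
// of the form "aws:///<zone>/<instance-id>". Hypothetical helper for
// illustration; the format is assumed from the log line in this comment.
func instanceIDFromProviderID(providerID string) (string, error) {
	if !strings.HasPrefix(providerID, "aws://") {
		return "", fmt.Errorf("unexpected provider ID %q", providerID)
	}
	// The instance ID is the last path segment.
	id := providerID[strings.LastIndex(providerID, "/")+1:]
	if !strings.HasPrefix(id, "i-") {
		return "", fmt.Errorf("no instance ID in provider ID %q", providerID)
	}
	return id, nil
}

func main() {
	id, err := instanceIDFromProviderID("aws:///us-west-2b/i-123456789abcdef")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```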

github-actions bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this issue to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.

github-actions bot added the stale label Sep 3, 2023

github-actions bot commented Sep 9, 2023

This issue was closed because it has become stale with no activity.

github-actions bot closed this as completed Sep 9, 2023