Hcloud manager errors: Couldn't reconcile node routes error listing routes context deadline exceeded #308

Closed
mmpetarpeshev opened this issue Aug 30, 2022 · 21 comments · Fixed by #309
@mmpetarpeshev

mmpetarpeshev commented Aug 30, 2022

We are using the hcloud manager in a cluster deployed on Hetzner VMs. The hcloud manager is deployed with network support. After a few days it started hitting the Hetzner Cloud API limits and logging the following errors:

E0830 16:20:57.819595 1 route_controller.go:118] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: 192.168.1.7 hcops/AllServersCache.getCache: Get "https://api.hetzner.cloud/v1/servers?page=1&per_page=50": context deadline exceeded

The hcloud manager is deployed with the kubespray addons and without any specific configuration.
It doesn't seem to affect the cluster for now, but from the logs it looks like an issue, and it affects our terraform commands even though they use different API keys for the Hetzner Cloud API.

@ym
Contributor

ym commented Sep 4, 2022

We have had the same issue recently and constantly hit the Hetzner Cloud rate limit, probably due to retries.

Also, the documentation says the rate limit is per project, not per API key, and the support team refused to increase the rate limit :-(

@talex-de

talex-de commented Sep 4, 2022 via email

@4ND3R50N 4ND3R50N self-assigned this Sep 5, 2022
@4ND3R50N 4ND3R50N linked a pull request Sep 5, 2022 that will close this issue
@mmpetarpeshev
Author

@LKaemmerling Thanks for linking the fix to the issue. Do you know when it will be released?

@4ND3R50N
Contributor

4ND3R50N commented Sep 6, 2022

We'll release it this week, maybe tomorrow! I'll keep you up to date 👌🏼

@mmpetarpeshev
Author

mmpetarpeshev commented Sep 18, 2022

After the release, the hcloud manager still hits the API limit:

E0918 04:11:36.921648 1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: hcops/AllServersCache.getCache: Get "https://api.hetzner.cloud/v1/servers?page=1&per_page=50": context deadline exceeded

Honestly, that's not serious. How are we supposed to deploy production workloads on Hetzner in that case?

@mmpetarpeshev
Author

Please reopen. We can't use our terraform because the hcloud manager always hits the API limit.

@4ND3R50N 4ND3R50N reopened this Sep 23, 2022
@4ND3R50N
Contributor

4ND3R50N commented Sep 26, 2022

> Please reopen. We can't use our terraform because the hcloud manager always hits the API limit.

@mmpetarpeshev Sorry for the late reply. We will of course take care of this! Two questions:

  1. Do you use the newest version? (v1.13.0)
  2. If yes, is the error still the same?

I just want to make sure that it's only the API limits hitting you. "context deadline exceeded" was a rather vague error message. The newest version should print a more specific error (besides the deadline exceeded).

@mmpetarpeshev
Author

Hi @4ND3R50N, thanks for taking care of that.

1. We are using the docker image tag I pulled a few days ago. I will check the version from the logs later today.
2. The error message was a little bit different, I think, something like:

route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)

I will check everything later today and provide an update.
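
For scale, the limit quoted in that error message works out to an average of one request per second. A rough sketch of the arithmetic in Python follows; the calls-per-loop and interval numbers are made-up illustration values, not measurements of what the CCM actually does:

```python
# From the error message: "limit of 3600 requests per hour reached".
HOURLY_LIMIT = 3600


def requests_per_hour(calls_per_loop: int, loop_interval_s: float) -> float:
    """Hourly request volume of a loop that makes `calls_per_loop`
    API calls once every `loop_interval_s` seconds."""
    return calls_per_loop * 3600 / loop_interval_s


def exceeds_limit(calls_per_loop: int, loop_interval_s: float,
                  limit: int = HOURLY_LIMIT) -> bool:
    """True if the loop alone would use more than the hourly budget."""
    return requests_per_hour(calls_per_loop, loop_interval_s) > limit


if __name__ == "__main__":
    # Average budget in requests per second.
    print(HOURLY_LIMIT / 3600)
    # Hypothetical loops: 5 calls vs. 15 calls every 10 seconds.
    print(requests_per_hour(5, 10), exceeds_limit(5, 10))
    print(requests_per_hour(15, 10), exceeds_limit(15, 10))
```

Anything else sharing that budget, such as terraform runs, has to fit into whatever the loop leaves over.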

@4ND3R50N
Contributor

@mmpetarpeshev

Ok, good news, so it's the API limit. Don't worry, this is an internal mechanism to prevent spam. I will talk to some colleagues about how we are going to proceed with these cases, since you're not the only one having trouble with it.

Waiting for your final update. I will also keep you up to date :-)

@mmpetarpeshev
Author

mmpetarpeshev commented Sep 26, 2022

I checked the logs and there is the line: Hetzner Cloud k8s cloud controller v1.9.1 started
I tried the latest docker image tag and v1.13.0, with both the helm deployment and ansible (i.e. daemon set or deployment).
I'm not sure that is the correct version, since you said v1.13.0; from what I saw, the docker image's latest tag was updated a few days ago.

@ym
Contributor

ym commented Sep 30, 2022

> @mmpetarpeshev
>
> Ok, good news, so it's the API limit. Don't worry, this is an internal mechanism to prevent spam. I will talk to some colleagues about how we are going to proceed with these cases, since you're not the only one having trouble with it.
>
> Waiting for your final update. I will also keep you up to date :-)

Hi, is there any ETA for this issue? We're still constantly hitting it even after upgrading to v1.13.1.


E0930 08:01:14.976446       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:15.832617       1 node_controller.go:364] Failed to update node addresses for node "us-east1-prd-worker-13": failed to get node address from cloud provider that matches ip: 10.241.0.28
E0930 08:01:15.932972       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:18.931466       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:19.040918       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:19.677038       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)

Tried to ask support to increase the API limit temporarily but they said no.
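
To get a feel for how often this happens, log lines like the ones above can be tallied with a few lines of Python. This is only a sketch: the regex assumes the klog-style header seen in this thread (severity and date, time, thread id, `file:line]`), and the sample input is three lines copied from the log excerpt above.

```python
import re
from collections import Counter

# Header layout assumed from the lines in this thread:
# E0930 08:01:14.976446       1 route_controller.go:119] message
LINE_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+"
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def count_rate_limited(lines):
    """Count rate_limit_exceeded errors per source file and per second."""
    per_file = Counter()
    per_second = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None or "rate_limit_exceeded" not in m["msg"]:
            continue  # not a klog line, or a different kind of error
        per_file[m["file"]] += 1
        per_second[m["time"]] += 1
    return per_file, per_second


SAMPLE = '''E0930 08:01:14.976446       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:15.832617       1 node_controller.go:364] Failed to update node addresses for node "us-east1-prd-worker-13": failed to get node address from cloud provider that matches ip: 10.241.0.28
E0930 08:01:15.932972       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)'''

if __name__ == "__main__":
    per_file, per_second = count_rate_limited(SAMPLE.splitlines())
    print(dict(per_file))
    print(dict(per_second))
```

Feeding it the full controller log instead of `SAMPLE` would show whether the errors come in bursts or are spread evenly over the hour.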

@LKaemmerling
Member

LKaemmerling commented Sep 30, 2022

> Tried to ask support to increase the API limit temporarily but they said no.

@ym can you please send the ticket ID to us or reply to this ticket with an explicit mention of my name?

@ym
Contributor

ym commented Sep 30, 2022

@LKaemmerling

Thanks, the ticket ID is #2022083103009613

@LKaemmerling
Member

@ym you will get an answer :)

We want to debug this even further. With one of the last releases, we got a contribution that added metrics to all API calls (#303). You should be able to see how often specific endpoints were called by looking at the metrics of the CCM. Can you maybe send us a screenshot from your Grafana dashboard or, if possible, give us access to this dashboard via mail to lukas.kaemmerling(at)hetzner-cloud.de ?
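
Without a Grafana dashboard, a dump of the metrics endpoint can also be summarized by hand. The Python sketch below sums a counter per label value from Prometheus text-format output. The metric name `example_api_requests_total`, the `endpoint` label, and the sample numbers are placeholders invented for illustration; the names the CCM actually exposes are not stated in this thread and have to be looked up in its metrics output.

```python
import re
from collections import defaultdict

# Matches "name{label="value",...} number"; samples without labels or with
# a trailing timestamp are deliberately skipped by this simplified pattern.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def calls_by_label(metrics_text, metric_name, label):
    """Sum the values of `metric_name`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m["name"] != metric_name:
            continue
        labels = dict(LABEL_RE.findall(m["labels"]))
        if label in labels:
            totals[labels[label]] += float(m["value"])
    return dict(totals)


# Placeholder input: hypothetical metric name, label, and values.
EXAMPLE = '''# TYPE example_api_requests_total counter
example_api_requests_total{endpoint="/servers",code="200"} 3100
example_api_requests_total{endpoint="/servers",code="429"} 400
example_api_requests_total{endpoint="/networks",code="200"} 120
'''

if __name__ == "__main__":
    print(calls_by_label(EXAMPLE, "example_api_requests_total", "endpoint"))
```

Taking two dumps some minutes apart and comparing the totals gives the call rate per endpoint, which is the number to hold against the hourly limit.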

@LKaemmerling
Member

@ym okay, you won't get a mail :D I have the honor to say that your limit was just increased :)

@Kjarrigan

@ym And we apologize for the trouble, because

> Sad that Hetzner is not willing to increase the rate limits

we do increase API Limits for various use-cases. In this case the request was unfortunately not forwarded to the responsible department. We already contacted the support to refresh the knowledge of the proper workflow for these requests.

@maaft

maaft commented Nov 1, 2022

I'm also currently running into rate limit issues. Are there any plans to increase the limits for the endpoints used by this CCM?

Especially when doing maintenance on your cluster (adding nodes, testing nodes, removing nodes, ... ) you'll be rate limited very fast. It's quite annoying tbh.

@mmpetarpeshev
Author

mmpetarpeshev commented Nov 1, 2022

I'm struggling with that every day. If I hadn't invested so much time deploying k8s and all our apps on Hetzner, the first thing I would do is move out. Sorry guys, but that's absolute amateur work here, the worst API service I have ever seen.

@LKaemmerling
Member

@maaft @mmpetarpeshev could you please try to do what I requested here: #308 (comment)

We need to understand what your cluster is doing :)

@mmpetarpeshev
Author

Thanks @LKaemmerling, I will try to get these metrics in the next few days and provide them to you.

@github-actions
Contributor

github-actions bot commented Jan 1, 2023

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@github-actions github-actions bot added the stale label Jan 1, 2023
@github-actions github-actions bot closed this as completed Jan 6, 2023

7 participants