Investigate 504 Gateway Timeout errors #3585

Closed
jkfran opened this issue Jun 22, 2021 · 7 comments

@jkfran
Contributor

jkfran commented Jun 22, 2021

Investigate 504 errors, see https://forum.snapcraft.io/t/many-504-gateway-timeout-errors/23152

Could https://sentry.is.canonical.com/canonical/snapcraft-io/issues/8351/ be related?

@rhys-the-davies

Hey @jkfran I see this has been pushed to this iteration, just wondering if you had a chance to look at it at all and if you could maybe update the discourse thread with progress? 😊

@jkfran
Contributor Author

jkfran commented Jul 29, 2021

I believe these errors are caused by the store API being slow or down. We can't reproduce them. On the Discourse topic, someone also mentions problems with the CLI command, which would likewise point to an issue with the API.

@jkfran
Contributor Author

jkfran commented Jul 30, 2021

I am going to close the issue for now. If the errors keep happening, we can take another look.

@jkfran jkfran closed this as completed Jul 30, 2021
@popey
Contributor

popey commented Aug 8, 2021

It still happens. I don't notice it as much since I left Canonical, as I have few snaps to maintain. But when I do, I almost always get a 504 at some point.

@lucyllewy

yeah, this isn't fixed at all.

@lucyllewy

@jkfran please reopen this.

@jkfran
Contributor Author

jkfran commented Jan 21, 2022

Hello! In an effort to improve this situation we did an internal investigation. Here is a list of related changes we implemented:

#3832 Split publisher (Debugging purposes, it was reverted later on)
#53 Flask-base v0.9.3 - Include PID on talisker logs (Improve server logging)
#3836 Extra logging (Debugging purposes, some changes on this PR will be reverted soon)
#3865 Test with gevent.Timeout (Debugging purposes, we would like to keep it a bit longer)
#3866 Remove gunicorn max-requests (Actual fix of these issues)
#3867 Use two gunicorn workers (Configuration improvement)
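
For readers unfamiliar with these settings, the last two changes could look roughly like this in a gunicorn configuration file. This is only a sketch, not the project's actual configuration: the file name `gunicorn.conf.py` is hypothetical, the real deployment may pass these values as command-line flags instead, and the gevent worker class is an assumption suggested only by the `gevent.Timeout` test mentioned above.

```python
# gunicorn.conf.py (hypothetical file name; not taken from the snapcraft.io repository)

# Run two worker processes, matching the description of #3867.
workers = 2

# Assumption: gevent workers, inferred only from the gevent.Timeout test in #3865.
worker_class = "gevent"

# 0 is gunicorn's default and disables the automatic restart of a worker
# after it has served a fixed number of requests. Leaving it at 0 has the
# same effect as removing the max-requests option, as described for #3866.
max_requests = 0
max_requests_jitter = 0
```

A file like this would typically be loaded with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the module and app names depend on the project.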

I created a follow-up issue to revert unnecessary changes.

So far, the cause of this issue has disappeared from our logs. We will keep an eye on it.

Please keep in mind that many parts of the website rely on the Snapcraft APIs, so any downtime of these APIs will be reflected on snapcraft.io. The best way to find out whether these APIs are down is status.snapcraft.io.

Thank you.
