Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When hard refresh fails with an exit code, make SILO exit #3151

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

theosanderson
Copy link
Member

This is an RFC.

Currently if a Silo hard refresh fails this doesn't trigger the container to exit and so can't be seen, for example, in the argoCD status. This is confusing. At some point we should make this cause the container to exit. However this could lead to downtime of the query API so potentially we could do it only after splitting up preprocessing into different pods than the main SILO. But that only makes sense if we expect the refresh to fail which I'm not sure we do?

@theosanderson theosanderson added the discussion Open questions label Nov 1, 2024
@theosanderson
Copy link
Member Author

theosanderson commented Nov 1, 2024

Specifically atm I think we have an inconsistency between what happens in a hard refresh and a non hard refresh in silo preprocessing (we should check if that's true). The inconsistency seems bad.

Copy link
Contributor

@corneliusroemer corneliusroemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure having preprocessing fail is without negative side effects. There might have been a reason that I didn't make it exit. But we can try.

Note that this PR only makes it exit in the first invocation. Not in the second. I think we want same treatment in both cases.

@corneliusroemer
Copy link
Contributor

Why do you think there's an asymmetry? We never exit if there are errors afaict. It's only the logging that's different, no exit code logged in second invocation with Etag.

Preprocessing can crash for example if backend is down, or if keycloak is down. I'm not sure it should exit in those cases. Maybe we can decide what to do based on exit codes. If silo proper fails, sure, we can exit, but not if there's a network issue in curl for example?

@theosanderson
Copy link
Member Author

Why do you think there's an asymmetry? We never exit if there are errors afaict. It's only the logging that's different, no exit code logged in second invocation with Etag.

Yes having looked I agree - and I will add crossed lines to my post above. I was writing quickly with my interpretation of the patterns we observed today, which I now suspect were incorrect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Open questions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants