More robust back-end connection handling #18

Closed
soxofaan opened this issue Oct 12, 2021 · 9 comments
@soxofaan (Member)

At the moment, the aggregator creates connection objects to the back-ends at startup time and re-uses these "infinitely".
This worked fine as a proof of concept, but has some issues:

  • it would be good to refresh state from time to time by dropping old connections and starting fresh (e.g. to make sure the latest capabilities are discovered properly)
  • sometimes back-ends go down, causing failures on most aggregator endpoints. It would be good if the aggregator could continue working in a best-effort way with the back-ends that are still up (see the sketch below)
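For illustration, a minimal sketch of what such a periodically refreshed, best-effort connection cache could look like (class and parameter names are made up for this example; this is not the actual aggregator code):

```python
import time

import openeo


class BackendConnectionPool:
    """Hypothetical sketch: cache back-end connections and rebuild them
    after a time-to-live, so capability changes and outages get picked up."""

    def __init__(self, backend_urls, ttl=5 * 60):
        self._backend_urls = backend_urls
        self._ttl = ttl
        self._connections = {}
        self._last_refresh = 0

    def get_connections(self):
        # Rebuild the connection objects when the cache is older than the TTL.
        if time.time() - self._last_refresh > self._ttl:
            self._connections = {}
            for url in self._backend_urls:
                try:
                    self._connections[url] = openeo.connect(url)
                except Exception:
                    # Best effort: skip back-ends that are down right now.
                    pass
            self._last_refresh = time.time()
        return self._connections
```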
@m-mohr (Member) commented Oct 15, 2021

  • sometimes back-ends go down, causing failures on most aggregator endpoints. It would be good if the aggregator could continue working in a best-effort way with the back-ends that are still up

This is an important one as we found out the hard way during the Editor demo at the Launch event today. One back-end was offline and afterward, the Platform was mostly unusable for anything except discovery.

@soxofaan (Member Author)

FYI: this is high priority on my planning now.

@m-mohr some kind of warning system ("warning: this response is partial/incomplete/best effort") would be handy in this context. Did you see Open-EO/openeo-api#412 already?

@m-mohr (Member) commented Oct 19, 2021

No, sorry, that slipped through during my vacation, I think. I'll have a look, although this seems a bit out of scope for the core API spec and would belong more in an extension that handles federation aspects.

@m-mohr (Member) commented Oct 19, 2021

A related question is how clients should communicate this (assuming we go for the 206 status code). Except for the Web Editor, I don't really see yet how clients would communicate and handle this in a good way. Do you have any ideas yet, @soxofaan ?

@soxofaan (Member Author)

I don't really see yet how clients would communicate and handle this in a good way. Do you have any ideas yet, @soxofaan ?

In a Python context, I would just trigger a logging.warning or warnings.warn; that's a pretty common thing to do. By default it will be shown in notebooks (message with a red background) and in non-notebook runs (on standard error).
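For example, roughly along these lines (the `federation:missing` field is an assumption here, following the partial-response idea discussed in Open-EO/openeo-api#412):

```python
import logging
import warnings

_log = logging.getLogger(__name__)


def check_partial_response(response: dict) -> dict:
    # Hypothetical helper: if the response indicates that some back-ends
    # did not contribute, warn the user instead of failing the call.
    missing = response.get("federation:missing", [])
    if missing:
        warnings.warn(f"Partial response: missing back-ends {missing}")
        _log.warning("Some federated back-ends did not respond: %s", missing)
    return response
```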

@m-mohr (Member) commented Oct 19, 2021

Sounds good to me. I think that's possible in all clients; just the Web Editor would need a bit of additional code for it.

@soxofaan (Member Author)

the Web Editor would need a bit of additional code for it.

At a minimum you could just do a console.warn, I guess?

@m-mohr (Member) commented Oct 19, 2021

Sure, but 99+% of (targeted) Web Editor users would not look at the browser console. I'd rather open a toast warning or so, but the JS client right now doesn't support passing through such additional details, while for the JS client itself a warning in the console would be enough. So most of the code will likely be written in the JS client itself...

soxofaan added a commit that referenced this issue Oct 27, 2021
current implementation fails to update OIDC provider id mapping
@soxofaan (Member Author)

Merged #21 into develop:

  • instead of holding on to the same connection objects to the back-ends all the time, they are now refreshed every 5 minutes (for now), so that changes in availability can be picked up properly
  • the aggregator can now also start up when a back-end is down (before, the aggregator could only be (re)started when all back-ends were up); see the best-effort sketch below
  • various other hardening and caching tweaks

This should cover the most important resilience problems. Will close this for now.
Open a new ticket when we find new situations where resilience could be improved.
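A minimal sketch of that best-effort pattern, assuming one openeo `Connection` object per back-end (function and variable names are illustrative, not the merged implementation):

```python
import logging

_log = logging.getLogger(__name__)


def list_all_collections(connections: dict) -> tuple:
    """Hypothetical best-effort aggregation: query each back-end and keep
    going when one fails, instead of failing the whole request."""
    collections = []
    failed = []
    for backend_id, connection in connections.items():
        try:
            collections.extend(connection.list_collections())
        except Exception:
            _log.warning("Back-end %r failed, skipping it", backend_id, exc_info=True)
            failed.append(backend_id)
    # `failed` could be surfaced to the user, e.g. as a warning or
    # through a partial-response indicator on the aggregated result.
    return collections, failed
```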
