Make API resilient to a Redis outage #3385

Closed
AetherUnbound opened this issue Nov 21, 2023 · 1 comment · Fixed by #3505
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API

Comments

@AetherUnbound
Collaborator

Description

Presently, our API relies heavily on Redis and expects it to be highly available. This can be seen by performing the following:

  1. just a (start the API)
  2. docker stop openverse-cache-1
  3. Visit http://localhost:50270/v1/images/?q=cat

This high dependence on Redis makes it difficult for us to make changes to the cache, e.g. version upgrades like #3382.

We will need to build some resiliency into the API in every instance where one of the caches is used:

CACHES = {
    # Site cache writes to 'default'
    "default": _make_cache_config(0),
    # For rapidly changing stats that we don't want to hammer the database with
    "traffic_stats": _make_cache_config(1),
    # For ensuring consistency among multiple Django workers and servers.
    # Used by Redlock.
    "locks": _make_cache_config(2),
    # Used for tracking tallied figures that shouldn't expire and are indexed
    # with a timestamp range (for example, the key could be a timestamp valid
    # for a given week), allowing historical data analysis.
    "tallies": _make_cache_config(3, TIMEOUT=None),
}

In cases where we can continue without the cache (even at the cost of performance), we should continue execution. If the cache is absolutely required for a certain operation, we might issue an HTTP 424 for the request (though perhaps a 500 would make more sense?).
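
To illustrate the two cases, here is a minimal sketch rather than the actual implementation. The names `get_or_compute`, `acquire_lock`, and `CacheRequiredError` are hypothetical, and the sketch assumes the cache client raises Python's built-in `ConnectionError` when Redis is unreachable. With redis-py, the exception to catch would more likely be `redis.exceptions.ConnectionError`, which is a separate class. The `cache` argument is any object exposing `get`, `set`, and `add` in the style of Django's cache API.

```python
import logging
from http import HTTPStatus

logger = logging.getLogger(__name__)


class CacheRequiredError(Exception):
    """Hypothetical error for operations that cannot proceed without the cache."""

    # 424 is the status floated in this issue; a 500 is the alternative.
    status_code = HTTPStatus.FAILED_DEPENDENCY


def get_or_compute(cache, key, compute, timeout=60):
    """Return the cached value for ``key``, falling back to ``compute()``.

    Cache failures are logged and otherwise ignored, so the request still
    succeeds (more slowly) while the cache is down.
    """
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        logger.warning("Cache read failed for %s; computing directly", key)

    value = compute()
    try:
        cache.set(key, value, timeout)
    except ConnectionError:
        logger.warning("Cache write failed for %s; skipping", key)
    return value


def acquire_lock(cache, name, timeout=10):
    """Take a lock via the cache, or raise if the cache is unreachable.

    Locking is treated here as an operation that truly needs the cache,
    so an outage is surfaced instead of being swallowed.
    """
    try:
        return cache.add(name, "locked", timeout)
    except ConnectionError as exc:
        raise CacheRequiredError(f"cache unavailable for lock {name!r}") from exc
```

A view or exception handler could then translate `CacheRequiredError` into a response using its `status_code`, while every call site that goes through `get_or_compute` keeps working during an outage.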

Additional context

This is part of the larger effort around upgrading Redis, see #3382

@AetherUnbound AetherUnbound added 💻 aspect: code Concerns the software code in the repository 🟧 priority: high Stalls work on the project or its dependents 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Nov 21, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Nov 21, 2023
@AetherUnbound AetherUnbound mentioned this issue Nov 21, 2023
3 tasks
@sarayourfriend
Collaborator

> we might issue an HTTP 424 for the request (though perhaps a 500 would make more sense?).

I think 424 is the right option if we can handle it somehow. I think 5xx generally means "something unexpected went wrong and we don't know why"?

@AetherUnbound AetherUnbound moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Nov 28, 2023
@dhruvkb dhruvkb self-assigned this Nov 29, 2023
@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Dec 10, 2023
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog Dec 20, 2023