Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Extend valid link cache time to 30 days and make cache expiration times minimally configurable #878

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions api/catalog/api/utils/validate_images.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

import django_redis
import grequests
from decouple import config

from catalog.api.utils.dead_link_mask import get_query_mask, save_query_mask

Expand All @@ -18,6 +19,10 @@ def _get_cached_statuses(redis, image_urls):
return [int(b.decode("utf-8")) if b is not None else None for b in cached_statuses]


def _get_expiry(status, default):
return config(f"LINK_VALIDATION_CACHE_EXPIRY__{status}", default=default, cast=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be bikeshedding but the __ in the name looks weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's new to the API. But it's cool 👍. Could you move this to settings (so that almost all env reading happens in one place) and also add it to the env.template file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the _get_expiry function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad. I didn't realise that the default= argument to config was set by the caller to _get_expiry. Please ignore my suggestion.



def validate_images(query_hash, start_slice, results, image_urls):
"""
Make sure images exist before we display them. Treat redirects as broken
Expand Down Expand Up @@ -69,14 +74,17 @@ def validate_images(query_hash, start_slice, results, image_urls):
# Cache successful links for a day, and broken links for 120 days.
if status == 200:
logger.debug("healthy link " f"key={key} ")
pipe.expire(key, twenty_four_hours_seconds)
expiry = _get_expiry(200, twenty_four_hours_seconds * 30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a variable called thirty_days_seconds might make this clearer.

elif status == -1:
logger.debug("no response from provider " f"key={key}")
# Content provider failed to respond; try again in a short interval
pipe.expire(key, thirty_minutes)
expiry = _get_expiry("_1", thirty_minutes)
else:
logger.debug("broken link " f"key={key} ")
pipe.expire(key, twenty_four_hours_seconds * 120)
expiry = _get_expiry("DEFAULT", twenty_four_hours_seconds * 120)

pipe.expire(key, expiry)

pipe.execute()

# Merge newly verified results with cached statuses
Expand Down