Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download extra channels in the background #762

Merged
merged 7 commits into from
Aug 24, 2023
Merged

Conversation

dbnicholson
Copy link
Member

This introduces a database model to keep track of the background download tasks to run after the initial download completes. That allows serializing the tasks to run one at a time with a StorageHook as the needed tasks are tracked in a persistent location. With that in place, we can defer downloading the extra channels until after the initial download completes. Together these changes have 2 major benefits: the initial download finishes faster and the background downloads cause less load.

Fixes: #592

Copy link
Contributor

@pwithnall pwithnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I have a small collection of thoroughly insignificant comments.

kolibri_explore_plugin/jobs.py Outdated Show resolved Hide resolved
kolibri_explore_plugin/jobs.py Outdated Show resolved Hide resolved
kolibri_explore_plugin/models.py Show resolved Hide resolved
kolibri_explore_plugin/jobs.py Outdated Show resolved Hide resolved
The only modules left out are the empty `__init__.py` package
initialization modules and the generated Django migration modules.
This separates the helpers for managing jobs from the task definitions
and the views using them. This should allow for cleaner reuse. There
should be no functional changes.

One important change here is that the task types are initially stored as
callable paths. This will allow the task data to be serialized later
without immediately resolving the task functions.
Using the Django app `ready` method wasn't the right approach. Those run
whenever Django loads the app and before the database models are
prepared. Instead, use a Kolibri `StorageHook`, which allows immediately
restarting the jobs. Previously jobs would only be restarted when the
Kolibri was restarted.

While here, include `CANCELED` jobs in the restart logic. We're not
cancelling these jobs, but the worker will cancel running jobs when it
stops. Since the storage hook runs even when the worker is shutting
down, this will requeue an incomplete job for when the worker next
starts.
@dbnicholson dbnicholson force-pushed the background-downloads branch from 422dd5f to 2804d97 Compare August 24, 2023 15:21
Use a database model to keep track of background tasks. This will allow
persistently keeping track of tasks that need to be run in the
background.
Since Kolibri will run as many jobs as there are available workers,
queueing up background tasks from the collection downloader will likely
start many in parallel. That does not make for an ideal user experience
as the app will make the system busy and prevent any user initiated
downloads from being started.

Instead, create the desired background tasks in the `BackgroundTask`
model and use the existing storage hook to enqueue the next background
task when one completes. This will ensure that only one background task
is running at a time.
When initial downloads complete, the user is shown the Discovery view
with the channels from the content pack. Since they're likely to explore
these channels first, prioritize downloading these thumbnails over the
extra channels from other manifests.
Now that there's a way to serialize background tasks, defer downloading
the extra channels so that the initial download completes faster. Since
the thumbnail tasks depend on this, they need to be generated on the fly
when the channel download completes.

Fixes: #592
@dbnicholson dbnicholson force-pushed the background-downloads branch from 2804d97 to e9de59d Compare August 24, 2023 15:22
@dbnicholson dbnicholson requested a review from pwithnall August 24, 2023 15:22
@pwithnall
Copy link
Contributor

pwithnall commented Aug 24, 2023

🚢 I’ll hit merge once CI has finished

@pwithnall pwithnall merged commit 8702e58 into master Aug 24, 2023
@pwithnall pwithnall deleted the background-downloads branch August 24, 2023 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Download additional channel metadata in the background
2 participants