Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download backend: Request all thumbnails for channel #584

Merged
merged 2 commits into from
Jun 20, 2023

Conversation

dbnicholson
Copy link
Member

@dbnicholson dbnicholson commented Jun 9, 2023

In order to provide a rich display of unavailable content, thumbnails for the missing content is desired. When downloading the starter pack content, request all thumbnails for each channel. This is a new option for the remotecontentimport Kolibri task added after 0.16.0-alpha14. The option is ignored on older releases.

#549

@dbnicholson dbnicholson requested review from manuq and dylanmccall June 9, 2023 23:01
@dbnicholson
Copy link
Member Author

Since learningequality/kolibri#10780 was merged, I tested this out on my local checkout. Good news - it works. Bad news - it significantly slows down the initial starter pack download. I tried this with the Scientist starter pack since it appeared to be the smallest and it was spending a lot of time downloading thumbnails. Presumably this is because it includes wikiHow, which has 88 MB of thumbnails.

@wjt
Copy link
Member

wjt commented Jun 12, 2023

I tried this with the Scientist starter pack since it appeared to be the smallest and it was spending a lot of time downloading thumbnails. Presumably this is because it includes wikiHow, which has 88 MB of thumbnails.

Might it work to download the thumbnails later as a separate background step that happens after the onboarding flow?

@manuq manuq marked this pull request as draft June 12, 2023 13:19
@manuq
Copy link
Collaborator

manuq commented Jun 12, 2023

This looks good. Moving it to draft to avoid merging it until Kolibri 0.16 alpha15 is released.

@dbnicholson
Copy link
Member Author

dbnicholson commented Jun 12, 2023

I tried this with the Scientist starter pack since it appeared to be the smallest and it was spending a lot of time downloading thumbnails. Presumably this is because it includes wikiHow, which has 88 MB of thumbnails.

Might it work to download the thumbnails later as a separate background step that happens after the onboarding flow?

Yeah, that's a good idea. I believe we can just start tasks to do that from the backend and it will happen in the background. I've never done it exactly that way, but I don't think it should be too hard.

@dbnicholson
Copy link
Member Author

I changed it so that once the initial download completes, the remaining thumbnails are queued up for downloading in the background. It worked, but I can't say this is particularly robust since the background tasks aren't tracked and I'm not sure if Kolibri will restart them if they fail.

Here's a screenshot with TechBridge Girls showing 2 unavailable nodes after I ran kolibri with KOLIBRI_NAVIGATE_UNAVAILABLE=true:

image

@manuq
Copy link
Collaborator

manuq commented Jun 13, 2023

@dbnicholson it is a bit hacky but I think it will work. As long as the for loop (the one after marking "COMPLETED") ends without Kolibri stopping. Once the tasks are scheduled Kolibri persists them and they are ran across Kolibri sessions.

The less-hacky way is: direct the collection download from the App root Vue component (kolibri_explore_plugin/assets/src/app.js). Today it is directed from the Download page (kolibri_explore_plugin/assets/src/views/DownloadPage.vue).

@manuq
Copy link
Collaborator

manuq commented Jun 13, 2023

@dbnicholson the other problem I see is that the UI won't be refreshed with the thumbnails after the background download completes. That's another other reason to have a frontend component directing/monitoring the background job.

@dbnicholson
Copy link
Member Author

@dbnicholson it is a bit hacky but I think it will work. As long as the for loop (the one after marking "COMPLETED") ends without Kolibri stopping. Once the tasks are scheduled Kolibri persists them and they are ran across Kolibri sessions.

The less-hacky way is: direct the collection download from the App root Vue component (kolibri_explore_plugin/assets/src/app.js). Today it is directed from the Download page (kolibri_explore_plugin/assets/src/views/DownloadPage.vue).

Agreed it is hacky. I probably need a bit more help here. I don't understand how directing it from the app component rather than the download view affects what's going on here. Could you explain?

Regardless, I think the idea would be that once the collection download completes then the frontend would call a separate backend API to initiate the background thumbnail download. Is that right? Would we want to track those tasks in the session state like the collection download?

@manuq
Copy link
Collaborator

manuq commented Jun 13, 2023

@dbnicholson it is a bit hacky but I think it will work. As long as the for loop (the one after marking "COMPLETED") ends without Kolibri stopping. Once the tasks are scheduled Kolibri persists them and they are ran across Kolibri sessions.
The less-hacky way is: direct the collection download from the App root Vue component (kolibri_explore_plugin/assets/src/app.js). Today it is directed from the Download page (kolibri_explore_plugin/assets/src/views/DownloadPage.vue).

Agreed it is hacky. I probably need a bit more help here. I don't understand how directing it from the app component rather than the download view affects what's going on here. Could you explain?

I wish I have a clear understanding too! My gut feeling is that we want to monitor the background task in order to refresh the nodes thumbnail property in the frontend state after the download is completed. And we want that to happen in any view, not just the download view. The component that is always present is the root app, the parent of any every component.

Regardless, I think the idea would be that once the collection download completes then the frontend would call a separate backend API to initiate the background thumbnail download. Is that right? Would we want to track those tasks in the session state like the collection download?

Yes, maybe split the foreground collection stages from the background stages (extra metadata and thumbnails). My point is that this will need more thinking to happen in the background. Maybe start easy by adding this in fhe foreground first?

About persisting, we decided to go with session state for the download. Maybe it deserves a Django model to persist it in the database? I wouldn't change it for now.

@dbnicholson
Copy link
Member Author

@dbnicholson it is a bit hacky but I think it will work. As long as the for loop (the one after marking "COMPLETED") ends without Kolibri stopping. Once the tasks are scheduled Kolibri persists them and they are ran across Kolibri sessions.
The less-hacky way is: direct the collection download from the App root Vue component (kolibri_explore_plugin/assets/src/app.js). Today it is directed from the Download page (kolibri_explore_plugin/assets/src/views/DownloadPage.vue).

Agreed it is hacky. I probably need a bit more help here. I don't understand how directing it from the app component rather than the download view affects what's going on here. Could you explain?

I wish I have a clear understanding too! My gut feeling is that we want to monitor the background task in order to refresh the nodes thumbnail property in the frontend state after the download is completed. And we want that to happen in any view, not just the download view. The component that is always present is the root app, the parent of any every component.

OK, I think I understand. We more or less want something attached to the root app that tracks any outstanding download tasks (whether foreground or background) so that any view can see the status of those tasks if they need to.

Regardless, I think the idea would be that once the collection download completes then the frontend would call a separate backend API to initiate the background thumbnail download. Is that right? Would we want to track those tasks in the session state like the collection download?

Yes, maybe split the foreground collection stages from the background stages (extra metadata and thumbnails). My point is that this will need more thinking to happen in the background. Maybe start easy by adding this in fhe foreground first?

Note that @dylanmccall is doing something very similar in #579. I'm going to sync up with Dylan to see if we can converge on a way to handle these background downloads. If that doesn't seem like it's going to materialize quickly then I'll go back to the original commit that simply adds all_thumbnails=True to the content download. It really slows things down, though.

About persisting, we decided to go with session state for the download. Maybe it deserves a Django model to persist it in the database? I wouldn't change it for now.

I think the session state is fine. Kolibri already has all the information we need as long as we keep the job IDs. A couple lists of job IDs stored in the session state seems fine to me.

@manuq
Copy link
Collaborator

manuq commented Jun 14, 2023

Note that @dylanmccall is doing something very similar in #579

Oh that's great! Seems to be the way to go for these thumbnails and the extras metadata.

@dbnicholson
Copy link
Member Author

This is ready for review again. I attempted to use the process from #579 to queue the background tasks and it didn't work. That method depends on the frontend to keep calling the ek-collections/update-download API to query and enqueue the tasks. Obviously it can't do that if it wants to move on.

Instead, I simply queue them in the background as before but a little nicer. I added a separate queue for background tasks, made the plugin restart them when the app starts, and made the enqueing happen in an explicit stage so that we don't reach COMPLETED until they're enqueued.

The idea is that the frontend can query the background and foreground queues for completion and update the UI as needed. I started to play with that but my Vue is too weak. I'm hoping someone else (looking at you @manuq) can implement that part since I believe it's generally needed to monitor the content tasks like the Device plugin does.

@manuq
Copy link
Collaborator

manuq commented Jun 16, 2023

@dbnicholson sorry for missing this one today. @dylanmccall could you review? Otherwise I'll be back on Tuesday.

Copy link
Collaborator

@dylanmccall dylanmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through this and it looks good to me. Adding a separate queue for background tasks definitely seems like the way to go, and we can work through the UI stuff separately 👍

@dylanmccall dylanmccall force-pushed the all-thumbnails branch 2 times, most recently from 040384d to 02c60b9 Compare June 20, 2023 00:47
In order to differentiate foreground tasks from background tasks, a
separate background job queue is added. When the plugin is loaded, it
will immediately restart any failed background tasks. No frontend
integration is added here, but the idea is that the frontend will query
both the foreground and background queues for completion and update the
UI as needed.
In order to provide a rich display of unavailable content, thumbnails
for the missing content are desired. After the initial starter pack
download has completed, queue up tasks to download all missing content
thumbnails in the background. This uses a new `all_thumbnails` option
for the `remotecontentimport` Kolibri task added after 0.16.0-alpha14.
The option is ignored on older releases.

This introduces a new `FOREGROUND_COMPLETED` stage in for the download
manager. It has no tasks in it, but it separates the foreground tasks
from the background tasks. Once `FOREGROUND_COMPLETED` has been reached,
progress is reported to the frontend as complete and the remaining tasks
will be enqueued in the background without any interaction from the
frontend.

Fixes: #549
@dylanmccall dylanmccall merged commit 5a2d5f8 into master Jun 20, 2023
@dbnicholson dbnicholson deleted the all-thumbnails branch June 20, 2023 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants