Download thumbnails for content that is not available locally #549

Closed
wjt opened this issue May 26, 2023 · 12 comments

@wjt
Member

wjt commented May 26, 2023

In Kolibri's terms, thumbnails are not part of the channel's metadata but are content items in their own right.

When we download metadata for additional collections besides the one the user picks (#548), we'll also want to download thumbnails for the content in those collections, so that users can explore it (#545) visually.

@wjt
Member Author

wjt commented May 26, 2023

I wrote:

Can someone more experienced than me check how many files and how much data this corresponds to for all the thumbnails on key.endlessos.org (or give me a crash course on the Kolibri data model and how to get an interactive Python console with the Django ORM models loaded? :) )

@manuq wrote:

My pleasure! The tricky part is to have the corresponding nodes in the database:

from kolibri.core.content.models import ContentNode

# Sum the size of the first thumbnail file found on each content node.
total = 0
for node in ContentNode.objects.all():
    thumb = next((f for f in node.files.all() if f.thumbnail), None)
    if thumb is not None:
        total += thumb.get_file_size()
print(total)

To get a first idea, I ran the above against my current database, which has a fresh artist-0001 just imported. I also calculated the thumbnails of available content only, by changing the query to ContentNode.objects.filter(available=True):

  • total thumbs size = 142.9 Megabytes (142954746 bytes) for all 955 nodes in all 10 channels imported by artist-0001.json
  • available thumbs size = 5.2 Megabytes (5249466 bytes) for 58 available nodes

The real measure would be to import all the metadata for each of the JSON files that represent the EK collection and filter by their nodes. Something like:

ek_node_ids = ['1520f018610256549c98ca0140cceebe', 'deb6566eede6513c9f262f367c2b5f8d', ...]
ContentNode.objects.filter(id__in=ek_node_ids)
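
One way to build `ek_node_ids` would be to read it out of the collection JSON files. A minimal sketch, assuming each file follows the layout of a Kolibri content manifest (a top-level `channels` list whose entries carry `id` and `include_node_ids`). That layout and the `collect_node_ids` helper are assumptions, not something confirmed in this thread:

```python
import json
from pathlib import Path


def collect_node_ids(manifest_paths):
    """Return the sorted, de-duplicated node IDs listed in the given manifests.

    Assumed file layout (not confirmed by the thread):
    {"channels": [{"id": "...", "include_node_ids": ["...", ...]}, ...]}
    """
    node_ids = set()
    for path in manifest_paths:
        manifest = json.loads(Path(path).read_text(encoding="utf-8"))
        for channel in manifest.get("channels", []):
            # Channels without an include list contribute nothing.
            node_ids.update(channel.get("include_node_ids", []))
    return sorted(node_ids)
```

The result could then be passed to `ContentNode.objects.filter(id__in=...)`. If the listed IDs turn out to be topic nodes, their descendants would probably need to be included as well.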

@wjt wjt added this to the Ladybug milestone May 26, 2023
@dbnicholson
Member

I ran the following script on the prod key instance:

#!/usr/bin/env python3

from collections import defaultdict
from kolibri.core.content.models import File, ChannelMetadata
from operator import itemgetter


def nice_size(num):
    for unit in ('bytes','KB','MB','GB'):
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0
    return "%3.1f %s" % (num, 'TB')


total = 0
channels = defaultdict(int)
for thumbnail in File.objects.filter(thumbnail=True):
    size = thumbnail.local_file.file_size
    total += size
    channels[thumbnail.contentnode.channel_id] += size

nice_total = nice_size(total)
print(f'Total {nice_total} ({total})')
for channel_id, thumbnail_size in sorted(channels.items(), key=itemgetter(1), reverse=True):
    channel_name = ChannelMetadata.objects.get(id=channel_id).name
    channel_nice = nice_size(thumbnail_size)
    print(f'{channel_id} ({channel_name}) {channel_nice} ({thumbnail_size})')

I ran it by feeding it to the Kolibri shell on stdin (kolibri manage shell < sizes.py) as the kolibri user, with all the environment variables from /etc/default/kolibri set.

Here's what it came up with:

Total 1.7 GB (1844833202)
c9d7f950ab6b5a1199e3d6c10d7f0103 (Khan Academy (English - US curriculum)) 1.1 GB (1186440792)
7aca54975a2c415c888d5fe73e0e8163 (हिन्दी) 166.5 MB (174574651)
59b8deeb90f544da923187e77c8d3820 (wikiHow) 88.1 MB (92409113)
914fee213ee146de869016c287116b23 (Chapter Books) 55.2 MB (57849018)
000409f81dbe5d1ba67101cb9fed4530 (Touchable Earth (en)) 50.4 MB (52894914)
bbb4ea407a3c450cb18cbaa76f2d75cd (CSpathshala (English)) 47.5 MB (49830241)
08897e003ea9489eb3d86fc94ba08c21 (Українською) 22.6 MB (23665950)
74f36493bb475b62935fa8705ed59fed (Thoughtful Learning) 20.8 MB (21826123)
f061fce103ff5d4e9b8433e67802e666 (Arts & Crafts) 20.3 MB (21326248)
79cd09863eed51e98576c35ede6f9c9d (Cooking) 16.0 MB (16797114)
fc47aee82e0153e2a30197d3fdee1128 (Open Stax) 15.4 MB (16113723)
2f95235c3709511fa12d007f31ed6a7b (STEAM) 9.3 MB (9803758)
efcc464be5a85ba5a58d1636b00313fc (Gardening) 9.1 MB (9556010)
f5f6729f95b55753badeaa066fa6e986 (Healthy Body) 7.6 MB (7921762)
e9d0d54d209344849e9bed0aa8c222ad (Sikana DIY) 7.4 MB (7737800)
3fcffebc58d15175b948b140434ef6e6 (Sports) 7.2 MB (7531679)
0418cc231e9c5513af0fff9f227f7172 (Free English with Hello Channel) 7.0 MB (7367609)
97111903de564de49483a9705d41a8ac (Career Girls) 6.1 MB (6359663)
ee52db4a62a94e9683599af8782f2d03 (The SciGirls Collection (en español)) 5.5 MB (5807639)
1b1fc9bd453a4c52bb5628d9ae804ede (The SciGirls Collection) 5.5 MB (5782572)
92e96efc082e5c62b0aac3847bdcdb33 (Staff Playlist) 4.7 MB (4940529)
e11462f71c6f5472b113311c69071b05 (Dance) 4.7 MB (4934302)
197934f144305350b5820c7c4dd8e194 (PhET Interactive Simulations (English)) 4.3 MB (4508692)
1520f018610256549c98ca0140cceebe (Virtual Field Trips) 4.0 MB (4198784)
359e048230974c8f80db1a95dc80d544 (EiE Familias) 3.9 MB (4092851)
9c33eb395508447d96c96682cb18c57a (Techbridge Girls @ Home) 3.6 MB (3802707)
f1ada7abc4194ff48a958337a31972c7 (EiE Families) 3.6 MB (3749048)
bcc6e12a0ddf4a17a8b600c6b880e3ed (Common Sense Student Resources) 3.3 MB (3499386)
2091ca47ff544c96b4ae02b3a92346e1 (TED-Ed) 3.1 MB (3298810)
bf0260ed911f44cda27a263db93a8512 (49ers EDU Digital Playbook) 2.6 MB (2697563)
4968191fba07548c9592fc174a70b5d6 (Beauty) 2.5 MB (2610982)
57e23812e0dc562581958e39acedd717 (Games & Gaming) 2.5 MB (2573844)
e409b964366a59219c148f2aaa741f43 (Blockly Games) 2.2 MB (2260272)
4e413158eac55422a5343af9fcfa8d59 (Healthy Mind) 2.1 MB (2162902)
2b43973f53f1538bad5ece63ad847606 (Financial Literacy) 2.0 MB (2143450)
3160899a73564d8a8467284d9219b91c (Terminal Two) 2.0 MB (2124581)
057f871caa405ec29d62ba0523c193d7 (Music) 2.0 MB (2072904)
bf36d8e7e1ee56b194fe52cafbfd9db3 (Fashion) 1.8 MB (1863063)
a8e6591f1afa426d859318a0a29d1237 (SAMHSA) 1.5 MB (1587918)
eb4373b5da054c07879d0c969dc1976a (Virtual Science Teachers) 1.2 MB (1281591)
b40491d1ef8b5506b8c6ae861372e9de (Jewelry Making) 1.1 MB (1191929)
79a50be66bad5eb686c42617c914fd45 (Careers) 908.4 KB (930183)
85b42a40745f4e2392ed62e72d4dad6e (OceanX) 616.0 KB (630786)
f62db29be20453c4a267132e93a9e602 (Wikipedia) 77.9 KB (79746)

Note that I did not filter on available as the current expectation is that we'd want to ship all the content thumbnails for a channel so that the full channel can be browsed.
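
The two measurements in this thread (all thumbnails versus only thumbnails of available content) differ by a single filter. A standalone sketch of that difference, using made-up rows instead of the Django models. The `thumbnail_totals` helper and the tuple layout are hypothetical, and skipping `None` sizes is an assumption about how a missing size should be treated:

```python
from collections import defaultdict


def thumbnail_totals(rows, available_only=False):
    """Sum thumbnail sizes overall and per channel.

    `rows` is an iterable of (channel_id, size_in_bytes, available) tuples.
    Rows whose size is None (no size recorded) are skipped.
    """
    total = 0
    per_channel = defaultdict(int)
    for channel_id, size, available in rows:
        if size is None or (available_only and not available):
            continue
        total += size
        per_channel[channel_id] += size
    return total, dict(per_channel)


# Made-up example data, not taken from the production instance.
rows = [
    ("channel-a", 1000, True),
    ("channel-a", 2000, False),
    ("channel-b", None, True),  # size not recorded
    ("channel-b", 500, False),
]
print(thumbnail_totals(rows))
print(thumbnail_totals(rows, available_only=True))
```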

@dbnicholson
Member

I started looking at how we would ingest all the thumbnails for a channel rather than just the thumbnails for the desired content nodes. It looks like it will need some work. Kolibri's importing works at the content node level, but thumbnails are a level below content nodes. Kolibri has no "import just the thumbnail for a content node but not the actual content" knob. I think there are 3 options:

  1. Add an all_thumbnails option to Kolibri's content import methods ASAP. This has to be wired all the way from the API down to the file selection function.
  2. Provide our own import API handler in the explore plugin that duplicates Kolibri's remote download manager and file selection function.
  3. Provide an out-of-band method in the explore plugin for fetching all the content thumbnails for a given channel and then manually import them into the Kolibri database.
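
To make option 1 concrete, here is a rough sketch of what a file selection function with such a flag could do. It is not Kolibri's implementation: `FileRecord` and `select_files` are hypothetical names, and the real selection works on database queries rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    node_id: str
    filename: str
    thumbnail: bool


def select_files(files, wanted_node_ids, all_thumbnails=False):
    """Pick the files to download for an import.

    Files of the wanted nodes are always selected. With all_thumbnails=True,
    thumbnail files of every other node are selected as well.
    """
    wanted = set(wanted_node_ids)
    return [
        f for f in files
        if f.node_id in wanted or (all_thumbnails and f.thumbnail)
    ]


files = [
    FileRecord("node-1", "video.mp4", False),
    FileRecord("node-1", "thumb1.png", True),
    FileRecord("node-2", "doc.pdf", False),
    FileRecord("node-2", "thumb2.png", True),
]
selected = select_files(files, ["node-1"], all_thumbnails=True)
print([f.filename for f in selected])  # ['video.mp4', 'thumb1.png', 'thumb2.png']
```

Without the flag, only the two `node-1` files would be selected, which matches the current node-level behaviour described above.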

@dbnicholson
Member

I asked on Slack if the all_thumbnails feature would be acceptable and Richard said yes, so I started working on that. I'm a bit bogged down in testing but it doesn't seem too hard.

@dbnicholson
Member

I opened learningequality/kolibri#10770 upstream. learningequality/kolibri@develop...dbnicholson:kolibri:all-thumbnails has my WIP branch, but I'm still working through some test failures.

@dbnicholson
Member

PR ready upstream learningequality/kolibri#10780.

@erikos
Contributor

erikos commented Jun 14, 2023

The app has been updated to Kolibri v0.16.0-alpha15. Ready to make use of the all_thumbnails feature.
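
A sketch of what a request enabling the option might look like. Only the `all_thumbnails` option and the `remotecontentimport` task name come from this issue. The full task type string and the other field names are assumptions about Kolibri's task API, and `build_import_request` is a hypothetical helper:

```python
def build_import_request(channel_id, node_ids, all_thumbnails=True):
    """Build a hypothetical request body for a remote content import task."""
    return {
        # Assumed fully qualified task name; check it against the Kolibri release in use.
        "type": "kolibri.core.content.tasks.remotecontentimport",
        "channel_id": channel_id,  # assumed field name
        "node_ids": list(node_ids),  # assumed field name
        "all_thumbnails": all_thumbnails,
    }


request = build_import_request(
    "1520f018610256549c98ca0140cceebe",  # Virtual Field Trips channel ID from the listing above
    ["example-node-id"],  # placeholder, not a real node ID
)
```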

@manuq
Collaborator

manuq commented Jun 14, 2023

I suggest going back to the initial PR that was just adding the "all_thumbnails" option so we can merge it now. Doing it in the background requires a bit more work in the frontend, not just the backend.

@manuq
Collaborator

manuq commented Jun 16, 2023

Cleanup merged.

dbnicholson added a commit that referenced this issue Jun 16, 2023
In order to provide a rich display of unavailable content, thumbnails
for the missing content are desired. After the initial starter pack
download has completed, queue up tasks to download all missing content
thumbnails in the background. This uses a new `all_thumbnails` option
for the `remotecontentimport` Kolibri task added after 0.16.0-alpha14.
The option is ignored on older releases.

This introduces a new `FOREGROUND_COMPLETED` stage for the download
manager. It has no tasks in it, but it separates the foreground tasks
from the background tasks. Once `FOREGROUND_COMPLETED` has been reached,
progress is reported to the frontend as complete and the remaining tasks
will be enqueued in the background without any interaction from the
frontend.

Fixes: #549
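
The staging idea in this commit message can be sketched as follows. This is an illustration only, not the plugin's code: every stage name except `FOREGROUND_COMPLETED` is hypothetical, and the real download manager enqueues Kolibri tasks instead of just stepping through an enum.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Only FOREGROUND_COMPLETED is named in the commit message; the rest are made up.
    IMPORTING_STARTER_PACK = 1
    FOREGROUND_COMPLETED = 2  # marker stage with no tasks of its own
    IMPORTING_ALL_THUMBNAILS = 3
    COMPLETED = 4


class DownloadManager:
    def __init__(self):
        self.stage = Stage.IMPORTING_STARTER_PACK

    def advance(self):
        """Move to the next stage, stopping at COMPLETED."""
        if self.stage < Stage.COMPLETED:
            self.stage = Stage(self.stage + 1)

    def frontend_progress_complete(self):
        """Report completion to the frontend once the foreground work is done."""
        return self.stage >= Stage.FOREGROUND_COMPLETED
```

With this split, the frontend sees the download as complete as soon as the marker stage is reached, while the thumbnail stage keeps running behind it.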
@dbnicholson
Member

dbnicholson commented Jun 16, 2023

@manuq wrote:

I suggest going back to the initial PR that was just adding the "all_thumbnails" option so we can merge it now. Doing it in the background requires a bit more work in the frontend, not just the backend.

Sorry, I missed this message. I can still do that and punt on backgrounding, but I think the current iteration of #584 works nicely minus the frontend integration.

dylanmccall pushed a commit that referenced this issue Jun 20, 2023
@manuq
Collaborator

manuq commented Jun 20, 2023

Deployed:

kolibri-explore-plugin v6.17.0
Android internal testers: Ladybird 6.17-348
Windows alpha test flight: v6.17.0

@erikos
Contributor

erikos commented Jun 22, 2023

Depending on how fast you press the "show me" button after the content has been downloaded during onboarding, the thumbnails might be there or not, because the thumbnail download might still be ongoing. There is no way yet to tell the user "new thumbnails, wanna refresh?". This is covered by https://github.com/orgs/endlessm/projects/3/views/8?pane=issue&itemId=31379558
