-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow including all thumbnails in import #10780
Conversation
Also, if you prefer, I can squash the last 4 commits together. |
Build Artifacts
|
"fields": { | ||
"available": true, | ||
"extension": "png", | ||
"file_size": null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is with setting the file size of these to null
. Our current fixture is terrible and already has some completely unrealistic data in it, so my thought is not to make it any more unrealistic by having files with null size.
I assume you maybe tried with a non-null file size and lots of tests broke?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just followed what was already in use. I can try setting them to 1 and see what happens. I did run into a few other hard to debug issues touching this fixture, though. It's actually how I ran into #8255 since it seems some of the tests actually depend on not only real network requests but for the resources to exist on studio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - in that case, we can leave as is, it's been a long time since I manually edited this fixture!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed them to 1 and it just changed the test expectations.
My only other question here is that it seems like you need to import a specific resource in order to import all the thumbnails? Is this the desired behaviour? It seems like just being able to import all the thumbnails at the same time the channel metadata is imported seems more desired? So would adding this to the import channel task/management command be better for your purposes? |
This helps exercise more of the `get_import_export_data` functionality that was previously doing nothing since there were no thumbnails in the test data. A few tests needed to be updated to account for the added files.
If you specify an empty |
Add an `all_thumbnails` keyword argument to `get_import_export_data` to request that all thumbnails in a channel should be included and not just those associated with the desired nodes and their associated topics. This can be useful for the frontend to display content that is available to download in a richer way. The option default to `False` to preserve the existing behavior.
The CLI option is `--all-thumbnails` and defaults to `False`.
fd090b8
to
7a1a6cf
Compare
Re-reading what you said, I see what you mean now. It's a bit more a property of the channel rather than the channel content. I still feel like this is the better way to go in the short term as the channel import code is currently very simple and adding some content import on the side would make it more complicated. What I think would be better long term is if the thumbnails weren't separately downloaded files but all inlined in the database like the channel thumbnail. To maintain the deduplication, I think a separate |
We do this currently for the channel thumbnails specifically - so it would not be a huge technical challenge - however, as you say, the backwards compatibility would be a big issue, and I do worry particularly about the channel database bloat. Given that a large channel can already have ~100MB of metadata in its database loading in several thousand image files seems like it would be a big change. As an example, the thumbnail images for Khan Academy English total about 600MB, which would make the total channel database size now ~750MB. Given we've already had significant issues raised from the size of the current channel databases, I am not inclined to think this is a good change. I think there may be a place for embedding thumbnails alongside metadata, but only in the context of more granular metadata preview and import as we are pushing on with 0.16. |
I did some research on this the other day over here. Khan Academy has by far the most thumbnails. Most of the channels we're looking at are under 20 MB of total thumbnails, though. Repeating that data here:
It's definitely a non-trivial amount of growth but not that excessive for most channels. Anyways, some other time. I'm pretty sure this should fill the gap for now. |
Summary
Add an option to include all thumbnails in import/export data and wire it up to be used when importing via the API or CLI. The idea is that you have a subset of a channel's nodes that you actually want to import but you want all the thumbnails so that you can display the nodes that aren't available in a visually appealing way.
Note that I only wired up the import side, but I don't think there's any reason it can't be wired up on the export side.
References
#10770
Reviewer guidance
This can be tested with the CLI. Here's an example using the "How to get started with Kolibri" channel:
After that you should find that all thumbnails in the channel. You can do something like this from the shell:
Testing checklist
PR process
Reviewer checklist
yarn
andpip
)