Content data model should use File Storage. #29
Comments
For discussion purposes: What do you think about having a high-level API for reading and writing these blobs that transparently chooses FileField or BinaryField for you? (Can we use As for OLX:
I think the access characteristics are different enough (10-100X latency) that the apps should be made to understand the tradeoffs and make a conscious choice about it. So in this case, whatever XBlock-related apps or runtime we have that use these backing models should hold the logic for what goes where.
We can probably make the library use it, but in practice people are going to use it synchronously anyway...?
Yeah, I think we just set a limit, say 100K to start, and see if it breaks things. The largest thing I see in my current data set is about 50K for a particularly crazy ProblemBlock. I have definitely seen HTML ones that are much larger, but that was because they were copy-pasted from a Word doc that included large base64-encoded images inline on the page.

My stab at a guideline would be that if a Content is going to be used by server-side code during the rendering of the page, it should be stored in the BinaryField in the database. So that would include HTML files that are referenced by the OLX of the HTMLBlock. If the Content is only going to be delivered to the client (images, videos, subtitles, etc.), it should go in file storage with a pointer in the Content. Some advantages of that approach:
Disadvantages:
Random thoughts:
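The guideline in this comment could be sketched roughly as follows. This is only an illustration: `choose_storage`, `Storage`, and `MAX_DB_BYTES` are hypothetical names, 100K is read here as 100,000 bytes, and rejecting oversized server-side content with an error is one assumed way to "see if it breaks things".

```python
from enum import Enum

# Assumed reading of "say 100K to start"; the real limit is not settled.
MAX_DB_BYTES = 100_000


class Storage(Enum):
    DATABASE = "database"  # e.g. the BinaryField on Content
    FILE = "file"          # e.g. file storage, with a pointer in Content


def choose_storage(size: int, used_server_side: bool) -> Storage:
    """Hypothetical helper: decide where a Content's bytes should live."""
    # Content only delivered to the client (images, videos, subtitles)
    # goes to file storage regardless of size.
    if not used_server_side:
        return Storage.FILE
    # Content read by server-side code while rendering stays in the
    # database, but only up to the size limit.
    if size > MAX_DB_BYTES:
        raise ValueError(f"{size} bytes exceeds the {MAX_DB_BYTES} byte limit")
    return Storage.DATABASE
```

For example, a 50K ProblemBlock definition used at render time would land in the database, while a multi-megabyte image would go to file storage.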
Some more uncollected thoughts because I started hacking on this over the weekend...

The File Serving Solution I Wish I Had

Actually building one of these is hard, but what I wish I had was a server that I could instruct to serve this hash-identified file with this URL to this client. I would give each ContentVersion its own virtual space of sorts, so that the urls could look like

So an example might be:

But it would really know to map that to the file named after its hash (e.g.

The nice thing about having versioning done in the URL at the ComponentVersion level is that relative links will continue to work. If there's a piece of HTML that's referencing

On the downside, this would mean that browsers wouldn't properly cache things across versions: when the video component is updated, it's very likely that this subtitle file didn't change, but it would look like a new file to the browser. I think that's acceptable, because content doesn't change that often relative to the people consuming it. It's the per-XBlock JS and CSS code files that are important to keep properly cached for client-side performance, and those would be accessed elsewhere. This would only be for user-uploaded media files.

Since I'm making wishes, it would also have support for authenticated requests, range requests, great caching, and a pony.
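To make the idea of a per-version URL space concrete, here is a minimal sketch. The URL layout (`/<version>/<path>`), the version ids, the hashes, and the `MANIFEST` and `resolve` names are all invented for illustration; the actual URL format proposed in this comment is not reproduced here.

```python
from urllib.parse import urljoin

# Hypothetical manifest: (version id, path inside that version) -> content hash.
# Files on disk would be named after the hash, so an unchanged file is
# stored only once even though it is reachable under several versions.
MANIFEST = {
    ("v1", "index.html"): "aaa111",
    ("v1", "subs/en.srt"): "bbb222",
    ("v2", "index.html"): "ccc333",   # the HTML changed in v2
    ("v2", "subs/en.srt"): "bbb222",  # the subtitle file did not
}


def resolve(url_path: str) -> str:
    """Map an assumed '/<version>/<path>' URL to the hash-named file."""
    version, _, path = url_path.lstrip("/").partition("/")
    return MANIFEST[(version, path)]


# A relative link inside v2's index.html resolves within v2's URL space,
# which is why relative links keep working across versions.
subtitle_url = urljoin("/v2/index.html", "subs/en.srt")  # "/v2/subs/en.srt"
```

The caching downside is visible here too: `/v1/subs/en.srt` and `/v2/subs/en.srt` resolve to the same stored file, but a browser would treat them as two different resources.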
... okay, this is a side thing, and probably completely impractical, but I'm totally going to try to hack that in Starlette now.
Said hack is:
Updated approach:
All files are currently stored as a BinaryField. That's okay for now, but it brings scalability problems over the long term:
On the other hand, there are conveniences to having this data in the database directly:
We can probably implement this via an optional FileField on Content, preserving the basic metadata in that model (mime_type, size, hash), and storing the data as hash-named files in a directory that represents a LearningPackage (using the UUID).
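As a rough sketch of the layout described above (hash-named files inside a directory named after the LearningPackage UUID), assuming SHA-256 as the hash, which the issue does not specify. `content_file_path` and `content_metadata` are hypothetical helpers, not existing code.

```python
import hashlib
import uuid
from pathlib import PurePosixPath


def content_file_path(learning_package_uuid: uuid.UUID, data: bytes) -> PurePosixPath:
    """Return '<learning package uuid>/<hex digest>' for the given bytes."""
    # SHA-256 is an assumption; any stable content hash would work the same way.
    digest = hashlib.sha256(data).hexdigest()
    return PurePosixPath(str(learning_package_uuid)) / digest


def content_metadata(mime_type: str, data: bytes) -> dict:
    """Basic metadata that would stay on the Content model itself."""
    return {
        "mime_type": mime_type,
        "size": len(data),
        "hash": hashlib.sha256(data).hexdigest(),
    }
```

Because the file name is derived from the bytes, saving the same data twice within one LearningPackage yields the same path.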
The big question is: How do we determine whether something is offloaded or not? We can do this purely by size, or we can flag it by type (with some size limits). For instance, we might say that image files will always be put onto the file system, while the OLX is always stored locally, with the caveat that the OLX cannot be larger than a certain limit. It might actually be best to leave this logic completely up to apps to decide, since how the same type of file (JSON, XML) is used might vary by application.
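One way to leave the decision to apps is to have each app supply its own policy function. The sketch below shows a size-only policy and a type-based one; `OffloadPolicy`, `by_size`, `by_type`, `OLX_MIME_TYPE`, and `OLX_LIMIT` are hypothetical, and the MIME label and limit are placeholders rather than values from this project.

```python
from typing import Callable

# (mime_type, size in bytes) -> True if the data should be offloaded to files.
OffloadPolicy = Callable[[str, int], bool]

OLX_MIME_TYPE = "application/x-olx+xml"  # placeholder label, not a real registered type
OLX_LIMIT = 100_000                      # placeholder limit in bytes


def by_size(limit: int) -> OffloadPolicy:
    """Policy that offloads purely by size."""
    return lambda mime_type, size: size > limit


def by_type(mime_type: str, size: int) -> bool:
    """Policy that flags by type, with a size cap on OLX."""
    if mime_type.startswith("image/"):
        return True  # images always go to the file system
    if mime_type == OLX_MIME_TYPE:
        if size > OLX_LIMIT:
            raise ValueError("OLX exceeds the size limit")
        return False  # OLX always stays in the database
    return size > OLX_LIMIT  # everything else falls back to size
```

An app that treats JSON as render-time data and another that treats it as a client-side asset could then pass different policies without the storage layer needing to know why.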
PRs related to this: