-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data models for Component-level Versioning Only #28
Conversation
We've had a number of conversations since this ADR was created, in an effort to simplify the data model and bring the terminology more in line with existing Studio vocabulary.
FYI @bradenmacdonald |
FYI @feanil, @kdmccormick |
These were placeholders, and they're cluttering the PR at the moment. There might be some useful scraps to pull from them, but they exist in another branch we can look at later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee @feanil You guys were right, reading the description made this all very clear.
I like that you've scoped this down to just Content and Component for now. I think that models here are extremely reasonable and will be a solid base as we work our way back up the hierarchy.
I had been resisting Component-level versioning for a while, but in retrospect trying to do Segment- or Unit-level versioning just made things more complex. I'm not sure why I felt so strongly--maybe because allowing library components to be versioned separately sounds like a UX nightmare? Even so, the Library Authoring system could mandate that library components are versioned together, without that restriction having to appear at this level.
Just some nits/thoughts below but nothing blocking.
contents = models.ManyToManyField( | ||
"Content", | ||
through="ComponentVersionContent", | ||
related_name="component_versions", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should every ComponentVersion have at least one contents?
If so, consider noting that with a comment (or a db constraint, but I assume that's not possible).
Otherwise, consider setting blank=True
here, as that would let us construct a content-less CV in django admin.
contents = models.ManyToManyField( | |
"Content", | |
through="ComponentVersionContent", | |
related_name="component_versions", | |
) | |
contents = models.ManyToManyField( | |
"Content", | |
through="ComponentVersionContent", | |
related_name="component_versions", | |
blank=True, | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh... I guess it's possible that we'd make one without Content (like just a title). I'd rather make the admin read-only altogether though, because ComponentVersion creation should include adding Publish entries rather than just direct model manipulation (I guess I really should do the api.py pieces of this soon).
# type is a way to help sub-divide namespace if that's convenient. This | ||
# field cannot be null, but it can be blank if it's not necessary. For an | ||
# XBlock, type corresponds to tag, e.g. "video". It's also the block_type in | ||
# the UsageKey. | ||
type = models.CharField(max_length=100, null=False, blank=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't love type
as a name, but trying to think about this field outside the context of XBlocks is so abstract and hypothetical that I have no better name to suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we expect that the only actual usage of this field will be for subdividing the ID namespace for XBlocks to match the historical behavior, then we could conceivably just call this field block_type
. I don't think it should be used outside of XBlocks, as having two levels of namespacing is not ideal.
Personally, I would probably remove this field and combine the block_type
and block_id
into the identifier
field with a special character string separator, and leave it up to the "identifier <=> usage key" logic to sort it out. But I'm completely fine with this approach here, so not suggesting we change this.
Hah, I was just thinking about this recently and I ended up concluding to myself that component-level versioning seemed like the way to go. Glad to see you trying it out and to hear it seems to be proving out so far from what you've tried. Ran out of time to review today, will check out more tomorrow or Fri :) |
|
||
Unit | ||
This is a list of one or more Segments that is displayed to the user on one page. A Unit may be stitched together using content that comes from multiple sources, such as content libraries. Units do not have to be strictly instructional content, as things like upgrade offers and error messages may also be injected. | ||
A Unit is an ordered list of one or more Components. A Unit is addressable in the browser at some URL, and is displayed together. A common use case might be to display some introductory Text, a Video, and some followup Problem (all separate Components). An individual Component in a Unit may or may not make sense when taken outside of that Unit–e.g. a Video may be reusable elsewhere, but the Problem referencing the video might not be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Unit is addressable in the browser at some URL
Does this imply that a Component is not accessible at some URL in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a leftover from when Segments were not going to necessarily be separately addressable.
I'm honestly not sure how I feel about Components being separately rendered vs. having implicit Units with single Components in them as the thing being rendered. I like the idea of having one type of thing being rendered at that level, and the idea that some Components aren't meant to be rendered in isolation. But that's super-hand-wavy at this point, so I'll just remove this wording and kick that decision down the road a little.
…ht also end up that way
@kdmccormick, @feanil, @bradenmacdonald: I think I addressed all the comments. Also added a couple small conveniences to the Django admin, a sprinkling of indexes, and some more fleshed out comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee This is excellent and I really like the direction. I don't have any major feedback, just a couple little questions. Nothing blocking. Looking forward to building on this!
about scalability issues. For instance, video files should not be stored | ||
here directly. There is a 10 MB limit set for the moment, to accomodate | ||
things like PDF files and images, but the itention is for the vast majority | ||
of rows to be much smaller than that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should data files over 10 MB be stored?
I would almost prefer a much lower limit like 250 kB and an integrated way to store and reference larger Content instances as objects on S3. I suspect you may already have implemented or planned something that but I forgot or don't see it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I've punted on this issue. I came to the same conclusion that you did: S3 objects (probably being django-storages) + database references, probably through an optional FileField on Content.
Captured this in #29
Architecture Guidelines | ||
----------------------- | ||
|
||
* We're keeping nearly unlimited history, so per-version metadata (i.e. the space/time cost of making a new version) must be kept low. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't suppose there's any way to tell which versions are in use or not with this model, because any application may or may not have foreign keys to specific ComponentVersion
s?
(I'm wondering if we can auto-prune unused versions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can using Django ORM introspection or a mixin, but it wouldn't be perfect. I'm honestly hoping the per-version overhead is low enough now where it's okay not to prune, because that just makes content lifecycle management so much easier.
High Level Overview
This is a pared down version of the last experimental pull request, with Unit-level work and top level versioning of the entire LearningContext stripped out. A lot of other changes as well.
The
load_components
management command is functional and should pull in the XBlock contents for HTML, Problem, and Video types. However it doesn't yet pull in all the references files (e.g. HTML files, subtitle files, etc.) yet. It will pull in regex matched static asset references from HTML and Capa, but not things where it has to understand conventions based on XML attribute values (like how some video subtitles are specified).Apps
There are two apps at work here:
openedx_learning.core.publishing
has just enough to tie content together (LearningPackage) and to track when things get published (PublishLogEntry).openedx_learning.core.components
has the models needed to track Components, their versioning, the raw data they address, and the publishing status of those models. It depends on the publishing app.Versioning
Versioning happens at the level of each Component (e.g. an individual block, like a problem or video) via the ComponentVersion model.
The Content model just holds raw bytes with some very minimal metadata (size, MIME type). Content has a M:M relationship with ComponentVersion via the custom ComponentVersionContent model (which provides an identifier that we use like a filename). That allows us to have the same Content associated with multiple ContentVersions if we want, making identifier renames cheap. The
load_components
management command will associate static assets with the appropriate ContentVersions.The latest published version of a component is tracked in PublishedComponent, with a historical log available in ComponentPublishLogEntry which is M:1 with the
publish
app's PublishLogEntry. This allow us to model "a bunch of ComponentVersions were published at the same time".Overall, the shift to versioning at a more granular level has made the modeling simpler, faster, and with less per-version overhead. It also makes diffs easier. That comes at a cost of making "point-in-time" snapshots more difficult to construct (you'd have to look at ComponentPublishLogEntry (joined with PublishLogEntry) to look for the most recent
component_version
before a certain time for eachcomponent
. Certainly possible, but more expensive.Status
data.py
definedapi.py
layer at the moment, only direct model access.