Data models for Component-level Versioning Only #28

ormsbee · 2023-02-06T05:21:36Z

High Level Overview

This is a pared-down version of the last experimental pull request, with Unit-level work and top-level versioning of the entire LearningContext stripped out. There are a lot of other changes as well.

The load_components management command is functional and should pull in the XBlock contents for HTML, Problem, and Video types. However, it doesn't yet pull in all the referenced files (e.g. HTML files, subtitle files, etc.). It will pull in regex-matched static asset references from HTML and Capa, but not things where it has to understand conventions based on XML attribute values (like how some video subtitles are specified).
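
As a rough illustration of what regex-based asset discovery can and cannot see, here is a minimal sketch. The pattern, the `/static/` URL convention, and the name `find_static_refs` are assumptions made for this example; the actual expression used by load_components may differ.

```python
import re

# Hypothetical pattern: a quoted reference that starts with /static/.
# The real load_components command may use a different expression.
STATIC_REF = re.compile(r"""["'](/static/[^"'?#\s]+)""")


def find_static_refs(markup: str) -> list[str]:
    """Return unique /static/ references in order of first appearance."""
    seen: dict[str, None] = {}
    for match in STATIC_REF.finditer(markup):
        seen.setdefault(match.group(1), None)
    return list(seen)


html = (
    '<p><img src="/static/diagram.png"> '
    '<a href="/static/notes.pdf">notes</a> '
    "<img src='/static/diagram.png'></p>"
)
# A subtitle referenced only through an attribute convention is invisible
# to the regex, because no /static/ URL appears in the markup.
video = '<video sub="abc123" />'

html_refs = find_static_refs(html)
video_refs = find_static_refs(video)
```

The second case is the limitation described above: finding that asset requires knowing how the attribute value maps to a file name, which a plain pattern match cannot know.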

Apps

There are two apps at work here:

openedx_learning.core.publishing has just enough to tie content together (LearningPackage) and to track when things get published (PublishLogEntry).
openedx_learning.core.components has the models needed to track Components, their versioning, the raw data they address, and the publishing status of those models. It depends on the publishing app.

Versioning

Versioning happens at the level of each Component (e.g. an individual block, like a problem or video) via the ComponentVersion model.

The Content model just holds raw bytes with some very minimal metadata (size, MIME type). Content has an M:M relationship with ComponentVersion via the custom ComponentVersionContent model (which provides an identifier that we use like a filename). That allows us to have the same Content associated with multiple ComponentVersions if we want, making identifier renames cheap. The load_components management command will associate static assets with the appropriate ComponentVersions.
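
To show why renames are cheap under this arrangement, here is a minimal plain-Python sketch. It is not the Django models from this PR: the dataclasses, the `contents` dict (standing in for ComponentVersionContent rows), and `next_version_with_rename` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Content:
    """Raw bytes plus minimal metadata (stand-in for the Content model)."""
    data: bytes
    mime_type: str

    @property
    def size(self) -> int:
        return len(self.data)


@dataclass
class ComponentVersion:
    title: str
    # Stand-in for ComponentVersionContent rows: identifier -> Content.
    contents: dict[str, Content] = field(default_factory=dict)


def next_version_with_rename(
    prev: ComponentVersion, old: str, new: str
) -> ComponentVersion:
    """Build a new version that points at the same Content under a new identifier."""
    mapping = dict(prev.contents)  # copies references only, never the bytes
    mapping[new] = mapping.pop(old)
    return ComponentVersion(title=prev.title, contents=mapping)


image = Content(data=b"\x89PNG...", mime_type="image/png")
v1 = ComponentVersion("Intro", {"static/diagram.png": image})
v2 = next_version_with_rename(v1, "static/diagram.png", "static/figure-1.png")
```

Both versions refer to the same `Content` object, so the rename adds only a new mapping entry; the bytes are stored once.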

The latest published version of a component is tracked in PublishedComponent, with a historical log available in ComponentPublishLogEntry, which is M:1 with the publish app's PublishLogEntry. This allows us to model "a bunch of ComponentVersions were published at the same time".
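
A minimal sketch of that publishing flow, using plain Python objects rather than the actual models. The field names, the `PublishState` class, and its `publish` method are hypothetical; only the three model names come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PublishLogEntry:
    id: int
    published_at: datetime


@dataclass
class ComponentPublishLogEntry:
    publish_log_entry_id: int  # many of these rows share one PublishLogEntry
    component_id: str
    component_version_num: int


class PublishState:
    """In-memory stand-in for the publishing tables (illustration only)."""

    def __init__(self) -> None:
        self.publish_log: list[PublishLogEntry] = []
        self.component_log: list[ComponentPublishLogEntry] = []
        self.published: dict[str, int] = {}  # stand-in for PublishedComponent

    def publish(self, versions: dict[str, int], when: datetime) -> PublishLogEntry:
        """Publish several component versions under one log entry."""
        entry = PublishLogEntry(id=len(self.publish_log) + 1, published_at=when)
        self.publish_log.append(entry)
        for component_id, version_num in versions.items():
            self.component_log.append(
                ComponentPublishLogEntry(entry.id, component_id, version_num)
            )
            # Only the latest published version is kept here.
            self.published[component_id] = version_num
        return entry


state = PublishState()
first = state.publish(
    {"video:intro": 1, "problem:quiz": 1},
    datetime(2023, 2, 1, tzinfo=timezone.utc),
)
second = state.publish(
    {"problem:quiz": 2},
    datetime(2023, 2, 5, tzinfo=timezone.utc),
)
```

After the two calls, `state.published` holds only the current versions, while `state.component_log` keeps all three historical rows grouped by their publish entry.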

Overall, the shift to versioning at a more granular level has made the modeling simpler and faster, with less per-version overhead. It also makes diffs easier. That comes at the cost of making "point-in-time" snapshots more difficult to construct: you'd have to look at ComponentPublishLogEntry (joined with PublishLogEntry) to find the most recent component_version before a certain time for each component. That's certainly possible, but more expensive.
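
To make the cost of a point-in-time snapshot concrete, here is a sketch of the kind of query involved, run against an in-memory SQLite database. The table and column names are assumptions for this example and are not the schema generated by the migrations in this PR. The query treats "before a certain time" as "at or before".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publish_log_entry (
    id INTEGER PRIMARY KEY,
    published_at TEXT NOT NULL
);
CREATE TABLE component_publish_log_entry (
    id INTEGER PRIMARY KEY,
    publish_log_entry_id INTEGER NOT NULL REFERENCES publish_log_entry(id),
    component_id INTEGER NOT NULL,
    component_version_id INTEGER NOT NULL
);
INSERT INTO publish_log_entry VALUES
    (1, '2023-01-01T00:00:00'),
    (2, '2023-02-01T00:00:00'),
    (3, '2023-03-01T00:00:00');
INSERT INTO component_publish_log_entry
    (publish_log_entry_id, component_id, component_version_id)
VALUES
    (1, 10, 100), (1, 20, 200),
    (2, 10, 101),
    (3, 20, 201);
""")

# For each component, keep the row whose publish time is the latest one
# at or before :as_of (no later qualifying row exists for that component).
SNAPSHOT_SQL = """
SELECT c.component_id, c.component_version_id
FROM component_publish_log_entry AS c
JOIN publish_log_entry AS p ON p.id = c.publish_log_entry_id
WHERE p.published_at <= :as_of
  AND NOT EXISTS (
      SELECT 1
      FROM component_publish_log_entry AS c2
      JOIN publish_log_entry AS p2 ON p2.id = c2.publish_log_entry_id
      WHERE c2.component_id = c.component_id
        AND p2.published_at <= :as_of
        AND p2.published_at > p.published_at
  )
ORDER BY c.component_id
"""


def snapshot(as_of: str) -> dict[int, int]:
    """Map component_id -> component_version_id as published at `as_of`."""
    rows = conn.execute(SNAPSHOT_SQL, {"as_of": as_of}).fetchall()
    return dict(rows)


mid_february = snapshot("2023-02-15T00:00:00")
```

Compared with reading the current state, which is a single lookup per component, the snapshot has to scan the publish history with a correlated subquery (or an equivalent window function), which is where the extra expense comes from.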

Status

There is no data.py defined.
There is no api.py layer at the moment, only direct model access.


ormsbee · 2023-02-06T05:22:08Z

FYI @bradenmacdonald

ormsbee · 2023-02-06T21:09:38Z

FYI @feanil, @kdmccormick


kdmccormick

@ormsbee @feanil You guys were right, reading the description made this all very clear.

I like that you've scoped this down to just Content and Component for now. I think the models here are extremely reasonable and will be a solid base as we work our way back up the hierarchy.

I had been resisting Component-level versioning for a while, but in retrospect trying to do Segment- or Unit-level versioning just made things more complex. I'm not sure why I felt so strongly--maybe because allowing library components to be versioned separately sounds like a UX nightmare? Even so, the Library Authoring system could mandate that library components are versioned together, without that restriction having to appear at this level.

Just some nits/thoughts below but nothing blocking.

kdmccormick · 2023-02-08T19:12:55Z

openedx_learning/core/components/models.py

+    contents = models.ManyToManyField(
+        "Content",
+        through="ComponentVersionContent",
+        related_name="component_versions",
+    )


Should every ComponentVersion have at least one Content?

If so, consider noting that with a comment (or a db constraint, but I assume that's not possible).

Otherwise, consider setting blank=True here, as that would let us construct a content-less CV in django admin.

Suggested change

-    contents = models.ManyToManyField(
-        "Content",
-        through="ComponentVersionContent",
-        related_name="component_versions",
-    )
+    contents = models.ManyToManyField(
+        "Content",
+        through="ComponentVersionContent",
+        related_name="component_versions",
+        blank=True,
+    )

Huh... I guess it's possible that we'd make one without Content (like just a title). I'd rather make the admin read-only altogether though, because ComponentVersion creation should include adding Publish entries rather than just direct model manipulation (I guess I really should do the api.py pieces of this soon).


kdmccormick · 2023-02-08T19:59:28Z

openedx_learning/core/components/models.py

+    # type is a way to help sub-divide namespace if that's convenient. This
+    # field cannot be null, but it can be blank if it's not necessary. For an
+    # XBlock, type corresponds to tag, e.g. "video". It's also the block_type in
+    # the UsageKey.
+    type = models.CharField(max_length=100, null=False, blank=True)


Again, I don't love type as a name, but trying to think about this field outside the context of XBlocks is so abstract and hypothetical that I have no better name to suggest.

If we expect that the only actual usage of this field will be for subdividing the ID namespace for XBlocks to match the historical behavior, then we could conceivably just call this field block_type. I don't think it should be used outside of XBlocks, as having two levels of namespacing is not ideal.

Personally, I would probably remove this field and combine the block_type and block_id into the identifier field with a special character string separator, and leave it up to the "identifier <=> usage key" logic to sort it out. But I'm completely fine with this approach here, so not suggesting we change this.
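
For comparison, the single-field alternative mentioned above might look roughly like this. The separator choice and the function names are hypothetical, and this is not what the PR implements (it keeps a separate type field).

```python
SEPARATOR = ":"  # hypothetical; any string that cannot appear in a block_type


def to_identifier(block_type: str, block_id: str) -> str:
    """Combine block_type and block_id into a single identifier string."""
    if SEPARATOR in block_type:
        raise ValueError(f"block_type may not contain {SEPARATOR!r}")
    return f"{block_type}{SEPARATOR}{block_id}"


def from_identifier(identifier: str) -> tuple[str, str]:
    """Split an identifier back into (block_type, block_id)."""
    # Split on the first separator only, so block_id may contain it.
    block_type, sep, block_id = identifier.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"identifier has no {SEPARATOR!r}: {identifier!r}")
    return block_type, block_id


combined = to_identifier("video", "intro_clip")
parts = from_identifier(combined)
```

The trade-off is that filtering by type becomes a prefix match on the identifier rather than an equality check on its own column.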

bradenmacdonald · 2023-02-08T22:46:30Z

Hah, I was just thinking about this recently and I ended up concluding to myself that component-level versioning seemed like the way to go. Glad to see you trying it out and to hear it seems to be proving out so far from what you've tried.

Ran out of time to review today, will check out more tomorrow or Fri :)

bradenmacdonald · 2023-02-08T23:01:30Z

docs/decisions/0002-content-flexibility.rst


 Unit
-  This is a list of one or more Segments that is displayed to the user on one page. A Unit may be stitched together using content that comes from multiple sources, such as content libraries. Units do not have to be strictly instructional content, as things like upgrade offers and error messages may also be injected.
+  A Unit is an ordered list of one or more Components. A Unit is addressable in the browser at some URL, and is displayed together. A common use case might be to display some introductory Text, a Video, and some followup Problem (all separate Components). An individual Component in a Unit may or may not make sense when taken outside of that Unit–e.g. a Video may be reusable elsewhere, but the Problem referencing the video might not be.


A Unit is addressable in the browser at some URL

Does this imply that a Component is not accessible at some URL in general?

I think this was a leftover from when Segments were not going to necessarily be separately addressable.

I'm honestly not sure how I feel about Components being separately rendered vs. having implicit Units with single Components in them as the thing being rendered. I like the idea of having one type of thing being rendered at that level, and the idea that some Components aren't meant to be rendered in isolation. But that's super-hand-wavy at this point, so I'll just remove this wording and kick that decision down the road a little.


ormsbee · 2023-02-10T19:36:39Z

@kdmccormick, @feanil, @bradenmacdonald: I think I addressed all the comments. Also added a couple small conveniences to the Django admin, a sprinkling of indexes, and some more fleshed out comments.

bradenmacdonald

@ormsbee This is excellent and I really like the direction. I don't have any major feedback, just a couple little questions. Nothing blocking. Looking forward to building on this!

bradenmacdonald · 2023-02-10T20:03:14Z

openedx_learning/core/components/models.py

+    about scalability issues. For instance, video files should not be stored
+    here directly. There is a 10 MB limit set for the moment, to accomodate
+    things like PDF files and images, but the itention is for the vast majority
+    of rows to be much smaller than that.


How should data files over 10 MB be stored?

I would almost prefer a much lower limit like 250 kB and an integrated way to store and reference larger Content instances as objects on S3. I suspect you may already have implemented or planned something like that, but I forgot or don't see it yet.

You're right, I've punted on this issue. I came to the same conclusion that you did: S3 objects (probably via django-storages) + database references, probably through an optional FileField on Content.

Captured this in #29
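
A rough sketch of the split being discussed, with small content stored inline and large content stored externally behind a reference. Everything here is hypothetical: the 250 kB threshold is the figure floated in this thread, the `external_files` dict stands in for an object store such as S3, and `StoredContent` is not the Content model.

```python
from dataclasses import dataclass
from typing import Optional

INLINE_LIMIT_BYTES = 250_000  # hypothetical threshold from the review comment


@dataclass
class StoredContent:
    size: int
    mime_type: str
    data: Optional[bytes]     # inline bytes for small content
    file_key: Optional[str]   # reference to an external object for large content


def store(
    data: bytes, mime_type: str, key: str, external_files: dict[str, bytes]
) -> StoredContent:
    """Keep small payloads inline; push large ones to external storage."""
    if len(data) <= INLINE_LIMIT_BYTES:
        return StoredContent(len(data), mime_type, data, None)
    external_files[key] = data  # stand-in for an upload to object storage
    return StoredContent(len(data), mime_type, None, key)


bucket: dict[str, bytes] = {}
small = store(b"<p>hello</p>", "text/html", "hello.html", bucket)
large = store(b"x" * 300_000, "application/pdf", "big.pdf", bucket)
```

Either way the row keeps `size` and `mime_type`, so listing and filtering content does not require fetching the external object.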

bradenmacdonald · 2023-02-10T20:07:58Z

openedx_learning/core/components/readme.rst

+Architecture Guidelines
+-----------------------
+
+* We're keeping nearly unlimited history, so per-version metadata (i.e. the space/time cost of making a new version) must be kept low.


I don't suppose there's any way to tell which versions are in use or not with this model, because any application may or may not have foreign keys to specific ComponentVersions?

(I'm wondering if we can auto-prune unused versions.)

I think we can, using Django ORM introspection or a mixin, but it wouldn't be perfect. I'm honestly hoping the per-version overhead is now low enough that it's okay not to prune, because that just makes content lifecycle management so much easier.

David Ormsbee added 14 commits February 6, 2023 00:15

refactor: change Item to Item + Component

0c30eb9

create all the models and some admin views of them

37b6327

stubbing out a LOR app

fd4c118

before massive refactor of REST API into openedx_learning

9a7351f

create a rest_api package

658e3ac

itemstore features, pre-Denver

e95af5c

refactor: rename itemstore to content

f2841b5

docs: update terminology for content flexibility ADR

a6db2e2

We've had a number of conversations since this ADR was created, in an effort to simplify the data model and bring the terminology more in line with existing Studio vocabulary.

refactor: move content -> components

cbfb371

chore: make git ignore Visual Studio Code preferences

8fe42df

add publishing related models, regenerate migrations

e10c1b4

tweak version tracking

c9f98d4

redo migrations

a868942

model tweaks and data loading script for components

b3e1bab

David Ormsbee added 7 commits February 6, 2023 00:24

cleanup: remove file that never should have gotten in there

e008673

removing some more noise

dcaffb1

remove debug print

ad0841f

use identifiers in the Context/ComponentVersion relationship

8937a2d

switch around arg order for more convenient testing

4bfae55

Add content from static asset regex matching

f7deb37

browsable admin for Component/ComponentVersion/Content

e3765c3

David Ormsbee added 4 commits February 6, 2023 18:50

refactor: black reformatting

b092c91

more tweaking to the admin to show inline links for ComponentVersion

abd29bd

remove scratch api/data modules

825cca9

These were placeholders, and they're cluttering the PR at the moment. There might be some useful scraps to pull from them, but they exist in another branch we can look at later.

Remove models_api,py files for now.

e15f57d

kdmccormick approved these changes Feb 8, 2023


bradenmacdonald reviewed Feb 8, 2023


David Ormsbee added 3 commits February 8, 2023 20:24

adjust models in response to review feedback

cbd2d17

make the Published Components admin useful

f00973b

okay, now I'm just getting silly wih aggregates in Django Admin

8e81de7

ormsbee self-assigned this Feb 9, 2023

David Ormsbee added 5 commits February 9, 2023 18:46

tweak PubilshedComponent admin to link to ComponentVersion instead of Component

d3292ab

add some indexes, shift to using one field for mime_type

b0f9d41

remove bit about Units being browser accessible, since Components might also end up that way

d697e20

comment fixup around indexing

9c32f7d

make django admin display the file sizes in human readable format (KB/MB)

9e6be31

bradenmacdonald approved these changes Feb 10, 2023


ormsbee merged commit 867ea8d into openedx:main Feb 10, 2023

ormsbee deleted the denver2 branch February 10, 2023 21:18

ormsbee mentioned this pull request Feb 10, 2023

More Prototyping #25

Closed
