Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data models for Component-level Versioning Only #28

Merged
merged 33 commits into from
Feb 10, 2023
Merged
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
0c30eb9
refactor: change Item to Item + Component
Aug 14, 2022
37b6327
create all the models and some admin views of them
Aug 23, 2022
fd4c118
stubbing out a LOR app
Sep 3, 2022
9a7351f
before massive refactor of REST API into openedx_learning
Sep 5, 2022
658e3ac
create a rest_api package
Sep 16, 2022
e95af5c
itemstore features, pre-Denver
Feb 3, 2023
f2841b5
refactor: rename itemstore to content
Feb 3, 2023
a6db2e2
docs: update terminology for content flexibility ADR
Feb 4, 2023
cbfb371
refactor: move content -> components
Feb 4, 2023
8fe42df
chore: make git ignore Visual Studio Code preferences
Feb 4, 2023
e10c1b4
add publishing related models, regenerate migrations
Feb 6, 2023
c9f98d4
tweak version tracking
Feb 6, 2023
a868942
redo migrations
Feb 6, 2023
b3e1bab
model tweaks and data loading script for components
Feb 6, 2023
e008673
cleanup: remove file that never should have gotten in there
Feb 6, 2023
dcaffb1
removing some more noise
Feb 6, 2023
ad0841f
remove debug print
Feb 6, 2023
8937a2d
use identifiers in the Context/ComponentVersion relationship
Feb 6, 2023
4bfae55
switch around arg order for more convenient testing
Feb 6, 2023
f7deb37
Add content from static asset regex matching
Feb 6, 2023
e3765c3
browsable admin for Component/ComponentVersion/Content
Feb 6, 2023
b092c91
refactor: black reformatting
Feb 6, 2023
abd29bd
more tweaking to the admin to show inline links for ComponentVersion
Feb 7, 2023
825cca9
remove scratch api/data modules
Feb 7, 2023
e15f57d
Remove models_api,py files for now.
Feb 8, 2023
cbd2d17
adjust models in response to review feedback
Feb 9, 2023
f00973b
make the Published Components admin useful
Feb 9, 2023
8e81de7
okay, now I'm just getting silly wih aggregates in Django Admin
Feb 9, 2023
d3292ab
tweak PubilshedComponent admin to link to ComponentVersion instead of…
Feb 9, 2023
b0f9d41
add some indexes, shift to using one field for mime_type
Feb 10, 2023
d697e20
remove bit about Units being browser accessible, since Components mig…
Feb 10, 2023
9c32f7d
comment fixup around indexing
Feb 10, 2023
9e6be31
make django admin display the file sizes in human readable format (KB…
Feb 10, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -62,4 +62,6 @@ requirements/private.txt

# database file
dev.db
s

.vscode

5 changes: 2 additions & 3 deletions .importlinter
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ layers=
# This is layering within our Core apps.
#
# The lowest layer is "publishing", which holds the basic primitives needed to
# create LearningContexts and versioning.
# create LearningPackages and versioning.
#
# One layer above that is "itemstore" which stores single Items (e.g. Problem,
# Video).
Expand All @@ -40,6 +40,5 @@ layers=
name = Core App Dependency Layering
type = layers
layers=
openedx_learning.core.composition
openedx_learning.core.itemstore
openedx_learning.core.components
openedx_learning.core.publishing
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ Parts
~~~~~

* ``openedx_learning.lib`` is for shared utilities, and may include things like custom field types, plugin registration code, etc.
* ``openedx_learning.core`` contains our Core Django apps.
* ``openedx_learning.core`` contains our Core Django apps, where foundational data structures and APIs will live.

App Dependencies
~~~~~~~~~~~~~~~~
Expand Down
21 changes: 11 additions & 10 deletions docs/decisions/0002-content-flexibility.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,18 +18,11 @@ Decision

The following are foundational, extensible concepts in the Learning Core, which can be combined in different ways:

Item
An Item is a small piece of content, like a video, problem, or bit of HTML text. It has an identity, renderer, and potentially student state. It is not a container for other content, and has no child elements.

Items are analogous to the "Module" portion of the traditional Open edX course.

Segment
A Segment is an ordered list of Items that must be presented to the user together. The Items inside a Segment may be of different types, but it does not make sense to show one of these Items in isolation. An example could be one Item that explains a problem scenario, along with a problem Item that asks a question about it–a common scenario in content libraries. By default, each Item is its own Segment.

Open edX currently models these as nested Verticals (a.k.a. Units), but this often causes problems for code that traverses the content without realizing that such a nesting is possible.
Component
A Component is a small piece of content, like a video, problem, or bit of HTML text. It has an identity, renderer, and potentially student state. It is not a container for other content, and has no child elements.

Unit
This is a list of one or more Segments that is displayed to the user on one page. A Unit may be stitched together using content that comes from multiple sources, such as content libraries. Units do not have to be strictly instructional content, as things like upgrade offers and error messages may also be injected.
A Unit is an ordered list of one or more Components. A Unit is addressable in the browser at some URL, and is displayed together. A common use case might be to display some introductory Text, a Video, and some followup Problem (all separate Components). An individual Component in a Unit may or may not make sense when taken outside of that Unit–e.g. a Video may be reusable elsewhere, but the Problem referencing the video might not be.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Unit is addressable in the browser at some URL

Does this imply that a Component is not accessible at some URL in general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a leftover from when Segments were not going to necessarily be separately addressable.

I'm honestly not sure how I feel about Components being separately rendered vs. having implicit Units with single Components in them as the thing being rendered. I like the idea of having one type of thing being rendered at that level, and the idea that some Components aren't meant to be rendered in isolation. But that's super-hand-wavy at this point, so I'll just remove this wording and kick that decision down the road a little.


Sequence
A Sequence is a collection of Units that are presented one after the other, either to assess student understanding or to achieve some learning objective.
Expand All @@ -50,3 +43,11 @@ Consequences
This is aligned with the ADR on the `Role of XBlock <https://github.com/openedx/edx-platform/blob/master/docs/decisions/0006-role-of-xblock.rst>`_, which envisions XBlocks as leaf nodes of instructional content like Videos and Problems, and not as container structures like Units or Sequences.

To realize the benefits of this system would require significant changes to Studio and especially the LMS. In particular, this would involve gradually removing the XBlock runtime from much of the courseware logic. This would allow for substantial simplifications of the LMS XBlock runtime itself, such as removing field inheritance.

Changelog
---------

2023-02-06:

* Renamed "Item" to "Component" to be consistent with user-facing Studio terminology.
* Collapsed the role of Segment into Unit simplify the data model.
214 changes: 214 additions & 0 deletions olx_importer/management/commands/load_components.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
"""
Quick and hacky management command to dump Component data into our model for
experimentation purposes. This lives in its own app because it's not intended to
be a part of this repo in the longer term. Think of this as just an example app
to validate the data model can do what we need it to do.

This script manipulates the data models directly, instead of using stable API
calls. This is only because those APIs haven't been created yet, and this is
trying to validate basic questions about the data model. This is not how apps
are intended to use openedx-learning in the longer term.

Open Question: If the data model is extensible, how do we know whether a change
has really happened between what's currently stored/published for a particular
item and the new value we want to set? For Content that's easy, because we have
actual hashes of the data. But it's not clear how that would work for something
like an ComponentVersion. We'd have to have some kind of mechanism where every
pp that wants to attach data gets to answer the question of "has anything
changed?" in order to decide if we really make a new ComponentVersion or not.
"""
from datetime import datetime, timezone
import codecs
import logging
import mimetypes
import pathlib
import re
import xml.etree.ElementTree as ET

from django.core.management.base import BaseCommand
from django.db import transaction

from openedx_learning.core.publishing.models import LearningPackage, PublishLogEntry
from openedx_learning.core.components.models import (
Content, Component, ComponentVersion, ComponentVersionContent,
ComponentPublishLogEntry, PublishedComponent,
)
from openedx_learning.lib.fields import create_hash_digest

SUPPORTED_TYPES = ['problem', 'video', 'html']
logger = logging.getLogger(__name__)


class Command(BaseCommand):
help = 'Load sample Component data from course export'

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.learning_package = None
self.course_data_path = None
self.init_known_types()

def init_known_types(self):
"""Intialize mimetypes with some custom mappings we want to use."""
mimetypes.add_type("application/vnd.openedx.srt+json", ".sjson")
mimetypes.add_type("text/markdown", ".md")
mimetypes.add_type("image/svg+xml", ".svg")

def add_arguments(self, parser):
parser.add_argument('course_data_path', type=pathlib.Path)
parser.add_argument('learning_package_identifier', type=str)

def handle(self, course_data_path, learning_package_identifier, **options):
self.course_data_path = course_data_path
self.learning_package_identifier = learning_package_identifier
self.load_course_data(learning_package_identifier)

def get_course_title(self):
course_type_dir = self.course_data_path / 'course'
course_xml_file = next(course_type_dir.glob('*.xml'))
course_root = ET.parse(course_xml_file).getroot()
return course_root.attrib.get("display_name", "Unknown Course")

def load_course_data(self, learning_package_identifier):
print(f"Importing course from: {self.course_data_path}")
now = datetime.now(timezone.utc)
title = self.get_course_title()

with transaction.atomic():
learning_package, _created = LearningPackage.objects.get_or_create(
identifier=learning_package_identifier,
defaults={
'title': title,
'created': now,
'updated': now,
},
)
self.learning_package = learning_package

publish_log_entry = PublishLogEntry.objects.create(
learning_package=learning_package,
message="Initial Import",
published_at=now,
published_by=None,
)

for block_type in SUPPORTED_TYPES:
self.import_block_type(block_type, now, publish_log_entry)

def create_content(self, static_local_path, now, component_version):
identifier = pathlib.Path('static') / static_local_path
real_path = self.course_data_path / identifier
mime_type, _encoding = mimetypes.guess_type(identifier)
if mime_type is None:
logger.error(f" no mimetype found for {real_path}, defaulting to application/binary")
type, sub_type = "application", "binary"
else:
type, sub_type = mime_type.split('/')

try:
data_bytes = real_path.read_bytes()
except FileNotFoundError:
logger.warning(f" Static reference not found: {real_path}")
return # Might as well bail if we can't find the file.

hash_digest = create_hash_digest(data_bytes)

content, _created = Content.objects.get_or_create(
learning_package=self.learning_package,
type=type,
sub_type=sub_type,
hash_digest=hash_digest,
defaults = dict(
data=data_bytes,
size=len(data_bytes),
created=now,
)
)
ComponentVersionContent.objects.get_or_create(
component_version=component_version,
content=content,
identifier=identifier,
)

def import_block_type(self, block_type, now, publish_log_entry):
components_found = 0

# Find everything that looks like a reference to a static file appearing
# in attribute quotes, stripping off the querystring at the end. This is
# not fool-proof as it will match static file references that are
# outside of tag declarations as well.
static_files_regex = re.compile(r"""['"]\/static\/(.+?)["'\?]""")
block_data_path = self.course_data_path / block_type

for xml_file_path in block_data_path.glob('*.xml'):
components_found += 1
identifier = xml_file_path.stem

# Find or create the Component itself
component, _created = Component.objects.get_or_create(
learning_package=self.learning_package,
namespace='xblock.v1',
type=block_type,
identifier=identifier,
defaults = {
'created': now,
'modified': now,
}
)

# Create the Content entry for the raw data...
data_bytes = xml_file_path.read_bytes()
hash_digest = create_hash_digest(data_bytes)
data_str = codecs.decode(data_bytes, 'utf-8')
content, _created = Content.objects.get_or_create(
learning_package=self.learning_package,
type='application',
sub_type=f'vnd.openedx.xblock.v1.{block_type}+xml',
hash_digest=hash_digest,
defaults = dict(
data=data_bytes,
size=len(data_bytes),
created=now,
)
)
# TODO: Get associated file contents, both with the static regex, as
# well as with XBlock-specific code that examines attributes in
# video and HTML tag definitions.

try:
block_root = ET.fromstring(data_str)
except ET.ParseError as err:
logger.error(f"Parse error for {xml_file_path}: {err}")
continue

display_name = block_root.attrib.get('display_name', "")

# Create the ComponentVersion
component_version = ComponentVersion.objects.create(
component=component,
created=now,
version_num=1, # This only works for initial import
title=display_name,
)
ComponentVersionContent.objects.create(
component_version=component_version,
content=content,
identifier='source.xml',
)
static_files_found = static_files_regex.findall(data_str)
for static_local_path in static_files_found:
self.create_content(static_local_path, now, component_version)

# Mark that Component as Published
component_publish_log_entry = ComponentPublishLogEntry.objects.create(
component=component,
component_version=component_version,
publish_log_entry=publish_log_entry,
)
PublishedComponent.objects.create(
component=component,
component_version=component_version,
component_publish_log_entry=component_publish_log_entry,
)

print(f"{block_type}: {components_found}")
Loading