feat: add Component-level versioning
This is a major simplification and refactoring of the data model code
in order to get to something that we might be able to build v2 content
library tagging on top of (the first tagging implementation will be
strictly on Components and not Units).

Some high level points:

* Creates the openedx_learning.core.components app where Component
  content data models are implemented.
* Adds the PublishLogEntry model to openedx_learning.core.publishing
  to track when various things are published in a central location.
* Renames things to align better with existing Studio conventions,
  like "Components" instead of "Items".
* Removes some aspirational-but-not-really-implemented modules like
  data.py and api.py for apps, as well as apps that didn't have real
  implementations yet (staticassets, composition).
* Removes LearningPackage-level versioning in favor of versioning at
  the Component level. This makes per-version storage overhead smaller
  at the cost of making "point-in-time" snapshots more difficult to
  construct–you'd have to look at ComponentPublishLogEntry (joined
  with PublishLogEntry) to look for the most recent component_version
  before a certain time for each component. Certainly possible, but
  more expensive.
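
The point-in-time lookup described in the last bullet can be sketched in plain Python. This is only an illustration under stated assumptions: the two dataclasses are simplified, hypothetical stand-ins for the PublishLogEntry and ComponentPublishLogEntry models (the field names here are guesses, not the real schema), and `snapshot_at` is a made-up helper rather than an API in this repo.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PublishLogEntry:
    """Hypothetical stand-in: one row per publish event."""
    id: int
    published_at: datetime


@dataclass(frozen=True)
class ComponentPublishLogEntry:
    """Hypothetical stand-in: the version a publish event set for a component."""
    publish_log_entry_id: int
    component_id: int
    component_version_id: int


def snapshot_at(when, publish_log, component_log):
    """Return {component_id: component_version_id} as published at `when`."""
    # "Join" the two logs by indexing publish times by PublishLogEntry id.
    published_at = {entry.id: entry.published_at for entry in publish_log}
    latest = {}  # component_id -> (published_at, component_version_id)
    for row in component_log:
        ts = published_at[row.publish_log_entry_id]
        if ts > when:
            continue  # published after the requested point in time
        best = latest.get(row.component_id)
        if best is None or ts >= best[0]:
            latest[row.component_id] = (ts, row.component_version_id)
    return {cid: version_id for cid, (_ts, version_id) in latest.items()}
```

Every component needs its own "most recent entry before T" search, which is why this is more expensive than reading a single package-level version row.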
ormsbee authored Feb 10, 2023
1 parent f8cba42 commit 867ea8d
Showing 50 changed files with 1,643 additions and 1,007 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -62,4 +62,6 @@ requirements/private.txt

# database file
dev.db
s

.vscode

5 changes: 2 additions & 3 deletions .importlinter
@@ -29,7 +29,7 @@ layers=
 # This is layering within our Core apps.
 #
 # The lowest layer is "publishing", which holds the basic primitives needed to
-# create LearningContexts and versioning.
+# create LearningPackages and versioning.
 #
 # One layer above that is "itemstore" which stores single Items (e.g. Problem,
 # Video).
@@ -40,6 +40,5 @@ layers=
 name = Core App Dependency Layering
 type = layers
 layers=
-openedx_learning.core.composition
-openedx_learning.core.itemstore
+openedx_learning.core.components
 openedx_learning.core.publishing
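
The layering rule in this contract (modules in a higher layer may import from lower layers, never the reverse) can be illustrated with a small standalone check. This is a sketch of the rule only, not how import-linter is implemented; `LAYERS`, `layer_index`, and `import_allowed` are hypothetical names introduced for the example.

```python
# Layers listed from highest to lowest, mirroring the contract after this commit.
LAYERS = [
    "openedx_learning.core.components",  # higher layer
    "openedx_learning.core.publishing",  # lowest layer
]


def layer_index(module_name, layers=LAYERS):
    """Return the index of the layer containing module_name, or None."""
    for index, package in enumerate(layers):
        if module_name == package or module_name.startswith(package + "."):
            return index
    return None


def import_allowed(importer, imported, layers=LAYERS):
    """True if `importer` may import `imported` under the layering rule."""
    importer_idx = layer_index(importer, layers)
    imported_idx = layer_index(imported, layers)
    if importer_idx is None or imported_idx is None:
        return True  # at least one module is outside this contract
    # A smaller index means a higher layer; imports may only point downward.
    return importer_idx <= imported_idx
```

For example, `components.models` importing `publishing.models` passes, while the reverse import would be flagged.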
2 changes: 1 addition & 1 deletion README.rst
@@ -26,7 +26,7 @@ Parts
 ~~~~~

 * ``openedx_learning.lib`` is for shared utilities, and may include things like custom field types, plugin registration code, etc.
-* ``openedx_learning.core`` contains our Core Django apps.
+* ``openedx_learning.core`` contains our Core Django apps, where foundational data structures and APIs will live.

 App Dependencies
 ~~~~~~~~~~~~~~~~
21 changes: 11 additions & 10 deletions docs/decisions/0002-content-flexibility.rst
@@ -18,18 +18,11 @@ Decision

 The following are foundational, extensible concepts in the Learning Core, which can be combined in different ways:

-Item
-An Item is a small piece of content, like a video, problem, or bit of HTML text. It has an identity, renderer, and potentially student state. It is not a container for other content, and has no child elements.
-
-Items are analogous to the "Module" portion of the traditional Open edX course.
-
-Segment
-A Segment is an ordered list of Items that must be presented to the user together. The Items inside a Segment may be of different types, but it does not make sense to show one of these Items in isolation. An example could be one Item that explains a problem scenario, along with a problem Item that asks a question about it–a common scenario in content libraries. By default, each Item is its own Segment.
-
-Open edX currently models these as nested Verticals (a.k.a. Units), but this often causes problems for code that traverses the content without realizing that such a nesting is possible.
+Component
+A Component is a small piece of content, like a video, problem, or bit of HTML text. It has an identity, renderer, and potentially student state. It is not a container for other content, and has no child elements.

 Unit
-This is a list of one or more Segments that is displayed to the user on one page. A Unit may be stitched together using content that comes from multiple sources, such as content libraries. Units do not have to be strictly instructional content, as things like upgrade offers and error messages may also be injected.
+A Unit is an ordered list of one or more Components that is typically displayed together. A common use case might be to display some introductory Text, a Video, and some followup Problem (all separate Components). An individual Component in a Unit may or may not make sense when taken outside of that Unit–e.g. a Video may be reusable elsewhere, but the Problem referencing the video might not be.

 Sequence
 A Sequence is a collection of Units that are presented one after the other, either to assess student understanding or to achieve some learning objective.
@@ -50,3 +43,11 @@ Consequences
 This is aligned with the ADR on the `Role of XBlock <https://github.com/openedx/edx-platform/blob/master/docs/decisions/0006-role-of-xblock.rst>`_, which envisions XBlocks as leaf nodes of instructional content like Videos and Problems, and not as container structures like Units or Sequences.

 To realize the benefits of this system would require significant changes to Studio and especially the LMS. In particular, this would involve gradually removing the XBlock runtime from much of the courseware logic. This would allow for substantial simplifications of the LMS XBlock runtime itself, such as removing field inheritance.
+
+Changelog
+---------
+
+2023-02-06:
+
+* Renamed "Item" to "Component" to be consistent with user-facing Studio terminology.
+* Collapsed the role of Segment into Unit to simplify the data model.
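
The Component, Unit, and Sequence relationship described in this ADR can be sketched with plain dataclasses. This is a conceptual illustration only: the commit itself implements Components but not Units or Sequences, and every class and field name below is hypothetical rather than taken from the actual models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A leaf piece of content (video, problem, HTML); it has no children."""
    identifier: str
    type: str
    title: str = ""


@dataclass
class Unit:
    """An ordered list of one or more Components shown together."""
    title: str
    components: List[Component] = field(default_factory=list)

    def __post_init__(self):
        if not self.components:
            raise ValueError("A Unit needs at least one Component")


@dataclass
class Sequence:
    """A collection of Units presented one after the other."""
    title: str
    units: List[Unit] = field(default_factory=list)

    def component_types(self):
        """Flatten the sequence into the ordered list of component types."""
        return [c.type for unit in self.units for c in unit.components]


# Mirrors the ADR's example: introductory text, a video, then a follow-up problem.
intro_unit = Unit(
    title="Intro",
    components=[
        Component("intro_text", "html", "Welcome"),
        Component("lecture_1", "video", "Lecture 1"),
        Component("check_1", "problem", "Quick Check"),
    ],
)
week_1 = Sequence(title="Week 1", units=[intro_unit])
```

With Segment removed, there is only one level of grouping between a Component and a Sequence, so code that walks the content never has to handle nested containers.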
219 changes: 219 additions & 0 deletions olx_importer/management/commands/load_components.py
@@ -0,0 +1,219 @@
"""
Quick and hacky management command to dump Component data into our model for
experimentation purposes. This lives in its own app because it's not intended to
be a part of this repo in the longer term. Think of this as just an example app
to validate the data model can do what we need it to do.

This script manipulates the data models directly, instead of using stable API
calls. This is only because those APIs haven't been created yet, and this is
trying to validate basic questions about the data model. This is not how apps
are intended to use openedx-learning in the longer term.

Open Question: If the data model is extensible, how do we know whether a change
has really happened between what's currently stored/published for a particular
item and the new value we want to set? For Content that's easy, because we have
actual hashes of the data. But it's not clear how that would work for something
like a ComponentVersion. We'd have to have some kind of mechanism where every
app that wants to attach data gets to answer the question of "has anything
changed?" in order to decide if we really make a new ComponentVersion or not.
"""
from datetime import datetime, timezone
import codecs
import logging
import mimetypes
import pathlib
import re
import xml.etree.ElementTree as ET

from django.core.management.base import BaseCommand
from django.db import transaction

from openedx_learning.core.publishing.models import LearningPackage, PublishLogEntry
from openedx_learning.core.components.models import (
    Content, Component, ComponentVersion, ComponentVersionContent,
    ComponentPublishLogEntry, PublishedComponent,
)
from openedx_learning.lib.fields import create_hash_digest

SUPPORTED_TYPES = ['problem', 'video', 'html']
logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = 'Load sample Component data from course export'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.learning_package = None
        self.course_data_path = None
        self.init_known_types()

    def init_known_types(self):
        """Initialize mimetypes with some custom mappings we want to use."""
        # This is our own hacky video transcripts related format.
        mimetypes.add_type("application/vnd.openedx.srt+json", ".sjson")

        # Python's stdlib doesn't include these files that are sometimes used.
        mimetypes.add_type("text/markdown", ".md")
        mimetypes.add_type("image/svg+xml", ".svg")

        # Historically, JavaScript was "application/javascript", but it's now
        # officially "text/javascript"
        mimetypes.add_type("text/javascript", ".js")
        mimetypes.add_type("text/javascript", ".mjs")

    def add_arguments(self, parser):
        parser.add_argument('course_data_path', type=pathlib.Path)
        parser.add_argument('learning_package_identifier', type=str)

    def handle(self, course_data_path, learning_package_identifier, **options):
        self.course_data_path = course_data_path
        self.learning_package_identifier = learning_package_identifier
        self.load_course_data(learning_package_identifier)

    def get_course_title(self):
        course_type_dir = self.course_data_path / 'course'
        course_xml_file = next(course_type_dir.glob('*.xml'))
        course_root = ET.parse(course_xml_file).getroot()
        return course_root.attrib.get("display_name", "Unknown Course")

    def load_course_data(self, learning_package_identifier):
        print(f"Importing course from: {self.course_data_path}")
        now = datetime.now(timezone.utc)
        title = self.get_course_title()

        with transaction.atomic():
            learning_package, _created = LearningPackage.objects.get_or_create(
                identifier=learning_package_identifier,
                defaults={
                    'title': title,
                    'created': now,
                    'updated': now,
                },
            )
            self.learning_package = learning_package

            publish_log_entry = PublishLogEntry.objects.create(
                learning_package=learning_package,
                message="Initial Import",
                published_at=now,
                published_by=None,
            )

            for block_type in SUPPORTED_TYPES:
                self.import_block_type(block_type, now, publish_log_entry)

    def create_content(self, static_local_path, now, component_version):
        identifier = pathlib.Path('static') / static_local_path
        real_path = self.course_data_path / identifier
        mime_type, _encoding = mimetypes.guess_type(identifier)
        if mime_type is None:
            logger.error(f"  no mimetype found for {real_path}, defaulting to application/binary")
            mime_type = "application/binary"

        try:
            data_bytes = real_path.read_bytes()
        except FileNotFoundError:
            logger.warning(f"  Static reference not found: {real_path}")
            return  # Might as well bail if we can't find the file.

        hash_digest = create_hash_digest(data_bytes)

        content, _created = Content.objects.get_or_create(
            learning_package=self.learning_package,
            mime_type=mime_type,
            hash_digest=hash_digest,
            defaults=dict(
                data=data_bytes,
                size=len(data_bytes),
                created=now,
            )
        )
        ComponentVersionContent.objects.get_or_create(
            component_version=component_version,
            content=content,
            identifier=identifier,
        )

    def import_block_type(self, block_type, now, publish_log_entry):
        components_found = 0

        # Find everything that looks like a reference to a static file appearing
        # in attribute quotes, stripping off the querystring at the end. This is
        # not fool-proof as it will match static file references that are
        # outside of tag declarations as well.
        static_files_regex = re.compile(r"""['"]\/static\/(.+?)["'\?]""")
        block_data_path = self.course_data_path / block_type

        for xml_file_path in block_data_path.glob('*.xml'):
            components_found += 1
            identifier = xml_file_path.stem

            # Find or create the Component itself
            component, _created = Component.objects.get_or_create(
                learning_package=self.learning_package,
                namespace='xblock.v1',
                type=block_type,
                identifier=identifier,
                defaults={
                    'created': now,
                }
            )

            # Create the Content entry for the raw data...
            data_bytes = xml_file_path.read_bytes()
            hash_digest = create_hash_digest(data_bytes)
            data_str = codecs.decode(data_bytes, 'utf-8')
            content, _created = Content.objects.get_or_create(
                learning_package=self.learning_package,
                mime_type=f'application/vnd.openedx.xblock.v1.{block_type}+xml',
                hash_digest=hash_digest,
                defaults=dict(
                    data=data_bytes,
                    size=len(data_bytes),
                    created=now,
                )
            )
            # TODO: Get associated file contents, both with the static regex, as
            #       well as with XBlock-specific code that examines attributes in
            #       video and HTML tag definitions.

            try:
                block_root = ET.fromstring(data_str)
            except ET.ParseError as err:
                logger.error(f"Parse error for {xml_file_path}: {err}")
                continue

            display_name = block_root.attrib.get('display_name', "")

            # Create the ComponentVersion
            component_version = ComponentVersion.objects.create(
                component=component,
                version_num=1,  # This only works for initial import
                title=display_name,
                created=now,
                created_by=None,
            )
            ComponentVersionContent.objects.create(
                component_version=component_version,
                content=content,
                identifier='source.xml',
            )
            static_files_found = static_files_regex.findall(data_str)
            for static_local_path in static_files_found:
                self.create_content(static_local_path, now, component_version)

            # Mark that Component as Published
            component_publish_log_entry = ComponentPublishLogEntry.objects.create(
                component=component,
                component_version=component_version,
                publish_log_entry=publish_log_entry,
            )
            PublishedComponent.objects.create(
                component=component,
                component_version=component_version,
                component_publish_log_entry=component_publish_log_entry,
            )

        print(f"{block_type}: {components_found}")
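
To see what the static-file regex in `import_block_type` actually captures, here is a small standalone sketch that can be run without Django. It reuses the same pattern and the same `.sjson` mapping and fallback MIME type as the command; `find_static_references` and the sample OLX string are hypothetical, introduced only for this illustration.

```python
import mimetypes
import re

# Same pattern as the management command: a quote, "/static/", then a lazy
# capture that stops at the closing quote or the start of a querystring.
STATIC_FILES_REGEX = re.compile(r"""['"]\/static\/(.+?)["'\?]""")
DEFAULT_MIME_TYPE = "application/binary"  # the command's fallback value

# One of the custom mappings the command registers at startup.
mimetypes.add_type("application/vnd.openedx.srt+json", ".sjson")


def find_static_references(olx_text):
    """Return de-duplicated (relative_path, mime_type) pairs in first-seen order."""
    seen = {}
    for path in STATIC_FILES_REGEX.findall(olx_text):
        if path in seen:
            continue
        mime_type, _encoding = mimetypes.guess_type(path)
        seen[path] = mime_type or DEFAULT_MIME_TYPE
    return list(seen.items())


sample_olx = (
    '<html><img src="/static/images/a.png?raw=1"/>'
    "<a href='/static/subs.sjson'>transcript</a>"
    '<img src="/static/images/a.png"/></html>'
)
references = find_static_references(sample_olx)
```

The command itself does not de-duplicate the regex matches; it relies on `get_or_create` to make repeated references harmless. The de-duplication here just keeps the example output small.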
