
Prune unused content #154

Open
ormsbee opened this issue Feb 7, 2024 · 1 comment

Comments

@ormsbee
Contributor

ormsbee commented Feb 7, 2024

Having the PublishLog and LearningPackage-local Content entries makes it easier for us to prune in small cycles, for example as a post-publish task.

Proposed Solution

Step 1: As an async post-publish task for a given PublishableEntity, delete all of its PublishableEntityVersions that are older than a certain period (1 week?), while preserving the following:

  • The current Draft version
  • Any PublishableEntityVersion that has ever been published (appears in a PublishLogRecord)

Rely on cascading deletion behavior to delete Component/ComponentVersion.
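A minimal sketch of the Step 1 selection rule, using plain Python dataclasses instead of the real Django models so it runs standalone. `EntityVersion`, `versions_to_prune`, and the one-week default are illustrative assumptions; in the actual app this would presumably be a queryset filter on PublishableEntityVersion, with cascading deletes handling dependent rows.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set


@dataclass(frozen=True)
class EntityVersion:
    """Hypothetical stand-in for a PublishableEntityVersion row."""
    id: int
    entity_id: int
    created: datetime


def versions_to_prune(
    versions: List[EntityVersion],
    draft_version_ids: Set[int],
    published_version_ids: Set[int],
    now: datetime,
    max_age: timedelta = timedelta(weeks=1),
) -> List[EntityVersion]:
    """Return versions older than max_age, skipping the current Draft
    versions and any version that appears in a PublishLogRecord."""
    cutoff = now - max_age
    return [
        v
        for v in versions
        if v.created < cutoff
        and v.id not in draft_version_ids
        and v.id not in published_version_ids
    ]


now = datetime(2024, 2, 28)
versions = [
    EntityVersion(1, 10, now - timedelta(days=30)),  # old, but was published
    EntityVersion(2, 10, now - timedelta(days=20)),  # old, never published
    EntityVersion(3, 10, now - timedelta(days=10)),  # old, but current Draft
    EntityVersion(4, 10, now - timedelta(days=1)),   # too recent
]
print([v.id for v in versions_to_prune(versions, {3}, {1}, now)])  # [2]
```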

Step 2: After the deletions in Step 1, find any unreferenced Content entries and delete those.
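Step 2 amounts to a set difference: every Content entry in the LearningPackage minus the Content still referenced by surviving versions. The sketch below assumes a hypothetical join table of `(component_version_id, content_id)` link rows; the function name and data shapes are illustrative, not the project's API.

```python
from typing import Iterable, Set, Tuple


def unreferenced_content_ids(
    content_ids: Set[int],
    links: Iterable[Tuple[int, int]],
) -> Set[int]:
    """Return Content ids in the LearningPackage that no surviving version uses.

    `links` holds hypothetical (component_version_id, content_id) join rows
    that remain after Step 1's deletions (and cascades) have run.
    """
    referenced = {content_id for _version_id, content_id in links}
    return content_ids - referenced


# Content 100 and 101 are still used by surviving versions 3 and 4.
remaining_links = [(3, 100), (3, 101), (4, 100)]
print(unreferenced_content_ids({100, 101, 102}, remaining_links))  # {102}
```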

@ormsbee
Contributor Author

ormsbee commented Feb 28, 2024

Some more thoughts on this...

We can delete old PublishableEntityVersions as new ones are created; there's no need to wait for a publish. This should be relatively quick, particularly since we'd only be deleting one version at a time in that case.
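A sketch of deleting at most one expired version each time a new one is created, assuming an in-memory, oldest-first list of versions per entity. `EntityHistory`, `create_version`, and the one-week cutoff are hypothetical names chosen for illustration; a real implementation would do this against the database in the version-creation code path.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set, Tuple


@dataclass
class EntityHistory:
    """Hypothetical in-memory version history for one PublishableEntity."""
    versions: List[Tuple[int, datetime]] = field(default_factory=list)  # oldest first
    draft_version_id: Optional[int] = None
    published_version_ids: Set[int] = field(default_factory=set)

    def create_version(
        self, version_id: int, now: datetime, max_age: timedelta = timedelta(weeks=1)
    ) -> Optional[int]:
        """Append a new version (it becomes the Draft), then delete at most one
        expired version that is neither the Draft nor ever published.
        Returns the id of the deleted version, or None."""
        self.versions.append((version_id, now))
        self.draft_version_id = version_id
        cutoff = now - max_age
        for i, (vid, created) in enumerate(self.versions):
            if created >= cutoff:
                break  # list is oldest-first, so everything after is newer
            if vid != self.draft_version_id and vid not in self.published_version_ids:
                del self.versions[i]
                return vid
        return None


start = datetime(2024, 2, 1)
history = EntityHistory()
history.create_version(1, start)
history.published_version_ids.add(1)  # version 1 was published
history.create_version(2, start + timedelta(days=1))
history.create_version(3, start + timedelta(days=2))
print(history.create_version(4, start + timedelta(days=20)))  # 2
print([vid for vid, _ in history.versions])  # [1, 3, 4]
```

Bounding the work to one deletion per new version keeps the cost of each write small; any remaining expired versions are picked up on later writes.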

The hard part of pruning is determining which Content entries are safe to delete. The components app knows how to prune the Content it is associated with, but other apps might use that same Content, especially if we model large collections of files as something other than Components. Also, deleting the files that back Content will be a slower operation.

If we're willing to let Content pruning be slow, we can make it pluggable: multiple apps each contribute querysets of Content to exclude from pruning.

So Content pruning would do this:

  • Get a queryset of all Content in a LearningPackage
  • Exclude a pluggable list of querysets contributed by other apps
  • Components would be the first app to implement this, but FileSystems could as well. Either way, it's a contents-level concern.

So this prune would run periodically, say after a publish, rather than every time a version is created. It also works in small increments.
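A sketch of the pluggable exclusion idea: each app registers a provider returning the Content ids it still references in a LearningPackage, and the pruner returns everything else in bounded batches. The registry, the provider signature, and `batch_size` are assumptions; in Django each provider would more likely return a queryset, and the pruner would chain `.exclude(pk__in=...)` calls. Plain sets are used here so the example runs without a database.

```python
from typing import Callable, List, Set

# Hypothetical registry: each app contributes a provider that returns the
# Content ids it still references inside a given LearningPackage.
ExclusionProvider = Callable[[int], Set[int]]
_exclusion_providers: List[ExclusionProvider] = []


def register_exclusion_provider(provider: ExclusionProvider) -> ExclusionProvider:
    """Decorator that adds a provider to the registry."""
    _exclusion_providers.append(provider)
    return provider


def prunable_content_ids(
    learning_package_id: int, all_content_ids: Set[int], batch_size: int = 100
) -> List[int]:
    """Return up to batch_size Content ids that no registered app claims."""
    excluded: Set[int] = set()
    for provider in _exclusion_providers:
        excluded |= provider(learning_package_id)
    return sorted(all_content_ids - excluded)[:batch_size]


# Illustrative data: which Content each app still uses, per LearningPackage id.
COMPONENT_REFS = {1: {100, 101}}
FILESYSTEM_REFS = {1: {103}}


@register_exclusion_provider
def component_refs(learning_package_id: int) -> Set[int]:
    return COMPONENT_REFS.get(learning_package_id, set())


@register_exclusion_provider
def filesystem_refs(learning_package_id: int) -> Set[int]:
    return FILESYSTEM_REFS.get(learning_package_id, set())


print(prunable_content_ids(1, {100, 101, 102, 103, 104}, batch_size=1))  # [102]
```

Sorting the candidates keeps each batch deterministic, so repeated small runs work through the backlog in a stable order.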

Edge case: multiple Content entries can point to the same backing file if they have different media types, so we must be careful not to delete that file while any other Content still references it.
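A sketch of the edge-case check: after deleting a Content row, only report its backing file as deletable when no remaining Content row shares it. `ContentRow` and its `file_key` field are hypothetical; they stand in for however Content actually identifies its stored file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ContentRow:
    """Hypothetical Content row; file_key identifies its stored backing file."""
    id: int
    media_type: str
    file_key: str


def delete_content(rows: Dict[int, ContentRow], content_id: int) -> Optional[str]:
    """Remove a Content row. Return its file_key only if no remaining row
    shares that backing file, i.e. the file itself is safe to delete."""
    row = rows.pop(content_id)
    still_referenced = any(r.file_key == row.file_key for r in rows.values())
    return None if still_referenced else row.file_key


rows = {
    1: ContentRow(1, "text/plain", "abc123"),
    2: ContentRow(2, "text/markdown", "abc123"),  # same bytes, different media type
    3: ContentRow(3, "image/png", "def456"),
}
print(delete_content(rows, 1))  # None: row 2 still uses abc123
print(delete_content(rows, 2))  # abc123
```

Against a real database, this check and the file deletion would likely also need to guard against a concurrent insert that starts referencing the same file.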
