
Add file storage option #12590

Draft
wants to merge 10 commits into
base: release-v0.17.x
from


@nucleogenesis (Member) commented Aug 22, 2024

Summary

To let us use a cloud backend for Django's file storage, this adds an option to options.py that defaults to Django's FileSystemStorage. That was already the effective default, but now we set it explicitly.

This should make the storage backend configurable on BCK: if a module provides a Google Cloud backend class that implements Django's Storage class, its dotted path can be set as the option's value.

For example, if we add a new class "GCloudStorage" in kolibri.core.storage, setting the option added here to kolibri.core.storage.GCloudStorage would make Django use that class.


This is very much a first pass. One thing I'm not clear on: if the option has the same name as the Django setting (DEFAULT_FILE_STORAGE), does Kolibri's options.py machinery automatically apply that setting because the names match?

References

Fixes #9441 (or at least begins to address it)

Reviewer guidance

  • Run Kolibri without changing the options.ini file and do something that writes to file storage (I generate a Facility Data CSV).
  • Uncomment STORAGE_BACKEND, change its value to gcs, and restart Kolibri. You should get an error message saying the module is not available.
  • Run pip install -r requirements/storages.txt and try again. There should be no error, and Kolibri should start up.
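The last two steps rely on Kolibri refusing to start, with a clear message, when the selected backend's packages are missing. A minimal stdlib sketch of such a check follows. BACKEND_DEPENDENCIES, is_importable and check_backend_available are hypothetical names, and the module names listed for gcs (storages from django-storages, google.cloud.storage from the GCS client) are assumptions about what requirements/storages.txt installs, not taken from the PR.

```python
import importlib.util

# Hypothetical map from STORAGE_BACKEND values to the modules each one needs.
# The module names are assumptions about what requirements/storages.txt provides.
BACKEND_DEPENDENCIES = {
    "file_system": (),  # Django's FileSystemStorage needs nothing extra
    "gcs": ("storages", "google.cloud.storage"),
}


def is_importable(module_name):
    """Return True if module_name can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first, so a missing parent
        # (e.g. "google") raises instead of returning None.
        return False


def check_backend_available(backend, dependencies=BACKEND_DEPENDENCIES):
    """Raise a readable error if the packages for `backend` are not installed."""
    if backend not in dependencies:
        raise ValueError("Unknown storage backend: {!r}".format(backend))
    missing = [name for name in dependencies[backend] if not is_importable(name)]
    if missing:
        raise RuntimeError(
            "Storage backend {!r} is not available (missing: {}). "
            "Try: pip install -r requirements/storages.txt".format(
                backend, ", ".join(missing)
            )
        )
```

Running this check from the option's validator (an assumption about where it would live) would surface the missing-module error at startup rather than on the first file write.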

Next steps

  • Work out a way to test this in a BCK environment. I'll ask @jredrejo for guidance based on his experience with KDP.
  • Test it that way too :)

Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@nucleogenesis requested a review from @rtibbles August 22, 2024 21:23
@github-actions bot added the DEV: backend (Python, databases, networking, filesystem...) label Aug 22, 2024
@rtibbles (Member) left a comment

I think we want to keep the options.py interface restricted. Others can override settings if they see fit, but we should focus on enabling the specific options we need.

Hint: if you run kolibri configure list-env, you will see a complete list of the environment variables available for configuration, including how your new option can be set as an environment variable.

@@ -359,6 +377,16 @@ def lazy_import_callback_list(value):


base_option_spec = {
"FileStorage": {
"DEFAULT_FILE_STORAGE": {
Member

I'd rather hew closer to the pattern we use for the Cache and Database options here, and offer simple, pre-specified string values that refer to specific backends. If someone really wants to run a custom backend, they can override the settings file and do what they like.

That way, with specific backends in mind, we can explicitly enumerate the additional settings each one needs. For example, the default "file_system" backend will need a path, while a GCS backend might require other settings (or have them configured automatically in some cases).
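A minimal sketch of the pattern described in this comment: each simple string value maps to one backend class and declares the extra options it requires. The option keys (STORAGE_PATH, GS_BUCKET_NAME), STORAGE_BACKENDS and build_storage_settings are hypothetical. storages.backends.gcloud.GoogleCloudStorage and its GS_BUCKET_NAME setting come from django-storages, and MEDIA_ROOT is the default location used by Django's FileSystemStorage.

```python
# Hypothetical table: simple option value -> Django storage class, plus a map
# from each required Kolibri option key to the Django setting it feeds.
STORAGE_BACKENDS = {
    "file_system": {
        "class": "django.core.files.storage.FileSystemStorage",
        "settings": {"STORAGE_PATH": "MEDIA_ROOT"},
    },
    "gcs": {
        "class": "storages.backends.gcloud.GoogleCloudStorage",
        "settings": {"GS_BUCKET_NAME": "GS_BUCKET_NAME"},
    },
}


def build_storage_settings(backend, options):
    """Translate a backend name and its options into Django settings values."""
    try:
        spec = STORAGE_BACKENDS[backend]
    except KeyError:
        raise ValueError("Unknown storage backend: {!r}".format(backend)) from None
    # Every option listed for this backend must be present and non-empty.
    missing = [key for key in spec["settings"] if not options.get(key)]
    if missing:
        raise ValueError(
            "Storage backend {!r} requires: {}".format(backend, ", ".join(missing))
        )
    # DEFAULT_FILE_STORAGE is superseded by the STORAGES setting in Django 4.2+.
    settings = {"DEFAULT_FILE_STORAGE": spec["class"]}
    for option_key, setting_name in spec["settings"].items():
        settings[setting_name] = options[option_key]
    return settings
```

Because the accepted values are enumerated, validation can name exactly which extra option is missing for the chosen backend, while anyone needing a custom backend can still override the Django settings file directly.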

except ImportError:
logger.error("Default file storage is not available.")
raise VdtValueError(value)
except Exception:
Member

We should never catch a bare Exception without a very good reason; it can hide a multitude of sins.

modules = value.split(".")
klass = modules.pop()
module_path = ".".join(modules)
module = importlib.import_module(module_path)
Member

Note that Django provides a utility, import_string, which we already use in this module to load classes from a dotted path string, so it seems preferable here.
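For reference, Django's import_string (in django.utils.module_loading) splits off the last dotted component, imports the module, fetches the attribute, and reports a malformed path or missing attribute as ImportError. The stdlib-only sketch below roughly mirrors that and pairs it with a validator that catches only ImportError, in line with the comment about bare Exception. resolve_dotted_path, storage_class_option and OptionValidationError are hypothetical names; the last stands in for configobj's VdtValueError.

```python
from importlib import import_module


def resolve_dotted_path(dotted_path):
    """Return the attribute named by a dotted path such as "pkg.module.Class".

    Roughly mirrors django.utils.module_loading.import_string: a malformed
    path, a missing module or a missing attribute all surface as ImportError.
    """
    try:
        module_path, attr_name = dotted_path.rsplit(".", 1)
    except ValueError as err:
        raise ImportError(
            "{!r} doesn't look like a module path".format(dotted_path)
        ) from err
    module = import_module(module_path)  # ModuleNotFoundError is an ImportError
    try:
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(
            "Module {!r} does not define {!r}".format(module_path, attr_name)
        ) from err


class OptionValidationError(ValueError):
    """Hypothetical stand-in for configobj's VdtValueError."""


def storage_class_option(value, base_class):
    """Validate that `value` names an importable subclass of `base_class`."""
    try:
        klass = resolve_dotted_path(value)
    except ImportError:
        # Only import failures are expected here; any other exception is a
        # genuine bug and should propagate instead of being swallowed.
        raise OptionValidationError(value)
    if not (isinstance(klass, type) and issubclass(klass, base_class)):
        raise OptionValidationError(value)
    return klass
```

In the real option, base_class would presumably be django.core.files.storage.Storage, and resolve_dotted_path would simply be Django's own import_string.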

@@ -15,6 +15,7 @@
from configobj import ConfigObj
from configobj import flatten_errors
from configobj import get_extra_values
from django.core.files.storage import Storage
Member Author

This is probably why flake8 wouldn't let pre-commit pass, but it didn't give me useful output.

@nucleogenesis requested a review from @jredrejo August 26, 2024 23:13
@rtibbles self-assigned this Aug 27, 2024
@@ -737,6 +766,7 @@ def _get_validator():
"url_prefix": url_prefix,
"bytes": validate_bytes,
"multiprocess_bool": multiprocess_bool,
"storage_option": storage_option,
Member

How will this work with the database-based cache?

Member Author

Hm, I'm not sure. I assumed this was only related to validation on initialization.

@github-actions bot added the APP: Facility (Re: Facility App: user/class management, facility settings, csv import/export, etc.) label Dec 20, 2024
Comment on lines +218 to +219
logger.info("File saved - Path: {}".format(file_storage.url(file)))
logger.info("File saved - Size: {}".format(file_storage.size(file)))
Member Author

@jredrejo I've tried several things around here, but no matter what I do, the file always shows a size of 0. I've confirmed that there are users (usernames is full of data, for example), so I'm not sure why the writer isn't updating the file object here.
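One plausible cause, not confirmed against the PR's code: the size is read while the CSV writer's output is still sitting in Python's in-memory write buffers, so nothing has reached the file yet. The stdlib sketch below (write_rows_and_measure is a hypothetical helper) shows a small CSV reporting 0 bytes on disk until the file is closed. Separately, Django's Storage.url() and Storage.size() take the stored file's name, so it is worth checking that the file argument is that name rather than a file object.

```python
import csv
import os
import tempfile


def write_rows_and_measure(path, rows):
    """Write rows as CSV; return (size while still open, size after close)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
        # Small writes are still held in Python's buffers here, so the
        # file on disk is typically still empty at this point.
        size_while_open = os.path.getsize(path)
    # Leaving the with-block closes the file, which flushes the buffers.
    size_after_close = os.path.getsize(path)
    return size_while_open, size_after_close


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users.csv")
        print(write_rows_and_measure(path, [["username"], ["alice"], ["bob"]]))
```

If this is the cause, measuring only after the file is closed (or flushed) should fix it; with a Django storage backend, another option is to build the CSV in an in-memory buffer and pass it to storage.save(name, ContentFile(...)) in one call.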

Labels
APP: Facility (Re: Facility App: user/class management, facility settings, csv import/export, etc.)
DEV: backend (Python, databases, networking, filesystem...)
3 participants