Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote content serving #8005

Merged
merged 2 commits into from
May 21, 2021
Merged

Conversation

rtibbles
Copy link
Member

@rtibbles rtibbles commented Apr 14, 2021

Summary

Rather than attempt to integrate with cloud storage, this instead just delegates any management of content files to an external URL, and just assumes that it is dealing with it.
In order to achieve this, this PR does the following:

  • Adds a new option REMOTE_CONTENT that is a boolean to interpret whether this option is being used
  • Removes the DownloadContentView as it relies on local storage being available. Replaces this with client side file downloading (this means that for IE11, files will have their raw names).
  • Removes generation of download_url for files in the ContentNodeViewset, as no longer needed
  • Adds a frontend downloadUrl utility that just points at the content storage.
  • Intercepts content downloading and transfer to just assume it worked and not attempt
  • Prevents content export.
  • Doesn't make any changes to zipcontent, as it will now just 404 - this should be used in conjunction with the configurability of the zipcontent URL.

References

Fixes #7980
Depends on #7924 as it also adds options so didn't want to merge conflict with myself

Reviewer guidance

  • Does content import 'work'? i.e. tell you that everything was successful but do nothing filewise
  • Does downloading content work?
  • Does viewing content with REMOTE_CONTENT_URL set work?

Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@rtibbles rtibbles added the work-in-progress Not ready for review label Apr 14, 2021
@rtibbles rtibbles added this to the 0.15.0 milestone Apr 14, 2021
@rtibbles rtibbles requested a review from aronasorman April 14, 2021 15:57
@aronasorman
Copy link
Collaborator

Testing this now -- what's a good way to run Kolibri 0.15? Is it different from kolibri start --foreground?

@aronasorman
Copy link
Collaborator

Note: running kolibri start --foreground starts listening on both port 8080 (expected) and 8765. I'm guessing port 8765 can be changed by defining ZIP_CONTENT_PORT?

@rtibbles
Copy link
Member Author

I'm guessing port 8765 can be changed by defining ZIP_CONTENT_PORT?

Correct - also, when being invoked via uwsgi, for a remote content scenario, we would presumably not be running the zipcontent server locally, so would not be running that.

The 8765 port server is a separately defined wsgi application.

@aronasorman
Copy link
Collaborator

I'm also getting this error when backed by a postgres database:

postgres_1  | ERROR:  relation "sqlite_master" does not exist at character 18
postgres_1  | STATEMENT:  SELECT name FROM sqlite_master WHERE type='table';

It seems like we have a hardcoded dependency to the sqlite_master relation somewhere? It's not a big deal yet though, as kolibri continues running.

@aronasorman
Copy link
Collaborator

I'm guessing port 8765 can be changed by defining ZIP_CONTENT_PORT?

Correct - also, when being invoked via uwsgi, for a remote content scenario, we would presumably not be running the zipcontent server locally, so would not be running that.

The 8765 port server is a separately defined wsgi application.

Gotcha. Sounds good! I'll test out that env var then.

@rtibbles
Copy link
Member Author

We do hard code references to it in a couple of places: https://github.com/learningequality/kolibri/search?q=sqlite_master but we catch them in the case they cause an error, hence you're seeing them continue. Could probably conditionalize these calls further.

@aronasorman
Copy link
Collaborator

I got a test environment with this Dockerfile:

FROM python:3.8.9-buster

RUN pip install psycopg2-binary

COPY kolibri-0.15.0.dev0+git.20210414160655-py2.py3-none-any.whl .
RUN pip install ./kolibri-0.15.0.dev0+git.20210414160655-py2.py3-none-any.whl

And this docker-compose script:

version: '3.4'

services:
  kolibri:
    build: .
    environment:
      KOLIBRI_DATABASE_ENGINE: postgres
      KOLIBRI_DATABASE_PORT: 5432
      KOLIBRI_DATABASE_NAME: kolibri
      KOLIBRI_DATABASE_USER: kolibri
      KOLIBRI_DATABASE_PASSWORD: kolibri
      KOLIBRI_DATABASE_HOST: postgres
      KOLIBRI_RUN_MODE: arons-testing
      KOLIBRI_CONTENT_REMOTE_URL: https://studio.learningequality.org/content    
    ports:
      - "8080:8080"
      - "8765:8765"
    depends_on:
      - postgres
    command: kolibri start --foreground

  postgres:
    image: postgres:9.6
    environment:
      PGDATA: /var/lib/postgresql/data/pgdata
      POSTGRES_USER: kolibri
      POSTGRES_PASSWORD: kolibri
      POSTGRES_DB: kolibri
    volumes:
      - pgdata:/var/lib/postgresql/data/pgdata

volumes:
  pgdata:

@aronasorman
Copy link
Collaborator

I didn't run collectstatic expecting whitenoise to automatically serve the static assets. Unfortunately I'm getting 404s :(

image

Did I have the wrong expectations here? Should I have run collectstatic?

@rtibbles
Copy link
Member Author

KOLIBRI_CONTENT_REMOTE_URL: https://studio.learningequality.org/content

should be:

KOLIBRI_CONTENT_REMOTE_URL: https://studio.learningequality.org/

@rtibbles
Copy link
Member Author

And yes, whitenoise should be serving without collectstatic - do you get the same issue when running locally?

@aronasorman
Copy link
Collaborator

@rtibbles without postgres? Let me try!

@aronasorman
Copy link
Collaborator

Just running the kolibri without any special settings still returns a 404 alas

@aronasorman
Copy link
Collaborator

Serving videos straight from Studio :D

image

image

@aronasorman
Copy link
Collaborator

Content data has persisted across restarts as well!

@aronasorman
Copy link
Collaborator

Getting this error for a piece of content on touchable earth:

http://localhost:8080/en/learn/#/topics/c/114ab24ec98e585dae3cd57559275394

image

@aronasorman
Copy link
Collaborator

I'm guessing that zip files are currently unsupported.

@rtibbles
Copy link
Member Author

Updates from in person review:

  • Switch from URL to a simple Boolean flagging whether content is remotely served or not. Actual routing can be handled in server configuration.
  • Consolidate H5P and Hashi static serving under content/static
  • Cloud function will be needed for extracting zip files on the studio content bucket into a zipcontent bucket - we will maintain the current relative location of zipcontent to reduce changes

@rtibbles rtibbles force-pushed the remote_content branch 2 times, most recently from 8549f7e to 457887f Compare April 22, 2021 13:01
@aronasorman
Copy link
Collaborator

After updating to the new wheel file, i'm getting this error:

postgres_1  | ERROR:  operator does not exist: character varying = integer at character 105
postgres_1  | HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
postgres_1  | STATEMENT:  SELECT EXISTS (SELECT 1
postgres_1  |   FROM scheduledjobs
postgres_1  |   WHERE scheduledjobs.queue = 'kolibri' AND scheduledjobs.id = 0) AS anon_1
kolibri_1   | ERROR    ENGINE Error in 'start' listener <bound method ServicesPlugin.start of <kolibri.utils.server.ServicesPlugin object at 0x7fd59d4c6e20>>
kolibri_1   | Traceback (most recent call last):
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1283, in _execute_context
kolibri_1   |     self.dialect.do_execute(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/default.py", line 590, in do_execute
kolibri_1   |     cursor.execute(statement, parameters)
kolibri_1   | psycopg2.errors.UndefinedFunction: operator does not exist: character varying = integer
kolibri_1   | LINE 3: ...eduledjobs.queue = 'kolibri' AND scheduledjobs.id = 0) AS an...
kolibri_1   |                                                              ^
kolibri_1   | HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
kolibri_1   |
kolibri_1   |
kolibri_1   | The above exception was the direct cause of the following exception:
kolibri_1   |
kolibri_1   | Traceback (most recent call last):
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/cherrypy/process/wspbus.py", line 230, in publish
kolibri_1   |     output.append(listener(*args, **kwargs))
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/utils/server.py", line 93, in start
kolibri_1   |     if SCH_PING_JOB_ID not in scheduler:
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/django/utils/functional.py", line 239, in inner
kolibri_1   |     return func(self._wrapped, *args)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/core/tasks/scheduler.py", line 78, in __contains__
kolibri_1   |     return session.query(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/orm/query.py", line 3469, in scalar
kolibri_1   |     ret = self.one()
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/orm/query.py", line 3436, in one
kolibri_1   |     ret = self.one_or_none()
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/orm/query.py", line 3405, in one_or_none
kolibri_1   |     ret = list(self)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/orm/query.py", line 3481, in __iter__
kolibri_1   |     return self._execute_and_instances(context)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/orm/query.py", line 3506, in _execute_and_instances
kolibri_1   |     result = conn.execute(querycontext.statement, self._params)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1020, in execute
kolibri_1   |     return meth(self, multiparams, params)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
kolibri_1   |     return connection._execute_clauseelement(self, multiparams, params)
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
kolibri_1   |     ret = self._execute_context(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1323, in _execute_context
kolibri_1   |     self._handle_dbapi_exception(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1517, in _handle_dbapi_exception
kolibri_1   |     util.raise_(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/util/compat.py", line 178, in raise_
kolibri_1   |     raise exception
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/base.py", line 1283, in _execute_context
kolibri_1   |     self.dialect.do_execute(
kolibri_1   |   File "/usr/local/lib/python3.8/site-packages/kolibri/dist/sqlalchemy/engine/default.py", line 590, in do_execute
kolibri_1   |     cursor.execute(statement, parameters)
kolibri_1   | sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: character varying = integer
kolibri_1   | LINE 3: ...eduledjobs.queue = 'kolibri' AND scheduledjobs.id = 0) AS an...
kolibri_1   |                                                              ^
kolibri_1   | HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

@aronasorman
Copy link
Collaborator

It's weird, i'm not sure how a content import change makes a (seemingly unrelated) iceqube error.

@aronasorman
Copy link
Collaborator

I'm guessing scheduledjobs.id has been changed to a string from an integer.

@rtibbles rtibbles force-pushed the remote_content branch 2 times, most recently from ca3c659 to 733057d Compare April 28, 2021 22:49
@rtibbles rtibbles marked this pull request as ready for review April 28, 2021 22:50
@rtibbles rtibbles changed the title Remote content serving proof of concept Remote content serving Apr 28, 2021
@rtibbles rtibbles added TODO: needs review Waiting for review and removed work-in-progress Not ready for review labels Apr 28, 2021
@rtibbles rtibbles force-pushed the remote_content branch 2 times, most recently from 950ea01 to a6c57de Compare May 6, 2021 01:42
@aronasorman
Copy link
Collaborator

aronasorman commented May 10, 2021

I seem to be stuck with a certain migration, even after clearing all kinds of data:

kolibri_1   |   Applying morango.0005_auto_20170629_2139... OK
kolibri_1   |   Applying morango.0006_instanceidmodel_system_id... OK
kolibri_1   |   Applying morango.0007_auto_20171018_1615... OK
kolibri_1   |   Applying morango.0008_auto_20171114_2217... OK
kolibri_1   |   Applying morango.0009_auto_20171205_0252... OK
kolibri_1   |   Applying morango.0010_auto_20171206_1615... OK
kolibri_1   |   Applying morango.0011_sharedkey... OK
kolibri_1   |   Applying morango.0012_auto_20180927_1658... OK
kolibri_1   |   Applying morango.0013_auto_20190627_1513... OK
kolibri_1   |   Applying morango.0014_syncsession_extra_fields... OK
kolibri_1   |   Applying morango.0015_auto_20200508_2104... OK
kolibri_1   |   Applying morango.0016_store_deserialization_error... OK
kolibri_1   |   Applying notifications.0001_initial... OK
kolibri_1   |   Applying notifications.0002_notificationslog... OK
kolibri_1   |   Applying notifications.0003_learnerprogressnotification_quiz_num_correct... OK
kolibri_1   |   Applying notifications.0004_learnerprogressnotification_quiz_num_answered... OK
kolibri_1   |   Applying notifications.0005_learnerprogressnotification_assignment_collections... OK

It doesn't seem to indicate though on what migration it's stuck on. Currently blocked for 20 minutes as of this writing!

@rtibbles
Copy link
Member Author

Any info from pg-stat?

@aronasorman
Copy link
Collaborator

Did a final test. Kolibri is running and content is served without downloading!

@rtibbles rtibbles merged commit 1269eef into learningequality:develop May 21, 2021
@rtibbles rtibbles deleted the remote_content branch May 21, 2021 15:30
@rtibbles rtibbles mentioned this pull request Aug 2, 2021
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow serving of content from a remote URL
2 participants