Import content issue fixes #9242
Conversation
Properly break from the file transfer loop when canceled, and cancel all outstanding futures on cancellation.
Add an upper bound to the number of workers. Update tests with additional cancellation checks.
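For context, the shape of the fix is roughly the sketch below (names like `download_one` and `is_cancelled` are illustrative placeholders, not the actual Kolibri APIs):

```python
import concurrent.futures

MAX_WORKERS = 10  # mirrors the upper bound added in this PR

def transfer_files(files, download_one, is_cancelled):
    """Sketch of a cancellable transfer loop; not the actual Kolibri code.

    download_one and is_cancelled are hypothetical callables standing in
    for the real per-file transfer and the job's cancellation check.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = [executor.submit(download_one, f) for f in files]
        for future in concurrent.futures.as_completed(futures):
            if is_cancelled():
                # Cancel everything still queued, then break out of the loop
                # instead of waiting for the remaining transfers to finish.
                for pending in futures:
                    pending.cancel()
                break
            future.result()  # re-raise any transfer error immediately
```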
@radinamatic I tested this on both the Windows and Linux builds, and both channel import cancellation and simultaneous channel import are functioning correctly without any noticeable issues.
Manual QA passes, good to go! 👍🏽
From my POV, the code is fine.
Canceling a job works perfectly, no matter the size of the channel or how many channels are being downloaded.
However, I've not been able to import five channels at the same time. On every occasion where I had four channels being imported, one of them failed when I tried to import a new one. This is the flow I saw:
- Begin to import up to 4 channels, some of them big
- Select another channel to import
- Then I saw errors in the console while the import screen was open in the browser
- One of the channels being imported failed (it happened whenever I tried to import the fifth)
The traceback in the console is always the same (a DB lock is occurring):
INFO: 127.0.0.1 - - "GET /api/content/remotechannel/0a3446937e3340fa86e6010ba80e16e1/" 200 0 "http://localhost:8080/en/device/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"
ERROR An error occurred during content import: (sqlite3.OperationalError) database is locked
[SQL: UPDATE jobs SET saved_job=? WHERE jobs.id = ?]
[parameters: ('{"job_id": "53fd3a8f5246429dba17e782a4ea5695", "state": "RUNNING", "exception": null, "traceback": null, "track_progress": true, "cancellable": true, ... (633 characters truncated) ... _id": null, "node_ids": ["c9d7f950ab6b5a1199e3d6c10d7f0103"], "exclude_node_ids": []}, "func": "django.core.management.call_command", "result": null}', '53fd3a8f5246429dba17e782a4ea5695')]
(Background on this error at: http://sqlalche.me/e/e3q8)
INFO Setting availability to True of 220 LocalFile objects based on passed in checksums
INFO Setting availability of non-topic ContentNode objects based on LocalFile availability in 6 batches of 10000
INFO Annotating ContentNode objects with children for 6 levels
INFO Annotating ContentNode objects with children for level 6
INFO Annotating ContentNode objects with children for level 5
INFO Annotating ContentNode objects with children for level 4
INFO Annotating ContentNode objects with children for level 3
INFO Annotating ContentNode objects with children for level 2
INFO Annotating ContentNode objects with children for level 1
ERROR Job 53fd3a8f5246429dba17e782a4ea5695 raised an exception: Traceback (most recent call last):
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
self.dialect.do_execute(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/datos/le/mio/kolibri/kolibri/core/tasks/worker.py", line 61, in handle_finished_future
result = future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/datos/le/mio/kolibri/kolibri/core/tasks/job.py", line 117, in execute_job
result = func(*args, **kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 131, in call_command
return command.execute(*args, **defaults)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/django/core/management/base.py", line 330, in execute
output = self.handle(*args, **options)
File "/datos/le/mio/kolibri/kolibri/core/tasks/management/commands/base.py", line 119, in handle
return self.handle_async(*args, **options)
File "/datos/le/mio/kolibri/kolibri/core/content/management/commands/importcontent.py", line 532, in handle_async
self.download_content(
File "/datos/le/mio/kolibri/kolibri/core/content/management/commands/importcontent.py", line 164, in download_content
self._transfer(
File "/datos/le/mio/kolibri/kolibri/core/content/management/commands/importcontent.py", line 476, in _transfer
raise self.exception
File "/datos/le/mio/kolibri/kolibri/core/content/management/commands/importcontent.py", line 406, in _transfer
overall_progress_update(data_transferred)
File "/datos/le/mio/kolibri/kolibri/core/tasks/management/commands/base.py", line 62, in update_progress
self.update_callback(p.progress_fraction, p)
File "/datos/le/mio/kolibri/kolibri/core/tasks/management/commands/base.py", line 115, in _update_all_progress
self.update_progress(progress_list[0].progress_fraction, 1.0)
File "/datos/le/mio/kolibri/kolibri/core/tasks/management/commands/base.py", line 123, in update_progress
self.job.update_progress(progress_fraction, total_progress)
File "/datos/le/mio/kolibri/kolibri/core/tasks/job.py", line 254, in update_progress
self.storage.update_job_progress(self.job_id, progress, total_progress)
File "/datos/le/mio/kolibri/kolibri/core/tasks/storage.py", line 263, in update_job_progress
self._update_job(job_id, progress=progress, total_progress=total_progress)
File "/datos/le/mio/kolibri/kolibri/core/tasks/storage.py", line 302, in _update_job
return job, orm_job
File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
next(self.gen)
File "/datos/le/mio/kolibri/kolibri/core/tasks/storage.py", line 65, in session_scope
session.commit()
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1042, in commit
self.transaction.commit()
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 504, in commit
self._prepare_impl()
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
self.session.flush()
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
self._flush(objects)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
transaction.rollback(_capture_exception=True)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
flush_context.execute()
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
persistence.save_obj(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 230, in save_obj
_emit_update_statements(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 994, in _emit_update_statements
c = cached_connections[connection].execute(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1020, in execute
return meth(self, multiparams, params)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
ret = self._execute_context(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1323, in _execute_context
self._handle_dbapi_exception(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1517, in _handle_dbapi_exception
util.raise_(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context
self.dialect.do_execute(
File "/datos/le/mio/kolibri/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: UPDATE jobs SET saved_job=? WHERE jobs.id = ?]
[parameters: ('{"job_id": "53fd3a8f5246429dba17e782a4ea5695", "state": "RUNNING", "exception": null, "traceback": null, "track_progress": true, "cancellable": true, ... (633 characters truncated) ... _id": null, "node_ids": ["c9d7f950ab6b5a1199e3d6c10d7f0103"], "exclude_node_ids": []}, "func": "django.core.management.call_command", "result": null}', '53fd3a8f5246429dba17e782a4ea5695')]
(Background on this error at: http://sqlalche.me/e/e3q8)
I'm approving this PR because it solves the issue it mentions. There are no resource-limit problems at all, just a traditional lack of DB concurrency. In any case it would be good to be aware of this DB problem. I don't think there's much we can do about it, other than advising a switch to PostgreSQL when this kind of concurrent work is needed.
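(As a side note on mitigation: a common way to soften this on SQLite, sketched below with the standard sqlite3 module rather than Kolibri's actual SQLAlchemy-based storage layer, is to enable WAL and give the connection a generous busy timeout so writers wait for the lock instead of failing immediately.)

```python
import sqlite3

# Sketch only; Kolibri's job storage goes through SQLAlchemy, not raw sqlite3.
conn = sqlite3.connect("jobs.sqlite3", timeout=30)  # wait up to 30s for locks
conn.execute("PRAGMA journal_mode=WAL;")    # readers no longer block the writer
conn.execute("PRAGMA busy_timeout=30000;")  # the same wait, in milliseconds
```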
if self.is_cancelled():
    break
max_workers = 10
How does this relate to the two new options REGULAR_PRIORITY_WORKERS and HIGH_PRIORITY_WORKERS? I.e., what happens if REGULAR_PRIORITY_WORKERS + HIGH_PRIORITY_WORKERS > 10? It seems these values are ignored according to the code below, aren't they?
Yes, those options are only used to estimate how many total content import jobs could be running at any one time - as these additional workers are being run within the context of a single worker.
@jredrejo thanks - it seems we have an existing concurrency issue, which is meant to be fixed by the DB write lock machinery that we have in place, but which is still not functioning as needed. I think we need a better solution for essentially waiting as long as we need to within async tasks for the SQLite DB to be available for writing.
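To make the "wait as long as we need" idea concrete, one possible shape is a jittered retry wrapper around the write. This is a sketch under the assumption that the locked state surfaces as sqlite3.OperationalError; `write_fn` is a hypothetical callable, not Kolibri's storage API:

```python
import random
import sqlite3
import time

def write_with_retry(write_fn, max_wait=30.0):
    """Retry a DB write while SQLite reports the database as locked.

    write_fn is a hypothetical zero-argument callable performing the write;
    this sketches the idea, it is not Kolibri's actual write-lock machinery.
    """
    deadline = time.monotonic() + max_wait
    delay = 0.05
    while True:
        try:
            return write_fn()
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc) or time.monotonic() >= deadline:
                raise
            # Jittered exponential backoff keeps concurrent writers from
            # retrying the lock in sync with each other.
            time.sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, 2.0)
```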
Summary
References
Fixes #9229
…
Reviewer guidance
Import all of a Khan Academy channel, then cancel the task; it should cancel quickly.
Attempt to run imports of at least 5 channels simultaneously with the ulimit set to 256, and confirm that no errors occur (a snippet for verifying the limit follows below).
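To confirm the file-descriptor limit is actually in effect in the environment under test, a quick check with the standard library (Linux/macOS only; the resource module is not available on Windows):

```python
import resource

# Run this in the same shell session where you executed `ulimit -n 256`,
# so the process inherits the lowered limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file soft limit:", soft, "hard limit:", hard)
```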
Testing checklist
PR process
Reviewer checklist
External dependency files are updated if necessary (yarn and pip)